Data Pre-processing of Web usage Mining-A Vital Stage in Information Retrieval

Author(s):  G. Rajakumari, N. Venkatesan

Abstract:   The World Wide Web holds an ocean of information that supports nearly every aspect of modern life. With such an enormous volume of data, mining the right information becomes a challenging task. Web mining draws on many data mining techniques, but it is not a straightforward application of conventional data mining, because data on the Web is heterogeneous and unstructured. Web mining tasks can be categorized into three types: Web Content Mining, Web Structure Mining and Web Usage Mining. The goal of Web Usage Mining is to capture, model and analyse the behavioural patterns and profiles of users interacting with a website. Web Usage Mining consists of several stages; this study focuses on the first stage, data pre-processing, and on its techniques for building an efficient information retrieval mechanism. Data pre-processing consists of data cleaning, page view identification, sessionization, data integration and data transformation.
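
To make two of the pre-processing steps named above concrete, the following is a minimal Python sketch of data cleaning and sessionization over server log lines. It is an illustration under stated assumptions, not the authors' method: the abstract does not specify a log format, so the Common Log Format is assumed; the list of ignored file suffixes, the 30-minute inactivity timeout, the use of the host address as the user identifier, and all function names are hypothetical choices made for this example. Page view identification, data integration and data transformation are not covered.

```python
import re
from datetime import datetime, timedelta

# Assumption: entries follow the Common Log Format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Assumption: requests for embedded resources are not user page views.
IGNORED_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")

# Assumption: a gap of more than 30 minutes starts a new session.
SESSION_TIMEOUT = timedelta(minutes=30)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return {
        "host": match.group("host"),
        "time": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "method": match.group("method"),
        "url": match.group("url"),
        "status": int(match.group("status")),
    }


def is_relevant(entry):
    """Data cleaning: keep successful GET requests for non-resource URLs."""
    if entry["method"] != "GET":
        return False
    if not 200 <= entry["status"] < 300:
        return False
    path = entry["url"].split("?", 1)[0].lower()
    return not path.endswith(IGNORED_SUFFIXES)


def sessionize(entries):
    """Group requests per host into sessions split by the inactivity timeout.

    Returns a list of (host, [urls]) tuples in order of session start.
    """
    sessions = []
    current = {}    # host -> URL list of that host's open session
    last_seen = {}  # host -> timestamp of that host's previous request
    for entry in sorted(entries, key=lambda e: e["time"]):
        host = entry["host"]
        expired = (
            host in last_seen
            and entry["time"] - last_seen[host] > SESSION_TIMEOUT
        )
        if host not in current or expired:
            current[host] = []
            sessions.append((host, current[host]))
        current[host].append(entry["url"])
        last_seen[host] = entry["time"]
    return sessions


def preprocess(lines):
    """Run parsing, cleaning and sessionization over raw log lines."""
    parsed = (parse_line(line) for line in lines)
    cleaned = [e for e in parsed if e is not None and is_relevant(e)]
    return sessionize(cleaned)
```

Calling `preprocess` on an iterable of raw log lines returns one `(host, urls)` pair per session. Identifying users by host address alone is a simplification, since proxies and shared addresses can merge several users into one apparent visitor.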