Advances in information technology have extended the use of the Internet to areas such as
communication, research, education, financial transactions, real-time updates, online
booking, job searching, blogging, and shopping. This widespread use of the Internet has
turned the World Wide Web (WWW) into the largest information source, where individuals
and organizations publish their ideas to end users. The Web is a
collection of documents, images, sounds, videos, and animations interrelated by hyperlinks
and accessed through Uniform Resource Identifiers (URIs). The Web is mostly a semistructured
and/or unstructured data repository, and the data available on the Web is huge,
diverse, and dynamic in nature.
Web mining is the application of data mining techniques to discover hidden or
interesting knowledge from the WWW. Web mining is divided into three categories:
Web Usage Mining (WUM), Web Content Mining (WCM), and Web Structure Mining
(WSM). In Web content mining, the data used for the mining process is the content actually
present in the Web pages, which conveys information to the users. The contents of Web
pages vary, e.g. text, HTML, audio, video, and images. In Web structure
mining, the data used for the mining process is the organization of the Web pages connected
through hyperlinks. Various HTML tags are used to link one page to another page and one
website to another website.
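As a rough illustration of the raw data behind Web structure mining, the sketch below collects the hyperlinks of a single page and turns them into (source, target) edges. It is only a minimal example using Python's standard library; the class name LinkExtractor, the helper extract_links, and the example.com / example.org URLs are assumptions made for illustration and are not taken from any particular system.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the targets of <a href="..."> tags found in one page (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the URL of the page itself.
                self.links.append(urljoin(self.base_url, value))


def extract_links(base_url, html):
    """Return the absolute URLs that the given HTML text links to."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up page content, used only to exercise the sketch.
source = "http://example.com/index.html"
page = '<p><a href="/about.html">About</a> <a href="http://example.org/">Other site</a></p>'

# Each (source, target) pair is one edge of the hyperlink graph.
edges = [(source, target) for target in extract_links(source, page)]
```

The first edge links two pages of the same website and the second links one website to another, which are the two kinds of connection mentioned above.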
In Web usage mining, the data used for the mining process is the usage data of the website. The usage data reflects how the website is used and can be collected from various sources, such as Web servers, proxy servers, and client browsers, with various attributes. The data varies between sources because of where it is collected. Web usage data is also referred to as Web log data or Web log files.
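To make the notion of Web log data more concrete, the sketch below parses a few server log entries and counts requests per page. It assumes the entries follow the Common Log Format written by many Web servers, which is an assumption of this example and not something stated above; the log lines themselves, the pattern LOG_PATTERN, and the function parse_log_line are likewise made up for illustration.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format):
# host ident user [timestamp] "method path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return the attributes of one log entry as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Invented sample entries, not real usage data.
sample_log = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.1 - - [10/Oct/2000:13:56:01 -0700] "GET /about.html HTTP/1.0" 200 1045',
    '192.0.2.7 - - [10/Oct/2000:13:57:12 -0700] "GET /index.html HTTP/1.0" 304 -',
]

records = [r for r in map(parse_log_line, sample_log) if r is not None]

# A very simple usage statistic: number of requests per page.
page_hits = Counter(r["path"] for r in records)
```

Logs collected at a proxy server or in a client browser would generally carry different attributes, so the pattern would have to be adapted to the source.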