Explain web usage mining.

10 b] Explain web usage mining.

Web usage mining has three phases:

  1. Pre-processing – Converts the usage information collected from the various data sources into the data abstractions necessary
    for pattern discovery.
  2. Pattern discovery – Applies methods and algorithms developed in fields such as statistics, data mining, ML and pattern
    recognition.
  3. Pattern analysis – Filters out uninteresting rules or patterns from the set found during the pattern discovery phase.

Usage data are collected at the server, client and proxy levels. The usage data collected at the different sources represent the
navigation patterns of the overall web traffic. This includes single-user, multi-user, single-site access and multi-site access patterns.

1 Pre-processing

Common data mining techniques are applied to the results of pre-processing using the vector space model.
Pre-processing is the data preparation task, which is required to identify:
(i) The user, through cookies, logins or URL information
(ii) The session of a single user across all the web pages of an application
(iii) Content from server logs, to obtain state variables for each active session
(iv) Page references
The subsequent phases of web usage mining depend on the smooth execution of the data preparation task in the pre-processing
phase. The process deals with (i) extracting the data, (ii) checking the accuracy of the data, (iii) putting the data together from different
sources, (iv) transforming the data into the required format and (v) structuring the data as per the input requirements of the pattern
discovery algorithm.
Pre-processing involves several steps, such as data cleaning, feature extraction, feature reduction, user identification, session
identification, page identification, formatting and finally data summarization.
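
The session identification step can be sketched as follows. This is a minimal illustration, not a prescribed method: the record layout `(user_id, timestamp, url)`, the function name `sessionize` and the 30-minute inactivity timeout are all assumptions chosen for the example.

```python
from collections import defaultdict

# Assumed heuristic: a gap of more than 30 minutes starts a new session.
SESSION_TIMEOUT = 30 * 60  # seconds


def sessionize(records, timeout=SESSION_TIMEOUT):
    """Group (user_id, timestamp, url) records into per-user sessions."""
    # User identification: here the user id is assumed to be already known
    # (for example from a cookie or login).
    by_user = defaultdict(list)
    for user, ts, url in records:
        by_user[user].append((ts, url))

    sessions = []
    for user, hits in by_user.items():
        hits.sort()  # order each user's page references by time
        current = [hits[0][1]]
        last_ts = hits[0][0]
        for ts, url in hits[1:]:
            if ts - last_ts > timeout:
                # Inactivity gap exceeded: close the session, start a new one.
                sessions.append((user, current))
                current = []
            current.append(url)
            last_ts = ts
        sessions.append((user, current))
    return sessions


# Hypothetical cleaned log records: (user_id, seconds since start, url)
log = [
    ("u1", 0, "/home"),
    ("u1", 60, "/courses"),
    ("u2", 30, "/home"),
    ("u1", 4000, "/home"),
]
sessions = sessionize(log)
for user, pages in sessions:
    print(user, pages)
```

Each resulting session is an ordered list of page references for one user, which is the kind of structure the pattern discovery phase takes as input.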

2 Pattern Discovery

The pre-processed data enable the application of knowledge extraction algorithms based on statistics, ML and data mining. Mining algorithms, such as path analysis, association rules, sequential patterns, clustering and classification, enable effective processing of web usage data. The choice of mining technique depends on the requirements of the analyst. The pre-processed web access log data are transformed into knowledge that uncovers potential patterns, which are then passed to the pattern analysis phase.
Some of the techniques used for pattern discovery of web usage mining are:
Statistical techniques – These are the most common methods for extracting knowledge about users. They perform different kinds
of descriptive statistical analysis (frequency, mean, median) on variables such as page views, viewing time and the length of a
navigational path.
Statistical techniques enable discovering:
(i) The most frequently accessed pages
(ii) Average view time of a page or average length of a path through a site
(iii) Information that supports marketing decisions
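
As a small illustration of these descriptive statistics, the sketch below computes the most frequently accessed page, the average view time of a page and the average path length. The sample data and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean

# Hypothetical page-view records: (url, viewing time in seconds)
page_views = [
    ("/home", 12.0),
    ("/courses", 45.0),
    ("/home", 8.0),
    ("/admission", 30.0),
    ("/home", 10.0),
]

# Hypothetical sessions: the ordered pages visited in each session
sessions = [
    ["/home", "/courses"],
    ["/home", "/admission", "/home"],
]


def most_frequent_pages(views, n=1):
    """Return the n most frequently accessed pages with their counts."""
    return Counter(url for url, _ in views).most_common(n)


def average_view_time(views, url):
    """Return the mean viewing time of one page."""
    return mean(t for u, t in views if u == url)


def average_path_length(all_sessions):
    """Return the mean number of page references per session."""
    return mean(len(s) for s in all_sessions)


print(most_frequent_pages(page_views))
print(average_view_time(page_views, "/home"))
print(average_path_length(sessions))
```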
Association rules – The rules relate the pages that are most often referenced together in a single server session. These
pages may not be directly connected to one another by hyperlinks.
Other uses of association rule mining are:
(i) Reveal correlations between users who visited pages containing similar information. For example, users who visited a web page about admission to an undergraduate course may be correlated with those who searched for an eBook on any subject.
(ii) Provide recommendations to purchase other products. For example, recommend books on ML and Big Data analytics to a user who visited a web page about a book on data analytics.
(iii) Help web designers to restructure their websites.
(iv) Retrieve documents in advance (prefetching) in order to reduce the access time when loading a page from a remote site.
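
The idea of relating pages referenced together in a session can be sketched with pairwise rules scored by support and confidence. This is a simplified illustration limited to page pairs, not a full Apriori implementation; the function name `pair_rules`, the thresholds and the sample sessions are assumptions.

```python
from collections import Counter
from itertools import combinations


def pair_rules(sessions, min_support=0.5, min_confidence=0.8):
    """Return rules (lhs, rhs, support, confidence) over pairs of pages."""
    n = len(sessions)
    page_sets = [set(s) for s in sessions]  # ignore repeats within a session
    item_count = Counter(p for s in page_sets for p in s)
    pair_count = Counter(
        pair for s in page_sets for pair in combinations(sorted(s), 2)
    )

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n  # fraction of sessions containing both pages
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Fraction of sessions with lhs that also contain rhs
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical server sessions on a bookshop site
sessions = [
    ["/data-analytics", "/ml"],
    ["/data-analytics", "/ml", "/big-data"],
    ["/data-analytics", "/big-data"],
    ["/ml"],
]
rules = pair_rules(sessions)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs} (support={support:.2f}, confidence={confidence:.2f})")
```

With this sample data, every session that includes `/big-data` also includes `/data-analytics`, so a visitor to the first page could be recommended the second.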
Clustering – The technique that groups together a set of items having similar features. Clustering can be used to:
(i) Establish groups of users showing similar browsing behaviors
(ii) Acquire customer sub-groups in e-commerce applications
(iii) Provide personalized web content to users
(iv) Discover groups of pages having related content. This information is valuable for search engines and web assistance
providers.
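
Grouping users with similar browsing behavior can be sketched by representing each user as a binary page vector (a vector space model) and grouping users whose vectors are close. The example uses a greedy, single-pass grouping on cosine similarity as a simple stand-in for algorithms such as k-means; the threshold, function names and sample users are assumptions.

```python
from math import sqrt


def to_vector(visited, vocabulary):
    """Binary vector: 1 if the user visited the page, else 0."""
    return [1 if page in visited else 0 for page in vocabulary]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_users(user_pages, threshold=0.5):
    """Put each user in the first cluster holding a similar enough member."""
    vocabulary = sorted({p for pages in user_pages.values() for p in pages})
    vectors = {u: to_vector(set(p), vocabulary) for u, p in user_pages.items()}

    clusters = []
    for user, vec in vectors.items():
        for cluster in clusters:
            if any(cosine(vec, vectors[m]) >= threshold for m in cluster):
                cluster.append(user)
                break
        else:
            clusters.append([user])  # no similar cluster found: start a new one
    return clusters


# Hypothetical users and the pages each one visited
user_pages = {
    "alice": ["/toys", "/games"],
    "bob": ["/toys", "/games", "/puzzles"],
    "carol": ["/news", "/sports"],
    "dave": ["/sports", "/news", "/films"],
}
clusters = cluster_users(user_pages)
print(clusters)
```

The result depends on the order in which users are processed and on the threshold, so a production analysis would normally use an established clustering algorithm instead.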

3 Pattern Analysis

The objective of pattern analysis is to filter out uninteresting rules or patterns from the rules, patterns or statistics obtained in the
pattern discovery phase.
The most common forms of pattern analysis are:
(i) A knowledge query mechanism such as SQL
(ii) Loading usage data into a data cube in order to perform Online Analytical Processing (OLAP) operations
(iii) Visualization techniques, such as graphing patterns or assigning colors to different values, which can often highlight overall
patterns or trends in the data
(iv) Content and structure information, which can filter out patterns containing pages of a certain usage type, content type or pages that
match a certain hyperlink structure.
A data cube enables visualizing data from different angles. For example, toy data can be visualized by category, colour and children's preferences. Another example is news viewed by category (such as sports, success stories or films) or by targeted readers (children, college
students, etc.).
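
Querying usage data with SQL and viewing it along different dimensions can be sketched with an in-memory SQLite table. This is only an approximation of a data cube using `GROUP BY`, not a real OLAP engine; the table name `page_views`, its columns and the figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_views (category TEXT, reader_group TEXT, views INTEGER)"
)
# Hypothetical aggregated usage data for a news site
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [
        ("sports", "children", 40),
        ("sports", "college students", 120),
        ("films", "children", 15),
        ("films", "college students", 90),
    ],
)

ALLOWED_DIMENSIONS = {"category", "reader_group"}


def views_by(dimension):
    """Roll the data up along one dimension and return (value, total views)."""
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    # The column name is checked against a whitelist before interpolation.
    query = (
        f"SELECT {dimension}, SUM(views) FROM page_views "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    )
    return conn.execute(query).fetchall()


print(views_by("category"))
print(views_by("reader_group"))
```

The same table is summarized once by news category and once by targeted readers, which corresponds to looking at the data from two angles.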
Self-Assessment Exercise linked to LO 9.Z

  1. Define web mining. Discuss the broad classifications of web mining and their applications.
  2. List the tasks in pre-processing of web contents.
  3. How are web-content mining tasks performed using machine learning algorithms?
  4. How are topic identification, tracking and drift analysis done?
  5. List and explain three phases of web-usage mining.
  6. Highlight the techniques used for pattern discovery in web-usage mining giving an example of each.
