Prof. Vishnuprasad Nagadevara Indian Institute of Management Bangalore nagadev@iimb.ernet.in

Page 1: Prof. Vishnuprasad Nagadevara Indian Institute of Management Bangalore nagadev@iimb.ernet.in.

Page 2:

Definition

Web Analytics, as defined by the Web Analytics Association: “Web Analytics is the measurement, collection, analysis and reporting of Internet data for the purposes of understanding and optimizing Web usage.”

Clickstream, as defined by the Internet Advertising Bureau (IAB): “The electronic path a user takes while navigating from site to site, and from page to page within a site. It is a comprehensive body of data describing the sequence of activity between a user’s browser and any other Internet resource, such as a Web site or third party ad server.”

http://www.webanalyticsassociation.org/aboutus/

Page 3:

Information from Web Analytics

How many visitors visit the page daily?
Who are the regular visitors?
What percentage of the visitors to the page are registered users?
What are the top pages visited on the website?
What is the average visit time on the website?
How often does a visitor return to the site?
What is the average page depth of a visitor?
What is the geographic distribution of users of the website?

[Diagram: Web Analytics, linked to Personalization, System Improvement, Site Modification, Business Intelligence, and Usage characteristics]

Page 4:

Measures

Clicks: The interaction between the user and the web server is measured by the click of a mouse.
Visits: The number of times a user visits a specific web site. Every new session is counted as a new visit.
Hits: The total number of requests serviced by the server.
Exits: Site exits, counted by site inactivity for more than 30 minutes.
Unique Visitors: A unique user who accesses the site in a specified period of time.
Repeated Visitor: The average number of times a user returns to a site over a specific time period.
Page views: The view of any page by the user. A page may contain text, images, and other online elements; it may be statically or dynamically generated and could contain single or multiple frames or screens.
Sessions: The IAB defines a session as “A sequence of Internet activity made by one user at one site. If a user makes no request from a site during a 30 minute period of time, the next content or ad request would then constitute the beginning of a new visit.”
Unique authenticated visitors: A unique visitor who logs on to a site via a registration method using his/her user id and password.

Page 5:

Metrics

Page views per visit: Average number of page views per visit.
Page views per session: Average number of page views per session.
Page views per hour/day: Average number of page views per hour/day.
Clicks per session: Average number of clicks per session.
Clicks per hour: Average number of clicks per hour.
Time between clicks: The average duration of time spent between two clicks.
Hits per hour: Average number of hits to the web server per hour.
Busy hour of the day: The hour of the day with the highest number of hits to the web server.
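A few of these metrics can be sketched in code. The example below is only an illustration, not the tooling used in the study: it assumes clicks have already been assigned to sessions, and the `clicks` list, its session ids, and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

# Hypothetical input: one (session_id, timestamp) pair per click.
clicks = [
    ("s1", datetime(2008, 5, 23, 0, 0, 0)),
    ("s1", datetime(2008, 5, 23, 0, 0, 30)),
    ("s1", datetime(2008, 5, 23, 0, 2, 0)),
    ("s2", datetime(2008, 5, 23, 1, 15, 0)),
    ("s2", datetime(2008, 5, 23, 1, 15, 20)),
]

def clicks_per_session(clicks):
    """Average number of clicks per session."""
    counts = Counter(session_id for session_id, _ in clicks)
    return mean(list(counts.values()))

def mean_time_between_clicks(clicks):
    """Average gap, in seconds, between consecutive clicks of the same session."""
    by_session = {}
    for session_id, stamp in clicks:
        by_session.setdefault(session_id, []).append(stamp)
    gaps = []
    for stamps in by_session.values():
        stamps.sort()
        gaps.extend((b - a).total_seconds() for a, b in zip(stamps, stamps[1:]))
    return mean(gaps) if gaps else 0.0

def busy_hour(clicks):
    """Return (hour_of_day, click_count) for the hour with the most clicks."""
    per_hour = Counter(stamp.hour for _, stamp in clicks)
    return per_hour.most_common(1)[0]

print(clicks_per_session(clicks))
print(mean_time_between_clicks(clicks))
print(busy_hour(clicks))
```

Here clicks stand in for hits; with real server logs, hits would also include requests for images, style sheets, and scripts.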

Page 6:

IMPLEMENTING WEB ANALYTICS

Define your business objectives.
Define the KPIs that are important for your business, based on the objectives and goals of the business.
Identify the data that needs to be collected.
Identify the process to collect the data.
Prepare the data, then analyze and interpret it.
Design and implement the plan of action.
Monitor the data for continuous feedback.

Page 7:

Objectives of the Study

The objectives of this study are to:
Explore Web analytics and its usefulness to web-based business.
Identify the techniques used in click stream analysis.
Identify the application of click stream analysis by analyzing click stream data obtained from a particular website using appropriate click stream analysis techniques.

Page 8:

Methodology

This study analyzes the click stream data obtained from a web site that specializes in an online information exchange service to facilitate identification of suitable partners, in India and other countries.

The site has a distinctive revenue model. Visitors are allowed to browse the site and look at the profiles of prospective partners free of charge, without any initial payment. They have to become members by making a one-time payment only when they need to contact the prospective brides or grooms.

Users can search for profiles through advanced search options on various preferences, ranging from basic details of the preferred partner to lifestyle, career, education, profession, etc.

Page 9:

Methodology

Members can make initial contact with each other through services available via chat, SMS, and e-mail.

Users can register on the website free of charge and are assured of exclusive privacy and confidentiality. The website allows users to create their profiles, search for other profiles, express interest in other profiles, and contact others. Registration and creating a profile are free of cost.

Registered users can become paid members, which allows them to contact others, view contact details of other members, write personalized messages, initiate chats, and let other members view their contact details. Paid memberships are provided for a specified duration.

Page 10:

Methodology

The click stream data is analyzed to identify the different paths taken by the visitors and the sequence of pages that lead to payment of the membership fee. Based on this analysis, specific strategies are recommended to maximize the revenue for the website.
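One simple way to look at paths that lead to payment is to count the pages requested just before the first payment request in each session. The sketch below is an assumption-laden illustration, not the method used in the study: the session contents are made up, and treating `/billing/billing.php` as the payment page is a guess for the purpose of the example.

```python
from collections import Counter

# Assumption: this resource marks a payment; the study does not say which page does.
PAYMENT_PAGE = "/billing/billing.php"

# Hypothetical sessions: each is the ordered list of pages one visitor requested.
sessions = [
    ["/profile/mainmenu.php", "/P/search.php", "/billing/billing.php"],
    ["/P/search.php", "/profile/mainmenu.php"],
    ["/profile/mainmenu.php", "/P/search.php", "/billing/billing.php",
     "/profile/mainmenu.php"],
    ["/P/search.php", "/billing/billing.php"],
]

def paths_to_payment(sessions, payment_page=PAYMENT_PAGE, depth=2):
    """Count the (up to) `depth` pages seen right before the first payment hit."""
    paths = Counter()
    for pages in sessions:
        if payment_page in pages:
            idx = pages.index(payment_page)
            paths[tuple(pages[max(0, idx - depth):idx])] += 1
    return paths

for path, count in paths_to_payment(sessions).most_common():
    print(count, " -> ".join(path))
```

Sessions that never reach the payment page are ignored here; comparing them with paying sessions would be a natural next step.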

Page 11:

DATA PREPARATION

Problem: Format of data
Clickstream data files are neither delimited nor fixed-length files.

Solution:
The date in the clickstream was used as the delimiter to import the data into a database.
String handling had to be performed in the database to separate out the fields.

10.208.65.96 172.16.8.37, 124.124.35.130 - - [23/May/2008:00:00:00 -0400] "GET /billing/billing.php?user=&cid=22401528da14a61c43512fa025b59578i353273 HTTP/1.0" 200 1832
10.208.65.96 68.126.193.219 - - [23/May/2008:00:00:00 -0400] "GET /profile/js/common.js HTTP/1.1" 200 12462
10.208.65.96 59.95.71.32 - - [23/May/2008:00:00:00 -0400] "GET /P/css/comm_style.css HTTP/1.1" 200 2640
10.208.65.96 122.163.70.145 - - [23/May/2008:00:00:00 -0400] "GET /P/search.php?checksum=&searchchecksum=16465054&j=300&newsearch=&inf_checksum=&castemapping=&crmback=&searchorder=T&label_select_no=&savesearch=&from_index=&viewall=&save_search_redirect=&hide_search_bar=y HTTP/1.1" 200 21561
10.208.65.96 61.1.81.153 - - [23/May/2008:00:00:00 -0400] "GET /P/css/homestyle.css HTTP/1.1" 304 26
10.208.65.96 68.197.236.117 - - [23/May/2008:00:00:00 -0400] "GET /profile/mainmenu.php?checksum=3590208069017f9d75933dfa9ac9005d|i|537f26ca181f05c308393257397ab261i2810388 HTTP/1.1" 200 3333
10.208.65.96 172.16.25.60, 59.145.189.43 - - [23/May/2008:00:00:00 -0400] "GET /P/css/homestyle.css HTTP/1.0" 304 26
10.208.65.96 10.232.65.96, 10.232.49.1, 203.126.136.220 - - [23/May/2008:00:00:00 -0400] "GET /profile/mainmenu.php?checksum= HTTP/1.1" 200 3329
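The same idea, using the bracketed date as the anchor and then separating the remaining fields with string handling, can be sketched outside a database. This is a minimal illustration and not the procedure used in the study; the record layout is inferred from the sample records, and `parse_record` and its field names are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the sample records:
# <server IP> <client IP(s)> - - [<timestamp>] "<request>" <status> <bytes>
DATE_RE = re.compile(r"\[([^\]]+)\]")

SAMPLE = ('10.208.65.96 172.16.8.37, 124.124.35.130 - - '
          '[23/May/2008:00:00:00 -0400] '
          '"GET /billing/billing.php?user=&cid=22401528da14a61c43512fa025b59578i353273 HTTP/1.0" '
          '200 1832')

def parse_record(line):
    """Split one click stream record around its bracketed timestamp."""
    match = DATE_RE.search(line)
    if match is None:
        raise ValueError("no bracketed timestamp found")
    before = line[:match.start()].strip()   # server IP, client IPs, "- -"
    after = line[match.end():].strip()      # request, status, bytes

    server_ip, _, rest = before.partition(" ")
    client_part = rest.rsplit(" ", 2)[0]    # drop the two trailing "-" fields
    client_ips = [ip.strip() for ip in client_part.split(",")]

    # Assumes status and byte count are always numeric, as in the samples.
    request, status, size = after.rsplit(" ", 2)
    return {
        "server_ip": server_ip,
        "client_ips": client_ips,
        # %b expects English month abbreviations (C/English locale).
        "timestamp": datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z"),
        "request": request.strip('"'),
        "status": int(status),
        "bytes": int(size),
    }

record = parse_record(SAMPLE)
print(record["client_ips"], record["status"], record["bytes"])
```

When several client addresses are listed, which one identifies the visitor is a separate decision that this sketch does not make.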

Page 12:

Data

Data is obtained from the site in the form of click stream records. Each record describes one click by a visitor and contains the following details:
Server IP
Client IP
Time stamp with date
Status: HTTP status code
URL requested: has three subfields, namely the request method, the resource requested, and the protocol used
No. of bytes transferred

The country of origin for a specific request is identified using the IP address.
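The three subfields of the requested URL can be separated with standard string and URL handling. The sketch below is an illustration under the assumption that the request string has the form `<method> <resource> <protocol>`; `split_request` is a hypothetical helper, and the two request strings are copied from the sample records on Page 11.

```python
from urllib.parse import urlsplit, parse_qs

def split_request(request):
    """Split '<method> <resource> <protocol>' and break the resource into
    page path and query parameters."""
    method, _, rest = request.partition(" ")
    resource, _, protocol = rest.rpartition(" ")
    parts = urlsplit(resource)
    return {
        "method": method,
        "protocol": protocol,
        "page": parts.path,
        # keep_blank_values retains parameters such as "checksum=".
        "params": parse_qs(parts.query, keep_blank_values=True),
    }

first = split_request("GET /profile/mainmenu.php?checksum= HTTP/1.1")
second = split_request(
    "GET /billing/billing.php?user=&cid=22401528da14a61c43512fa025b59578i353273 HTTP/1.0"
)
print(first["page"], first["params"])
print(second["page"], second["params"])
```

The `page` value is what identifies the page browsed; the query parameters can be kept or discarded depending on the analysis.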

Page 13:

Data

The URL is used to identify the information/web page browsed by the visitors.
The time stamp of each click is used to sequence the movement of the visitors across different pages in the website.
Identifying a unique user session is an important step in the analysis of click stream data. Inactivity for more than 30 minutes is considered a break of session.
Identifying users by IP address is an approximation, since there could be multiple users accessing from the same IP, or the same user accessing from different IPs.
Because no further identifying data is available, hits from each unique IP are treated as belonging to a single user within a session.
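The session rule described here (one user per IP, with a new session after more than 30 minutes of inactivity) can be sketched as follows. This is a minimal illustration rather than the study's implementation; the `(ip, timestamp, page)` tuples, the IP labels, and the `sessionize` function are hypothetical.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)

def sessionize(clicks, timeout=SESSION_TIMEOUT):
    """Group (ip, timestamp, page) clicks into sessions.

    Each IP is treated as one user; a gap of more than `timeout` between two
    clicks from the same IP starts a new session.
    """
    sessions = []        # each entry: {"ip": ..., "clicks": [(timestamp, page), ...]}
    open_session = {}    # ip -> index of that IP's current session
    last_seen = {}       # ip -> timestamp of that IP's previous click
    for ip, stamp, page in sorted(clicks, key=lambda click: click[1]):
        if ip not in last_seen or stamp - last_seen[ip] > timeout:
            sessions.append({"ip": ip, "clicks": []})
            open_session[ip] = len(sessions) - 1
        sessions[open_session[ip]]["clicks"].append((stamp, page))
        last_seen[ip] = stamp
    return sessions

# Hypothetical clicks from two IPs, labelled A and B.
clicks = [
    ("A", datetime(2008, 5, 23, 0, 0), "/profile/mainmenu.php"),
    ("B", datetime(2008, 5, 23, 0, 5), "/P/search.php"),
    ("A", datetime(2008, 5, 23, 0, 10), "/P/search.php"),
    ("A", datetime(2008, 5, 23, 0, 45), "/profile/mainmenu.php"),  # 35 min gap
]

for session in sessionize(clicks):
    print(session["ip"], len(session["clicks"]))
```

The last click from A arrives 35 minutes after the previous one, so it opens a second session for that IP.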

Page 14:

No. of Sessions

Day     Number of sessions   Number of clicks
Day 1   23,440               460,211
Day 2   22,717               453,977
Day 3   24,694               461,518
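The clicks-per-session metric follows directly from these counts. The short calculation below only divides the figures in the table; the variable names are illustrative.

```python
# (number of sessions, number of clicks) per day, taken from the table.
daily = {
    "Day 1": (23_440, 460_211),
    "Day 2": (22_717, 453_977),
    "Day 3": (24_694, 461_518),
}

clicks_per_session = {
    day: clicks / sessions for day, (sessions, clicks) in daily.items()
}

for day, ratio in clicks_per_session.items():
    print(f"{day}: {ratio:.2f} clicks per session")
```

On all three days the average is between about 18.7 and 20 clicks per session.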

Page 18:

[Figure: Last action performed in a session; value axis from 0 to 7000]

Page 22:

Associations

Consequent    Antecedent 1       Antecedent 2       Antecedent 3       Antecedent 4   Support %   Confidence %
Payment = T   Photorequest = T   memcomp = T                                          100         73.1
Payment = T   Country = India    Photorequest = T   memcomp = T                       80          73
Payment = T   Login = T          Photorequest = T   memcomp = T                       60          73
Payment = T   ViewProfile = T    Photorequest = T   memcomp = T                       90          72.8
Payment = T   ViewProfile = T    Login = T          Photorequest = T   memcomp = T    60          72.5
Payment = T   Country = India    ViewProfile = T    Photorequest = T   memcomp = T    70          71.4
Payment = T   Mmshowmsg = T      Photorequest = T   memcomp = T                       50          67.2
Payment = T   ViewProfile = T    Mmshowmsg = T      Photorequest = T   memcomp = T    50          66.4
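For readers unfamiliar with the two measures, the sketch below shows how support and confidence of one rule could be computed from per-session flags. It is an illustration only: the sessions are made up and do not reproduce the figures in the table, and it assumes the common definition in which support is the share of sessions containing the antecedents and the consequent together (some tools report support for the antecedents alone, and the slides do not say which convention was used).

```python
def rule_stats(sessions, antecedents, consequent):
    """Return (support, confidence) for the rule antecedents -> consequent.

    support    = share of all sessions containing antecedents AND consequent
    confidence = share of antecedent-matching sessions that also contain
                 the consequent
    """
    with_antecedents = [s for s in sessions if antecedents <= s]
    with_both = [s for s in with_antecedents if consequent in s]
    support = len(with_both) / len(sessions) if sessions else 0.0
    confidence = len(with_both) / len(with_antecedents) if with_antecedents else 0.0
    return support, confidence

# Hypothetical sessions: the set of actions flagged as true in each session.
sessions = [
    {"Photorequest", "memcomp", "Payment"},
    {"Photorequest", "memcomp"},
    {"Photorequest", "memcomp", "Payment", "Login"},
    {"Login"},
    {"Photorequest", "memcomp", "Payment"},
]

support, confidence = rule_stats(sessions, {"Photorequest", "memcomp"}, "Payment")
print(f"support = {support:.0%}, confidence = {confidence:.0%}")
```

In this toy data, three of the five sessions contain all three actions, and three of the four sessions with both antecedents also include a payment.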

Page 23:

Summary and Conclusions

Usage of the website by time of the day: This helps identify the busy hour, and provides information on the server capacity required for the website and on when a maintenance window can be scheduled.
Usage of the website from different geographic locations: This provides data on the distribution of users across geographical locations.
Exit screens: These provide information on where the users exit from the website. This input can help redesign the website if it shows which pages are breaking the flow of the user session.
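Exit screens can be tallied by taking the last page of each session. The sketch below is a minimal illustration with made-up sessions; the page names are borrowed from the sample records, and `exit_page_counts` is a hypothetical helper.

```python
from collections import Counter

def exit_page_counts(sessions):
    """Count how often each page is the last one requested in a session."""
    return Counter(pages[-1] for pages in sessions if pages)

# Hypothetical sessions: ordered lists of pages requested by one visitor.
sessions = [
    ["/profile/mainmenu.php", "/P/search.php"],
    ["/P/search.php"],
    ["/profile/mainmenu.php", "/P/search.php", "/billing/billing.php"],
    ["/profile/mainmenu.php"],
]

for page, count in exit_page_counts(sessions).most_common():
    print(count, page)
```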

Page 24:

Summary and Conclusions

Most accessed and least accessed pages: This can be used for variable pricing of advertisements on the web pages. It can also be used for better user interface design and space utilization, by removing or repositioning the links that are infrequently accessed.
Associations: These provide information on unique actions on the website and the sequence in which the user has performed these actions. This can be used in better user interface design.
Web diagrams: These give information on the co-occurrence of actions on the website and their significance, and also provide inputs on user interface design.
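The co-occurrence counts that underlie a web diagram can be sketched by counting, for every pair of actions, the number of sessions in which both appear. This is an illustration with made-up sessions, not the tool used in the study; the action names are borrowed from the associations table, and `cooccurrence` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(sessions):
    """Count, for each unordered pair of actions, the sessions containing both."""
    pairs = Counter()
    for actions in sessions:
        # Sorting makes each pair appear in one canonical order.
        for a, b in combinations(sorted(set(actions)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical sessions: the set of actions performed in each session.
sessions = [
    {"Login", "ViewProfile", "Photorequest"},
    {"Login", "ViewProfile"},
    {"ViewProfile", "Photorequest", "Payment"},
]

for (a, b), count in cooccurrence(sessions).most_common():
    print(count, a, b)
```

Pairs with higher counts would be drawn as stronger links in the diagram.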

Page 25:

Questions? Suggestions? Comments?