Clickstream Analysis Using Hadoop
-
7/25/2019 Clickstream Analysis Using Hadoop
Click Stream Analysis using Hadoop
-
ClickStream Analysis
A clickstream is a record of a user's interaction with a website or other computer application. Each row of the clickstream contains a timestamp and an indication of what the user did. Every click or other action is logged (hence the term "clickstream"). This is useful when the website does different things for different users, such as posting recommendations.
-
Data
Data is obtained from the site in the form of clickstream records. Each record describes a click made by a visitor and contains the following details:
Server IP
Client IP
Time stamp with Date
URL visited
No. of bytes transferred
Custom record(s)
The country of origin for a specific request is identified using the IP address.
-
Methodology
In order to run our algorithm on a larger data set, we must understand big data and Hadoop.
What is Big data?
A dataset that has the following three characteristics can be called big data:
Volume - on the order of terabytes to petabytes
Variety - Structured - relational database
Unstructured - text, PDF, Word, image
Semi-structured - XML, log files
Velocity - how fast data arrives (e.g. thousands of tweets per second)
-
To handle such a large amount of data, the best-suited solution is a distributed approach, where the system can process all types of data in a distributed manner.
Google's solution - Divide the task, assign the pieces to many computers, collect the results, then integrate them to form the final result. This is called the Map-Reduce algorithm.
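The divide / assign / collect / integrate steps can be sketched in a single process as follows. This is only an illustration of the idea, not Hadoop's implementation: the "workers" are simulated by a plain loop, and the function names and sample URLs are made up.

```python
from collections import defaultdict


def split_into_chunks(items, n_chunks):
    """Divide: cut the input into roughly equal chunks, one per simulated worker."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def map_chunk(chunk):
    """Map: each worker emits a (key, 1) pair for every URL in its chunk."""
    return [(url, 1) for url in chunk]


def shuffle(mapped_chunks):
    """Collect: group all emitted values by key across workers."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_groups(grouped):
    """Reduce: integrate each key's values into one final count."""
    return {key: sum(values) for key, values in grouped.items()}


clicks = ["/a", "/b", "/a", "/c", "/a", "/b"]
chunks = split_into_chunks(clicks, n_chunks=3)
counts = reduce_groups(shuffle(map_chunk(c) for c in chunks))
print(counts)
```

In a real cluster each chunk would be processed on a different machine; the structure of the computation stays the same.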
-
HDFS (Hadoop Distributed File System)
This uses a large block size, measured in MB: the default is 64 MB in Hadoop 1.x and 128 MB in Hadoop 2.x and later. The block size is configurable.
-
Hadoop Architecture
Hadoop follows a master-slave architecture.
NameNode: This acts as the master and so is the most vital component of Hadoop. It is the bookkeeper of HDFS: it tracks how files are broken up, which node contains each piece, etc.
DataNode: This acts as a slave. There are many slave nodes in a Hadoop cluster, unlike the NameNode, of which there is one per cluster. A DataNode can directly access the local file system and perform read/write operations.
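The bookkeeping role of the NameNode can be illustrated with a toy model. This is a sketch under simplifying assumptions, not how HDFS is implemented: `ToyNameNode` is a hypothetical class, blocks are assigned round-robin, and replication is ignored.

```python
from collections import defaultdict
from itertools import cycle


class ToyNameNode:
    """Holds metadata only: which blocks make up a file and where each one lives."""

    def __init__(self, datanodes):
        self.datanodes = list(datanodes)
        self._next_node = cycle(self.datanodes)
        self.file_blocks = defaultdict(list)  # filename -> [block_id, ...]
        self.block_location = {}              # block_id -> DataNode name

    def add_file(self, filename, size_bytes, block_size):
        """Record the blocks of a file, assigning them to DataNodes round-robin."""
        n_blocks = max(1, -(-size_bytes // block_size))  # ceiling division
        for i in range(n_blocks):
            block_id = f"{filename}#blk{i}"
            self.file_blocks[filename].append(block_id)
            self.block_location[block_id] = next(self._next_node)

    def locate(self, filename):
        """Return (block_id, DataNode) pairs; the data itself stays on the DataNodes."""
        return [(b, self.block_location[b]) for b in self.file_blocks.get(filename, [])]


nn = ToyNameNode(["datanode1", "datanode2", "datanode3"])
nn.add_file("clicks.log", size_bytes=300, block_size=128)
print(nn.locate("clicks.log"))
```

A client asks the NameNode where the blocks are, then reads or writes them on the DataNodes directly.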
-
Secondary NameNode (SNN): It takes snapshots of the HDFS metadata at intervals and communicates with the NameNode so as to minimize ...
-
Simulation Result
The objective of this simulation is to collect clickstream data from USA Government websites, which is high in volume and velocity, and to store it for analysis in a cost-effective manner for enhanced insight and decision making.
-
Most Clicked Websites
-
Website Visited/Country
-
Website Visited/Month
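The three result views (most clicked websites, visits per country, visits per month) are all grouped counts. A rough sketch of how they could be computed, using made-up sample clicks and assuming each click has already been reduced to a (URL, country code, timestamp) tuple:

```python
from collections import Counter
from datetime import datetime
from urllib.parse import urlparse

# Made-up sample data, for illustration only.
clicks = [
    ("http://www.nasa.gov/page1", "US", datetime(2013, 5, 17, 9, 30)),
    ("http://www.nasa.gov/page2", "IN", datetime(2013, 5, 18, 11, 0)),
    ("http://www.usa.gov/", "US", datetime(2013, 6, 2, 8, 15)),
    ("http://www.nasa.gov/page1", "US", datetime(2013, 6, 3, 14, 45)),
]

by_site = Counter()          # most clicked websites
by_site_country = Counter()  # website visited / country
by_site_month = Counter()    # website visited / month

for url, country, ts in clicks:
    site = urlparse(url).netloc  # group pages under their host name
    by_site[site] += 1
    by_site_country[(site, country)] += 1
    by_site_month[(site, ts.strftime("%Y-%m"))] += 1

print(by_site.most_common(1))
print(by_site_country[("www.nasa.gov", "US")])
print(by_site_month[("www.nasa.gov", "2013-05")])
```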
-
Conclusion
We need to create our own mapper class and reducer class for clickstream analysis, which can then be applied to business models.
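A mapper/reducer pair for counting clicks per URL might look roughly like the sketch below. It is written in plain Python rather than against the Hadoop API, so `ClickMapper`, `ClickReducer` and `run_job` are hypothetical names, and the tab-separated layout with the URL in the fourth column is an assumption.

```python
from itertools import groupby
from operator import itemgetter


class ClickMapper:
    """Emits (url, 1) for each well-formed record."""

    URL_COLUMN = 3  # assumed position of the URL field

    def map(self, line):
        fields = line.rstrip("\n").split("\t")
        if len(fields) > self.URL_COLUMN:  # silently skip malformed lines
            yield fields[self.URL_COLUMN], 1


class ClickReducer:
    """Sums the counts emitted for one URL."""

    def reduce(self, key, values):
        yield key, sum(values)


def run_job(lines, mapper, reducer):
    """Local stand-in for a Hadoop job: map, sort by key, then reduce per key."""
    pairs = [kv for line in lines for kv in mapper.map(line)]
    pairs.sort(key=itemgetter(0))  # plays the role of shuffle-and-sort
    results = []
    for key, group in groupby(pairs, key=itemgetter(0)):
        results.extend(reducer.reduce(key, (value for _, value in group)))
    return results


# Made-up sample lines, for illustration only.
lines = [
    "10.0.0.1\t203.0.113.7\t2013-05-17 09:30:00\t/b\t512",
    "10.0.0.1\t203.0.113.8\t2013-05-17 09:31:00\t/a\t128",
    "malformed line",
    "10.0.0.1\t203.0.113.7\t2013-05-17 09:32:00\t/a\t256",
]
print(run_job(lines, ClickMapper(), ClickReducer()))
```

Keeping the per-record logic in the mapper and the aggregation in the reducer means other analyses (per country, per month) only need a different key.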
-
Future Work
OpenTracker Clickstream Analysis Tool
o An interactive tool that lets you see all the visitors on your site in real time, both those online and those offline.
o Every visitor will be represented by an icon.
o If you click on any visitor's icon, you will see a graphic representation of their clickstream. You will also see that visitor's profile, which consists of their country of origin, their ISP, technical specs, the frequency of visits they have made to your site and search terms that they might have used.
o You will also know if they are a first-time visitor, and can view the details of their visit, i.e. the times they entered and left.
-
References
http://hadoop.apache.org/
http://www.usa.gov/About/developer-resources/?usagovt.shtml
http://www.cloudera.com/content/cloudera/en/about/hadoop-and-big-data.html
-
Thank You