Blogs (web logs) contain online stamped entries


Implicit Structure and Dynamics of BlogSpace. Eytan Adar, Li Zhang, Lada Adamic, and Rajan Lukose, HP Labs, Palo Alto, CA. PowerPoint presentation. Blogs (web logs) contain online stamped entries, with a list of read blogs, date and time stamps, the URL that is being commented on, and a via link.

Transcript of Blogs (web logs) contain online stamped entries

  • Implicit Structure and Dynamics of BlogSpace

    Eytan Adar, Li Zhang, Lada Adamic, & Rajan Lukose, HP Labs, Palo Alto, CA

  • Blogs (weblogs) contain online stamped entries

  • Blogs: structure and transmission

    Blog use: record real-world and virtual experiences; note and discuss things seen on the net

    Blog structure: blog-to-blog linking

    Use + structure: great for tracking memes (catchy ideas)

    Patterns of information flow: How does the popularity of a topic evolve over time? Who is getting information from whom?

    Ranking algorithms that take advantage of transmission patterns

  • Related Work

    Link prediction in social networks:

    Butts, C., Network Inference, Error, and Information (In)Accuracy: A Bayesian Approach, Social Networks, 25(2):103-140.

    Dombroski, M., P. Fischbeck, and K. Carley, An Empirically-Based Model for Network Estimation and Prediction, NAACSOS conference proceedings, Pittsburgh, PA, 2003.

    O'Madadhain, J., P. Smyth, and L. Adamic, Learning Predictive Models for Link Formation, Sunbelt 2005 (hope you were there!)

    Getoor, L., N. Friedman, D. Koller, and B. Taskar, Learning Probabilistic Models of Link Structure, Journal of Machine Learning Research, vol. 3 (2002), pp. 690-707.

    Adamic, L., and E. Adar, Friends and neighbors on the Web, Social Networks, 2003.

    Kleinberg, J., and D. Liben-Nowell, The Link Prediction Problem for Social Networks, in Proceedings of CIKM '03 (New Orleans, LA, November 2003), ACM Press.

    Blog ranking: Technorati, BlogPulse, Daypop

    Blog epidemic tracking: Blogdex at MIT Media Lab (Cameron Marlow, Sunbelt 2003); BlogPulse

  • Intelliseek's BlogPulse

    Service for tracking trends in the blogosphere: popular URLs, phrases, people

  • BlogPulse Data analyzed

    37,153 blogs

    Differential daily crawls (to find new posts) for May 2003

    Full page crawl for May 18, 2003 to capture blogrolls

    175,712 URLs occurring on > 2 blogs

  • Tracking popularity over time (plot: popularity vs. time)

    Blogdex, BlogPulse, etc. track the most popular links/phrases of the day

  • Election Map Cartograms

    Michael Gastner, Cosma Shalizi, and Mark Newman, University of Michigan

  • Tracking popularity over time (plot: popularity vs. time)

  • Clustering information popularity profiles, May 2003

  • K-means clustering

    259 URLs in the sample satisfy the criteria

    Take normalized cumulative profiles (plot axes: all mentions vs. day)

    K-means minimizes the sum of squared differences within each cluster

    4 clusters captured most of the differences
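The clustering step above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the daily counts are made up, the function names `normalized_cumulative` and `kmeans` are hypothetical, and squared Euclidean distance between profiles is an assumption about what "differences" means here.

```python
import random

def normalized_cumulative(daily_counts):
    """Turn daily mention counts into a cumulative profile that ends at 1.0."""
    total = sum(daily_counts)
    running, profile = 0, []
    for count in daily_counts:
        running += count
        profile.append(running / total)
    return profile

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(profiles, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, cluster label per profile)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in profiles]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Hypothetical daily mention counts for four URLs (not from the BlogPulse data).
daily = [
    [9, 1, 0, 0, 0],  # sharp day-1 peak
    [8, 2, 0, 0, 0],  # sharp day-1 peak
    [2, 2, 2, 2, 2],  # sustained interest
    [2, 3, 2, 2, 1],  # sustained interest
]
profiles = [normalized_cumulative(d) for d in daily]
centroids, labels = kmeans(profiles, k=2)
```

Working on cumulative profiles scaled to 1 makes URLs with very different total mention counts comparable, so the clusters reflect the shape of the curve rather than its height.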

  • Different kinds of information have different popularity profiles

    (Figure: four panels numbered 1-4, labeled "Products, etc.", "Major-news site (editorial content), back of the paper", "postings", and "Front-page news")

  • Popularity profiles

    Cluster  Profile                                     # URLs  Examples
    1        Sharp peak on day 1 followed by fast decay  38      Slashdot postings
    2        Day 1 peak followed by decay                46      Front page news
    3        Day 2 peak followed by gradual decay        51      Editorial content, Sun Java release
    4        Sustained interest                          124     iPod, iTunes, quizzila

  • Micro example: Giant Microbes

  • Microscale Dynamics

    What do we need to track specific info epidemics? Timings; underlying network

    (Figure: blog b1 on a "time of infection" axis running from t0 to t1)

  • Microscale Dynamics: Challenges

    Root may be unknown

    Multiple possible paths

    Uncrawled space, alternate media (email, voice)

    No links

    (Figure: blogs b1 through bn on a "time of infection" axis running from t0 to t1, with unknown "??" sources)

  • Microscale Dynamics: who is getting info from whom

    Via links (< 2% of links, 50% within sample): unambiguous

    Multiple explicit links: which link is more likely?

    No explicit links (70%): which implicit path is more likely?

  • Link Inference

    Use machine learning algorithms:

    A) Support Vector Machine (SVM)

    B) Logistic Regression

    What we can use

    Full text

    Blogs in common

    Links in common

    History of infection (figure labels: BoingBoing, WIRED)
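Two of the inputs listed above, blogs in common and links in common, can be turned into numeric features for a classifier. The sketch below is one possible encoding under stated assumptions: the slides do not say how overlap was measured, so the Jaccard ratio, the `pair_features` helper, and the example data are all hypothetical. Full-text similarity and infection history are left out.

```python
def jaccard(items_a, items_b):
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    a, b = set(items_a), set(items_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pair_features(blog_a, blog_b):
    """Overlap features for one blog pair (assumed encoding, not from the slides)."""
    return {
        "blogs_in_common": jaccard(blog_a["blogroll"], blog_b["blogroll"]),
        "links_in_common": jaccard(blog_a["urls"], blog_b["urls"]),
    }

# Hypothetical blogs: blogroll entries and cited URLs are placeholders.
blog_a = {"blogroll": ["x.example", "y.example", "z.example"], "urls": ["u1", "u2"]}
blog_b = {"blogroll": ["y.example", "z.example", "w.example"], "urls": ["u2", "u3", "u4"]}
features = pair_features(blog_a, blog_b)
```

Each pair of blogs yields one feature vector, which could then be fed to an SVM or logistic regression model along with a label saying whether the pair is actually linked.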

  • Percentage of blog pairs sharing at least one link

    Link type            Same day  A after B  A before B
    A, B reciprocated    17.4%     24.5%      24.5%
    A, B unreciprocated  10.9%     22.9%      17.0%
    A, B unlinked        0.6%      1.5%       1.3%

  • Similarity in links between reciprocated, unreciprocated, and non-linked blog pairs

  • Training on positive and negative examples of infection

    (Figure: a positive example (+), a Blog A / Blog B pair with Tinfection(Blog B) > Tinfection(Blog A), and a negative example (-), a Blog A / Blog B pair; legend: infected, uninfected)

  • Prediction results

    Link inference: SVM 91% accuracy; regression 92% accuracy (blog-blog links most predictive)

    Infection inference: SVM 71.5% accuracy, using blog and non-blog link similarity + timing features: (A before B)/nA, (B before A)/nA, (A same day B)/nA, ...

    Regression: 75% accuracy using only timing features
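The timing features named above can be computed directly from per-blog mention days. This is a rough sketch under assumptions: nA is taken to be the number of URLs blog A mentions, counts run over URLs both blogs mention, and the `timing_features` name and the example data are hypothetical.

```python
def timing_features(days_a, days_b):
    """days_a, days_b: dicts mapping URL -> day of first mention by blog A / blog B.

    Returns the three fractions (A before B)/nA, (B before A)/nA, (A same day B)/nA,
    where nA is assumed to be the number of URLs blog A mentions.
    """
    n_a = len(days_a)
    if n_a == 0:
        return {"a_before_b": 0.0, "b_before_a": 0.0, "same_day": 0.0}
    shared = days_a.keys() & days_b.keys()
    a_first = sum(1 for url in shared if days_a[url] < days_b[url])
    b_first = sum(1 for url in shared if days_b[url] < days_a[url])
    same = sum(1 for url in shared if days_a[url] == days_b[url])
    return {
        "a_before_b": a_first / n_a,
        "b_before_a": b_first / n_a,
        "same_day": same / n_a,
    }

# Hypothetical mention days (1-day resolution, as in the crawl).
days_a = {"u1": 1, "u2": 3, "u3": 5, "u4": 2}
days_b = {"u1": 2, "u2": 3, "u3": 4}
features = timing_features(days_a, days_b)
```

With a one-day crawl resolution, the same-day fraction matters: those pairs carry no ordering information, so keeping them as a separate feature lets the classifier weigh them differently.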

  • Sources of error

    Coarseness and sparseness of timing data (1 day resolution)

    Mirror URLs (actually helps)

    Incomplete crawls

    (Figure: blogs A, B, C on a time axis, with inferred and actual links and an uncrawled blog or media source)

  • Visualization, by Eytan Adar

    GUESS tool (build your own, see demo @ 5:30!)

    Using GraphViz (by AT&T) layouts

    Simple algorithm: if a single, explicit link exists, draw it (add node if needed). Otherwise use the ML algorithm: pick the most likely explicit link; pick the most likely possible link

    Tool lets you zoom around space, control threshold, link types, etc.
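The simple algorithm above might look like the following. It is a sketch under assumptions rather than the GUESS implementation: the `infer_tree` name, the `score` callback (standing in for the trained classifier), the strict "infected earlier" rule, and the reading that explicit links are tried before possible links are all hypothetical. Via links are not modeled.

```python
def infer_tree(infection_day, explicit_links, score, threshold=0.5):
    """Pick at most one incoming edge per infected blog.

    infection_day:  dict blog -> day it first mentioned the URL
    explicit_links: dict blog -> set of blogs it links to
    score(a, b):    hypothetical classifier score that b got the URL from a
    """
    edges = []
    for blog, day in infection_day.items():
        # Candidate sources: blogs infected strictly earlier (an assumption).
        earlier = [b for b, d in infection_day.items() if d < day]
        linked = [b for b in earlier if b in explicit_links.get(blog, set())]
        if len(linked) == 1:
            # A single explicit link: draw it.
            edges.append((linked[0], blog, "explicit"))
        elif linked:
            # Several explicit links: keep the most likely one.
            best = max(linked, key=lambda b: score(b, blog))
            edges.append((best, blog, "explicit"))
        elif earlier:
            # No explicit link: keep the most likely possible link above the threshold.
            best = max(earlier, key=lambda b: score(b, blog))
            if score(best, blog) >= threshold:
                edges.append((best, blog, "inferred"))
    return edges

# Hypothetical epidemic with made-up classifier scores.
infection_day = {"b1": 0, "b2": 1, "b3": 2, "b4": 2}
explicit_links = {"b2": {"b1"}, "b3": {"b1", "b2"}}
scores = {("b1", "b3"): 0.4, ("b2", "b3"): 0.9, ("b1", "b4"): 0.7, ("b2", "b4"): 0.3}
edges = infer_tree(infection_day, explicit_links, lambda a, b: scores.get((a, b), 0.0))
```

The `threshold` argument mirrors the threshold control mentioned for the tool: raising it hides low-confidence inferred links while leaving explicit ones in place.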

  • Giant Microbes epidemic visualization (legend: via link, explicit link, inferred link, blog)

  • iRank

    Find early sources of good information using inferred information paths or timing

    (Figure: blogs b1, b2, b3, b4, b5, ..., bn, with the true source and a popular site marked)

  • iRank Algorithm

    Draw a weighted edge for all pairs of blogs that cite the same URL

    Higher weight for mentions closer together

    Run PageRank

    Control for spam

    (Figure: "time of infection" axis running from t0 to t1)
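The iRank steps above can be approximated as follows. Several details are assumptions, since the slides do not give them: edges point from the later blog to the earlier one so that rank flows toward early sources, the weight is 1 / (gap in days), same-day pairs are skipped, the damping factor is 0.85, and spam control is omitted. The names `build_edges` and `pagerank` and the example data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_edges(mentions):
    """mentions: dict URL -> dict blog -> day of mention.

    Returns weights[later_blog][earlier_blog]; closer mentions weigh more.
    """
    weights = defaultdict(lambda: defaultdict(float))
    for by_blog in mentions.values():
        for (a, day_a), (b, day_b) in combinations(by_blog.items(), 2):
            if day_a == day_b:
                continue  # assumption: same-day pairs carry no direction
            earlier, later = (a, b) if day_a < day_b else (b, a)
            weights[later][earlier] += 1.0 / abs(day_a - day_b)
    return weights

def pagerank(weights, nodes, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration; dangling nodes spread rank evenly."""
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new = {node: (1.0 - damping) / n for node in nodes}
        for src in nodes:
            out = weights.get(src, {})
            total = sum(out.values())
            if total == 0:
                for node in nodes:
                    new[node] += damping * rank[src] / n
            else:
                for dst, w in out.items():
                    new[dst] += damping * rank[src] * w / total
        rank = new
    return rank

# Hypothetical mention days for two URLs.
mentions = {
    "url1": {"b1": 0, "b2": 1, "b3": 3},
    "url2": {"b1": 1, "b3": 2},
}
nodes = sorted({blog for by_blog in mentions.values() for blog in by_blog})
rank = pagerank(build_edges(mentions), nodes)
```

In this toy input b1 is always first to mention a URL, so it collects rank from both other blogs, while b3, which is never first, ends up lowest.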

  • Do Bloggers Kill Kittens?

    02:00 AM Friday Mar. 05, 2004 PST Wired publishes: "Warning: Blogs Can Be Infectious."

    7:25 AM Friday Mar. 05, 2004 PST Slashdot posts: "Bloggers' Plagiarism Scientifically Proven"

    9:55 AM Friday Mar. 05, 2004 PST Metafilter announces "A good amount of bloggers are outright thieves."

  • For more info

    Information Dynamics Lab @ HP

    Blog Epidemic Analyzer

    Eytan, Li, Lada & Rajan

  • CNN: Wal-Mart banishes bawdy mags