January 19, 2005 Weblog research
HCS lab
Weblogs for Research(ers)
Anjo Anjewierden
Human Computer Studies laboratory
Faculty of Science
University of Amsterdam
http://anjo.blogs.com
Many thanks to Lilia Efimova, Rogier Brussee, Robert de Hoog, Stephanie Hendrick
and the blogosphere in general
What is a weblog (1)?
• Most common descriptive definition: a weblog is
  – a personal journal,
  – updated regularly,
  – published on the internet; and
  – posts (entries) appear in reverse chronological order
What is a weblog (2)?
• Weblogs are social as they encourage others to participate using two mechanisms:
  – Posts have an explicit point of reference called a permalink
  – Permalinks make it possible for people to link to each other’s posts: share and discuss
  – Readers, possibly without a weblog, are invited to join as all posts have a comment link
Anatomy of Weblogs
• For example: my weblog
Weblog Research is about …
• Humans who share findings, thoughts, ideas and sometimes feelings in their weblogs
• Computers which make it possible to create weblogs, read weblogs, and to comment and to link
• Studies which analyse why and how people blog about what and to whom
• Laboratory: weblog researchers need a stable environment in which to conduct their research
Do we want to research weblogs …
• Blog (short for weblog, we-blog) was Merriam-Webster’s word of the year for 2004. Derived forms: to blog, blogger, blogging, blogosphere, etc.
• Communications of the ACM (CACM) carried a special issue on weblogs (December 2004)
• Unfiltered and public: for the first time we get access to a large body of material on a particular person, written by that same person
• Research relevance: social studies, Knowledge Management (for professional weblogs), education, linguistics … and even the term Semantic Blogging (combining Semantic Web and blogging) has been coined
• Compare Digital Cities research by Beckers / Van den Bersselaar (at SWI)
BlogTrace the Laboratory (1)
• Weblogs are represented as HTML pages
  – Complex layout, difficult to find the posts
  – Manual research is extremely labour intensive
  – There is a serious lack of tools that support weblog research
BlogTrace the Laboratory (2)
• BlogTrace spider makes data collection and research a lot easier
  – Automatically extracts posts from the HTML
  – Generates the link structure of the weblog and represents it as RDF/OWL
  – Generates an RSS feed that contains all posts for a weblog
  – Implemented using induction algorithms, which learn what are posts and what is layout
Ontologies used in BlogTrace
• DC: Dublin Core (names, dates, descriptions)
• FOAF: Friend of a Friend (documents, people)
• RSS 1.0 (RDF): RDF Site Summary (representation of full posts)
• Link ontology, for example a link (href in HTML) becomes:
  – Link link:sourceDocument <http://…/>;
  – Link link:targetDocument <http://…/>;
  – Link link:anchorText “interesting site”;
  – Etc.
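The mapping from an HTML link to the link ontology can be sketched in a few lines of Python. This is only an illustration and not BlogTrace's own code: the class `LinkExtractor` and the function `links_to_turtle` are hypothetical names, each link is written as a blank node, and the `@prefix` line is left out because the slides do not give the namespace URI behind `link:`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect (target URL, anchor text) pairs from <a href=...> elements."""

    def __init__(self, source_url):
        super().__init__()
        self.source_url = source_url
        self.links = []      # finished (target, anchor_text) pairs
        self._href = None    # href of the <a> element currently open, if any
        self._text = []      # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page they appear on.
                self._href = urljoin(self.source_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the anchor text.
            anchor = " ".join("".join(self._text).split())
            self.links.append((self._href, anchor))
            self._href = None


def links_to_turtle(source_url, html):
    """Render every link on a page as a Turtle-style description (hypothetical helper)."""
    parser = LinkExtractor(source_url)
    parser.feed(html)
    blocks = []
    for target, anchor in parser.links:
        escaped = anchor.replace("\\", "\\\\").replace('"', '\\"')
        blocks.append(
            "[] a link:Link;\n"
            f"   link:sourceDocument <{source_url}>;\n"
            f"   link:targetDocument <{target}>;\n"
            f'   link:anchorText "{escaped}" .'
        )
    return "\n\n".join(blocks)
```

Calling `links_to_turtle("http://example.org/blog/", page_html)` returns one description per `<a href>` found in `page_html`; the example URL is a placeholder.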
Weblogs can now be studied …
• Even using Semantic Web technology (RDF/OWL)
link:WeblogPostLink
    rdfs:subClassOf link:SimpleLink;
    rdfs:comment "A WeblogPostLink is a SimpleLink if and only if both the source and the target documents are weblog posts (RSS items).";
    rdfs:label "WeblogPostLink";
    owl:intersectionOf (
        link:SimpleLink
        [ a owl:Restriction;
          owl:onProperty link:sourceDocument;
          owl:someValuesFrom rss:item ]
        [ a owl:Restriction;
          owl:onProperty link:targetDocument;
          owl:someValuesFrom rss:item ]
    ).

link:WeblogPostLink
    rdfs:subClassOf link:Link;
    rdfs:comment "A WeblogPostLink is a Link if and only if both the source and the target documents are weblog posts (RSS items)";
    owl:intersectionOf (
        link:Link
        [ a owl:Restriction;
          owl:onProperty link:sourceDocument;
          owl:someValuesFrom rss:item ]
        [ a owl:Restriction;
          owl:onProperty link:targetDocument;
          owl:someValuesFrom rss:item ]
    ).
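Read procedurally, the definition says that a link counts as a WeblogPostLink when both its source and its target are weblog posts. The Python sketch below applies that test to a closed set of known post URLs. It is a simplification and not OWL reasoning (a reasoner works under the open-world assumption); the names `Link` and `weblog_post_links` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A link with a source and a target document URL."""
    source: str
    target: str


def weblog_post_links(links, post_urls):
    """Return the links whose source and target are both known weblog posts."""
    posts = set(post_urls)
    return [link for link in links if link.source in posts and link.target in posts]
```

A link from a post to a news site, for example, is dropped because its target is not in `post_urls`.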
Some Weblog Research Questions
• Weblog communities
  – Do they exist?
  – How can they be defined and found?
  – What is the social structure?
  – What are the conventions in the community?
• Text analysis of weblogs
  – What do people blog about (terms, topics)?
  – Do they share terminology?
  – Can personal conceptualisations be extracted?
• Conversations
  – Can linked weblog posts be seen as conversations?
  – Can we identify when there is a “knowledge flow”?
Implementations and Papers
• Weblog communities:
  – Visual Settlements
  – Graphically displays weblog community linkage based on a “weblog is a city” metaphor
  – Community determined by “Virtual Settlements” paper (Efimova & Hendrick, 2005)
• Text analysis of weblogs:
  – Sigmund (Anjewierden, Brussee & Efimova, 2004)
  – Co-occurrence based statistical algorithm that identifies concepts and their relations for a weblog
• Conversations:
  – Knowledge flows (Anjewierden, De Hoog, Brussee & Efimova, 2005)
  – Hypothesis: chance of a knowledge flow is greater when the sender and receiver share conceptualisations
Visual Settlements
• Idea
  – Can we compress a weblog to a single picture?
  – Such that we can use the picture to compare it to other weblogs in a community
  – And, of course, learn something …
• Inspiration
  – Maps in general
  – Books by Edward Tufte on “Information Design”
    • The Visual Display of Quantitative Information (1983)
    • Envisioning Information (1990)
    • Beautiful Evidence (2005; forthcoming)
  – (Discovered Tufte by blog reading)
My blog as a Visual Settlement
Anatomy of Visual Settlements
Without links in the community (house)
I link to someone (I’m at work)
Someone links to me (I’m in the park)
Size: number of words in the post
Layout: if I link to earlier posts they are close
Time: early post in center, radiate outwards
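The legend above amounts to a small mapping from a post's link counts to a place in the settlement, plus a size taken from the word count. A minimal sketch of that mapping follows; the function names are hypothetical, and returning both places for a post that links out and is linked to is an assumption, since the slides do not say how that case is drawn.

```python
def settlement_places(links_out, links_in):
    """Map a post's community link counts to the places named in the legend."""
    places = []
    if links_out > 0:
        places.append("work")   # I link to someone
    if links_in > 0:
        places.append("park")   # someone links to me
    return places or ["house"]  # no links in the community


def post_size(text):
    """Size of a post: its number of whitespace-separated words."""
    return len(text.split())
```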
Sigmund
• Idea
  – Using co-occurrence to determine whether terms are related
  – Related terms might point to conceptualisations of the blogger
  – And, these conceptualisations might be shared by other bloggers
• Supported by
  – Tools that are part of my regular research on methods to support ontology development from documents
  – In particular: term extraction and named entity recognition
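The co-occurrence idea can be illustrated with a short Python sketch. It assumes the terms of each post have already been extracted, and it scores a pair of terms with the Jaccard coefficient over posts. That statistic is a stand-in chosen for illustration; the slides do not specify the measure Sigmund uses. The function name and the `min_pair_count` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_scores(posts, min_pair_count=2):
    """Score term pairs by how often they appear in the same post.

    posts: iterable of term collections, one per post.
    Returns {(term_a, term_b): jaccard}, with each pair in alphabetical order.
    """
    term_count = Counter()   # number of posts containing each term
    pair_count = Counter()   # number of posts containing both terms of a pair
    for terms in posts:
        unique = sorted(set(terms))
        term_count.update(unique)
        pair_count.update(combinations(unique, 2))

    scores = {}
    for (a, b), together in pair_count.items():
        if together >= min_pair_count:
            # Jaccard: posts with both terms / posts with either term.
            scores[(a, b)] = together / (term_count[a] + term_count[b] - together)
    return scores
```

Pairs that co-occur in fewer than `min_pair_count` posts are dropped, so a single chance co-occurrence does not count as a relation.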
Making a Difference
• Idea
  – In a community of bloggers it is likely terminology is shared
  – Finding the shared terms is interesting (see Sigmund)
  – But a blogger is a person and not a web page
  – So, what makes them different?
• Implementation
  – Run Sigmund on all blogs in a community
  – Find terms that are common for a particular blog and not common for others in the community
  – Example: Making a Difference post
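The second implementation step, finding terms that are common for one blog and not for the others, can be sketched as follows. The sketch assumes each blog has already been reduced to a list of terms, and it treats a term as "common" when it occurs at least `min_count` times in a blog. Both thresholds and the function name are assumptions made for illustration, not values from the talk.

```python
from collections import Counter


def distinctive_terms(community, blog, min_count=2, max_others=0):
    """Terms common in `blog` but common in at most `max_others` other blogs.

    community: dict mapping blog name -> list of extracted terms.
    """
    counts = {name: Counter(terms) for name, terms in community.items()}
    result = []
    for term, n in counts[blog].items():
        if n < min_count:
            continue  # not common in this blog
        others = sum(
            1
            for name, counter in counts.items()
            if name != blog and counter[term] >= min_count
        )
        if others <= max_others:
            result.append(term)
    return sorted(result)
```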
Knowledge Flows
• Idea and Motivation
  – When bloggers link to a post of other bloggers
  – Could it be a “knowledge flow”?
  – Motivated by potential use as a knowledge management tool
• Implementation
  – Use Sigmund’s co-occurrence algorithm
  – Term overlap in linked posts is the main metric
  – Make a distinction between shared and agreed terms (used by both bloggers) and private terms (used by one of the bloggers)
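The term-overlap metric and the shared/private distinction can be sketched for a single pair of linked posts. The sketch below works at post level and reports the overlap as the share of all terms that both posts use; that ratio is an assumption, as the slides only say that term overlap is the main metric. The function name and the result keys are hypothetical.

```python
def term_overlap(source_terms, target_terms):
    """Compare the terms of a linking post with those of the post it links to."""
    source, target = set(source_terms), set(target_terms)
    shared = source & target
    union = source | target
    return {
        "shared": shared,                     # used in both posts
        "private_source": source - target,    # used only in the linking post
        "private_target": target - source,    # used only in the linked post
        "overlap": len(shared) / len(union) if union else 0.0,
    }
```

Under the hypothesis on the Implementations and Papers slide, a pair of linked posts with a high `overlap` would be a stronger candidate for a knowledge flow than a pair with none.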
Weblogs for Researchers
• Experiment (Metis project)
  – Six researchers (previously non-bloggers) started a weblog to get hands-on experience
  – Two gave up rather early
  – One thinks about underpants when blogging
  – Three (includes myself) continue after the experiment finished
• Evaluation
  – Posts are not emails (everybody can read them!)
  – Posts are not academic papers
  – Developing a blogging style (how and about what you blog) is difficult and different for everybody
Conclusions (1)
• Blogging as a tool for researchers
  – Try it!
  – Works for me, both reading and writing
  – By sharing ideas on your blog, you may get help!
Conclusions (2)
– Enormous amount of data (paradise for someone like me)
– Tempting to continue my own weblog research
– If others have better ideas than I have, and some do, I gladly return to my role as supporting others to do their weblog research