Mapping Australian User-Created Content: Methodological, Technological and Ethical Challenges
![Page 1: Mapping Australian User-Created Content: Methodological, Technological and Ethical Challenges](https://reader036.fdocuments.net/reader036/viewer/2022081602/554edec5b4c905911d8b4a99/html5/thumbnails/1.jpg)
Mapping Australian User-Created Content: Methodological,
Technological and Ethical Challenges
Axel Bruns / Jean Burgess
ARC Centre of Excellence for Creative Industries and Innovation
[email protected] – @snurb_dot_info / [email protected] – @jeanburgess
http://mappingonlinepublics.net – http://cci.edu.au/
Thomas Nicolai / Lars Kirchhoff
Sociomantic Labs
[email protected] / [email protected]
http://sociomantic.com/
Image by campoalto
![Page 2: Mapping Australian User-Created Content: Methodological, Technological and Ethical Challenges](https://reader036.fdocuments.net/reader036/viewer/2022081602/554edec5b4c905911d8b4a99/html5/thumbnails/2.jpg)
Project: New Media and Public Communication
• ARC Discovery (2010-12) – A$410,000
• Axel Bruns (CI), Jean Burgess (SRF) – QUT, Brisbane
• Lars Kirchhoff, Thomas Nicolai (PIs) – Sociomantic Labs, Berlin
• Project blog: http://mappingonlinepublics.net/
Year 1: Research tool development and baseline data
• Social network sources: YouTube · Flickr · Twitter · blogs
• Research tools: network crawler · content scraper · content analysis · network analysis
• Baseline information: data extraction · content creation statistics · patterns in terms and themes · baseline social networking map · interconnections between social network spaces
Year 2: Content creation patterns
• Changes over time: short-term statistics · regular / seasonal patterns
• Cluster profiling: common themes / patterns · lead users
Year 3: Focus on specific events
• Cultural dynamics: rapid spread of new ideas · communication across clusters · thematic discourse analysis · relationship with mainstream media coverage
![Page 3: Mapping Australian User-Created Content: Methodological, Technological and Ethical Challenges](https://reader036.fdocuments.net/reader036/viewer/2022081602/554edec5b4c905911d8b4a99/html5/thumbnails/3.jpg)
Methodology – Blogs
• Identification: Known Blogger Population / Blog Directories
• Capture: Post Statistics and Embedded Links · Post Texts
• Analysis: Patterns of Activity over Time · Networks of Interlinkage (short/long term) · Thematic Clusters and Keyword Mapping
![Page 4: Mapping Australian User-Created Content: Methodological, Technological and Ethical Challenges](https://reader036.fdocuments.net/reader036/viewer/2022081602/554edec5b4c905911d8b4a99/html5/thumbnails/4.jpg)
Analysis – Blogs
Patterns of Activity over Time
• Volume over time
• Comparison across clusters
Networks of Interlinkage (short/long term)
• Interlinkage between known blogs
• Outlinks to external sources
Thematic Clusters and Keyword Mapping
• Keyword analysis by cluster
• Keyword co-occurrence maps
[Chart: Posts and Outgoing Links over time; x-axis dates at weekly intervals from 5 Nov. 2007 to 21 Jan. 2008; y-axis 0 to 2000]
![Page 5: Mapping Australian User-Created Content: Methodological, Technological and Ethical Challenges](https://reader036.fdocuments.net/reader036/viewer/2022081602/554edec5b4c905911d8b4a99/html5/thumbnails/5.jpg)
Blog Network (between known blogs only)
(~8500 blogs / 17 July to 25 Aug. 2010 / All page links / Node size: Indegree)
Cluster labels: politics · food · parenting · arts & crafts · design and style
![Page 6: Mapping Australian User-Created Content: Methodological, Technological and Ethical Challenges](https://reader036.fdocuments.net/reader036/viewer/2022081602/554edec5b4c905911d8b4a99/html5/thumbnails/6.jpg)
Methodology – Twitter
• Identification: Australian Twitter Users (by Location) and Their Networks
• Capture: Tweet Statistics and @Replies · Tweet Texts
• Analysis: Patterns of Activity over Time · Networks of @Replies (short/long term) · Keyword / Key Phrase Mapping
![Page 7: Mapping Australian User-Created Content: Methodological, Technological and Ethical Challenges](https://reader036.fdocuments.net/reader036/viewer/2022081602/554edec5b4c905911d8b4a99/html5/thumbnails/7.jpg)
Analysis – Twitter
Patterns of Activity over Time
• Volume over time
• Keyword frequencies
Networks of @Replies (short/long term)
• Conversation vs. follower network
• Dissemination of RTs vs. @replies
Keyword / Key Phrase Mapping
• Keyword analysis over time
• Keyword co-occurrence maps
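The keyword co-occurrence maps named on this slide rest on a simple count: how often two keywords appear in the same tweet. The deck lists Leximancer and WordStat for this step; the Python sketch below is only an illustration of the underlying idea, not the authors' method. The keyword list and the function name `cooccurrence` are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical keyword list; the slides do not say which terms were tracked.
KEYWORDS = {"gillard", "abbott", "nbn", "climate"}


def cooccurrence(tweets, keywords=KEYWORDS):
    """Count how often each pair of keywords appears in the same tweet."""
    pairs = Counter()
    for text in tweets:
        # Lower-case and tokenise, then keep only the tracked keywords.
        words = set(re.findall(r"[a-z0-9_]+", text.lower()))
        present = sorted(words & keywords)
        # Each unordered pair is counted once per tweet.
        pairs.update(combinations(present, 2))
    return pairs


if __name__ == "__main__":
    sample = [
        "Gillard and Abbott debate the NBN #ausvotes",
        "Abbott on climate #ausvotes",
        "NBN rollout: Gillard responds to Abbott",
    ]
    for (a, b), n in cooccurrence(sample).most_common():
        print(a, b, n)
```

The resulting pair counts can be written out as a weighted edge list and loaded into a network tool for visualisation.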
![Page 8: Mapping Australian User-Created Content: Methodological, Technological and Ethical Challenges](https://reader036.fdocuments.net/reader036/viewer/2022081602/554edec5b4c905911d8b4a99/html5/thumbnails/8.jpg)
Data Processing – Twitter
• Typical data structure (#ausvotes):
![Page 9: Mapping Australian User-Created Content: Methodological, Technological and Ethical Challenges](https://reader036.fdocuments.net/reader036/viewer/2022081602/554edec5b4c905911d8b4a99/html5/thumbnails/9.jpg)
Data Processing – Twitter
• Tools:
• Gawk – Scripting tool for CSV processing (open source)
• Excel – Data aggregation, pivot tables and charts
• Leximancer / WordStat – Keyword extraction, co-occurrence matrices
• Gephi – Network analysis and visualisation (open source)
# Extract @replies for network visualisation
#
# this script takes a CSV archive of tweets, and reworks it into network data for visualisation
#
# expected data format:
# text,to_user_id,from_user,id,from_user_id,iso_language_code,source,profile_image_url,geo_type,
# geo_coordinates_0,geo_coordinates_1,created_at,time
#
# output format:
# from,to,tweet,time,timestamp
#
# the script extracts @replies from tweets, and creates duplicates where multiple @replies are
# present in the same tweet - e.g. the tweet "@one @two hello" from user @user results in
# user,one,"@one @two hello" and user,two,"@one @two hello"
#
# requires gawk (three-argument match()); run with -F , so that fields are split on commas
#
# Released under Creative Commons (BY, NC, SA) by Axel Bruns - [email protected]

BEGIN {
    print "from,to,tweet,time,timestamp"
}

# process only tweets whose text field contains at least one @username
$1 ~ /@[A-Za-z0-9_]+/ {
    rest = $1
    # print one from,to row per @username, then continue searching after that match
    while (match(rest, /@([A-Za-z0-9_]+)/, atArray)) {
        print $3 "," atArray[1] "," $1 "," $12 "," $13
        rest = substr(rest, RSTART + RLENGTH)
    }
}
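For comparison, the same @reply extraction can be sketched in Python. This is an illustration, not part of the authors' toolchain; it assumes the input has a header row using the column names from the script header above (`text`, `from_user`, `created_at`, `time`), and the function name `reply_edges` is hypothetical. The one design difference is that the `csv` module parses quoted fields, so a comma inside quoted tweet text is not treated as a field separator.

```python
import csv
import io
import re

# Same username pattern as the gawk script.
MENTION = re.compile(r"@([A-Za-z0-9_]+)")


def reply_edges(tweet_csv):
    """Yield (from, to, tweet, time, timestamp), once per @username in each tweet."""
    for row in csv.DictReader(tweet_csv):
        for target in MENTION.findall(row["text"]):
            yield (row["from_user"], target, row["text"],
                   row["created_at"], row["time"])


if __name__ == "__main__":
    # Invented sample row; only the columns the sketch reads are included.
    sample = io.StringIO(
        "text,from_user,created_at,time\n"
        '"@one @two hello, world",user,Sat 17 Jul 2010,1279324800\n'
    )
    for edge in reply_edges(sample):
        print(edge)
```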
# filter.awk - Filter list of tweets
#
# this script takes a CSV or other list of tweets, and removes any lines that don't match the search criteria
# the script preserves the first line, expecting that it contains header information
#
# script expects command-line argument search={searchcriteria} _before_ the input CSV filename
# enclose the search term in quotation marks if it contains any special characters
# each line is converted to lower case before matching, so the search term should be lower case
#
# e.g.: gawk -F , -f filter.awk search="(julia|gillard)" tweets.csv >filteredtweets.csv
#
# expected data format:
# CSV or simple list of tweets, line-by-line
#
# output format:
# same as above, listing only the matching lines
#
# Released under Creative Commons (BY, NC, SA) by Axel Bruns - [email protected]

BEGIN {
    getline     # read the header line
    print $0    # and pass it through unchanged
}

# print every remaining line that matches the search term
tolower($0) ~ search {
    print $0
}
![Page 10: Mapping Australian User-Created Content: Methodological, Technological and Ethical Challenges](https://reader036.fdocuments.net/reader036/viewer/2022081602/554edec5b4c905911d8b4a99/html5/thumbnails/10.jpg)
#ausvotes: Overall Activity (17 July – 24 Aug. 2010)
![Page 11: Mapping Australian User-Created Content: Methodological, Technological and Ethical Challenges](https://reader036.fdocuments.net/reader036/viewer/2022081602/554edec5b4c905911d8b4a99/html5/thumbnails/11.jpg)
#ausvotes: Discussion Network
(17 July to 25 Aug. 2010 / All @replies / Node size: Indegree / Node colours: betweenness centrality)
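Node size in this network is indegree, that is, the number of @replies a user receives. The deck lists Gephi for network analysis; as a rough cross-check, indegree can also be counted directly from an edge list in the `from,to,tweet,time,timestamp` format. The sketch below assumes that header row is present; the function name `indegree` and the sample rows are invented. Usernames are lower-cased because Twitter usernames are not case-sensitive.

```python
import csv
import io
from collections import Counter


def indegree(edge_csv):
    """Count incoming @replies per user from 'from,to,...' edge rows."""
    counts = Counter()
    for row in csv.DictReader(edge_csv):
        counts[row["to"].lower()] += 1
    return counts


if __name__ == "__main__":
    sample = io.StringIO(
        "from,to,tweet,time,timestamp\n"
        "alice,bob,@bob hi,t1,1\n"
        "carol,Bob,@Bob hello,t2,2\n"
        "bob,alice,@alice hey,t3,3\n"
    )
    for user, n in indegree(sample).most_common():
        print(user, n)
```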
![Page 12: Mapping Australian User-Created Content: Methodological, Technological and Ethical Challenges](https://reader036.fdocuments.net/reader036/viewer/2022081602/554edec5b4c905911d8b4a99/html5/thumbnails/12.jpg)
Keyword Co-Occurrence
![Page 13: Mapping Australian User-Created Content: Methodological, Technological and Ethical Challenges](https://reader036.fdocuments.net/reader036/viewer/2022081602/554edec5b4c905911d8b4a99/html5/thumbnails/13.jpg)
#ausvotes: Mentions of the Leaders (cumulative)
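A cumulative mentions chart of this kind can be derived by counting matching tweets per day and keeping a running total. The sketch below is one possible way to do this, not the authors' procedure (the deck lists Excel for aggregation). The search patterns are assumptions modelled on the `search="(julia|gillard)"` example in filter.awk, the function name is hypothetical, and days are assumed to be ISO date strings so that they sort chronologically. Simple patterns like these will also match unrelated words that contain them.

```python
import re
from collections import Counter
from itertools import accumulate

# Assumed search patterns, in the style of the filter.awk example.
LEADERS = {"gillard": r"julia|gillard", "abbott": r"tony|abbott"}


def cumulative_mentions(tweets, patterns=LEADERS):
    """tweets: iterable of (day, text) pairs.

    Returns {leader: [(day, running_total), ...]} with days in sorted order.
    """
    daily = {name: Counter() for name in patterns}
    days = set()
    for day, text in tweets:
        days.add(day)
        lowered = text.lower()
        for name, pattern in patterns.items():
            if re.search(pattern, lowered):
                daily[name][day] += 1
    ordered = sorted(days)
    return {
        name: list(zip(ordered, accumulate(daily[name][d] for d in ordered)))
        for name in patterns
    }


if __name__ == "__main__":
    sample = [
        ("2010-07-17", "Gillard calls the election"),
        ("2010-07-17", "Abbott responds to Gillard"),
        ("2010-07-18", "Tony Abbott on the trail"),
    ]
    for leader, series in cumulative_mentions(sample).items():
        print(leader, series)
```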
![Page 14: Mapping Australian User-Created Content: Methodological, Technological and Ethical Challenges](https://reader036.fdocuments.net/reader036/viewer/2022081602/554edec5b4c905911d8b4a99/html5/thumbnails/14.jpg)
#ausvotes: Key Themes
![Page 15: Mapping Australian User-Created Content: Methodological, Technological and Ethical Challenges](https://reader036.fdocuments.net/reader036/viewer/2022081602/554edec5b4c905911d8b4a99/html5/thumbnails/15.jpg)
Challenges
• Twapperkeeper relies on #hashtags
  · Problem if #hashtags are inconsistent/unclear
  · Follow-on @replies and retweets may not continue to use #hashtags
  · May miss early developments – e.g. #hashtag standardisation
• Need to look at overall user activity / Twitter firehose for more comprehensive picture
• Need to track baseline activity to understand how exceptional acute events are
• Ethical considerations:
  · Using only publicly available data (no protected tweets, no firewalled blogs)
  · But technical publicness not enough – ‘publicly available’ ≠ ‘meant to be public’
  · No easy answers – #hashtags probably indicate intention to be public, but may not
  · Need to consider data storage and publication carefully, too
• See more at mappingonlinepublics.net – up next: time-based animations...
• Or find us at @snurb_dot_info and @jeanburgess