Building and Managing Social Media Collections
Laura Wrubel @liblaura
Jason Casden @cazzerson
Slides: http://j.mp/DLF_Social_Media
DLF Forum, October 27, 2015
Outline
1. Introductions
2. Tour of social media archives
3. Ethical and legal discussion
4. Questions for cultural heritage organizations
5. Technical tools review
6. Collecting workflows demo
7. Wrap-up
Introductions
● Have you done any work related to social media archives?
● What are you hoping to get out of this workshop?
Social Media in Collections
• 50% Social media data in collections, but not in significant amounts
• 39% No social media in their collections
NCSU Social Media Archives Toolkit North Carolina C.H.O. survey
“I strongly believe in the relevance of this information because it is the "front lines" of movement development--this is where the important ideas and debates are happening. Traditional academic spaces are usually behind (it takes 1-3 years for articles and books to be published) and, again, they tend to bias in favor of whites, men, and long-standing leaders. Ignoring social media means ignoring marginalized voices and it thus provides an incomplete picture of the movement.”
NCSU Social Media Archives Toolkit Researcher Survey
Future value
• 71% of surveyed researchers saw future value in using social media as a source for research
• Only 51% of surveyed cultural heritage organizations thought it was likely that their institution would archive social media in the future
NCSU Social Media Archives Toolkit NCSU researcher survey
More representative collections
Cases
“Twitter has been a public and open communications platform since its beginning. Twitter is donating an archive of what it determines to be public. Private account information and deleted tweets will not be part of the archive. Linked information such as pictures and websites is not part of the archive, and the Library has no plans to collect the linked sites. There will be at least a six-month window between the original date of a tweet and its date of availability for research use.”
The Library and Twitter: An FAQ, April 28, 2010
Student life
Community
Donated accounts
Institutional record
“Archiving social media,” UK National Archives. http://blog.nationalarchives.gov.uk/blog/archiving-social-media/
WUSTL: Documenting Ferguson
GWU Researchers
● Media and public affairs faculty and graduate students researching how media outlets and journalists use Twitter, how members of Congress tweet
● International relations graduate students studying how ISIS tweets
● Freshman writing seminar student analyzing tweets with the hashtags #YesAllWomen and #BringBackOurGirls
● Business graduate students studying social media use by Korean companies
What potential do you see for the archival and research value of social media?
Funding
● Institute of Museum and Library Services
○ National Leadership Grants [ODU/Archive-It]
○ Library Services and Technology Act (LSTA) Grants [NCSU]
○ Sparks Innovation Grants
● NEH / ODH - Digital Humanities Start-Up Grants [Univ. of Florida]
● National Historical Publications and Records Commission (NHPRC) [GWU]
● Council on East Asian Libraries (from Mellon) [JHU, GWU, Georgetown]
Ethical and legal issues
Ethical and legal discussion scenarios
Form a group of 2-3 people who have selected the same scenario as you.
1. What legal and ethical issues do you see arising in these scenarios?
2. What are some ways you might address and manage these issues?
Scenario #1
A researcher writing a book visits your library to use the university archives and study student activism related to the environment. The university archives has collected tweets by several university-sponsored environmental clubs and has around 5,000 tweets from eight clubs over two years. The researcher would like to use the social media collection as part of her research.
Scenario #2
A local person of prominence has donated their personal papers to your library’s archives. They also have exported their Facebook account data and would like to include that in their donation. This data includes their posts, messages, photos, and videos as well as all other information in the Facebook-supported account download feature.
Scenario #3
A faculty member is using Twitter as a discussion medium in her class on public policy. Students are asked to tweet as part of their class participation. The professor knows that your library is able to collect tweets and asks if you can help her in collecting tweets by her students for the purpose of class evaluation.
Scenario #4
Your university has a well-regarded political science department. To support faculty and students exploring the role of social media in elections, your library has been proactively collecting tweets by presidential candidates and tweets using particular hashtags during the presidential debates. The collection currently contains close to a million tweets over two years. A faculty member is researching differences in communication patterns by party and requests your dataset.
Developing a collecting program
“If we are to begin actively archiving and using social media content, plans need to be developed as to what we are saving and who social media portrays and how it portrays individuals and large communities.”
NCSU Social Media Toolkit Researcher Survey
Collecting strategies
● Hashtags
● Searches
● Account targeting
● Friend networks
● Geolocation
● Donations
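Two of the strategies above, hashtags and account targeting, can be written down as an explicit scoping rule. The following is a minimal sketch only: `CollectingPlan` and `matches` are hypothetical names invented for illustration, and the tweet is assumed to be a dictionary shaped like a Twitter API response with `user.screen_name` and `entities.hashtags` fields.

```python
from dataclasses import dataclass, field


@dataclass
class CollectingPlan:
    """Hypothetical record of what one collection targets."""
    hashtags: set = field(default_factory=set)  # lowercase, without '#'
    accounts: set = field(default_factory=set)  # lowercase screen names


def matches(plan, tweet):
    """Return the reasons a tweet is in scope (an empty list means out of scope)."""
    reasons = []
    screen_name = tweet.get("user", {}).get("screen_name", "").lower()
    if screen_name in plan.accounts:
        reasons.append("account:" + screen_name)
    for tag in tweet.get("entities", {}).get("hashtags", []):
        text = tag.get("text", "").lower()
        if text in plan.hashtags:
            reasons.append("hashtag:" + text)
    return reasons


plan = CollectingPlan(hashtags={"endyouthhomelessness"}, accounts={"cazzerson"})
sample = {
    "user": {"screen_name": "example_account"},  # placeholder account name
    "entities": {"hashtags": [{"text": "EndYouthHomelessness"}, {}]},
}
print(matches(plan, sample))
```

Recording the reason for each match keeps some provenance with the item, which may help later when describing why a tweet is in the collection.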
Account spreadsheet
Hashtag calendar
Role of the institution
● How do we handle consent?
● These items are ephemeral, but not unique, right?
● How do we determine what to collect?
● Are there special preservation considerations?
What is the item?
Web content?
Text?
“(・_・ヾ "Study the feasibility of a public space to house a permanent collection of UNC-Chapel Hill’s history" http://www.unc.edu/campus-updates/message-from-chancellor-folt-update-on-the-task-force-on-unc-chapel-hill-history/”
- @cazzerson
Images?
Associated content?
● Linked web pages
● Responses
● Videos and other media
● Retweeting accounts
● Engagement metrics
Vendor API responses?
{ contributors: null, truncated: false, text: "We love it when artists like @cyndilauper speak up for our youth! #EndYouthHomelessness \u2013 On C-SPAN http://t.co/Gw17OHyTiO #edchat", in_reply_to_status_id: null, id: 524985632775741440, favorite_count: 7, source: "<a href='http://twitter.com' rel='nofollow'>Twitter Web Client</a>", retweeted: false, coordinates: null, entities: {
symbols: [ ], user_mentions: [
{ id: 74501824, indices: [
29, 41
], id_str: "74501824", screen_name: "cyndilauper", name: "Cyndi Lauper"
}
], hashtags: [
{ indices: [
66, 87
], text: "EndYouthHomelessness"
}, {}
], urls: [
{ url: "http://t.co/Gw17OHyTiO", indices: [
101, 123
], expanded_url: "http://cs.pn/1FCx6KY", display_url: "cs.pn/1FCx6KY"
} ]
},
in_reply_to_screen_name: null, in_reply_to_user_id: null, retweet_count: 7, id_str: "524985632775741440", favorited: false,
geo: null, in_reply_to_user_id_str: null, possibly_sensitive: false, lang: "en", created_at: "Wed Oct 22 18:08:23 +0000 2014", in_reply_to_status_id_str: null, place: null,
user: {
follow_request_sent: false, profile_use_background_image: false, profile_text_color: "333333", default_profile_image: false, id: 22789766, profile_background_image_url_https: "https://pbs.twimg.com/profile_background_images/70908209/NYLono_MercerCo_LarchmontElem_182.jpg_twitter.jpg",
verified: true, profile_location: null, profile_image_url_https: "https://pbs.twimg.com/profile_images/502152204040425472/eVCt0lz8_normal.jpeg", profile_sidebar_fill_color: "DDEEF6",
{ "data": { "type": "image", "users_in_photo": [{ "user": { "username": "kevin", "full_name": "Kevin S", "id": "3", "profile_picture": "..." }, "position": { "x": 0.315, "y": 0.9111 } }], "filter": "Walden", "tags": [], "comments": { "data": [{ "created_time": "1279332030", "text": "Love the sign here", "from": { "username": "mikeyk",
{ "created_time": "1279341004", "text": "Chilako taco", "from": { "username": "kevin", "full_name": "Kevin S", "id": "3", "profile_picture": "..." }, "id": "3" }], "count": 2 }, "caption": null, "likes": { "count": 1, "data": [{ "username": "mikeyk", "full_name": "Mikeyk", "id": "4", "profile_picture": "..." }] }, "link": "http://instagr.am/p/D/", "user": { "username": "kevin", "full_name": "Kevin S", "profile_picture": "...", "id": "3" }, "created_time": "1279340983", "images": { "low_resolution": { "url": "http://distillery.s3.amazonaws.com/media/2010/07/16/4de37e03aa4b4372843a7eb33fa41cad_6.jpg", "width": 306, "height": 306 },
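A vendor response like the Twitter sample above carries far more fields than most finding aids or analyses need. As a rough sketch, the function below reduces one tweet to a handful of descriptive fields. The field names (`id_str`, `created_at`, `text`, `entities`) come from the sample; the function name `summarize_tweet` and the choice of which fields to keep are assumptions made for illustration.

```python
from datetime import datetime


def summarize_tweet(tweet):
    """Reduce a Twitter API tweet dictionary to a few descriptive fields."""
    entities = tweet.get("entities", {})
    # Twitter's created_at format, e.g. "Wed Oct 22 18:08:23 +0000 2014"
    created = datetime.strptime(tweet["created_at"], "%a %b %d %H:%M:%S %z %Y")
    return {
        # id_str avoids precision loss when large numeric IDs pass through
        # tools that store numbers as floating point.
        "id": tweet["id_str"],
        "created_at": created.isoformat(),
        "text": tweet["text"],
        # Skip empty entity objects such as the {} in the sample above.
        "hashtags": [h["text"] for h in entities.get("hashtags", []) if "text" in h],
        "mentions": [m["screen_name"] for m in entities.get("user_mentions", [])],
        "links": [u["expanded_url"] for u in entities.get("urls", [])],
    }


tweet = {
    "id_str": "524985632775741440",
    "created_at": "Wed Oct 22 18:08:23 +0000 2014",
    "text": "We love it when artists like @cyndilauper speak up for our youth! "
            "#EndYouthHomelessness \u2013 On C-SPAN http://t.co/Gw17OHyTiO #edchat",
    "entities": {
        "user_mentions": [{"screen_name": "cyndilauper", "name": "Cyndi Lauper"}],
        "hashtags": [{"text": "EndYouthHomelessness"}, {}],
        "urls": [{"url": "http://t.co/Gw17OHyTiO",
                  "expanded_url": "http://cs.pn/1FCx6KY"}],
    },
}
summary = summarize_tweet(tweet)
print(summary["created_at"], summary["hashtags"], summary["links"])
```

Keeping the expanded URL rather than the `t.co` short link matters for the "associated content" question, since the short link depends on the vendor's redirect service remaining available.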
Metadata from harvesting software
What is the container?
● Should we mix content from multiple platforms?
● How do we define container boundaries?
● How do we describe containers?
What is the collection?
● To what extent are these artificial collections?
● Should these materials be integrated into existing collections?
Access policies
● Can we balance privacy and research value?
● Can we provide research access while adhering to the Terms of Service?
○ “Hydration?”
● How do researchers browse materials?
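One common reading of "hydration" is that the institution shares only tweet identifiers, and the researcher later retrieves ("hydrates") the full records from the Twitter API, which returns only tweets that are still public. The sketch below shows the first half of that exchange, reducing locally held line-delimited tweet JSON to a list of IDs. The function name `dehydrate` and the sample records are illustrative assumptions; the second ID is a placeholder, not a real tweet.

```python
import io
import json


def dehydrate(lines):
    """Yield the tweet ID from each non-empty line of line-delimited tweet JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)["id_str"]


# Stand-in for a local file of collected tweets, one JSON object per line.
collected = io.StringIO(
    '{"id_str": "524985632775741440", "text": "full text stays local"}\n'
    '\n'
    '{"id_str": "100000000000000001", "text": "placeholder record"}\n'
)
ids = list(dehydrate(collected))
print("\n".join(ids))
```

The ID list is what would be handed to a researcher; the full JSON stays with the collecting institution.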
Building research datasets
● Dataset stability and decay
○ Snapshots
○ Deletion
■ All Tweets will eventually be deleted
● Reproducibility
● Data sharing
● Research area restrictions
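Dataset decay can be made concrete by comparing the IDs in an original snapshot with the IDs that can still be retrieved later. This is a minimal sketch under that assumption; `decay_report` is a hypothetical helper and the IDs are placeholders.

```python
def decay_report(snapshot_ids, retrievable_ids):
    """Compare an original snapshot with what can still be retrieved."""
    snapshot = set(snapshot_ids)
    surviving = snapshot & set(retrievable_ids)
    missing = snapshot - surviving
    rate = len(missing) / len(snapshot) if snapshot else 0.0
    return {
        "collected": len(snapshot),
        "surviving": len(surviving),
        "missing": sorted(missing),
        "decay_rate": rate,
    }


# Placeholder IDs: four tweets collected, one no longer retrievable.
report = decay_report(["101", "102", "103", "104"], ["101", "103", "104"])
print(report)
```

Two researchers who rebuild a dataset from the same ID list at different times may therefore end up with different data, which is the reproducibility concern listed above.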
Tools and methods
“Along with email, social media will probably provide the main source of information for researchers studying our current time. However, our institution just does not have the resources right now to collect and store the social media of other people or organizations.”
NCSU Social Media Archives Toolkit C.H.O. survey
What are your goals?
● create archival / special collections
● support current faculty research
● support students with class projects
What data do you need?
● current and going forward; recent or far past
● metadata
● images and other media referenced
● comments, responses, conversation
What do you want to do?
● analyze, visualize
● archive, locally accession
● play back
● hydrate
What are your financial resources?
What are your technical resources?
Some of the many options
Commercial
● Gnip
● Texifter
● Crimson Hexagon
● Sysomos
● Archive-It
● ArchiveSocial
● Radian6, Sprout Social, HootSuite
Free / open source
● TAGS
● NodeXL
● IFTTT
● R (twitteR)
● twarc
● Social Feed Manager [Twitter, Tumblr*, Flickr*]
● lentil [Instagram]
● youtube-dl [YouTube]
● MassMine* [Twitter, Tumblr]
*pre-release
Twitter Collection Options
Commercial: Bulk data purchase (Gnip, Texifter)
Commercial: Firehose access (Gnip)
Commercial: Value-added platform (ArchiveSocial, Texifter)
TAGS, NodeXL
twarc
Social Feed Manager
Real-time data: x x x x x x
Historical data: x* x*; past week; user data only, limited
Collect by user handle: x x x x x x
Collect by filter / hashtag: x x x x x x
Collect sample stream: x x x x
High reliability (backfill and redundancy): x x x
Built-in analysis or visualization tools: x x x
CSV export: x x x x (user data)
Free: x x x
Requires some local technical expertise: x depends x x
twarc
twarc-report visualizations with d3
lentil
“The Shawu150 Project: Viewing DH from an HBCU,” Desiree Dighton.
Collecting demonstration
Social Media Combine
Wrap-up
Bibliography
“National Archives and Records Administration White Paper on Best Practices for the Capture of Social Media Records,” May 2013. http://www.archives.gov/records-mgmt/resources/socialmediacapture.pdf.
boyd, danah, and Kate Crawford. “Critical Questions for Big Data.” Information, Communication & Society 15, no. 5 (June 1, 2012): 662–79. doi:10.1080/1369118X.2012.678878.
Bruns, Axel, and Tim Highfield. “POLITICAL NETWORKS ON TWITTER: Tweeting the Queensland State Election.” Information, Communication & Society 16, no. 5 (June 2013): 667–91. doi:10.1080/1369118X.2013.782328.
Casden, Jason and Brian Dietz (co-PI). Social Media Archives Toolkit. http://www.lib.ncsu.edu/social-media-archives-toolkit
Cohen, Dan. “Digital Ephemera and the Calculus of Importance.” Dan Cohen, May 17, 2010. http://www.dancohen.org/2010/05/17/digital-ephemera-and-the-calculus-of-importance/.
Collins, Samuel, Matthew Durington, Glenn Daniels, Natalie Demyan, David Rico, Julian Beckles, and Cara Heasley. “Tagging Culture: Building a Public Anthropology through Social Media.” Human Organization 72, no. 4 (November 13, 2013): 358–68. doi:10.17730/humo.72.4.v5x0205248427516.
Dash, Anil. “What Is Public? — The Message.” Medium. Accessed August 12, 2014. https://medium.com/message/what-is-public-f33b16d780f9.
Dixon, Kitsy. “Feminist Online Identity: Analyzing the Presence of Hashtag Feminism.” Journal of Arts & Humanities 3, no. 7 (2014): 34–40.
“Ethical Decision-Making and Internet Research Recommendations from the AoIR Ethics Working Committee (Version 2.0),” 2012. http://aoir.org/reports/ethics2.pdf
Halegoua, Germaine R., and Raz Schwartz. “The Spatial Self: Location-Based Identity Performance on Social Media.” New Media & Society, April 9, 2014, 1–18. doi:10.1177/1461444814531364.
Jules, Bergis. “Documenting the Now: #Ferguson in the Archives — On Archivy.” Medium, April 8, 2015. https://medium.com/on-archivy/documenting-the-now-ferguson-in-the-archives-adcdbe1d5788.
Lomborg, Stine. “Personal Internet Archives and Ethics.” Research Ethics 9, no. 20 (2013). doi:10.1177/1747016112459450.
Marshall, Catherine C. “Rethinking Personal Digital Archiving, Part 1: Four Challenges from the Field.” D-Lib Magazine, March/April 2008. http://www.dlib.org/dlib/march08/marshall/03marshall-pt1.html#Top.
Nathan, Lisa P., and Elizabeth Shaffer. “Preserving Social Media: Opening a Multi-Disciplinary Dialogue.” UNESCO, n.d. http://www.unesco.org/new/fileadmin/MULTIMEDIA/HQ/CI/CI/pdf/mow/VC_Nathan_Shaffer_27_B_1140.pdf.
Rivero, Enrique. “Twitter ‘Big Data’ Can Be Used to Monitor HIV and Drug-Related Behavior, UCLA Study Shows.” UCLA Newsroom, February 26, 2014. http://newsroom.ucla.edu/portal/ucla/twitter-big-data-can-be-used-to-250162.aspx.
Storrar, Tom. “Archiving Social Media.” The National Archives, May 8, 2014. http://blog.nationalarchives.gov.uk/blog/archiving-social-media/.
Summers, Ed. “An Invitation to Study Ferguson — On Archivy.” Medium, December 3, 2014. https://medium.com/on-archivy/an-invitation-to-study-ferguson-367b423cff29.
Tufekci, Zeynep. “Big Questions for Social Media Big Data: Representativeness, Validity and Other Methodological Pitfalls.” arXiv:1403.7400 [physics], March 28, 2014. http://arxiv.org/abs/1403.7400.
Zimmer, Michael, and Nicholas John Proferes. “A Topology of Twitter Research: Disciplines, Methods, and Ethics.” Aslib Journal of Information Management 66, no. 3 (2014): 250–61.
Zimmer, Michael. “The Twitter Archive at the Library of Congress: Challenges for Information Practice and Information Policy.” First Monday 20, no. 7 (June 21, 2015). http://firstmonday.org/ojs/index.php/fm/article/view/5619.
Social Feed Manager is supported by the National Historical Publications & Records Commission, Grant NAR14-DI-50017-14 (2014-2017)