Computational Techniques for Public Health...
![Page 1: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/1.jpg)
Computational Techniques for Public Health Surveillance
Scott H. Burton
Ph.D. Dissertation Proposal
Department of Computer Science
Brigham Young University
April 26, 2012
![Page 2: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/2.jpg)
Overview
• Problem overview
• Research area overview
▫ Health research in social media
▫ Data mining
Social network analysis
Collective classification
Text mining
• Dissertation proposal
![Page 3: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/3.jpg)
Health is Important
• U.S. 2010 total health expenditures:
▫ $2.6 trillion (17.9% of GDP)
• Millions of lives affected each year
National Health Expenditures 2010 Highlights.
http://www.cms.gov/NationalHealthExpendData/downloads/highlights.pdf
Image: http://health-ins.us/
![Page 4: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/4.jpg)
Public Health Surveillance
“Public health surveillance is the continuous, systematic collection, analysis and interpretation of health-related data needed for the planning, implementation, and evaluation of public health practice.”
– World Health Organization
• Epidemiology
• Health promotion
• Substance abuse prevention
• Public policy
World Health Organization
http://www.who.int/topics/public_health_surveillance/en/
![Page 5: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/5.jpg)
Traditional Methods
• Health Department Labs
• Focus Groups
• Questionnaires
• Clinical Trials
![Page 6: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/6.jpg)
Limitations of Traditional Methods
Traditional Methods
• Cost
• Delay
• Isolated individuals
• Reported vs. actual behavior
• Often small samples
![Page 7: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/7.jpg)
Social Media Opportunities
Traditional Methods
• Cost
• Delay
• Isolated individuals
• Reported vs. actual behavior
• Often small samples
Online Social Media
• Inexpensive
• Real-time posting
• Near real-time analysis
• Relational data / social structures
• True feelings and behaviors
• Large samples
• Geo-located
• Reach under-represented countries and groups
![Page 8: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/8.jpg)
Computational Health Science
“Developing computational techniques to build systems or applications to understand and influence individual health and measure relevant outcomes.”
Computer Science, Sociology, Health Science
![Page 9: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/9.jpg)
The CHS Difference
• Community identification
• Data set size
• Relational classification
• Inductive models
• Text mining and automated analysis
![Page 10: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/10.jpg)
Search Query Monitoring
• Influenza outbreak detection
Polgreen, P., Chen, Y., Pennock, D., Nelson, F., and Weinstein, R.
Using Internet Searches for Influenza Surveillance
Clinical Infectious Diseases, 47(11):1443-1448, 2008.
![Page 11: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/11.jpg)
More Outbreak Detection
• Influenza outbreak detection (Ginsberg, et al.)
• 2009 H1N1 Influenza (Brownstein, et al.)
• Listeriosis (Wilson and Brownstein)
• Gastroenteritis and Chickenpox (Pelat, et al.)
Ginsberg, J., Mohebbi, M., Patel, R., Brammer, L., Smolinski, M., and Brilliant, L.
Detecting Influenza Epidemics using Search Engine Query Data.
Nature, 457(7232):1012-1014, 2009.
Brownstein, J. S., et al.
Information Technology and Global Surveillance of Cases of 2009 H1N1 Influenza
New England Journal of Medicine, 362(18):1731-1735, 2010.
Wilson, K. and Brownstein, J.
Early Detection of Disease Outbreaks using the Internet.
Canadian Medical Association Journal, 180(8):829, 2009.
Pelat, C., Turbelin, C., Bar-Hen, A., Flahault, A., and Valleron, A.
More Diseases Tracked by using Google Trends.
Emerging Infectious Diseases, 15(8):1327, 2009.
![Page 12: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/12.jpg)
Health on YouTube
• Immunizations (N=153) (Keelan, et al.)
• Tanning Bed Use (N=72) (Hossler and Conroy)
• Tobacco (N=50) (Freeman and Chapman)
• Stop Smoking (N=191) (Backinger, et al.)
Keelan, J., Pavri-Garcia, V., Tomlinson, G., and Wilson, K.
YouTube as a Source of Information on Immunization: A Content Analysis.
Journal of the American Medical Association, 298(21):2482, 2007.
Hossler, E. and Conroy, M.
YouTube as a Source of Information on Tanning Bed Use.
Archives of Dermatology, 144(10):1395-1396, 2008.
Freeman, B. and Chapman, S.
Is “YouTube” Telling or Selling you Something? Tobacco Content on the YouTube Video-sharing Website.
Tobacco Control, 16(3):207, 2007.
Backinger, C. L., Pilsner, A. M., Augustson, E. M., Frydl, A., Phillips, T., and Rowden, J.
YouTube as a Source of Quitting Smoking Information.
Tobacco Control, 20(2):119-122, 2011.
![Page 13: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/13.jpg)
Health on Facebook
• General Non-Communicable Disease Groups (N=757)
▫ Farmer, et al.
• Diabetes Groups (N=15)
▫ Greene, et al.
• Ethical Issues (N=202)
▫ Moubarak, et al.
Greene, J., Choudhry, N., Kilabuk, E., and Shrank, W.
Online Social Networking by Patients with Diabetes: A Qualitative Evaluation of Communication with Facebook.
Journal of General Internal Medicine, 26:287-292, 2011.
Moubarak, G., Guiot, A., Benhamou, Y., Benhamou, A., and Hariri, S.
Facebook Activity of Residents and Fellows and its Impact on the Doctor-Patient Relationship.
Journal of Medical Ethics, 37(2):101-104, 2011.
Farmer, A. D., Bruckner Holt, C. E. M., Cook, M. J., and Hearing, S. D.
Social Networking Sites: A Novel Portal for Communication.
Postgraduate Medical Journal, 85:455-459, 2009.
![Page 14: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/14.jpg)
Health on Blogs
• Health-related Blogs (N=951)
▫ Miller and Pole
• Breastfeeding and Blogging (32 blogs, 354 posts, 881 comments)
▫ West et al.
Miller, E. and Pole, A.
Diagnosis Blog: Checking up on Health Blogs in the Blogosphere.
American Journal of Public Health, 100(8):1514-1519, 2010.
West, J., Hall, P., Hanson, C., Thackeray, R., Barnes, M., Neiger, B., and McIntyre, E.
Breastfeeding and Blogging: Exploring the Utility of Blogs to Promote Breastfeeding.
American Journal of Health Education, 42(2):106-115, 2011.
![Page 15: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/15.jpg)
Health on Twitter
• Dental Pain (N=772)
▫ Heaivilin, et al.
• Tobacco (N=5.9 million tweets, 5,000 tobacco-related)
▫ Prier, et al.
• Problem Drinking (N=5.5 million tweets, 21,000 alcohol-related)
▫ West et al.
Heaivilin, N., Gerbert, B., Page, J., and Gibbs, J.
Public Health Surveillance of Dental Pain via Twitter.
Journal of Dental Research, 90(9):1047-1051, 2011.
Prier, K. W., Smith, M. S., Giraud-Carrier, C., and Hanson, C. L.
Identifying Health-Related Topics on Twitter: An Exploration of Tobacco-related Tweets as a Test Topic.
In Proceedings of the 4th International Conference on Social Computing,
Behavioral-Cultural Modeling, and Prediction, pages 18-25. 2011.
West, J., Hall, P., Prier, K., Hanson, C., Giraud-Carrier, C., Neeley, S., Barnes, M.
Temporal Variability of Problem Drinking on Twitter
Open Journal of Preventive Medicine, 2(1):43-48. 2012.
![Page 16: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/16.jpg)
Geo-Location in Twitter
• Pew Institute reports:
▫ 14% of users said they used automatic GPS tagging
• In our study, the data said:
▫ 2.0% of Tweets
▫ 2.7% of unique users
K. Zickuhr and A. Smith.
28% of American Adults Use Mobile and Social Location-based Services.
http://pewinternet.org/~/media//Files/Reports/2011/PIP_Locationbased-services.pdf, 2011.
Burton, S. H., Tanner, K. W., Giraud-Carrier, C. G., West, J. H., and Barnes, M. D.
Right Time, Right Place Health Communication in Twitter: How Good Is Location Information?
In Submission.
![Page 17: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/17.jpg)
Tweets Around the World
Burton, S. H., Tanner, K. W., Giraud-Carrier, C. G., West, J. H., and Barnes, M. D.
Right Time, Right Place Health Communication in Twitter: How Good Is Location Information?
In Submission.
![Page 18: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/18.jpg)
Data Mining
• “the process of discovering interesting and useful patterns and relationships in large volumes of data” – Christopher Clifton
• Algorithms
▫ Supervised
▫ Unsupervised
• Types of data
▫ Tabular
▫ Relational
▫ Text
Clifton, C.
Encyclopedia Britannica: Data Mining
http://www.britannica.com/EBchecked/topic/1056150/data-mining
![Page 19: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/19.jpg)
Social Network Analysis
• Relational data
• Not just networks of “people”
Wasserman, S. and Faust, K.
Social Network Analysis: Methods and Applications. Cambridge University Press, 1994.
Scott, J.
Social Network Analysis: A Handbook. Sage Publications, Second Edition, 2000.
![Page 20: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/20.jpg)
Community Mining
• “Dense subnetwork within a larger network”
Newman, M. E. J.
Communities, Modules and Large-scale Structure in Networks.
Nature Physics, 8:25-31. 2012
![Page 21: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/21.jpg)
Community Mining Techniques
• Label Propagation
▫ Cordasco and Gargano
• Random Walks
▫ Rosvall and Bergstrom
• Rolling k-Cliques
▫ Palla et al.
Cordasco, G. and Gargano, L.
Community Detection via Semi-Synchronous Label Propagation Algorithms
IEEE International Workshop on Business Applications of Social Network Analysis, 2010
Rosvall, M. and Bergstrom, C. T.
Maps of Random Walks on Complex Networks Reveal Community Structure
Proceedings of the National Academy of Sciences 105(4):1118-1123. 2008
Palla, G., Derényi, I., Farkas, I., and Vicsek, T.
Uncovering the Overlapping Community Structure of Complex Networks in Nature and Society
Nature, 435(7043):814-818, 2005.
![Page 22: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/22.jpg)
Modularity
• Actual edges minus expected
• Undirected
• Requires complete graph
Newman, M. E. J. and Girvan, M.
Finding and evaluating community structure in networks.
Physical Review E, 69(2):026113, Feb 2004.
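The "actual edges minus expected" idea can be sketched in a few lines. This is a minimal illustration, assuming an undirected, unweighted graph given as an edge list and a complete assignment of nodes to communities; the function name `modularity` and its argument layout are illustrative and do not come from the slides.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman-Girvan modularity Q for an undirected, unweighted graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both ends inside community c
    degree_sum = defaultdict(int)  # summed degree of the nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    # Per community: observed fraction of internal edges minus the fraction
    # expected if edges were rewired at random with the same degrees.
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)
```

For two triangles joined by a single edge, splitting the graph into the two triangles scores higher than placing every node in one community, which always scores zero. Note that the whole edge list is needed, which is the "requires complete graph" limitation named above.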
![Page 23: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/23.jpg)
Modularity Challenges
• Algorithm efficiency
• Varying sizes
• Overlapping
• Directed graphs
• Local discovery
![Page 24: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/24.jpg)
Directed Community Mining
• Lost information by ignoring direction
• Directed Modularity
▫ Leicht and Newman
• Random Walks
▫ Kim, et al.
Leicht, E. A. and Newman, M. E. J.
Community Structure in Directed Networks.
Physical Review Letters, 100(11):118703, 2008.
Kim, Y., Son, S.-W., Jeong, H.
Finding Communities in Directed Networks
Physical Review E, 81(1):016103, 2010.
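For reference, the two modularity definitions can be written side by side. The notation here is an assumption chosen for this note rather than taken from the slides: $A_{ij}$ is the adjacency matrix, $m$ the number of edges, $k_i$ a degree, $c_i$ the community of node $i$, and in the directed case $A_{ij} = 1$ when an edge points from $j$ to $i$.

```latex
% Undirected modularity (Newman and Girvan)
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta_{c_i, c_j}

% Directed modularity (Leicht and Newman)
Q_d = \frac{1}{m} \sum_{i,j} \left[ A_{ij} - \frac{k_i^{\mathrm{in}} \, k_j^{\mathrm{out}}}{m} \right] \delta_{c_i, c_j}
```

The directed form replaces the symmetric expectation $k_i k_j / 2m$ with one that depends on the in-degree of the receiving node and the out-degree of the sending node, so reversing an edge changes the score.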
![Page 25: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/25.jpg)
Clauset’s Local Modularity
• Steepness of boundary
• Greedily add nodes
Clauset, A.
Finding Local Community Structure in Networks.
Physical Review E, 72(2):026132, Aug 2005.
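A minimal sketch of the two bullets above, assuming an undirected graph stored as a dict of neighbour sets. Local modularity is taken as R = I / T, where T counts edges touching the community's boundary nodes and I counts those that stay inside the community. The names `local_modularity` and `grow_community`, the stop-when-no-improvement rule, and the value returned when the community has no boundary are assumptions for illustration.

```python
def local_modularity(adj, community):
    """R = I / T: share of boundary-node edges that stay inside `community`.

    adj: dict mapping node -> set of neighbours (undirected).
    community: set of nodes.
    """
    # Boundary nodes are community members with at least one outside neighbour.
    boundary = {v for v in community if adj[v] - community}
    if not boundary:
        return 1.0  # assumption: nothing leaves the community
    seen = set()
    inside = total = 0
    for b in boundary:
        for n in adj[b]:
            edge = frozenset((b, n))
            if edge in seen:
                continue
            seen.add(edge)
            total += 1
            if n in community:
                inside += 1
    return inside / total

def grow_community(adj, start, max_size):
    """Greedily add the neighbouring node that gives the highest R."""
    community = {start}
    score = local_modularity(adj, community)
    while len(community) < max_size:
        frontier = set().union(*(adj[v] for v in community)) - community
        if not frontier:
            break
        # sorted() only makes tie-breaking deterministic.
        best = max(sorted(frontier),
                   key=lambda n: local_modularity(adj, community | {n}))
        best_score = local_modularity(adj, community | {best})
        if best_score <= score:
            break  # assumption: stop once no candidate improves R
        community.add(best)
        score = best_score
    return community, score
```

Only the neighbourhood of the current community is ever inspected, so the full graph never has to be loaded.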
![Page 26: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/26.jpg)
Collective Classification
• “Typical” classification
▫ Internal attributes
• Relational classification
▫ Neighbor classes
• Collective classification
▫ Both
Sen, P., Namata, G., Bilgic, M., Getoor, L., Gallagher, B., and Eliassi-Rad, T.
Collective Classification in Network Data.
AI Magazine, 29(3):93, 2008.
Jensen, D., Neville, J., and Gallagher, B.
Why Collective Inference Improves Relational Classification.
In Proceedings of the International Conference on Knowledge Discovery and Data Mining, 2004.
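The distinction between the three settings can be illustrated with a small iterative scheme. This is a sketch of the general idea only, not the method of either cited paper: each unlabeled node combines a score from its own attributes (`local_prob`, assumed to come from any ordinary classifier) with the fraction of its neighbours currently labeled 1, and the labels are re-estimated until they stop changing. The mixing parameter `weight` and all names are hypothetical.

```python
def collective_classify(adj, local_prob, known, weight=0.5, rounds=10):
    """Binary collective classification by repeated relabeling.

    adj: dict node -> set of neighbours.
    local_prob: dict node -> P(label = 1) from the node's own attributes.
    known: dict node -> observed 0/1 label (never changed).
    weight: 0 uses attributes only, 1 uses neighbour labels only.
    """
    labels = dict(known)
    # Bootstrap the unknown nodes from their own attributes.
    for v in adj:
        if v not in labels:
            labels[v] = 1 if local_prob[v] >= 0.5 else 0
    for _ in range(rounds):
        changed = False
        for v in sorted(adj):
            if v in known:
                continue
            nbrs = adj[v]
            if nbrs:
                frac = sum(labels[n] for n in nbrs) / len(nbrs)
            else:
                frac = local_prob[v]
            score = (1 - weight) * local_prob[v] + weight * frac
            new = 1 if score >= 0.5 else 0
            if new != labels[v]:
                labels[v] = new
                changed = True
        if not changed:
            break
    return labels
```

With `weight=0` this reduces to "typical" classification on internal attributes; with `weight=1` it depends on neighbour classes only.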
![Page 27: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/27.jpg)
Inferring Properties from Friends
• Location
▫ Backstrom, et al.
• Private information (politics, religion, etc.)
▫ Lindamood, et al.
Backstrom, L., Sun, E., and Marlow, C.
Find Me if You Can: Improving Geographical Prediction with Social and Spatial Proximity.
In Proceedings of the 19th International World Wide Web Conference, pages 61-70. 2010.
Lindamood, J., Heatherly, R., Kantarcioglu, M., and Thuraisingham, B.
Inferring private information using social network data.
In Proceedings of the 18th International World Wide Web Conference, pages 1145-1146. 2009.
![Page 28: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/28.jpg)
Text Classification
• Different classes of documents
• Learn patterns from the words in each class
Sebastiani, F.
Machine Learning in Automated Text Categorization.
ACM Computing Surveys, 34(1):1-47, 2002.
![Page 29: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/29.jpg)
Text Classification Algorithms
• Naïve Bayes
▫ McCallum and Nigam, 1998
• k-Nearest Neighbor
▫ Yang, 1999
• Support Vector Machines
▫ Joachims, 1998
• Rule-learning
▫ Cohen and Singer, 1996
• Maximum Entropy
▫ Nigam, et al., 1999
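As one concrete instance of the algorithms listed above, a multinomial Naïve Bayes classifier can be sketched with the standard library alone. Whitespace tokenization, add-one smoothing, and the class name `NaiveBayes` are assumptions made for this illustration and are not taken from the cited papers.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes over whitespace-separated words."""

    def fit(self, docs, labels):
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        self.class_counts = Counter(labels)      # class -> number of documents
        self.vocab = set()
        for doc, label in zip(docs, labels):
            words = doc.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, doc):
        words = doc.lower().split()
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best, best_lp = None, -math.inf
        for c, n_c in self.class_counts.items():
            total = sum(self.word_counts[c].values())
            # log prior plus the sum of smoothed log word likelihoods
            lp = math.log(n_c / n_docs)
            for w in words:
                lp += math.log((self.word_counts[c][w] + 1) / (total + vocab_size))
            if lp > best_lp:
                best, best_lp = c, lp
        return best
```

A usage example on toy data: after `NaiveBayes().fit(["quit smoking today", "flu shot clinic"], ["tobacco", "flu"])`, calling `predict("smoking")` returns `"tobacco"`.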
![Page 30: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/30.jpg)
Topic Modeling
• Latent Dirichlet allocation (LDA)
▫ User chooses a topic (z)
▫ Given the topic, user chooses a word
Blei, D. M., Ng, A. Y., and Jordan, M. I.
Latent Dirichlet Allocation.
Journal of Machine Learning Research, 3:993-1022, March 2003.
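The two-step story above (choose a topic, then choose a word given the topic) is the generative side of LDA. The sketch below samples a document that way; it does not perform inference. The toy topics, the `alpha` values, and the function names are assumptions for illustration.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw a probability vector from Dirichlet(alpha) via gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topics, alpha, length, rng):
    """Sample one document from the LDA generative process.

    topics: list of dicts mapping word -> probability, one dict per topic.
    alpha: Dirichlet parameters, one per topic.
    Returns the document's topic mixture and a list of (topic, word) pairs.
    """
    theta = sample_dirichlet(alpha, rng)  # this document's topic mixture
    words = []
    for _ in range(length):
        # Step 1: choose a topic z according to the document's mixture.
        z = rng.choices(range(len(topics)), weights=theta)[0]
        # Step 2: given the topic, choose a word from that topic's distribution.
        vocab = list(topics[z])
        w = rng.choices(vocab, weights=[topics[z][t] for t in vocab])[0]
        words.append((z, w))
    return theta, words

if __name__ == "__main__":
    toy_topics = [{"smoke": 0.5, "quit": 0.5}, {"flu": 0.5, "shot": 0.5}]
    print(generate_document(toy_topics, [0.5, 0.5], 8, random.Random(0)))
```

Fitting LDA means running this story in reverse: given only the words, estimate the topics and each document's mixture.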
![Page 31: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/31.jpg)
Labeled LDA
• Supervised LDA
• Incorporates a document label
Ramage, D., Hall, D., Nallapati, R., and Manning, C.
Labeled LDA: A Supervised Topic Model for Credit Attribution in Multi-Labeled Corpora
In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 248-256
![Page 32: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/32.jpg)
“Author-LDA”
• Small document challenges for LDA
• One approach:
▫ Combine all of an author’s tweets
Hong, L. and Davison, B.
Empirical Study of Topic Modeling in Twitter.
In Proceedings of the First Workshop on Social Media Analytics, pages 80-88. 2010.
Zhao, W., Jiang, J., Weng, J., He, J., Lim, E., Yan, H., and Li, X.
Comparing Twitter and Traditional Media using Topic Models.
In Proceedings of the 33rd European Conference on Advances in Information Retrieval, pages 338-349. 2011.
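The pooling step described above is simple to sketch: all tweets by one author are concatenated into a single longer document before topic modeling. The `(author, text)` input layout and the function name are assumptions for illustration.

```python
from collections import defaultdict

def pool_by_author(tweets):
    """Merge tweets into one document per author.

    tweets: iterable of (author, text) pairs.
    Returns a dict mapping author -> pooled text, in order of appearance.
    """
    pooled = defaultdict(list)
    for author, text in tweets:
        pooled[author].append(text)
    return {author: " ".join(texts) for author, texts in pooled.items()}
```

The pooled documents can then be passed to any standard LDA implementation in place of the individual tweets.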
![Page 33: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/33.jpg)
Ailment Topic Aspect Model (ATAM)
• Looking for specific health ailments in Twitter
• For each ailment:
▫ General words
▫ Symptoms
▫ Treatments
Paul, M. and Dredze, M.
You are what you Tweet: Analyzing Twitter for Public Health.
In International AAAI Conference on Weblogs and Social Media (ICWSM), 2011.
![Page 34: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/34.jpg)
Identifying Questions in Micro-Text
• Survey (N=624), Questions characterization
▫ Morris, et al.
• I wonder, I’d like to know, etc.
▫ Efron and Winget
• Part of Speech Tagging
▫ Dent and Paul
Dent, K. and Paul, S.
Through the Twitter Glass: Detecting Questions in Micro-text.
In Workshops at the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011.
Efron, M. and Winget, M.
Questions are Content: A Taxonomy of Questions in a Microblogging Environment.
In Proceedings of the American Society for Information Science and Technology, 47(1):1-10, 2010.
Morris, M. R., Teevan, J., and Panovich, K.
What do People Ask their Social Networks, and Why?: A Survey Study of Status Message Q&A Behavior.
In Proceedings of the 28th International Conference on Human Factors in Computing Systems (CHI), pages 1739-1748, 2010.
![Page 35: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/35.jpg)
Questions in Twitter
• Finding questions
▫ Look for “?”
▫ Use Mechanical Turk service
• 1152 Questions
▫ 18% Response rate
Paul, S., Hong, L., and Chi, E.
Is Twitter a Good Place for Asking Questions? A Characterization Study.
In Proceedings of the 5th International Conference on Weblogs and Social Media, pages 578-581, 2011.
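The first filtering step (look for "?") can be sketched as a cheap candidate filter that runs before any human labeling. Stripping URLs first is an assumption added here so that query strings such as `watch?v=...` do not count as questions; the slides do not say how the cited study handled that case.

```python
import re

# Matches http/https links up to the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")

def is_candidate_question(tweet):
    """True if the tweet contains '?' outside of any URL."""
    return "?" in URL_PATTERN.sub("", tweet)

def candidate_questions(tweets):
    """Keep only tweets worth sending on for manual labeling."""
    return [t for t in tweets if is_candidate_question(t)]
```

Tweets that pass this filter are only candidates; rhetorical questions still have to be screened out by human judges such as Mechanical Turk workers.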
![Page 36: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/36.jpg)
Research Area Overview
• Health research in social media
• Data mining
▫ Social network analysis
▫ Collective classification
▫ Text mining
![Page 37: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/37.jpg)
Dissertation Proposal
• Develop and improve computational techniques to better enable public health surveillance in online social media
![Page 38: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/38.jpg)
Public Health Surveillance in Social Media
Observe
Predict
Discover
![Page 39: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/39.jpg)
Social Media Space
Micro-blogs
Video-sharing
Full-length blogs
![Page 40: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/40.jpg)
Mining Communities
• People in their social structures
• Complete graph not feasible
• Direction matters
Observe
Predict
Discover
![Page 41: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/41.jpg)
Community Mining
• “Dense subnetwork within a larger network”
Newman, M. E. J.
Communities, Modules and Large-scale Structure in Networks.
Nature Physics 8:25-31. 2012
![Page 42: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/42.jpg)
Does Direction Really Matter?
![Page 43: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/43.jpg)
Does Direction Really Matter?
![Page 44: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/44.jpg)
Implications of Discovery
![Page 45: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/45.jpg)
Local, Directed Modularity

| | Complete Graph | Local Discovery |
|---|---|---|
| Undirected | Modularity (Newman and Girvan, 2004) | Local Modularity (Clauset, 2005) |
| Directed | Directed Modularity (Leicht and Newman, 2008) | Local, Directed Modularity |

![Page 46: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/46.jpg)
Clauset’s Local Modularity
• Steepness of boundary
• Greedily add nodes
Clauset, A.
Finding Local Community Structure in Networks.
Physical Review E, 72(2):026132, Aug 2005.
![Page 47: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/47.jpg)
Degrees of Freedom
• Expanding new nodes
▫ Which outside nodes are considered?
• Calculation of local modularity
▫ Which edges to outside nodes count?
▫ Which edges to core nodes count?
![Page 48: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/48.jpg)
Conclusions
• Edge direction is important
• Algorithm extension requires assumptions
• Different assumptions lead to different communities
![Page 49: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/49.jpg)
Public Health Surveillance in YouTube
• What are people:
▫ Sharing?
▫ Seeing?
▫ Saying?
• Implications for communication
Observe
Predict
Discover
![Page 50: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/50.jpg)
YouTube Communities
• Users
▫ Friends
▫ Author – Subscribers
▫ Author – Commenters
▫ Co-commenters
• Videos
▫ Similar titles/keywords
▫ YouTube’s “related videos”
▫ Videos commented on by common users
▫ Videos “in-response-to” others
Burton, S., et al.
Public Health Community Mining in YouTube
In Proceedings of the ACM International Health Informatics Symposium, pages 81-90, 2012.
![Page 51: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/51.jpg)
Anti-smoking Communities in YouTube
• “Tobacco Free Florida – Kid Tossing Ball”
Burton, S., et al.
Public Health Community Mining in YouTube
In Proceedings of the ACM International Health Informatics Symposium, pages 81-90, 2012.
http://www.youtube.com/watch?v=Ow-D9gCp-UA
![Page 52: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/52.jpg)
Beam Search
• Quickly diverges to other topics
• Depth 4: as many sex-related videos as tobacco

| Depth | Unique Videos | Smoking-related | Sex-related |
|---|---|---|---|
| 0 | 1 | 1 | 0 |
| 1 | 5 | 4 | 1 |
| 2 | 19 | 9 | 5 |
| 3 | 70 | 18 | 17 |
| 4 | 268 | 41 | 42 |
| Total | 363 | 73 | 65 |

Burton, S., et al.
Public Health Community Mining in YouTube
In Proceedings of the ACM International Health Informatics Symposium, pages 81-90, 2012.
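The slides do not spell out the exact search procedure, so the following is only a sketch of one plausible reading: starting from a seed video, follow the first `width` related videos of every video at the current depth and record the new videos found at each level. The `related` mapping and all names are hypothetical.

```python
def beam_crawl(related, start, width, max_depth):
    """Depth-limited expansion over a related-videos graph.

    related: dict mapping video -> ordered list of related videos.
    Returns a list of levels; levels[d] holds the videos first seen at depth d.
    """
    seen = {start}
    levels = [[start]]
    for _ in range(max_depth):
        next_level = []
        for video in levels[-1]:
            # Follow only the top `width` related videos of each video.
            for candidate in related.get(video, [])[:width]:
                if candidate not in seen:
                    seen.add(candidate)
                    next_level.append(candidate)
        levels.append(next_level)
    return levels
```

Because nothing in the expansion checks topical relevance, each level can drift further from the seed topic, which matches the divergence reported in the table.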
![Page 53: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/53.jpg)
Multiple Sub-Community Expansion (MSCE) Algorithm
1. Given initial start video
2. Build sub-community
a. Add video most increasing local modularity
b. Continue until no increase
3. Choose next start video based on:
a. Links to existing community
b. Keyword matching
4. Repeat 2-3, until sufficient community built
• Videos more related to the topic than Beam Search (70% vs. 20%)
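Step 2 of the algorithm above (greedy growth of one sub-community) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the slide does not define "local modularity", so the sketch assumes the ratio of internal edges to boundary edges, treats the video graph as undirected, and omits the choice of the next start video (step 3). All function names are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (video, video) pairs."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def local_modularity(graph, community):
    """Assumed definition: internal edges divided by edges leaving the community."""
    internal = 0
    external = 0
    for u in community:
        for v in graph[u]:
            if v in community:
                internal += 1  # each internal edge is seen from both endpoints
            else:
                external += 1
    internal //= 2
    if external == 0:
        return float("inf") if internal else 0.0
    return internal / external


def grow_sub_community(graph, start):
    """Add the neighbor that most increases local modularity until none does."""
    community = {start}
    score = local_modularity(graph, community)
    while True:
        frontier = {v for u in community for v in graph[u]} - community
        best, best_score = None, score
        for candidate in sorted(frontier):  # sorted for deterministic ties
            candidate_score = local_modularity(graph, community | {candidate})
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
        if best is None:
            return community
        community.add(best)
        score = best_score
```

On a toy graph of two triangles joined by a single edge, growth from a video in one triangle stops at that triangle, because adding the bridge video would lower the ratio.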
![Page 54: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/54.jpg)
MSCE: Anti-Smoking Video Community
A. “Graphic Australian Anti-Smoking Ad”
▫ 2.5 million views
B. “How to quit smoking”
▫ Bridge between 3 sub-communities
C. Superhero sub-community
D. Superhero bridge videos
▫ “Star Wars Anti Smoking Ad”
▫ “Anti-Smoking : Superman versus Nick O’Teen (1981)”
![Page 55: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/55.jpg)
Sampling on YouTube
• Current work:
▫ Search terms
▫ First N results
▫ YouTube limit of 1,000
• Typical users don’t page through search lists
iProspect.com. iProspect Search Engine User Behavior.
Technical report, iProspect.com, Inc., 2006.
![Page 56: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/56.jpg)
Conclusions and Public Health Implications
| Conclusion | Implication |
|---|---|
| Users leave health topics within a few clicks | One chance to communicate message |
| Influential authors are involved in the community | Simply posting a video is not sufficient |
| Users with affinities to the topic can be found | Surveillance and communication are possible |
| Communities can be used for sampling | Keyword-based approaches can be augmented |
![Page 57: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/57.jpg)
Horizontal Health Communication
Abroms, L. and Lefebvre, R. C.
Obama's Wired Campaign: Lessons for Public Health Communication.
Journal of Health Communication, 14(5):415-423, 2009
1. Dissemination
2. Feedback
![Page 58: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/58.jpg)
Comparison of Communities and Information Dissemination
• What health topics are discussed?
• How do they spread?
Diagram: Observe, Predict, Discover
![Page 59: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/59.jpg)
Public Health Surveillance in the Blogosphere
• Everyone is a publisher
• Link to other blogs
• Establish credibility
Image: http://datamining.typepad.com/gallery/blog-map-gallery.html
![Page 60: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/60.jpg)
Mommy-Blogs
• Mothers are highly influential in health decisions (Daniel 2009)
• Blog communities influence social norms (Wei 2004)
Daniel, K.
The Power of Mom in Communicating Health.
American Journal of Public Health, 99(12):2119, 2009.
Wei, C.
Formation of Norms in a Blog Community.
Into the Blogosphere: Rhetoric, Community, and Culture in Weblogs. 2004.
![Page 61: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/61.jpg)
Health Topics on Mommy-Blogs
• Community of 450 blogs
| Topic | Count | Percent |
|---|---|---|
| Autism | 113 | 0.34 |
| CMV | 1 | 0.00 |
| Down Syndrome | 31 | 0.09 |
| FAS | 2 | 0.01 |
| SIDS | 17 | 0.05 |
| Pregnancy | 1,008 | 3.01 |
| All Entries | 33,527 | 100.00 |
![Page 62: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/62.jpg)
Parallel Mommy-verses
• Build mommy-communities in Twitter and the Blogosphere
• Evaluate differences
▫ Network structure
▫ Health topics frequency
▫ Likelihood of reiterating
Image: http://www.psychedelicjunction.com/2011/04/what-are-parallel-universes.html
![Page 63: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/63.jpg)
Implications for Health Communication
• Know what is being said
• Identify influential users
▫ Popular/respected
▫ Bridge nodes
• How to best get messages “passed along”
![Page 64: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/64.jpg)
Surveillance of Health Advice
• Do people seek health advice?
• Are they receiving answers?
Diagram: Observe, Predict, Discover
![Page 65: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/65.jpg)
Is Health Data Too Private?
• Would you post that online?
• Our hypothesis:
▫ People are asking questions and receiving answers
▫ More social capital = Better leverage for advice
Burton, S. H., Tanner, K. W., and Giraud-Carrier, C. G.
Leveraging Social Networks for Anytime-Anyplace Health Information.
In Submission.
![Page 66: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/66.jpg)
Benefits of Social Media
• No search result list
• Personalization
• Versatility
• Credibility
![Page 67: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/67.jpg)
Our Study
• Platform: Twitter
▫ Public data
• Health topic: Dental advice
▫ Everyone manages dental health
▫ Not too private
▫ Easy vocabulary
![Page 68: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/68.jpg)
Mining Dental Advice – Step 1
• Identify dental tweets
▫ Observe all tweets
▫ Filter by:
Tooth, teeth, dental, dentist, gums, molar, moler, floss, toothache
“Ugh I have the worst tooth ache every…#CantDeal” [sic.]
“I got a massive sweet tooth”
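The keyword filter in Step 1 can be sketched as a single regular expression. The keyword list is taken from the slide; the matching rule (case-insensitive, anchored at the start of a word so that "#toothache" and "tooth ache" both match) is an assumption, as are the names `DENTAL_PATTERN` and `is_dental`.

```python
import re

# Keyword list from the slide, including the common misspelling "moler".
DENTAL_KEYWORDS = [
    "tooth", "teeth", "dental", "dentist", "gums",
    "molar", "moler", "floss", "toothache",
]

# Assumed rule: keyword must begin at a word boundary; case is ignored.
DENTAL_PATTERN = re.compile(
    r"\b(?:" + "|".join(DENTAL_KEYWORDS) + r")", re.IGNORECASE
)


def is_dental(tweet_text):
    """Return True if the tweet contains any dental keyword."""
    return DENTAL_PATTERN.search(tweet_text) is not None
```

Both example tweets above pass this filter, including the "sweet tooth" one, which illustrates why keyword matching alone is not enough and later steps are needed.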
![Page 69: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/69.jpg)
Mining Dental Advice – Step 2
• Identify advice-seeking questions
▫ Look for: “anybody”, “anyone”, “any1” and “?”
▫ Human raters fine-tune
“Can anyone suggest some home remedies for a #toothache?”
“does anyone know how long it takes for swelling on your mouth to go down after getting teeth out?”
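The first-pass filter in Step 2 might look like the sketch below. It assumes the rule is "one of the three asking words AND a question mark"; the slide lists the terms but not how they are combined, and the function name is hypothetical. The human rating step is not modeled.

```python
import re

# Asking words from the slide, matched as whole words, case-insensitively.
ASKING_PATTERN = re.compile(r"\b(?:anybody|anyone|any1)\b", re.IGNORECASE)


def looks_like_advice_request(tweet_text):
    """Candidate advice-seeking tweet: an asking word plus a question mark."""
    return "?" in tweet_text and ASKING_PATTERN.search(tweet_text) is not None
```

Tweets that pass this check are only candidates; the slide's second bullet indicates that human raters decide which ones are genuine requests for advice.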
![Page 70: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/70.jpg)
Mining Dental Advice – Step 3
• Identify answers
▫ Search for: @user-name
▫ Within 48 hours
▫ Verify “in-reply-to” original tweet
“@Dray_Z try gurgling with warmm salt water or put a tea bag btween the ones that hurt” [sic.]
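The three checks in Step 3 can be combined as in the sketch below. Tweets are represented as plain dictionaries with hypothetical keys (`id`, `user`, `text`, `created_at`, `in_reply_to_id`); these are illustrative and are not the field names of any particular Twitter API.

```python
from datetime import datetime, timedelta

REPLY_WINDOW = timedelta(hours=48)  # window from the slide


def find_answers(question, candidates):
    """Return candidate tweets that mention the asker, arrive within
    48 hours of the question, and are marked as replies to it."""
    mention = "@" + question["user"].lower()
    answers = []
    for tweet in candidates:
        delay = tweet["created_at"] - question["created_at"]
        if (
            mention in tweet["text"].lower()
            and timedelta(0) <= delay <= REPLY_WINDOW
            and tweet.get("in_reply_to_id") == question["id"]
        ):
            answers.append(tweet)
    return answers
```

The in-reply-to check matters because a mention alone does not show that the tweet responds to this particular question.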
![Page 71: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/71.jpg)
Results
• 2 weeks of tweets
▫ 1 million dental tweets (74,000 per day)
▫ 2,035 likely advice-seeking (anyone … ?)
▫ 432 genuine advice-seeking
▫ 140 (32%) received at least one response
▫ 5.5 minutes to response (median)
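The funnel above can be checked with a few lines of arithmetic. The counts are from the slide; the derived names and the reading of "2 weeks" as 14 days are assumptions made for illustration.

```python
DAYS = 14                       # "2 weeks", assumed to mean 14 days
DENTAL_PER_DAY = 74_000         # dental tweets per day
LIKELY_ADVICE_SEEKING = 2_035   # passed the "anyone ... ?" filter
GENUINE_ADVICE_SEEKING = 432    # confirmed by human raters
ANSWERED = 140                  # received at least one response


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# About one million dental tweets over the collection period.
dental_total = DAYS * DENTAL_PER_DAY

# Share of filter hits that raters judged to be genuine requests.
filter_precision = percent(GENUINE_ADVICE_SEEKING, LIKELY_ADVICE_SEEKING)

# Share of genuine requests that received at least one response.
response_rate = percent(ANSWERED, GENUINE_ADVICE_SEEKING)
```

Under these assumptions, about a fifth of the filter hits are genuine requests, and the response rate rounds to the 32% reported on the slide.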
![Page 72: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/72.jpg)
Benefits of Social Capital
• More likely to receive a response
• Receive responses faster
![Page 73: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/73.jpg)
Who is Answering?
• Answers come from people you know
| Relationship | Percent |
|---|---|
| No relation | 6.6 |
| Responder following asker | 93.0 |
| Asker following responder | 70.0 |
| Mutual following and follower | 69.5 |
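A tally like the one in the table could be produced as sketched below. The sketch assumes that the categories overlap (a mutual pair also counts in both one-directional rows), which would explain why the percentages add up to more than 100. The `following` map and all names are hypothetical.

```python
from collections import Counter


def relationship_flags(asker, responder, following):
    """Return (responder follows asker, asker follows responder).

    `following` maps each user to the set of accounts that user follows.
    """
    responder_follows_asker = asker in following.get(responder, set())
    asker_follows_responder = responder in following.get(asker, set())
    return responder_follows_asker, asker_follows_responder


def tally_relationships(pairs, following):
    """Count (asker, responder) pairs per category; categories may overlap."""
    counts = Counter()
    for asker, responder in pairs:
        r_to_a, a_to_r = relationship_flags(asker, responder, following)
        if r_to_a:
            counts["Responder following asker"] += 1
        if a_to_r:
            counts["Asker following responder"] += 1
        if r_to_a and a_to_r:
            counts["Mutual"] += 1
        if not (r_to_a or a_to_r):
            counts["No relation"] += 1
    return counts
```

Dividing each count by the number of pairs gives percentages comparable to those in the table.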
![Page 74: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/74.jpg)
Conclusions
• People are seeking dental advice in Twitter
• Answers come frequently and quickly
• Users with more social capital are more likely to receive answers
![Page 75: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/75.jpg)
Predicting Substance Abuse
• Identifying Trends
▫ Content of tweets
▫ Social network
Diagram: Observe, Predict, Discover
![Page 76: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/76.jpg)
Do People Tweet About That?
“So my family knows I smoke weed. The only one that doesn't really care or seem to concern is my pops” [sic.]
“if u dont like that i smoke weed then u dont like me... Weed is BIG part of my laugh. now pass me the blunt” [sic.]
“No wonder I smoke weed. Stupid people stress me out.”
![Page 77: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/77.jpg)
Mining Process
A. Collect Marijuana Users
B. Collect Non-Marijuana Users
C. Build User Profiles
D. Induce Predictive Model
E. Analyze Model
F. Predict Likely Users
Intervention (Future Work)
![Page 78: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/78.jpg)
Collecting Users
• Keyword filters
• Pilot study: “I smoke weed” (50 users)
▫ 36% - Definitely marijuana users
▫ 25% - Explicitly said it, but possibly joking
▫ 19% - At least positive sentiment
▫ 78% - These three combined
• Non-marijuana users
![Page 79: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/79.jpg)
Building User Profiles
• Complete tweet history (up to 3200)
• Follower List
• Following List
• User-supplied description
![Page 80: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/80.jpg)
Feature Extraction
• Author-LDA
▫ “day today good time tonight happy”
▫ “real tho man gotta life twitter yo hit”
• Personal pronouns
▫ “My step-mom…”
▫ Bootstrap training set
• Traits from theoretical models
Hawkins, J., Catalano, R., and Miller, J.
Risk and Protective Factors for Alcohol and Other Drug Problems in Adolescence and Early Adulthood: Implications for Substance Abuse Prevention.
Psychological Bulletin, 112(1):64, 1992.
![Page 81: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/81.jpg)
The Predictive Model
• Comprehensibility
• Collective classification
▫ Predict personal traits
▫ Predict traits of friends
▫ Weighted, directed edges
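One simple form of collective classification over weighted, directed edges is iterative score averaging, sketched below. This is a generic stand-in and not the model proposed here: each unlabeled user's score blends a content-only estimate with the weighted average of the current scores of the accounts they are linked to. The parameter `alpha`, the input formats, and the function name are all assumptions.

```python
def collective_classify(local_scores, edges, known_labels, alpha=0.5, iterations=10):
    """Iteratively blend each user's own score with neighbors' scores.

    local_scores: user -> score in [0, 1] from the user's own features
    edges:        user -> {neighbor: weight}, directed and weighted
    known_labels: user -> 0.0 or 1.0, held fixed during iteration
    """
    users = set(local_scores) | set(known_labels)
    scores = {u: known_labels.get(u, local_scores.get(u, 0.5)) for u in users}
    for _ in range(iterations):
        updated = {}
        for user in users:
            if user in known_labels:
                updated[user] = known_labels[user]
                continue
            local = local_scores[user]
            # Only neighbors with a current score contribute.
            neighbors = {n: w for n, w in edges.get(user, {}).items() if n in scores}
            total_weight = sum(neighbors.values())
            if total_weight == 0:
                updated[user] = local
                continue
            neighbor_avg = sum(w * scores[n] for n, w in neighbors.items()) / total_weight
            updated[user] = (1 - alpha) * local + alpha * neighbor_avg
        scores = updated
    return scores
```

A user whose strongest links point to labeled marijuana users ends up with a higher score than the content-only estimate alone would give, which is the intuition behind predicting from friends' traits.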
![Page 82: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/82.jpg)
Analysis and Validation
• Compare to theory
▫ “Risk and protective factors”
• Subjective validation
• Objective validation of easily-labeled traits
![Page 83: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/83.jpg)
Future Work
• Personalized communication
• Intervention
• Communication with family/friends
![Page 84: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/84.jpg)
Proposed Schedule
| Sec. | Topic | Venue | Target |
|---|---|---|---|
| 2 | Public Health Community Mining in YouTube | ACM International Health Informatics Symposium (IHI) | Published |
| 4 | Leveraging Social Networks for Anytime-Anyplace Health Information | Network Modeling Analysis in Health Informatics and Bioinformatics (NetMAHIB) | In Submission |
| 1 | Local Community Mining in Directed Graphs | Journal of Social Network Analysis and Mining (SNAM) | June 2012 |
| 3 | Mining the Spread of Health Content in Social Media | International Conference on Social Computing, Behavioral-Cultural Modeling, and Prediction (SBP) | August 2012 |
| 5 | Mining Social Media for Trends among Substance Abusers | ACM Transactions on Knowledge Discovery from Data (TKDD) | February 2013 |
![Page 85: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/85.jpg)
Contributions
• Computational techniques
▫ Local, directed community mining
▫ Community mining for sampling
▫ Mining rare and meaningful traits in short text
▫ Combination of text mining and social network analysis for prediction
• Implications for Health Surveillance
▫ YouTube as a source of communities
▫ Health differences across platforms
▫ Health advice in social media
▫ Prediction of high risk individuals
Diagram: Observe, Predict, Discover
![Page 86: Computational Techniques for Public Health …dml.cs.byu.edu/~sburton/presentations/2012-04_26...Topic Modeling •Latent Dirichlet allocation (LDA) User chooses a topic (z) Given](https://reader035.fdocuments.net/reader035/viewer/2022070815/5f0ec4937e708231d440d7b8/html5/thumbnails/86.jpg)
Questions