Mental Disorder Detection on Twitter
Mental Disorder Detection on Twitter: Bipolar Disorder and Borderline
Personality Disorder
National Tsing Hua University, Department of Information System and Application
Advisor: Prof. Yi-Shin Chen
Student: Chun-Hao Chang
Introduction
18.1% of people in the United States suffer from a mental disorder (*)
Using social networks to research mental disorders
(*) National Institute of Mental Health: http://www.nimh.nih.gov/health/statistics/prevalence/index.shtml
Analyze
Challenges
How to efficiently collect the tweets of patients?
How to correctly detect mental disorder patients?
Related Works
Two important related works are as follows:
Quantifying Mental Health Signals in Twitter - Johns Hopkins University (Coppersmith, G., Dredze, M., & Harman, C. (2014))
Automatically collects patients by matching "I was diagnosed with X" in tweets
Predicts 4 kinds of disorder
Predicting Depression via Social Media - Microsoft (M De Choudhury, M Gamon, S Counts, E Horvitz - ICWSM, 2013)
Collects data from Amazon Mechanical Turk and purchased Twitter data
Able to predict that a user has depression before a formal diagnosis
700 million tweets from Oct 2014 to Aug 2015
Only 8 Bipolar and 3 BPD patients were found
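The diagnosis-statement matching described for Coppersmith et al. can be sketched as a regular expression over tweet text. This is a minimal illustration, not their implementation: the pattern, the list of disorder terms and the `(user_id, text)` input shape are assumptions.

```python
import re

# Hypothetical pattern: the slides only quote the phrase "I was diagnosed with X".
DIAGNOSIS_RE = re.compile(
    r"\bI\s+was\s+diagnosed\s+with\s+(bipolar|borderline|bpd)\b",
    re.IGNORECASE,
)

def find_self_reported(tweets):
    """Return {user_id: disorder} for users with a matching tweet.

    `tweets` is an iterable of (user_id, text) pairs (assumed shape).
    """
    found = {}
    for user_id, text in tweets:
        match = DIAGNOSIS_RE.search(text)
        if match and user_id not in found:
            found[user_id] = match.group(1).lower()
    return found

sample = [
    ("u1", "Last year I was diagnosed with bipolar disorder."),
    ("u2", "My friend was diagnosed with BPD."),
    ("u3", "i was diagnosed with BPD in 2013"),
]
reported = find_self_reported(sample)
```

Because the pattern requires a first-person statement, tweets about other people (like `u2`) are skipped, which is one reason such matching yields few users.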
Background
Bipolar Disorder:
*Unstable and impulsive emotions
Cycling between manic and depressive episodes
Borderline Personality Disorder:
*Unstable and impulsive emotions
Impaired social interactions
Framework
Data Collecting
A community portal is a Twitter account that is followed by many patients.
Community portals can be found by searching for the disorder on the Twitter website.
Pipeline (figure):
1. Manually Collect Community Portals
2. Collect Followers (REST API)
3. Keywords Filter
4. Manual Verification
5. Randomly Sample Users (Streaming API & REST API)
6. Collect Tweets (REST API)
Output: Tweets of Patients, Experts and Random Samples
Data Collecting
Download the followers of the community portals
(5000 followers for each portal in this study)
Filter suspected patients from the follower profiles by keywords:
BPD and Bipolar in this study
Manually label the users as patients, experts and non-related.
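The keyword filter over follower profiles could look like the sketch below. The dictionary keys `screen_name` and `description` and the word-boundary matching are assumptions for illustration; the slides only name the two keywords.

```python
import re

# Keywords named in the slides; matching on word boundaries is an assumption.
KEYWORDS = ("bpd", "bipolar")
KEYWORD_RE = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)

def candidate_patients(followers):
    """Keep followers whose profile description mentions a keyword.

    `followers` is a list of dicts with hypothetical keys
    "screen_name" and "description".
    """
    return [
        f for f in followers
        if KEYWORD_RE.search(f.get("description") or "")
    ]

followers = [
    {"screen_name": "a", "description": "Living with BPD. Cat lover."},
    {"screen_name": "b", "description": "Football and coffee"},
    {"screen_name": "c", "description": "Psychiatrist, bipolar researcher"},
    {"screen_name": "d", "description": None},
]
candidates = candidate_patients(followers)
```

The filter keeps both a likely patient (`a`) and a likely expert (`c`), which is why a manual labeling step follows it.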
A BPD patient
A BPD Expert
Data Collecting
Download tweets by REST API
(3200 tweets at most, excluding retweets)
1. Randomly sample English-speaking users by Twitter Streaming API
2. Download tweets by REST API
(3200 tweets at most, excluding retweets)
Data Collected
Group            Users
Random Samples   823
Bipolar          798
BPD              427
Bipolar Experts  54
BPD Experts      42
We assume that these randomly sampled Twitter users and experts do not have Bipolar or BPD.
Because the prevalence of Bipolar is 2.6% and of BPD is 1.6% (*) in the United States, this assumption should not seriously damage the predictive performance.
(*) National Institute of Mental Health: http://www.nimh.nih.gov/health/statistics/prevalence/index.shtml
Preprocessing
Spam and Inactive User Filter
1. Number of tweets > 100
2. Less than 50% of tweets contain a hyperlink
Sentiment 140 API (Positive, Negative, Neutral)
Emotion Classification API
Processed Data of Patients, Experts and Random Samples
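The two rules of the spam and inactive user filter can be expressed as a short predicate. This is a sketch under assumptions: the URL pattern and the function and parameter names are illustrative, and a user's tweets are taken to be a plain list of strings.

```python
import re

# Simple URL detector; the study's actual hyperlink check is not described.
URL_RE = re.compile(r"https?://\S+")

def is_active_non_spam(tweets, min_tweets=100, max_link_ratio=0.5):
    """Return True when the user has more than `min_tweets` tweets and
    fewer than `max_link_ratio` of them contain a hyperlink."""
    if len(tweets) <= min_tweets:
        return False
    with_link = sum(1 for text in tweets if URL_RE.search(text))
    return with_link / len(tweets) < max_link_ratio

active = ["just a thought"] * 150 + ["see http://example.com"] * 30
spammy = ["buy now http://example.com"] * 120 + ["hello"] * 30
inactive = ["hello"] * 40
```

Here `active` passes both rules, `spammy` fails the hyperlink rule and `inactive` fails the tweet-count rule.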
Data after preprocessing
Group             Users  Tweets  Averaged Tweets
Random Samples    548    796957  1454.3
Bipolar Patients  278    347774  1250.99
BPD Patients      203    225774  1112.19
Bipolar Experts   11     14056   1611.67
BPD Experts       9      19696   1790.55
Feature Extraction
TF-IDF Features (TF-IDF Calculation): open-vocabulary approach, calculating unigrams and bigrams
LIWC Features (LIWC Counting): 64 categories of a psychological dictionary
Pattern of Life Features (Polarity Extraction, Emotions Extraction, Age Gender Prediction, Social Behavior Extraction): personal behaviors, covering emotional pattern, social interactions and profile data
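To make the open-vocabulary TF-IDF features concrete, here is a small standard-library sketch over unigrams and bigrams. The weighting (term count divided by document length, times log(N / df)) and whitespace tokenization are assumptions; the slides do not state which TF-IDF variant or tokenizer was used.

```python
import math
from collections import Counter

def ngrams(tokens):
    """Unigrams plus bigrams of a token list."""
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams

def tfidf(documents):
    """Return one {term: weight} dict per document.

    Uses tf = count / document length and idf = log(N / df); the exact
    weighting used in the study is not given in the slides.
    """
    term_lists = [ngrams(doc.lower().split()) for doc in documents]
    # Document frequency: in how many documents each term appears.
    df = Counter(term for terms in term_lists for term in set(terms))
    n_docs = len(documents)
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

docs = ["feel better today", "therapy helps me feel better", "watching football today"]
vectors = tfidf(docs)
```

In practice each "document" would be all tweets of one user, so the resulting vector describes the user rather than a single tweet.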
Feature Extraction: Pattern of Life Features
Proposed by Coppersmith et al. We further improve them as follows:
1. Polarity: positive and negative percentages, positive and negative combo ratios, flip ratio
2. Emotions: percentage of eight emotions
3. Age and Gender: inferred age and gender (*)
4. Social Interactions: mentioning rate, frequent mentioning counts, unique mentioning counts
(*) Schwartz, H. Andrew, et al. "Personality, gender, and age in the language of social media: The open-vocabulary approach." PLoS ONE 8.9 (2013): e73791.
Feature Extraction: Illustration of Combos and Flips
Time intervals between tweets: 3 min, 900 min, 18 min, 15 min, 800 min, 200 min
Flip time threshold: 30 min
Combo time threshold: 120 min
1 flip, 3 negative combos
Flip Ratio = 1 / 7
Negative Combo Ratio = 3 / 7
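The flip and combo ratios can be sketched as follows. The counting rules are an assumption, chosen as one reading that is consistent with the illustrated counts: a flip is a pair of consecutive tweets with opposite polarity posted within the flip threshold, and a combo counts every tweet that belongs to a run of same-polarity tweets whose gaps stay within the combo threshold. The polarity labels in the example are invented, since the transcript keeps only the time intervals.

```python
FLIP_THRESHOLD_MIN = 30
COMBO_THRESHOLD_MIN = 120

def flip_and_combo_ratios(polarities, gaps_min):
    """polarities: "pos"/"neg"/"neu" per tweet, in time order.
    gaps_min: minutes between consecutive tweets (one fewer than tweets).
    Returns (flip_ratio, negative_combo_ratio, positive_combo_ratio)."""
    n = len(polarities)
    if n == 0:
        return 0.0, 0.0, 0.0
    flips = 0
    in_combo = [False] * n
    for i, gap in enumerate(gaps_min):
        a, b = polarities[i], polarities[i + 1]
        # Assumed rule: opposite polarity within the flip threshold.
        if {a, b} == {"pos", "neg"} and gap <= FLIP_THRESHOLD_MIN:
            flips += 1
        # Assumed rule: same non-neutral polarity within the combo threshold.
        if a == b and a != "neu" and gap <= COMBO_THRESHOLD_MIN:
            in_combo[i] = in_combo[i + 1] = True
    neg = sum(1 for p, c in zip(polarities, in_combo) if c and p == "neg")
    pos = sum(1 for p, c in zip(polarities, in_combo) if c and p == "pos")
    return flips / n, neg / n, pos / n

gaps = [3, 900, 18, 15, 800, 200]  # intervals from the illustration
polarities = ["pos", "neg", "neg", "neg", "neg", "neu", "neu"]  # invented labels
flip_ratio, neg_combo_ratio, pos_combo_ratio = flip_and_combo_ratios(polarities, gaps)
```

With these invented labels the sketch gives a flip ratio of 1/7 and a negative combo ratio of 3/7, matching the illustrated values; other label assignments or counting rules could match them as well.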
Classifiers Training and Evaluations
Models: TF-IDF Models, LIWC Models, Pattern of Life Models
Random Forest Classifier Training
Evaluations: 10-Fold Cross Validation Test, Selection Bias Test, Limited Data Test
10-Fold Cross Validation Test
Shows the relationship between precision and recall.
Randomly split the data into 10 chunks, using 9 chunks for training and 1 chunk for testing, and calculate the precision and recall over multiple iterations.
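The split-train-test loop can be sketched with the standard library alone. The threshold classifier below is a toy stand-in for the Random Forest, and pooling the predictions of all folds before computing precision and recall is an assumption; the slides do not say how the per-fold results were combined.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices and split them into k nearly equal chunks."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def cross_validate(features, labels, train_fn, k=10):
    """train_fn(train_X, train_y) must return a predict(x) -> 0/1 callable."""
    y_true, y_pred = [], []
    for test_idx in k_fold_indices(len(labels), k):
        held_out = set(test_idx)
        train_X = [x for i, x in enumerate(features) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        predict = train_fn(train_X, train_y)
        for i in test_idx:
            y_true.append(labels[i])
            y_pred.append(predict(features[i]))
    return precision_recall(y_true, y_pred)

# Toy stand-in for the Random Forest: threshold at the midpoint of class means.
def train_threshold(train_X, train_y):
    pos = [x for x, y in zip(train_X, train_y) if y == 1]
    neg = [x for x, y in zip(train_X, train_y) if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > cut else 0

features = [0.1 * i for i in range(20)] + [5 + 0.1 * i for i in range(20)]
labels = [0] * 20 + [1] * 20
precision, recall = cross_validate(features, labels, train_threshold)
```

A precision-recall curve would additionally need classifier scores rather than hard 0/1 predictions, so that the decision threshold can be varied.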
Evaluations on Bipolar: 10-fold Cross Validation
Area Under the Curve:
Pattern of Life  0.90
LIWC             0.91
TF-IDF           0.96
Evaluations on BPD: 10-fold Cross Validation
Area Under the Curve:
Pattern of Life  0.91
LIWC             0.90
TF-IDF           0.96
Selection Bias Test
To see whether the model is predicting people who have the disorder or people who are just talking about it.
11 Bipolar experts and 9 BPD experts are used as the testing data. The test shows the tendency of the classifiers to misclassify experts as patients.
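The selection bias test reduces to one number per classifier: the share of expert accounts labeled as patients. The sketch below assumes a `predict` callable that returns 1 for "patient"; the feature name `disorder_terms` and the toy classifier are hypothetical and not taken from the study.

```python
def expert_misclassification_rate(predict, expert_features):
    """Fraction of expert accounts that `predict` labels as patients (1)."""
    if not expert_features:
        return 0.0
    flagged = sum(1 for x in expert_features if predict(x) == 1)
    return flagged / len(expert_features)

# Hypothetical classifier that keys on a count of disorder-related words.
def keyword_heavy_predict(features):
    return 1 if features["disorder_terms"] >= 3 else 0

experts = [
    {"disorder_terms": 5},
    {"disorder_terms": 4},
    {"disorder_terms": 1},
    {"disorder_terms": 7},
]
rate = expert_misclassification_rate(keyword_heavy_predict, experts)
```

A high rate suggests the classifier reacts to talk about the disorder rather than to signs of having it.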
Selection Bias Test
Top 10 Keywords from TF-IDF Classifier
Bipolar         BPD
mentalhealth    dbt
meds            feeling
blog            borderline
therapy         helps
anxiety         self harm
thoughts        psychiatrist
feel better     cpn
electroboyusa   disorder
health          bpdchat
bipolarblogger  depression
The TF-IDF classifier has a tendency to detect people who are talking about the disorder.
Limited Data Test
Reveals how precision changes when the tweets are limited.
Similar to 10-fold cross validation, but the testing data are extracted only from the latest K tweets
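Restricting a test user to the latest K tweets can be sketched as below. The `(timestamp, text)` tuple shape is an assumption; features for the test users would then be computed from the truncated timeline only.

```python
def latest_k_tweets(tweets, k):
    """Return the k most recent tweets, newest first.

    `tweets` is a list of (timestamp, text) pairs; timestamps only need to
    be comparable (for example datetime objects or epoch seconds).
    """
    return sorted(tweets, key=lambda t: t[0], reverse=True)[:k]

timeline = [(1, "oldest"), (4, "newest"), (2, "older"), (3, "newer")]
limited = latest_k_tweets(timeline, 2)
```

Repeating the evaluation for several values of K shows how many tweets a model needs before its precision stabilizes.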
Evaluations on Bipolar: Limited Tweets Precision
Evaluations on BPD: Limited Tweets Precision
Conclusion:
How to efficiently collect the tweets of patients?
We proposed an efficient and accessible way to collect tweets of patients
How to correctly detect mental disorder patients?
We suggest that the Pattern of Life model gives high precision and low bias