Toward0 MacroFInsights0 for0 Suicide0 PrevenLon:0 ... · RIT and NSF award SES-1111016. ... video,...

1
Suicide is among the leading causes of death in the United States. Two timely challenges to suicide prevention are that diagnosis is often based on unreliable self-reports and that it is difficult to understand warning signs. We focus on distress levels, and take steps toward its automated detection using linguistic data on Twitter. We use an annotated dataset, supervised machine learning, and topic modeling to model distress, a key risk factor for suicide ideation. The results indicate that the expertise level of annotators is key. ABSTRACT INTRODUCTION METHODS AND RESULTS CONCLUSIONS We showed that methods used to automatically collect data, as well as the expertise level of the annotator affect the performance of automatic systems. In general, our study and its results—from ltering via data annotation to classi cation—con rmed the critical importance of bringing clinical expertise into the computational modeling loop. It is essential to consider the ethical issues regarding the use of publicly available social media data. REFERENCES [1] H.C. Kung, D. L. Hoyert, J. Xu, and S. L. Murphy, “Deaths: nal data for 2005.,” Natl Vital Stat Rep, vol. 56, no. 10, pp. 1–120, 2008. [2] Jared Jashinsky, Scott H Burton, Carl L Hanson, Josh West, Christophe Giraud-Carrier, Michael D Barnes, and Trenton Argyle. 2013. Tracking suicide risk factors through Twitter in the US. Crisis, pages 1–9. [3] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet Allocation,” Journal of Machine Learning Research, vol. 3, pp. 993–1022, Mar. 2003. [4] Corinna Cortes and Vladimir Vapnik. 1995. Support-Vector Networks. Mach. Learn. 20, 3 (September 1995), 273-297. ACKNOWLEDGEMENTS This work was sponsored by a Kodak Endowed Chair grant from the Golisano College of Computing and Information Sciences at RIT and NSF award SES-1111016. ~ 1 million people worldwide die by suicide. It is a leading cause of death among individuals ages 14-25 [1] We take an initial step toward detection of suicidal risk factors through social media without reliance on self- reporting We consider the performance of annotators’ expertise in suicide prevention at annotating micro-blog data for the purpose of training text-based models for detecting distress, a known suicide risk factor Golisano College of Compu/ng and Informa/on Sciences # *; College of Liberal Arts^ Rochester Ins/tute of Technology Department of Psychiatry $ University of Rochester Medical Center Christopher Homan # , Ravdeep Johar*, Tong Liu*, Megan Lytle $ , Vincent Silenzio $ and Cecilla Ovesdo[er Alm^ [email protected] # | rsj7209, [email protected]* | Megan_Lytle, [email protected] $ | [email protected]^ Toward MacroInsights for Suicide PrevenLon: Analyzing FineGrained Distress at Scale The mental state of another individual is hard to discern, even under less noisy conditions. Observers’ understandings of distressmay differ drastically. Due to the overlap between internal and external expressions of anger, it is difficult to classify distress behavior without more contextual information. LIMITATIONS Summary statistics and thematic categories distributions of the collected dataset. Distribution of distress level annotations on the tweets annotated by Novices 1 and 2 (N=250, identical set) Distribution of distress level annotations from Novice 1 and Expert. Note these two datasets are disjoint (N=1000, respectively) Topic analysis on bigrams of tweets labeled as high distress vs. randomly selected tweets from the larger, unlabeled dataset. The high distress tweets clearly convey strong negative affect. MATERIALS Dataset Collected over a month from Twitter in May 2012 Only active users’ tweets were considered Flowchart for steps involved for going from a large micro-blog database to distress modeling Source tweets Number of tweets 2,535,706 Unique geo-active users 6,237 “Follows” relationships 102,739 “Friends” relationships 31,874 Filtered tweets Number of tweets 2,000 Unique users 1,467 Unique unigrams 1,714,167 Unique bigrams 9,246,715 Unique trigrams 13,061,142 Category distribution [2] LIWC sad 1,370 Depressive feeling 283 Suicide ideation 123 Depression symptoms 72 Self harm 67 Family violence/discord 47 Bullying 10 Gun ownership 10 Drug abuse 6 Impulsivity 6 Prior suicide attempts 2 Suicide around individual 2 Psychological disorders 2 0.5 0.6 0.4 Combinations Suicide risk factors keywords LIWC sad scores Cohen Kappa inter-annotator agreement between Novice 1 and 2 979 Date: XXXX -3: Dat man on maury is overreacting!! He juss doin dat cuz he on tv [-0:24:39] -2: @XXXX cedes!!! [-0:21:25] -1: Yesssss! Da weatherman was wronq no rainy ass prom days!! Yesss prom is 2day guys!!! Class of 2010! [-0:02:56] >>> @XXXX awwww thanks trae-trae <<< 1: Rt @XXXX: abt 2 hop in a kab to skool I wouldn’t dare spend over 2 dollars to get somewhere I dnt wanna be n da first place! [+0:00:57] 2: @XXXX yeaa [+0:03:59] 3: @XXXX wassup? [+0:05:28] Msg_id : XX [Distress: ND, LIWC Sad: No] Training Testing Precision Recall F-Measure Novice1 Novice1 0.53 0.63 0.58 Novice1 Expert 0.58 0.27 0.37 Expert Expert 0.59 0.71 0.64 Expert Novice1 0.34 0.85 0.48 N1+E N1+E 0.33 0.41 0.37 High Distress Random feel like, wanna cry, get hurt, miss 2, ima miss, win lose, tired everything, broke bitches, gun range, one person good morning, last night, happy birthday, look like, bout 2, can’t wait, video, know (cont), chris brown, jus got commit suicide, miss you!, miss baby, feel empty, committing suicide, tired living, sleep forever, lost phone, left alone, :( miss feel like, let know, make sure, bout go, time get, don’t get, wats good, . ., don’t want, jus saw hate job, feel sad, tummy hurts, lost friend, feel helpless, leave alone, don’t wanna, worst feeling, leave world, don’t let don’t know, let’s go, looks like, what’s good, go sleep, even tho, hell yea, new single, r u?, don’t wanna Performance of SVM-based classification. We combine LD and HD into a single class with respect to binary (distress vs. non-distress) classification. In each case, a held- out set of 100 randomly selected tweets compose the test set and the remaining 900 tweets from that annotator compose the training set. The last row shows when the two training sets (respectively, test sets) are combined into a single set of 1,800 (respectively, 200) tweets. Many of the risk factors for suicidal behavior may be linked to other expression of distress, such as aggression and interpersonal violence. Consistent with the stress diathesis model for suicidal behavior, aggression was an emerging theme that arose from the data. For example: DISCUSSION @XXXX I don’t feel sad 4 him. He gets pissed n says wat he wants then sends out fony apologies. @XXXX cuz he’s n a relationship with that horseface bitch & he lied 2 me & I feel so used & worthless now Tweets were annotated for four categories (H—happy, ND—no distress, LD—low distress, HD—high distress) Tweets do not only reveal internalized aggression (related to suicidal behavior) but also externalized aggression.

Transcript of Toward0 MacroFInsights0 for0 Suicide0 PrevenLon:0 ... · RIT and NSF award SES-1111016. ... video,...

Page 1: Toward0 MacroFInsights0 for0 Suicide0 PrevenLon:0 ... · RIT and NSF award SES-1111016. ... video, know (cont), ... • @XXXX cuz he’s n a relationship with that horseface bitch

QU ICK START ( con t . )

How to change the template color theme You can easily change the color theme of your poster by going to the DESIGN menu, click on COLORS, and choose the color theme of your choice. You can also create your own color theme. You can also manually change the color of your background by going to VIEW > SLIDE MASTER. After you finish working on the master be sure to go to VIEW > NORMAL to continue working on your poster.

How to add Text The template comes with a number of pre-formatted placeholders for headers and text blocks. You can add more blocks by copying and pasting the existing ones or by adding a text box from the HOME menu.

Text size

Adjust the size of your text based on how much content you have to present. The default template text offers a good starting point. Follow the conference requirements.

How to add Tables To add a table from scratch go to the INSERT menu and click on TABLE. A drop-down box will help you select rows and columns.

You can also copy and a paste a table from Word or another PowerPoint document. A pasted table may need to be re-formatted by RIGHT-CLICK > FORMAT SHAPE, TEXT BOX, Margins.

Graphs / Charts You can simply copy and paste charts and graphs from Excel or Word. Some reformatting may be required depending on how the original document has been created.

How to change the column configuration RIGHT-CLICK on the poster background and select LAYOUT to see the column options available for this template. The poster columns can also be customized on the Master. VIEW > MASTER.

How to remove the info bars

If you are working in PowerPoint for Windows and have finished your poster, save as PDF and the bars will not be included. You can also delete them by going to VIEW > MASTER. On the Mac adjust the Page-Setup to match the Page-Setup in PowerPoint before you create a PDF. You can also delete them from the Slide Master.

Save your work Save your template as a PowerPoint document. For printing, save as PowerPoint of “Print-quality” PDF.

Print your poster When you are ready to have your poster printed go online to PosterPresentations.com and click on the “Order Your Poster” button. Choose the poster type the best suits your needs and submit your order. If you submit a PowerPoint document you will be receiving a PDF proof for your approval prior to printing. If your order is placed and paid for before noon, Pacific, Monday through Friday, your order will ship out that same day. Next day, Second day, Third day, and Free Ground services are offered. Go to PosterPresentations.com for more information.

Student discounts are available on our Facebook page. Go to PosterPresentations.com and click on the FB icon.

©  2013  PosterPresenta/ons.com          2117  Fourth  Street  ,  Unit  C                            Berkeley  CA  94710          [email protected]  

(—THIS SIDEBAR DOES NOT PRINT—) DES I G N G U I DE

This PowerPoint 2007 template produces a 36”x48” presentation poster. You can use it to create your research poster and save valuable time placing titles, subtitles, text, and graphics. We provide a series of online tutorials that will guide you through the poster design process and answer your poster production questions. To view our template tutorials, go online to PosterPresentations.com and click on HELP DESK. When you are ready to print your poster, go online to PosterPresentations.com Need assistance? Call us at 1.510.649.3001

QU ICK START

Zoom in and out As you work on your poster zoom in and out to the level that is more comfortable to you.

Go to VIEW > ZOOM.

Title, Authors, and Affiliations Start designing your poster by adding the title, the names of the authors, and the affiliated institutions. You can type or paste text into the provided boxes. The template will automatically adjust the size of your text to fit the title box. You can manually override this feature and change the size of your text. TIP: The font size of your title should be bigger than your name(s) and institution name(s).

Adding Logos / Seals Most often, logos are added on each side of the title. You can insert a logo by dragging and dropping it from your desktop, copy and paste or by going to INSERT > PICTURES. Logos taken from web sites are likely to be low quality when printed. Zoom it at 100% to see what the logo will look like on the final poster and make any necessary adjustments. TIP: See if your school’s logo is available on our free poster templates page.

Photographs / Graphics You can add images by dragging and dropping from your desktop, copy and paste, or by going to INSERT > PICTURES. Resize images proportionally by holding down the SHIFT key and dragging one of the corner handles. For a professional-looking poster, do not distort your images by enlarging them disproportionally.

Image Quality Check Zoom in and look at your images at 100% magnification. If they look good they will print well.

ORIGINAL   DISTORTED  

Corner  handles  

Good

 prin

/ng  qu

ality

 

Bad  prin/n

g  qu

ality

 

Suicide is among the leading causes of death in the United States. Two timely challenges to suicide prevention are that diagnosis is often based on unreliable self-reports and that it is difficult to understand warning signs. We focus on distress levels, and take steps toward its automated detection using linguistic data on Twitter. We use an annotated dataset, supervised machine learning, and topic modeling to model distress, a key risk factor for suicide ideation. The results indicate that the expertise level of annotators is key.

ABSTRACT  

INTRODUCTION  

METHODS  AND  RESULTS  

CONCLUSIONS  

We showed that methods used to automatically collect data, as well as the expertise level of the annotator affect the performance of automatic systems. In general, our study and its results—from filtering via data annotation to classification—confirmed the critical importance of bringing clinical expertise into the computational modeling loop. It is essential to consider the ethical issues regarding the use of publicly available social media data.

REFERENCES  [1] H.C. Kung, D. L. Hoyert, J. Xu, and S. L. Murphy, “Deaths: final data for 2005.,” Natl Vital Stat Rep, vol. 56, no. 10, pp. 1–120, 2008. [2] Jared Jashinsky, Scott H Burton, Carl L Hanson, Josh West, Christophe Giraud-Carrier, Michael D Barnes, and Trenton Argyle. 2013. Tracking suicide risk factors through Twitter in the US. Crisis, pages 1–9. [3] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet Allocation,” Journal of Machine Learning Research, vol. 3, pp. 993–1022, Mar. 2003. [4] Corinna Cortes and Vladimir Vapnik. 1995. Support-Vector Networks. Mach. Learn. 20, 3 (September 1995), 273-297.

ACKNOWLEDGEMENTS  This work was sponsored by a Kodak Endowed Chair grant from the Golisano College of Computing and Information Sciences at RIT and NSF award SES-1111016.

•  ~ 1 million people worldwide die by suicide. It is a leading cause of death among individuals ages 14-25 [1]

•  We take an initial step toward detection of suicidal risk factors through social media without reliance on self-reporting

•  We consider the performance of annotators’ expertise in suicide prevention at annotating micro-blog data for the purpose of training text-based models for detecting distress, a known suicide risk factor

Golisano  College  of  Compu/ng  and  Informa/on  Sciences#*;  College  of  Liberal  Arts^  

Rochester  Ins/tute  of  Technology  Department  of  Psychiatry$  

University  of  Rochester  Medical  Center    

Christopher  Homan#,  Ravdeep  Johar*,  Tong  Liu*,  Megan  Lytle$,  Vincent  Silenzio$  and  Cecilla  Ovesdo[er  Alm^  [email protected]#  |  rsj7209,  [email protected]*  |  Megan_Lytle,  [email protected]$  |  [email protected]^  

Toward   Macro-­‐Insights   for   Suicide   PrevenLon:   Analyzing   Fine-­‐Grained   Distress   at   Scale

•  The mental state of another individual is hard to discern, even under less noisy conditions. Observers’ understandings of “distress” may differ drastically. •  Due to the overlap between internal and external expressions of anger, it is difficult to classify distress behavior without more contextual information.

LIMITATIONS  

Summary statistics and thematic categories distributions of the collected dataset.

Distribution of distress level annotations on the tweets annotated by Novices 1 and 2 (N=250,

identical set)

Distribution of distress level annotations from Novice 1 and Expert. Note these two datasets are disjoint (N=1000, respectively)

Topic analysis on bigrams of tweets labeled as high distress vs. randomly selected tweets from the larger, unlabeled dataset. The high distress tweets clearly convey strong negative affect.

MATERIALS  Dataset •  Collected over a month from Twitter in May 2012 •  Only active users’ tweets were considered

Flowchart for steps involved for going from a large micro-blog database to distress modeling

Source tweets

Number of tweets 2,535,706

Unique geo-active users 6,237

“Follows” relationships 102,739

“Friends” relationships 31,874

Filtered tweets

Number of tweets 2,000

Unique users 1,467

Unique unigrams 1,714,167

Unique bigrams 9,246,715

Unique trigrams 13,061,142

Category distribution

[2]

LIWC sad 1,370

Depressive feeling 283

Suicide ideation 123

Depression symptoms 72

Self harm 67

Family violence/discord 47

Bullying 10

Gun ownership 10

Drug abuse 6

Impulsivity 6

Prior suicide attempts 2

Suicide around individual 2

Psychological disorders 2

0.5  

0.6  

0.4  

Combinations

Suicide risk factors keywords

LIWC sad scores

Cohen Kappa inter-annotator agreement between Novice 1 and 2

979 Date: XXXX

-3: Dat man on maury is overreacting!! He juss doin dat cuz he on tv [-0:24:39]

-2: @XXXX cedes!!! [-0:21:25]

-1: Yesssss! Da weatherman was wronq no rainy ass prom days!! Yesss prom is 2day guys!!! Class of 2010!

[-0:02:56]

>>> @XXXX awwww thanks trae-trae <<<

1:

Rt @XXXX: abt 2 hop in a kab to skool I wouldn’t dare spend over 2 dollars to get somewhere I dnt wanna be n da first place!

[+0:00:57]

2: @XXXX yeaa [+0:03:59] 3: @XXXX wassup? [+0:05:28]

Msg_id: XX [Distress: ND, LIWC Sad: No]

Training Testing Precision Recall F-Measure

Novice1 Novice1 0.53 0.63 0.58

Novice1 Expert 0.58 0.27 0.37

Expert Expert 0.59 0.71 0.64

Expert Novice1 0.34 0.85 0.48

N1+E N1+E 0.33 0.41 0.37

High Distress Random feel like, wanna cry, get hurt, miss 2, ima miss, win lose, tired

everything, broke bitches, gun range, one person good morning, last night, happy birthday, look like, bout 2, can’t wait,

video, know (cont), chris brown, jus got

commit suicide, miss you!, miss baby, feel empty, committing suicide, tired living, sleep forever, lost phone, left alone, :( miss

feel like, let know, make sure, bout go, time get, don’t get, wats good, . ., don’t want, jus saw

hate job, feel sad, tummy hurts, lost friend, feel helpless, leave alone, don’t wanna, worst feeling, leave world, don’t let

don’t know, let’s go, looks like, what’s good, go sleep, even tho, hell yea, new single, r u?, don’t wanna

Performance of SVM-based classification. We combine LD and HD into a single class with respect to binary (distress vs. non-distress) classification. In each case, a held-out set of 100 randomly selected tweets compose the test set and the remaining 900 tweets from that annotator compose the training set. The last row shows when the two training sets (respectively, test sets) are combined into a single set of 1,800 (respectively, 200) tweets.

Many of the risk factors for suicidal behavior may be linked to other expression of distress, such as aggression and interpersonal violence. Consistent with the stress diathesis model for suicidal behavior, aggression was an emerging theme that arose from the data. For example:

DISCUSSION  

•  @XXXX I don’t feel sad 4 him. He gets pissed n says wat he wants then sends out fony apologies.

•  @XXXX cuz he’s n a relationship with that horseface bitch & he lied 2 me & I feel so used & worthless now

Tweets were annotated for four categories (H—happy, ND—no distress, LD—low distress, HD—high distress)

Tweets do not only reveal internalized aggression (related to suicidal behavior) but also externalized aggression.