Explicitness and implicitness of discourse relations across social media

Tatjana Scheffler

University of Potsdam

January 18, 2020

Do explicitness and implicitness of discourse relations differentiate between types of social media?

Figure: Koch/Österreicher (1985) model with the dimensions Close vs. Distant and Written vs. Spoken, in which Blogs and Tweets are located.

Structure of this presentation

• Coherence relations in spoken / written language

• Coherence relation marking in social media

• Dataset

• First results: qualitative sample

• (First results: quantitative)

Background

• Expression of discourse coherence differs between spoken and written media

• Some annotation efforts: Tonelli et al. (2010), Rehbein et al. (2016), Zeyrek et al. (2018)

• Discourse parsing in spoken domain: Riccardi et al. (2016)

• Conceptual work: Crible/Cuenca (2017), Zeyrek et al. (2018)

Coherence relations in speech vs. writing

Speech | Writing | Source
Fewer overall relations | More coherence relations | TRPJ2010
Explicit 2:1 Implicit | Explicit 1:1 Implicit | TRPJ2010, RSD2016
Truncated relation structures | Full structures (both args present) | CC2017
Connectives: far scope, vague, multifunctional | | CC2017
Temporal and Causal relations, Epistemic cause | (Many EntRels) | TRPJ2010, RSD2016
New functions: Repetition, Hypophora (Q-A) | | TRPJ2010, ZMK2018

Research questions

• Do blogs exhibit more or less explicit discourse relations than tweets?

• Which types of connectives and relations vary across the two media?

• Are individual author choices relevant for explicitation or implicitation of discourse relations?

Coherence relations in social media

• Scheffler/Stede 2016: Argumentative relations in PCC news text vs. Twitter

• Scheffler/Aktas/Das/Stede 2019: Annotating shallow discourse relations in Twitter conversations

Scheffler/Stede 2016

• identify a subset of argumentative structures by text segments using two types of “pragmatic” rhetorical relations:

• adversative relations and causal relations

• compare the linguistic signalling of these relations in two types of German corpora

• PCC newspaper editorials

• political Twitter conversations

• feature set: connectives, negation, 1st person, (modals)

• RST annotations

Complexity

• newspaper text segments much longer (18.9 vs. 7.4 words)

• newspaper segments also somewhat more complex

• number of verbs:

Connectives

• adversative relations more often marked than causal

• nucleus in causal relations almost never marked in Twitter (predominance of weil ‘because’ is similar to spoken language)

Types of connectives

• paratactic connectives in adversative relations on Twitter

• causal connectives (denn vs. weil) reflect spoken/written continuum

Twitter PDTB annotation

• Explicit discourse relations occur frequently in English Twitter data. Out of 1756 tweets, over 40% contain at least one tweet-internal explicit connective.

• Different relation distribution from PDTB (more Contingency)

• Dominance of few common connectives (and, but)

• Many fragments or incomplete utterances

• Connectives used as discourse markers (e.g., and)

This work: Data

• Tweets and blog posts from the same authors

• Twitter list: “Elternbloggerkarte” (parenting theme)

• Identified blog associated with the Twitter account from the bio

• Extracted Twitter timeline (~ last 4-5 months of tweets) through API and last 5-10 blog posts via RSS feeds

• Excluded users w/ < 1000 tokens in tweets or blogs*

=> 62 users, 580 blog posts, >120,000 tweets

Blogs/tweets corpus

Measure | Blog posts | Tweets | PCC (news)
users | 62 | 62 |
items | 580 | 120,728 |
tokens | 463,743 | 1,892,146 |
type/token ratio (avg.) | 0.28 | 0.22 | 0.54
word length (chars.) | 4.68 | 4.85 | 6.36
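The two lexical measures in the corpus table can be illustrated with a minimal sketch. The tokenizer here is a simple regular expression, and reading "avg." as the mean of per-user ratios is an assumption; neither is confirmed as the procedure used for the figures above. The function names are hypothetical. Type/token ratio also depends on text length, so values from samples of different sizes should be compared with care.

```python
import re

def tokenize(text):
    """Lowercase the text and extract runs of word characters (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())

def type_token_ratio(tokens):
    """Number of distinct word forms divided by the number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def mean_word_length(tokens):
    """Average token length in characters."""
    return sum(len(t) for t in tokens) / len(tokens) if tokens else 0.0

def average_ttr(texts_by_user):
    """Mean of per-user type/token ratios (an assumed reading of "avg.")."""
    ratios = [type_token_ratio(tokenize(text)) for text in texts_by_user.values()]
    return sum(ratios) / len(ratios) if ratios else 0.0

# Example tweet (2) from this deck, used only as toy input.
sample = "Ich will Steuer machen und die Katze will auf meinem Schreibtisch gekrault werden."
tokens = tokenize(sample)
print(len(tokens), type_token_ratio(tokens), mean_word_length(tokens))
```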

Overall connective frequency

Which is it?

• Connectives are more frequent in blogs:

1. Discourse relations are generally more frequent (per sentence/token).

2. Discourse relations are more frequently made explicit.
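The two explanations are hard to separate from raw counts, because connectives per token is roughly (relations per token) × (share of relations that are explicit). A minimal sketch of the raw count, normalised per 1,000 tokens and per sentence, is given below. The small lexicon, the sentence splitter, and the function name are assumptions for illustration. Plain string matching also counts non-connective uses of these words, so it gives only an upper bound until connectives are disambiguated.

```python
import re

# Tiny illustrative lexicon taken from connectives mentioned in this deck;
# a real study would use a full connective lexicon plus disambiguation.
CONNECTIVES = {"und", "aber", "wenn", "weil", "dann", "denn"}

def connective_rates(text):
    """Count lexicon matches and normalise by tokens and by sentences."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = re.findall(r"\w+", text.lower())
    hits = sum(1 for t in tokens if t in CONNECTIVES)
    return {
        "hits": hits,
        "per_1000_tokens": 1000 * hits / len(tokens) if tokens else 0.0,
        "per_sentence": hits / len(sentences) if sentences else 0.0,
    }

# Example tweet (2) from this deck, used only as toy input.
tweet = ("Ich will Steuer machen und die Katze will auf meinem "
         "Schreibtisch gekrault werden. Tja.")
rates = connective_rates(tweet)
print(rates)
```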

Individual connective frequency

• One author with typical distribution (#11)

• One author with similar frequency in tweets/blogs (#63)

#63 - Example tweets

(1) Ich so: Ich muss jetzt die Steuer machen! Konzentration! Wo sind die Kekse? Muss mir erst Tee kochen. Oh, die Katze will gekrault werden!
Me: I have to do the taxes! Concentration! Where are the cookies? Must make tea first. Oh, the cat wants to be petted!

(2) Ich will Steuer machen und die Katze will auf meinem Schreibtisch gekrault werden. Tja.
I want to do the taxes and the cat wants to be petted on my desk. Well.

#63 - Implicit

• Tweets: Count of relations depends on whether only intra-speaker implicit relations are allowed (see RSD 2016)

(4) @USER mein Vater hat das auf FB geteilt 🙄
@USER my father shared that on FB 🙄

(5) @USER aber ich liebe Drews Haare 😍
@USER but I love Drew’s hair 😍

#63 - Data

Type | Tweets | Blogs
Explicit | 46 | 29
Implicit | 23 / 82 | 37
NoRel | 81 / 22 | 16
Tweets/Sentences | 150 | 82

#63 - NoRel

• Tweets: First tweet of a thread

• (also discounted tweets of only links, English tweets, and retweets)

• Blogs: Title, first sentence, parentheticals

(3) … (hab’ ich was verpasst?) …
… (did I miss something?) …

#63 - Implicit

• Tweets: Many answers to previous tweets, including Hypophora

(4) @USER mein Vater hat das auf FB geteilt 🙄
@USER my father shared that on FB 🙄

(5) @USER aber ich liebe Drews Haare 😍
@USER but I love Drew’s hair 😍

• Blogs: Chopped, narrative style

(6) Im Flugzeug, ich sitze am Gang. eine ältere Dame auf der anderen Seite.
On the airplane, I’m sitting in the aisle seat. an older lady across the aisle.

#63 - Explicit

Tweets | Blogs
23 und / and | 14 und / and
10 aber / but | 7 aber / but
8 wenn / if | 2 weil / because
6 weil / because, dann / then | außer, dann, denn, nachdem, nämlich, wegen / except, then, since, after, therefore, due to
2 dabei / while, um…zu / to |
als, damit, danach, deshalb, doch, nachdem, ohne…zu / when, in order to, after, therefore, however, after, without |

#63 - Explicit

• Tweets: Connective clusters:

(7) Der Mann sagt,er will erst ab April tapezieren, weil man dabei nicht lüftet und dann danach keine kalte/feuchte Luft rein soll. Helft mir :(
My husband says he doesn’t want to wallpaper till April, because one shouldn’t air the room, and then afterwards shouldn’t let cold/damp air in. Help me :(

Question: What is the argument of ‘danach’ (afterwards)?

• Blogs: Narrative ‘und’, ‘aber’

#11 - Data

Type | Tweets | Blogs
Explicit | 32 | 48
Implicit | 39 / 93 | 44
NoRel | 79 / 25 | 8
Tweets/Sentences | 150 | 100
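The counts for authors #63 and #11 can be turned into proportions to make the comparison easier to read. The sketch below assumes that in the "a / b" cells the first number counts only intra-speaker implicit relations and the second also allows inter-speaker ones; the `Counts` class and the dictionary labels are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Counts:
    explicit: int
    implicit: int
    norel: int

    @property
    def total(self):
        """All annotated units (tweets or sentences)."""
        return self.explicit + self.implicit + self.norel

    @property
    def relations(self):
        """Units that carry any discourse relation."""
        return self.explicit + self.implicit

    @property
    def explicit_share(self):
        """Share of relations that are explicitly marked."""
        return self.explicit / self.relations

# Counts copied from the two data tables; the intra/inter split is the
# assumed reading of the "a / b" cells.
DATA = {
    "#63 tweets (intra-speaker only)": Counts(46, 23, 81),
    "#63 tweets (inter-speaker allowed)": Counts(46, 82, 22),
    "#63 blogs": Counts(29, 37, 16),
    "#11 tweets (intra-speaker only)": Counts(32, 39, 79),
    "#11 tweets (inter-speaker allowed)": Counts(32, 93, 25),
    "#11 blogs": Counts(48, 44, 8),
}

for label, c in DATA.items():
    print(f"{label:36s} relations/unit={c.relations / c.total:.2f} "
          f"explicit share={c.explicit_share:.2f}")
```

Under this reading, the number of relations per tweet and the explicit share both change substantially depending on whether inter-speaker implicit relations are counted, while the blog figures are unaffected.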

#11 - Implicit

• About as many implicit relations as explicit relations in the tweets -> Overall fewer (intra-speaker) relations

• Implicit relations mostly narrative fragments (as before):

(8) @USER Das Büro ist nicht betretbar. Alles Ordner aus den Schränken, Akten zerrissen. Geld noch da.
@USER The office is inaccessible. All the folders off the shelves, files torn. Money still there.

• Many short replies

#11 - Explicit

Tweets | Blogs
11 aber / but | 23 und / and
9 und / and | 5 als / when
8 wenn / if | 4 doch / however
2 dann / then | 3 um…zu / in order to
ansonsten, da, nachdem, oder, weil, zumal / otherwise, since, after, or, because, since | 2 aber, dann, denn, wenn / but, then, since, if
 | also, auch, da, damit, dennoch, entweder…oder, nachdem, obwohl, … / so, also, since, to, however, either…or, after, although, …

#11 - Explicit

• In explicit relations, one argument is often missing (see Crible/Cuenca 2017 for spoken language):

(9) @USER na wenn sonst nichts los ist 😂
@USER well if there’s nothing else happening 😂

Summary Qualitative Analysis

• More explicit discourse connectives in blogs than in tweets (per sentence and per token)

• In blogs, about the same number of explicit and implicit relations (similar to other written text!)

• For tweets, all depends on one’s definition of a discourse relation:

• If only intra-speaker implicit relations are considered, then there are fewer overall discourse relations

• If inter-speaker implicit relations are allowed, then there are many more (implicit) discourse relations in tweets than blogs

This contradicts previous research on speech

Connective counts

Concession

• Concession cannot be expressed implicitly

• Concessive connectives that occur on average more than 5 times in the data

Causal conjunctions

Causal connectives

• Causal connectives are frequent in all media

• Conceptually oral/informal style of justification on Twitter

• (Scheffler, 2014)

Causal connectives

• Fewer epistemic and speech-act level causes in Twitter than in spoken language:

• (Scheffler, Schlüter, Stede, 2016; Volodina, 2010)

Quantitative analysis

• Ongoing

• For explicits only (for now): disambiguate connectives

• Quantify both individual variation and cross-media-effects
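One simple way to separate the two quantities is to compare each author with themselves across media: the mean of the per-author differences estimates the cross-media effect, and the spread of those differences reflects individual variation. The sketch below only illustrates that idea and is not the analysis planned here; the author labels and rates are hypothetical toy values, not results from this study.

```python
from statistics import mean, stdev

# HYPOTHETICAL connectives-per-1000-tokens rates, invented for illustration.
rates = {
    "author_a": {"blogs": 52.0, "tweets": 40.0},
    "author_b": {"blogs": 45.0, "tweets": 44.0},
    "author_c": {"blogs": 60.0, "tweets": 41.0},
}

def paired_differences(rates_by_author):
    """Blog rate minus tweet rate for each author (same author, two media)."""
    return {a: r["blogs"] - r["tweets"] for a, r in rates_by_author.items()}

diffs = paired_differences(rates)
cross_media_effect = mean(diffs.values())     # average shift from tweets to blogs
individual_variation = stdev(diffs.values())  # how much authors differ in that shift

print(diffs)
print(cross_media_effect, individual_variation)
```

Because the same authors contribute to both media, the paired design keeps topic and author style roughly constant, which is the motivation for collecting tweets and blog posts from the same people.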

Individual variation

• Individual authors show preferences for (certain) connectives

• This reflects blog/tweet style