Gender and language (linguistics, social network theory, Twitter!)

Gender, language and Twitter: Social theory and computational methods Tyler Schnoebelen (including work with David Bamman and Jacob Eisenstein) Tweet this talk! @Tschnoebelen

description

This talk examines the relationship between gender, linguistic style, and social networks, using a novel corpus of over 14,000 Twitter users. Prior quantitative work on gender often treats it as a female/male binary, but that's problematic at a theoretical level and descriptively inadequate. By clustering Twitter users by the words they use, we find a natural decomposition of the dataset into various styles and topical interests. Many of these clusters end up having strong gender orientations, but they offer a more accurate reflection of the multifaceted nature of gendered language styles. Previous corpus-based work has also had little to say about individuals whose linguistic styles defy population-level gender patterns. To identify such individuals, we train a statistical classifier and measure the classifier's confidence for each individual in the dataset. Examining individuals whose language does not match the classifier's model for their gender, we find that their social networks include significantly fewer same-gender connections, and that, in general, social network homophily is correlated with the use of same-gender language markers. I hope to persuade you that the combination of computational methods and social theory offers new perspectives on how gender emerges as individuals position themselves relative to audiences, topics, and mainstream gender norms.

Transcript of Gender and language (linguistics, social network theory, Twitter!)

Page 1: Gender and language (linguistics, social network theory, Twitter!)

Gender, language and Twitter: Social theory and computational methods

Tyler Schnoebelen (including work with David Bamman and Jacob Eisenstein)

Tweet this talk! @Tschnoebelen

Page 2: Gender and language (linguistics, social network theory, Twitter!)

Welcome to the slide-u-ment

• Hi, you may want to check out the “Notes” fields for additional context.

Page 3: Gender and language (linguistics, social network theory, Twitter!)

At its most basic

Page 4: Gender and language (linguistics, social network theory, Twitter!)

At its most basic

• Assumption 1: Men and women use different vocabularies
– Hypothesis I: Computational methods can cut through noise and predict speaker gender based on the words they use

• Assumption 2: Social networks are typically “homophilous” (birds of a feather flock together)
– Hypothesis II: Adding the gender make-up of a user’s social network should get even better prediction

Page 5: Gender and language (linguistics, social network theory, Twitter!)
Page 6: Gender and language (linguistics, social network theory, Twitter!)

Let’s say we can predict gender

• So what?

• Does it license us to connect words/word groups to the social category in question?

• This assumes that gender is
– Stable
– The primary driving force

Page 7: Gender and language (linguistics, social network theory, Twitter!)

Our actual goal

• Problematize gender prediction as a task
– Define a system where we could just “stop” and call it good
– But NOT ACTUALLY STOP

• Demonstrate that simple gender binaries aren’t actually descriptively accurate

• Show ways to combine social theory and computational methods that expand the questions on both sides

Page 8: Gender and language (linguistics, social network theory, Twitter!)

QUICK LITERATURE REVIEW

Page 9: Gender and language (linguistics, social network theory, Twitter!)

“Standard” is a keyword

Page 10: Gender and language (linguistics, social network theory, Twitter!)

Typical findings

• Women use standard variables more often than men.
– In fact, early dialectologists ignored women completely because they wanted “NORMS”: non-mobile, older, rural male speakers, seen as preserving the purest regional (non-standard) forms
  • See Chambers and Trudgill (1980).
– Did they do it for prestige (to acquire social capital)?
– To avoid losing status?
– Are women actually creating norms, not following them?

Page 11: Gender and language (linguistics, social network theory, Twitter!)

Computational/corpus work

• People are fascinated by gender differences

• In order to get statistical significance, you have to have enough data where you can detect a signal

• In the past, this has led researchers to roll up words into word classes

Page 12: Gender and language (linguistics, social network theory, Twitter!)

The most common distinctions

• Men use informative language
– Prepositions (to), attributive adjectives (fat), higher word lengths (gargantuan)

• Women use involved language
– First and second person pronouns (you), present tense verbs (goes), contractions (don’t)

• (Argamon, Koppel, Fine, & Shimoni, 2003; Herring & Paolillo, 2006b; Schler, Koppel, Argamon, & Pennebaker, 2006…they are working off of dimensions in Biber 1995 and Chafe 1982)

Page 13: Gender and language (linguistics, social network theory, Twitter!)

Or “contextuality”

• Men are formal and explicit
– Nouns (floor), adjectives (big), prepositions (to), articles (the)

• Women are deictic and contextual
– Pronouns (you), verbs (run), adverbs (happily), interjections (oh!)

• “Contextuality” decreases when an unambiguous understanding is more important or difficult, as when people are physically or socially farther away

• (Mukherjee & Liu, 2010; Nowson, Oberlander, & Gill, 2005 building off of Heylighen and Dewaele 2002)

Page 14: Gender and language (linguistics, social network theory, Twitter!)

Are all nouns really the same?

Page 15: Gender and language (linguistics, social network theory, Twitter!)

Are all nouns really the same?

Page 16: Gender and language (linguistics, social network theory, Twitter!)

And what about…

Page 17: Gender and language (linguistics, social network theory, Twitter!)

And what about…

Page 18: Gender and language (linguistics, social network theory, Twitter!)

Our approach also lumps

• It’s just at a lower level: instead of “nouns” or “blog words”
– we assume all usages of a unigram are identical

• Lumping itself isn’t a problem. In fact, you have to.
– But ideologies are going to structure your lumpings and divisions, so watch out!

Page 19: Gender and language (linguistics, social network theory, Twitter!)

OUR WORK (WITH DAVID BAMMAN AND JACOB EISENSTEIN)

Page 20: Gender and language (linguistics, social network theory, Twitter!)

Data

• Public Twitter messages in same-gender and cross-gender social networks
– Word frequencies (unigrams)
– Gender (induced from first names using the Social Security Administration data)

• 14,464 Twitter users (56% male)
– Geolocated in the US
– Must use 50 of top 1,000 most frequent words
– Between 4 and 100 ties (at least 2 “mutual @’s” separated by 14 days)

• Women have 58% female friends

• Men have 67% male friends

• 9.2M tweets, Jan-Jun 2011

Page 21: Gender and language (linguistics, social network theory, Twitter!)

Twitter has a pretty good swath (Pew)

• Nearly identical usage among women and men:
– 15% of female internet users are on Twitter
– 14% of male internet users

• High usage among non-Hispanic Blacks (28%)

• Even distribution across income and education levels

• Higher usage among young adults (26% for ages 18-29, 4% for ages 65+)

Page 22: Gender and language (linguistics, social network theory, Twitter!)

First names are highly gendered

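Chart: share of each first name that is female vs. male (0 to 100%)

Matt: 0% female, 100% male
Alex: 3% female, 97% male
Chris: 14% female, 86% male
Kelly: 85% female, 15% male
Sarah: 100% female, 0% male

95% of users have a name that is at least 85% associated with one gender.

The median user's name is 99.6% associated with its majority gender.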
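
To make the name-to-gender step concrete, here is a minimal sketch of how a first name might be mapped to a majority gender. The counts in `NAME_COUNTS` are invented for illustration (they are not Social Security Administration figures), and the `threshold` cut-off in `label_user` is a hypothetical design choice, not something the slides say was applied.

```python
# Hypothetical per-name counts, split by recorded sex.
# Invented numbers for illustration only; NOT real SSA data.
NAME_COUNTS = {
    "sarah": {"F": 9980, "M": 20},
    "chris": {"F": 140, "M": 860},
    "kelly": {"F": 900, "M": 100},
    "jamie": {"F": 550, "M": 450},
}


def majority_gender(name, counts=NAME_COUNTS):
    """Return (majority gender, share of that gender), or None if unknown."""
    entry = counts.get(name.lower())
    if entry is None:
        return None
    total = entry["F"] + entry["M"]
    gender = "F" if entry["F"] >= entry["M"] else "M"
    return gender, entry[gender] / total


def label_user(first_name, threshold=0.85):
    """Label a user only when the name is strongly tied to one gender.

    The threshold is an assumption made for this sketch.
    """
    result = majority_gender(first_name)
    if result is None or result[1] < threshold:
        return None
    return result[0]
```

With these toy counts, an ambiguous name such as "jamie" is left unlabeled, which is one way to keep weakly gendered names out of a dataset.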

Page 23: Gender and language (linguistics, social network theory, Twitter!)

First step: gender prediction

• Logistic regression:
– Will you have a heart attack Y/N?
– Will you vote for X or Y?
– Will your Brazilian Portuguese nouns and modifiers agree in number?

• Logistic regression is the statistical technique at the core of variable rule analysis (Tagliamonte 2006)

• But we’re going to reverse the direction for what sociolinguists typically do

Page 24: Gender and language (linguistics, social network theory, Twitter!)

First step: gender prediction

• The relevant linguistic variables aren’t known beforehand

• So the dependent variable—the thing we are trying to predict—is author gender

• The independent variables are the 10,000 most frequent lexical items in the tweets
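
A minimal sketch of this setup, assuming a tiny made-up vocabulary and four invented "authors": word counts are the independent variables and a 0/1 gender label is the dependent variable. The function names, the toy data, and the plain gradient-descent training loop are all assumptions for illustration; the slides do not describe the actual implementation, which used 10,000 lexical features.

```python
import math
from collections import Counter


def featurize(text, vocab):
    """Bag-of-words counts restricted to a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b):
    """Model's probability that the author has label 1."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train(X, y, l2=0.1, lr=0.5, epochs=200):
    """Batch gradient descent on L2-regularized logistic loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [l2 * wj for wj in w]  # gradient of the penalty term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi
            grad_b += err / n
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Invented toy data: label 1 = female author, 0 = male author.
TOY_VOCAB = ["xoxo", "api", "recipe", "nhl"]
TOY_DOCS = [
    ("xoxo recipe xoxo", 1),
    ("recipe recipe", 1),
    ("api nhl", 0),
    ("nhl nhl api", 0),
]
TOY_X = [featurize(text, TOY_VOCAB) for text, _ in TOY_DOCS]
TOY_Y = [label for _, label in TOY_DOCS]
TOY_W, TOY_B = train(TOY_X, TOY_Y)
```

The learned weights in `TOY_W` play the role of "markers": a positive weight pushes the prediction toward label 1, a negative weight toward label 0.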

Page 25: Gender and language (linguistics, social network theory, Twitter!)

Preventing overfitting

• This involves estimating a lot of parameters.

• Which raises the risk of overfitting: learning parameter values that perfectly describe the training data but won’t generalize to new data

Page 26: Gender and language (linguistics, social network theory, Twitter!)

Why regularize?

Regularization dampens the effect of an individual variable (Hastie et al 2009).

A single regularization parameter controls the tradeoff between perfectly describing the training data and generalizing to unseen data.
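
One way to write this tradeoff down, assuming an L2 (ridge) penalty; the slides do not say which penalty was used, so treat the penalty term as an assumption. Here $\mathbf{x}_i$ is author $i$'s word-count vector, $y_i \in \{0, 1\}$ is the gender label, $\sigma$ is the logistic function, and $\lambda \ge 0$ is the single regularization parameter:

```latex
\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} \;
  -\sum_{i=1}^{N} \Big[ y_i \log \sigma(\mathbf{w}^\top \mathbf{x}_i)
  + (1 - y_i) \log\big(1 - \sigma(\mathbf{w}^\top \mathbf{x}_i)\big) \Big]
  + \lambda \lVert \mathbf{w} \rVert_2^2,
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```

With $\lambda = 0$ the model is free to fit the training data as closely as it can; larger $\lambda$ shrinks every weight toward zero, which is the "dampening" of individual variables.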

Page 27: Gender and language (linguistics, social network theory, Twitter!)

Evaluating accuracy

• We use the typical method of cross-validation.

1. Randomly divide the full dataset into 10 parts.
2. Train on 80% of the data
3. Use 10% of the data to tune the regularization parameter
4. Now, use the model to predict the other 10%
5. Compare the predictions to what really happened

• Do this 10 times and take the average.
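
The splitting scheme above can be sketched as follows. `ten_fold_splits` is a hypothetical helper name, and rotating the tuning fold to be the one after the test fold is an assumption; the slides only give the 80/10/10 proportions.

```python
import random


def ten_fold_splits(items, seed=0):
    """Yield (train, dev, test) lists: 8 folds train, 1 tune, 1 test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    # Deal the shuffled items into 10 roughly equal folds.
    folds = [items[i::10] for i in range(10)]
    for k in range(10):
        dev_k = (k + 1) % 10  # assumption: tuning fold follows the test fold
        test = folds[k]
        dev = folds[dev_k]
        train = [x for j, fold in enumerate(folds)
                 if j not in (k, dev_k) for x in fold]
        yield train, dev, test


def mean_accuracy(per_fold_accuracies):
    """Average the accuracies measured on the 10 held-out test folds."""
    return sum(per_fold_accuracies) / len(per_fold_accuracies)
```

Each author lands in the test set exactly once across the 10 rounds, so the averaged accuracy never scores a model on data it was trained or tuned on.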

Page 28: Gender and language (linguistics, social network theory, Twitter!)

Gender prediction results

• State-of-the-art accuracy: 88.0%
– Lexical features strongly predict gender
– Ignoring syntax (treating tweets as “bags of words”) does pretty well

Page 29: Gender and language (linguistics, social network theory, Twitter!)

Category | Previous literature | In our data
Pronouns | F | F
Emotion terms | F | F
Family terms | F | Mixed results
"Blog words" (lol, omg) | F | F
Conjunctions | F | F (weakly)
Articles | M | No results
Numbers | M | M
Quantifiers | M | No results
Technology words | M | M
Prepositions | Mixed results | F (weakly)
Swear words | Mixed results | M
Assent | Mixed results | Mixed results
Negation | Mixed results | Mixed results
Emoticons | Mixed results | F
Hesitation markers | Mixed results | F

Top 500 markers for each gender

Page 30: Gender and language (linguistics, social network theory, Twitter!)

At a corpus level, women use more non-dictionary words and men use more named entities. In a moment we’ll ask how universal this is.

Hand classification of most frequent 10k words (90.0% agreement)

Category | Female authors | Male authors
Common words in a standard dictionary | 74.2% | 74.9%
Punctuation | 14.6% | 14.2%
Non-standard, unpronounceable words (e.g., :), lmao) | 4.28% | 2.99%
Non-standard, pronounceable words (e.g., luv) | 3.55% | 3.35%
Named entities | 1.94% | 2.51%
Numbers | 0.83% | 0.99%
Taboo words | 0.47% | 0.69%
Hashtags | 0.16% | 0.18%

Page 31: Gender and language (linguistics, social network theory, Twitter!)

Involvement

• Using traditional definitions, it looks as if our data confirms:
– men as more informational (all those named entities)
– women as more interactive/involved (pronouns, emoticons, etc.)

• Note that most of the named entities for the men are sports figures and teams

Page 32: Gender and language (linguistics, social network theory, Twitter!)

Right. These guys are not “involved”.

Page 33: Gender and language (linguistics, social network theory, Twitter!)
Page 34: Gender and language (linguistics, social network theory, Twitter!)

Clustering without regard to gender

• We apply probabilistic clustering in order to group authors who are linguistically similar

• Each author is represented as a list of word counts across the 10,000 words used in the classification experiment

Page 35: Gender and language (linguistics, social network theory, Twitter!)

Clustering! (Hastie et al 2009)

Easy example: 2 clusters

“Expectation Maximization”

1. Randomly assign all authors to one of 20 clusters
2. Calculate the center of the cluster from the average word counts of all authors put in it
3. Assign each author to the nearest cluster, based on the distance between their word counts and the average word counts of the cluster center
4. Keep iterating through this, moving from random clustering to meaningful clusters
5. Repeat steps 1-4 (25 times)
6. Pick the best
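
The six steps above can be sketched as a hard-assignment loop with random restarts. This is a simplified illustration of the procedure as the slide describes it (nearest-center assignment with squared Euclidean distance), not the probabilistic model actually fitted; the function names, the distance measure, and the re-seeding of empty clusters are assumptions.

```python
import random


def distance(a, b):
    """Squared Euclidean distance between two word-count vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def center(vectors):
    """Average word counts of the authors in one cluster."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cluster_once(vectors, k, rng, max_iter=100):
    # Step 1: randomly assign every author to one of k clusters.
    assign = [rng.randrange(k) for _ in vectors]
    centers = []
    for _ in range(max_iter):
        # Step 2: recompute each cluster center.
        centers = []
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            # Assumption: an empty cluster is re-seeded with a random author.
            centers.append(center(members) if members
                           else list(rng.choice(vectors)))
        # Step 3: move each author to the nearest center.
        new_assign = [min(range(k), key=lambda c: distance(v, centers[c]))
                      for v in vectors]
        # Step 4: iterate until assignments stop changing.
        if new_assign == assign:
            break
        assign = new_assign
    cost = sum(distance(v, centers[a]) for v, a in zip(vectors, assign))
    return cost, assign


def cluster(vectors, k, restarts=25, seed=0):
    """Steps 5-6: repeat from fresh random starts and keep the best run."""
    rng = random.Random(seed)
    runs = [cluster_once(vectors, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[0])
```

Calling `cluster(vectors, k=20)` would mirror the slide's settings; "best" here means the lowest total distance from authors to their cluster centers, which is one plausible reading of step 6.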

Page 36: Gender and language (linguistics, social network theory, Twitter!)

Some definitions

• Style: combinations of linguistic resources

• Cluster: a group of authors who use a particular style

• Social network: each author has a social network made up of people they both send messages to AND receive messages from

• An author’s social network does not have to be a part of that author’s cluster

Page 37: Gender and language (linguistics, social network theory, Twitter!)

Majority female clusters

Cluster  Size  % fem  Top words

c14 1,345 89.60% hubs blogged bloggers giveaway @klout recipe fabric recipes blogging tweetup

c7 884 80.40% kidd hubs xo =] xoxoxo muah xoxo darren scotty ttyl

c6 661 80.00% authors pokemon hubs xd author arc xxx ^_^ bloggers d:

c16 200 78.00% xo blessings -) xoxoxo #music #love #socialmedia slash :)) xoxo

c8 318 72.30% xxx :') xx tyga youu (: wbu thankyou heyy knoww

c5 539 71.10% (: :') xd (; /: <333 d: <33 </3 -___-

c4 1,376 63.00% && hipster #idol #photo #lessambitiousmovies hipsters #americanidol #oscars totes #goldenglobes

c9 458 60.00% wyd #oomf lmbo shyt bruh cuzzo #nowfollowing lls niggas finna

Page 38: Gender and language (linguistics, social network theory, Twitter!)

Looks like “women are trying to destroy the English language”

Category | Female authors | Male authors
Common words in a standard dictionary | 74.2% | 74.9%
Punctuation | 14.6% | 14.2%
Non-standard, unpronounceable words (e.g., :), lmao) | 4.28% | 2.99%
Non-standard, pronounceable words (e.g., luv) | 3.55% | 3.35%
Named entities | 1.94% | 2.51%
Numbers | 0.83% | 0.99%
Taboo words | 0.47% | 0.69%
Hashtags | 0.16% | 0.18%

Page 39: Gender and language (linguistics, social network theory, Twitter!)

Clusters that are majority female

• At the population level, women use many non-dictionary words.

• But there are clusters of (mostly) women who actually use fewer words like lol, nah, haha than men do

Cluster  Size  % fem  Top words

c14 1,345 89.60% hubs blogged bloggers giveaway @klout recipe fabric recipes blogging tweetup

c6 661 80.00% authors pokemon hubs xd author arc xxx ^_^ bloggers d:

c4 1,376 63.00% && hipster #idol #photo #lessambitiousmovies hipsters #americanidol #oscars totes #goldenglobes

Page 40: Gender and language (linguistics, social network theory, Twitter!)

Consider xo

• A lot more women use xo than men
– 11% of all women
– 2.5% of all men

• But that means that 89% of women aren’t using it at all.

• People who use xo are three times more likely to use ttyl (‘talk to you later’)
– The style is more commonly adopted by women
– But there’s other stuff going on here: age, job, etc.
– It’s not clear that gender is even the most important, it’s just that we’re starting with gender-colored glasses
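
A small sketch of how a "three times more likely" figure could be computed, assuming each user is represented as the set of words they have used. The comparison group (users who do not use the first word) is an assumption; the slide does not say what the baseline was, and the toy users below are invented.

```python
def relative_likelihood(users, word_a, word_b):
    """P(uses word_b | uses word_a) / P(uses word_b | does not use word_a).

    `users` is a list of sets of words. Both groups must be non-empty
    and word_b must occur at least once outside the word_a group.
    """
    with_a = [u for u in users if word_a in u]
    without_a = [u for u in users if word_a not in u]
    p_with = sum(word_b in u for u in with_a) / len(with_a)
    p_without = sum(word_b in u for u in without_a) / len(without_a)
    return p_with / p_without


# Invented toy users: 2 of 4 xo users also use ttyl, 1 of 6 others does.
TOY_USERS = [
    {"xo", "ttyl"}, {"xo", "ttyl"}, {"xo"}, {"xo", "lol"},
    {"ttyl"}, {"api"}, {"nhl"}, {"lol"}, {"recipe"}, {"gop"},
]
```

A ratio like this says that the two words travel together as part of a style; it says nothing by itself about why, which is the point of the last two sub-bullets.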

Page 41: Gender and language (linguistics, social network theory, Twitter!)

Shit Girls Say

http://www.youtube.com/watch?feature=player_embedded&v=u-yLGIH7W9Y

Page 42: Gender and language (linguistics, social network theory, Twitter!)

Meme-splosion!

Page 43: Gender and language (linguistics, social network theory, Twitter!)

Group | Gender | Activity/social role | Interactions | Geography

Shit Guys Don't Say Out Loud
Shit College Freshmen Say
Shit Girlfriends Say
Shit Asian Dads Say
Shit Redneck Guys Say
Shit Girls Say to Gay Guys
Shit Black Girls Say
Shit Black Guys Say
Shit People Say in LA
Shit White Girls Say…to Black Girls
Shit New Yorkers Say
Shit Frat Guys Say
Shit Whipped Guys Say
Shit Guys Don't Say
Shit Asian Girls Say
Shit Tumblr Girls Say
Shit Brides Say
Shit Spanish Girls Say
Shit Asian Moms Say
Shit Vegans Say
Shit Hipsters Say
Shit Cyclists Say
Shit Yogis Say
Shit Skiers Say

Page 44: Gender and language (linguistics, social network theory, Twitter!)

Notice

• That gender wasn’t really limited to the “gender” column
– “Moms” and “dads” are gendered social roles

• And that the words “guys” and “girls” aren’t really the same as “male” and “female”
– What are the plausible age ranges and social styles for “guys” and “girls”?

Page 45: Gender and language (linguistics, social network theory, Twitter!)

Clusters that are majority male

Cluster  Size  % male  Top words

c13 761 89.40% #nhl #bruins #mlb nhl #knicks qb @darrenrovell inning boozer jimmer

c10 1,865 85.40% /cc api ios ui portal developer e3 apple's plugin developers

c18 623 81.10% @macmiller niggas flyers cena bosh pacers @wale bruh melo @fucktyler

c11 432 73.80% niggas wyd nigga finna shyt lls ctfu #oomf lmaoo lmaooo

c20 429 72.50% gop dems senate unions conservative democrats liberal palin republican republicans

c15 963 65.30% #photo /cc #fb (@ brewing #sxsw @getglue startup brewery @foursquare

Page 46: Gender and language (linguistics, social network theory, Twitter!)

Looks like “men are Twitter-headed sailor-swearing accountants”

Category | Female authors | Male authors
Common words in a standard dictionary | 74.2% | 74.9%
Punctuation | 14.6% | 14.2%
Non-standard, unpronounceable words (e.g., :), lmao) | 4.28% | 2.99%
Non-standard, pronounceable words (e.g., luv) | 3.55% | 3.35%
Named entities | 1.94% | 2.51%
Numbers | 0.83% | 0.99%
Taboo words | 0.47% | 0.69%
Hashtags | 0.16% | 0.18%

Page 47: Gender and language (linguistics, social network theory, Twitter!)

Aggregates generally don’t hold

Cluster | Top words | Notes
c13 | #nhl #bruins #mlb nhl #knicks qb @darrenrovell inning boozer jimmer | Few Taboo/Hashes, Lots of Punc
c10 | /cc api ios ui portal developer e3 apple's plugin developers | Few Taboo/Hashes, Lots of Punc
c18 | @macmiller niggas flyers cena bosh pacers @wale bruh melo @fucktyler |
c11 | niggas wyd nigga finna shyt lls ctfu #oomf lmaoo lmaooo | Few Dict words, Lots of unPron and Pron
c20 | gop dems senate unions conservative democrats liberal palin republican republicans | Few Taboo/Hashes, Lots of Punc
c15 | #photo /cc #fb (@ brewing #sxsw @getglue startup brewery @foursquare | Few Taboo, Lots of Punc

Page 48: Gender and language (linguistics, social network theory, Twitter!)

Small exceptions

• At the population level, men use many named entities and numbers

• Clusters use these at various rates, but:
– No female-skewed clusters use them *more* than the male average
– No male-skewed clusters use them *less* than the female average

• But again, the other 6 generalizations about gender we might have made at the aggregate level aren’t supported once we get to clusters

Page 49: Gender and language (linguistics, social network theory, Twitter!)

Erasure!

• Clusters are highly gendered

• For example, let’s consider clusters made up of 60% or more of people of the same gender
– That covers 82.95% of all the authors
– But what about the 1,242 men who are part of female-majority clusters?
– The 1,052 women who are part of male-majority clusters?
– Are they just noise? Odd-balls? Is there no structure to what they’re doing?
– These people are using language to do identity work, even as they construct identities at odds with conventional notions of masculinity and femininity.

Page 50: Gender and language (linguistics, social network theory, Twitter!)

Clusters vs. social networks

• The more skewed a cluster is, the more skewed the social networks of its members

Page 51: Gender and language (linguistics, social network theory, Twitter!)

Women with female networks use the most female markers

Page 52: Gender and language (linguistics, social network theory, Twitter!)

Men with male networks use the most male markers

Page 53: Gender and language (linguistics, social network theory, Twitter!)

Women with male networks use more male markers (and vice versa)

Page 54: Gender and language (linguistics, social network theory, Twitter!)

Women with highly female networks are easier to classify (and vice versa)

Page 55: Gender and language (linguistics, social network theory, Twitter!)

In other words

• The classifier is picking up on the fact that if you insist upon a gender binary then people with same-gender networks use language in a more “gender-coherent” way.

Page 56: Gender and language (linguistics, social network theory, Twitter!)

Does social network help prediction?

• 88% accuracy with text alone
– Logistic regression, 10-fold cross-validation
– State-of-the-art accuracy

• Add network information…
– Still 88% accuracy

Page 57: Gender and language (linguistics, social network theory, Twitter!)

Once we have 1000 words/author, network info doesn’t help

Page 58: Gender and language (linguistics, social network theory, Twitter!)

Wait, why not?

• A new feature is only going to improve classification accuracy if it adds new information.

• There is strong homophily: 63% of the connections are between same-gender individuals.

• But language and social network can’t mutually disambiguate because they aren’t independent views on gender.

• Individuals who use linguistic resources from “the other gender” consistently have denser social network connections to the other gender.
– Performance, style, accommodation

• Gender is not an “A or B” kind of thing
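
A minimal sketch of how homophily figures like the 63% above could be computed, assuming ties are stored as pairs of user ids and gender labels as a dictionary. The names and the four-person toy network are invented; the slides do not show the actual computation.

```python
def same_gender_share(ties, gender):
    """Fraction of ties whose two endpoints share a gender label."""
    same = sum(gender[a] == gender[b] for a, b in ties)
    return same / len(ties)


def network_share(user, ties, gender):
    """Fraction of one user's ties that go to same-gender users."""
    neighbours = [b if a == user else a for a, b in ties if user in (a, b)]
    same = sum(gender[n] == gender[user] for n in neighbours)
    return same / len(neighbours)


# Invented toy network.
TOY_GENDER = {"ann": "F", "bea": "F", "cal": "M", "dan": "M"}
TOY_TIES = [("ann", "bea"), ("ann", "cal"), ("cal", "dan"), ("bea", "dan")]
```

The per-user share is the quantity that can be set against classifier confidence: users with a low same-gender share are the ones whose language is expected to look less "gender-coherent".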

Page 59: Gender and language (linguistics, social network theory, Twitter!)

If we seek only predictive accuracy…

Page 60: Gender and language (linguistics, social network theory, Twitter!)

We’re awesome!

Page 61: Gender and language (linguistics, social network theory, Twitter!)

Not so simple

• If we want to understand categories, we should start with people in interactions.
– Counting is great but we have to watch our bins and investigate them, too.

Page 62: Gender and language (linguistics, social network theory, Twitter!)

Look at words a different way

Page 63: Gender and language (linguistics, social network theory, Twitter!)

Not markers…

Page 64: Gender and language (linguistics, social network theory, Twitter!)

Not markers…makers

Page 65: Gender and language (linguistics, social network theory, Twitter!)

Positioning

Page 66: Gender and language (linguistics, social network theory, Twitter!)

Positioning and stance

• “Stance” is usually seen as an expression of a speaker’s relationship to their talk and their interlocutors
– E.g., Kiesling (2009); Du Bois (2007); Bednarek (2008)

• But “stance” (and “roles”) seem static

• I’d like something with more motion and dynamism

Page 67: Gender and language (linguistics, social network theory, Twitter!)

Positioning and stance

• “Stance” is usually seen as an expression of a speaker’s relationship to their talk and their interlocutors
– E.g., Kiesling (2009); Du Bois (2007); Bednarek (2008)

• But “stance” (and “roles”) seem static

• I’d like something with more motion and dynamism

• I develop positioning to connect linguistic forms to social structures

• (Particularly affect, actually)

Page 68: Gender and language (linguistics, social network theory, Twitter!)

Positioning in a social grid

Page 69: Gender and language (linguistics, social network theory, Twitter!)

Sister

Daughter

Spinster

Subject

Object

Dentist

Farmer

Father

Page 70: Gender and language (linguistics, social network theory, Twitter!)

Positioning in a social grid

• Social structures are created, maintained, and changed by specific interactions

• People enter interactions already positioned

• Interactions change these positions, and people are attentive to the changes

Page 71: Gender and language (linguistics, social network theory, Twitter!)

Conventions

• Different linguistic resources come to be associated with different positionings

• Distributions of experiences are usually maintained

• The maintenance and disruption of expectations have (affective) consequences

Page 72: Gender and language (linguistics, social network theory, Twitter!)

A LITTLE BIT OF LITTLE

Page 73: Gender and language (linguistics, social network theory, Twitter!)

CHILDES (MacWhinney, 2000)

• 4,676 transcripts of parent-child interactions
– American English

Direction | Observed little | Expected little | O/E
Mothers-to-boys | 4,313 | 4,158 | 1.037
Fathers-to-boys | 1,516 | 1,381 | 1.098
Mothers-to-girls | 6,312 | 5,441 | 1.160
Fathers-to-girls | 230 | 281 | 0.819
Girls-to-mothers | 1,221 | 1,533 | 0.796
Girls-to-fathers | 4 | 3 | 1.482
Boys-to-mothers | 875 | 1,526 | 0.573
Boys-to-fathers | 117 | 265 | 0.441
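
The O/E column is the observed count divided by the expected count. A minimal sketch follows; the `expected_count` helper assumes that "expected" means the group's share of all words times the total number of occurrences of little, which is a common convention but is not stated on the slide. The word totals needed for it are not given, so only the ratio is checked against the table.

```python
def expected_count(total_target, group_words, all_words):
    """Expected occurrences of a word in a group, if usage were uniform.

    Assumption: expected = total occurrences * group's share of all words.
    """
    return total_target * group_words / all_words


def oe_ratio(observed, expected):
    """Observed/expected ratio: above 1 means more use than expected."""
    return observed / expected


# Observed and (rounded) expected counts copied from the table above.
CHILDES_COUNTS = {
    "Mothers-to-boys": (4313, 4158),
    "Fathers-to-boys": (1516, 1381),
    "Mothers-to-girls": (6312, 5441),
    "Fathers-to-girls": (230, 281),
    "Girls-to-mothers": (1221, 1533),
    "Boys-to-mothers": (875, 1526),
}

CHILDES_OE = {
    group: round(oe_ratio(obs, exp), 3)
    for group, (obs, exp) in CHILDES_COUNTS.items()
}
```

Recomputing from the rounded expected counts reproduces the table for these six rows. For the very small Girls-to-fathers cell (4 observed, 3 expected) the same division gives about 1.33 rather than 1.482, presumably because the expected count shown is rounded.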

Page 74: Gender and language (linguistics, social network theory, Twitter!)

Gender and little

• Women tend to use little more; multiple corpora show significant differences

• But this misses the point

Speaker | Buckeye OE | CALLHOME OE
Female | 1.170 | 1.073
Male | 0.855 | 0.725

Page 75: Gender and language (linguistics, social network theory, Twitter!)

Add interlocutor gender

Speaker to interlocutor | CHILDES Parent-Child OE | CHILDES Child-Parent OE | Buckeye OE | Fisher Am. Eng. OE | Fisher Ohioans OE | CALLHOME OE
Female to female | 1.160 | 0.796 | 0.936 | 1.051 | 1.160 | 1.088
Female to male | 1.037 | 1.482 | 1.290 | 0.887 | 0.771 | 1.064
Male to male | 1.098 | 0.441 | 0.879 | 1.071 | 0.830 | 0.685
Male to female | 0.819 | 0.573 | 0.908 | 0.842 | 0.836 | 0.727

Page 76: Gender and language (linguistics, social network theory, Twitter!)

Gender and topics

• Some topics are more face-threatening than others.
– Face-threatening topics get less little.

• When topic is held constant, men and women mostly have the same little usage.
– Regardless of the gender of the person they’re talking to.

• But there are some exceptions, which are connected to issues of masculinity, femininity, and emotional regulation.
– Some examples:
  • Generally, people don’t use little to talk about terrorism. EXCEPT women speaking to women use little to modify emotions (terrified, scared)
  • Generally, people DO use little to talk about fitness. EXCEPT men talking to men. The men talking to women use little to talk about their pudgy, flabby bodies. The few men talking to men who use little use it to talk about working out a little harder or putting on a little more muscle mass.

Page 77: Gender and language (linguistics, social network theory, Twitter!)

ICSI meeting corpus (Janin et al., 2003)

• 75 meetings from Berkeley’s International Computer Science Institute (2000-2002)
– 3-10 participants (avg of 6)
– 17-103 minutes each (usually an hour)
– 72 hours of data

Role | # speakers (avg age) | Observed little | Expected little | O/E
Undergrad | 6 (30 yo) | 59 | 34 | 1.734
Grad | 14 (29 yo) | 234 | 223 | 1.049
Postdoc | 1 (not given) | 51 | 75 | 0.676
Ph.D. | 11 (37 yo) | 152 | 228 | 0.667
Professor | 4 (52 yo) | 278 | 213 | 1.302

Page 78: Gender and language (linguistics, social network theory, Twitter!)

Gender, genre, topic, style

• “Different ways of saying things are intended to signal different ways of being, which includes different potential things to say.” (Eckert 2008)

Page 79: Gender and language (linguistics, social network theory, Twitter!)

Majority female clusters

Cluster  Size  % fem  Top words

c14 1,345 89.60% hubs blogged bloggers giveaway @klout recipe fabric recipes blogging tweetup

c7 884 80.40% kidd hubs xo =] xoxoxo muah xoxo darren scotty ttyl

c6 661 80.00% authors pokemon hubs xd author arc xxx ^_^ bloggers d:

c16 200 78.00% xo blessings -) xoxoxo #music #love #socialmedia slash :)) xoxo

c8 318 72.30% xxx :') xx tyga youu (: wbu thankyou heyy knoww

c5 539 71.10% (: :') xd (; /: <333 d: <33 </3 -___-

c4 1,376 63.00% && hipster #idol #photo #lessambitiousmovies hipsters #americanidol #oscars totes #goldenglobes

c9 458 60.00% wyd #oomf lmbo shyt bruh cuzzo #nowfollowing lls niggas finna

Page 80: Gender and language (linguistics, social network theory, Twitter!)

Clusters that are majority male

Cluster  Size  % male  Top words

c13 761 89.40% #nhl #bruins #mlb nhl #knicks qb @darrenrovell inning boozer jimmer

c10 1,865 85.40% /cc api ios ui portal developer e3 apple's plugin developers

c18 623 81.10% @macmiller niggas flyers cena bosh pacers @wale bruh melo @fucktyler

c11 432 73.80% niggas wyd nigga finna shyt lls ctfu #oomf lmaoo lmaooo

c20 429 72.50% gop dems senate unions conservative democrats liberal palin republican republicans

c15 963 65.30% #photo /cc #fb (@ brewing #sxsw @getglue startup brewery @foursquare

Page 81: Gender and language (linguistics, social network theory, Twitter!)

Gender is not something people have

Page 82: Gender and language (linguistics, social network theory, Twitter!)

It’s something people *do*

And there are a lot of ways to “do” gender.

Page 83: Gender and language (linguistics, social network theory, Twitter!)

Computational Judith Butler!

Page 84: Gender and language (linguistics, social network theory, Twitter!)

Gender is binary only with blinders

• “My mom doesn’t say that’s lovely or omg!...”
– “Nevermind that!”

• Problem: Sliding from predictive accuracy to causal stories

• Realistic finding: There are lots of ways to do gender

Page 85: Gender and language (linguistics, social network theory, Twitter!)

Big data, big opportunities

• Big data offers us the opportunity to let clusters emerge (and test them against our big bins)

• We can show how language reflects and creates the social worlds we live in

Page 86: Gender and language (linguistics, social network theory, Twitter!)

THANKS!