Debate in the Wild:
Exploring the Real-World Impact of Language on Persuasiveness
Alexandra Paxton1 & Rick Dale2
1 Institute of Cognitive & Brain Sciences, University of California, Berkeley
2 Cognitive & Information Sciences, University of California, Merced
PAPER UNDER REVIEW — PROVISIONAL PDF ONLY — DO NOT DISTRIBUTE
PLEASE CONTACT AUTHORS PRIOR TO CITING
Corresponding Author:
Alexandra Paxton
University of California, Berkeley
Institute of Cognitive and Brain Sciences
3210 Tolman Hall # 1650
Berkeley, CA 94720-1650
Abstract
How does language affect persuasion in real-world settings? We created and analyzed a novel
corpus that is well-suited to answering this question: 69 transcripts of televised Oxford-style
sociopolitical debates from Intelligence Squared U.S. (http://www.iq2us.org). This forum includes pre-
and post-debate voting from a live audience, serving as a native metric of persuasion. Using machine
learning and natural language processing, we explore how language patterns and attractiveness interact
with group membership (i.e., being “for” or “against” a motion) to influence persuasion. We see our
work as fitting with a growing push to explore cognitive and behavioral phenomena in real-world data.
Keywords: language; persuasion; philosophy; political science; psychology; big data
Debate in the Wild: Exploring the Real-World Impact of Language on Persuasiveness
Language and communication are commonly studied in neutral or positive contexts. This makes
debate an interesting phenomenon: Rather than disengaging during conflict, debaters are actively
engaged because of their intense disagreement. Equally interestingly, each debater simultaneously
attempts to persuade the audience of their perspective while defending against others’ attacks. By
encompassing numerous cognitive and social factors, debate is a theoretically interesting communicative
context that affords exciting opportunities to better understand language.
Here, we conduct exploratory analyses of debate “in the wild.” As a complex discourse context,
debate reflects the power and nuance of communication by weaving together diverse cognitive and
social components. Finding insight into these complex communication processes is difficult given the
many variables that likely influence persuasion naturally. Leveraging new methods and large datasets,
then, may open avenues of investigation into complex discourse settings that may be difficult to control
experimentally.
We take a “big data”-inspired approach to understanding communication and persuasion. The
growing availability of rich data provides new opportunities to quantify and analyze human behavior “in
the wild.” Inspired by recent calls for a computational cognitive revolution (Griffiths, 2014), we here
take advantage of large-scale, natural data to explore real-world implications of language patterns.
In exploratory analyses of over 300 experts across nearly 70 debates, we shed light on the ways
that language use, framing, and persuasion intersect in the “wild.” We analyze a novel corpus of
transcripts from Intelligence Squared U.S., a television show that pits teams of experts against one
another in Oxford-style debates about sociopolitical topics. Uniquely, the corpus enjoys a native
outcome measure: Spectators in the live audience were polled about their opinions at the beginning and
end of each debate. The present corpus, then, is able to precisely quantify persuasion in a real-world
debate through the self-reported opinions of the debate’s direct audience.
Background
From linguistic choices (e.g., Roddy & Garramone, 1988) to personal traits (e.g., Berggren,
Jordahl, & Poutvaara, 2010), seemingly trivial or unrelated factors can significantly affect a debater’s
ability to sway an audience’s opinions. Indeed, Mercier and Sperber (2011) have prominently suggested
that natural human argument is geared towards social influence rather than finding truth, in contrast to
traditional approaches to argumentation (e.g., Blair & Tindale, 2011; R. Cohen, 1987; Dung, 1995;
Jacobs, 2000; van Eemeren et al., 1995, 2014; Yoshimi, 2004). Taking our cue from this work, we
explore the roles that linguistic behaviors and appearance can play in what an audience may believe is
their rational decision-making process.
Decades of work in political science and psychology suggest a number of linguistic factors that
guide our exploratory analysis. Negativity can sway undecided voters, but using excessive negativity
can backfire against the speaker (e.g., Burgoon, Miller, M. Cohen, & Montgomery, 1978; Garramone,
1984; Geer, 2008; Kahn & Kenney, 1999; Roddy & Garramone, 1988). Hedging and pronoun use could
increase speaker persuasiveness by building rapport with the audience (e.g., Dafouz-Milne, 2008):
Hedges (e.g., “may,” “might”) signal tentativeness or uncertainty and can soften opinions for social
acceptability (e.g., Fuertes-Olivera, Velasco-Sacristán, Arribas-Baño, & Samaniego-Fernández, 2001),
while personal pronouns could promote an “in-group” bond between speaker and audience (e.g.,
Dafouz-Milne, 2008).
Personal traits also factor into persuasion, including attractiveness. When evaluating political
candidates, appearance can be more influential than personality traits: Attractiveness can mitigate
negative perceptions of candidates stemming from differences of political opinions (Budesheim &
DePaola, 1994). Analyses of voting records have shown that attractive candidates enjoy more than 20%
gains over average-looking candidates (Berggren et al., 2010).
A final potential factor is simply how a debater is framed within the context of a particular debate:
whether a debater is considered “for” or “against” a motion, simply by virtue of how the motion has
been posed. Subtle aspects of framing impact speaking and understanding, including during political
discussion and evaluation of political candidates (for review, see Matlock, 2012). Although some U.S.
sociopolitical issues can be characterized by codified labels (e.g., “pro-choice” versus “pro-life”), the
current corpus includes many topics that have not yet acquired such labels.
The Present Study
We utilize a new corpus to explore how these key factors play out in natural Oxford-style debate.
The corpus includes a range of features about each debater: language metrics, ratings of attractiveness,
and identity-by-framing (whether “for” or “against” a motion). We extract audience votes for a ground-
truth score of persuasiveness in each debate. As we show below, combinations of these factors can
successfully predict the outcome of debates. Though exploratory, our analyses recommend future
quantitative and theoretical directions for the study of debate and other contexts of communication.
Method
Corpus
Intelligence Squared U.S. (IQ2US; http://www.iq2us.org) is a television show of Oxford-style
debates. Each debate topic is posed as a motion (e.g., “America doesn’t need a strong dollar policy”).
Teams of debaters argue either for or against that motion. All debaters are experts or professionals, and
debaters only argue positions that they truly believe.
Each debate lasts approximately 105 minutes and features two equal groups of 2-3 experts, with
one group on each side of the motion. Panelists first receive 7 minutes for opening statements, alternating
between “for” and “against” group members. Next, panelists answer challenges from one another, the
moderator, and audience members. The final segment permits each panelist 2 minutes for closing
arguments, again alternating between groups.
We created a corpus of 69 publicly available debate transcripts from the IQ2US website. All
transcripts from September 2006 to September 2013 were included, spanning a variety of
sociopolitical issues (see Table 1). Non-speech elements (e.g., laughter) and moderator or audience
contributions were removed. The current subset encompassed 305 unique debaters across 69 debates,
with 908,348 words in 9,033 turns.
A unique feature of this corpus is its native measure of persuasion. Before and after each debate,
live audience members indicate their stance (“for,” “against,” or “undecided”). We derive a single
persuasion value for each debate through the net gain from pre- to post-debate vote in the “for” group
relative to the “against” group, which we call ∆V (see Table 1 for a demonstration). Mean ∆V is slightly
skewed toward the “for” group (M = 1.79, range = -43 to 71); both groups win approximately the same
number of debates (proportion of “for” wins = .51).

Table 1. Example motions (topics) with pre- and post-debate votes and net gains for each group, and ∆V
(our derived outcome score for each debate). ∆V is the net gain of the “for” group minus the net gain of
the “against” group.

Motion                                   Pre-Debate Vote     Post-Debate Vote    Net Gain           ∆V
                                         For      Against    For      Against    For      Against
Ban College Football                     16%      53%        53%      39%        37%      -14%       51%
Obesity is the Government's Business     55%      19%        55%      35%        0%       16%       -16%
The Rich are Taxed Enough                28%      49%        30%      63%        2%       14%       -12%
California is the First Failed State     31%      25%        58%      37%        27%      12%        15%
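To make the derivation concrete, the following minimal R sketch reproduces the ∆V values for the first
two motions in Table 1; the column names are illustrative, not the names used in the project's own scripts.

# Minimal sketch of the Delta-V derivation (hypothetical column names)
votes <- data.frame(
  motion       = c("Ban College Football", "Obesity is the Government's Business"),
  pre_for      = c(16, 55), pre_against  = c(53, 19),
  post_for     = c(53, 55), post_against = c(39, 35)
)
net_gain_for     <- votes$post_for - votes$pre_for          # "for" group's net gain
net_gain_against <- votes$post_against - votes$pre_against  # "against" group's net gain
votes$delta_v    <- net_gain_for - net_gain_against         # Delta-V, as in Table 1
votes  # delta_v: 51 and -16, matching Table 1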
Debater Ratings
We collected ratings of debater attractiveness using headshots (90-100 x 70-100 pixels)
downloaded from the IQ2US website. Headshots were divided into 10 online surveys (M=39.3
headshots; range=30-40) and rated in random order on a 1-5 Likert-style scale for attractiveness. For
each debater, we averaged ratings from 91-97 undergraduate participants (M=95) from the University of
California, Merced. (See Appendix for more.)
Language Measures
To quantify language, we used Linguistic Inquiry and Word Count (LIWC; Pennebaker, Francis,
& Booth, 2007), a linguistic analysis tool widely used in social psychology. LIWC is a bag-of-words
approach that scans text and computes the percentage of words falling into each of 67 categories, ranging from style (e.g.,
function words) to content (e.g., negativity).
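As a rough illustration of the bag-of-words logic (not of LIWC itself, whose dictionaries are proprietary),
the following R sketch computes the percentage of words in a text that fall into each of two made-up categories.

# Toy dictionaries for illustration only; these are NOT the actual LIWC categories
toy_dictionary <- list(
  tentat = c("may", "might", "perhaps", "possibly"),  # hedging-like words
  we     = c("we", "us", "our", "ours")               # first-person plural pronouns
)
category_percentages <- function(text, dictionary) {
  words <- tolower(unlist(strsplit(text, "[^A-Za-z']+")))
  words <- words[nchar(words) > 0]
  sapply(dictionary, function(cat) 100 * sum(words %in% cat) / length(words))
}
category_percentages("We may need to rethink our policy", toy_dictionary)
# tentat: 14.3 (1 of 7 words); we: 28.6 (2 of 7 words)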
Analysis Approach
The resulting data were compiled at the speaker level. This created a large multivariate dataset
with a range of information on each debater: proportional use of each LIWC category (for all turns
within each debate), perceived attractiveness, group membership (“for” or “against"), and debate
outcome (∆V). We used these data in exploratory analyses based on regression and dimensionality-
reduction approaches.
Results
Our analyses utilized two approaches. The first examined how targeted linguistic and
nonlinguistic factors influence persuasiveness (∆V) with a linear mixed-effects model. The next
leveraged a range of linguistic factors to identify language patterns that affect opinion change. All
analyses were performed in R (R Development Core Team, 2008) and are freely available
(http://www.github.com/a-paxton/debate-in-the-wild). For convenience, we number each model (M1-M4).
M1: Targeted Linguistic and Personal Effects
Our first analysis was a linear mixed-effects model predicting ∆V with our targeted variables:
attractiveness, hedging (tentat LIWC category), first-person plural pronouns (we LIWC category),
negativity (negemo LIWC category), group, and all two-way interactions with group. The model
included debate winner (not ∆V) as the sole random intercept with maximally permitted random slope.[1]
All variables were first centered and standardized, allowing estimates to be interpreted as effect sizes
(Keith, 2008). The model was created using the lme4 package (Bates, Maechler, Bolker, & Walker, 2015).[2]
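A minimal R sketch of this model is below. The data frame and column names (debaters, delta_v,
attractiveness, and the LIWC columns) are assumptions for illustration, and the random-effects structure
is simplified to an intercept only.

library(lme4)

# Standardize predictors and outcome so estimates can be read as effect sizes
z <- function(x) as.numeric(scale(x))
debaters$delta_v_z <- z(debaters$delta_v)
debaters$attract_z <- z(debaters$attractiveness)
debaters$tentat_z  <- z(debaters$tentat)   # hedging (LIWC tentat)
debaters$we_z      <- z(debaters$we)       # first-person plural pronouns (LIWC we)
debaters$negemo_z  <- z(debaters$negemo)   # negativity (LIWC negemo)

# Group ("for"/"against") plus its two-way interactions with each targeted predictor;
# debate winner as the random intercept (the paper also permitted random slopes)
m1 <- lmer(delta_v_z ~ group * (attract_z + tentat_z + we_z + negemo_z) + (1 | winner),
           data = debaters)
summary(m1)
# Marginal and conditional R^2 were obtained in the paper with the piecewiseSEM package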
Our targeted model, interestingly, revealed only a significant main effect of attractiveness
(see Table 2). Debaters rated as more attractive, regardless of their own group, were associated with a
greater net gain in “for” group votes. This result ran counter to our expectations: Rather than all
debaters benefiting from their own attractiveness, the attractiveness of “against” group debaters instead
backfired against them.
We also found a trend toward an interaction between group and attractiveness (see Fig. 1). This
suggested that more attractive “against” group debaters were associated with more positive ∆Vs than
even “for” group debaters, although the trend did not reach significance. No other effects reached significance.
[1] Models with debate and/or speaker as additional maximally specified random intercepts were not able to
converge due to excessive function evaluations (essentially, a measure of model complexity).
[2] Unstandardized results are available in the R markdowns for the project.
Figure 1. Interactions between group membership (green = “for”, red = “against”) and each of the four
targeted predictors (attractiveness, hedging, negativity, and first-person plural pronouns) on
outcome (∆V).
Table 2. Results for standardized model (M1) predicting ∆V with targeted linguistic and personal
variables (first-person plural pronouns, abbreviated as FPP pronouns; attractiveness; negativity; and
hedging), group membership, and all two-way interaction terms. Marginal R² = .07; conditional R² = .35.
Significant and trending effects marked: . = p < .10; * = p < .05; ** = p < .01; *** = p < .001.

Effect                     Estimate    t        p
Group                       0.57        1.50    0.13
Hedging                     0.07        0.48    0.63
FPP Pronouns                0.10        0.69    0.49
Attractiveness              0.36        2.25    0.02 *
Negativity                 -0.18       -1.18    0.24
Group x Hedging             0.04        0.16    0.87
Group x FPP Pronouns       -0.28       -1.60    0.11
Group x Attractiveness     -0.57       -1.83    0.07 .
Group x Negativity          0.07        0.40    0.69
Using the piecewiseSEM (Lefcheck, 2015) package, fixed effects accounted for 7% of
the variance in ∆V (marginal R² = .07); fixed and random effects together accounted for 35% (conditional
R² = .35).
Broader Language Effects
We then examined debaters’ broader language patterns by leveraging proportional use of all
LIWC categories. For these models (M2-M3), two-thirds of the dataset (n=220) were randomly chosen
as a training set, and the remainder (n=110) served as the test set.
M2: Predicting Group Membership. Our second model tested whether group membership
could be detected by language alone. We trained a C-classification support vector machine (SVM) with
10-fold cross-validation using the e1071 package (Meyer et al., 2015). The SVM classified binary
group membership with all 67 LIWC categories.
Performance was evaluated through accuracy (proportion of accurate classifications), sensitivity
(proportion of true positives to all positives; “against” group), and specificity (proportion of true
negatives to all negatives; “for” group) on the test set. Our model had an accuracy of .45, sensitivity of
.57, and specificity of .31. This relatively poor performance suggested that debaters of both groups
tended to use similar language patterns.
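A sketch of this classifier using the e1071 interface is below, assuming train_set and test_set data
frames with a factor column group and one column per LIWC category; these names are illustrative
assumptions, not the authors' code.

library(e1071)

# C-classification SVM over all LIWC columns, with 10-fold cross-validation
m2 <- svm(group ~ ., data = train_set, type = "C-classification", cross = 10)

# Evaluate on the held-out test set
pred <- predict(m2, newdata = test_set)
confusion <- table(predicted = pred, actual = test_set$group)

accuracy    <- sum(diag(confusion)) / sum(confusion)
sensitivity <- confusion["against", "against"] / sum(confusion[, "against"])  # "against" as positive class
specificity <- confusion["for", "for"] / sum(confusion[, "for"])              # "for" as negative class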
M3: Predicting Outcome. Our third analysis was an ε-regression SVM with 10-fold cross-
validation, predicting ∆V with (1) the main terms for group and all 67 LIWC categories and (2) the two-
way interaction terms of group by each LIWC category. After training, model fit on the test set was
evaluated with a linear model predicting the speaker’s actual ∆V with each predicted ∆V. The model
trended toward but did not reach statistical significance, F(1, 108), adjusted R² = .02, B = .05, p = .06.
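The corresponding regression model can be sketched as follows; the liwc_cols helper and the column
names are assumptions for illustration, not the authors' code.

library(e1071)

# Build a formula with group, every LIWC column, and each group-by-LIWC interaction
liwc_cols <- setdiff(names(train_set), c("delta_v", "group"))
form <- reformulate(c("group", liwc_cols, paste0("group:", liwc_cols)), response = "delta_v")

# Epsilon-regression SVM with 10-fold cross-validation
m3 <- svm(form, data = train_set, type = "eps-regression", cross = 10)

# Model fit on the test set: regress each speaker's actual Delta-V on the predicted Delta-V
pred_dv <- predict(m3, newdata = test_set)
summary(lm(test_set$delta_v ~ pred_dv))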
M4: Clustered Language Effects
The prior model (M3) suggested that individual linguistic effects may influence the debate
outcomes but did not reach significance. The fourth analysis took an even broader view of language by
reducing the dimensionality of the data, allowing us to identify the linguistic clusters that influence
persuasion. To do so, we entropy-transformed the data and then performed singular value decomposition
(SVD) over the group, LIWC, and group-by-LIWC terms (cf. Landauer, Foltz, & Laham, 1998).
We then explored model structures that maximized model-over-model gains in adjusted R² while
minimizing the number of included components. We ran consecutive linear models predicting ∆V with
the first 1-50 components, adding a single component each time. Our chosen model included the first 13
components and accounted for 13% of the variance in ∆V (adjusted R² = .13), F(13, 316) = 4.86.
Components 6, 8, and 13 significantly predicted ∆V (see Fig. 2), while components 3 and 5 trended
toward significance (see Fig. 3). Below, each significant or trending component is discussed. Table 3
presents all model results.
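The sketch below outlines this pipeline under simplifying assumptions: X is a numeric matrix of the
group, LIWC, and group-by-LIWC predictors, delta_v is the outcome vector, and the log-entropy weighting
follows the usual LSA convention rather than the authors' exact transform.

# Log-entropy weighting of each predictor column (LSA-style; an assumption here)
p <- sweep(X, 2, colSums(X), "/")                              # column-wise proportions
g <- 1 + colSums(ifelse(p > 0, p * log(p), 0)) / log(nrow(X))  # entropy weight per column
X_weighted <- sweep(log(X + 1), 2, g, "*")

# Singular value decomposition; columns of $u serve as component scores
components <- svd(X_weighted)$u

# Consecutive linear models predicting Delta-V with the first 1-50 components
adj_r2 <- sapply(1:50, function(k) {
  summary(lm(delta_v ~ components[, 1:k, drop = FALSE]))$adj.r.squared
})
cbind(k = 1:50, adj_r2 = adj_r2, gain = c(NA, diff(adj_r2)))  # the paper settled on k = 13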
Table 3. Results for model (M4) predicting ∆V with clustered (i.e., reduced-dimensionality) LIWC,
group, and group-by-LIWC variables. Adjusted R² = .13, F(13, 316). Significant and trending effects
marked: . = p < .10; * = p < .05; ** = p < .01; *** = p < .001.

Component    Estimate    t        p
1            -1866       -1.51    0.12
2            -469.9      -1.54    0.12
3              95.09      1.76    0.08 .
4               0.06      0.001   1
5             -36.38     -1.76    0.08 .
6             -51.89     -2.62    0.009 **
7               2.94      0.15    0.88
8              83.33      4.15    < .0001 ***
9               4.86      0.25    0.81
10              8.33      0.40    0.69
11            -26.23     -1.325   0.19
12             24         1.19    0.23
13            101         4.92    < .0001 ***
Relations. Component 6—the “relations” component—was associated with lower ∆V. Its five
strongest contributors were relativity-by-group, relativity, spatial-by-group, personal pronoun-by-group,
and first-person pronoun-by-group. It encompassed relations across time, space, and people.
Health. Component 8 significantly and positively predicted ∆V. We termed this the “health”
component. It loaded most strongly on biological processes-by-group, health-by-group, biological
processes, relativity-by-group, and health.
Social. Component 13 was significantly linked to an increase in ∆V. We categorized this as the
“social” component. Its largest factors were social-by-group, second-person pronoun-by-group, social,
present time-by-group, and time-by-group.
Figure 2. The effects on success (∆V) of all significant components (plotted as a mean split for
visualization only) by group membership (green=“for”, red=“against”).

Assent. Component 3 showed a trend toward a positive effect on ∆V. We called this the “assent”
category because of the outsized effects of the assent and assent-by-group factors. The remaining three of
its five top factors loaded far less strongly (nearly half the loading of assent-by-group and nearly
one-fourth that of assent): dictionary (i.e., any word included in LIWC), dictionary-by-group, and the
group main term.
Emotion. Component 5—the “emotion” component—was not significant but trended toward a
negative effect on ∆V. Interestingly, each of the component's top five factors was an interaction term
with group: negative emotion-by-group, cognitive mechanisms-by-group, anger-by-group, six-letter-
words-by-group (i.e., words of six or more letters), and affect-by-group.
Discussion
Communication is richly adaptive: It can build rapport and facilitate teamwork, but it can also
reflect conflict and change minds. While experimental studies have long focused on the former, today’s
ready availability of real-world data can shed new light on the latter. In that vein, we here explored a
novel corpus of debate transcripts to uncover how persuasion unfolds “in the wild.”
Figure 3. The effects on success (∆V) of all trending components (plotted as a mean split for
visualization only) by group membership (green=“for”, red=“against”).
To do so, we combined two approaches. First, we built a targeted analysis using expectations
from diverse perspectives on communication and persuasion. Next, data-driven explorations identified
how clusters of individual linguistic effects impact persuasiveness. Each provided unique insights into
this large-scale naturalistic corpus.
Our targeted analysis (M1) examined the effects of negativity, attractiveness, hedging, and first-
person plural pronouns on persuasiveness. We found only a significant effect of attractiveness: The
more attractive the debater, the more likely that the “for” group would win more votes—even if the
debater was part of the “against” group. This ran counter to our expectations, as we hypothesized that
all debaters should receive a boost in persuasiveness based on their attractiveness (e.g., Berggren et al.,
2010).
Our second group of analyses explored patterns across all LIWC variables. Interestingly,
debaters across groups used similar language patterns (M2), but as with attractiveness, some linguistic
patterns led one group to benefit more than the other. Modeling clusters of individual linguistic effects in
dimensionality-reduced space (M4), we found that three language clusters—relations, health, and social
components—were significantly associated with persuasiveness. Increased overall use of the health and
social components predicted an increase in net votes for the “for” group, while increased overall use of
the relations component was associated with an increase in net votes for the “against” group.
Our results suggest that the framing of a debate may significantly affect the ways in which
debaters’ arguments are perceived. Being “for” something appears to make an attractive debater more
persuasive, while the effect backfires for debaters who are “against” it. Debaters who are “against”
something are helped by an emphasis on relations (across space, time, and people), while debaters who
are “for” something see a boost by emphasizing health and social topics. Both effects are reversed for
members of the opposite team.
Again, these results were derived from nearly 70 debates with a range of topics, making it
unlikely that the idiosyncrasies of a single debate or debater drove the effects. The significant (relations,
health, and social) and trending (assent and emotion) components are central to daily life, likely making
them applicable to either side of any debate.
Although our dimensionality-reduced analyses revealed significant effects of language clusters,
our analysis of the (much sparser) raw LIWC data only trended toward significance (M3). Given the
richness of debate as a discourse activity, it may be that targeting specific language effects (M1 and M3)
cannot capture the clustered effects (M4) that form larger-scale language dynamics. We see our results as
supporting the idea that we can continue to unravel the complex interdependencies of human
communication through real-world datasets.
Conclusion
The explosion of freely available online data gives us the opportunity to explore natural language
“in the wild.” Building on these efforts, we introduce and analyze a novel corpus of publicly available
debate transcripts to explore language and persuasion outside the lab. We find that attractiveness and
various high-level language dynamics do not uniformly impact debaters’ success: Instead, these effects
are shaped by the framing of the debate, helping or hurting debaters based on whether they have been
cast as “for” or “against” the topic in the Oxford-style debate. By analyzing over 300 debaters across
nearly 70 debates, we can extract potential insight into the effects of language use and framing on
persuasion while abstracting away from idiosyncratic patterns of any one topic. Our findings support a
view of human reasoning as context sensitive (Mercier & Sperber, 2011) and provide interesting
implications for political and philosophical arenas.
Acknowledgments
Thanks to Jeff Yoshimi (University of California, Merced) for insight into argumentation theory
and feedback on an earlier version; to Stephanie Huette (University of Memphis), Teenie Matlock
(University of California, Merced), and Michael Spivey (University of California, Merced) for feedback
on earlier versions; and to research assistants Chelsea Coe, Alex Lau, Nicholas Rodriguez, and Amanda
Varela for help in data preparation.
References
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using
lme4. Journal of Statistical Software, 67(1), 1-48.
Berggren, N., Jordahl, H., & Poutvaara, P. (2010). The looks of a winner: Beauty and electoral success.
Journal of Public Economics, 94(1), 8–15.
Blair, J. A., & Tindale, C. W. (2011). Groundwork in the theory of argumentation: Selected papers of J.
Anthony Blair. Heidelberg: Springer.
Budesheim, T. L., & DePaola, S. J. (1994). Beauty or the beast? The effects of appearance, personality,
and issue information on evaluations of political candidates. Personality and Social Psychology
Bulletin, 20(4), 339–348.
Burgoon, M., Miller, M. D., Cohen, M., & Montgomery, C. L. (1978). An empirical test of a model of
resistance to persuasion. Human Communication Research, 5(1), 27–39.
Cohen, R. (1987). Analyzing the structure of argumentative discourse. Computational Linguistics,
13(1-2), 11–24.
Dafouz-Milne, E. (2008). The pragmatic role of textual and interpersonal metadiscourse markers in the
construction and attainment of persuasion: A cross-linguistic study of newspaper discourse.
Journal of Pragmatics, 40(1), 95–113.
Dung, P. M. (1995). On the acceptability of arguments and its fundamental role in nonmonotonic
reasoning, logic programming and n-person games. Artificial Intelligence, 77(2), 321–357.
Fuertes-Olivera, P. A., Velasco-Sacristán, M., Arribas-Baño, A., & Samaniego-Fernández, E. (2001).
Persuasion and advertising English: Metadiscourse in slogans and headlines. Journal of
Pragmatics, 33(8), 1291–1307.
Garramone, G. M. (1984). Voter responses to negative political ads. Journalism & Mass Communication
Quarterly, 61(2), 250–259.
Geer, J. G. (2008). In defense of negativity: Attack ads in presidential campaigns. Chicago, IL:
University of Chicago Press.
Jacobs, S. (2000). Rhetoric and dialectic from the standpoint of normative pragmatics. Argumentation,
14(3), 261–286.
Kahn, K. F., & Kenney, P. J. (1999). Do negative campaigns mobilize or suppress turnout? Clarifying
the relationship between negativity and participation. American Political Science Review, 93(4),
877–889.
Landauer, T. K., Foltz, P. W., & Laham, D. (1998). An introduction to latent semantic analysis.
Discourse Processes, 25(2-3), 259–284.
Lefcheck, J. S. (2015). piecewiseSEM: Piecewise structural equation modeling in R for ecology,
evolution, and systematics. Methods in Ecology and Evolution, 7(5), 573-579.
Matlock, T. (2012). Framing political messages with grammar and metaphor. American Scientist, 100,
478-483.
Mercier, H., & Sperber, D. (2011). Why do humans reason? Arguments for an argumentative theory.
Behavioral and Brain Sciences, 34(2), 57–74.
Meyer, D., Dimitriadou, E., Hornik, K., Weingessel, A., and Leisch, F. (2015). e1071: Misc functions of
the Department of Statistics, Probability Theory Group: R package version 1.6-7 [Computer
software]. TU Wien: https://CRAN.R-project.org/package=e1071.
Pennebaker, J. W., Francis, M. E., & Booth, R. J. (2007). Linguistic inquiry and word count: LIWC
[Computer software]. Austin, TX: liwc.net.
Roddy, B. L., & Garramone, G. M. (1988). Appeals and strategies of negative political advertising.
Journal of Broadcasting & Electronic Media, 32(4), 415–427.
van Eemeren, F. H. (1995). A world of difference: The rich state of argumentation theory. Informal
Logic, 17(2), 144–158.
van Eemeren, F. H., Garssen, B., Verheij, B., Krabbe, E. C. W., Henkemans, A. F. S., & Wagemans, J. H.
M. (2014). Handbook of argumentation theory. New York: Springer.
Yoshimi, J. (2004). Mapping the structure of debate. Informal Logic, 24(1), 1–21.
Appendix: Debater Ratings
Materials
Headshots of 357 debaters from 80 debates were downloaded from IQ2US
(http://intelligencesquaredus.org/debates/past-debates). Headshots were divided across 10 online surveys,
keeping all debaters within a debate in the same survey. Each survey presented 30-40 headshots
(M=39.3) to participants in random order. Headshots were 90-110 pixels (H) by 70-100 pixels (W).
Some debaters participated in multiple debates; these debaters’ headshots were presented only once in the
survey.
Participants rated headshots on 7 dimensions in random order. Five dimensions were presented
with 5 choices, with verbal anchors from Disagree strongly to Agree strongly: attractive, biased, hostile,
intelligent, and trustworthy. Two dimensions asked participants to rate along liberal/conservative and
expert/novice 5-point spectra (Looks very/somewhat X, Looks neither X nor Y, Looks somewhat/very Y).
Participants were not given information about the debater, debate(s), or specific opinion(s).
Participants
In return for credit, 497 unique participants completed the surveys through the online subject
pool system. Each participant completed an average of 1.9 individual surveys (range=1-9). Each survey
was completed by 91-97 participants (M=95). Participants were undergraduates at the University of
California, Merced and ranged in age (18-37; M=19 years), gender (females=324; males=155;
declined=2), and ethnicity (Asian=119; African-American=33; Caucasian=41; Hispanic=237; other=51).
The study was approved by the UC Merced Institutional Review Board.