Multilayer Networks


Transcript of Multilayer Networks

Page 1: Multilayer Networks

Multilayer Networks

Machine Learning: Jordan Boyd-Graber
University of Maryland
INTRODUCTION

Page 2: Multilayer Networks

Deep Learning was once known as “Neural Networks”


Page 3: Multilayer Networks

But it came back . . .

[The slide shows the first page of the paper:]

Political Ideology Detection Using Recursive Neural Networks
Mohit Iyyer, Peter Enns, Jordan Boyd-Graber, Philip Resnik
Computer Science, Linguistics, iSchool, and UMIACS, University of Maryland
{miyyer,peter,jbg}@umiacs.umd.edu, [email protected]

Abstract: An individual's words often reveal their political ideology. Existing automated techniques to identify ideology from text focus on bags of words or wordlists, ignoring syntax. Taking inspiration from recent work in sentiment analysis that successfully models the compositional aspect of language, we apply a recursive neural network (RNN) framework to the task of identifying the political position evinced by a sentence. To show the importance of modeling subsentential elements, we crowdsource political annotations at a phrase and sentence level. Our model outperforms existing models on our newly annotated dataset and an existing dataset.

[Its Figure 1 parses the sentence "They dubbed it the 'death tax' and created a big lie about its adverse effects on small businesses" as an example of compositionality in ideological bias detection (red → conservative, blue → liberal, gray → neutral), in which modifier phrases and punctuation cause polarity switches at higher levels of the parse tree.]

• More data

• Better tricks (regularization)

• Faster computers


Page 4: Multilayer Networks

And companies are investing . . .


Pages 7–11: Multilayer Networks

Map inputs to output

Input: a vector $x_1, \ldots, x_d$ (inputs encoded as real numbers)

Output:

$$f\left(\sum_i W_i x_i + b\right)$$

(multiply the inputs by weights, then add a bias)

Activation:

$$f(z) \equiv \frac{1}{1 + \exp(-z)}$$

(pass the result through the nonlinear sigmoid)
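A minimal NumPy sketch of this map, assuming a single sigmoid unit (the function and variable names are mine, not the slides'):

```python
import numpy as np

def sigmoid(z):
    """Logistic activation: squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, W, b):
    """Map an input vector to an output: weighted sum, bias, nonlinearity."""
    z = np.dot(W, x) + b      # multiply inputs by weights, then add bias
    return sigmoid(z)         # pass through the nonlinear sigmoid

# Example: three inputs with arbitrary weights and bias
x = np.array([1.0, 0.5, -0.2])
W = np.array([0.4, -0.6, 0.9])
b = 0.1
print(neuron(x, W, b))        # a single activation in (0, 1)
```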

Page 12: Multilayer Networks

Why is it called activation?


Page 13: Multilayer Networks

In the shallow end

• This is still logistic regression

• Engineering the features x is difficult (and requires expertise)

• Can we instead learn how to represent the inputs to the final decision? (see the sketch below)
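A minimal sketch of that idea, assuming one hidden layer whose weights are learned rather than hand-engineered (the names and layer sizes are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def two_layer_network(x, W1, b1, W2, b2):
    """Hidden layer learns a representation; output layer is logistic regression on it."""
    h = sigmoid(W1 @ x + b1)     # learned representation of the raw inputs
    return sigmoid(W2 @ h + b2)  # final decision, same form as before

# Example: 3 raw inputs -> 4 hidden units -> 1 output
rng = np.random.default_rng(0)
x = np.array([1.0, 0.5, -0.2])
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
print(two_layer_network(x, W1, b1, W2, b2))
```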


Page 14: Multilayer Networks

Better name: non-linearity

• Logistic / Sigmoid

$$f(x) = \frac{1}{1 + e^{-x}} \qquad (1)$$

• tanh

$$f(x) = \tanh(x) = \frac{2}{1 + e^{-2x}} - 1 \qquad (2)$$

• ReLU

$$f(x) = \begin{cases} 0 & \text{for } x < 0 \\ x & \text{for } x \geq 0 \end{cases} \qquad (3)$$

• SoftPlus: $f(x) = \ln(1 + e^x)$
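A NumPy sketch of these four nonlinearities, vectorized over arrays (a minimal rendering of Eqs. 1–3 and SoftPlus; the function names are my own):

```python
import numpy as np

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))              # Eq. (1)

def tanh(x):
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0  # Eq. (2); matches np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)                    # Eq. (3)

def softplus(x):
    return np.log1p(np.exp(x))                   # ln(1 + e^x), a smooth ReLU

xs = np.linspace(-2.0, 2.0, 5)
for f in (logistic, tanh, relu, softplus):
    print(f.__name__, f(xs))
```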


Page 15: Multilayer Networks

But it is not perfect

• Compare against baselines: randomized features, nearest neighbors, linear models (see the sketch below)

• Optimization is hard (alchemy)

• Models are often not interpretable

• Requires specialized hardware and tons of data to scale
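One way to run such baselines, sketched with scikit-learn on synthetic data (the dataset and model settings here are placeholders, not from the slides):

```python
from sklearn.datasets import make_classification
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for a real dataset
X, y = make_classification(n_samples=500, random_state=0)

baselines = {
    "linear model": LogisticRegression(max_iter=1000),
    "nearest neighbors": KNeighborsClassifier(),
    "random features + linear": make_pipeline(
        RBFSampler(random_state=0), LogisticRegression(max_iter=1000)
    ),
}

# A deep model should beat all of these to justify its cost
for name, model in baselines.items():
    print(name, cross_val_score(model, X, y, cv=5).mean())
```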
