
Retrieval by Authority

Artificial Intelligence

CMSC 25000

February 1, 2007


Roadmap

• Problem: – Matching Topics and Documents

• Challenge I: Beyond literal matching– Expansion Strategies

• Challenge II: Authoritative sources– Hubs & Authorities– PageRank


Key Issue

• All approaches operate on term matching– If a synonym, rather than the original term, is used, the approach fails

• Develop more robust techniques– Match “concept” rather than term

• Expansion approaches– Add in related terms to enhance matching

• Mapping techniques– Associate terms to concepts

» Aspect models, stemming


Expansion Techniques

• Can apply to query or document

• Thesaurus expansion– Use a linguistic resource (thesaurus, WordNet) to add synonyms/related terms

• Feedback expansion– Add terms that “should have appeared”

• User interaction– Direct or relevance feedback

• Automatic pseudo relevance feedback


Query Refinement

• Typical queries very short, ambiguous– Cat: animal/Unix command– Add more terms to disambiguate, improve

• Relevance feedback– Retrieve with original queries– Present results

• Ask user to tag relevant/non-relevant

– “Push” toward relevant vectors, away from non-relevant ones

– “Rocchio” expansion formula, with β + γ = 1 (e.g., 0.75, 0.25); r_j: the R relevant docs, s_k: the S non-relevant docs:

q_{i+1} = q_i + \frac{\beta}{R}\sum_{j=1}^{R} r_j - \frac{\gamma}{S}\sum_{k=1}^{S} s_k
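A minimal sketch of one Rocchio-style feedback round, assuming dense NumPy term vectors and the β = 0.75, γ = 0.25 split above; the toy term space and the rocchio_expand helper are illustrative, not from the slides:

```python
import numpy as np

def rocchio_expand(query, rel_docs, nonrel_docs, beta=0.75, gamma=0.25):
    """One round of Rocchio expansion: push the query vector toward
    relevant document vectors and away from non-relevant ones."""
    q = np.asarray(query, dtype=float)
    if rel_docs:
        q = q + beta * np.mean(rel_docs, axis=0)
    if nonrel_docs:
        q = q - gamma * np.mean(nonrel_docs, axis=0)
    return np.clip(q, 0, None)   # negative term weights are usually dropped

# Toy term space: [cat, unix, command, animal]
q0   = np.array([1.0, 0.0, 0.0, 0.0])
rel  = [np.array([1.0, 0.0, 0.0, 1.0])]   # page about the animal marked relevant
nonr = [np.array([0.5, 1.0, 1.0, 0.0])]   # Unix `cat` man page marked non-relevant
print(rocchio_expand(q0, rel, nonr))       # query now leans toward "animal" terms
```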


Compression Techniques

• Reduce surface term variation to concepts

• Stemming– Map inflectional variants to root
  • E.g. see, sees, seen, saw -> see
  • Crucial for highly inflected languages – Czech, Arabic

• Aspect models– Matrix representations typically very sparse– Reduce dimensionality to a small # of key aspects
  • Mapping contextually similar terms together
  • Latent semantic analysis
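A small illustrative sketch of the aspect-model idea via truncated SVD (the core of latent semantic analysis); the toy term-document matrix and the choice of k = 2 aspects are assumptions for demonstration only:

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents).
terms = ["car", "auto", "engine", "fruit", "apple"]
A = np.array([
    [1, 1, 0, 0],   # car
    [0, 1, 1, 0],   # auto
    [1, 1, 1, 0],   # engine
    [0, 0, 0, 1],   # fruit
    [0, 0, 1, 1],   # apple
], dtype=float)

# Truncated SVD: keep only k latent aspects.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
term_vecs = U[:, :k] * s[:k]     # terms in the reduced "concept" space
doc_vecs = Vt[:k, :].T * s[:k]   # documents in the same space

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Terms that share contexts end up close together even without exact matches.
print(cos(term_vecs[0], term_vecs[1]))   # car vs auto: relatively high
print(cos(term_vecs[0], term_vecs[3]))   # car vs fruit: low
```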


Authoritative Sources

• Based on vector space alone, what would you expect to get searching for “search engine”?– Would you expect to get Google?


Issue

Text isn’t always best indicator of content

Example:

• “search engine” – Text search -> review of search engines
  • The term doesn’t appear on search engine pages
  • The term probably appears on many pages that point to many search engines


Hubs & Authorities

• Not all sites are created equal– Finding “better” sites

• Question: What defines a good site?– Authoritative– Not just content, but connections!

• One that many other sites think is good
• A site that is pointed to by many other sites
  – An authority


Conferring Authority

• Authorities rarely link to each other– Competition

• Hubs:– Relevant sites point to prominent sites on topic

• Often not prominent themselves• Professional or amateur

• Good hubs -> good authorities


Computing HITS

• Finding Hubs and Authorities

• Two steps:
  – Sampling: find potential authorities
  – Weight-propagation: iteratively estimate the best hubs and authorities


Sampling

• Identify potential hubs and authorities– Connected subsections of web

• Select root set with standard text query

• Construct base set:– All nodes pointed to by root set– All nodes that point to root set

• Drop within-domain links

– 1000-5000 pages


Weight-propagation

• Weights:– Authority weight of page p: x_p – Hub weight of page p: y_p

• All weights are relative

• Updating:

x_p \leftarrow \sum_{q:\, q \to p} y_q \qquad\qquad y_p \leftarrow \sum_{q:\, p \to q} x_q

• Converges
• Pages with high x: good authorities; pages with high y: good hubs


Weight Propagation

• Create adjacency matrix A– A_{i,j} = 1 if i links to j, otherwise 0

• Create vectors x and y of corresponding values

• Converges to principal eigenvector

x = A^T y; \qquad y = A x

x \leftarrow A^T y = A^T (A x) = (A^T A)\, x \qquad\qquad y \leftarrow A x = A (A^T y) = (A A^T)\, y
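A minimal power-iteration sketch of the HITS updates above (authority x ← Aᵀy, hub y ← Ax, renormalized each step since only relative weights matter); the toy 4-page adjacency matrix and iteration count are illustrative assumptions:

```python
import numpy as np

def hits(A, iters=50):
    """Iterate x <- A^T y (authority) and y <- A x (hub), normalizing each step."""
    n = A.shape[0]
    x = np.ones(n)          # authority scores
    y = np.ones(n)          # hub scores
    for _ in range(iters):
        x = A.T @ y
        y = A @ x
        x /= np.linalg.norm(x)
        y /= np.linalg.norm(y)
    return x, y

# Toy base set: pages 0 and 1 are hubs pointing at pages 2 and 3.
A = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
], dtype=float)
auth, hub = hits(A)
print("authorities:", auth.round(3))   # pages 2 and 3 score highest
print("hubs:       ", hub.round(3))    # pages 0 and 1 score highest
```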


Google’s PageRank

• Identifies authorities– Important pages are those pointed to by many other pages– Better pointers, higher rank

– Ranks search results

– t: page pointing to A; C(t): number of outbound links• d: damping measure

– Actual ranking on logarithmic scale– Iterate

PR(A) = (1 - d) + d\left(\frac{PR(t_1)}{C(t_1)} + \cdots + \frac{PR(t_n)}{C(t_n)}\right)
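A short sketch iterating the (unnormalized) formula above on a toy link graph; the damping value d = 0.85, the iteration count, and the `links` structure are illustrative assumptions, not from the slides:

```python
import numpy as np

def pagerank(links, d=0.85, iters=50):
    """Iterate PR(A) = (1 - d) + d * sum_t PR(t)/C(t) over pages t linking to A.
    links[i] is the list of pages that page i links to."""
    n = len(links)
    pr = np.ones(n)
    out_count = [max(len(l), 1) for l in links]   # C(t); avoid divide-by-zero
    for _ in range(iters):
        new = np.full(n, 1.0 - d)
        for t, outs in enumerate(links):
            for a in outs:
                new[a] += d * pr[t] / out_count[t]
        pr = new
    return pr

# Toy graph: pages 0 and 1 both link to page 2; page 2 links back to 0.
links = [[2], [2], [0]]
print(pagerank(links).round(3))   # page 2 accumulates the highest rank
```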


Contrasts

• Internal links– Large sites carry more weight

• If well-designed

– H&A ignores site-internals

• Outbound links explicitly penalized

• Lots of tweaks….


Web Search

• Search by content– Vector space model

• Word-based representation
• “Aboutness” and “Surprise”
• Enhancing matches
• Simple learning model

• Search by structure– Authorities identified by link structure of web

• Hubs confer authority


Learning: Perceptrons

Artificial Intelligence

CMSC 25000

February 1, 2007


Agenda

• Neural Networks:– Biological analogy

• Perceptrons: Single-layer networks
• Perceptron training
• Perceptron convergence theorem
• Perceptron limitations

• Conclusions


Neurons: The Concept

[Figure: a neuron, labeling the dendrites, cell body, nucleus, and axon]

Neurons: receive inputs from other neurons (via synapses); when the summed input exceeds a threshold, the neuron “fires” and sends output along its axon to other neurons.

Brain: ~10^11 neurons, ~10^16 synapses


Artificial Neural Nets

• Simulated Neuron:– Node connected to other nodes via links

• Links = axon + synapse + link
• Links associated with a weight (like a synapse)

– Multiplied by output of node

– Node combines input via activation function• E.g. sum of weighted inputs passed thru threshold

• Simpler than real neuronal processes


Artificial Neural Net

[Figure: inputs x multiplied by weights w, summed, and passed through a threshold to produce the output]


Perceptrons

• Single neuron-like element– Binary inputs– Binary outputs

• Weighted sum of inputs > threshold


Perceptron Structure

[Figure: inputs x0 = 1, x1, x2, x3, …, xn with weights w0, w1, w2, w3, …, wn feeding a single threshold unit that outputs y]

y = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} w_i x_i > 0 \\ 0 & \text{otherwise} \end{cases}

x_0 = 1, so w_0 compensates for the threshold
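A tiny sketch of this decision rule, with x0 = 1 prepended so that w0 plays the role of the threshold; the hand-set AND weights are just an illustration:

```python
def perceptron_output(weights, inputs):
    """y = 1 if sum_i w_i * x_i > 0 else 0, with x_0 = 1 so that
    w_0 plays the role of (minus) the threshold."""
    x = [1.0] + list(inputs)            # prepend the constant bias input x0 = 1
    total = sum(w * xi for w, xi in zip(weights, x))
    return 1 if total > 0 else 0

# A hand-set AND unit: fires only when both inputs are on.
print(perceptron_output([-1.5, 1.0, 1.0], [1, 1]))   # 1
print(perceptron_output([-1.5, 1.0, 1.0], [1, 0]))   # 0
```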


Perceptron Convergence Procedure

• Straight-forward training procedure– Learns linearly separable functions

• Until the perceptron yields correct output for all samples:
  – If the perceptron is correct, do nothing
  – If the perceptron is wrong:
    • If it incorrectly says “yes”, subtract the input vector from the weight vector
    • Otherwise, add the input vector to the weight vector


Perceptron Convergence Example

• LOGICAL-OR:

Sample  x1  x2  x3  Desired output
1       0   0   1   0
2       0   1   1   1
3       1   0   1   1
4       1   1   1   1

• Initial: w = (0 0 0); after S2, w = w + s2 = (0 1 1)
• Pass 2: S1: w = w - s1 = (0 1 0); S3: w = w + s3 = (1 1 1)
• Pass 3: S1: w = w - s1 = (1 1 0)
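A small sketch of the convergence procedure that reproduces this LOGICAL-OR trace; the train_perceptron helper and the pass limit are illustrative choices:

```python
def train_perceptron(samples, max_passes=10):
    """Perceptron convergence procedure: on a mistake, add the input vector
    if the answer should have been 1, subtract it if it should have been 0."""
    w = [0.0] * len(samples[0][0])
    for p in range(max_passes):
        mistakes = 0
        for x, desired in samples:
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
            if y != desired:
                mistakes += 1
                sign = 1 if desired == 1 else -1
                w = [wi + sign * xi for wi, xi in zip(w, x)]
        print(f"pass {p + 1}: w = {w}")
        if mistakes == 0:
            break
    return w

# LOGICAL-OR samples from the slide: (x1, x2, x3) with x3 = 1 as the bias input.
samples = [((0, 0, 1), 0), ((0, 1, 1), 1), ((1, 0, 1), 1), ((1, 1, 1), 1)]
train_perceptron(samples)   # reproduces the trace and ends at w = (1, 1, 0)
```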


Perceptron Convergence Theorem

• If there exists a weight vector v that separates the data, perceptron training will find one

• Assume |v| = 1 and v·x > δ for all positive examples x (negative examples can be folded in by negating them)

• After k mistakes, the weight vector is the sum of the misclassified inputs:

w_k = x_1 + x_2 + \cdots + x_k, \quad\text{so}\quad v \cdot w_k \geq k\delta

• ||w||^2 increases by at most ||x||^2 on each mistake (updates happen only when w·x ≤ 0):

\|w + x\|^2 \leq \|w\|^2 + \|x\|^2 \quad\Rightarrow\quad \|w_k\|^2 \leq k\,\max\|x\|^2

• Since v·w / ||w|| ≤ 1, we get kδ ≤ √k · max||x||

• Converges in k ≤ (max||x|| / δ)^2 steps


Perceptron Learning

• Perceptrons learn linear decision boundaries• E.g.

[Figure: examples plotted in the (x1, x2) plane; a single line separates the + points from the 0 points. For XOR, no single line can separate the + points from the 0 points.]

XOR:
x1  x2
-1  -1   w1·x1 + w2·x2 < 0
 1  -1   w1·x1 + w2·x2 > 0  => implies w1 > 0
 1   1   w1·x1 + w2·x2 > 0  => but should be false
-1   1   w1·x1 + w2·x2 > 0  => implies w2 > 0


Perceptron Example

• Digit recognition– Assume display= 8 lightable bars– Inputs – on/off + threshold – 65 steps to recognize “8”


Perceptron Summary

• Motivated by neuron activation

• Simple training procedure

• Guaranteed to converge – IF linearly separable


Neural Nets

• Multi-layer perceptrons– Inputs: real-valued– Intermediate “hidden” nodes– Output(s): one (or more) discrete-valued

[Figure: inputs X1–X4 feed two layers of hidden nodes, which feed outputs Y1 and Y2]


Neural Nets

• Pro: More general than perceptrons– Not restricted to linear discriminants– Multiple outputs: one classification each

• Con: No simple, guaranteed training procedure– Use greedy, hill-climbing procedure to train– “Gradient descent”, “Backpropagation”


Solving the XOR Problem

[Figure: XOR network – inputs x1, x2 and bias inputs (-1) feed hidden units o1, o2 through weights w11, w21, w01 and w12, w22, w02; o1, o2 and a bias feed the output y through weights w13, w23, w03]

Network topology: 2 hidden nodes, 1 output

Desired behavior:
x1  x2  o1  o2  y
0   0   0   0   0
1   0   0   1   1
0   1   0   1   1
1   1   1   1   0

Weights: w11 = w12 = 1; w21 = w22 = 1; w01 = 3/2; w02 = 1/2; w03 = 1/2; w13 = -1; w23 = 1
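A quick sketch that plugs the weights above into step-threshold units and checks the desired-behavior table (o1 acts as an AND gate, o2 as an OR gate); the helper names are illustrative:

```python
def step(z):
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    """Two threshold hidden units feeding a threshold output,
    using the weights from the slide."""
    o1 = step(1 * x1 + 1 * x2 - 1.5)   # w11 = w21 = 1, threshold w01 = 3/2 -> AND
    o2 = step(1 * x1 + 1 * x2 - 0.5)   # w12 = w22 = 1, threshold w02 = 1/2 -> OR
    y = step(-1 * o1 + 1 * o2 - 0.5)   # w13 = -1, w23 = 1, threshold w03 = 1/2
    return o1, o2, y

for x1, x2 in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x1, x2, *xor_net(x1, x2))    # matches the desired-behavior table
```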


Neural Net Applications

• Speech recognition

• Handwriting recognition

• NETtalk: Letter-to-sound rules

• ALVINN: Autonomous driving


ALVINN

• Driving as a neural network
• Inputs:– Image pixel intensities (i.e., lane lines)
• 5 hidden nodes
• Outputs:– Steering actions (e.g., turn left/right; how far)
• Training:– Observe human behavior: sample images, steering


Backpropagation

• Greedy, hill-climbing procedure– Weights are the parameters to change
  – Original hill-climbing changes one parameter per step
    • Slow
  – If the function is smooth, change all parameters per step

• Gradient descent– Backpropagation: computes the current output, then works backward to correct the error


Producing a Smooth Function

• Key problem: – Pure step threshold is discontinuous

• Not differentiable

• Solution: – Sigmoid (squashed ‘s’ function): Logistic fn

z = \sum_{i=1}^{n} w_i x_i \qquad\qquad s(z) = \frac{1}{1 + e^{-z}}
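A tiny sketch of the logistic function and its derivative s'(z) = s(z)(1 − s(z)), which is what keeps the backpropagation updates later in the lecture compact; the sample z values are arbitrary:

```python
import numpy as np

def sigmoid(z):
    """Smooth, differentiable replacement for the hard threshold."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-4, 4, 5)
s = sigmoid(z)
print(s.round(3))             # squashed into (0, 1)
print((s * (1 - s)).round(3)) # derivative s(z) * (1 - s(z)), largest near z = 0
```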


Neural Net Training

• Goal:– Determine how to change weights to get correct output
  • Large change in weight to produce large reduction in error

• Approach:
  • Compute actual output: o
  • Compare to desired output: d
  • Determine the effect of each weight w on the error = d - o
  • Adjust weights


Neural Net Example

[Figure: 2-2-1 network – inputs x1, x2 and bias inputs (-1) feed hidden units z1, z2 (outputs y1, y2) through weights w11, w21, w01 and w12, w22, w02; y1, y2 and a bias feed the output unit z3 (output y3) through weights w13, w23, w03]

Sum-of-squares error over training samples:

E = \frac{1}{2}\sum_i \left(y_i^* - F(x_i, w)\right)^2

x_i: ith sample input vector; w: weight vector; y_i^*: desired output for the ith sample

Full expression of the output in terms of the inputs and weights (the inner sums are z_1 and z_2; the outer sum is z_3):

y_3 = F(x, w) = s\big(w_{13}\, s(w_{11}x_1 + w_{21}x_2 - w_{01}) + w_{23}\, s(w_{12}x_1 + w_{22}x_2 - w_{02}) - w_{03}\big)

From MIT 6.034 notes, Lozano-Perez


Gradient Descent

• Error: Sum of squares error of inputs with current weights

• Compute rate of change of error with respect to each weight– Which weights have the greatest effect on error?– Effectively, partial derivatives of error with respect to the weights
  • These in turn depend on other weights => chain rule

Page 41: Retrieval by Authority Artificial Intelligence CMSC 25000 February 1, 2007.

Gradient Descent

• E = G(w)– Error as a function of the weights

• Find the rate of change of the error– Follow the steepest rate of change– Change weights so that error is minimized

[Figure: error E = G(w) plotted against weight w, with gradient dG/dw; following the steepest descent from w0 toward w1 can end in a local minimum]


MIT AI lecture notes, Lozano-Perez 2000

Gradient of Error

y_3 = F(x, w) = s\big(w_{13}\, s(w_{11}x_1 + w_{21}x_2 - w_{01}) + w_{23}\, s(w_{12}x_1 + w_{22}x_2 - w_{02}) - w_{03}\big)

E = \frac{1}{2}\sum_i \left(y_i^* - F(x_i, w)\right)^2 \qquad \frac{\partial E}{\partial w_j} = -\sum_i \left(y_i^* - y_3\right)\frac{\partial y_3}{\partial w_j}

For a weight into the output unit, e.g. w_{13}:

\frac{\partial y_3}{\partial w_{13}} = \frac{\partial s(z_3)}{\partial z_3}\,\frac{\partial z_3}{\partial w_{13}} = s(z_3)(1 - s(z_3))\, y_1

For a weight into a hidden unit, e.g. w_{11}, the chain rule continues through z_3 and z_1:

\frac{\partial y_3}{\partial w_{11}} = s(z_3)(1 - s(z_3))\, w_{13}\, s(z_1)(1 - s(z_1))\, x_1

Note: derivative of the sigmoid: ds(z)/dz = s(z)(1 - s(z))

[Figure: the same 2-2-1 network, labeling weights w11, w21, w01, w12, w22, w02, w13, w23, w03, hidden sums z1, z2 with outputs y1, y2, and output unit z3 with output y3]

From MIT 6.034 notes, Lozano-Perez
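A short sketch that evaluates the two chain-rule gradients above for the 2-2-1 network and compares them against finite differences; the weight values, input, and the `numeric` helper are illustrative assumptions:

```python
import numpy as np

def s(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(w, x1, x2):
    """y3 = s(w13*s(z1) + w23*s(z2) - w03) for the 2-2-1 network."""
    z1 = w["w11"] * x1 + w["w21"] * x2 - w["w01"]
    z2 = w["w12"] * x1 + w["w22"] * x2 - w["w02"]
    z3 = w["w13"] * s(z1) + w["w23"] * s(z2) - w["w03"]
    return z1, z2, z3, s(z3)

# Arbitrary illustrative weights and input.
w = dict(w11=0.2, w21=-0.4, w01=0.1, w12=0.7, w22=0.3, w02=-0.2,
         w13=0.5, w23=-0.6, w03=0.05)
x1, x2 = 1.0, 0.0
z1, z2, z3, y3 = forward(w, x1, x2)

# Chain-rule gradients from the slide.
dy3_dw13 = s(z3) * (1 - s(z3)) * s(z1)
dy3_dw11 = s(z3) * (1 - s(z3)) * w["w13"] * s(z1) * (1 - s(z1)) * x1

# Finite-difference check of dy3/dw for a named weight.
def numeric(name, eps=1e-6):
    wp = dict(w)
    wp[name] += eps
    return (forward(wp, x1, x2)[3] - y3) / eps

print(dy3_dw13, numeric("w13"))   # analytic vs numeric: should agree closely
print(dy3_dw11, numeric("w11"))
```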


From Effect to Update

• Gradient computation:– How each weight contributes to performance

• To train:– Need to determine how to CHANGE each weight based on its contribution to performance– Need to determine how MUCH change to make per iteration
  • Rate parameter ‘r’
    – Large enough to learn quickly
    – Small enough to reach, but not overshoot, target values


Backpropagation Procedure

• Pick rate parameter ‘r’

• Until performance is good enough:
  – Do forward computation to calculate output
  – Compute β in the output node with  \beta_z = d_z - o_z
  – Compute β in all other nodes with  \beta_j = \sum_k w_{j \to k}\, o_k (1 - o_k)\, \beta_k
  – Compute the change for all weights with  \Delta w_{i \to j} = r\, o_i\, o_j (1 - o_j)\, \beta_j


Backprop Example

[Figure: the same 2-2-1 network – inputs x1, x2, hidden units z1, z2 with outputs y1, y2, output unit z3 with output y3, bias inputs -1, and weights w11, w21, w01, w12, w22, w02, w13, w23, w03]

Forward prop: compute z_i and y_i given x_k, w_l

\beta_3 = y_3^* - y_3 \qquad \beta_1 = y_3(1 - y_3)\, w_{13}\, \beta_3 \qquad \beta_2 = y_3(1 - y_3)\, w_{23}\, \beta_3

Output-layer updates:

w_{13} \leftarrow w_{13} + r\, y_1\, y_3(1 - y_3)\, \beta_3 \qquad w_{23} \leftarrow w_{23} + r\, y_2\, y_3(1 - y_3)\, \beta_3 \qquad w_{03} \leftarrow w_{03} + r\,(-1)\, y_3(1 - y_3)\, \beta_3

Hidden-layer updates:

w_{11} \leftarrow w_{11} + r\, x_1\, y_1(1 - y_1)\, \beta_1 \qquad w_{21} \leftarrow w_{21} + r\, x_2\, y_1(1 - y_1)\, \beta_1 \qquad w_{01} \leftarrow w_{01} + r\,(-1)\, y_1(1 - y_1)\, \beta_1

w_{12} \leftarrow w_{12} + r\, x_1\, y_2(1 - y_2)\, \beta_2 \qquad w_{22} \leftarrow w_{22} + r\, x_2\, y_2(1 - y_2)\, \beta_2 \qquad w_{02} \leftarrow w_{02} + r\,(-1)\, y_2(1 - y_2)\, \beta_2


Backpropagation Observations

• Procedure is (relatively) efficient– All computations are local

• Use inputs and outputs of current node

• What is “good enough”?– Rarely reach target (0 or 1) outputs

• Typically, train until within 0.1 of target


Neural Net Summary

• Training:– Backpropagation procedure

• Gradient descent strategy (usual problems)

• Prediction:– Compute outputs based on input vector & weights

• Pros: Very general; fast prediction
• Cons: Training can be VERY slow (1000’s of epochs); overfitting


Training Strategies

• Online training:– Update weights after each sample

• Offline (batch training):– Compute error over all samples

• Then update weights

• Online training “noisy”– Sensitive to individual instances– However, may escape local minima


Training Strategy

• To avoid overfitting:– Split data into: training, validation, & test

• Also, avoid excess weights (less than # samples)

• Initialize with small random weights– Small changes have noticeable effect

• Use offline training – Until validation set minimum

• Evaluate on test set – No more weight changes


Classification

• Neural networks best for classification task– Single output -> Binary classifier– Multiple outputs -> Multiway classification

• Applied successfully to learning pronunciation

– Sigmoid pushes outputs toward binary decisions
  • Not good for regression


Neural Net Example

• NETtalk: letter-to-sound by neural net
• Inputs:
  – Need context to pronounce
  – 7-letter window: predict the sound of the middle letter
  – 29 possible characters (alphabet + space + , + .)
  – 7 × 29 = 203 inputs
• 80 hidden nodes
• Output: generate 60 phones
  – Nodes map to 26 units: 21 articulatory, 5 for stress/syllable boundaries
  – Vector quantization of acoustic space


Neural Net Example: NETtalk

• Learning to talk:– 5 iterations/1024 training words: bound/stress– 10 iterations: intelligible– 400 new test words: 80% correct

• Not as good as DecTalk, but automatic


Neural Net Conclusions

• Simulation based on neurons in brain

• Perceptrons (single neuron)– Guaranteed to find linear discriminant

• IF one exists (problem: XOR is not linearly separable)

• Neural nets (Multi-layer perceptrons)– Very general– Backpropagation training procedure

• Gradient descent - local min, overfitting issues