Reinforcement Learning - Indian Institute of Technology...
![Page 1: Reinforcement Learning - Indian Institute of Technology Delhimausam/courses/col864/spring2017/slides/09-rlie.pdf · Reinforcement Learning Karthik Narasimhan, Adam Yala, Regina Barzilay](https://reader034.fdocuments.net/reader034/viewer/2022042021/5e78adab86e40a6d1924977a/html5/thumbnails/1.jpg)
Reinforcement Learning
An (almost) quick (and very incomplete) introduction
Slides from David Silver, Dan Klein, Mausam, Dan Weld
Reinforcement Learning
At each time step t:
• Agent executes an action A_t
• Environment emits a reward R_t
• Agent transitions to state S_t
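The loop above can be sketched in a few lines of Python. This is a minimal illustration only: `CoinFlipEnv`, its `reset`/`step` methods, and `run_episode` are hypothetical names invented for this sketch, not part of the slides or of any library.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment: guess a biased coin, reward 1 for a correct guess."""

    def __init__(self, p_heads=0.7, horizon=5, seed=0):
        self.rng = random.Random(seed)
        self.p_heads = p_heads
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t  # the state is simply the step counter

    def step(self, action):
        outcome = 1 if self.rng.random() < self.p_heads else 0
        reward = 1.0 if action == outcome else 0.0
        self.t += 1
        done = self.t >= self.horizon
        return self.t, reward, done


def run_episode(env, policy):
    """Run one episode and record (state, action, reward, next_state) tuples."""
    state = env.reset()
    trajectory = []
    done = False
    while not done:
        action = policy(state)                       # agent executes an action
        next_state, reward, done = env.step(action)  # environment emits a reward
        trajectory.append((state, action, reward, next_state))
        state = next_state                           # agent transitions to the new state
    return trajectory


# An agent that always guesses heads (action 1).
trajectory = run_episode(CoinFlipEnv(), policy=lambda s: 1)
```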
Rat Example
• What if agent state = last 3 items in sequence?
• What if agent state = counts for lights, bells and levers?
• What if agent state = complete sequence?
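The three candidate agent states can be compared on a concrete history. The sketch below is illustrative only: the two histories `h1` and `h2` are made up for this example (the sequences on the original slide did not survive as text), and the function names are hypothetical.

```python
from collections import Counter


def last_three(history):
    """Agent state = last 3 items in the sequence."""
    return tuple(history[-3:])


def counts(history):
    """Agent state = counts for lights, bells and levers."""
    c = Counter(history)
    return (c["light"], c["bell"], c["lever"])


def complete(history):
    """Agent state = the complete sequence."""
    return tuple(history)


# Two hypothetical histories that end the same way but differ earlier on.
h1 = ["light", "light", "lever", "bell"]
h2 = ["bell", "light", "lever", "bell"]

same_last_three = last_three(h1) == last_three(h2)  # indistinguishable under this state
same_counts = counts(h1) == counts(h2)              # distinguishable under this state
```

Whatever the agent predicts from a state, it must predict the same thing for every history that maps to that state, so the choice of representation determines what the agent can learn.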
Major Components of RL
An RL agent may include one or more of these components:
• Policy: agent’s behaviour function
• Value function: how good is each state and/or action
• Model: agent’s representation of the environment
Policy
• A policy is the agent’s behaviour
• It is a map from state to action
• Deterministic policy: a = π(s)
• Stochastic policy: π(a|s) = P[A_t = a | S_t = s]
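A small sketch of the two kinds of policy, assuming a made-up two-state, two-action problem (the state and action names and the probabilities below are hypothetical):

```python
import random

# Deterministic policy: a = pi(s), a plain lookup from state to action.
DETERMINISTIC = {"low_battery": "recharge", "ok_battery": "search"}


def deterministic_policy(state):
    return DETERMINISTIC[state]


# Stochastic policy: pi(a|s), a probability distribution over actions per state.
STOCHASTIC = {
    "low_battery": {"recharge": 0.9, "search": 0.1},
    "ok_battery": {"recharge": 0.2, "search": 0.8},
}


def stochastic_policy(state, rng):
    actions, probs = zip(*STOCHASTIC[state].items())
    # random.choices samples one action according to the given weights.
    return rng.choices(actions, weights=probs, k=1)[0]


rng = random.Random(0)
sampled = [stochastic_policy("ok_battery", rng) for _ in range(10)]
```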
Value function
• Value function is a prediction of future reward
• Used to evaluate the goodness/badness of states…
• …and therefore to select between actions
Model
• A model predicts what the environment will do next
• It predicts the next state…
• …and predicts the next (immediate) reward
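One simple way to obtain such a model is to estimate it from observed transitions by counting. The class below is a sketch under that assumption; `TabularModel` and its methods are hypothetical names, and the slides do not prescribe this particular estimator.

```python
from collections import defaultdict


class TabularModel:
    """Empirical model: next-state frequencies and mean reward per (state, action)."""

    def __init__(self):
        self.next_counts = defaultdict(lambda: defaultdict(int))
        self.reward_sum = defaultdict(float)
        self.visits = defaultdict(int)

    def update(self, s, a, r, s_next):
        self.next_counts[(s, a)][s_next] += 1
        self.reward_sum[(s, a)] += r
        self.visits[(s, a)] += 1

    def next_state_probs(self, s, a):
        # Assumes (s, a) has been observed at least once.
        n = self.visits[(s, a)]
        return {s2: c / n for s2, c in self.next_counts[(s, a)].items()}

    def expected_reward(self, s, a):
        # Assumes (s, a) has been observed at least once.
        return self.reward_sum[(s, a)] / self.visits[(s, a)]


model = TabularModel()
for r, s_next in [(1.0, 1), (0.0, 1), (1.0, 2), (0.0, 1)]:
    model.update(0, "a", r, s_next)
```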
Dimensions of RL
Model-based vs. Model-free
• Model-based: Have or learn action models (i.e. transition probabilities).
• Uses Dynamic Programming
• Model-free: Skip them and directly learn what action to do when (without necessarily finding out the exact model of the action)
• e.g. Q-learning
On Policy vs. Off Policy
• On Policy: Makes estimates based on a policy, and improves that policy based on the estimates.
• Learning on the job.
• e.g. SARSA
• Off Policy: Learn a policy while following another (or re-using experience from an old policy).
• Looking over someone's shoulder
• e.g. Q-learning
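The on-policy/off-policy split shows up directly in the update target. In the sketch below (function names and the tiny Q-table are hypothetical), SARSA bootstraps from the action the agent actually takes next, while Q-learning bootstraps from the greedy action regardless of what is taken.

```python
def sarsa_target(q, reward, next_state, next_action, gamma):
    """On-policy: uses the next action chosen by the policy being followed."""
    return reward + gamma * q[next_state][next_action]


def q_learning_target(q, reward, next_state, gamma):
    """Off-policy: uses the greedy next action, whatever the agent does next."""
    return reward + gamma * max(q[next_state].values())


# Hypothetical action-value estimates for one successor state.
q = {"s1": {"left": 1.0, "right": 3.0}}

# Suppose the exploratory behaviour policy happens to pick "left" next.
on_policy = sarsa_target(q, reward=0.5, next_state="s1", next_action="left", gamma=0.9)
off_policy = q_learning_target(q, reward=0.5, next_state="s1", gamma=0.9)
```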
Markov Decision Process
• Set of states S = {s_i}
• Set of actions for each state A(s) = {asi} (often independent of state)
• Transition model T(s -> s’ | a) = Pr(s’ | a, s)
• Reward model R(s, a, s’)
• Discount factor γ
MDP = <S, A, T, R, γ>
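When T and R are known, the MDP can be solved by dynamic programming. The sketch below uses value iteration, a standard dynamic-programming method that the slides do not name explicitly; the two-state MDP, its action names, and its numbers are invented for illustration.

```python
# T[s][a] is a list of (probability, next_state, reward) triples,
# i.e. T(s -> s' | a) and R(s, a, s') stored together.
T = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(0.8, "B", 1.0), (0.2, "A", 0.0)]},
    "B": {"stay": [(1.0, "B", 2.0)], "go": [(1.0, "A", 0.0)]},
}
GAMMA = 0.9


def value_iteration(T, gamma, tol=1e-8):
    """Repeat the Bellman optimality backup until values change by less than tol."""
    V = {s: 0.0 for s in T}
    while True:
        delta = 0.0
        for s in T:
            best = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for outcomes in T[s].values()
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


V = value_iteration(T, GAMMA)
```

In this toy problem, staying in B forever earns 2 per step, so V(B) approaches 2 / (1 − 0.9) = 20, and V(A) solves V(A) = 0.8 · (1 + 0.9 · 20) + 0.2 · 0.9 · V(A), i.e. 15.2 / 0.82.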
Bellman Equation for Value Function
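The equation on this slide did not survive as text. For reference, the standard Bellman expectation equation for the state-value function of a policy π, written here in the MDP notation <S, A, T, R, γ> (the choice of notation is an assumption of this note, not necessarily the slide's), is:

$$
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} T(s \to s' \mid a)\,\bigl[ R(s, a, s') + \gamma\, V^{\pi}(s') \bigr]
$$

The value of a state is the expected immediate reward plus the discounted value of the successor state, averaged over the policy's action choice and the transition model.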
Bellman Equation for Action-Value Function
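The equation on this slide did not survive as text. For reference, the standard Bellman expectation equation for the action-value function of a policy π, in the MDP notation <S, A, T, R, γ> (the notation is an assumption of this note), is:

$$
Q^{\pi}(s, a) = \sum_{s'} T(s \to s' \mid a)\,\Bigl[ R(s, a, s') + \gamma \sum_{a'} \pi(a' \mid s')\, Q^{\pi}(s', a') \Bigr]
$$

Here the first action a is fixed, and π is followed from the successor state onwards.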
Q vs V
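The content of this slide did not survive as text. As a reminder, the standard relations between the two functions (stated from general RL theory, not recovered from the slide) are:

$$
V^{\pi}(s) = \sum_{a} \pi(a \mid s)\, Q^{\pi}(s, a), \qquad
V^{*}(s) = \max_{a} Q^{*}(s, a), \qquad
\pi^{*}(s) = \arg\max_{a} Q^{*}(s, a)
$$

The last relation is the practical difference: acting greedily from Q needs no transition model, whereas acting greedily from V requires a one-step lookahead through T and R.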
Exploration vs Exploitation
• Restaurant Selection
• Exploitation: Go to your favourite restaurant
• Exploration: Try a new restaurant
• Online Banner Advertisements
• Exploitation: Show the most successful advert
• Exploration: Show a different advert
• Oil Drilling
• Exploitation: Drill at the best known location
• Exploration: Drill at a new location
• Game Playing
• Exploitation: Play the move you believe is best
• Exploration: Play an experimental move
ε-Greedy solution
• Simplest idea for ensuring continual exploration
• All m actions are tried with non-zero probability
• With probability 1 − ε choose the greedy action
• With probability ε choose an action at random
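The rule above fits in a few lines. This is a minimal sketch; the function name and the list-of-values representation of the action-value estimates are assumptions of the example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick an action index given a list of action-value estimates."""
    if rng.random() < epsilon:
        # Explore: uniform over all m actions (the greedy one included).
        return rng.randrange(len(q_values))
    # Exploit: the action with the highest estimated value.
    return max(range(len(q_values)), key=q_values.__getitem__)


rng = random.Random(0)
q_values = [0.1, 0.7, 0.3]
greedy_only = {epsilon_greedy(q_values, 0.0, rng) for _ in range(100)}
random_only = {epsilon_greedy(q_values, 1.0, rng) for _ in range(1000)}
```

Because the random branch may also land on the greedy action, the greedy action is chosen with total probability 1 − ε + ε/m and every other action with probability ε/m.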
Off Policy Learning
• Evaluate target policy π(a|s) to compute v_π(s) or q_π(s,a) while following behaviour policy μ(a|s)
{S_1, A_1, R_2, ..., S_T} ∼ μ
• Why is this important?
• Learn from observing humans or other agents
• Re-use experience generated from old policies π_1, π_2, ..., π_{t−1}
• Learn about optimal policy while following exploratory policy
• Learn about multiple policies while following one policy
Q - Learning
• We now consider off-policy learning of action-values Q(s,a)
• Next action is chosen using behaviour policy A_{t+1} ∼ μ(·|S_{t+1})
• But we consider alternative successor action A′ ∼ π(·|S_{t+1})
• And update Q(S_t, A_t) towards value of alternative action
Q - Learning
• We now allow both behaviour and target policies to improve
• The target policy π is greedy w.r.t. Q(s,a)
• The behaviour policy μ is e.g. ε-greedy w.r.t. Q(s,a)
• The Q-learning target then simplifies:
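The simplified target was an image on the slide. With a greedy target policy, the standard simplification is:

$$
\begin{aligned}
R_{t+1} + \gamma\, Q(S_{t+1}, A')
  &= R_{t+1} + \gamma\, Q\bigl(S_{t+1}, \arg\max_{a'} Q(S_{t+1}, a')\bigr) \\
  &= R_{t+1} + \gamma \max_{a'} Q(S_{t+1}, a')
\end{aligned}
$$

The alternative action A′ never has to be sampled: evaluating Q at its own argmax is the same as taking the max.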
Q - Learning
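The algorithm on this slide was an image. Below is a minimal tabular sketch of Q-learning with an ε-greedy behaviour policy and a greedy target policy. The five-state chain environment, the `step` function, the random tie-breaking, and all hyperparameter values are assumptions made for this illustration.

```python
import random
from collections import defaultdict

N_STATES = 5      # chain 0..4; reaching state 4 ends the episode with reward 1
ACTIONS = (0, 1)  # 0 = left, 1 = right


def step(state, action):
    """Hypothetical deterministic chain dynamics."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(lambda: [0.0, 0.0])
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Behaviour policy: epsilon-greedy w.r.t. Q, ties broken at random.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                best = max(Q[s])
                a = rng.choice([x for x in ACTIONS if Q[s][x] == best])
            s2, r, done = step(s, a)
            # Target policy: greedy w.r.t. Q, hence the max over next actions.
            target = r if done else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q


Q = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
```

In this chain the optimal action is always "right", and the learned values for "right" approach 0.9 raised to the number of remaining steps minus one.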
Deep RL
• We seek a single agent which can solve any human-level task
• RL defines the objective
• DL gives the mechanism
• RL + DL = general intelligence (David Silver)
Function Approximators
Deep Q-Networks
• Q-learning can diverge when combined with neural networks, due to:
• Correlations between samples
• Non-stationary targets
Solution: Experience Replay
• Fancy biological analogy
• In reality, quite simple
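To show how simple it is, here is a minimal replay buffer sketch: store transitions as they occur, then train on uniformly sampled mini-batches so that consecutive, correlated samples are mixed up. The class name, method names, and fixed-capacity design are assumptions of this example rather than details from the slides.

```python
import random
from collections import deque


class ReplayBuffer:
    def __init__(self, capacity, seed=0):
        # A bounded deque drops the oldest transition once capacity is reached.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement over the stored transitions.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(state=t, action=0, reward=0.0, next_state=t + 1, done=False)
batch = buffer.sample(2)
```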
Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning
Karthik Narasimhan, Adam Yala, Regina Barzilay
CSAIL, MIT
Slides from Karthik Narasimhan
Why try to reason, when someone else can do it for you?
Doubts*
• Algo 1 line# 19. The process should end when "d" == "end_episode" and not q. [Prachi] Error.
• The dimension of the match vector should be equal to the number of columns to be extracted. But Fig 3 has twice that number of dimensions. [Prachi] Error.
• Is RL the best approach? [Non-believers]
• Experience Replay [Anshul]. Hope it is clear now.
• Why is RL-extract better than the meta-classifier? The explanation provided in the paper about the "long tail of noisy, irrelevant documents" is unclear. [Yash]
• The meta-classifier should also cut off at the top-20 results per search, like the RL system, to be completely fair. [Anshul]
* most mean questions
Discussions
• Experiments
• People are happy!
• Queries
• Cluster documents and learn queries [Yashoteja]
• Many other query formulations [Surag (lowest confidence entity), Barun (LSTM), Gagan (highest confidence entity), DineshR]
• Fixed set of queries [Akshay]
• Simplicity. Search engines are robust.
• Reliance on News articles [Gagan]
• Where else would you get News from?
• Domain limitations
• Too narrow [Barun, Himanshu]. Domain specific [Happy]. Small ontology [Akshay]
• It is not Open IE. It is task specific. Can be applied to any domain.
• Better meta-classifiers [Surag]
• Effect of more sophisticated RL algorithms (A3C, TRPO) [esp. if increasing action space by LSTM queries], and their effect on performance and training time.