Learning Programs from Noisy Data
Veselin Raychev, Pavol Bielik, Martin Vechev, Andreas Krause
ETH Zurich

Transcript of the talk slides.

  • Learning Programs from Noisy Data

    Veselin Raychev, Pavol Bielik, Martin Vechev, Andreas Krause

    ETH Zurich


  • Why learn programs from examples?

    Input/output examples: the user provides pairs and asks for a function p such that p(input_i) = output_i for every example. It is often easier to provide examples than a full specification (e.g., in FlashFill).

    But the user may make a mistake in the examples. The actual goal is to produce the p that the user really wanted and tried to specify.

    Key problem of synthesis: it overfits and is not robust to noise.


  • Learning Programs from Data: Defining Dimensions

                                   Number of examples   Handles errors in the dataset   Learned program complexity
    Program synthesis (PL)         tens                 no                              interesting programs
    Deep learning (ML)             millions             yes                             simple but unexplainable functions
    This paper (bridges a gap)     millions             yes                             interesting programs

    This work expands the capabilities of existing synthesizers and reaches new state-of-the-art precision for programming tasks. It bridges the gap between ML and PL and advances both areas.

  • In this paper

    ● A general framework that
      ○ handles errors in the training dataset
      ○ learns statistical models on data
      ○ handles synthesis with millions of examples
    ● Instantiated with two synthesizers that generalize existing works

  • Contributions

    Three contributions, illustrated on a pipeline that takes input/output examples (possibly with incorrect examples), (1) synthesizes a program p and (2) uses a probabilistic model parametrized with p, driven by a loop between a representative dataset sampler and a program generator:
    ● Handling noise
    ● New probabilistic models
    ● Handling large datasets

    First: handling noise.


  • Synthesis with noise: usage model

    The user provides input/output examples and a Domain Specific Language (DSL). The synthesizer returns a program p such that p(input_i) = output_i for the examples. If an example is incorrect (e.g., a typo), the synthesizer can report p(input) ≠ output for it. This is a new kind of feedback from the synthesizer:
    ● tell the user to remove the suspicious example, or
    ● ask for more examples.

  • Handling noise: problem statement

    D: a dataset of input/output examples, possibly containing incorrect examples. P is the space of possible programs in the DSL.

    A first attempt is pbest = arg min_{p ∈ P} errors(D, p). Issue: a program of the form "if ... return ...; if ... return ...; ..." that hardcodes the input/outputs makes no errors and may be the only solution to this minimization problem. Such a program is too long; synthesis must penalize such answers.

    Our problem formulation:

        pbest = arg min_{p ∈ P} errors(D, p) + λ·r(p)

    where errors(D, p) is the error rate of p on D, r(p) is a regularizer that penalizes long programs, and λ is the regularization constant. (A toy sketch of this objective follows.)
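    A minimal Python sketch of the regularized objective over an explicit candidate set, just to make the formula concrete; it is not the synthesizer from the talk, and the names errors, size, and the toy candidate programs are illustrative assumptions.

        def errors(examples, program):
            """Count the input/output pairs the candidate program gets wrong."""
            return sum(1 for x, y in examples if program(x) != y)

        def synthesize(examples, candidates, size, lam=0.6):
            """Return arg min_p errors(D, p) + lam * size(p) over the candidates."""
            return min(candidates, key=lambda p: errors(examples, p) + lam * size(p))

        # Toy usage: one noisy example out of three. The regularizer keeps the
        # synthesizer from picking a lookup table that hardcodes all three pairs.
        examples = [(1, 2), (2, 4), (3, 7)]               # (3, 7) is a typo for (3, 6)
        double = lambda x: 2 * x                          # short program, makes 1 error
        table = lambda x: {1: 2, 2: 4, 3: 7}[x]           # hardcoded program, 0 errors
        best = synthesize(examples, [double, table],
                          size=lambda p: 1 if p is double else 4)   # best == double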


  • Noisy synthesis using SMT

    pbest = arg min_{p ∈ P} errors(D, p) + λ·r(p), where r(p) is the number of instructions and errors(D, p) + λ·r(p) is the total solution cost.

    Encoding Ψ, the formula given to the SMT solver (here for three examples):
        err1 = if p(input1) = output1 then 0 else 1
        err2 = if p(input2) = output2 then 0 else 1
        err3 = if p(input3) = output3 then 0 else 1
        errors = err1 + err2 + err3
        p ∈ Pr (programs with r instructions)

    Ask a number of SMT queries in increasing order of solution cost (cost = errors + λ·r). E.g., for λ = 0.6 the costs are:

                        number of errors
        r                0     1     2     3
        1               0.6   1.6   2.6   3.6
        2               1.2   2.2   3.2   4.2
        3               1.8   2.8   3.8   4.8

    In this example the queries at costs 0.6, 1.2, 1.6 and 1.8 are UNSAT and the query at cost 2.2 is SAT: the best program has two instructions and makes one error. (A toy z3py sketch of one such query follows.)
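    A hedged z3py sketch of one bounded-cost query from this sweep, purely illustrative: the one-instruction toy DSL p(x) = x & (x + c), the examples, and the error budget MAX_ERRORS are assumptions, not BitSyn's actual encoding.

        from z3 import BitVec, BitVecVal, If, Int, Solver, sat

        examples = [(4, 0), (8, 0), (6, 1)]   # (input, output); the last pair is noisy
        MAX_ERRORS = 1                        # error budget at this cost level

        s = Solver()
        c = BitVec("c", 32)                   # unknown constant of the candidate program

        def run_program(x):
            """Candidate program p(x) = x & (x + c) for the symbolic constant c."""
            xv = BitVecVal(x, 32)
            return xv & (xv + c)

        err_bits = []
        for i, (x, y) in enumerate(examples):
            err = Int(f"err{i}")
            # err_i = 0 if p(input_i) = output_i, else 1
            s.add(err == If(run_program(x) == BitVecVal(y, 32), 0, 1))
            err_bits.append(err)

        s.add(sum(err_bits) <= MAX_ERRORS)    # allow at most MAX_ERRORS wrong examples

        if s.check() == sat:
            # one possible model is c = 2**32 - 1, i.e. p(x) = x & (x - 1)
            print("SAT: constant =", s.model()[c])
        else:
            print("UNSAT at this cost level; move on to the next cost")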

  • Noisy synthesizer: example

    Take an actual synthesizer and show that we can make it handle noise.


  • Implementation: BitSyn

    For BitStream programs, using Z3; similar to Jha et al. [ICSE'10] and Gulwani et al. [PLDI'11]. BitSyn synthesizes short, loop-free programs.

    Example program:
        function check_if_power_of_2(int32 x) {
          var o = add(x, -1)
          return bitwise_and(x, o)   // 0 iff x is a power of two
        }

    Question: how well does our synthesizer discover noise (in programs from prior work)?

    The regularization constant λ is picked empirically; the slide's plot marks the best area to be in.

  • So far… handling noise

    ● Problem statement and regularization
    ● Synthesis procedure using SMT
    ● Presented one synthesizer

    Handling noise enables us to solve new classes of problems beyond normal synthesis.

  • Contributions (continued)

    Back to the overview pipeline (input/output examples with possible incorrect examples, a program generator, and a representative dataset sampler). Next: handling large datasets.


  • Fundamental problem

    Large number of examples:

        pbest = arg min_{p ∈ P} cost(D, p)

    where D contains millions of input/output examples. Computing cost(D, p) is O(|D|), so synthesis over the full dataset is practically intractable.

    Key idea: iterative synthesis on a fraction of the examples.


  • Our solution: two components

    ● Program generator: a synthesizer for a small number of examples; given a dataset d, it finds the best program pbest = arg min_{p ∈ P} cost(d, p).
    ● Dataset sampler: picks a dataset d ⊆ D. We introduce a representative dataset sampler, generalizing a user who provides input/output examples.


  • In a loop

    Start with a small random sample d ⊆ D, then iteratively generate programs and samples: the program generator produces p1 from d, the representative dataset sampler picks a new d given p1, the generator produces p1, p2, the sampler picks again, and so on until the loop settles on pbest. (A sketch of this outer loop follows.)

    The algorithm generalizes synthesis-by-examples techniques.
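    A minimal sketch of the outer loop, assuming two hypothetical helpers: synthesize(d) plays the role of the program generator and sample(D, programs, k) the role of the representative dataset sampler (one way to approximate the sampler is sketched after the next slide).

        import random

        def iterative_synthesis(D, synthesize, sample, k=100, rounds=10):
            """Alternate program generation on a small dataset d with re-sampling d from D."""
            d = random.sample(D, k)            # start with a small random sample d ⊆ D
            programs = []
            for _ in range(rounds):
                p = synthesize(d)              # program generator: best program on d
                programs.append(p)
                d = sample(D, programs, k)     # pick a d on which p1..pn behave like on D
            return programs[-1]                # candidate for pbest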


  • Representative dataset sampler

    Idea: pick a small dataset d on which the already generated programs p1, ..., pn behave like they do on the full dataset:

        d = arg min_{d ⊆ D} max_{i ∈ 1..n} | cost(d, pi) - cost(D, pi) |

    The slide compares the costs of p1 and p2 on the small dataset d with their costs on the full dataset D: on a representative d the two cost profiles match. (A sketch of one way to approximate this arg min follows.)

    Theorem: this sampler shrinks the candidate program search space. In the evaluation it gives a significant speedup of synthesis.
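    A hedged sketch of one simple way to approximate the arg min above: draw several random candidate subsets and keep the one whose per-program costs track the full dataset best. The cost function and the candidate count are illustrative assumptions, not the paper's algorithm.

        import random

        def cost(examples, program):
            """Fraction of examples the program gets wrong (illustrative cost)."""
            return sum(1 for x, y in examples if program(x) != y) / max(len(examples), 1)

        def representative_sample(D, programs, k, candidates=50):
            """Pick d ⊆ D with |d| = k, approximately minimizing
            max_i |cost(d, p_i) - cost(D, p_i)| over random candidate subsets."""
            full_costs = [cost(D, p) for p in programs]
            best_d, best_gap = None, float("inf")
            for _ in range(candidates):
                d = random.sample(D, k)
                gap = max(abs(cost(d, p) - fc) for p, fc in zip(programs, full_costs))
                if gap < best_gap:
                    best_d, best_gap = d, gap
            return best_d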

  • So far... handling large datasets

    ● Iterative combination of synthesis and sampling
    ● A new way to perform approximate empirical risk minimization
    ● Guarantees (in the paper)

  • Contributions (continued)

    Back to the overview pipeline. Next: new probabilistic models.


  • Statistical programming tools

    A new breed of tools: learn from large existing codebases (e.g., "Big Code") to make predictions about programs. The workflow is: 1. train a machine learning model, 2. make predictions with the model. The catch: the model is hard-coded and its precision is low.


  • Existing machine learning models

    They essentially remember a mapping from a context in the training data to a prediction (with probabilities).

    Hindle et al. [ICSE'12]: learn a mapping from the preceding tokens, e.g. "+ name ." → slice; the model will predict slice when it sees "+ name ." before the completion point. This model comes from NLP.

    Raychev et al. [PLDI'14]: learn a mapping from the preceding API calls per object, e.g. charAt → slice; the model will predict slice when it sees charAt. It relies on static analysis.


  • Problem of existing systems

    Precision: they rarely predict the next statement. The Hindle et al. [ICSE'12] model has very low precision, and the Raychev et al. [PLDI'14] model has low precision for JavaScript.

    Core problem: existing machine learning models are limited and not expressive enough.


  • Key idea: second-order learning

    Learn a program that parametrizes a probabilistic model that makes predictions:
    1. Synthesize a program describing a model
    2. Train the model (i.e., learn the mapping)
    3. Make predictions with this model

    Prior models are described by simple hard-coded programs. Our approach: learn a better program.


  • Training and evaluation

    Training example: an input code snippet whose output is slice. Compute a context from the input with program p and learn a mapping from that context to the output, e.g. toUpperCase → slice.

    Evaluation example: compute the context of the new input with the same program p, then use the learned mapping to predict the completion (slice). (A toy sketch of this conditioning-and-counting scheme follows.)
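    A toy Python sketch of this idea under stated assumptions: the conditioning program p is represented as a function from a token sequence to a context (here, hypothetically, the last alphabetic token), and the probabilistic model is a simple count-based mapping from contexts to completions.

        from collections import Counter, defaultdict

        def p_last_api(tokens):
            """Hypothetical conditioning program: the context is the last token
            that looks like an API name (here simply the last alphabetic token)."""
            apis = [t for t in tokens if t.isalpha()]
            return apis[-1] if apis else ""

        def train(model_program, training_examples):
            """Learn a mapping context -> Counter of completions."""
            table = defaultdict(Counter)
            for tokens, completion in training_examples:
                table[model_program(tokens)][completion] += 1
            return table

        def predict(model_program, table, tokens):
            """Compute the context with the same program and return the most likely completion."""
            counts = table.get(model_program(tokens))
            return counts.most_common(1)[0][0] if counts else None

        # Toy usage with a single training pair resembling the slide's example.
        table = train(p_last_api, [(["str", ".", "toUpperCase", "(", ")", "."], "slice")])
        print(predict(p_last_api, table, ["name", ".", "toUpperCase", "(", ")", "."]))  # -> slice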


  • Observation

    Synthesis of the probabilistic model can be done with the same optimization problem as before:

        pbest = arg min_{p ∈ P} errors(D, p) + λ·r(p)

    where D is now the evaluation data (input/output examples), errors(D, p) plays the role of cost(D, p), r is the regularizer that penalizes long programs, and λ is the regularization constant.

  • So far...

    Handling noise, synthesizing a model, and the representative dataset sampler: these techniques are generally applicable to program synthesis.

    Next, an application for "Big Code" called DeepSyn.


  • DeepSyn: Training

    Trained on 100,000 JavaScript files from GitHub. The program synthesizer (representative dataset sampler + program generator) runs on the dataset D to find the best conditioning program p; the model is then trained on the full data D with that best program p.


  • DeepSyn: Evaluation

    50,000 evaluation files (not used in training or synthesis); API completion task.

    Conditioning program p                                                    Accuracy
    Last two tokens, Hindle et al. [ICSE'12]                                  22.2%
    Last two APIs per object, Raychev et al. [PLDI'14]                        30.4%
    Program synthesis with noise (this work)                                  46.3%
    Program synthesis with noise + dataset sampler (this work)                50.4%

    We can explain the best program: it looks at the APIs preceding the completion position and at the tokens prior to those APIs.

  • Summary / Q&A

    ● Extending synthesizers to handle noise
    ● Scalability: handling large datasets via the loop between a representative dataset sampler and a program generator
    ● Second-order learning: synthesis of probabilistic models (synthesize p, then use a probabilistic model parametrized with p)

    Bridges the gap between ML and PL and advances both areas.

  • What did we synthesize?

    The best conditioning program found for DeepSyn:

        Left PrevActor WriteAction WriteValue PrevActor WriteAction PrevLeaf WriteValue PrevLeaf WriteValue