Inductive Transfer Retrospective & Review


Transcript of Inductive Transfer Retrospective & Review

Page 1: Inductive Transfer Retrospective & Review

Inductive Transfer Retrospective & Review

Rich Caruana

Computer Science Department

Cornell University

Page 2: Inductive Transfer Retrospective & Review

Inductive Transfer: a.k.a. …

Bias Learning
Multitask learning
Learning (Internal) Representations
Learning-to-learn
Lifelong learning
Continual learning
Speedup learning
Hints
Hierarchical Bayes
…

Page 3: Inductive Transfer Retrospective & Review

Rich Sutton [1994] Constructive Induction Workshop:

“Everyone knows that good representations are key to 99% of good learning performance. Why then has constructive induction, the science of finding good representations, been able to make only incremental improvements in performance?

People can learn amazingly fast because they bring good representations to the problem, representations they learned on previous problems. For people, then, constructive induction does make a large difference in performance. …

The standard machine learning methodology is to consider a single concept to be learned. That itself is the crux of the problem…

This is not the way to study constructive induction! … The standard one-concept learning task will never do this for us and must be abandoned. Instead we should look to natural learning systems, such as people, to get a better sense of the real task facing them. When we do this, I think we find the key difference that, for all practical purposes, people face not one task, but a series of tasks. The different tasks have different solutions, but they often share the same useful representations.

… If you can come to the nth task with an excellent representation learned from the preceding n-1 tasks, then you can learn dramatically faster than a system that does not use constructive induction. A system without constructive induction will learn no faster on the nth task than on the 1st. …”

Page 4: Inductive Transfer Retrospective & Review

1986: Sejnowski & Rosenberg – NETtalk
1990: Dietterich, Hild, Bakiri – ID3 vs. NETtalk
1990: Suddarth, Kergiosen, & Holden – rule injection (ANNs)
1990: Abu-Mostafa – hints (ANNs)
1991: Dean Pomerleau – ALVINN output representation (ANNs)
1991: Lorien Pratt – speedup learning (ANNs)
1992: Sharkey & Sharkey – speedup learning (ANNs)
1992: Mark Ring – continual learning
1993: Rich Caruana – MTL (ANNs, KNN, DT)
1993: Thrun & Mitchell – EBNN
1994: Virginia de Sa – minimizing disagreement
1994: Jonathan Baxter – representation learning (and theory)
1994: Thrun & Mitchell – learning one more thing
1994: J. Schmidhuber – learning how to learn learning strategies

Transfer through the Ages

Page 5: Inductive Transfer Retrospective & Review

1994: Dietterich & Bakiri – ECOC outputs
1995: Breiman & Friedman – Curds & Whey
1995: Sebastian Thrun – LLL (learning-to-learn, lifelong learning)
1996: Danny Silver – parallel transfer (ANNs)
1996: O’Sullivan & Thrun – task clustering (KNN)
1996: Caruana & de Sa – inputs better as outputs (ANNs)
1997: Munro & Parmanto – committee machines (ANNs)
1998: Blum & Mitchell – co-training
2002: Ben-David, Gehrke, Schuller – theoretical framework
2003: Bakker & Heskes – Bayesian MTL (and task clustering)
2004: Tony Jebara – MTL in SVMs (feature and kernel selection)
2004: Pontil & Micchelli – kernels for MTL
2004: Lawrence & Platt – MTL in GP (informative vector machine)
2005: Yu, Tresp, Schwaighofer – MTL in GP
2005: Liao & Carin – MTL for RBF networks

Page 6: Inductive Transfer Retrospective & Review

A Quick Romp Through Some Stuff

Page 7: Inductive Transfer Retrospective & Review

1 Task vs. 2 Tasks vs. 4 Tasks

Page 8: Inductive Transfer Retrospective & Review

STL vs. MTL Learning Curves

courtesy Joseph O’Sullivan

Page 9: Inductive Transfer Retrospective & Review

STL vs. MTL Learning Curves

Page 10: Inductive Transfer Retrospective & Review

A Different Kind of Learning Curve

Page 11: Inductive Transfer Retrospective & Review

MTL for Bayes Net Structure Learning

[Diagram: three overlapping Bayes net structures over nodes A–E, one for each of Yeast 1, Yeast 2, Yeast 3]

Bayes nets for these three species overlap significantly.
Learn structures from data for each species separately? No.
Learn one structure for all three species? No.
Bias learning to favor shared structure while allowing some differences? Yes – makes the most of limited data.
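One way to read "bias learning to favor shared structure" as a concrete objective (my own formulation; the slide does not give one) is to score the three structures jointly, rewarding per-species fit while penalizing edges on which the graphs disagree:

```latex
\max_{G_1,G_2,G_3}\;\sum_{k=1}^{3}\operatorname{score}(G_k;\,D_k)\;-\;\lambda\sum_{k<l}\bigl|\,E(G_k)\,\triangle\,E(G_l)\,\bigr|
```

Here $E(G)$ is the edge set of $G$, $\triangle$ is symmetric difference, $D_k$ is species $k$'s data, and $\lambda$ trades off per-species fit against structural agreement.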

Page 12: Inductive Transfer Retrospective & Review

When to Use Inductive Transfer?

multiple tasks occur naturally
using future to predict present
time series
decomposable tasks
multiple error metrics
focus of attention
different data distributions for same/similar problems
hierarchical tasks
some input features work better as outputs
…

Page 13: Inductive Transfer Retrospective & Review

Multiple Tasks Occur Naturally

Mitchell’s Calendar Apprentice (CAP)
– time-of-day (9:00am, 9:30am, ...)
– day-of-week (M, T, W, ...)
– duration (30min, 60min, ...)
– location (Tom’s office, Dean’s office, 5409, ...)

Page 14: Inductive Transfer Retrospective & Review

Using Future to Predict Present

medical domains
autonomous vehicles and robots
time series
– stock market
– economic forecasting
– weather prediction
– spatial series
many more

Page 15: Inductive Transfer Retrospective & Review

Decomposable Tasks

DireOutcome = ICU ∨ Complication ∨ Death

[Diagram: network predicting DireOutcome from the INPUTS]
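A tiny sketch (mine, not from the talk) of treating the components of a decomposable target as extra MTL outputs rather than learning only the disjunction:

```python
# The component outcomes are assumed to be available as 0/1 labels at
# training time; at run time only DireOutcome needs to be predicted.
def targets(icu, complication, death):
    dire = int(icu or complication or death)   # DireOutcome = ICU v Complication v Death
    return {"DireOutcome": dire, "ICU": int(icu),
            "Complication": int(complication), "Death": int(death)}
```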

Page 16: Inductive Transfer Retrospective & Review

Focus of Attention

Single-Task ALVINN Multi-Task ALVINN

Page 17: Inductive Transfer Retrospective & Review

Different Data Distributions

Hospital 1: 50 cases, rural (Ithaca)
Hospital 2: 500 cases, mature urban (Des Moines)
Hospital 3: 1000 cases, elderly suburbs (Florida)
Hospital 4: 5000 cases, young urban (LA, SF)

Page 18: Inductive Transfer Retrospective & Review

Some Inputs are Better as Outputs

Page 19: Inductive Transfer Retrospective & Review

And many more uses of Xfer…

Page 20: Inductive Transfer Retrospective & Review

A Few Issues That Arise With Xfer

Page 21: Inductive Transfer Retrospective & Review

Issue #1: Interference

Page 22: Inductive Transfer Retrospective & Review

Issue #1: Interference

Page 23: Inductive Transfer Retrospective & Review

Issue #2: Task Selection/Weighting

Analogous to feature selection
Correlation between tasks
– heuristic works well in practice
– very suboptimal

Wrapper-based methods
– expensive
– benefit from single tasks can be too small to detect reliably
– does not examine tasks in sets

Task weighting: MTL ≠ one model for all tasks (sketched below)
– main task vs. all tasks
– even harder than task selection
– but yields best results
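A minimal sketch of what task weighting can look like as a training objective (squared error and this particular form are my assumptions; the slide only argues that weighting the main task differently from the extra tasks works best):

```python
import numpy as np

def weighted_mtl_loss(main_pred, main_y, extra_preds, extra_ys, task_weights):
    """Main task always counts fully; each extra task contributes in
    proportion to its weight (weight 0 drops the task entirely)."""
    loss = np.mean((main_pred - main_y) ** 2)
    for pred, y, w in zip(extra_preds, extra_ys, task_weights):
        loss += w * np.mean((pred - y) ** 2)
    return loss
```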

Page 24: Inductive Transfer Retrospective & Review

Issue #3: Parallel vs. Serial Transfer

Where possible, use parallel transfer
– All info about a task is in the training set, not necessarily in a model trained on that training set
– Information useful to other tasks can be lost training one task at a time
– Tasks often benefit each other mutually

When serial transfer is necessary, implement it via parallel task rehearsal (see the sketch below)

Storing all experience is not always feasible
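A rough sketch of serial transfer implemented via parallel task rehearsal; `build_mtl_net` and `train` are hypothetical helpers, not anything from the talk:

```python
def learn_new_task_with_rehearsal(new_task_data, stored_tasks, build_mtl_net, train):
    """When a new task arrives, train one multitask net on the new task's
    data together with stored (or regenerated) examples of the earlier
    tasks, rather than fine-tuning a model trained only on the old tasks."""
    rehearsal_sets = [task.examples() for task in stored_tasks]  # real or pseudo-examples
    net = build_mtl_net(n_tasks=1 + len(stored_tasks))
    train(net, [new_task_data] + rehearsal_sets)                 # all tasks in parallel
    return net
```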

Page 25: Inductive Transfer Retrospective & Review

Issue #4: Psychological Plausibility

?

Page 26: Inductive Transfer Retrospective & Review

Issue #5: Xfer vs. Hierarchical Bayes

Is Xfer just regularization/smoothing?
Yes and No

Yes:
– Similar models for different problem instances (e.g., similar stocks, data distributions, …)

No:
– Focus of attention
– Task selection/clustering/rehearsal

Page 27: Inductive Transfer Retrospective & Review

related ⇒ helps learning (e.g., copy task)

Issue #6: What does Related Mean?

Page 28: Inductive Transfer Retrospective & Review

related ⇒ helps learning (e.g., copy task)
helps learning ⇏ related (e.g., noise task)

Issue #6: What does Related Mean?

Page 29: Inductive Transfer Retrospective & Review

related ⇒ helps learning (e.g., copy task)
helps learning ⇏ related (e.g., noise task)
related ⇏ correlated (e.g., A+B, A-B)

Issue #6: What does Related Mean?

Page 30: Inductive Transfer Retrospective & Review

Why Doesn’t Xfer Rule the Earth?

Tabula rasa learning surprisingly effective
the UCI problem

Page 31: Inductive Transfer Retrospective & Review

Use Some Features as Outputs

Page 32: Inductive Transfer Retrospective & Review

Why Doesn’t Xfer Rule the Earth?

Xfer opportunities abound in real problems
Somewhat easier with ANNs (and Bayes nets)
Death is in the details
– Xfer often hurts more than it helps if not careful
– Some important tricks counterintuitive
  don’t share too much
  give tasks breathing room
  focus on one task at a time

Tabula rasa learning surprisingly effective
the UCI problem

Page 33: Inductive Transfer Retrospective & Review

What Needs to be Done?

Have algs for ANN, KNN, DT, SVM, GP, BN, …
Better prescription of where to use Xfer
Public data sets
Comparison of methods
Inductive Transfer Competition?
Task selection, task weighting, task clustering
Explicit (TC) vs. Implicit (backprop) Xfer
Theory/definition of task relatedness

Page 34: Inductive Transfer Retrospective & Review
Page 35: Inductive Transfer Retrospective & Review
Page 36: Inductive Transfer Retrospective & Review
Page 37: Inductive Transfer Retrospective & Review
Page 38: Inductive Transfer Retrospective & Review

Kinds of Transfer

Human Expertise
– Constraints
– Hints (monotonicity, smoothness, …)

Parallel
– Multitask Learning

Serial
– Learning-To-Learn
– Serial via parallel (rehearsal)

Page 39: Inductive Transfer Retrospective & Review

Motivating Example

4 tasks defined on eight bits B1-B8:
all tasks ignore input bits B7-B8

Task 1 = B1 ∨ Parity(B2-B6)
Task 2 = ¬B1 ∨ Parity(B2-B6)
Task 3 = B1 ∧ Parity(B2-B6)
Task 4 = ¬B1 ∧ Parity(B2-B6)
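A small Python sketch (mine, not from the slides) that enumerates the four tasks; the indexing convention (B1 is bit 0) and the int/bool conversion are my own choices:

```python
import itertools

def parity(bits):
    """Parity of a tuple of bits: 1 if an odd number of them are set."""
    return sum(bits) % 2

def tasks(b):
    """The four Boolean tasks; B7 and B8 (indices 6, 7) are never used."""
    p = parity(b[1:6])               # Parity(B2-B6)
    return {
        "Task1": b[0] or p,
        "Task2": (not b[0]) or p,
        "Task3": b[0] and p,
        "Task4": (not b[0]) and p,
    }

for b in itertools.product([0, 1], repeat=8):
    labels = {name: int(bool(v)) for name, v in tasks(b).items()}
```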

Page 40: Inductive Transfer Retrospective & Review

Goals of MTL

improve predictive accuracy
– not intelligibility
– not learning speed

exploit “background” knowledge
applicable to many learning methods
exploit strength of current learning methods: surprisingly good tabula rasa performance

Page 41: Inductive Transfer Retrospective & Review

Problem 2: 1D-Doors

color camera on Xavier robot
main tasks: doorknob location and door type
8 extra tasks (training signals collected by mouse):
– doorway width
– location of doorway center
– location of left jamb, right jamb
– location of left and right edges of door

Page 42: Inductive Transfer Retrospective & Review

Predicting Pneumonia Risk

[Diagram: two networks predicting PneumoniaRisk: one uses Pre-Hospital Attributes (Age, Gender, Blood Pressure, Chest X-Ray) plus In-Hospital Attributes (Albumin, Blood pO2, White Count, RBC Count); the other uses the Pre-Hospital Attributes only]

Page 43: Inductive Transfer Retrospective & Review

Predicting Pneumonia Risk

[Diagram: two networks predicting PneumoniaRisk: one uses Pre-Hospital Attributes (Age, Gender, Blood Pressure, Chest X-Ray) plus In-Hospital Attributes (Albumin, Blood pO2, White Count, RBC Count); the other uses the Pre-Hospital Attributes only]

Page 44: Inductive Transfer Retrospective & Review

Pneumonia #1: Medis

Page 45: Inductive Transfer Retrospective & Review

Pneumonia #1: Results

[Results chart: -10.8%, -11.8%, -6.2%, -6.9%, -5.7%]

Page 46: Inductive Transfer Retrospective & Review

Use imputed values for missing lab tests as extra inputs?

Page 47: Inductive Transfer Retrospective & Review

Pneumonia #1: Feature Nets

Page 48: Inductive Transfer Retrospective & Review

Pneumonia #2: Results

MTL reduces error >10%

Page 49: Inductive Transfer Retrospective & Review

Related?

Ideal:
Func(MainTask, ExtraTask, Alg) = 1
iff
Alg(MainTask || ExtraTask) > Alg(MainTask)

unrealistic
try all extra tasks (or all combinations)?
need heuristics to help us find potentially useful extra tasks to use for MTL:

Related Tasks
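A greedy wrapper sketch of the "try the extra tasks and keep what helps" idea; `alg` and `eval_main` are hypothetical interfaces, and as noted earlier, in practice the per-task gains are often too small to detect reliably this way:

```python
def select_extra_tasks(alg, main_task, candidate_tasks, eval_main):
    """Keep an extra task only if training it alongside the main task
    measurably improves the main task on held-out data. `alg` trains a
    model on a set of tasks; `eval_main` scores the main task."""
    kept = []
    baseline = eval_main(alg([main_task]))
    for t in candidate_tasks:
        score = eval_main(alg([main_task] + kept + [t]))
        if score > baseline:
            kept.append(t)
            baseline = score
    return kept
```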

Page 50: Inductive Transfer Retrospective & Review

related ⇒ helps learning (e.g., copy tasks)

Related?

Page 51: Inductive Transfer Retrospective & Review

related ⇒ helps learning (e.g., copy task)
helps learning ⇏ related (e.g., noise task)

Related?

Page 52: Inductive Transfer Retrospective & Review

related ⇒ helps learning (e.g., copy task)
helps learning ⇏ related (e.g., noise task)
related ⇏ correlated (e.g., A+B, A-B)

Related?

Page 53: Inductive Transfer Retrospective & Review

120 Synthetic Tasks

backprop net not told how tasks are related, but ...
120 Peaks Functions: A, B, C, D, E, F ∈ (0.0, 1.0)
– P001 = If (A > 0.5) Then B, Else C
– P002 = If (A > 0.5) Then B, Else D
– P014 = If (A > 0.5) Then E, Else C
– P024 = If (B > 0.5) Then A, Else F
– P120 = If (F > 0.5) Then E, Else D
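A generator for the full family (my reconstruction): every ordered triple of distinct variables (selector, then-branch, else-branch) from {A, …, F} gives one function, and 6·5·4 = 120; numbering the triples in lexicographic order reproduces the examples listed above (P001, P002, P014, P024, P120).

```python
import itertools
import random

VARS = "ABCDEF"

def make_peak(sel, then_v, else_v):
    # P(x) = x[then_v] if x[sel] > 0.5, else x[else_v]
    return lambda x: x[then_v] if x[sel] > 0.5 else x[else_v]

# Every ordered triple of distinct variables gives one task: 6*5*4 = 120.
peaks = {"P%03d" % n: make_peak(*triple)
         for n, triple in enumerate(itertools.permutations(VARS, 3), start=1)}

x = {v: random.random() for v in VARS}            # A..F drawn from (0.0, 1.0)
targets = {name: f(x) for name, f in peaks.items()}
```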

Page 54: Inductive Transfer Retrospective & Review

MTL nets cluster tasks by function

Page 55: Inductive Transfer Retrospective & Review

Peaks Functions: Clustering

Page 56: Inductive Transfer Retrospective & Review

Focus of Attention

1D-ALVINN:
– centerline
– left and right edges of road

removing centerlines from 1D-ALVINN images hurts MTL accuracy more than STL accuracy

Page 57: Inductive Transfer Retrospective & Review

Some Inputs are Better as Outputs

MainTask = Sigmoid(A) + Sigmoid(B)
Inputs A and B coded via 10-bit binary code
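A sketch of generating this synthetic problem (the range of A and B, the rounding, and the exact binary coding are my assumptions): the binary codes of A and B are the net's inputs, while A and B themselves serve as extra output targets.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_data(n=1000, bits=10, seed=0):
    rng = np.random.default_rng(seed)
    A, B = rng.random(n), rng.random(n)            # assumed uniform on (0, 1)
    main = sigmoid(A) + sigmoid(B)                 # MainTask
    def code(v):                                   # value -> 10-bit binary code
        ints = np.round(v * (2 ** bits - 1)).astype(int)
        return ((ints[:, None] >> np.arange(bits)) & 1).astype(float)
    X = np.hstack([code(A), code(B)])              # 20 binary inputs
    extra = np.stack([A, B], axis=1)               # A and B as extra outputs
    return X, main, extra
```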

Page 58: Inductive Transfer Retrospective & Review

Inputs Better as Outputs: Results

Page 59: Inductive Transfer Retrospective & Review

MTL in K-Nearest Neighbor

Most learning methods can MTL:
– shared representation
– combine performance of extra tasks
– control the effect of extra tasks

MTL in K-Nearest Neighbor:
– shared representation: distance metric
– MTLPerf = (1 − λ)·MainPerf + λ·ExtraPerf
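A crude sketch of MTL in kNN along the lines of the slide: the shared representation is a per-feature weighting of the distance metric, and candidate weightings are scored by the combined criterion (1 − λ)·MainPerf + λ·ExtraPerf. The L1 metric, leave-one-out evaluation, and random search are my own simplifications; labels are assumed to be integer numpy arrays.

```python
import numpy as np

def knn_acc(X, y, w, k=3):
    """Leave-one-out accuracy of kNN under a feature-weighted L1 metric."""
    n, correct = len(X), 0
    for i in range(n):
        d = (np.abs(X - X[i]) * w).sum(axis=1)
        d[i] = np.inf                              # exclude the query point
        nbrs = np.argsort(d)[:k]
        correct += (np.bincount(y[nbrs]).argmax() == y[i])
    return correct / n

def mtl_score(X, y_main, y_extra, w, lam=0.25):
    """The slide's combined criterion: (1 - lam)*MainPerf + lam*ExtraPerf."""
    extra = np.mean([knn_acc(X, ye, w) for ye in y_extra])
    return (1 - lam) * knn_acc(X, y_main, w) + lam * extra

def fit_metric(X, y_main, y_extra, n_trials=200, seed=0):
    """Random search over shared metric weights (a sketch only)."""
    rng = np.random.default_rng(seed)
    best_w, best_s = np.ones(X.shape[1]), -np.inf
    for _ in range(n_trials):
        w = rng.random(X.shape[1])
        s = mtl_score(X, y_main, y_extra, w)
        if s > best_s:
            best_w, best_s = w, s
    return best_w
```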

Page 60: Inductive Transfer Retrospective & Review

Summary

inductive transfer improves learning
>15 problem types where MTL is applicable:
– using the future to predict the present
– multiple metrics
– focus of attention
– different data populations
– using inputs as extra tasks
– . . . (at least 10 more)

most real-world problems fit one of these

Page 61: Inductive Transfer Retrospective & Review

Summary/Contributions

applied MTL to a dozen problems, some not created for MTL
– MTL helps most of the time
– benefits range from 5%-40%

ways to improve MTL/Backprop
– learning rate optimization
– private hidden layers
– MTL Feature Nets

MTL nets do unsupervised learning/clustering
algorithms for MTL: ANN, KNN, SVMs, DTs

Page 62: Inductive Transfer Retrospective & Review

Open Problems

output selection
scale to 1000’s of extra tasks
compare to Bayes Nets
theory of MTL
task weighting
features as both inputs and extra outputs

Page 63: Inductive Transfer Retrospective & Review

Features as Both Inputs & Outputs

some features help when used as inputs
some of those also help when used as outputs
get both benefits in one net?

Page 64: Inductive Transfer Retrospective & Review

Summary/Contributions

focus on main task improves performance
>15 problem types where MTL is applicable:
– using the future to predict the present
– multiple metrics
– focus of attention
– different data populations
– using inputs as extra tasks
– . . . (at least 10 more)

most real-world problems fit one of these

Page 65: Inductive Transfer Retrospective & Review

Summary/Contributions

applied MTL to a dozen problems, some not created for MTL
– MTL helps most of the time
– benefits range from 5%-40%

ways to improve MTL/Backprop
– learning rate optimization
– private hidden layers
– MTL Feature Nets

MTL nets do unsupervised clustering
algs for MTL kNN and MTL Decision Trees

Page 66: Inductive Transfer Retrospective & Review

Future MTL Work

output selection
scale to 1000’s of extra tasks
theory of MTL
compare to Bayes Nets
task weighting
“features” as both inputs and extra outputs

Page 67: Inductive Transfer Retrospective & Review

Inputs as Outputs: DNA Domain

given a sequence of 60 DNA nucleotides, predict if the sequence is {IE, EI, neither}

... ACAGTACGTTGCATTACCCTCGTT... → {IE, EI, neither}

nucleotides {A, C, G, T} coded with 3 bits
3 * 60 = 180 inputs; 3 binary outputs
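A sketch of that coding (the particular 3-bit code for each nucleotide is an assumption; the slide only fixes the sizes):

```python
import numpy as np

# Each of the 60 nucleotides becomes 3 bits, giving 3 * 60 = 180 inputs;
# the label is one of {IE, EI, neither}, here as 3 binary outputs.
CODE = {"A": (0, 0, 1), "C": (0, 1, 0), "G": (1, 0, 0), "T": (1, 1, 0)}
CLASSES = ["IE", "EI", "neither"]

def encode(sequence, label):
    assert len(sequence) == 60
    x = np.array([bit for nt in sequence for bit in CODE[nt]], dtype=float)  # 180 inputs
    y = np.array([float(label == c) for c in CLASSES])                       # 3 outputs
    return x, y
```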

Page 68: Inductive Transfer Retrospective & Review

Making MTL/Backprop Better

Better training algorithm:
– learning rate optimization

Better architectures:
– private hidden layers (overfitting in hidden unit space)
– using features as both inputs and outputs
– combining MTL with Feature Nets

Page 69: Inductive Transfer Retrospective & Review

Private Hidden Layers

many tasks: need many hidden units
many hidden units: “hidden unit selection problem”
allow sharing, but without too many hidden units?
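A modern PyTorch sketch of the private-hidden-layer idea (the original work used plain backprop nets; layer sizes and activations here are arbitrary): every task reads the shared hidden layer, but each task also gets a small hidden layer only it can use, so adding tasks does not force one huge shared layer.

```python
import torch
import torch.nn as nn

class SharedPlusPrivateNet(nn.Module):
    def __init__(self, n_in, n_tasks, n_shared=16, n_private=4):
        super().__init__()
        # One hidden layer shared by every task...
        self.shared = nn.Sequential(nn.Linear(n_in, n_shared), nn.Sigmoid())
        # ...plus a small private hidden layer per task.
        self.private = nn.ModuleList(
            [nn.Sequential(nn.Linear(n_in, n_private), nn.Sigmoid())
             for _ in range(n_tasks)])
        # Each task's output sees the shared units and only its own private units.
        self.heads = nn.ModuleList(
            [nn.Linear(n_shared + n_private, 1) for _ in range(n_tasks)])

    def forward(self, x):
        s = self.shared(x)
        return [head(torch.cat([s, priv(x)], dim=1))
                for head, priv in zip(self.heads, self.private)]
```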

Page 70: Inductive Transfer Retrospective & Review

Related Work

– Sejnowski, Rosenberg [1986]: NETtalk
– Pratt, Mostow [1991-94]: serial transfer in bp nets
– Suddarth, Kergiosen [1990]: 1st MTL in bp nets
– Abu-Mostafa [1990-95]: catalytic hints
– Abu-Mostafa, Baxter [92,95]: transfer PAC models
– Dietterich, Hild, Bakiri [90,95]: bp vs. ID3
– Pomerleau, Baluja: other uses of hidden layers
– Munro [1996]: extra tasks to decorrelate experts
– Breiman [1995]: Curds & Whey
– de Sa [1995]: minimizing disagreement
– Thrun, Mitchell [1994,96]: EBNN
– O’Sullivan, Mitchell [now]: EBNN+MTL+Robot

Page 71: Inductive Transfer Retrospective & Review

MTL vs. EBNN on Robot Problem

courtesy Joseph O’Sullivan

Page 72: Inductive Transfer Retrospective & Review

Theoretical Models of Parallel Xfer

PAC models based on VC-dim or MDL
– unreasonable assumptions:
  fixed size hidden layers
  all tasks generated by one hidden layer
  backprop is ideal search procedure
– predictions do not fit observations:
  have to add hidden units
– main problems:
  can’t take behavior of backprop into account
  not enough is known about capacity of backprop nets

Page 73: Inductive Transfer Retrospective & Review

Learning Rate Optimization

optimize learning rates of extra tasks
goal is to maximize generalization of main task
ignore performance of extra tasks
expensive!

performance on extra tasks improves 9%!
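A sketch of the search loop this implies (`train_net` and `eval_main_on_validation` are hypothetical helpers; the exhaustive product over candidate rates is what makes the approach expensive):

```python
import itertools

def optimize_extra_task_rates(train_net, eval_main_on_validation,
                              candidate_rates, n_extra_tasks):
    """Pick one learning rate per extra task, judging each setting only by
    main-task generalization and ignoring how well the extra tasks are learned."""
    best_rates, best_score = None, float("-inf")
    for rates in itertools.product(candidate_rates, repeat=n_extra_tasks):
        net = train_net(extra_task_rates=rates)
        score = eval_main_on_validation(net)
        if score > best_score:
            best_rates, best_score = rates, score
    return best_rates
```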

Page 74: Inductive Transfer Retrospective & Review

MTL Feature Nets

Page 75: Inductive Transfer Retrospective & Review

Acknowledgements

advisors: Mitchell & Simon
committee: Pomerleau & Dietterich
CEHC: Cooper, Fine, Buchanan, et al.
co-authors: Baluja, de Sa, Freitag
robot Xavier: O’Sullivan, Simmons
discussion: Fahlman, Moore, Touretzky
funding: NSF, ARPA, DEC, CEHC, JPRC
SCS/CMU: a great place to do research
spouse: Diane