![Page 1: 111 Data-Driven Knowledge Discovery and Philosophy of Science Electrical and Computer Engineering Vladimir Cherkassky University of Minnesota cherk001@umn.edu.](https://reader035.fdocuments.net/reader035/viewer/2022081513/5697c00c1a28abf838cc8e8d/html5/thumbnails/1.jpg)
Data-Driven Knowledge Discovery
and Philosophy of Science
Electrical and Computer Engineering
Vladimir Cherkassky University of Minnesota
Presented at Ockham’s Razor Workshop, CMU, June 2012
OUTLINE
• Motivation + Background
- changing nature of knowledge discovery
- scientific vs empirical knowledge
- induction and empirical knowledge
• Philosophical interpretation
• Predictive learning framework
• Practical aspects and examples
• Summary
Disclaimer
• Philosophy of science (as I see it)
- philosophical ideas form in response to major scientific/technological advances
- meaningful discussion is possible only in the context of these scientific developments
• Ockham’s Razor
- general, vaguely stated principle
- originally interpreted for classical science
- in statistical inference ~ justification for model complexity control (model selection)
Historical View: data-analytic modeling
• Two theoretical developments
- classical statistics ~ mid 20th century
- Vapnik-Chervonenkis theory ~ 1970s
• Two related technological advances
- applied statistics
- machine learning, neural nets, data mining etc.
• Statistical (probabilistic) vs predictive modeling
- philosophical difference (not widely understood)
- interpretation of Ockham’s Razor
Scientific Discovery
• Combines ideas/models and facts/data
• First-principle knowledge: hypothesis → experiment → theory
~ deterministic, simple causal models
• Modern data-driven discovery: Computer program + DATA → knowledge
~ statistical, complex systems
• Two different philosophies
Scientific Knowledge
• Classical Knowledge (last 3-4 centuries):
- objective
- recurrent events (repeatable by others)
- quantifiable (described by math models)
• Knowledge ~ causal, deterministic, logical
• Humans cannot reason well about
- noisy/random data
- multivariate high-dimensional data
Cultural and Psychological Aspects
• All men by nature desire knowledge
• Man has an intense desire for assured knowledge
• Assured Knowledge ~ belief in
- religion
- reason (causal determinism)
- science / pseudoscience
- empirical data-analytic models
• Ockham’s Razor ~ methodological belief (?)
Gods, Prophets and Shamans
Knowledge Discovery in Digital Age
• Most information in the form of data from sensors (not human sense perceptions)
• Can we get assured knowledge from data?
• Naïve realism: data → knowledge
Wired Magazine, 16/07: We can stop looking for (scientific) models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot
(Over) Promise of Science
Archimedes: Give me a place to stand, and a lever long enough, and I will move the world
Laplace: Present events are connected with preceding ones by a tie based upon the evident principle that a thing cannot occur without a cause that produces it.
Digital Age:
more data → new knowledge
more connectivity → more knowledge
REALITY
• Many studies have questionable value
- statistical correlation vs causation
• Some border on nonsense
- US scientists at SUNY discovered Adultery Gene !!! (based on a sample of 181 volunteers interviewed about sexual life)
• Usual conclusion
- more research is needed …
Three Types of Knowledge
• Growing role of empirical knowledge
• New demarcation problems:
- First-principle vs empirical knowledge
- Empirical knowledge vs beliefs
Philosophical Challenges
• Empirical data-driven knowledge
- different from classical knowledge
• Philosophical Interpretation
- first-principle: hypothetico-deductive
- empirical knowledge: ???
- fragmentation in technical fields, e.g. statistics, machine learning, neural nets, data mining etc.
• Predictive Learning (VC-theory)
- provides consistent framework for many apps
- different from classical statistical approach
What is a ‘good’ data-analytic model?
• All models are mental constructs that (hopefully) relate to real world
• Two goals of modeling
- explain available data ~ subjective
- predict future data ~ objective
• True science makes non-trivial predictions
Good data-driven models can predict well, so the goal is to estimate predictive models
Learning from Data ~ Induction
Induction ~ function estimation from data:
Deduction ~ prediction for new inputs:
Note: statistical induction is different from logical induction
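The two steps named on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the straight-line model, the helper names `fit_line` and `predict`, and the four training pairs are all hypothetical and do not come from the talk.

```python
def fit_line(samples):
    """Induction: estimate f(x) = a*x + b from (x, y) pairs by least squares."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def predict(model, x_new):
    """Deduction: apply the estimated function to a new input."""
    a, b = model
    return a * x_new + b


# Hypothetical training data, invented for illustration only.
training = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8)]
model = fit_line(training)      # induction step
y_hat = predict(model, 4.0)     # deduction step
```

The estimated function depends on the finite sample, so the prediction is statistical rather than logically guaranteed.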
OUTLINE
• Motivation + Background
• Philosophical interpretation
• Predictive learning framework
• Practical aspects and examples
• Summary
Observations, Reality and Mind
Philosophy is concerned with relationship between
- Reality (Nature)
- Sensory Perceptions
- Mental Constructs (interpretations of reality)
Three Philosophical Schools
• REALISM:
- objective physical reality perceived via senses
- mental constructs reflect objective reality
• IDEALISM:
- primary role belongs to ideas (mental constructs)
- physical reality is a by-product of Mind
• INSTRUMENTALISM:
- the goal of science is to produce useful theories
Which one should be adopted (by scientists + engineers)??
Three Philosophical Schools
• Realism
(materialism)
• Idealism
• Instrumentalism
Realistic View of Science
• Every observation/effect has its cause
~ prevailing view and cultural attitude
• Isaac Newton: Hypotheses non fingo
scientific knowledge can be derived from observations + experience
• More data → better model
(closer approximation to the truth)
Alternative Views
• Karl Popper: Science starts from problems, and not from observations
• Werner Heisenberg: What we observe is not nature itself, but nature exposed to our method of questioning
• Albert Einstein:
- Reality is merely an illusion, albeit a very persistent one.
Science ~ creation of human mind???
Empirical Knowledge
These methodological/philosophical issues have not been properly addressed
• Can it be obtained from data alone?
• How is it different from ‘beliefs’?
• Role of a priori knowledge vs data?
• What is ‘the method of questioning’?
OUTLINE
• Motivation + Background
• Philosophical perspective
• Predictive learning framework
- classical statistics vs predictive learning
- standard inductive learning setting
- Ockham’s Razor vs VC-dimension
• Practical aspects and examples
• Summary
Method of Questioning
• Learning Problem Setting ~
- assumptions about training + test data
- goals of learning (model estimation)
• Classical statistics:
- data generated from a parametric distribution
- estimate/approximate true probabilistic model
• Predictive modeling (VC-theory):
- data generated from unknown distribution
- estimate useful (~ predictive) model
Critique of Statistical Approach (L. Breiman)
• The Belief that a statistician can invent a reasonably good parametric class of models for a complex mechanism devised by nature
• Then parameters are estimated and conclusions are drawn
• But conclusions are about
- the model’s mechanism
- not about nature’s mechanism
Inductive Learning: problem setting
• The learning machine observes samples (x, y), and returns an estimated response $\hat{y} = f(\mathbf{x}, w)$
• Two modes of inference: identification vs imitation
• Goal is minimization of Risk: $\int \mathrm{Loss}(y, f(\mathbf{x}, w)) \, dP(\mathbf{x}, y) \to \min$
Note:
- estimation problem is ill-posed (finite sample size)
- probabilistic model P(x,y) is never evaluated
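Because P(x,y) is unknown, the risk integral cannot be computed directly; a common practical stand-in is the average loss over the training samples (empirical risk). The sketch below assumes a hypothetical one-parameter model `f(x, w) = w * x`, squared loss, and a small candidate grid for `w`. None of these choices come from the slides.

```python
def squared_loss(y, y_hat):
    """Loss(y, f(x, w)) for one sample; squared error is an assumed choice."""
    return (y - y_hat) ** 2


def f(x, w):
    """Hypothetical one-parameter model, used only for illustration."""
    return w * x


def empirical_risk(w, samples, loss=squared_loss):
    """Average loss over the finite training sample (stand-in for the integral)."""
    return sum(loss(y, f(x, w)) for x, y in samples) / len(samples)


# Hypothetical training samples (x, y).
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
candidates = [0.0, 1.0, 2.0, 3.0]

# Pick the candidate parameter with the smallest empirical risk.
w_star = min(candidates, key=lambda w: empirical_risk(w, samples))
```

Note that P(x,y) never appears in the code, which matches the slide's remark that the probabilistic model is never evaluated.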
Binary Classification
Given: data samples (~ training data)
Estimate: a model (function) that
- explains this data
- predicts future data
Classification problem:
Learning ~ function estimation
Statistical vs Predictive Approach
• Binary classification problem: estimate decision boundary from training data $(\mathbf{x}_i, y_i)$, where y ~ binary class label (0/1)
Assuming distribution P(x,y) is known:
[Figure: data in (x1, x2) space]
Classical Statistical Approach
(1) parametric form of unknown distribution P(x,y) is known
(2) estimate parameters of P(x,y) from the training data
(3) construct decision boundary using estimated distribution and given misclassification costs
Modeling assumption: parametric distribution is known and it can be estimated from training data
[Figure: estimated boundary in (x1, x2) space]
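The three steps of the classical approach can be sketched as follows. To keep the example short it assumes one-dimensional inputs, a Gaussian form for each class, equal priors and equal misclassification costs; these are illustrative assumptions rather than the setting of the figure, and the function names are hypothetical.

```python
import math


def estimate_gaussian(values):
    """Step (2): estimate mean and variance of an assumed Gaussian."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


def log_density(x, mean, var):
    """Log of the Gaussian density with the estimated parameters."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)


def fit(samples):
    """Estimate one Gaussian per class label (0 and 1)."""
    return {
        label: estimate_gaussian([x for x, y in samples if y == label])
        for label in (0, 1)
    }


def classify(x, params):
    """Step (3): pick the class with the larger estimated density.

    Equal priors and equal misclassification costs are assumed.
    """
    return max((0, 1), key=lambda label: log_density(x, *params[label]))


# Hypothetical 1-D training data: (x, class label).
training = [(-1.0, 0), (0.0, 0), (1.0, 0), (3.0, 1), (4.0, 1), (5.0, 1)]
params = fit(training)
```

With these data both classes get the same estimated variance, so the boundary falls midway between the two estimated means.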
Predictive Approach
(1) parametric form of decision boundary f(x,w) is given
(2) explain available data via fitting f(x,w), or minimization of some loss function (e.g., squared error)
(3) a function f(x,w*) providing smallest fitting error is then used for prediction
Modeling assumptions
- Need to specify f(x,w) and loss function a priori.
- No need to estimate P(x,y)
[Figure: estimated boundary in (x1, x2) space]
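A minimal sketch of the predictive approach, assuming a linear f(x,w) in two inputs, squared-error loss, plain gradient descent as the fitting procedure, and a 0.5 threshold on the fitted output for 0/1 labels. The data, step size and iteration count are hypothetical choices made for illustration.

```python
def f(x, w):
    """Assumed linear form f(x, w) = w1*x1 + w2*x2 + w0 (w stored as [w1, w2, w0])."""
    return w[0] * x[0] + w[1] * x[1] + w[2]


def fit_least_squares(samples, lr=0.02, steps=5000):
    """Minimize mean squared error between f(x, w) and the 0/1 labels."""
    w = [0.0, 0.0, 0.0]
    n = len(samples)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for x, y in samples:
            err = f(x, w) - y
            grad[0] += 2 * err * x[0] / n
            grad[1] += 2 * err * x[1] / n
            grad[2] += 2 * err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def classify(x, w):
    """Threshold the fitted function at 0.5 to get a 0/1 class label."""
    return 1 if f(x, w) > 0.5 else 0


# Hypothetical 2-D training data: ((x1, x2), class label).
training = [
    ((0.0, 0.0), 0), ((1.0, 0.0), 0), ((0.0, 1.0), 0),
    ((3.0, 3.0), 1), ((4.0, 3.0), 1), ((3.0, 4.0), 1),
]
w_star = fit_least_squares(training)
```

No distribution is estimated anywhere; only the form of f(x,w) and the loss are fixed in advance, as the slide states.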
Classification with High-Dimensional Data
• Digit recognition 5 vs 8: each example ~ 28 x 28 pixel image → 784-dimensional vector x
Medical Interpretation
- Each pixel ~ genetic marker
- Each patient (sample) described by 784 genetic markers
- Two classes ~ presence/absence of a disease
• Estimation of P(x,y) with finite data is not possible
• Accurate estimation of decision boundary in 784-dim. space is possible, using just a few hundred samples
• High dimensional data: genomic data, brain imaging data, social networks, etc.
• Available data matrix X (n rows, d columns) where d >> n
• Predictive modeling ~ estimating f(x) is very ill-posed
- Curse of dimensionality (under classical setting)
- is generalization possible?
- what is a priori knowledge?
- understanding high-dimensional models
Predictive Modeling
Predictive approach
- estimates certain properties of unknown P(x,y) that are useful for predicting the output y.
- based on mathematical theory (VC-theory)
- successfully used in many apps
BUT its methodology + concepts are very different from classical statistics:- formalization of the learning problem (~ requires understanding of application domain)
- a priori specification of a loss function
- interpretation of predictive models is hard
- many good models estimated from the same data
VC-dimension
• Measures of model ‘complexity’
- number of ‘free’ parameters/entities
- VC-dimension
• Classical statistics: Ockham’s Razor
- estimate simple (~ interpretable) models
- typical strategy: feature selection
- trade-off between simplicity and accuracy
• Predictive modeling (VC-theory):
- complex black-box models
- multiplicity of good models
- prediction is controlled by VC-dimension
VC-dimension
• Example: spherical decision functions f(c,r,x) can shatter 3 points BUT cannot shatter 4 points
VC-dimension
• Example: set of functions Sign [Sin (wx)] can shatter any number of points:
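One well-known construction (assumed here; the slide's own figure is not recoverable) places the points at x_i = 10^(-i) and, for labels y_i in {-1, +1}, sets w = π(1 + Σ_i ((1 - y_i)/2) 10^i). The sketch below checks by brute force that every labeling of the first n points is reproduced by Sign[Sin(wx)]. The helper names are hypothetical.

```python
import math
from itertools import product


def sign(v):
    """Sign with the convention sign(v) = +1 for v > 0, else -1."""
    return 1 if v > 0 else -1


def shattering_w(labels):
    """Frequency w that reproduces the given +/-1 labels at x_i = 10**-i."""
    return math.pi * (
        1 + sum((1 - y) // 2 * 10 ** i for i, y in enumerate(labels, start=1))
    )


def can_shatter(n):
    """Check that every one of the 2**n labelings of n points is realized."""
    points = [10.0 ** -i for i in range(1, n + 1)]
    for labels in product((-1, 1), repeat=n):
        w = shattering_w(labels)
        if any(sign(math.sin(w * x)) != y for x, y in zip(points, labels)):
            return False
    return True
```

A single parameter w is enough to realize every labeling, which is why this family has one free parameter but unbounded VC-dimension. Floating-point precision limits how large n can be in this numerical check.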
VC-dimension vs number of parameters
• VC-dimension can be equal to DoF (number of parameters)
Example: linear estimators
• VC-dimension can be smaller than DoF
Example: penalized estimators
• VC-dimension can be larger than DoF
Example: feature selection, sin (wx)
Philosophical interpretation: VC-falsifiability
• Occam’s Razor: Select the model that explains available data and has a small number of entities (free parameters)
• VC theory: Select the model that explains available data and has low VC-dimension (i.e. can be easily falsified)
New Principle of VC falsifiability
OUTLINE
• Motivation + Background
• Philosophical perspective
• Predictive learning framework
• Practical aspects and examples
- philosophical interpretation of data-driven knowledge discovery
- trading international mutual funds
- handwritten digit recognition
• Summary
Philosophical Interpretation
• What is primary in data-driven knowledge:
- observed data or method of questioning?
- what is ‘method of questioning’?
• Is it possible to achieve good generalization with finite samples ?
• Philosophical interpretation of the goal of learning & math conditions for generalization
VC-Theory provides answers
• Method of questioning is
- the learning problem setting
- should be driven by app requirements
• Standard inductive learning commonly used (not always the best choice)
• Good generalization depends on two factors
- (small) training error
- small VC-dimension ~ large ‘falsifiability’
• Occam’s Razor does not explain successful methods: SVM, boosting, random forests, ...
Application Examples
• Both use binary classification
• ISSUES
- good prediction/generalization
- interpretation of estimated models, especially for high-dimensional data
- multiple good models
[Figure: input pattern → feature extraction → X → classifier → decision (class label)]
Timing of International Funds
• International mutual funds
- priced at 4 pm EST (New York time)
- reflect price of foreign securities traded at European/ Asian markets
- Foreign markets close earlier than US market
Possibility of inefficient pricing
Market timing exploits this inefficiency.
• Scandals in the mutual fund industry ~2002
• Solution adopted: restrictions on trading
Binary Classification Setting
• TWIEX ~ American Century Int’l Growth
• Input indicators (for trading) ~ today
- SP 500 index (daily % change) ~ x1
- Euro-to-dollar exchange rate (% change) ~ x2
• Output: TWIEX NAV (% change) ~ next day
• Model parameterization (fixed):
- linear: $g(\mathbf{x}, \mathbf{w}) = w_1 x_1 + w_2 x_2 + w_0$
- quadratic: $g(\mathbf{x}, \mathbf{w}) = w_1 x_1^2 + w_2 x_2^2 + w_3 x_1 + w_4 x_2 + w_5 x_1 x_2 + w_0$
• Decision rule (estimated from training data): $D(\mathbf{x}) = \mathrm{Sign}(g(\mathbf{x}, \mathbf{w}))$
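The two parameterizations and the sign decision rule can be written out directly. In the sketch below the weight vectors are hypothetical placeholders (the talk estimates them from 2004 data and does not list them), the weights are stored as a tuple indexed w[0]..w[5], and Sign(0) is taken as +1 by assumption.

```python
def g_linear(x, w):
    """Linear form: g = w1*x1 + w2*x2 + w0, with w = (w0, w1, w2)."""
    x1, x2 = x
    return w[1] * x1 + w[2] * x2 + w[0]


def g_quadratic(x, w):
    """Quadratic form with w = (w0, w1, w2, w3, w4, w5)."""
    x1, x2 = x
    return (w[1] * x1 ** 2 + w[2] * x2 ** 2
            + w[3] * x1 + w[4] * x2 + w[5] * x1 * x2 + w[0])


def decision(x, w, g):
    """D(x) = Sign(g(x, w)); returns +1 or -1 (Sign(0) treated as +1)."""
    return 1 if g(x, w) >= 0 else -1


# Hypothetical weights, chosen only to exercise the functions.
w_lin = (0.0, 1.0, 1.0)
w_quad = (-1.0, 1.0, 1.0, 0.0, 0.0, 0.0)

# x = (SP 500 daily % change, Euro-to-dollar % change) for today.
d_lin = decision((0.5, 0.2), w_lin, g_linear)
d_quad = decision((0.5, 0.5), w_quad, g_quadratic)
```

How the two output values map to trading actions is not stated on the slide, so the code leaves them as +1 and -1.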
VC theoretical Methodology
• When can a trained model predict well?
(1) Future/test data is similar to training data
i.e., use 2004 period for training, and 2005 for testing
(2) Estimated model is ‘simple’ and provides good performance during training period
i.e., trading strategy is consistently better than buy-and-hold during training period
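A comparison of a trading strategy against buy-and-hold might be computed along the following lines. The slides do not say how cumulative gain is defined, so this sketch assumes a simple running sum of daily % changes, counts a day only when the strategy is invested, and uses invented numbers.

```python
def cumulative_gain(daily_changes, in_market):
    """Running sum of daily % changes on days when the position is held."""
    total = 0.0
    curve = []
    for change, held in zip(daily_changes, in_market):
        if held:
            total += change
        curve.append(total)
    return curve


# Invented next-day % changes and strategy decisions, for illustration only.
next_day_changes = [0.5, -0.3, 0.2, -0.4, 0.6]
strategy_invested = [True, False, True, False, True]

trading = cumulative_gain(next_day_changes, strategy_invested)
buy_and_hold = cumulative_gain(next_day_changes, [True] * len(next_day_changes))

# "Consistently better" is read here as: never below buy-and-hold on any day.
consistently_better = all(t >= b for t, b in zip(trading, buy_and_hold))
```

The same function applied to a later period would give the test-period curves, keeping the decision rule fixed after training.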
Empirical Results: 2004-2005 data, Linear model
can expect good performance with test data
[Figure: “Training data 2004”, plot with axes SP500 (%) and EURUSD (%); “Training period 2004”, cumulative gain/loss (%) vs days for Trading and Buy and Hold]
Empirical Results: 2004-2005 data, Linear model
confirmed good prediction performance
[Figure: “Test data 2005”, plot with axes SP500 (%) and EURUSD (%); “Test period 2005”, cumulative gain/loss (%) vs days for Trading and Buy and Hold]
Empirical Results: 2004-2005 data, Quadratic model
can expect good performance with test data
[Figure: “Training data 2004”, plot with axes SP500 (%) and EURUSD (%); “Training period 2004”, cumulative gain/loss (%) vs days for Trading and Buy and Hold]
Empirical Results: 2004-2005 data, Quadratic model
confirmed good test performance
[Figure: “Test data 2005”, plot with axes SP500 (%) and EURUSD (%); “Test period 2005”, cumulative gain/loss (%) vs days for Trading and Buy and Hold]
Interpretation vs Prediction
• Two good trading strategies estimated from 2004 training data
• Both models predict well for test period 2005
• Which model is true?
[Figure: two plots with axes SP500 (%) and EURUSD (%)]
Handwritten digit recognition
[Figure: sample images of digit “5” and digit “8”, each 28 x 28 pixels]
Binary classification task: digit “5” vs. digit “8”
• No. of Training samples = 1000 (500 per class).
• No. of Validation samples = 1000 (used for model selection).
• No. of Test samples = 1866.
• Dimensionality of input space = 784 (28 x 28).
• RBF SVM yields good generalization (similar to humans)
Interpretation vs Prediction
• Humans cannot provide interpretation even when they make good prediction
• Interpretation of black-box models
- Not unique/subjective
- Depends on parameterization: i.e. kernel type
Interpretation of SVM models
How to interpret high-dimensional models?
Strategy 1: dimensionality reduction/feature selection → prediction accuracy usually suffers
Strategy 2: interpretation of a high-dimensional model utilizing properties of SVM (~ separation margin)
Univariate histogram of projections
• Project training data onto normal vector w of the trained SVM
$y = \mathrm{sign}(f(\mathbf{x})) = \mathrm{sign}(\mathbf{w} \cdot \mathbf{x} + b)$
[Figure: projection values $\mathbf{w} \cdot \mathbf{x} + b$ along the direction of w, with the decision boundary at 0 and margin borders at -1 and +1]
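The projection step can be sketched as follows for a linear decision function. The weight vector, bias, samples and bin width below are hypothetical; for the RBF SVM mentioned in the talk, f(x) would be the kernel expansion rather than a dot product, which this sketch does not cover.

```python
import math
from collections import Counter


def project(x, w, b):
    """f(x) = w . x + b: signed projection onto the normal direction w."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def histogram(values, bin_width=0.5):
    """Count values per bin; each key is the left edge of its bin."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical trained parameters and samples, for illustration only.
w, b = (1.0, -1.0), 0.0
samples = [(2.0, 0.5), (1.5, 0.2), (0.2, 1.4), (0.1, 1.9)]

projections = [project(x, w, b) for x in samples]
labels = [1 if p > 0 else -1 for p in projections]   # y = sign(f(x))
hist = histogram(projections)
```

Samples whose projection falls between -1 and +1 lie inside the margin, which is what the histogram makes visible at a glance.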
TYPICAL HISTOGRAMS OF PROJECTIONS
(a) Projections of training data. Training error = 0
(b) Projections of validation data. Validation error = 1.7%
(c) Projections of test data. Test error = 1.23%
• Selected SVM parameter values
62~
101~
orC
SUMMARY
• Philosophical issues + methodology: important for data-analytic modeling
• Important distinction between first-principle knowledge, empirical knowledge, beliefs
• Black-box predictive models
- no simple interpretation (many variables)
- multiplicity of good models
• Simple/interpretable parameterizations do not predict well for high-dimensional data
• Non-standard and non-inductive settings
References
• V. Vapnik, Estimation of Dependencies Based on Empirical Data. Empirical Inference Science: Afterword of 2006, Springer
• L. Breiman, “Statistical Modeling: the Two Cultures”, Statistical Science, vol. 16(3), pp. 199-231, 2001
• V. Cherkassky and F. Mulier, Learning from Data, second edition, Wiley, 2007
• V. Cherkassky, Predictive Learning, 2012 (to appear)
- check Amazon.com in early Aug 2012
- developed for upper-level undergrad course for engineering and computer science students at U. of Minnesota with significant Liberal Arts content (on philosophy)
- see http://www.ece.umn.edu/users/cherkass/ee4389/