Transcript of Bari_Prosper3_MVA.pdf (Harrison B. Prosper, Florida State University)
Lectures on Multivariate Methods
Harrison B. Prosper, Florida State University
Bari Lectures, 30, 31 May and 1 June 2016
• Lecture 1: Introduction; Classification; Grid Searches; Decision Trees
• Lecture 2: Boosted Decision Trees
• Lecture 3: Neural Networks; Bayesian Neural Networks
https://support.ba.infn.it/redmine/projects/statistics/wiki/MVA
Today, we shall apply neural networks to two problems:
• Wine tasting
• Approximating a 19-parameter function.
Recall that our goal is to classify wines as good or bad using the variables highlighted in blue, below:
| variable | description |
| --- | --- |
| acetic | acetic acid |
| citric | citric acid |
| sugar | residual sugar |
| salt | NaCl |
| SO2free | free sulfur dioxide |
| SO2tota | total sulfur dioxide |
| pH | pH |
| sulfate | potassium sulfate |
| alcohol | alcohol content |
| quality | quality (between 0 and 1) |

http://www.vinhoverde.pt/en/history-of-vinho-verde
(One version of Hilbert's Problem 13): prove that it is possible to write

f(x_1, \ldots, x_n) = F\big( g_1(x^{(1)}, \ldots, x^{(m)}), \ldots, g_k(x^{(1)}, \ldots, x^{(m)}) \big), \quad m < n,

for all n.
In 1957, Kolmogorov proved that it is possible with m = 3. Today, we know that functions of the form

f(x_1, \ldots, x_I) = a + \sum_{j=1}^{H} b_j \tanh\!\left( c_j + \sum_{i=1}^{I} d_{ji} x_i \right)

can provide arbitrarily accurate approximations of real functions of I real variables (Hornik, Stinchcombe, and White, Neural Networks 2, 359-366 (1989)).
f(x, w) = a + \sum_{j=1}^{H} b_j \tanh\!\left( c_j + \sum_{i=1}^{I} d_{ji} x_i \right)

[Diagram: a network with inputs x_1, x_2, input-to-hidden weights d_{ji}, hidden-node offsets c_j, hidden-to-output weights b_j, output offset a, and output n(x, w).]

f is used for regression; n is used for classification. w = (a, b, c, d) are free parameters to be found by the training.
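The function above translates directly into code. A minimal NumPy sketch (the weight values, the toy input, and the choice of H = 3 hidden nodes are illustrative, not taken from the lectures):

```python
import numpy as np

def nn(x, a, b, c, d, classify=False):
    """Single-hidden-layer network:
    f(x, w) = a + sum_j b_j * tanh(c_j + sum_i d_ji * x_i).
    With classify=True the output is squashed to (0, 1), giving n(x, w).
    """
    hidden = np.tanh(c + d @ x)              # H hidden-node activations
    f = a + b @ hidden                       # linear output for regression
    if classify:
        return 1.0 / (1.0 + np.exp(-f))      # sigmoid for classification
    return f

# Toy example: I = 2 inputs, H = 3 hidden nodes.
rng = np.random.default_rng(0)
a = 0.1
b = rng.normal(size=3)       # b_j, one per hidden node
c = rng.normal(size=3)       # c_j, hidden-node offsets
d = rng.normal(size=(3, 2))  # d_ji, input-to-hidden weights

x = np.array([0.5, -1.0])
print(nn(x, a, b, c, d))                 # regression output f(x, w)
print(nn(x, a, b, c, d, classify=True))  # classification output in (0, 1)
```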
We consider the same two variables, SO2tota and alcohol, and the NN function shown here:
We use data for 500 good wines and 500 “bad” wines.
To construct the NN, we minimize the empirical risk function

R(w) = \frac{1}{2} \sum_{i=1}^{1000} \left[ y_i - n(x_i, w) \right]^2,

where y = 1 for good wines and y = 0 for bad wines, which yields an approximation to

D(x) = \frac{p(x \mid \text{good})}{p(x \mid \text{good}) + p(x \mid \text{bad})}, \qquad x = (\text{SO2tota}, \text{alcohol}).
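Minimizing R(w) can be sketched with plain gradient descent. In this sketch the two-Gaussian "wine" sample, the network size, the learning rate, and the number of steps are all invented stand-ins for the lectures' actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in for the wine data: 500 "good" and 500 "bad" events
# drawn from two overlapping Gaussians in x = (x1, x2).
X = np.vstack([rng.normal(1.0, 1.0, size=(500, 2)),
               rng.normal(-1.0, 1.0, size=(500, 2))])
y = np.concatenate([np.ones(500), np.zeros(500)])

H = 5  # hidden nodes
a = 0.0
b = rng.normal(scale=0.1, size=H)
c = rng.normal(scale=0.1, size=H)
d = rng.normal(scale=0.1, size=(H, 2))

def forward(X):
    T = np.tanh(X @ d.T + c)                      # (N, H) hidden activations
    return T, 1.0 / (1.0 + np.exp(-(a + T @ b)))  # n(x, w) in (0, 1)

# Gradient descent on R(w) = (1/2) * sum_i [y_i - n(x_i, w)]^2
lr = 0.5
for _ in range(2000):
    T, P = forward(X)
    delta = (P - y) * P * (1 - P)        # dR/d(pre-sigmoid output), (N,)
    a -= lr * delta.sum() / len(y)
    b -= lr * T.T @ delta / len(y)
    G = np.outer(delta, b) * (1 - T**2)  # (N, H) back-propagated to hidden
    c -= lr * G.sum(axis=0) / len(y)
    d -= lr * G.T @ X / len(y)

_, P = forward(X)
acc = ((P > 0.5) == (y == 1)).mean()
print("training accuracy:", acc)
```

Because the two classes have equal sizes, the trained n(x, w) approximates D(x) above.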
The distribution of D(x) and the Receiver Operating Characteristic (ROC) curve. In this example, the ROC curve shows the fraction f of each class of wine accepted.
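Such a ROC curve, the fraction of each class accepted as the cut on D(x) is varied, can be computed directly from the two sets of D values. A sketch with made-up scores (the Beta-distributed D values are purely illustrative):

```python
import numpy as np

def roc_points(d_good, d_bad, n_cuts=101):
    """For each cut t on D(x), return the fraction of bad and good
    wines with D(x) > t, i.e. the fraction of each class accepted."""
    cuts = np.linspace(0.0, 1.0, n_cuts)
    f_good = np.array([(d_good > t).mean() for t in cuts])
    f_bad = np.array([(d_bad > t).mean() for t in cuts])
    return f_bad, f_good  # x and y coordinates of the ROC curve

# Illustrative scores: good wines tend to higher D than bad ones.
rng = np.random.default_rng(1)
d_good = rng.beta(4, 2, size=500)  # skewed toward 1
d_bad = rng.beta(2, 4, size=500)   # skewed toward 0

fb, fg = roc_points(d_good, d_bad)
print("at the median cut:", fb[50], fg[50])
```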
[Figure: the best-fit NN compared with the actual p(good | x).]
Given the posterior density p(w | T) and some new data x, one can compute the predictive distribution of y,

p(y \mid x, T) = \int p(y \mid x, w)\, p(w \mid T)\, dw.

If a definite estimate of y is needed for every x, it can be obtained by minimizing a suitable risk functional,

R[f] = \int L(y, f)\, p(y \mid x, T)\, dy.

Using, for example, L = (y - f)^2 yields the estimate

f(x) = \bar{y}(x, T) \equiv \int y\, p(y \mid x, T)\, dy.

In general, a different L will yield a different estimate of f(x).
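The step from the quadratic loss to the posterior mean is a short calculus exercise (standard, though not spelled out on the slide): differentiate R[f] with respect to f at fixed x and set the result to zero.

```latex
R[f] = \int (y - f)^2\, p(y \mid x, T)\, dy
\;\Longrightarrow\;
\frac{\partial R}{\partial f} = -2 \int (y - f)\, p(y \mid x, T)\, dy = 0
\;\Longrightarrow\;
f(x) = \int y\, p(y \mid x, T)\, dy ,
```

where the last step uses \int p(y \mid x, T)\, dy = 1.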
For function approximation, the likelihood p(y | x, w) for a given (y, x) is typically chosen to be Gaussian,

p(y \mid x, w) = \exp\!\left( -\tfrac{1}{2} \left[ y - f(x, w) \right]^2 / \sigma^2 \right).

For classification one uses

p(y \mid x, w) = \left[ n(x, w) \right]^y \left[ 1 - n(x, w) \right]^{1-y}, \quad y = 0 \text{ or } 1.

Setting y = 1 in the predictive distribution gives

p(1 \mid x, T) = \int n(x, w)\, p(w \mid T)\, dw,

which is an estimate of the probability that x belongs to the class for which y = 1.
[Figure: an ensemble of curves p(S | p_T^{4l}) versus p_T^{4l} (GeV), with the horizontal axis running from 0 to 200 GeV and the vertical axis from 0 to 1 — a 1-D example.]
To construct the BNN, we sample points w from the posterior density

p(w \mid T) = \frac{p(T \mid w)\, p(w)}{p(T)}

using a Markov chain Monte Carlo method. This generates an ensemble of neural networks, just as in the 1-D example on the previous slide.
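A minimal random-walk Metropolis sketch of this procedure on a toy 1-D problem (the data, network size, proposal width, and chain length are all invented for illustration; the lectures' actual sampler was presumably more sophisticated):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy training set T: x near +1 has y = 1, x near -1 has y = 0.
x = np.concatenate([rng.normal(1, 1, 100), rng.normal(-1, 1, 100)])
y = np.concatenate([np.ones(100), np.zeros(100)])

H = 3  # hidden nodes; w packs (a, b_j, c_j, d_j) for a 1-D input

def n_out(x, w):
    a, b, c, d = w[0], w[1:1 + H], w[1 + H:1 + 2 * H], w[1 + 2 * H:]
    f = a + np.tanh(np.outer(x, d) + c) @ b
    return 1.0 / (1.0 + np.exp(-f))

def log_post(w):
    # log p(T | w) with the Bernoulli likelihood, plus a Gaussian prior p(w);
    # the normalization p(T) cancels in the Metropolis ratio.
    p = np.clip(n_out(x, w), 1e-9, 1 - 1e-9)
    log_lik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return log_lik - 0.5 * np.sum(w**2)

# Random-walk Metropolis: propose w' = w + noise, accept with
# probability min(1, p(w' | T) / p(w | T)).
w = rng.normal(size=1 + 3 * H)
lp = log_post(w)
samples = []
for step in range(5000):
    w_new = w + 0.05 * rng.normal(size=w.size)
    lp_new = log_post(w_new)
    if np.log(rng.uniform()) < lp_new - lp:
        w, lp = w_new, lp_new
    if step >= 2000 and step % 30 == 0:  # thinned, post-burn-in samples
        samples.append(w.copy())

# Predictive probability: the average of n(x, w) over the ensemble.
x_test = np.array([2.0, -2.0])
p_pred = np.mean([n_out(x_test, w) for w in samples], axis=0)
print(p_pred)
```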
The distribution of D(x) (using the average of 100 NNs) and the ensemble of ROC curves, one for each NN.
[Figure: the ensemble average ⟨NN⟩ compared with the actual p(good | x).]
We see that if good and bad wines are equally likely, and we are prepared to accept 40% of bad wines while accepting 80% of good ones, then the BDT does slightly better. But if we lower the acceptance rate of bad wines, all three methods work equally well.
Now that the Higgs boson has been found, the primary goal of researchers at the LHC is to look for and, we hope, discover new physics.

There are two basic strategies:
1. Look for any deviation from the predictions of our current theory of particles (the Standard Model).
2. Look for the specific deviations predicted by theories of new physics, such as those based on a proposed symmetry of Nature called supersymmetry.
While finishing a paper related to his PhD dissertation, my student Sam Bein had to approximate a function of 19 parameters!
This is what I suggested he try.

1. Write the function to be approximated as

f(\theta) = C(\theta) \prod_{i=1}^{19} g_i(\theta_i).

2. Use NNs to approximate each g_i(\theta_i) by setting y(\theta) = g_i.

3. Use another neural network to approximate the ratio

D(\theta) = \frac{p(\theta)}{p(\theta) + \prod_{i=1}^{19} g_i(\theta_i)},

where the "background" is sampled from the g functions and the "signal" is the original distribution of points.
4. Finally, approximate the function f using

f(\theta) = \frac{D(\theta)}{1 - D(\theta)} \prod_{i=1}^{19} g_i(\theta_i).

The intuition here is that much of the dependence of f on θ is captured by the product function, leaving (we hoped!) a somewhat gentler 19-dimensional function to be approximated.

The next slide shows the neural network approximations of a few of the g functions. Of course, 1-D functions can be approximated using a wide variety of methods; the point here is to show the flexibility of a single class of NNs.
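The ratio trick in steps 3 and 4, recovering a density ratio from a classifier trained to separate "signal" from "background", can be demonstrated in one dimension, where the exact answer is known. In this sketch logistic regression stands in for the neural network, and the "signal" p(θ) and "background" g(θ) are toy Gaussians, all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy stand-ins: "signal" drawn from p(theta), "background" from g(theta).
sig = rng.normal(1.0, 1.0, 2000)   # p(theta) = Normal(1, 1)
bkg = rng.normal(0.0, 1.0, 2000)   # g(theta) = Normal(0, 1)

theta = np.concatenate([sig, bkg])
y = np.concatenate([np.ones(2000), np.zeros(2000)])

# Logistic model D(theta) = sigmoid(w0 + w1*theta), fitted by gradient
# ascent on the log-likelihood; with equal-sized samples D approximates
# p(theta) / (p(theta) + g(theta)).
w = np.zeros(2)
X = np.column_stack([np.ones_like(theta), theta])
for _ in range(3000):
    D = 1.0 / (1.0 + np.exp(-X @ w))
    w += 0.05 * X.T @ (y - D) / len(y)

# Step 4: the ratio D / (1 - D) estimates p(theta) / g(theta).
def ratio(t):
    D = 1.0 / (1.0 + np.exp(-(w[0] + w[1] * t)))
    return D / (1 - D)

# Exact ratio for these Gaussians is exp(theta - 1/2); compare at theta = 1.
print(ratio(1.0), np.exp(1.0 - 0.5))
```

Because the classifier is trained on equal-sized samples, D/(1 - D) estimates p/g directly; ratio(1.0) should land near e^{0.5} ≈ 1.65 up to sampling noise.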
[Figure: NN approximations of the g functions for four of the 19 parameters: Ab and At (range -6000 to 6000) and M1 and M2 (range -3000 to 3000).]
• Multivariate methods can be applied to many aspects of data analysis, including classification and function approximation. Even though there are hundreds of methods, almost all are numerical approximations of the same mathematical quantity.
• We considered random grid searches, decision trees, boosted decision trees, neural networks, and the Bayesian variety.
• Since no method is best for all problems, it is good practice to try a few of them, say BDTs and NNs, and check that they give approximately the same results.