The unreasonable effectiveness of mathematics, revisited
Big data and neuroscience
Jaime Gómez-Ramírez
Fundación Reina Sofía. Centre for Research in Neurodegenerative Diseases
April 11, 2018
Jaime Gómez-Ramírez · The unreasonable effectiveness of mathematics, revisited
-
The effectiveness of mathematics
Einstein: "The most incomprehensible thing about the world is that it is comprehensible."
Wigner: "The unreasonable effectiveness of mathematics."
Gelfand: "The unreasonable ineffectiveness of mathematics in biology."
-
The effectiveness of mathematics
Heat loss in coffee: dQ/dt = A_s (T_coffee − T_room)
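The cooling law on this slide can be checked with a few lines of numerical integration. This is a minimal sketch: the rate constant k, the time step and the temperatures are made-up illustrative values, and `cool` is a hypothetical helper, not code from the talk.

```python
# Forward-Euler integration of the cooling law, written as
# dT/dt = -k (T_coffee - T_room); all constants below are illustrative.
def cool(T0, T_room, k, dt, steps):
    T = T0
    for _ in range(steps):
        T += -k * (T - T_room) * dt     # heat loss proportional to the gap
    return T

T_hour = cool(T0=90.0, T_room=20.0, k=0.1, dt=0.1, steps=600)
# the temperature decays exponentially toward T_room
assert 20.0 < T_hour < 90.0
```

The coffee never cools below room temperature; the gap to T_room shrinks by a constant factor per step, which is the discrete version of exponential decay.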
-
The effectiveness of mathematics
Wigner's 1960 essay: "the enormous usefulness of mathematics in natural science is something bordering on the mysterious".
The typical interpretation of Wigner's text is as follows:
premise: math concepts arise from an aesthetic impulse in humans
premise: it is unreasonable to think that those same impulses are effective
observation: nevertheless, it so happens that they are effective
consequence: it follows that math concepts are unreasonably effective (assuming the aesthetic premise is valid)
e.g. imaginary numbers, tensors. Math concepts appear and propagate.
-
The effectiveness of mathematics
Wigner did seminal work on group theory, applied to discover symmetry principles.
Group theory replaced previous methods of analysis in quantum mechanics (the "Gruppenpest"), finding invariants instead of seeking explicit solutions by calculus.
The goal of science is not to explain nature (the black box) but to explain the regularities in the behavior of the object: "Not the things in themselves but the relationships between the things" (Poincaré).
The search for causal explanation in terms of mathematical principles necessitates the belief in the mathematical structure of the universe (the c-word).
-
The effectiveness of mathematics
We are "lucky" that regularities exist and that we can grasp them mathematically.
This is Newton's contribution, and this is in essence why deep learning works.
Regularities are invariant with respect to space and time: A, B, ... → X, Y, ... implies T(A), T(B) → T(X), T(Y) under a transformation T.
Convolutional networks exploit image invariance to work (a cat is a cat is a cat).
-
t = √(2s/g)
What makes it possible for us to discover regularities is the division between initial conditions and regularities.
Laws of nature have the form: IF initial conditions THEN event.
That's why causality is so hard: we need to include/exclude all possible combinations of antecedents (initial conditions).
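The free-fall time on this slide is a concrete instance of the law/initial-condition split: the regularity t = √(2s/g) plus the initial condition (dropped from rest over distance s) fixes the outcome. A minimal numeric check, with an assumed helper name:

```python
import math

# Free-fall time from rest over distance s under gravity g: t = sqrt(2s/g).
def fall_time(s, g=9.81):
    return math.sqrt(2.0 * s / g)

t = fall_time(4.905)   # with g = 9.81 m/s^2, falling 4.905 m takes exactly 1 s
assert abs(t - 1.0) < 1e-12
```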
-
God doesn't play dice, e.g. stochastic Brownian motion.
Our knowledge of nature contains "a strange hierarchy": events we observe → laws (regularities to discover) → symmetry (invariance principles).
The future is always uncertain, but nevertheless there are correlations (laws) that we can discover.
-
AI, Machine Learning, Deep Learning
AI ⊃ Machine Learning ⊃ Deep learning
ANN are nonlinear mapping systems whose functioning principles are vaguely based on the nervous systems of mammals.
Data is the most valuable asset and computation is a cheap commodity ("information wants to be free").
-
Perceptron
y = f(Σ_k w_k x_k)   (1)
"A Logical Calculus of the Ideas Immanent in Nervous Activity" (McCulloch and Pitts, 1943).
"If it doesn't rain (x1, w1) and homework is done (x2, w2), go to the movies, y (output)."
Neurons with a binary threshold activation function are analogous to first-order logic sentences.
By itself a neuron (or an ANN) does very little, but a sufficiently large network with appropriate structure and properly chosen weights can approximate any function with arbitrary accuracy.
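The movies example on this slide can be written as a threshold unit in a few lines. This is a sketch: the weights and threshold below are one illustrative choice (any weights realizing AND would do), and the helper names are assumptions.

```python
# A McCulloch-Pitts style threshold unit: fire (1) when the weighted sum of
# the binary inputs reaches the threshold, otherwise stay silent (0).
def neuron(inputs, weights, threshold):
    z = sum(w * x for w, x in zip(weights, inputs))
    return 1 if z >= threshold else 0

# go to the movies only if it doesn't rain (x1 = 1) AND homework is done (x2 = 1)
def go_to_movies(x1, x2):
    return neuron([x1, x2], weights=[1, 1], threshold=2)

assert [go_to_movies(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 0, 0, 1]
```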
-
Perceptron
A perceptron is any feedforward network of nodes with responses like equation (1).
y = f(Σ_k w_k x_k) = f(z)   (2)
In general, f is a bounded, nondecreasing, nonlinear squashing function, e.g. the sigmoid:
f(z) = 1 / (1 + e^(−z)),   f'(z) = e^(−z) / (1 + e^(−z))^2
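The sigmoid and its derivative on this slide can be verified numerically; the test points below are arbitrary assumptions.

```python
import math

def f(z):       # logistic sigmoid
    return 1.0 / (1.0 + math.exp(-z))

def fprime(z):  # closed form from the slide: e^{-z} / (1 + e^{-z})^2
    return math.exp(-z) / (1.0 + math.exp(-z)) ** 2

# The derivative also factors as f(z)*(1 - f(z)), the form usually coded in
# backprop; check both agree and match a central finite difference.
for z in (-2.0, 0.0, 3.5):
    assert abs(fprime(z) - f(z) * (1.0 - f(z))) < 1e-12
    h = 1e-6
    assert abs(fprime(z) - (f(z + h) - f(z - h)) / (2 * h)) < 1e-6
```

The factored form f(z)(1 − f(z)) is the one worth remembering: it lets backprop reuse the forward activation instead of recomputing exponentials.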
-
Perceptron
Other choices are tanh, the step function and, more recently, the ReLU:
y = ReLU(z) = max(0, z),   y' = 1 for z > 0
ReLU works better and faster (the gradient is constant); it can be smoothly approximated by y = ln(1 + e^z).
Reduced likelihood of the gradient vanishing.
Sparsity is produced when z ≤ 0; sigmoids, on the other hand, tend to produce denser representations.
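The ReLU and its smooth approximation from this slide, as a minimal sketch (function names are assumptions):

```python
import math

def relu(z):
    return max(0.0, z)

def softplus(z):   # the smooth surrogate y = ln(1 + e^z) mentioned on the slide
    return math.log1p(math.exp(z))

# ReLU is exactly zero for z <= 0 (sparsity) and has constant gradient 1 for
# z > 0; softplus smooths the kink at 0 and its derivative is the sigmoid.
assert relu(-3.0) == 0.0 and relu(2.5) == 2.5
assert abs(softplus(10.0) - 10.0) < 1e-4       # softplus(z) ≈ z for large z
assert abs(softplus(0.0) - math.log(2.0)) < 1e-15
```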
-
What can and can't perceptrons do?
(Single-layer) perceptrons can correctly classify only data sets that are linearly separable (separable by a hyperplane).
The XOR function is famously not linearly separable, and this matters because many classification problems are not linearly separable.
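The non-separability of XOR can be exhibited by brute force. A sketch under stated assumptions: the small integer search range is an illustrative choice (it suffices for 2-input threshold functions up to scaling), and the helper name is made up.

```python
from itertools import product

# Search for a separating rule w1*x1 + w2*x2 >= b realizing a given truth
# table over the four binary input points.
PTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def separable(truth_table, span=range(-3, 4)):
    for w1, w2, b in product(span, span, span):
        out = [1 if w1 * x1 + w2 * x2 >= b else 0 for x1, x2 in PTS]
        if out == truth_table:
            return True
    return False

assert separable([0, 0, 0, 1])        # AND: w1 = w2 = 1, b = 2 works
assert not separable([0, 1, 1, 0])    # XOR: no hyperplane separates it
```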
-
What can and can't perceptrons do?
There are 2^(2^d) boolean functions of d boolean input variables, and only O(2^(d^2)) of them are linearly separable.
For d = 2, 14/16 are linearly separable (XOR and its complement are the exceptions), but for d = 4, only 1882/65536 are linearly separable.
Although at the time it was known that multilayer networks were more powerful than single-layer ones, the learning algorithms for multilayer architectures were not known.
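The 14/16 figure for d = 2 can be reproduced by brute force. A sketch, assuming that small integer weights suffice at d = 2 (a larger range would be needed for bigger d):

```python
from itertools import product

# Enumerate threshold units with small integer weights and collect the
# distinct truth tables they realize on the 2^d binary input points.
d = 2
points = list(product([0, 1], repeat=d))
span = range(-3, 4)
realizable = set()
for *w, b in product(*([span] * (d + 1))):
    table = tuple(1 if sum(wi * xi for wi, xi in zip(w, p)) >= b else 0
                  for p in points)
    realizable.add(table)

assert len(realizable) == 14              # 14 of the 2^(2^2) = 16 functions
assert (0, 1, 1, 0) not in realizable     # XOR and XNOR are the two misses
```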
-
Deep networks
ANN learn by example and use backpropagation.
If data are well behaved, the network will learn not only the training examples but also the underlying relationships.
ANN are adaptive and self-repairing; they also have some fault tolerance due to their redundant parallel structure (dense connectivity makes them resilient to minor damage: graceful degradation).
Units within a layer are independent, so they can be evaluated simultaneously; e.g. a network with 2,000 nodes in two layers produces a response in 2 time steps rather than in 2,000 steps if each neuron had to be processed serially.
Until the advent of GPUs this advantage was not fully exploited by computers.
-
Deep networks
Table: ANN versus real nervous system
MLP                      | Nervous system
feedforward              | recurrent
dense (fully connected)  | sparse (local)
O(10^2) to O(10^4) units | O(10^10) neurons, O(10^15) synapses
static                   | dynamic: spike trains, synchronization, fatigue
-
Why is an MLP better than one layer?
y = mx is a system with one parameter, m. What kind of data sets can it separate? Only the linearly separable ones.
y = sin(kx) also has one parameter, the frequency k, but it can separate any arbitrary distribution of points on the x-axis.
-
Universality of MLP
Any bounded function can be approximated with arbitrary accuracy if enough hidden units are available: multilayer perceptrons are universal approximators.
How many layers do we need for this astounding property? Kolmogorov showed that one hidden layer is sufficient.
Any continuous function from n variables to an m-dimensional output can be implemented by a network with one hidden layer.
Unfortunately the proof is not constructive; that is, it does not tell us how the weights should be chosen to produce such a function.
-
How important is the universality of MLP?
Is universal approximation a rare property? Not really: many other systems, such as polynomials, trigonometric polynomials (e.g. Fourier series), wavelets and kernel regression systems (SVM), also have universality properties.
-
Architecture
The first layer detects edges, and the second holds the abstract concepts of loops and straight lines; this is the hope behind having a layered structure, and it works because of what Wigner already said.
-
Gradient descent
Cost C(w); we seek dC(w)/dw = 0, where w is a huge column vector (784·16 + 16·16 + 16·10 weights plus 16 + 16 + 10 biases, i.e. 13,002 dimensions for a 784-16-16-10 network).
The gradient points in the direction of steepest increase, so its negative gives the direction to take to decrease the error (cost) most quickly.
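The recipe is the same in 2 dimensions as in 13,002, so a toy sketch shows it whole. The cost C(w) = (w0 − 3)^2 + (w1 + 1)^2, the learning rate and the helper name are all made-up illustrative choices.

```python
# One gradient-descent step: move against the gradient, the direction of
# steepest decrease of the cost.
def grad_step(w, grad, lr):
    return [wi - lr * gi for wi, gi in zip(w, grad)]

# C(w) = (w0 - 3)^2 + (w1 + 1)^2 has gradient (2(w0 - 3), 2(w1 + 1))
w = [0.0, 0.0]
for _ in range(200):
    g = [2 * (w[0] - 3), 2 * (w[1] + 1)]   # ∇C at the current w
    w = grad_step(w, g, lr=0.1)
# w converges to the unique minimizer (3, -1)
```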
-
Backprop
Backprop is the method used to calculate the gradient vector, which tells you which direction to take and how big the step is:
1. compute ∇C
2. take a step in the −∇C direction
3. repeat
Learning is finding the weights that minimize the cost function. Backprop is the algorithm used within gradient descent. Learning is 'just' finding the right weights and biases.
-
Backprop in action, chain rule
The cost of one training example is C_0 = (a^L − y)^2, and the last activation is a^L = σ(w^L a^(L−1) + b^L) = σ(z^L).
How sensitive is the cost function to small changes in the weight?
∂C_0/∂w^L = (∂z^L/∂w^L) (∂a^L/∂z^L) (∂C_0/∂a^L)
∂C_0/∂a^L = 2(a^L − y),   ∂a^L/∂z^L = σ'(z^L),   ∂z^L/∂w^L = a^(L−1)
Average over all training examples: ∂C/∂w^L = (1/n) Σ_{k=0}^{n−1} ∂C_k/∂w^L
∇C = [∂C/∂w^1, ∂C/∂b^1, ..., ∂C/∂w^L]
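The single-neuron chain rule on this slide can be checked against a finite difference; the concrete values of w, a_prev, b and y below are arbitrary assumptions.

```python
import math

# C0 = (a - y)^2 with a = σ(w * a_prev + b), for one neuron and one example.
sigma = lambda z: 1.0 / (1.0 + math.exp(-z))

def cost(w, a_prev=0.7, b=0.1, y=1.0):
    return (sigma(w * a_prev + b) - y) ** 2

w, a_prev, b, y = 0.5, 0.7, 0.1, 1.0
z = w * a_prev + b
a = sigma(z)
# chain rule: ∂C0/∂w = (∂z/∂w)(∂a/∂z)(∂C0/∂a) = a_prev * σ'(z) * 2(a - y)
analytic = a_prev * (sigma(z) * (1.0 - sigma(z))) * 2.0 * (a - y)
h = 1e-6
numeric = (cost(w + h) - cost(w - h)) / (2 * h)
assert abs(analytic - numeric) < 1e-8
```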
-
Curse of dimensionality
The curse of dimensionality refers to the apparent intractability of systematically searching through a high-dimensional space.
As n gets bigger it gets harder and harder to sample all the boxes: with n dimensions, each allowing m states, we have m^n possible combinations.
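The m^n growth is worth seeing in numbers; the helper name is an assumption.

```python
# With n dimensions and m states per dimension there are m**n cells to sample,
# so the sampling burden grows exponentially with n.
def cells(m, n):
    return m ** n

assert cells(10, 2) == 100            # a 10x10 grid: easy to cover
assert cells(10, 20) == 10 ** 20      # 20 dimensions: hopeless to cover
```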
-
Blessing of dimensionality
In an MLP the approximation error decreases with the number of training samples, error O(1/√N), and also with the number of hidden units, error O(1/M); unlike other systems (e.g. polynomials), this is independent of the input size and avoids the curse of dimensionality.
From these results we can build bounds, for example
N > O(Mp/ε)   (3)
where N is the number of samples, M the number of hidden nodes, p the input dimension (Mp the number of parameters) and ε the desired approximation error.
More layers are better and do no harm.
-
Bias-variance tradeoff
The bias-variance tradeoff is the problem of simultaneously minimizing two sources of error in an estimator. The bias-variance decomposition:
MSE = E[(θ̂ − θ)^2] = (E[θ̂] − θ)^2 + Var(θ̂) = (Bias(θ̂))^2 + Var(θ̂)   (4)
The bias/variance tradeoff in deep learning is not exactly a tradeoff: it can be tackled algorithmically.
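The decomposition can be verified by Monte Carlo on a deliberately biased estimator. A sketch: the estimator (0.9 times the sample mean), the true θ and all other constants are illustrative assumptions.

```python
import random

# Empirical check of MSE = Bias^2 + Var: repeat the experiment many times,
# record the estimate, and compare the two sides of the identity.
random.seed(0)
theta, n, trials = 2.0, 10, 50_000
estimates = []
for _ in range(trials):
    sample = [theta + random.gauss(0.0, 1.0) for _ in range(n)]
    estimates.append(0.9 * sum(sample) / n)     # shrunken (biased) sample mean

mean_est = sum(estimates) / trials
mse  = sum((e - theta) ** 2 for e in estimates) / trials
bias = mean_est - theta                          # ≈ 0.9*theta - theta = -0.2
var  = sum((e - mean_est) ** 2 for e in estimates) / trials
assert abs(mse - (bias ** 2 + var)) < 1e-9       # the decomposition holds
```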
-
Bias-variance tradeoff
Table: Bias variance
               | high variance | high bias | high bias and variance | low bias and variance
training error | 2%            | 15%       | 15%                    | 0.5%
dev error      | 11%           | 15%       | 30%                    | 1%
You don't have a dialectical tension of one thing or the other: the table shows 4 cases rather than a tradeoff, and luckily we can take action that fits every case.
-
Bias-variance tradeoff
A bigger network will improve your fit without hurting the variance problem, with the caveat that you regularize properly.
Before, we couldn't make one better without hurting the other; now we can get both better.
-
Ensemble models
Idea: you don't want an organization where everyone is the same ('good'); you may want to introduce variability.
Decision trees are grown by introducing a random element, e.g. at each node randomly choose the features on which to split.
Random forests (randomly constructed trees), each tree voting for a class. Bagging = bootstrap + aggregation.
Great predictors, but interpretability is obscured by the complexity of the model: accuracy generally requires more complex prediction methods.
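Bagging can be sketched end to end in a few lines. This is a toy under stated assumptions: the 1-D data set, the weak "stump" learner and the ensemble size are all made-up illustrative choices, not the random-forest algorithm itself.

```python
import random

# Bagging = bootstrap + aggregation: resample the training set, fit a weak
# threshold classifier ("stump") to each resample, predict by majority vote.
random.seed(1)
data = [(x / 50.0, int(x >= 25)) for x in range(50)]   # true rule: x >= 0.5

def train_stump(sample):
    # pick the threshold minimizing training error on this bootstrap sample
    err, t = min((sum((x >= t) != bool(y) for x, y in sample), t)
                 for t in [i / 50.0 for i in range(51)])
    return t

stumps = [train_stump([random.choice(data) for _ in data]) for _ in range(25)]

def predict(x):
    votes = sum(x >= t for t in stumps)                # aggregate by voting
    return int(votes > len(stumps) / 2)

assert predict(0.9) == 1 and predict(0.1) == 0
```

Each stump sees a slightly different bootstrap sample, so the thresholds vary; the vote averages that variability away, which is the point of the ensemble.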
-
Computational Topology
Topology is concerned with the properties of space that are preserved under continuous deformations: stretching, crumpling and bending, but not tearing or gluing.
Topology is an intermediate analysis medium that focuses on coarse structures.
Why use topology on big data?
It studies the invariants of continuous deformations of the shape of data (resistant to the threshold-selection problem).
It allows measures of shape (clumps, holes and voids) which are invariant across scales.
-
Persistent homology
Edges in a graph capture dyadic relationships.
Graphs can't capture higher-order relationships, but simplicial complexes can.
A simplicial complex is a generalized graph consisting of vertices, edges, triangles and simplices of higher dimension glued together.
-
Persistent homology
C_0(X) = ⟨v1, v2, v3, v4⟩,   C_1(X) = ⟨e1, e2, e3, e4, e5⟩,   C_2(X) = ⟨σ1⟩
Boundary operators: ρ_1 : C_1(X) → C_0(X), ρ_2 : C_2(X) → C_1(X). Applied to an edge, ρ_1 yields a difference of vertices; the higher-order operator ρ_2 acts on triangles (2-simplices).
A loop is a chain with zero boundary: ρ_1(e1 + e2 + e3) = 0 = ρ_1(e1 + e5 + e4), so both loops are in the kernel of ρ_1, Ker(ρ_1) = {x ∈ C_1(X) : ρ_1(x) = 0}.
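These boundary computations can be done concretely over GF(2). A sketch under stated assumptions: the edge labelling (e1 = v1v2, e2 = v2v3, e3 = v3v1, e4 = v4v1, e5 = v2v4) is my choice, made consistent with the two loops e1+e2+e3 and e1+e5+e4 named on the slide, and `rank_gf2` is a hypothetical helper.

```python
# Gaussian elimination over GF(2) (entries 0/1, addition = XOR).
def rank_gf2(rows):
    rows = [r[:] for r in rows]
    rank, ncols = 0, len(rows[0])
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

# ρ_1: rows = vertices v1..v4, columns = edges e1..e5 (1 = incident)
d1 = [[1, 0, 1, 1, 0],   # v1
      [1, 1, 0, 0, 1],   # v2
      [0, 1, 1, 0, 0],   # v3
      [0, 0, 0, 1, 1]]   # v4
# ρ_2: rows = edges, one column for the filled triangle σ1 = e1 + e2 + e3
d2 = [[1], [1], [1], [0], [0]]

dim_H1 = (5 - rank_gf2(d1)) - rank_gf2(d2)   # dim Ker(ρ_1) - rank Im(ρ_2)
assert dim_H1 == 1   # the loop e1+e5+e4 survives; e1+e2+e3 is filled in
```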
-
Persistent homology
e1 + e2 + e3 is obtained as the image of the triangle σ1 under the map ρ_2, whereas e1 + e5 + e4 is not the image of any triangle; in other words, with Im(ρ_2) = {y ∈ C_1(X) : ∃x ∈ C_2(X), ρ_2(x) = y}, we have e1 + e2 + e3 ∈ Im(ρ_2) and e1 + e5 + e4 ∉ Im(ρ_2).
The 1-dimensional homology is the quotient space H_1(X) = Ker(ρ_1)/Im(ρ_2).
H_i(X) = Ker(ρ_i) / Im(ρ_{i+1})   (5)
-
Conclusions
With enough imagination, a classifier (or regression) can be useful for solving a large number of problems.
Deep learning works because there is structure in the world, but we don't know why, because we don't know anything about the initial conditions: "laws of nature are precise beyond anything reasonable; we know virtually nothing about the initial conditions" (Wigner).
There are other ways to reduce complexity in big data while preserving maximal intrinsic information: computational topology.
Occam's dilemma (lex parsimoniae): accuracy generally requires more complex prediction methods; simple and interpretable functions do not make the most accurate predictions.
The curse of dimensionality can be a blessing.
-
Thanks!