Transcript of “Compressed sensing meets symbolic regression: SISSO”
-
Compressed sensing meets
symbolic regression: SISSO
- Part 2 -
Luca M. Ghiringhelli
On-line course on Big Data and Artificial Intelligence in Materials Sciences
-
P = c1d1 + c2d2 + … + cndn
Compressed sensing, not only LASSO
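A minimal sketch of l1-based sparse recovery on toy data (not from the lecture; scikit-learn's Lasso and the synthetic features are assumptions):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Toy sparse-recovery setup: many candidate features, few truly active ones.
rng = np.random.default_rng(0)
D = rng.normal(size=(50, 200))          # 50 samples, 200 candidate features d_i
P = 3.0 * D[:, 4] - 1.5 * D[:, 17]      # property built from only two features
model = Lasso(alpha=0.1).fit(D, P)      # l1 penalty drives most c_i to zero
print(np.nonzero(model.coef_)[0])       # ideally recovers columns 4 and 17
```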
-
Compressed sensing, not only LASSO
Greedy method: Orthogonal Matching Pursuit
P = c1d1 + c2d2 + … + cndn
[Figure: P (property) is first projected onto the most correlated feature d1; the residual Residual1 is then matched against the orthogonalized remaining features d1*, d2*, and so on.]
Limitation of greedy methods: features are picked one at a time and never revisited, so the sequence of locally optimal choices can miss the globally optimal feature subset.
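A minimal NumPy sketch of OMP (the names D for the feature matrix and P for the property vector are assumptions, not the lecture's notation):

```python
import numpy as np

def omp(D, P, n_nonzero):
    # Greedy selection: pick one feature per iteration by correlation with
    # the current residual, then refit all picked features jointly.
    residual, selected = P.copy(), []
    for _ in range(n_nonzero):
        scores = np.abs(D.T @ residual)
        scores[selected] = -np.inf           # a chosen feature is never revisited
        selected.append(int(np.argmax(scores)))
        # joint least-squares refit over the selected columns
        c, *_ = np.linalg.lstsq(D[:, selected], P, rcond=None)
        residual = P - D[:, selected] @ c
    return selected, c
```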
-
Compressed sensing: SISSO
SIS: Sure-Independence Screening
[Figure: SIS screening: the subset S1D of features most correlated with P (property) is selected; then the subset S2D most correlated with Residual1D; and so on.]
Ouyang et al., PRM 2018, DOI: 10.1103/PhysRevMaterials.2.083802
-
Compressed sensing: SISSO
SIS: Sure-Independence Screening
SO: Sparsifying Operator: exact (by enumeration) over the SIS-selected features
[Figure: as above, S1D features screened against P (property), S2D against Residual1D.]
Similarity criterion in the SIS step:
● Scalar product (Pearson correlation)
● Spearman correlation (captures nonlinear monotonicity)
● Mutual information, …
● However: computational cost is to be factored in
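A sketch of the SIS ranking with a pluggable similarity criterion (the function name and signature are illustrative assumptions; pearsonr and spearmanr come from scipy.stats):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

def sis_rank(D, residual, k, criterion="pearson"):
    # Rank every candidate feature by |similarity| to the current residual.
    # Spearman also captures nonlinear but monotonic relations; Pearson is cheapest.
    corr = {"pearson": pearsonr, "spearman": spearmanr}[criterion]
    scores = [abs(corr(D[:, j], residual)[0]) for j in range(D.shape[1])]
    return np.argsort(scores)[::-1][:k]      # indices of the k best features
```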
-
Compressed sensing: SISSO
SIS: Sure-Independence Screening
SO: Sparsifying Operator
Exact solution of argmin_c ||P − Dc||₂² subject to ||c||₀ ≤ n, by enumeration over the SIS-selected features
Ouyang et al., PRM 2018, DOI: 10.1103/PhysRevMaterials.2.083802
-
Compressed sensing: SISSO
SIS: Sure-Independence Screening
SO: Sparsifying Operator: exact (by enumeration) over the SIS-selected features
In practice:
0. i = 1, S = ∅
1. Rank features according to similarity to Residual(i−1) (Property = Residual0).
2. Add the first k features to S.
3. Perform least-squares regression over all i-tuples in S.
4. The lowest-error model is the i-dimensional SISSO model.
5. i ← i+1; go to 1.
P = c1d1 + c2d2 + … + cndn
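Putting the two steps together, a minimal sketch of the loop above (assumed variable names; the production SISSO code is far more efficient, since the enumeration in step 3 grows combinatorially with i and k):

```python
from itertools import combinations
import numpy as np

def sisso(D, P, max_dim=2, k=10):
    # D: (n_samples, n_features) candidate-feature matrix; P: property vector.
    # Intercept omitted for brevity; center D and P beforehand.
    S, residual = [], P.copy()
    for i in range(1, max_dim + 1):
        # SIS: rank features by |correlation| with the current residual
        scores = np.abs(D.T @ residual) / (np.linalg.norm(D, axis=0) + 1e-12)
        ranked = [j for j in np.argsort(scores)[::-1] if j not in S]
        S += ranked[:k]                      # add the k best new features to S
        # SO: exact l0 solution by least squares over all i-tuples in S
        best_err, best_t, best_c = np.inf, None, None
        for t in combinations(S, i):
            X = D[:, list(t)]
            c, *_ = np.linalg.lstsq(X, P, rcond=None)
            err = np.linalg.norm(P - X @ c)
            if err < best_err:
                best_err, best_t, best_c = err, t, c
        residual = P - D[:, list(best_t)] @ best_c
    return best_t, best_c                    # selected features and coefficients
```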
-
Predicting crystal structures from the composition
Octet binaries (NaCl, ZnS, BN, KF, GaAs, CaO, …): rock-salt or zinc-blende structure?
Learning the relative stability from the properties of the isolated atomic species
Rock salt: 6-fold coordination, ionic bonding
Zinc blende: 4-fold coordination, covalent bonding
-
Atomic features
[Figure: Kohn-Sham levels [eV] of the isolated atom (example: Sn, tin): valence s and valence p (HOMO) levels, the LUMO, and the radius at the maximum of each valence orbital.]
-
Systematic construction of candidates
[Figure: candidate features are built as trees by recursively applying operators such as exp(x), exp(-x), ln(x), x^n, arctan(x), x + y, x·y, x / y, |x − y| to the primary features, respecting dimensional consistency (e.g., |Energy1 − Energy2|, Length1 / Length2).]
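A sketch of one growth step of such a feature space, with unit bookkeeping to enforce dimensional consistency (the feature names, values, and unit labels are illustrative assumptions):

```python
import itertools
import numpy as np

# Primary features with their physical units (illustrative values).
primary = {
    "E1": (np.array([1.0, 2.0]), "energy"),
    "E2": (np.array([0.5, 1.5]), "energy"),
    "L1": (np.array([2.1, 3.0]), "length"),
    "L2": (np.array([1.2, 2.2]), "length"),
}

candidates = dict(primary)
# Binary operators are applied only to dimensionally consistent pairs.
for (na, (a, ua)), (nb, (b, ub)) in itertools.combinations(primary.items(), 2):
    if ua == ub:
        candidates[f"|{na}-{nb}|"] = (np.abs(a - b), ua)
        candidates[f"{na}/{nb}"] = (a / b, "dimensionless")
# Unary operators (exp, ln, ...) act only on dimensionless arguments.
for name, (v, u) in list(candidates.items()):
    if u == "dimensionless":
        candidates[f"exp(-({name}))"] = (np.exp(-v), "dimensionless")

print(sorted(candidates))  # one tree-represented candidate function per entry
```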
-
Systematic construction of candidates
P = c1d1 + c2d2 + … + cndn
Each feature (a column of the matrix) is a tree-represented candidate function, projected onto the training data. The selected descriptor has as components the features picked by the sparse-recovery algorithm (here, SISSO).
-
Predicting crystal structures from the composition
Structure map from SISSO, starting from 7×2 atomic features
P = c1d1 + c2d2 + …
LMG et al., PRL 2015, DOI: 10.1103/PhysRevLett.114.105503
LMG et al., NJP 2017, DOI: 10.1088/1367-2630/aa57bf
-
Data-driven model complexity
In SISSO the “hyperparameters” are:
● The level of sparsity, i.e., the number of “activated” features in P = c1d1 + c2d2 + …
● The size of the feature space, determined by the complexity of the tree
Tuned via cross-validation: iterated random selection of a subset of the data for training + test on the left-out set
Ouyang et al., J. Phys. Mater. 2019, DOI: 10.1088/2515-7639/ab077b
-
Data-driven model complexity
In SISSO the “hyperparameters” are:
● The level of sparsity, i.e., the number of “activated” features in P = c1d1 + c2d2 + …
● The size of the feature space, determined by the complexity of the tree: two levels of the tree give formulas like …, three levels of the tree give formulas like …
Tuned via cross-validation: iterated random selection of a subset of the data for training + test on the left-out set (sketched below)
Ouyang et al., J. Phys. Mater. 2019, DOI: 10.1088/2515-7639/ab077b
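A sketch of the iterated random train/test splitting used for this tuning (the fit callable standing in for a full SISSO run is an assumption):

```python
import numpy as np

def cv_error(D, P, fit, n_splits=30, test_frac=0.2, seed=0):
    # Iterated random splits: train on a random subset, test on the left-out
    # set, and average the test error over many repetitions.
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_splits):
        idx = rng.permutation(len(P))
        n_test = int(test_frac * len(P))
        test, train = idx[:n_test], idx[n_test:]
        predict = fit(D[train], P[train])    # e.g., one SISSO run -> a predictor
        errors.append(np.sqrt(np.mean((predict(D[test]) - P[test]) ** 2)))
    return np.mean(errors)  # choose the hyperparameters that minimize this
```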
-
A few bits of taxonomy for SISSO
Compressed-sensing-based model identification shares concepts with:
● Regularized regression. But: massive sparsification.
● Dimensionality reduction. But: supervised, and yielding sparse, “interpretable” descriptors.
● Feature (basis-set) selection. But: non-greedy solver.
● Symbolic regression. But: deterministic solver.
-
A few bits of taxonomy for SISSO
Compressed-sensing-based model identification shares concepts with:
● Regularized regression. But: massive sparsification.
● Dimensionality reduction. But: supervised, and yielding sparse, “interpretable” descriptors.
● Feature (basis-set) selection. But: non-greedy solver.
● Symbolic regression. But: deterministic solver.
Open challenges of the symbolic regression + compressed sensing approach:
● Efficiently include constants and scaling factors in the symbolic tree
● Include known physical invariances in the symbolic-tree construction
● Include vectors (and tensors) as features. Contractions?
-
Interpretability
Model interpretability: related to sparse feature selection
James, Witten, Hastie, Tibshirani, An Introduction to Statistical Learning, Springer (2013)
[Figure: interpretability vs. flexibility/complexity trade-off; from most interpretable (least flexible) to least interpretable (most flexible):]
● Sparsifying methods (LASSO, SISSO, symbolic regression)
● Linear regression
● Kernelized regression
● Trees
● Forests
● Support vector machines
● Neural networks
-
Interpretability
Model interpretability: related to sparse feature selection
James, Witten, Hastie, Tibshirani, An Introduction to Statistical Learning, Springer (2013)
[Figure: the same interpretability vs. flexibility/complexity trade-off as above.]
In general, with symbolic regression:
● If the exact equation is within reach of the searching/optimizing algorithm, it is found. A simple model does not necessarily mean a less accurate one; for other powerful ML methods (kernel regression, regression trees and forests, deep learning), this is not the case.
● The few fitting parameters yield stability with respect to noise (low complexity → no overfitting).
-
Interpretability: what it might endow us with
[Figure legend: x, atomic fraction; IE, ionization energy; χ, electronegativity]
-
Interpretability: what it might endow us with
[Figure: pressure-induced phase transitions on the structure map: HgTe (std pressure, ZB), GaAs (std pressure, ZB), CdTe (std pressure, ZB), with high-pressure phases (9 GPa, RS), (29 GPa, oI4), (4 GPa, RS).]
-
Multi-task learning
-
Multi-task learning
Application: multi-phase stability diagram. Properties: crystal-structure formation energies.
[Figure: stability diagram in the (d1, d2) descriptor plane, with regions for the RS, CsCl, … phases.]
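A sketch of the multi-task idea (assumed data layout, not the released MT-SISSO implementation): the sparsifying operator selects one feature tuple shared by all tasks, here the crystal phases, while each task keeps its own least-squares coefficients:

```python
from itertools import combinations
import numpy as np

def mt_so(D, tasks, S, dim):
    # tasks: list of (row_indices, property_vector) pairs, one per task/phase.
    # The descriptor (feature tuple) is shared; the coefficients are per-task.
    best_err, best_t = np.inf, None
    for t in combinations(S, dim):
        err = 0.0
        for rows, P in tasks:
            X = D[np.ix_(rows, t)]
            c, *_ = np.linalg.lstsq(X, P, rcond=None)
            err += np.linalg.norm(P - X @ c) ** 2
        if err < best_err:
            best_err, best_t = err, t
    return best_t  # the shared descriptor minimizing the summed error
```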
-
Multi-task learning
Ouyang et al., J. Phys. Mater. 2019, DOI: 10.1088/2515-7639/ab077b
-
Multi-task learning
MT-SISSO is remarkably data-parsimonious.
Ouyang et al., J. Phys. Mater. 2019, DOI: 10.1088/2515-7639/ab077b