DATA ANALYTICS IN NANOMATERIALS DISCOVERY...10 | Data Analytics for Nanomaterials| Michael Fernandez...
Transcript of DATA ANALYTICS IN NANOMATERIALS DISCOVERY...10 | Data Analytics for Nanomaterials| Michael Fernandez...
www.data61.csiro.au
DATA ANALYTICS IN NANOMATERIALS DISCOVERY
Michael Fernandez | OCE-Postdoctoral Fellow September 2016
Data Analytics for Nanomaterials| Michael Fernandez3 |
Materials Discovery Process
Integrating computational methods and informationwith sophisticated computational and analyticaltools to shorten the duration of materialsdevelopment from 10-20 years to 2 or 3 years.
Materials Genome Project
Data Analytics for Nanomaterials| Michael Fernandez2 |
Materials and Molecular Modeling
• Deep learning and GPU computationsChris Watkins
• Big data and HPC integrationPiotr SzulYulia Arzhaeva
Collaborators
Team leaderAmanda Barnard
Sun Baichuan
Data Analytics for Nanomaterials| Michael Fernandez2 |
Outline
• Material Discovery Process• Methods for atomistic simulations of materials
• Experimental confirmation
• Data-driven Computatioanl Nanomaterials Discovery• Hypothetical material space sampling (structure generation and statistics)
• In silico high-throughput characterization (atomistic simulations and machine learning)
• Data storage, analytics, exploitation and integration
Data Analytics for Nanomaterials| Michael Fernandez4 |
Atomistic Simulations of MaterialsTheory can predict properties of materials
𝐻 = 𝐸𝑛𝑢𝑐𝑙𝑒𝑖({𝑅𝐼})-σ𝑖=1𝑁𝑒 𝛻𝑖
2 + 𝑉𝑛𝑢𝑐𝑙𝑒𝑖 𝑟𝑖 +1
2σ𝑖≠𝑗𝑁𝑒 1
𝑟𝑖−𝑟𝑗
• Quantum chemistry methods can cover any chemistry
• Empirical potentials exist for large number of elements
• Computation is scalable and generically deployable
Data Analytics for Nanomaterials| Michael Fernandez5 |
Atomistic Simulations of Materials
Computational predictions later confirmed by experiments
• Self-assembly mechanism of nanodiamonds
Barnard, A. S. et al. Nanoscale 3, 958–62 (2011)
The arrow indicates (111)|(111) interface between two 4 nm sized nanodiamonds.
DFTB simulations of the surface electrostatic potential of dodecahedral diamond nanoparticles of a) 2.2 nm and b) 2.5 nm
Data Analytics for Nanomaterials| Michael Fernandez6 |
Materials libraries
Experiment design
DatabasePerformance
Measurement
Lead materials scale up
Knowledge for rational
design
Banks of materials
Hypothetical materials
In-silico screening
Modern materials discovery cycle
• Polydispersive systems• Exponential increase in
complexity and diversity• Nearly infinite combinatorial
problem
Potyrailo, R. et al. ACS Comb. Sci. 13,579–633 (2011).
Nano-
Theory, modeling and
informatics
Data Analytics for Nanomaterials| Michael Fernandez7 |
Nanomaterials ScreeningDeparting from the Edisonian approach
vs.
We can accurately predict a property, so it can be computed for entire materials spaces
Combinatorial In silico design
In silico structure generation
Data Analytics for Nanomaterials| Michael Fernandez7 |
Nanomaterials Screening
• Systematic and extensive materials performance ranking.
• “Big data” discovery of structure-property relationships in unknown materials domains.
• Accelerated identification of high potential candidates and rational design principles.
Data Analytics for Nanomaterials| Michael Fernandez7 |
Data Analytics Challenges
Information representation• Fingerprints
Information extraction• Multivariate statistical analysis
Knowledge discovery• Data mining and machine learning
Knowledge representation• Visualization
Data Analytics for Nanomaterials| Michael Fernandez9 |
Polydispersity Challenge in Nanomaterials
Polydispersive sample Quasi-monodisperse sample
Purification
Controlled synthesis
$$$
Polydispersity can be detrimental for high-performing applications
Purification of polydispersive nanoparticles samples is expensive
Data Analytics for Nanomaterials| Michael Fernandez10 |
Data Analytics of Nanocarbons
Nanodiamonds Graphenes
Virtual structures relaxed using TB-DFT
Data Analytics for Nanomaterials| Michael Fernandez10 |
Archetypal Analysis (AA)
The predictors of Xi are finite mixtures of archetypes Zj, which are convex combinations of the observations.
Finds a k m matrix Z that corresponds to the archetypal or ”pure patterns” in the data in such away that each data point can be represented as a mixture of those archetypes. In other words, thearchetypal analysis yields the two n k coefficient matrices α and β, which minimize the residualsum of squares:
Cutler, A. & Breiman, L. Archetypal Analysis. Technometrics 36, 338–347 (1994).
Data Analytics for Nanomaterials| Michael Fernandez11 |
Archetypal Analysis of Nanocarbons
Nanodiamonds
Fernandez, M. & Barnard, A. S. ACS Nano 9, 11980–11992 (2015).
Graphene nanoflakes
Data Analytics for Nanomaterials| Michael Fernandez13 |
Nanocarbons Prototypes
Nanodiamonds prototypes
Fernandez, M. & Barnard, A. S. ACS Nano 9, 11980–11992 (2015).
Graphene prototypes
Data Analytics for Nanomaterials| Michael Fernandez13 |
Estimation of Nanodiamonds Properties
Fernandez, M. & Barnard, A. S. ACS Nano 9, 11980–11992 (2015).
Data Analytics for Nanomaterials| Michael Fernandez13 |
Structural Diversity Challenge
Defects, oxidation and edge passivation yield large structural diversity
Graphene nanoflakes
Trigonal Rectangular Hexagonal
Data Analytics for Nanomaterials| Michael Fernandez13 |
PP
P
Silicon qbits
Structural Diversity Challenge
Single Si substitution by P yields a large structural diversity
Data Analytics for Nanomaterials| Michael Fernandez13 |
Metal-Organic Framework (MOF)
nitroimidazole
benzimidazole
Zn2+ CO2 capture and sequestration(Science, 2008)
ZIF-68
Structural Diversity Challenge
Data Analytics for Nanomaterials| Michael Fernandez13 |
Metal-Organic Framework (MOF)
In-silico Combinatorial design
Structural Diversity Challenge
Data Analytics for Nanomaterials| Michael Fernandez13 |
Metal-Organic Framework (MOF)
Modification with of 35 functional groups gives a total of ~1.5 million (354) unique combinations
MOF ZBPOrganic Linkersa) b)
Structural Diversity Challenge
Data Analytics for Nanomaterials| Michael Fernandez13 |
Machine Learning Approach
Machine learning prediction of functional properties
Machine learningFeature fingerprints
Data Analytics for Nanomaterials| Michael Fernandez13 |
Data Analytics Challenge
Binary decision tree of the Band Gap of graphene
Fernandez, M., Shi, H. & Barnard, A. S Carbon (2016). doi:10.1016/j.carbon.2016.03.005
Features• Surface area• Number of atoms• Shape aspect ratio
Accuracy80%
Data Analytics for Nanomaterials| Michael Fernandez13 |
Machine Learning vs. Atomistic Simulations
Estimation of the graphene Band Gap from topological features
Pi and Pj , are the values of a bond order of thecarbon atoms in graphene, while L is the topologicaldistance, whilst Lij is a delta function delta function
Fernandez, M. et al. ACS Comb. Sci. (2016) doi:10.1021/acscombsci.6b00094
Data Analytics for Nanomaterials| Michael Fernandez13 |
Machine Learning of Graphene
Fernandez, M.; Shi, H.; Barnard, A et al. J. Chem. Inf. Model. (2015), 55, 2500-2506
Radial Distribution Function (RDF) scores for graphene
the summation is over the N atom pairs in the graphene structure, andrij is the distance of these pairs and B is a smoothing parameter set to 10.
Data Analytics for Nanomaterials| Michael Fernandez13 |
Machine Learning vs. Atomistic Simulations
Ionization Potential
Fernandez, M.; Shi, H.; Barnard, A et al. J. Chem. Inf. Model. (2015), 55, 2500-2506
Energy of the Fermi level
From RDF scores
Data Analytics for Nanomaterials| Michael Fernandez13 |
Machine Learning vs. Atomistic SimulationsMachine learning prediction of gas adsorption in MOF
Fernandez, M., et al. J. Phys. Chem. C 117, 14095–14105 (2013).
Data Analytics for Nanomaterials| Michael Fernandez13 |
Accuracy and Coverage Challenges
Syst
em
siz
eAccuracy of electronic calculations methods vs. system size
TBDF/Semiempirical
Density Functional
Accuracy
Quantum Monte Carlo
Coupled Cluster
5000 atoms
500 atoms
100 atoms
20 atoms
Data Analytics for Nanomaterials| Michael Fernandez15 |
Data-driven ChallengeMachine learning for large material spaces
∆3
Machine learning predictions:-Functional property value or threshold-Accuracy of different quantum-chemistry methods =f( )structure
Machine Learning
Partial Database Screening
Full DatabasePredictions
Ramakrishnan, R. et al. J. Chem. Theory Comput. 11, 2087–2096 (2015). Fernandez, M et al. J. Chem. Inf. Model. (2015), 55, 2500-2506
Data Analytics for Nanomaterials| Michael Fernandez13 |
Challenges and Limitations
E
QMC
B3LYP
6,095 isomers of C7H10O2
big gapbig gap
Accuracy Gap Between Different Levels of Theory
Data Analytics for Nanomaterials| Michael Fernandez13 |
Accuracy Gap Predictions
Predictions
Machine learning calibration
Data-driven High-throughput Screening
outputsData storage
Input jobsQueue management
Computational resources
Resubmit or kill failed runs
Finished runs
Accuracy refinement
Data Analytics for Nanomaterials| Michael Fernandez13 |
Self-organization Map (SOM) of NPs
Data Analytics for Nanomaterials| Michael Fernandez13 |
Ag-NP
SOM
Electrostatic Potential
Deep Learning for Nanomaterials
Data Analytics for Nanomaterials| Michael Fernandez13 |
C2
Feature maps10x10
5x5convolution
2x2subsampling
5x5convolution
2x2subsampling Fully connected
input32x32
C1
Feature maps28x28
S1
Feature maps14x14
S2
Feature maps5x5
B1B2
Output
ClassificationFeature extraction
Image32x32
www.data61.csiro.au
Data61Michael Fernandeze [email protected]
w https://research.csiro.au/mmm/
Thank you