DATA ANALYTICS IN NANOMATERIALS DISCOVERY...10 | Data Analytics for Nanomaterials| Michael Fernandez...

www.data61.csiro.au

DATA ANALYTICS IN NANOMATERIALS DISCOVERY

Michael Fernandez | OCE-Postdoctoral Fellow September 2016

Data Analytics for Nanomaterials| Michael Fernandez3 |

Materials Discovery Process

Integrating computational methods and informationwith sophisticated computational and analyticaltools to shorten the duration of materialsdevelopment from 10-20 years to 2 or 3 years.

Materials Genome Project


Materials and Molecular Modeling

• Deep learning and GPU computationsChris Watkins

• Big data and HPC integrationPiotr SzulYulia Arzhaeva

Collaborators

Team leaderAmanda Barnard

Sun Baichuan


Outline

• Material Discovery Process• Methods for atomistic simulations of materials

• Experimental confirmation

• Data-driven Computatioanl Nanomaterials Discovery• Hypothetical material space sampling (structure generation and statistics)

• In silico high-throughput characterization (atomistic simulations and machine learning)

• Data storage, analytics, exploitation and integration


Atomistic Simulations of MaterialsTheory can predict properties of materials

𝐻 = 𝐸𝑛𝑢𝑐𝑙𝑒𝑖({𝑅𝐼})-σ𝑖=1𝑁𝑒 𝛻𝑖

2 + 𝑉𝑛𝑢𝑐𝑙𝑒𝑖 𝑟𝑖 +1

2σ𝑖≠𝑗𝑁𝑒 1

𝑟𝑖−𝑟𝑗

• Quantum chemistry methods can cover any chemistry

• Empirical potentials exist for large number of elements

• Computation is scalable and generically deployable


Atomistic Simulations of Materials

Computational predictions later confirmed by experiments

• Self-assembly mechanism of nanodiamonds

Barnard, A. S. et al. Nanoscale 3, 958–62 (2011)

The arrow indicates (111)|(111) interface between two 4 nm sized nanodiamonds.

DFTB simulations of the surface electrostatic potential of dodecahedral diamond nanoparticles of a) 2.2 nm and b) 2.5 nm


Materials libraries

Experiment design

DatabasePerformance

Measurement

Lead materials scale up

Knowledge for rational

design

Banks of materials

Hypothetical materials

In-silico screening

Modern materials discovery cycle

• Polydispersive systems• Exponential increase in

complexity and diversity• Nearly infinite combinatorial

problem

Potyrailo, R. et al. ACS Comb. Sci. 13,579–633 (2011).

Nano-

Theory, modeling and

informatics


Nanomaterials ScreeningDeparting from the Edisonian approach

vs.

We can accurately predict a property, so it can be computed for entire materials spaces

Combinatorial In silico design

In silico structure generation


Nanomaterials Screening

• Systematic and extensive materials performance ranking.

• “Big data” discovery of structure-property relationships in unknown materials domains.

• Accelerated identification of high potential candidates and rational design principles.


Data Analytics Challenges

Information representation• Fingerprints

Information extraction• Multivariate statistical analysis

Knowledge discovery• Data mining and machine learning

Knowledge representation• Visualization

https://dl.dropboxusercontent.com/u/31192991/sompca-big/som.html


Polydispersity Challenge in Nanomaterials

Polydispersive sample Quasi-monodisperse sample

Purification

Controlled synthesis

$$$

Polydispersity can be detrimental for high-performing applications

Purification of polydispersive nanoparticles samples is expensive


Data Analytics of Nanocarbons

Nanodiamonds Graphenes

Virtual structures relaxed using TB-DFT


Archetypal Analysis (AA)

The predictors of Xi are finite mixtures of archetypes Zj, which are convex combinations of the observations.

Finds a k m matrix Z that corresponds to the archetypal or ”pure patterns” in the data in such away that each data point can be represented as a mixture of those archetypes. In other words, thearchetypal analysis yields the two n k coefficient matrices α and β, which minimize the residualsum of squares:

Cutler, A. & Breiman, L. Archetypal Analysis. Technometrics 36, 338–347 (1994).


Archetypal Analysis of Nanocarbons

Nanodiamonds

Fernandez, M. & Barnard, A. S. ACS Nano 9, 11980–11992 (2015).

Graphene nanoflakes


Nanocarbons Prototypes

Nanodiamonds prototypes


Graphene prototypes


Estimation of Nanodiamonds Properties



Structural Diversity Challenge

Defects, oxidation and edge passivation yield large structural diversity

Graphene nanoflakes

Trigonal Rectangular Hexagonal


PP

P

Silicon qbits


Single Si substitution by P yields a large structural diversity


Metal-Organic Framework (MOF)

nitroimidazole

benzimidazole

Zn2+ CO2 capture and sequestration(Science, 2008)

ZIF-68




In-silico Combinatorial design




Modification with of 35 functional groups gives a total of ~1.5 million (354) unique combinations

MOF ZBPOrganic Linkersa) b)



Machine Learning Approach

Machine learning prediction of functional properties

Machine learningFeature fingerprints


Data Analytics Challenge

Binary decision tree of the Band Gap of graphene

Fernandez, M., Shi, H. & Barnard, A. S Carbon (2016). doi:10.1016/j.carbon.2016.03.005

Features• Surface area• Number of atoms• Shape aspect ratio

Accuracy80%


Machine Learning vs. Atomistic Simulations

Estimation of the graphene Band Gap from topological features

Pi and Pj , are the values of a bond order of thecarbon atoms in graphene, while L is the topologicaldistance, whilst Lij is a delta function delta function

Fernandez, M. et al. ACS Comb. Sci. (2016) doi:10.1021/acscombsci.6b00094


Machine Learning of Graphene

Fernandez, M.; Shi, H.; Barnard, A et al. J. Chem. Inf. Model. (2015), 55, 2500-2506

Radial Distribution Function (RDF) scores for graphene

the summation is over the N atom pairs in the graphene structure, andrij is the distance of these pairs and B is a smoothing parameter set to 10.


Machine Learning vs. Atomistic Simulations

Ionization Potential

Fernandez, M.; Shi, H.; Barnard, A et al. J. Chem. Inf. Model. (2015), 55, 2500-2506

Energy of the Fermi level

From RDF scores


Machine Learning vs. Atomistic SimulationsMachine learning prediction of gas adsorption in MOF

Fernandez, M., et al. J. Phys. Chem. C 117, 14095–14105 (2013).


Accuracy and Coverage Challenges

Syst

em

siz

eAccuracy of electronic calculations methods vs. system size

TBDF/Semiempirical

Density Functional

Accuracy

Quantum Monte Carlo

Coupled Cluster

5000 atoms

500 atoms

100 atoms

20 atoms


Data-driven ChallengeMachine learning for large material spaces

∆3

Machine learning predictions:-Functional property value or threshold-Accuracy of different quantum-chemistry methods =f( )structure

Machine Learning

Partial Database Screening

Full DatabasePredictions

Ramakrishnan, R. et al. J. Chem. Theory Comput. 11, 2087–2096 (2015). Fernandez, M et al. J. Chem. Inf. Model. (2015), 55, 2500-2506


Challenges and Limitations

E

QMC

B3LYP

6,095 isomers of C7H10O2

big gapbig gap

Accuracy Gap Between Different Levels of Theory


Accuracy Gap Predictions

Predictions

Machine learning calibration

Data-driven High-throughput Screening

outputsData storage

Input jobsQueue management

Computational resources

Resubmit or kill failed runs

Finished runs

Accuracy refinement


Self-organization Map (SOM) of NPs


Ag-NP

SOM

Electrostatic Potential

Deep Learning for Nanomaterials


C2

Feature maps10x10

5x5convolution

2x2subsampling

5x5convolution

2x2subsampling Fully connected

input32x32

C1

Feature maps28x28

S1

Feature maps14x14

S2

Feature maps5x5

B1B2

Output

ClassificationFeature extraction

Image32x32

www.data61.csiro.au

Data61Michael Fernandeze [email protected]

w https://research.csiro.au/mmm/

Thank you

DATA ANALYTICS IN NANOMATERIALS DISCOVERY...10 | Data Analytics for Nanomaterials| Michael Fernandez...

Documents

Transcript of DATA ANALYTICS IN NANOMATERIALS DISCOVERY...10 | Data Analytics for Nanomaterials| Michael Fernandez...