Post on 29-Mar-2022
1
Visualization Case StudyCrystal Fingerprinting and STM4 for USPEX output analysis
Ing. Mario Valle
CINECA – 12/06/2012
New CSCS building – computer room
2
Started with postprocessing
Nice images for publications
Movies for conference talks
Fra
nce
sco
Ge
rva
sio
–E
TH
Zü
rich
Data from disjoint subfields
Mario Valle - Visualization Case Study - 12/06/2012
Macromolecules
Crystallography
Help overcome tools inflexibility
For example there are nice crystallographyprograms that do not support dynamic data and do not allow customization
3
STM4 is a frameworkfor the development of
unusual and enhanced techniquesfor chemistry visualization
Offer broader set of techniques
Provide enhanced techniques
Available data with the standard isosurface
Isosurface with the new volume interpolator
4
STM3 Gallery
STM3 modules
The LEGO DNA
5
One of the continuing scandals…
Prediction of the stable crystal structure on the basis of only the chemical composition is one of the central problems of condensed matter physics, which for a long time remained unsolved
The ability to solve this problem would open new ways also for the understanding of the behaviour of materials
Mario Valle - Visualization Case Study - 12/06/2012
Idea: copy genetic evolution
Mario Valle - Visualization Case Study - 12/06/2012
6
Evolution of crystal structures
Mario Valle - Visualization Case Study - 12/06/2012
Examples of USPEX predictions
Novel high pressure
phases of CaCO3
Low-energy 3D carbon structure
40-atom cell of MgSiO3 post-perovskite
7
USPEX discovered new materials
Boron at 1 atm: USPEX easily found the complex α-B structure...
...and discovered also the superhardboron γ-B28 phase (Nature)
Prof. A. Oganov
USPEX structure cancer problem
Mario Valle - Visualization Case Study - 12/06/2012
Different colors means different crystal structures
US
PE
X s
tru
ctu
re c
ance
rNormal structure cluster generation
GenerationGeneration
The problem
Mario Valle - Visualization Case Study - 12/06/2012
USPEX is a crystal structure predictor based on an evolutionary algorithm
Each run produces hundred of putative crystal structures…
…but many of them are equal
So an intensive manual labor is needed to prune duplicated structures
Project: develop a (semi)automaticway to extractunique structuresfrom USPEX outputs
8
Proposed solution from High-Dim
Mario Valle - Visualization Case Study - 12/06/2012
Compute unique coordinates
Define distance measure
Add grouping criteria
Space 100-3000dimensional
Each group describes a distinct structure
Structure “coordinates”
Set of distancesfor each atomin the unit cell
Distance sets concatenated for all atoms in the structure
agg
(coordinate)
Structure “fingerprint” CrystalFp
Set of distancesfor each atomin the unit cell
Distance sets concatenated for all atoms in the structure
agg
(coordinate)
9
Visual design and validation
Mario Valle - Visualization Case Study - 12/06/2012
Built a tool to explore
algorithm choices and
parameters settings
This tool wraps the
classifier library, called
CrystalFp, and provides
various interactive
visual diagnostics to
check classifier
behavior
It is built inside STM4,
the molecular
visualization toolkit
developed at CSCS
1. Load structures
2. Filter on energy
3. Compute fingerprints
4. Compute distances
5. Group structures
The application interface gives access to all CrystalFp algorithms and their parameters in a clear process workflow
STM4 provided an environment that accelerated the implementation
Workflow support
10
Visual diagnostics tools
1. 2D maps
2. Charts
3. Picking for details
4. 2D data export
Various visualization and analysis tools to check and validateCrystalFp algorithms behavior
Visual diagnostics: distance matrix
Mario Valle - Visualization Case Study - 12/06/2012
Distances between structures Distances ordered by group
Clustering visual diagnostic
Mario Valle - Visualization Case Study - 12/06/2012
DFS grouping Pseudo SNN (K=1)
Pseudo SNN (K=5) SNN (K=5)
DFS: Deep first search of the neighbors nodes
Pseudo SNN: Maintain connection between nodes only if they share at least K neighbors
SNN: As above plus a DBSCAN pass
11
Visual diagnostics: scatterplot
Colored by “stress” to detect local minima traps
Colored by group
Diagnostic chart:distances in 2D vs. distances in High‐Dim space
The scatterplot tool in CrystalFp tries to map High‐Dim space points to 2D preserving their relative distances
USPEX problem solved: an example
Hydrogen at 600 GPa (16 atoms)
The USPEX run produced 1274 structures
From these the 794 within 0.5 eV from the lowest energy value found are selected
Manual analysis to remove duplicated structures from this set: ~20h of work
Using the CrystalFp classifier: ~10min
At the end found only 4 unique structures: One α-Ga type (top) One Cs-IV (bottom), the ground state (i.e. the
lower energy structure), and two closely related structures
Mario Valle - Visualization Case Study - 12/06/2012
12
Classifier integration in USPEX
Original USPEX.A lot of identical structures.
USPEX after the classifier integration.No more “structure cancer”!
USPEX structure cancer
GenerationGeneration
Sol
ved
at
the
root
!
From the problem solution …
Mario Valle - Visualization Case Study - 12/06/2012
Compute unique coordinates
Define distance measure
Space 100-3000dimensional
Add grouping criteria
Each group describes a distinct structure
13
… to a new paradigm
Mario Valle - Visualization Case Study - 12/06/2012
Compute unique coordinates
Define distance measure
Space 100-3000dimensional
High Dim space tools
To look at crystal structures from a novel perspective
Unfold data to lower dimensions
Mario Valle - Visualization Case Study - 12/06/2012
One famous test dataset (right I said right!) contains points on a rolled sheet that forms a 3D shape called the “Swiss roll” (a superb example on the left)
Multidimensional scaling projects points from high dimensional space to a lower dimensional one preserving distances between points as faithfully as possible
Sammon mapping to 2D
CCA mapping
CrystalFp multi dim. scaling
Mario Valle - Visualization Case Study - 12/06/2012
The scatterplot tool in
CrystalFp implements a
Force Directed Placement
multidimensional scaling
algorithm (here the points
are colored by energy)
Nanocluster data from Dr. Gareth Tribello (USI)
14
15
Study of energy landscapes
Mario Valle - Visualization Case Study - 12/06/2012
A. R. Oganov and M. Valle,How to quantify energy landscapes of solids,The Journal of Chemical Physics, vol. 130,p. 104504, 2009.
Energy landscape of Au8Pd4 system
More complex landscapes
Mario Valle - Visualization Case Study - 12/06/2012
Energy landscape for MgO with 32 atoms/cell
New quantities: quasi-entropy
For each given structure, quasi‐entropy is a measure of disorder and complexity of that structure.
Sstr is better correlated to energy than Steinhardt’s Q6Sstr is better correlated to energy than Steinhardt’s Q6
Si structures developing defects
16
(Totally) unexpected correlations
• We found unexpected correlations between distance and other physical variables
• For example the deceptively simple H2O shows clear correlations and grouping
• This and other datasets motivated us to continue the exploration of the crystal fingerprints’ space…
“And roughly the only mechanism for suggesting questions is exploratory”
A conversation with John W. Tukey and Elizabeth TukeyLuisa T. Fernholz and Stephan Morgenthaler
Statistical ScienceVolume 15, Number 1 (2000), 79-94
Atoms per cell
Fin
gerp
rint c
utof
f
General law
Cry
sta
lFp
–P
ara
me
tric
stu
dy
17
Usage:CrystalFp [options] POSCARfile [ENERGIESfile]
‐v ‐‐verbose (optional argument)Verbose level (if no argument, defaults to 1)
‐? ‐h ‐‐help (no argument)This help
‐t ‐‐elements (required argument)List of chemical elements
‐es ‐‐max‐step ‐‐end‐step (required argument)Last step to load (default: all)
‐ss ‐‐start‐step (required argument)First step to load (default: first)
‐et ‐‐energy‐per‐structure (no argument)Energy from file is per structure, not per atom
‐e ‐‐energy‐threshold (required argument)Energy threshold
‐r ‐‐threshold‐from‐min (required argument)Threshold from minimum energy
‐c ‐‐cutoff‐distance (required argument)Fingerprint forced cutoff distance
‐n ‐‐nano‐clusters ‐‐nanoclusters (no argument)The structures are nanoclusters, not crystals
‐b ‐‐bin‐size (required argument)Bin size for the pseudo‐diffraction methods
‐p ‐‐peak‐size (required argument)Peak smearing size
...
18
10 20 30
0.1
0.2
0.3
0.4
Depth vs. Order
Degree.of.order..F2.
Dep
th
2 5 10 20
0.0
50
.10
0.2
0
Depth vs. Order
Degree.of.order..F2.
Dep
th
2 5 10 20
0.0
50
.10
0.2
0
Depth vs. Order
Degree.of.order..F2.
Dep
th
19
Interesting correlations
Mario Valle - Visualization Case Study - 12/06/2012
Searching an explanation
Mario Valle - Visualization Case Study - 12/06/2012
Distance vs. dimensionality
GaAs 8 atoms/cellcutoff: 3 – 30 Ådimensionality: 180 – 1800
20
Distance decomposition
Mario Valle - Visualization Case Study - 12/06/2012
GaAs 8 atoms/cellcutoff: 30 Ådimensionality: 1800
But this one?
Mario Valle - Visualization Case Study - 12/06/2012
Structure: Au8Pd4
Cutoff: 30 ÅDimension: 1800
Synthetic datasets
8 atoms with uniformly distributed random fractional coordinates in a cubic unit cell with 5 Å side
Distance distribution vs. embedding dimension
Intrinsic dimension vs. embedding dimension
21
Lessons learned
Mario Valle - Visualization Case Study - 12/06/2012
From the Modeling side Using known concepts in unusual contexts is a
source of unexpected insights Discoveries happen on the boundaries of disciplines “Seeing is believing” and convincing. Then the
domain experts become a source of ideas
From the Visual Analysis side Quick prototyping and experimentation capabilities
are critical (that is, STM4 is a big help) No need of fancy visualizations. What are needed
are visualizations tuned to the problem at hand Data management is critical to keep order in the
data exploration
http://mariovalle.name/CrystalFphttp://mariovalle.name/CrystalFp
http://mariovalle.name/STM4http://mariovalle.name/STM4
Going together…
Thank you!Thank youfor your attention!
And don’t forget: mvalle@cscs.ch