CRYSTAL in parallel: replicated and distributed (MPP) data
Ian Bush
Numerical Algorithms Group Ltd, HECToR CSE
Introduction
• Why parallel?
• What is in a parallel computer?
• When parallel?
• Pcrystal
• MPPcrystal
• Examples of MPPcrystal calculations
Why Parallel
• Actually a number of questions:
  1. Why do we need parallel computers?
     • Can’t we just build a big, really powerful serial computer?
  2. Why should I be interested in using parallel computers?
Why Do We Need Parallel Computers ?
To avoid meltdown !
Also to keep costs down
• Use standard components
• Including running costs
As a result all computers are parallel today...
Ian’s Old Laptop at Warrington, Torino, Alessandria, South Africa, India, Oxford…
• Made by Toshiba
  • Paid for by ME out of my hard earnt wages
• 2 Intel x86 CPUs (Core2Duo)
• 1 Gbyte memory
  • 0.5 Gbyte per CPU
• Not in the top 500
• Reasonably typical modern machine
  • All machines are parallel today!
• Runs Window$ XP and Linux
  • The former under protest
• The screen needs cleaning
What Is In A Parallel Computer ?
The building block for modern parallel computers is the Symmetric Multiprocessor (SMP)
• A number of CPUs all share the same memory
  • Typically 2-16 CPUs
  • Sometimes called a “shared memory” computer
• Simple example: Ian’s laptop
[Diagram: CPU 0, CPU 1, CPU 2 and CPU 3 all attached to a single shared Memory]
What Is In A Parallel Computer ?
Larger parallel computers are clusters of SMP “nodes” connected by an interconnect
• e.g. Ethernet, Myrinet, InfiniBand, …
• A good interconnect is usually at least ½ the price of the whole machine
[Diagram: two SMP nodes, each with its own CPUs and Memory, joined by an Interconnect]
How WE Shall View Parallel Computers
A number of CPUs each with their own memory, all connected together by an interconnect
• The presence of SMPs is usually just a small complication
• i.e. a cluster of “normal” serial computers
[Diagram: CPU 0 to CPU 3, each with its own Mem(ory), all connected by an Interconnect]
Why Use Parallel Computers ?
• Faster time to solution
  • Jobs that take days or weeks can take hours
• More available memory
  • 32 processors means in principle 32 times more memory, so you can run larger jobs
• BUT
  • More complex: how to run, and how to run well, can vary markedly with your local setup
When Parallel?
When Parallel ?
Do you always want to run in parallel ?
NO!
• Assigning 10 people to a job does not necessarily get the job done 10 times quicker.
• Similarly, assigning 10 processors to a computation will not make it run 10 times quicker.
So when do you want to run in parallel?
Amdahl’s Law (or the bad news)
If running on n processors the speed up is

    S(n) = (S + P) / (S + P/n),   with S + P = 1

where S is the fraction of the run time that is serial and P the fraction that can be parallelized.
This is a somewhat frightening equation!
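To see what it implies, here is a minimal numerical sketch (the 5% serial fraction is an illustrative assumption, not a CRYSTAL figure):

```python
# Amdahl's law: speed up on n processors with serial fraction S (P = 1 - S).
def amdahl_speedup(serial_fraction, n_procs):
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_procs)

for n in (4, 16, 64, 256, 1024):
    print(n, round(amdahl_speedup(0.05, n), 1))
# -> 3.5, 9.1, 15.4, 18.6, 19.6 : creeping towards the hard ceiling of 1/0.05 = 20
```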
Amdahl’s Law
Gustafson’s Law (or the good news)
BUT the relative values of S and P are a function of system size
• Parallelize the more expensive parts (first)
• These typically become rapidly more expensive as the system size is increased
Parallelism is good for large systems
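A toy illustration of why (the cost models below, serial work ~ N and parallelized work ~ N³, are assumptions for illustration only, not CRYSTAL's actual scaling):

```python
# If the parallelized work grows faster with system size than the serial work,
# the parallel fraction P tends to 1 and Amdahl's ceiling 1/S rises with it.
def parallel_fraction(n_system):
    serial_work = n_system            # e.g. input handling, bookkeeping
    parallel_work = n_system ** 3     # e.g. dense linear algebra
    return parallel_work / (serial_work + parallel_work)

for n_system in (10, 100, 1000):
    print(n_system, round(parallel_fraction(n_system), 6))
# -> 0.990099, 0.9999, 0.999999 : the bigger the system, the better the parallelism
```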
Load Imbalance
Say we have twenty totally independent tasks and twenty processors
• Easy to parallelize – give each task to one of the processors, but …
• what if the tasks don’t all take the same time?
• The time taken will be that of the longest job
Because of load imbalance our speed up is less than perfect – we have too few tasks for too many processors
Don’t use too many processors for too small a job
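A toy example with made-up task times (nothing here is measured data):

```python
# 20 independent tasks on 20 processors: the run finishes only when the
# slowest single task finishes, so one long task ruins the speed up.
task_times = [10.0] * 19 + [30.0]      # hypothetical: one task takes 3x longer
serial_time = sum(task_times)          # 220 s on one processor
parallel_time = max(task_times)        # 30 s on twenty processors
print(serial_time / parallel_time)     # ~7.3, a long way short of the ideal 20
```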
Communications
But what if the tasks are not independent?
• The processors will need to talk to each other
• This is known as communication
• This uses the interconnect, so we need to understand that better
The Interconnect
The only difference between a parallel computer and normal “serial” computers is the interconnect through which data is passed. The important point is that
INTERCONNECTS ARE SLOW COMPARED TO CPUS
The Interconnect
The time to transfer N bytes of data over an interconnect usually roughly obeys
t = α + βN

where
• α is the latency
  • Typically ~1-100 µs for modern networks
  • It is the time to transfer zero bytes of data
  • Dominates the time for short messages
• 1/β is the bandwidth
  • Typically ~100 MByte/s - 2 GByte/s for modern networks
  • Dominates the time for long messages

Transferring data is not doing science!
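A minimal sketch of the model with assumed, purely illustrative numbers (5 µs latency, 1 GByte/s bandwidth):

```python
ALPHA = 5e-6            # latency alpha, in seconds (assumed value)
BANDWIDTH = 1e9         # bytes per second, so beta = 1 / BANDWIDTH (assumed value)

def transfer_time(n_bytes):
    return ALPHA + n_bytes / BANDWIDTH       # t = alpha + beta * N

print(transfer_time(8))          # 8 bytes:  ~5.0e-06 s, dominated by the latency
print(transfer_time(8_000_000))  # 8 MBytes: ~8.0e-03 s, dominated by the bandwidth
```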
The Interconnect
t = α + βN

• α is the latency
  • Typically ~1-100 µs for modern networks

So the SHORTEST POSSIBLE time taken in using the interconnect is around 1 µs

In that time a modern CPU can do roughly 2,000 floating point operations!
• To put it another way, you can do 2,000 operations that are “doing science” in the time it takes to do 1 that does no science at all
THE INTERCONNECT IS A NECESSARY EVIL
Communications
• So communication is SLOW
• But usually the computation requirement scales more rapidly than the communication
  • e.g. matrix multiplication: work scales as N³, comms as N² (see the sketch below)
Parallelism is good for large systems
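A sketch of that ratio for a dense N x N matrix multiply (the constant factors are rough, for illustration only):

```python
for n in (1000, 5000, 20000):
    flops = 2 * n ** 3        # floating point work grows as N^3
    words_moved = n ** 2      # data to communicate grows only as N^2
    print(n, flops / words_moved)
# -> 2000.0, 10000.0, 40000.0 : compute per word communicated grows linearly in N
```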
Speed Up for Linear Equation Solve
[Figure: speed up of PDGESV on HECToR against number of processors (0-300), for matrix sizes N=20000, N=10000 and N=5000.]
A word on I/O
Depending on how the machine is set up I/O on parallel machines can be VERY slow.
So in general it is best to run direct (i.e. recompute quantities each SCF cycle rather than write them to and read them from disk)
This may not be true for medium sized jobs on machines where each processor has a fast local disk.
CRYSTAL – Parallel Implementations
• Pcrystal
  • Replicated data
  • All of CRYSTAL implemented
  • Good for medium to large problems on small to medium processor counts
• MPPcrystal
  • Distributed data
  • Much of CRYSTAL implemented
  • Good for large problems on large processor counts
CRYSTAL – basic algorithm
H_R = P_R · I_R      (I_R = sum of independent integrals)
For each k point independently:
    H_k = FT(H_R)
    H_k = Q_k^T H_k Q_k
    H_k ψ_k = ε_k ψ_k
    P_k = |ψ_k|²
End for each k point
P_R = FT(P_k)
Repeat until converged
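A schematic rendering of the same cycle (the helper names are hypothetical placeholders, not CRYSTAL routines), just to make explicit where the independent work lies:

```python
def scf_cycle(P_R, integrals, k_points):
    H_R = contract(P_R, integrals)        # H_R = P_R . I_R, a sum of independent integrals
    P_k_list = []
    for k in k_points:                    # each k point is handled independently
        H_k = fourier_transform(H_R, k)   # H_k = FT(H_R)
        H_k = orthogonalise(H_k, k)       # H_k = Q_k^T H_k Q_k
        eps_k, psi_k = diagonalise(H_k)   # H_k psi_k = eps_k psi_k
        P_k_list.append(density(psi_k))   # P_k = |psi_k|^2
    return back_transform(P_k_list)       # new P_R; the caller repeats until converged
```

Both parallel versions exploit this structure: the integrals and the k points are independent pieces of work.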
Pcrystal - Implementation
• Standard compliant
  • Fortran 90
  • MPI for message passing
• Replicated data
  • Each processor has a complete copy of all the matrices used in the linear algebra
• Makes implementation very simple
Pcrystal – Parallel Integrals
• Coulomb, Exchange and DFT terms all involve many independent tasks:
  • Coulomb/Exchange have to evaluate integrals of the form ⟨φiφj||φkφl⟩ for all of i, j, k, l
  • Each integral is independent
  • So give a subset to each of the processors
• DFT terms are a numerical integration over a grid
  • Each point of the grid is independent
  • So give a subset of the grid to each processor
• Almost perfectly parallel!
  • Only a global sum at the end is required
  • Limit on scaling is load imbalance
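A minimal sketch of that task-farming pattern using mpi4py (the “integral” is a stand-in toy function; this illustrates the idea, not CRYSTAL's actual code):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()

def toy_integral(i):
    return 1.0 / (1 + i)        # placeholder for a real two-electron integral

n_tasks = 10_000
# round-robin assignment: each processor evaluates its own subset of independent tasks
partial = sum(toy_integral(i) for i in range(rank, n_tasks, nprocs))
total = comm.allreduce(partial, op=MPI.SUM)   # the only communication: one global sum
if rank == 0:
    print(total)
```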
Pcrystal – Linear Algebra
• Each k point is independent
  • So each processor performs the linear algebra for a subset of the k points that the job requires
  • Again very few comms, so potentially good scaling, but …
  • Potential load imbalance
  • Number of k points limits the number of processors that can be exploited
    • What if it is a Γ-point-only calculation?
• Limit on the size of job that can be performed does not scale with the number of processors
  • Memory limitations (see the estimate below)
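A back-of-the-envelope sketch of why: with replicated data every processor holds full N x N double precision matrices, so the per-processor footprint grows as N² however many processors are added (the “four matrices” factor below is an illustrative assumption, not a CRYSTAL figure):

```python
def replicated_memory_gb(n_basis, n_matrices=4):
    return n_matrices * n_basis ** 2 * 8 / 1e9   # 8 bytes per double precision element

print(replicated_memory_gb(12_354))   # ~4.9 GB per processor for a Crambin-sized basis
```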
Pcrystal – Changes to Your Input
Pcrystal – How to run
• How to run Pcrystal will be system dependent
  • Generally run in batch
  • Ask other users or consult local documentation
  • Generally it will be something like: mpirun -np 4 crystal
• The BIG difference is that your input file must be called INPUT, and must be accessible by the processor
Pcrystal - Summary
• In general scales very well provided the number of processors ≤ number of k points
  • Will gain something due to integrals
  • But large jobs in general require few k points
• The limit on the size of job is given by the memory required to store the linear algebra matrices for one k point
  • More processors do not mean larger jobs can be run
MPP Crystal - Implementation
• Uses common standards
  • Fortran 90
  • MPI for message passing
  • ScaLAPACK 1.7 (Dongarra et al.) for linear algebra on distributed matrices
    • www.netlib.org/scalapack/scalapack_home.html
• Distributed data
  • Each processor holds only a part of each of the matrices used in the linear algebra (sketched below)
• More complex to implement
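A toy illustration of the ScaLAPACK-style 2D block-cyclic layout this refers to (block size and process grid below are arbitrary example values):

```python
N, NB = 8, 2        # global matrix dimension and block size (example values)
PR, PC = 2, 2       # 2 x 2 process grid (example values)

def owner(i, j):
    """Coordinates of the process that owns global matrix element (i, j)."""
    return ((i // NB) % PR, (j // NB) % PC)

for i in range(0, N, NB):
    print([owner(i, j) for j in range(0, N, NB)])
# Each process stores only ~N*N/(PR*PC) elements, never the whole matrix.
```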
MPPcrystal – Parallel Integrals
• More or less as Pcrystal
  • Works well, so why reinvent the wheel?
  • However requires replicated H_R, P_R
  • Ultimate limit on size of job
• However a less demanding limit than Pcrystal, because these are stored in sparse format
• Option to distribute the DFT grid
• Might want to turn off the bipolar expansion
MPPcrystal – Linear algebra
• As the data is distributed, comms are required to perform the linear algebra, unlike for Pcrystal
  • However there are N³ operations but only N² data to communicate
  • Scaling gets better for larger systems
• Very rough rule of thumb – a job with N basis functions can exploit up to around N/15 processors (e.g. the 12,354 basis function Crambin case later can usefully use roughly 800)
• Number of processors that can be exploited is NOT limited by the number of k points
  • Great for large Γ point calculations!
MPPcrystal – other issues
• By default runs direct
  • 100s of processors writing to/reading from one disk is not a good idea!
• Most but not all of CRYSTAL implemented
  • Will fail quickly and cleanly if a requested feature is not implemented
MPPCrystal - Changes to Your Input
TEST08 - SILICON BULK: STO-3G
CRYSTAL
0 0 0
227
5.42
1
14 .125 .125 .125
END
14 3
1 0 3 2. 0.
1 1 3 8. 0.
1 1 3 4. 0.
99 0
END
SHRINK
8 4 8
MPP
END
MPPCrystal – Other Changes to Your Input
DISTGRID
• In the DFT input
• Use a distributed DFT grid
• Useful for large calculations

DCDIAG
• In the last section
• Use a faster but less numerically stable diagonalizer
MPPcrystal – How to run
• Exactly the same comments as for Pcrystal apply:
• How to run MPPcrystal will be system dependent
  • Generally run in batch
  • Ask other users or consult local documentation
  • Generally it will be something like: mpirun -np 4 crystal
� Your input file must be called INPUT and must be accessible by the processor
MPPcrystal - summary
• For large systems can scale well, but not so good for small to medium-sized ones.
• Size of the linear algebra matrices is, at present, not an issue given enough processors.
• The memory limitation comes from the replicated H_R, P_R
Pcrystal and MPPcrystal
• Pcrystal
  • Few comms means it scales very well
  • However scaling is limited by the number of k points
  • Memory usage limits the size of systems that can be studied
  • Load imbalance in the linear algebra may be an issue
• MPPcrystal
  • More comms, but scales well for large systems
  • Scaling not limited by the number of k points
  • Distributing the matrices allows larger systems to be studied
MPPcrystal – Two examples
I will illustrate the behaviour of MPPcrystal with two calculations
• On a small protein, Crambin
• On a series of amorphous silica slabs
Crambin
• Small protein (46 residues)
• Crystal structure characterized to very high precision by XRD studies (0.52 Å)
• PDB entry (1EJG) includes hydrogens (this is unusual)
• 2 chains in the unit cell
• 1284 atoms
• 6-31G** basis set (12,354 functions)
• All calculations B3LYP
SCF Scaling
[Figure: SCF cycles per hour against number of processors (0-1500) on XT3, BG/L and IBM p575.]
Integral Scaling
[Figure: integral evaluations per hour against number of processors (0-1500) on XT3, BG/L and IBM p575.]
Linear Algebra Scaling
[Figure: diagonalisation evaluations per hour against number of processors (0-1500) on XT3, BG/L and IBM p575.]
Forces Scaling
[Figure: force evaluations per hour against number of processors (0-1500) on XT3, BG/L and IBM p575.]
Electrostatic Potential
Amorphous Silica Slabs
• Inputs kindly provided by Piero Uliengo
• 3 slabs of different sizes compared (the bigger two are supercells of the smallest one)
MPPCrystal Speed Up on BG/P
[Figure: MPPcrystal speed up on BG/P against number of processors (0-1200), for slabs with 7756 basis functions (579 atoms), 15512 basis functions (1158 atoms) and 23268 basis functions (1737 atoms).]
MPPCrystal Performance - 23268 Basis Functions
[Figure: SCF cycles per hour against number of processors (0-4000) for the 23268 basis function slab on BG/P, p575 and XT4.]
Summary
• CRYSTAL can use two parallelization strategies
• Pcrystal uses replicated data
  • Good for medium to large problems
  • Memory limits the size of problem that may be addressed
  • Scales well up to the number of k points
  • The one you’ll use most often
• MPPcrystal uses distributed data
  • Memory limitations much less stringent than Pcrystal
  • For a big enough problem can scale very well