Data acquisition, handling, and analysis at the Advanced Photon Source
Chris Jacobsen
Associate Division Director, X-ray Science Division, Advanced Photon Source
Professor, Physics & Astronomy; Applied Physics; Chemistry of Life Processes Institute; Northwestern University
[Figure: Data volume at present (TB/day) for each Advanced Photon Source (APS) beamline, on a logarithmic scale from 0.00001 to 1000 TB/day. Cumulative: 165 TB/day. Imaging highlighted.]
Detector
Beamline computer (control, collect, visualize, analyze, distribute)
Light source cluster (store, analyze, track)
Computing center cluster (analyze)
Computing center petascale storage (archive, track, distribute)
Globus Online/GridFTP
Tracking: where is the data? Verified? Provenance throughout analysis?
Portable hard drives
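The "Verified?" question above can be illustrated with a checksum comparison. The following is a minimal sketch, assuming a plain file copy stands in for the real transfer (it does not use Globus Online or GridFTP); the function names, record fields, and file names are hypothetical and are not part of any APS tool.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of_file(path, block_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in fixed-size blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def transfer_and_verify(source, destination):
    """Copy a file, then record whether the destination checksum matches."""
    shutil.copyfile(source, destination)  # stand-in for the real transfer step
    record = {
        "source": source,
        "destination": destination,
        "sha256": sha256_of_file(source),
    }
    record["verified"] = sha256_of_file(destination) == record["sha256"]
    return record


def demo_transfer():
    """Hypothetical demo: a fake detector frame copied between temp paths."""
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "frame_0001.raw")
        destination = os.path.join(workdir, "frame_0001.copy")
        with open(source, "wb") as handle:
            handle.write(bytes(range(256)) * 64)
        return transfer_and_verify(source, destination)
```

A record like this, kept for every hop (beamline computer, light source cluster, computing center storage), is one simple way to answer both "where is the data?" and "verified?".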
APS data rate: ~100 TB/day at present
LHC: about 15 petaBytes/year or ~80 TB/day
http://public.web.cern.ch/public/en/LHC/Computing-en.html
Detectors are part of a data handling system. Move to cloud computing using Hadoop?
DECTRIS/PSI detectors
• Pilatus 6M, 20 bits at 12 frames/sec: 380 TB/hour flat out
• Eiger 16M, 12 bits at 187 frames/sec: 16,000 TB/hour flat out
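A "flat out" rate can be estimated as pixels × bits per pixel × frames per second. The sketch below assumes nominal pixel counts of 6 million and 16 million, tightly packed bits, no compression, and decimal terabytes; all of these are assumptions, not values taken from the detector specifications. Under these assumptions the results are about 0.65 TB/hour for the Pilatus 6M settings and about 16 TB/hour for the Eiger 16M settings, so the figures quoted on the slide may rest on different units or assumptions.

```python
def raw_rate_bytes_per_second(pixels, bits_per_pixel, frames_per_second):
    """Uncompressed data rate in bytes/s, assuming tightly packed bits."""
    return pixels * bits_per_pixel * frames_per_second / 8


def terabytes_per_hour(bytes_per_second):
    """Convert bytes/s to decimal terabytes (1e12 bytes) per hour."""
    return bytes_per_second * 3600 / 1e12


# Nominal pixel counts (assumed round numbers, not exact sensor sizes).
PILATUS_6M_PIXELS = 6_000_000
EIGER_16M_PIXELS = 16_000_000

pilatus_tb_per_hour = terabytes_per_hour(
    raw_rate_bytes_per_second(PILATUS_6M_PIXELS, 20, 12)
)
eiger_tb_per_hour = terabytes_per_hour(
    raw_rate_bytes_per_second(EIGER_16M_PIXELS, 12, 187)
)

print(f"Pilatus 6M: {pilatus_tb_per_hour:.2f} TB/hour")
print(f"Eiger 16M:  {eiger_tb_per_hour:.2f} TB/hour")
```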
May 2012: Workshop on high speed data and HDF5 files held at Paul Scherrer Institut. Nicholas Schwarz of APS participated.
APS/XSD beamline scientists
MCS data pipelining
Scientific collaborators (ANL and beyond)
EMC electron microscopists
Experiments, data collection, analysis, interpretation
APS/AES SSG: Software Services
APS/AES BCDA: Beamline controls
APS/XSD DET: Detectors
APS/AES IT: Information tech
Software for data acquisition
Initial data transfer, long term storage
Data aggregation, distribution
MCS mathematical analysis
Mathematical methods and tools
Organizational complexity
• APS X-ray Science Division: beamline scientists, Detector Group, Scientific Software group (2 people)
• APS Engineering Services (AES): Beamline Controls and Data Acquisition (BCDA), Software Support Group (SSG), Information Technology (IT)
• Mathematics and Computer Science: multiple research groups, multiple large computing systems
[Figure: data workflow, split into "At the Facility" and "At the Home Institution". Labels: experiment(s); raw data; data reduction; reduced data; data analysis; preliminary model; modeling; transformation; optimizer; refinement; re-interpretation.]
[Figure: data workflow, split into "At the Facility" and "At the Home Institution". Labels: experiment(s); raw data (2-D Intensity, E, T, P, t, etc.); data reduction; reduced data, I(Q); data analysis; adjustable parameters; modeling; visualization; publication, presentation, archival.]
Data processes: changing models?
Figures courtesy F. De Carlo and P. Jemian, from “Living Data for Extreme-Scale Science Facilities”, submitted by I. Foster et al. to DoE’s Advanced Scientific Computing Research (ASCR) call, April 2012
[Figure panels: Visible light; Red blood cells; Algae; Yeast]
Finding patterns in spectroscopic imaging
• Automatic classification of cell types followed by histograms of elemental content.
• S. Wang, J. Ward, S. Vogt et al.
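The two-step idea above (classify cells, then histogram elemental content per class) can be sketched in a few lines. This is only an illustration: plain k-means is used here as a stand-in classifier and is not claimed to be the method used by Wang, Ward, Vogt et al., and the per-cell (Fe, Zn) values, starting centers, and bin edges are all invented for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
        for p in points
    ]


def kmeans(points, initial_centers, iterations=20):
    """Plain k-means with caller-supplied starting centers (deterministic)."""
    centers = [list(c) for c in initial_centers]
    for _ in range(iterations):
        labels = assign(points, centers)
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # leave an empty cluster's center unchanged
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return assign(points, centers), centers


def histogram(values, bin_edges):
    """Count values falling in each half-open bin [edge_i, edge_i+1)."""
    counts = [0] * (len(bin_edges) - 1)
    for value in values:
        for i in range(len(counts)):
            if bin_edges[i] <= value < bin_edges[i + 1]:
                counts[i] += 1
                break
    return counts


# Hypothetical per-cell elemental content: (Fe, Zn) in arbitrary units.
cells = [(1.0, 9.0), (1.2, 8.5), (0.8, 9.5), (8.0, 1.0), (8.5, 1.5), (7.5, 0.5)]
labels, centers = kmeans(cells, initial_centers=[cells[0], cells[3]])

# Histogram of Fe content for each class found by the classifier.
fe_edges = [0, 2, 4, 6, 8, 10]
fe_histograms = {
    k: histogram([c[0] for c, lab in zip(cells, labels) if lab == k], fe_edges)
    for k in sorted(set(labels))
}
```

Passing the starting centers explicitly keeps the result reproducible; a real analysis would need a more careful choice of classifier and initialization.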
J. Lehmann, D. Solomon, J. Kinyangi, L. Dathe, S. Wirick, and C. Jacobsen, Nature Geoscience 1, 238 (2008). Mathematical method: M. Lerotić, C. Jacobsen, T. Schäfer, S. Vogt, Ultramicroscopy 100, 35 (2004).
Storing and managing data
• Automated pipelines
• Metadata on experimental conditions
• Self-documented, compressible, platform-independent format with support for parallel computing: HDF5 (www.hdfgroup.org)
– Data chunking has big effect on performance
– Compression by dedicated computer?
– Parallel writing of precompressed chunked data
– HDF5 will soon have Single Writer/Multiple Reader (SWMR)
– HDF5 has tentative plans for Multiple Writer/Multiple Reader (MWMR) but schedule depends on funding of The HDF Group (www.hdfgroup.org)
• Provenance: tracking what you did to process the data
• HDF5-based schema for metadata, provenance: DataExchange (www.aps.anl.gov/DataExchange/)
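Why chunk layout matters for performance can be shown without HDF5 itself. The sketch below is a pure-Python stand-in, assuming each chunk is compressed independently with the standard-library zlib module, so a read must decompress every chunk it touches. It does not use the HDF5 library or its API; the store layout, function names, and test image are hypothetical.

```python
import zlib


def write_chunked(image, chunk_rows, chunk_cols):
    """Split a 2-D list of byte values into tiles; zlib-compress each tile."""
    n_rows, n_cols = len(image), len(image[0])
    chunks = {}
    for r0 in range(0, n_rows, chunk_rows):
        for c0 in range(0, n_cols, chunk_cols):
            tile = bytes(
                image[r][c]
                for r in range(r0, min(r0 + chunk_rows, n_rows))
                for c in range(c0, min(c0 + chunk_cols, n_cols))
            )
            chunks[(r0, c0)] = zlib.compress(tile)
    return {
        "shape": (n_rows, n_cols),
        "chunk": (chunk_rows, chunk_cols),
        "chunks": chunks,
    }


def read_region(store, row_range, col_range):
    """Read rows [r_start, r_stop) and cols [c_start, c_stop).

    Returns (region, number_of_chunks_decompressed).
    """
    n_rows, n_cols = store["shape"]
    chunk_rows, chunk_cols = store["chunk"]
    r_start, r_stop = row_range
    c_start, c_stop = col_range
    region = [[0] * (c_stop - c_start) for _ in range(r_stop - r_start)]
    touched = 0
    for (r0, c0), blob in store["chunks"].items():
        r1 = min(r0 + chunk_rows, n_rows)
        c1 = min(c0 + chunk_cols, n_cols)
        if r1 <= r_start or r0 >= r_stop or c1 <= c_start or c0 >= c_stop:
            continue  # chunk does not overlap the requested region
        touched += 1
        tile = zlib.decompress(blob)
        width = c1 - c0
        for r in range(max(r0, r_start), min(r1, r_stop)):
            for c in range(max(c0, c_start), min(c1, c_stop)):
                region[r - r_start][c - c_start] = tile[(r - r0) * width + (c - c0)]
    return region, touched


# Hypothetical 64 x 64 test image with byte-valued pixels.
image = [[(r * 3 + c) % 256 for c in range(64)] for r in range(64)]
row_chunked = write_chunked(image, 1, 64)  # one chunk per image row
tiled = write_chunked(image, 16, 16)       # 16 x 16 square tiles

# Read a single column (column 10) from each layout.
column_a, touched_rows = read_region(row_chunked, (0, 64), (10, 11))
column_b, touched_tiles = read_region(tiled, (0, 64), (10, 11))
```

Both layouts return the same column, but the row-per-chunk layout has to decompress all 64 chunks while the tiled layout decompresses only 4. The best chunk shape depends on how the data will be read back.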
Argonne: team effort between APS beamline scientists, APS detector group, APS engineering support group, and Math and Computer Science division
APS: Nicholas Schwarz
Sesame Street Science
Who are the people in your neighborhood?