Big Process for Big Data @ NASA
![Page 1: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/1.jpg)
computationinstitute.org
Big process for big data
Ian Foster
Goddard, February 27, 2013
![Page 2: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/2.jpg)
The Computation Institute
= UChicago + Argonne
= Cross-disciplinary nexus
= Home of the Research Cloud
![Page 3: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/3.jpg)
High energy physics
Molecular biology
Cosmology
Genetics
Metagenomics
Linguistics
Economics
Climate change
Visual arts
![Page 4: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/4.jpg)
Will data kill genomics?
x10 in 6 years
x10^5 in 6 years
Kahn, Science, 331 (6018): 728-729
![Page 5: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/5.jpg)
Moore’s Law for X-Ray Sources
18 orders of magnitude in 5 decades!
12 orders of magnitude in 6 decades!
![Page 6: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/6.jpg)
1.2 PB of climate data
Delivered to 23,000 users
![Page 7: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/7.jpg)
We have exceptional infrastructure for the 1%
![Page 8: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/8.jpg)
What about the 99%?
We have exceptional infrastructure for the 1%
![Page 9: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/9.jpg)
What about the 99%?
We have exceptional infrastructure for the 1%
Big science. Small labs.
![Page 10: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/10.jpg)
Need: A new way to deliver research cyberinfrastructure
Frictionless
Affordable
Sustainable
![Page 11: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/11.jpg)
We asked ourselves:
What if the research work flow could be managed as easily as…
…our pictures
…home entertainment
…our e-mail
![Page 12: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/12.jpg)
What makes these services great?
Great User Experience
+
High performance (but invisible) infrastructure
![Page 13: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/13.jpg)
We aspire (initially) to create a great user experience for research data management
What would a “dropbox for science” look like?
![Page 14: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/14.jpg)
• Collect • Move • Sync • Share • Analyze
• Annotate • Publish • Search • Backup • Archive
… for BIG DATA
![Page 15: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/15.jpg)
A common work flow…
[Diagram: Registry, Staging Store, Ingest Store, Analysis Store, Community Store, Archive Mirror]
![Page 16: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/16.jpg)
… with common challenges
[Diagram: Registry, Staging Store, Ingest Store, Analysis Store, Community Store, Archive Mirror]
Data movement, sync, and sharing
• Between facilities, archives, researchers
• Many files, large data volumes
• With security, reliability, performance
![Page 17: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/17.jpg)
• Collect • Move • Sync • Share • Analyze
• Annotate • Publish • Search • Backup • Archive
• Collect • Move • Sync • Share
Capabilities delivered using Software-as-a-Service (SaaS) model
![Page 18: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/18.jpg)
Data Source → Data Destination
1. User initiates transfer request
2. Globus Online moves/syncs files
3. Globus Online notifies user
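The three steps on this slide (request, move/sync, notify) can be illustrated with a small local stand-in. This is only a sketch under stated assumptions: it copies between two local directories rather than remote endpoints, and the names `sync_directory`, `notify`, and `demo` are hypothetical, not part of Globus Online.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_directory(source: Path, destination: Path, notify) -> list[str]:
    """Copy files that are missing or differ at the destination, then notify.

    Returns the relative paths that were actually copied.
    """
    copied = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)
        dst_file = destination / rel
        # Sync semantics: skip files whose contents already match.
        if dst_file.exists() and sha256(dst_file) == sha256(src_file):
            continue
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)
        copied.append(rel.as_posix())
    # Step 3 on the slide: tell the user the request has finished.
    notify(f"Transfer complete: {len(copied)} file(s) copied")
    return copied


def demo() -> tuple[list[str], list[str], list[str]]:
    """Run two syncs in temporary directories; the second should copy nothing."""
    messages: list[str] = []
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "source"), Path(tmp, "destination")
        (src / "run1").mkdir(parents=True)
        (src / "run1" / "a.dat").write_text("alpha")
        (src / "b.dat").write_text("beta")
        first = sync_directory(src, dst, messages.append)
        second = sync_directory(src, dst, messages.append)
    return first, second, messages
```

Running `demo()` shows the difference between a move and a sync: the first call copies both files, and the second call finds matching checksums and copies none.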
![Page 19: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/19.jpg)
Data Source
1. User A selects file(s) to share; selects user/group, sets share permissions
2. Globus Online tracks shared files; no need to move files to cloud storage!
3. User B logs in to Globus Online and accesses shared file
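The key point of this slide is that sharing is bookkeeping, not copying: the service records who may access which path, and the data stays where it is. A minimal sketch of that idea follows, assuming a simple in-memory registry; `ShareRegistry`, its methods, and the "r"/"rw" permission strings are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ShareRegistry:
    """Tracks who may access which path on an endpoint; stores no file data."""

    # Maps (endpoint, path) -> {user: permission}; permission is "r" or "rw".
    grants: dict = field(default_factory=dict)

    def share(self, endpoint: str, path: str, user: str, permission: str = "r") -> None:
        """Step 1: the owner grants a user access to a path in place."""
        if permission not in ("r", "rw"):
            raise ValueError("permission must be 'r' or 'rw'")
        self.grants.setdefault((endpoint, path), {})[user] = permission

    def can_read(self, endpoint: str, path: str, user: str) -> bool:
        """Step 3: checked when another user tries to access the shared path."""
        return user in self.grants.get((endpoint, path), {})

    def can_write(self, endpoint: str, path: str, user: str) -> bool:
        return self.grants.get((endpoint, path), {}).get(user) == "rw"


registry = ShareRegistry()
registry.share("lab-server", "/data/run42", "user_b", "r")
```

Only the grant table lives in the service; the files under `/data/run42` never leave the hypothetical `lab-server` endpoint.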
![Page 20: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/20.jpg)
Extreme ease of use
• InCommon, OAuth, OpenID, X.509, …
• Credential management
• Group definition and management
• Transfer management and optimization
• Reliability via transfer retries
• Web interface, REST API, command line
• One-click “Globus Connect” install
• 5-minute Globus Connect Multi User install
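"Reliability via transfer retries" can be pictured as a retry loop with backoff. The slides do not describe the actual retry policy, so the sketch below is an assumption throughout: the function name `with_retries`, the exponential delay schedule, and the choice of `OSError` as the retryable failure are all illustrative.

```python
import time


def with_retries(operation, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts is reached.

    Waits base_delay, then 2x, 4x, ... between attempts (exponential backoff).
    The sleep function is injectable so the loop can be tested without waiting.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except OSError:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


def make_flaky(failures):
    """Return an operation that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise OSError("simulated network fault")
        return f"ok after {state['calls']} attempt(s)"

    return operation
```

For example, `with_retries(make_flaky(2))` absorbs two simulated faults and returns on the third attempt, which is the behavior a user experiences as a transfer that "just finishes".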
![Page 21: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/21.jpg)
Early adoption is encouraging
![Page 22: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/22.jpg)
Early adoption is encouraging
8,000 registered users; ~100 daily~10 PB moved; ~1B files
10x (or better) performance vs. scp99.9% availability
Entirely hosted on AWS
![Page 23: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/23.jpg)
Delivering a great user experience relies on high performance network infrastructure
![Page 24: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/24.jpg)
Science DMZ optimizes performance
![Page 25: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/25.jpg)
What is a Science DMZ?
Three key components, all required:
• “Friction free” network path
– Highly capable network devices (wire-speed, deep queues)
– Virtual circuit connectivity option
– Security policy and enforcement specific to science workflows
– Located at or near site perimeter if possible
• Dedicated, high-performance Data Transfer Nodes (DTNs)
– Hardware, operating system, libraries optimized for transfer
– Optimized data transfer tools: Globus Online, GridFTP
• Performance measurement/test node
– perfSONAR
Details at http://fasterdata.es.net/science-dmz/
![Page 26: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/26.jpg)
Globus GridFTP architecture
[Diagram labels: GridFTP, Globus XIO; Dedicated, LFN, Shared; Parallel TCP, UDP or RDMA, TCP]
Internal layered XIO architecture allows alternative network and filesystem interfaces to be plugged in to the stack
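The idea of a layered, pluggable I/O stack can be shown in a few lines. The sketch below is not the Globus XIO API (XIO is a C library); it is a Python analogy in which every class and function name is hypothetical. Each driver wraps the one beneath it, so a transport or a transform can be swapped without touching the layers above.

```python
import zlib
from abc import ABC, abstractmethod
from collections import deque


class Driver(ABC):
    """One layer in the stack: passes messages down on write, up on read."""

    @abstractmethod
    def write(self, data: bytes) -> None: ...

    @abstractmethod
    def read(self) -> bytes: ...


class MemoryTransport(Driver):
    """Bottom layer: an in-memory queue standing in for TCP, UDP, or a file."""

    def __init__(self):
        self._messages = deque()

    def write(self, data: bytes) -> None:
        self._messages.append(bytes(data))

    def read(self) -> bytes:
        return self._messages.popleft()


class ChecksumDriver(Driver):
    """Transform layer: appends a CRC32 on write and verifies it on read."""

    def __init__(self, lower: Driver):
        self.lower = lower

    def write(self, data: bytes) -> None:
        self.lower.write(data + zlib.crc32(data).to_bytes(4, "big"))

    def read(self) -> bytes:
        framed = self.lower.read()
        data, crc = framed[:-4], framed[-4:]
        if zlib.crc32(data).to_bytes(4, "big") != crc:
            raise IOError("checksum mismatch")
        return data


class CompressionDriver(Driver):
    """Transform layer: compresses on write and decompresses on read."""

    def __init__(self, lower: Driver):
        self.lower = lower

    def write(self, data: bytes) -> None:
        self.lower.write(zlib.compress(data))

    def read(self) -> bytes:
        return zlib.decompress(self.lower.read())


def build_stack(transport: Driver, layers) -> Driver:
    """Wrap the transport with each layer in order; the last layer is on top."""
    stack = transport
    for layer in layers:
        stack = layer(stack)
    return stack


stack = build_stack(MemoryTransport(), [ChecksumDriver, CompressionDriver])
stack.write(b"x" * 1000)
payload = stack.read()
```

Replacing `MemoryTransport` with a different bottom driver, or reordering the transform layers, changes the wire behavior without changing the calling code, which is the property the slide attributes to the layered design.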
![Page 27: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/27.jpg)
GridFTP performance options
• TCP configuration
• Concurrency: Multiple flows per node
• Parallelism: Multiple nodes
• Pipelining of requests to support small files
• Multiple cores for integrity, encryption
• Alternative protocol selection*
• Use of circuits and multiple paths*
Globus Online can configure these options based on what it knows about a transfer
* Experimental
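Choosing options "based on what it knows about a transfer" could look like the rule-based sketch below. The slides do not give the actual rules, so every threshold and setting here is an illustrative assumption, as are the names `TransferOptions` and `choose_options`. The only idea taken from the slide is that many small files benefit from deeper pipelining.

```python
from dataclasses import dataclass

MB = 1_000_000


@dataclass(frozen=True)
class TransferOptions:
    # Terms as used on the slide; the numeric values are hypothetical.
    concurrency: int
    parallelism: int
    pipeline_depth: int


def choose_options(file_count: int, total_bytes: int) -> TransferOptions:
    """Pick settings from the average file size of the transfer request."""
    if file_count <= 0:
        raise ValueError("file_count must be positive")
    average = total_bytes / file_count
    if average < 50 * MB:
        # Many small files: per-request latency dominates, so pipeline deeply.
        return TransferOptions(concurrency=2, parallelism=2, pipeline_depth=20)
    if average < 250 * MB:
        return TransferOptions(concurrency=2, parallelism=4, pipeline_depth=5)
    # Few large files: bandwidth dominates, so favor parallelism.
    return TransferOptions(concurrency=2, parallelism=8, pipeline_depth=2)


small_files = choose_options(file_count=900_000, total_bytes=900_000 * MB)
large_files = choose_options(file_count=10, total_bytes=10 * 1_000 * MB)
```

The design point is that the user supplies only the source and destination; the service derives the tuning from properties of the request.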
![Page 28: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/28.jpg)
Exploiting multiple paths
• Take advantage of multiple interfaces in multi-homed data transfer nodes
• Use circuit as well as production IP link
• Data will flow even while the circuit is being set up
• Once circuit is set up, use both paths to improve throughput
Raj Kettimuthu, Ezra Kissel, Martin Swany, Jason Zurawski, Dan Gunter
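A back-of-the-envelope model shows why starting on the IP link and adding the circuit later helps. The sketch assumes ideal conditions (constant rates, perfect aggregation of the two paths, no protocol overhead); the function name and the example numbers are hypothetical and are not measurements from the talk.

```python
def multipath_seconds(total_gb: float, ip_gbps: float,
                      circuit_gbps: float, setup_s: float) -> float:
    """Estimate transfer time when data flows over IP during circuit setup,
    then over both paths once the circuit is available."""
    total_gbit = total_gb * 8
    sent_during_setup = ip_gbps * setup_s
    if sent_during_setup >= total_gbit:
        # The transfer finishes on the IP link before the circuit is ready.
        return total_gbit / ip_gbps
    remaining = total_gbit - sent_during_setup
    return setup_s + remaining / (ip_gbps + circuit_gbps)


# Hypothetical scenario: 1000 GB, 1 Gb/s IP path, 9 Gb/s circuit, 60 s setup.
both_paths = multipath_seconds(1000, 1, 9, 60)   # 854.0 s
ip_only = (1000 * 8) / 1                         # 8000.0 s
wait_for_circuit = 60 + (1000 * 8) / 9           # roughly 949 s
```

Under these assumptions, using both paths beats waiting idle for the circuit, because the setup period is spent moving data rather than waiting.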
![Page 29: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/29.jpg)
Exploiting multiple paths
Default, commodity IP routes
+ Dedicated circuits
= Significant performance gains
[Plots, each with a “multipath” series: transfer between NERSC and ANL; transfer between UMich and Caltech]
Raj Kettimuthu, Ezra Kissel, Martin Swany, Jason Zurawski, Dan Gunter
![Page 30: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/30.jpg)
![Page 31: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/31.jpg)
K. Heitmann (Argonne) moves 22 TB of cosmology data LANL → ANL at 5 Gb/s
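To put those figures in perspective, the implied duration can be computed directly. The sketch assumes decimal units (1 TB = 10^12 bytes) and a sustained 5 Gb/s for the whole transfer; the slide does not state the actual elapsed time.

```python
def transfer_hours(terabytes: float, gigabits_per_second: float) -> float:
    """Hours needed to move a volume of data at a sustained rate."""
    bits = terabytes * 1e12 * 8
    seconds = bits / (gigabits_per_second * 1e9)
    return seconds / 3600


hours = transfer_hours(22, 5)  # about 9.8 hours
```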
![Page 32: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/32.jpg)
B. Winjum (UCLA) moves 900K-file plasma physics datasets UCLA → NERSC
![Page 33: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/33.jpg)
Dan Kozak (Caltech) replicates 1 PB of LIGO astronomy data for resilience
![Page 34: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/34.jpg)
• Collect • Move • Sync • Share • Analyze
• Annotate • Publish • Search • Backup • Archive
… for BIG DATA
![Page 35: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/35.jpg)
![Page 36: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/36.jpg)
Globus Online: Research Data Management-as-a-Service
SaaS:
• Sharing, Collaboration, Annotation
• Backup, Archival, Retrieval
• Ingest, Cataloging, Integration
• …
PaaS: Globus Integrate (Globus Nexus, Globus Connect)
Many more capabilities planned …
![Page 37: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/37.jpg)
A platform for integration
![Page 38: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/38.jpg)
Catalog as a service
Approach
• Hosted user-defined catalogs
• Based on tag model <subject, name, value>
• Optional schema constraints
• Integrated with other Globus services
Three REST APIs
• /query/: Retrieve subjects
• /tags/: Create, delete, retrieve tags
• /tagdef/: Create, delete, retrieve tag definitions
Builds on USC Tagfiler project (C. Kesselman et al.)
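The <subject, name, value> tag model and the three operation groups can be mirrored with an in-memory class. This is a sketch of the data model only, not the catalog's REST API or Tagfiler's implementation; the class `TagCatalog`, its method names, and the use of Python types as "schema constraints" are assumptions made for illustration.

```python
from collections import defaultdict


class TagCatalog:
    """In-memory analogue of a tag catalog: tagdef, tags, and query operations."""

    def __init__(self):
        self.tagdefs = {}               # tag name -> required Python type
        self.tags = defaultdict(dict)   # subject -> {tag name: value}

    # Analogue of /tagdef/: create and delete tag definitions.
    def create_tagdef(self, name: str, value_type: type = str) -> None:
        self.tagdefs[name] = value_type

    def delete_tagdef(self, name: str) -> None:
        del self.tagdefs[name]
        for subject_tags in self.tags.values():
            subject_tags.pop(name, None)

    # Analogue of /tags/: create, delete, and retrieve tags on a subject.
    def set_tag(self, subject: str, name: str, value) -> None:
        if name not in self.tagdefs:
            raise KeyError(f"undefined tag: {name}")
        if not isinstance(value, self.tagdefs[name]):
            raise TypeError(f"{name} expects {self.tagdefs[name].__name__}")
        self.tags[subject][name] = value

    def delete_tag(self, subject: str, name: str) -> None:
        self.tags[subject].pop(name, None)

    def get_tags(self, subject: str) -> dict:
        return dict(self.tags.get(subject, {}))

    # Analogue of /query/: retrieve subjects whose tags match all criteria.
    def query(self, **criteria) -> list[str]:
        return sorted(
            subject
            for subject, subject_tags in self.tags.items()
            if all(subject_tags.get(k) == v for k, v in criteria.items())
        )


catalog = TagCatalog()
catalog.create_tagdef("instrument", str)
catalog.create_tagdef("year", int)
catalog.set_tag("file-001", "instrument", "beamline-7")
catalog.set_tag("file-001", "year", 2013)
catalog.set_tag("file-002", "instrument", "beamline-7")
catalog.set_tag("file-002", "year", 2012)
```

Here `catalog.query(instrument="beamline-7", year=2013)` returns only `file-001`, and setting `year` to a string is rejected, which is one simple reading of "optional schema constraints".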
![Page 39: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/39.jpg)
Other early successes in services for science…
![Page 40: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/40.jpg)
![Page 41: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/41.jpg)
![Page 42: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/42.jpg)
Other innovative science SaaS projects
![Page 43: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/43.jpg)
Other innovative science SaaS projects
![Page 44: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/44.jpg)
Our vision for a 21st century cyberinfrastructure
To provide more capability for more people at substantially lower cost by creatively aggregating (“cloud”) and federating (“grid”) resources
“Science as a service”
![Page 45: Big Process for Big Data @ NASA](https://reader037.fdocuments.net/reader037/viewer/2022103114/554ea02ab4c9055f7b8b4687/html5/thumbnails/45.jpg)
Thank you to our sponsors!