Open Science Data Cloud - jpgrid.org · NSF-PIRE OSDC Data Challenge
Open Science Data Cloud
Robert Grossman
Open Cloud Consortium, University of Chicago, Open Data Group
March 10, 2011
Astronomical data
Biological data (Bionimbus)
NSF‐PIRE OSDC Data Challenge
Earth science data (& disaster relief)
Open Science Data Cloud
Who are we?
www.opencloudconsortium.org
• U.S.-based not-for-profit corporation.
• Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud.
• Manages cloud computing testbeds: Open Cloud Testbed.
• Develops reference implementations, benchmarks and standards.
OCC Members
• Companies: Cisco, Citrix, Yahoo!, …
• Universities: University of Chicago, Calit2, Johns Hopkins, Northwestern Univ., ORNL, University of Illinois at Chicago, …
• Federal agencies: NASA
• Other: National Lambda Rail
• Beginning to add international partners in 2011.
Infrastructure
• 2010 Proof-of-Concept Infrastructure
  – 450+ nodes
  – 3000+ cores
  – 2+ PB
  – Four data centers (two more to come in 2011)
  – Data centers have 10G network connections to StarLight (some 100G links in 2011)
• Plan to add approximately 1 PB of data in 2011.
• With current funding, we will refresh 1/3 of the infrastructure in 2011 and 2012.
Why Another Cloud Project?
[Figure: variety of analysis (low, medium, wide) plotted against data size (small, medium to large, very large). Labels: no infrastructure, dedicated infrastructure, general infrastructure; scientist with laptop, Open Science Data Cloud, high energy physics and astronomy.]
[Figure: computing platforms arranged by cycles and by persistent data (small, medium, large). Labels: single workstations, small to medium clusters, HPC, large & spec. clusters, databases, data clouds.]
What is the Open Science Data Cloud?
Hosted, managed, distributed facility to:
• Manage & archive your medium and large datasets
• Provide computational resources to analyze them
• Provide networking to share them with your colleagues and the public.
Long-Term Goal
Build a (small) data center for science.
And preserve your data the same way that libraries preserve books and museums preserve art.
OSDC Perspective
• Take a long term point of view (think like a library, not a cloud service provider)
• Operate infrastructure at the scale of a small data center
• Interoperate with public clouds
• Open, interoperable architecture
• Experiment at scale
• Vendor neutral
OSDC Projects
Project 1. Bionimbus
www.bionimbus.org
Case Study: Public Datasets in Bionimbus
What Could You Do With 1 PB of Genomics Data?
• The NIH in the U.S. currently makes available for download approximately 2 PB of data.
• Bionimbus today consists of 6 racks, 212 nodes, 1568 cores and 0.9 PB of storage.
• We plan to add approximately 1 PB of genomics and other data from the biological sciences to Bionimbus in 2011.
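As a rough reading of the Bionimbus figures above, the following minimal Python sketch derives per-rack and per-node averages. The decimal conversion (1 PB = 1000 TB) and the function name are assumptions made for illustration; they do not come from the slides.

```python
# Figures quoted on the slide for Bionimbus.
RACKS = 6
NODES = 212
CORES = 1568
STORAGE_PB = 0.9

# Assumption: decimal units, 1 PB = 1000 TB.
TB_PER_PB = 1000


def per_unit_averages():
    """Return average nodes per rack, cores per node, and TB of storage per node."""
    return {
        "nodes_per_rack": NODES / RACKS,
        "cores_per_node": CORES / NODES,
        "tb_per_node": STORAGE_PB * TB_PER_PB / NODES,
    }


if __name__ == "__main__":
    for name, value in per_unit_averages().items():
        print(f"{name}: {value:.1f}")
```

These are averages only; the slides do not say whether racks or nodes are configured uniformly.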
Case Study: ModENCODE
• Bionimbus is used to process the modENCODE data from the White lab (over 1000 experiments).
• Bionimbus VMs were used for some of the integrative analysis.
• Bionimbus is used as a backup for the modENCODE DCC.
Project Matsu 2: An Elastic Cloud For Disaster Response
Daniel Mandl - NASA/GSFC, Lead
Provide Fire / Flood Data to Rescue Workers
Short Term Pilot for 2011
• Colored areas represent catchments where rainfall collects and drains to river basins.
• River gauges displayed as small circles.
• Detailed measurements are available on the display by clicking on the river gauge stations.
Note blue bars indicating a surge of rainfall upstream
Then a flood wave appears downstream at Rundu river gauge days later
Flood Dashboard
Zambezi basin consisting of upper, middle and lower catchments
Project 3: OSDC PIRE Project
OSDC PIRE Project Overview
• Research
  – Cloud middleware for data intensive computing
  – Wide area clouds
• Training and education workshops
  – Data intensive computing using the OSDC
  – Cloud computing for scientific computing
• Outreach
  – OSDC Data Challenge
Foreign Partners
• National Institute of Advanced Industrial Science and Technology (AIST), Japan
• Beijing Institute of Genomics (BIG)
• Edinburgh University
• Korea Institute of Science & Technology
• São Paulo State University
• Universidade Federal Fluminense, Brazil
• University of Amsterdam
OSDC Data Challenge
• Annual contest to select 3 to 4 datasets each year to add to the OSDC.
• Looking for the most interesting datasets to add.
Research Focus
• Cloud architectures for data intensive computing
• Wide area clouds
• Continuous learning
• Scanning queries
Ways to Participate
• Nominate one of your graduate students to spend a summer working with one of the OSDC PIRE Foreign Partners
• Send one of your graduate students to hands-on Workshops, such as Introduction to Data Intensive Computing
• Submit your most impressive dataset to the OSDC Data Challenge
• Buy a container of computers and join the OSDC
Open Science Data Cloud Sustainability Model
Towards a Long Term, Sustainable Model
• Capital expenses (CapEx): about $1M/year
• Operating expenses (OpEx): about $1M/year
• Moore Foundation providing $1M/year for 2011 and 2012 to support the CapEx.
| | 2011 | 2012 | 2013 | 2014 | 2015 |
|---|---|---|---|---|---|
| New racks | | 8 | 8 | 8 | 8 |
| Total racks | 4 | 12 | 20 | 28 | 28 |
| New Capacity (PB) | 0 | 3.65 | 5.47 | 8.20 | 12.31 |
| Total Capacity (PB) | 1.22 | 4.86 | 10.34 | 17.33 | 24.78 |
| Number Cores | 1280 | 3840 | 12800 | 17920 | 35840 |
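The budget bullets on this slide imply a simple funding gap, which the short Python sketch below works through. It assumes that the Moore Foundation grant is the only funding source modeled and that costs stay flat at the quoted figures. The function name and the treatment of years after 2012 are hypothetical.

```python
# Figures from the slide (approximate, in US dollars per year).
CAPEX_PER_YEAR = 1_000_000
OPEX_PER_YEAR = 1_000_000
MOORE_GRANT_PER_YEAR = 1_000_000
MOORE_GRANT_YEARS = {2011, 2012}


def unfunded_amount(year):
    """Annual cost not covered by the Moore Foundation grant.

    Assumption: no other funding sources are modeled, and costs are flat.
    """
    total_cost = CAPEX_PER_YEAR + OPEX_PER_YEAR
    grant = MOORE_GRANT_PER_YEAR if year in MOORE_GRANT_YEARS else 0
    return total_cost - grant


if __name__ == "__main__":
    for year in (2011, 2012, 2013):
        print(year, unfunded_amount(year))
```

Under these assumptions, the grant covers the capital side in 2011 and 2012, leaving the operating side to be funded from elsewhere.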
Who do you most trust to manage your data for 100 years?
Companies may not be here tomorrow.
Think of a not-for-profit with that mission.
Government agencies have a role, but are not always easy to use.
Buy A Container and Join the OCC
• Use 2/3 of the container for your own purposes.
• Provide 1/3 of the container to the OCC for a shared replica space.
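The 2/3 and 1/3 split above can be sketched as a small Python helper. The slides give no container size, so the node counts in the usage lines are hypothetical. Rounding the OCC share down to whole nodes, with the remainder going to the owner, is also an assumption.

```python
from fractions import Fraction

# Shares stated on the slide.
OWNER_SHARE = Fraction(2, 3)
OCC_SHARE = Fraction(1, 3)


def split_container(total_nodes):
    """Split a container's nodes into (owner_nodes, occ_nodes).

    Assumption: the OCC share is rounded down to whole nodes and the
    owner keeps the remainder.
    """
    occ_nodes = int(total_nodes * OCC_SHARE)
    owner_nodes = total_nodes - occ_nodes
    return owner_nodes, occ_nodes


if __name__ == "__main__":
    # Hypothetical container sizes, for illustration only.
    print(split_container(300))
    print(split_container(100))
```

Exact fractions avoid floating-point surprises when the node count is not a multiple of three.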
To Get Involved
Join the Open Cloud Consortium: www.opencloudconsortium.org
Questions?