Grid optical network service architecture for data intensive applications
Aaas Data Intensive Science And Grid
Ian Foster
Computation Institute
Argonne National Lab & University of Chicago
New computing platforms
for data-intensive science
Growth of GenBank (1982-2005)
Broad Institute
Proteomics, Genomics, Transcriptomics, Protein sequence prediction, Phenotypic studies, Phylogeny, Sequence analysis, Protein structure prediction, Protein-protein interaction, Metabolomics, Model organism collections, Systems biology, Health epidemiology, Organisms, Disease, …
1,070 molecular biology databases (Nucleic Acids Research, Jan 2008); 96 in Jan 2001
Slide: Carole Goble
New problem solving methodologies
[Timeline figure: axis marks <0, 1700, 1950, 1990; labels Empirical, Data, Theory, Simulation]
“Applied computer science is now playing the role that mathematics did from the 17th through the 20th centuries: providing an orderly, formal framework and exploratory apparatus for other sciences”
– G. Djorgovski
More data does not always mean more knowledge
Folker Meyer, Genome Sequencing vs. Moore’s Law: Cyber Challenges for the Next Decade, CTWatch, August 2006.
Data is enormous, distributed, noisy
Infrastructure: storage & computing; economies of scale
Aggregation: data & software; people & disciplines
Algorithms: scalable, probabilistic; errors & ambiguity
Cloud
Grid
Data
An incomplete list of process steps
Discover
Access
Integrate
Analyze
Mine
Publish
Annotate
Validate
Curate
Share
Artisanal
Industrial
Data
Analyses
Models
Experiments
Literature
SOA as an integrating framework?
We expose data and software as services …
which others discover, decide to use, …
and compose to create new functions ...
which they publish as new services.
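The four steps above (expose, discover, compose, publish) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Registry` class, the `compose` helper, the service names, and the sample sequence are all hypothetical and stand in for real networked services and a real service catalog.

```python
from typing import Any, Callable, Dict, List


class Registry:
    """Hypothetical in-memory stand-in for a service catalog."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[[Any], Any]] = {}

    def publish(self, name: str, func: Callable[[Any], Any]) -> None:
        # Expose a function under a name that others can look up.
        self._services[name] = func

    def discover(self, keyword: str) -> List[str]:
        # Return the names of all services whose name contains the keyword.
        return sorted(n for n in self._services if keyword in n)

    def get(self, name: str) -> Callable[[Any], Any]:
        return self._services[name]


def compose(*funcs: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Chain services so that each one consumes the previous output."""
    def composed(value: Any) -> Any:
        for func in funcs:
            value = func(value)
        return value
    return composed


registry = Registry()

# 1. Expose data and software as services (made-up examples).
registry.publish("sequence.fetch", lambda identifier: "acgtacgt")
registry.publish("sequence.upper", str.upper)
registry.publish(
    "sequence.gc_fraction",
    lambda seq: (seq.count("G") + seq.count("C")) / len(seq),
)

# 2. Others discover the services and decide to use them.
found = registry.discover("sequence")

# 3. They compose them to create a new function.
gc_of_id = compose(
    registry.get("sequence.fetch"),
    registry.get("sequence.upper"),
    registry.get("sequence.gc_fraction"),
)

# 4. They publish the composition as a new service.
registry.publish("sequence.gc_of_id", gc_of_id)
result = registry.get("sequence.gc_of_id")("demo-id")
```

The composed service is indistinguishable from the original ones once published, which is the point of the slide: new functions become building blocks for the next user.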
Technical and socio-technical challenges
Technical:
• Complexity
• Semantics
• Distribution
• Scale
Socio-technical:
• Incentives
• Policy, trust
• Reproducibility
• Life cycle
“Service-oriented science”, Science, 2005
Grid technology
NAE Grand Challenges
The future of multi-site data integration: An example
fMRI
Are positive symptom schizophrenics associated with more severe superior temporal gyrus dysfunction?
Receptor Density
ERP
Web
PubMed, Expasy, Brain Map, etc.
Structure
Clinical
Portal
[Chart by treatment group: ARIP - 20MG, ARIP - 30MG, RISP - 06MG, PLACEBO; data labels 0.15, 0.18, 0.14, 0.11]
caBIG: sharing of infrastructure, applications, and data.
Aggregation in cancer biology
Globus
As of Feb 16, 2009
123 participants
104 services (65 data, 39 analytical)
Microarray clustering in caBIG
1. Query and retrieve microarray data from a caArray data service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/CaArrayScrub
2. Normalize microarray data using GenePattern analytical service: node255.broad.mit.edu:6060/wsrf/services/cagrid/PreprocessDatasetMAGEService
3. Hierarchical clustering using geWorkbench analytical service: cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/HierarchicalClusteringMage
Workflow in/output
caGrid services
“Shim” services
others
Wei Tan (Taverna workflow)
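The three-step workflow can be sketched as a chain of steps in which each step consumes the previous step's output. The sketch below is not the Taverna workflow and makes no network calls: the endpoint strings are the ones quoted on the slide and are kept only as labels, while the `Step` class, the local stand-in functions, the sample matrix, the row mean-centering, and the single-linkage clustering are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Matrix = List[List[float]]


@dataclass
class Step:
    name: str
    endpoint: str                 # address quoted on the slide, used as a label only
    run: Callable[[Any], Any]     # hypothetical local stand-in for the remote call


def query_microarray(_: Any) -> Matrix:
    # Stand-in for the data service: a tiny made-up expression matrix.
    return [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 1.0, 0.0]]


def normalize(rows: Matrix) -> Matrix:
    # Assumed normalization: subtract each row's mean from its values.
    return [[x - sum(row) / len(row) for x in row] for row in rows]


def hierarchical_cluster(rows: Matrix) -> List[Tuple[List[int], List[int]]]:
    # Naive single-linkage agglomerative clustering; returns the merge order.
    clusters = [[i] for i in range(len(rows))]
    merges = []

    def distance(a: List[int], b: List[int]) -> float:
        return min(
            sum((x - y) ** 2 for x, y in zip(rows[i], rows[j])) ** 0.5
            for i in a for j in b
        )

    while len(clusters) > 1:
        _, p, q = min(
            (distance(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        merges.append((clusters[p], clusters[q]))
        merged = clusters[p] + clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return merges


def run_workflow(steps: List[Step], payload: Any = None) -> Any:
    # Feed each step's output into the next step.
    for step in steps:
        payload = step.run(payload)
    return payload


workflow = [
    Step("query", "cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/CaArrayScrub",
         query_microarray),
    Step("normalize", "node255.broad.mit.edu:6060/wsrf/services/cagrid/PreprocessDatasetMAGEService",
         normalize),
    Step("cluster", "cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/HierarchicalClusteringMage",
         hierarchical_cluster),
]

merges = run_workflow(workflow)
```

In a real deployment the "shim" services on the slide would sit between these steps to convert one service's output format into the next one's input format; here the stand-ins already share a plain list-of-lists format, so no shim is needed.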
Children’s Oncology Group clinical imaging trials (Erberich)
Wide-area medical interface service
Converts local medical workflow actions into wide area operations: image workflow, EHR, …
Transparently manages federation of: security; data replication and recovery; data discovery
Enterprise/Grid Interface Service
DICOM Protocols
Grid Protocols (Web services)
DICOM
XDS
HL7
Vendor Specific
Wide Area Service Actor
Plug-in Adapters
Earth System Grid
Main ESG Portal
198 TB of data at four locations
1,150 datasets
1,032,000 files
Includes the past 6 years of joint DOE/NSF climate modeling experiments
8,000 registered users
Downloads to date: 49 TB, 176,000 files
CMIP3 (IPCC AR4) ESG Portal
35 TB of data at one location
74,700 files
Generated by a modeling campaign coordinated by the Intergovernmental Panel on Climate Change
Data from 13 countries, representing 25 models
1,900 registered projects
Downloads to date: 387 TB, 1,300,000 files, 500 GB/day (average)
400 scientific papers published to date based on analysis of CMIP3 (IPCC AR4) data
ESG usage: over 500 sites worldwide
ESG monthly download volumes
Globus
www.earthsystemgrid.org
Understanding interactions between human and natural systems
IPCC Emissions scenarios
Numerical Simulations
IPCC 4th Assessment
2007
IPCC process: Bill Collins, LBNL
Mitigation
Adaptation
A Community Integrated Model for Economic and Resource Trajectories for Humankind (CIM-EARTH)
Dynamics, foresight, uncertainty, resolution, …
Agriculture, transport, taxation, …
Data (global, local, …)
(Super)computers
CIM-EARTH Framework
Community process
Open code, data
www.cim-earth.org
Alleviating Poverty in Thailand: Modeling Entrepreneurship
Consider only wealth, access to capital
Consider also distance to 6 major cities
Rob Townsend, Tibi Stef-Praun, Victor Zhorin
Match
High
Low
Data is enormous, distributed, noisy
Infrastructure: storage & computing; economies of scale
Aggregation: data & software; people & disciplines
Algorithms: scalable, probabilistic; errors & ambiguity
Cloud
Grid
Computation Institute
www.ci.uchicago.edu
Thank you!