Earth Data Science Planning Meeting #2 March 7, 2013.


Agenda

• Recap from 1st meeting
• Discuss near-term action items
• Construct WGs
• Revisit key questions


Recap of Meeting #1

• Introductions
• Discussed the history of forming the group (e.g., 8X Retreat)
• Study Objectives
• Initial Plans
• Questions


Study Objectives (1)

• Evaluation of the business case for targeting “data science” as a technology growth area in earth data systems research

• Identification of near-term science questions/challenges to address

• Identification of Data Science vs. Big Data synergies and differences

• Development of a capabilities roadmap
• Current state of JPL vs. competitors
• Staffing needs and gaps


Study Objectives (2)

• Key partnerships
• Necessary facilities support vs. current state
• Recommendations on how to structure a long-term program
• Identify opportunities to work with, and propose to, the NASA Earth Science Division (ESD) Program


Today’s Discussion: Proposed Near-Term Actions

• Data Lifecycle
• Data Science White Paper
• IT for Climate Research Workshop
• Use Cases (Near-term, Long-term)
• Invited Speakers


Earth Science Data Lifecycle

For JPL Internal Use Only

[Figure: Earth Science Data Lifecycle. Components shown: Mission Operations (TDRS Network, On Board Processing); Data Acquisition and Command; Instrument Operations (EDOS/Ground Data Systems, L0A Processing); Science Data Processing, L0B, L1, L2, L3, L4 (Science Data Systems, SDS); Science Data Management, Archive & Distribution (EOSDIS Data Centers/DAACs); Analysis, Modeling and Application Environments/Gateways; downstream: Science Teams, Outreach, Applications, Decision Support, and Research.]
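The lifecycle figure orders the processing levels from L0A through L4 across a few stages. A minimal Python sketch of that ordering, assuming a strictly linear pipeline in which each level is derived from the ones before it (real mission pipelines can branch, reprocess, and skip levels). `ProcessingLevel`, `LIFECYCLE`, `stage_for`, and `upstream_levels` are hypothetical names; only the stage labels come from the figure.

```python
from enum import IntEnum


class ProcessingLevel(IntEnum):
    """Processing levels from the lifecycle figure, in pipeline order."""
    L0A = 0
    L0B = 1
    L1 = 2
    L2 = 3
    L3 = 4
    L4 = 5


# (stage name, levels that stage produces); stage labels follow the figure.
# Assumption: a strictly linear flow with no branching or reprocessing.
LIFECYCLE = [
    ("Mission Operations", []),
    ("Instrument Operations (EDOS/GDS)", [ProcessingLevel.L0A]),
    ("Science Data Processing (SDS)",
     [ProcessingLevel.L0B, ProcessingLevel.L1, ProcessingLevel.L2,
      ProcessingLevel.L3, ProcessingLevel.L4]),
    ("Science Data Management, Archive & Distribution (EOSDIS DAAC)", []),
    ("Analysis, Modeling and Application Environments/Gateways", []),
]


def stage_for(level: ProcessingLevel) -> str:
    """Return the lifecycle stage that produces the given level."""
    for stage, levels in LIFECYCLE:
        if level in levels:
            return stage
    raise ValueError(f"no stage produces {level.name}")


def upstream_levels(level: ProcessingLevel) -> list:
    """Levels that must exist before `level` can be generated (linear chain)."""
    return [lvl for lvl in ProcessingLevel if lvl < level]


print(stage_for(ProcessingLevel.L2))  # Science Data Processing (SDS)
print([lvl.name for lvl in upstream_levels(ProcessingLevel.L2)])  # ['L0A', 'L0B', 'L1']
```

Using `IntEnum` makes the level ordering explicit, so "what must exist first" reduces to a simple comparison.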


Data Science White Paper

• Real example of a problem

• What are the problems we are trying to solve?

• Traceability matrix (link to use cases)

• Data Science Concepts
  – Massive Data Analysis
  – Long-tail Science Data Analysis

• Gap Analysis
  – Existing JPL activities (including investments)
  – Benchmarking (both our competitors and others)

• Opportunities
  – Climate, Shifting Archives, Applications → Decision Support

• Recommendations
  – Targets
    • NASA Earth Science Technology Office (ESTO)
    • Jack Kaye’s programs; Steve Volz’s programs (Maiden)


IT for Climate Research Workshop

• JPL held the 1st IT for Climate Research Workshop in 2009 at SMC-IT in Pasadena, CA
  – Focused on motivating the problem of model-to-data comparison

• JPL/GSFC held the 2nd IT for Climate Research Workshop in 2010 at GSFC
  – Focused on integration of ESG with satellite data

• Proposed 3rd workshop
  – Focus on data science aspects of climate model comparison


Use Case Planning

• Identify key use cases that focus on data science challenges

• Identify a use case leader

• Capture the use case in a template


Proposed Use Case #1: Climate Model/Obs Comparison

• Growing, distributed, massive record of observational and climate model output
  – CMIP3: ~34 terabytes
  – CMIP5: ~3 petabytes
  – CMIP6: 350 petabytes to 3 exabytes (per D. Williams and the 2011 Climate Knowledge Discovery Workshop)

• A new paradigm is required to shift the focus from data access and independent data analysis to online analysis services for highly distributed, heterogeneous data, in order to:
  – Fuse data together for long-term records
  – Compute higher-order data products on request
  – Analyze distributed data (e.g., climate model output, satellite data) with distributed computation
  – Establish a scalable computing infrastructure for missions and science projects
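At petabyte scale, this use case calls for analyzing distributed data with distributed computation rather than moving the full record to one place. A minimal Python sketch of that idea, assuming model output and observations are already co-located on a common grid at each site: each site reduces its share to a few additive sums, and only those sums travel. Bias and RMSE serve here as example comparison metrics (an assumption, not metrics named in the slides), and `PartialStats`, `local_stats`, and `finalize` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class PartialStats:
    """Additive sums for one chunk of co-located model/obs pairs."""
    n: int = 0
    sum_diff: float = 0.0
    sum_sq_diff: float = 0.0

    def merge(self, other: "PartialStats") -> "PartialStats":
        # Counts and sums are additive, so chunks combine in any order.
        return PartialStats(self.n + other.n,
                            self.sum_diff + other.sum_diff,
                            self.sum_sq_diff + other.sum_sq_diff)


def local_stats(model, obs) -> PartialStats:
    """Runs where the data lives; skips pairs whose observation is None."""
    stats = PartialStats()
    for m, o in zip(model, obs):
        if o is None:
            continue
        d = m - o
        stats.n += 1
        stats.sum_diff += d
        stats.sum_sq_diff += d * d
    return stats


def finalize(stats: PartialStats):
    """Convert merged sums into (bias, rmse) of model minus observation."""
    if stats.n == 0:
        raise ValueError("no valid model/obs pairs")
    return stats.sum_diff / stats.n, math.sqrt(stats.sum_sq_diff / stats.n)


# Two hypothetical sites, each holding part of the record on a common grid.
site_a = local_stats([1.0, 2.0, 3.0], [1.5, 2.0, None])
site_b = local_stats([4.0, 5.0], [3.0, 5.0])
bias, rmse = finalize(site_a.merge(site_b))  # bias 0.125, rmse ~0.559
```

Because only three numbers per site cross the network, the same pattern scales to many data centers, and `merge` can be applied in a tree or in any order.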


Proposed Use Case #2: CO2 Research

[Figure: CO2 research data flow. L2 data products from data centers (AIRS, OCO-2, TES, GOSAT), with bias updates, feed data assimilation into primarily a GCM (e.g., GEOS-5), yielding CO2 f(x,y,z,t); a second data assimilation step uses a coupled chemistry-transport model with a surface model (e.g., GEOS-Chem) to estimate d[CO2]/dt at the surface. In situ, aircraft, and TCCON measurements provide data records tied to the WMO standard.]

• Existing measurements are integrated from disparate data centers
• Methods for generating long-term records are under development
• Initial data records from ACOS and AIRS have been captured
• Support for generating OCO-2 L3 products will be in place
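The CO2 flow pairs satellite L2 retrievals with a bias update and data assimilation. As an illustration only, the simplest form of that combination is a scalar optimal-interpolation (Kalman) update: bias-correct an observation, then weight it against the model background by their error variances. This is a generic textbook update, not the GEOS-5 or GEOS-Chem assimilation schemes, and the `assimilate` function, its parameters, and all numbers below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    value: float     # CO2 mixing ratio, ppm
    variance: float  # error variance, ppm^2


def assimilate(background: Estimate, obs: float, obs_variance: float,
               obs_bias: float = 0.0) -> Estimate:
    """Scalar optimal-interpolation (Kalman) update of a single value.

    The observation is bias-corrected first, then blended with the
    background using weights set by the two error variances.
    """
    corrected = obs - obs_bias
    gain = background.variance / (background.variance + obs_variance)
    value = background.value + gain * (corrected - background.value)
    variance = (1.0 - gain) * background.variance
    return Estimate(value, variance)


# Hypothetical numbers: a model background, then two retrievals in turn,
# the second with a known +1.5 ppm bias removed before blending.
state = Estimate(value=400.0, variance=4.0)
state = assimilate(state, obs=402.0, obs_variance=4.0)                # 401.0, var 2.0
state = assimilate(state, obs=403.5, obs_variance=2.0, obs_bias=1.5)  # 401.5, var 1.0
```

For a fixed observation variance, the gain shrinks as the background variance falls, so each later observation moves the estimate less; an operational CO2 system would replace these scalars with fields and covariance matrices.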


Proposed Use Case #3: Extreme Weather Events

Can we use historical data to identify links to extreme weather events (e.g., hurricanes)?


Use Cases Beyond Earth

• We started at Earth, but do want to wear a bigger hat

• Astronomy

• Engineering

• Planetary


Working Groups

• Data Lifecycle
• Data Science White Paper
• Use Cases
  – Climate
  – CO2
  – Extreme Weather
  – Astronomy
  – Planetary
  – Engineering


Speakers?

• Begin to socialize efforts from colleagues

• Recommendation: focus on people analyzing massive scientific data sets in the physical sciences
  – Remote sensing would be ideal


Questions

• What is Big Data?

• What is the lifecycle of the entire flow?

• What are the problems to solve?
  – CO2 Sink/Source on …

• What are the opportunities?

• What is the long-term plan?

• What are some low-hanging fruit?

• What are our next steps?

• What benefits do we expect?