Data and Information Opportunities

Laura Biven, PhD, Senior Science and Technology Advisor, Office of the Deputy Director for Science Programs, [email protected]. Board on Research Data and Information Sponsors Meeting, September 23rd, 2013


Priorities Challenges Opportunities


Data Management for primary research

Data Management for reuse and repurposing

The World on Big Data


Not only true for astronomy, high energy physics,… biology, climate, materials science,…

Quick-Facts about the DOE Office of Science


Advanced Scientific Computing Research

Basic Energy Sciences

Biological and Environmental Research

Fusion Energy Sciences

High Energy Physics

Nuclear Physics

Very diverse portfolio with truly big data

The DOE/SC Labs Today – User Facilities

Light Sources: SSRL (SLAC), ALS (LBNL), APS (ANL), NSLS (BNL), LCLS (SLAC)

Neutron Sources: HFIR (ORNL), Lujan (LANL), SNS (ORNL)

Nano Centers (NSRCs): CCNM (ANL), Foundry (LBNL), CNMS (ORNL), CINT (SNL/LANL), CFN (BNL)

Computing Facilities: NERSC (LBNL), OLCF (ORNL), ALCF (ANL)

High energy physics facilities: Tevatron (FNAL), B-Factory (SLAC)

Nuclear physics facilities: RHIC (BNL), TJNAF, HRIBF (ORNL), ATLAS (ANL)

Bio & Enviro Facilities: EMSL (PNNL), JGI (LBNL), ARM

FES: DIII-D (GA), Alcator (MIT), NSTX (PPPL)

Users Come from all 50 States and D.C.

26,000 users/year at 32 national scientific user facilities

Synchrotron Light Sources

[Figure: number of users per fiscal year, '82 through '11, at LCLS, APS, ALS, SSRL, and NSLS. Vertical axis: Number of Users, 0 to 12,000.]

SSRL 1974 & 2004; NSLS 1982; ALS 1993; APS 1996; LCLS 2009; NSLS-II 2015

Researchers may use more than one facility

Users by Discipline at the Synchrotron Light Sources

[Figure: users by discipline per fiscal year, 1990 onward. Disciplines: Life Sciences, Chemical Sciences, Geosciences & Ecology, Applied Science/Engineering, Optical/General Physics, Materials Sciences, Other. Axes: % of Users (0% to 100%) and Number of Users (0 to 10,000), with a Total Number of Users series.]

Advanced Light Source Data Rates


Data and Communication in Basic Energy Sciences: Creating a Pathway for Scientific Discovery (2012)

Data Strategy is very important to productivity and competitiveness

ASCR and BES, BER, HEP


http://science.energy.gov/~/media/ascr/pdf/program-documents/docs/ASCR_DataCrosscutting2_8_28_13.pdf

In April 2013, a diverse group of researchers from the U.S. Department of Energy (DOE) scientific community assembled in Germantown, Maryland to assess data requirements associated with DOE-sponsored scientific facilities and large-scale experiments.

Data Crosscutting Requirements Review

• Many Office of Science experimental facilities anticipate rapid growth in data volume, velocity, and complexity. User Facilities need end-to-end systems that provide more automated workflows and capabilities to ingest, analyze, and manage much larger and more complex data sets generated at faster rates.

• There is an urgent need for standards and community APIs for storing, annotating, and accessing scientific data. The development of standards and protocols for distributed data and service interoperability is essential. Furthermore, API standards will enable collaborations and facilitate extensibility, whereby similar, customized services can be developed across science domains. Such standardization will facilitate data reuse and integration from multiple experiments. It also will be needed as part of any move to provide facility-wide data services.
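As a purely illustrative sketch of what a shared API for storing, annotating, and accessing data could look like, the Python below defines a small interface and a toy in-memory implementation. Every name here (`DataService`, `Dataset`, `InMemoryService`, `describe`, the dataset ID, and the annotation keys) is hypothetical and does not come from the report or from any DOE facility. The point is only that client code written against a common interface can be reused across services.

```python
from dataclasses import dataclass, field
from typing import Dict, Protocol


@dataclass
class Dataset:
    """A stored payload plus free-form annotations (hypothetical record layout)."""
    payload: bytes
    annotations: Dict[str, str] = field(default_factory=dict)


class DataService(Protocol):
    """Hypothetical common interface that a facility data service might implement."""

    def store(self, dataset_id: str, payload: bytes) -> None: ...
    def annotate(self, dataset_id: str, key: str, value: str) -> None: ...
    def access(self, dataset_id: str) -> Dataset: ...


class InMemoryService:
    """Toy implementation; a real service would sit in front of facility storage."""

    def __init__(self) -> None:
        self._data: Dict[str, Dataset] = {}

    def store(self, dataset_id: str, payload: bytes) -> None:
        self._data[dataset_id] = Dataset(payload)

    def annotate(self, dataset_id: str, key: str, value: str) -> None:
        self._data[dataset_id].annotations[key] = value

    def access(self, dataset_id: str) -> Dataset:
        return self._data[dataset_id]


def describe(service: DataService, dataset_id: str) -> str:
    """Client code written against the interface, not a particular facility."""
    ds = service.access(dataset_id)
    tags = ", ".join(f"{k}={v}" for k, v in sorted(ds.annotations.items()))
    return f"{dataset_id}: {len(ds.payload)} bytes [{tags}]"


svc = InMemoryService()
svc.store("run-001", b"\x00" * 16)
svc.annotate("run-001", "instrument", "example-beamline")
svc.annotate("run-001", "units", "counts")
summary = describe(svc, "run-001")
print(summary)
```

Because `describe` depends only on the three interface methods, any other service that provides them could be substituted without changing the client, which is the kind of extensibility across science domains the finding describes.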

Crosscutting Requirements Report – Findings


K-Base


http://kbase.us/

Administration Directives


Push for consideration of reuse and repurposing is very timely.

• OSTP Memo: Increasing Access to the Results of Federally Funded Scientific Research

• Open Data Policy – Managing Information as an Asset

• Incentives for sharing: Data rights, licensing, citation, privacy, U.S. research competitiveness

• Sustainability of data

• Maintaining good communication with publishing communities

• Maintaining good communication and coordination with international partners

DOE/SC Interests


ASCR and BES

http://science.energy.gov/~/media/ascr/pdf/research/scidac/ASCR_BES_Data_Report.pdf

The workshop was organized in the context of the impending data tsunami that will be produced by DOE’s BES facilities. Current facilities, like SLAC National Accelerator Laboratory’s Linac Coherent Light Source, can produce up to 18 terabytes (TB) per day, while upgraded detectors at Lawrence Berkeley National Laboratory’s Advanced Light Source will generate ~10 TB per hour. The expectation is that these rates will increase by over an order of magnitude in the coming decade. The urgency to develop new strategies and methods in order to stay ahead of this deluge and extract the most science from these facilities was recognized by all.
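To put the quoted rates on a common scale, the short Python sketch below converts them into the average network throughput needed to move the data as fast as it is produced. It assumes decimal units (1 TB = 10^12 bytes) and continuous production at the quoted rate. Neither assumption is stated in the report, and the helper name `sustained_gbit_per_s` is illustrative only.

```python
# Back-of-the-envelope conversion of the quoted data rates into sustained throughput.
# Assumption: decimal units (1 TB = 10**12 bytes); the source does not state a convention.

TB = 10**12  # bytes per terabyte (decimal convention, an assumption)


def sustained_gbit_per_s(terabytes: float, seconds: float) -> float:
    """Average rate in gigabits per second needed to move `terabytes` within `seconds`."""
    return terabytes * TB * 8 / seconds / 1e9


lcls = sustained_gbit_per_s(18, 24 * 3600)  # LCLS: up to 18 TB per day
als = sustained_gbit_per_s(10, 3600)        # ALS upgraded detectors: ~10 TB per hour

print(f"LCLS: {lcls:.2f} Gbit/s")
print(f"ALS:  {als:.2f} Gbit/s")
print(f"ALS at 10x: {10 * als:.1f} Gbit/s")
```

Under these assumptions, 18 TB per day averages about 1.7 Gbit/s and 10 TB per hour about 22 Gbit/s. An order-of-magnitude increase on the latter would be roughly 220 Gbit/s sustained.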


Reports

This new report discusses the natural synergies among the challenges facing data-intensive science and exascale computing, including the need for a new scientific workflow.

DOE ASCR Advisory Committee (ASCAC) Data Subcommittee Report

http://science.energy.gov/~/media/ascr/ascac/pdf/reports/2013/ASCAC_Data_Intensive_Computing_report_final.pdf