The Campus Data Storage & Services Task Force: Key Findings & Recommendations

Michael Grady, Office of the CIO
Beth Sandore Namachchivaya, Library
University of Illinois at Urbana-Champaign

https://wiki.cites.uiuc.edu/wiki/display/DSST/



Charge

Objective: help the campus develop a strategy around storage

“develop a comprehensive strategic plan for addressing the central data storage needs of this campus, including specific operational ideas for implementation”

Time Line

August – December 2011: Survey, info. gathering, meetings, brainstorming, working groups, input from multiple constituencies

January – February 2012: Working group reports, mini-retreat, final report

March 9, 2012: final report submitted

Task Force Activities

• Conduct & analyze campus baseline survey
• Develop common agenda
• Environmental scan: peer institutions
• Collect & analyze use cases
• Prioritize key needs & issues

Analyzed Storage Needs For:
• Enterprise storage & services
• Web storage and services at the unit level
• Individual student/staff/faculty
• Research data (incl. services)
• Institutional/administrative
• Long-term preservation and curation
• Instructional needs

Data Storage Task Force: Storage Survey Findings

• 7 PB of storage across 50 data centers
• 9 units support 90% of storage; heavy focus on research
• Inconsistent sensitive data services
• Few data replication services
• Backup services inconsistent
• “Every tub on its own bottom” = ineffective use of finite technical expertise
• Brittle purchase and refresh cycles

Units With Significant Storage

[Chart: Nine Units* with Largest Allocated & Unallocated Storage; series: Allocated, Unallocated; y-axis: terabytes, 0 to 1,400]

*NCSA excluded because its storage resources are unique and serve as a national resource.

Remaining Storage Distribution

[Chart: Allocated & Unallocated Storage for Remaining 24 Campus Units; series: Allocated Storage, Unallocated; y-axis: terabytes, 0 to 80]

Survey: Unmet Needs
• Capacity: accelerated need for growth
• Sensitive data support for research & administrative data
• Affordable backup, archiving, and timely data recovery
• Easy sharing and collaboration with data on and off campus
• Remote accessibility
• Cloud storage with elastic expansion

Survey: Obstacles

• Funding: lack of recurring funds
• Staffing: insufficient staff expertise
• Existing services too complicated
• Education about what is available and how to use it

Storage & Research Lifecycle
• Manage sensitive data!!!
• Long-term (5+ years) managed storage for high-capacity datasets
• Ease of use: move data from HPC to easily identified storage environments suited to the data needs
• Create and manage pools of storage across and among campus units
• Collaboration tools to enable sharing of research data among individuals/groups
• Storage-related services:
– Tiered storage to support sensitive data (HIPAA, classified data, etc.), high and low availability
– Backup, replication, and archiving per requirements
– Research data consulting services, incl. data management planning, database design, curation, migration, and preservation
– Data archiving

Analysis: Urbana Campus NSF Data Management Plans

• 341 proposals with DMPs (updates and supplements not included)

• 43 proposals used the Grainger Engineering Library template and mentioned assistance from Grainger Library in their proposals (12.61%)

• 57 proposals identified the IDEALS institutional repository (IR) as a location where data will be deposited (includes the 43 above) (16.72%)

• 52 proposals used the single sentence "See GPG Chapter II.C.2.j for guidance on contents" for their DMP (15.25%)

Compiled by William Mischo and Mary Schlembach
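The percentages on this slide are simple shares of the 341 proposals with DMPs. A minimal Python sketch that recomputes them from the slide's counts makes the arithmetic checkable; the dictionary keys are our own shorthand, not Mischo and Schlembach's category names. Because the 57 IDEALS proposals include the 43 Grainger-template proposals, the categories overlap and their shares should not be added together.

```python
# Recompute the slide's percentages from its raw counts (keys are hypothetical shorthand).
TOTAL_DMPS = 341  # proposals with DMPs; updates and supplements excluded

counts = {
    "grainger_template": 43,    # used Grainger template, mentioned Grainger Library
    "ideals_repository": 57,    # named IDEALS IR as deposit location (includes the 43)
    "gpg_single_sentence": 52,  # DMP was only the GPG II.C.2.j pointer sentence
}


def share(count: int, total: int = TOTAL_DMPS) -> float:
    """Return count as a percentage of total, rounded to two decimals like the slide."""
    return round(100 * count / total, 2)


shares = {name: share(n) for name, n in counts.items()}

# The 57 IDEALS proposals include the 43 Grainger-template ones, so the
# categories overlap; this difference counts IDEALS mentions without the template.
ideals_without_template = counts["ideals_repository"] - counts["grainger_template"]

for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
print(f"IDEALS without Grainger template: {ideals_without_template} "
      f"({share(ideals_without_template):.2f}%)")
```

The derived figure (14 proposals, about 4.11%) follows directly from the slide's note that the 57 include the 43.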

NSF DMP Data Storage: UIUC
• RAIDs
• Work or lab computers
• Research group servers
• External hard drives
• No storage format provided (2 cases)
• Outside repository
• “Unique storage solutions”
• NCSA data test bed
• Custom-built processor
• Research group cluster

Compiled by William Mischo and Mary Schlembach

Workplace Productivity, Instruction, and Institutional Assets

• Easy to manage, easy to federate
• Central support for file-system storage and services
• Central support for database content and access services (incl. virtualization)
• Central multimedia storage
• LMS content storage and access
• Preservation storage: file system and block level

Sensitive Data, Security, and Privacy
• Potential campus exposure from non-secure clinical research data storage
• Potential exposure from non-secure use of student, HR, and other administrative data
• Not every unit can manage sensitive data adequately; risk of exposure multiplies
• Need policies and best practices around research compliance and administrative practice

Storage Architecture, Technology, Delivery, and Cost Models

• Peers have more robust central storage infrastructure
• Cost efficiencies
• Staff efficiencies
• Advantages of late entry to this area
• Need network and storage architecture work

Storage Architecture

Current Decentralized Storage
• Predicated on “just in case” needs
• Non-recurring funding
• Significant staff investment within units
• Overpriced, obsolete storage
• Locks units into inflexible solutions

Benefits of Centralized Storage

• Supports “just in time” allocation
• Easy access to unallocated storage
• Consistent support for HPC storage
• Requires fewer central staff to manage
• Edge IT pros can focus on edge technologies
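The contrast between “just in case” and “just in time” allocation can be illustrated with a toy model. The Python sketch below is an illustration under stated assumptions, not the task force's design: the classes `UnitSilo` and `SharedPool` and all terabyte figures are hypothetical. It shows the point of the two slides: in a decentralized model one unit's idle capacity cannot serve another unit's request, while a shared pool makes all unallocated storage available to whichever unit needs it.

```python
from dataclasses import dataclass, field


@dataclass
class UnitSilo:
    """Hypothetical decentralized model: a unit buys its own capacity up front ("just in case")."""
    capacity_tb: float
    allocated_tb: float = 0.0

    def allocate(self, tb: float) -> bool:
        # Only this unit's own unallocated space can satisfy the request.
        if self.allocated_tb + tb <= self.capacity_tb:
            self.allocated_tb += tb
            return True
        return False


@dataclass
class SharedPool:
    """Hypothetical centralized model: units draw from one pool as needed ("just in time")."""
    capacity_tb: float
    allocations: dict[str, float] = field(default_factory=dict)

    @property
    def unallocated_tb(self) -> float:
        return self.capacity_tb - sum(self.allocations.values())

    def allocate(self, unit: str, tb: float) -> bool:
        # Any unit may claim capacity that no unit has allocated yet.
        if tb <= self.unallocated_tb:
            self.allocations[unit] = self.allocations.get(unit, 0.0) + tb
            return True
        return False


# Illustrative figures only: two units that each bought 100 TB vs. one 200 TB pool.
silo_a = UnitSilo(capacity_tb=100)
silo_b = UnitSilo(capacity_tb=100)
silo_a.allocate(30)
b_ok_silo = silo_b.allocate(150)  # False: unit B cannot use unit A's 70 idle TB

pool = SharedPool(capacity_tb=200)
pool.allocate("A", 30)
b_ok_pool = pool.allocate("B", 150)  # True: 170 TB was still unallocated

print(b_ok_silo, b_ok_pool, pool.unallocated_tb)
```

The sketch deliberately ignores service tiers, sensitive-data separation, and cost recovery, all of which a real central storage service would have to handle.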

Recommendations: Strategy

• Campus Storage Management Governance
• Share Storage Resources
• Provide Centrally Managed Storage
• Provide Storage Management Services
• Incorporate Cloud Services
• Provide Best Practices and Policies
• Adopt “evergreen” approach to storage refresh

Recommendations: Actions, time frames, responsible parties…

Recommendations #1

Recommendations #2

Key Takeaways

• Need a “storage evolution” campus-wide
• Ensure a culture of excellence while shifting infrastructure focus to the middle
• Govern & provision based on user needs
• Enable provision of domain-specific edge technologies
• Risk is too great not to take action

Synergistic Work, 2010–12: Urbana Campus

• Data Center Consolidation
– Reduce small data center footprint across campus
– Support research computing cluster https://campuscluster.illinois.edu/
• NSF Data Management Plan requirement: January 2011
• Data Stewardship Committee
– Focus on research data needs & policy
• CIO Cyberinfrastructure planning
• IT governance

Task Force & Working Groups:

Mike Grady, Office of the CIO; co-chair
Beth Sandore Namachchivaya, University Library; co-chair
Jason Alt, NCSA
Jack Brighton, College of Media, CME
Michelle Butler, NCSA
Mike Corn, Office of the CIO
Dan Davidson, IGB
Jennifer Eardley, DBS, OVCR
Michael Edwards, College of LAS
David Gerstenecker, College of ACES
Gabe Gibson, College of LAS
Howard Guenther, OVCR
Tom Habing, University Library
Maggie Helms, DBS
Josh Henry, College of ACES
Alice Jones, AITS/UA
Joanne Kaczmarek, University Archives
Jackie Kern, Facilities & Services
Charley Kline, CITES
Carol Livingstone, DMI
Carol Malmgren, Office of the Registrar
Glenda Morgan, Office of the CIO
Frank Penrose, College of Engineering
Sarah Shreeves, University Library
Jason Strutz, University Library
Chuck Wallbaum, School of Chemical Sciences
Kristopher Williams, Materials Research Laboratory