creating a data science institute at UW-Madison
Information & Technology Leadership Conference
9 April 2019
Brian Yandell
Professor of Biometry, Statistics & Horticulture
abstract
Digital data are now everywhere, in every sector, in every organization. Deep use of data creates significant value, enhancing knowledge, productivity and competitiveness. There is a significant need to develop and implement methods to effectively harness the tremendous amounts of available data.
To address this need, the university is preparing to launch a new Data Science Institute, which will include the already successful Data Science Hub as an important outreach component. This presentation will discuss how the Data Science Institute will provide a collaborative forum in which research is accelerated, creating new value from large and noisy data sets emerging across all disciplines.
[Diagram labels: store data (how?), study data (what?), UW research (why?), compute (when?), manage data (where?), industry, public]
bigger picture
● data science is connected to larger transformations of academy
  ○ data is reshaping our world
  ○ reorient UW to bring power of data science to all fields of study
  ○ future of data science will be shaped by insights from all other disciplines
● other academic activity
  ○ Moore-Sloan DS Environments
    ■ $32M; see Creating Institutional Change (2017)
    ■ key sites: UWA eScience, UC Berkeley, NYU
    ■ others in MSDSE (15 including UMI MIDAS)
    ■ Data Science Leadership Summit Report (2018)
  ○ MIT's new $1B College of Computing
● federal agencies
  ○ NSF 4 Big Data HUBs & 10 Big Ideas
  ○ Federal Big Data Research & Development Strategic Plan (2016)
UW data science initiatives over the past 2 years
● Winter-Summer 2017: new WID Director with Hub / Town Hall concepts
● Summer 2017-2019+: Data Science Hub (funded by UW2020 in May 2018)
● Fall 2017-2019+: Data Science Undergraduate Major Working Group
● Spring-Fall 2018: Computing Working Group*
● Spring-Summer 2018: RTAG Research Computing & Data White Paper
● Summer 2018: FoxConn Inst for Research in Science & Technology (in CoE)
● Fall 2018: OVCRGE Data Science (Institute) Planning Committee*
● Fall 2018-Spring 2019: Computer, Data & Information School proposal* (in L&S)
● Spring 2019: OVCRGE Data Science Institute creation*
● Spring 2019: Research Computing & Data design proposal* (with CIO)
*: charged directly or indirectly by Chancellor
Data Science Hub Leadership
● data science Hub Steering Committee (Summer 2017-present)
  ○ Michael Ferris (CS, Director)
  ○ Brian Yandell (Stat, Hort, co-Lead), Lauren Michael (CHTC, co-Lead)
  ○ Cameron Cook (RDS), Russell Dimond (SSCC Associate Director), AnHai Doan (CS), Kristin Eschenfelder (iSchool Director), Young Mie Kim (Journalism), Michael Newton (Stat, BMI Interim Chair), Robert Nowak (ECE), Eric Wilcots (Astronomy, L&S Deputy Dean), Stephen Wright (CS)
  ○ Former: Katherine Curtis (CES, APL Director), Paul Rathouz (BMI Chair, now Director Biomed Data Science Hub, U TX)
● funding proposals
  ○ UW2020 proposal to fund Data Science Hub (Dec 2017; funded May 2018)
  ○ Data Science Initiative proposal on Consulting Network (Mar 2018; declined)
  ○ NSF NRT proposal (Spring 2019; in review)
● initial staff
  ○ Sarah Stevens, Data Science Facilitator
  ○ Whitney Sweeney, Data Science Administrator
verbs of a data science ecosystem
data science hub goal:
● coordinate and execute data science strategy
● via campus-wide research network
● fill critical gaps
● support data science growth
● encourage cross-fertilization
http://datascience.wisc.edu
empowering people to better design and tackle data-rich research projects
• coordinate resources & expertise
• connect researchers & tools
• collaborate to build community & capacity
Data Science Hub Connections
• office hours: Hub Central at Discovery
  – Wednesdays: 9:30 - 11:30 am
  – Thursdays: 3 - 5 pm
  – Office Hours Plus: many data science experts in one place
• website: https://datascience.wisc.edu/events/
• newsletter: datascience.wisc.edu/newsletter/ (700+)
• Carpentries (UW is national leader)
  – Data Carpentry for target audiences
  – Software Carpentry for advanced researchers
  – Carpentries Instructor Training
• Broader training network
  – Social Sciences Computing Cooperative (Stata, …)
  – R for Researchers
  – DoIT Software Training for Students
  – Lynda online: productivity, graphic design, web, code
Data Science Training / Workshops
data science consulting ecosystem
[Diagram labels, verbs: design, compute, store data, manage data, collaborate]
[Diagram labels, groups: Social Sci Comp Coop, DSHub Facilitation, Database Group, Bioinfo Shared Res, Cancer Info Shared Res, Biomed Informatics, Biostat & Epi Research, Grainger Inst Mach Alg Data, Biometry, Carpentry & Training, NSF/IFDS, NSF/LUCID, Bioinfo Resource Ctr, Design Lab]
partnerships across campus ecosystem
[Diagram labels: frontier research, PIs, consulting groups, center partners, core partners, core projects, affiliated individuals, communities of practice, translate, training & outreach, data & compute infrastructure]
data science undergraduate major
● stat, math, CS, iSchool collaboration
  ○ Bret Larget (Stat, Bot, Director), Remzi Arpaci-Dusseau (CS), Gloria Mari-Beffa (Math, Associate Dean L&S)
  ○ AnHai Doan (CS), Kristin Eschenfelder (iSchool Director), Kyung-Sun Kim (iSchool Interim Director), Elaine Klein (Associate Dean L&S), Dorothea Salo (iSchool), Eric Wilcots (Astronomy, L&S Deputy Dean), Brian Yandell (Stat, Hort)
  ○ inclusive dialog with units in other colleges
● 5 core courses (certificate)
  ○ Data Modeling I & II: R, data wrangling, viz, statistical thinking, models
  ○ Data Programming I & II: Python, programming, databases, structures
  ○ Data Science Ethics: ethical issues in data science
● BS/BA proposal in process of approval
● begin in 2020 or 2021 (could grow quickly to be large majors)
L&S School of Computer, Data & Information Science
● Computer Working Group (Jan-Oct 2018)
  ○ Tom Erickson (tech entrepreneur, UW–Madison ECE graduate), Michael Lehman (Silicon Valley veteran, UW–Madison’s interim CIO 2017-18)
  ○ Remzi Arpaci-Dusseau (CS), Jake Blanchard (Engr Physics, Associate Dean CoE), AnHai Doan (CS), Kathleen Gallagher (Milwaukee Institute ED), Jon Hopkins (software entrepreneur, UW–Milwaukee CS & School of Engr), Erik Iverson (WARF Managing Director), Tim Norris (OVC for Finance & Administration), Brian Pinkerton (former CTO of Chan Zuckerberg Initiative, CS alum), Dale Smith (US Bank Executive VP), Tom Still (WI Tech Council President)
● WI in the Computer Age (Oct 2018-Mar 2019)
  ○ Gloria Mari-Beffa (Math, Associate Dean L&S), Greg Downey (Journalism, iSchool, Associate Dean L&S)
  ○ Remzi Arpaci-Dusseau (CS), AnHai Doan (CS), Kristin Eschenfelder (iSchool Director), Kyung-Sun Kim (iSchool Interim Director), Bret Larget (Stat, Bot), Dorothea Salo (iSchool), Brian Yandell (Stat, Hort)
Research Computing & Research Data Infrastructure
● RTAG Working Group: Research Computing & Research Data (Jun-Jul 2018)
  ○ Katrina Forest (Bact, RTAG Chair)
  ○ Karthik Anantharaman (Bact), Andrew Arnold (SSCC Director), Jan Cheetham (CIO Research Cyberinfrastructure Liaison), Cameron Cook (RDS), Kristin Eschenfelder (iSchool Director), Isabelle Girard (Campus Research Cores Director), David Page (BMI, CS), Philip Townsend (FWE), Paul Wilson (Engr Physics, ACI Director), Brian Yandell (Stat, Hort)
  ○ report presented to RTAG & OVCRGE 31 July
● Strategic Plan for Research Cyberinfrastructure (Fall 2018-Spring 2019)
  ○ Lois Brooks (CIO), Norman Drinkwater (VCRGE)
  ○ Jan Cheetham (CIO Research Cyberinfrastructure Liaison)
  ○ draft plan in process
Data Science Institute
● Data Science Planning Committee (Oct-Dec 2018)
  ○ Brian Yandell, Chair (Statistics, Horticulture)
  ○ Michael Ferris (Computer Science, WID), Katrina Forest (Bacteriology, RTAG chair), Martin Foys (English), Jan Greenberg (Social Work, OVCRGE), Young Mie Kim (Journalism), Lauren Michael (DS Hub), Robert Nowak (ECE), David Page (Biostatistics and Medical Informatics), Maureen Smith (Population Health Sciences), Alan Sorensen (Economics), Eric Wilcots (Astronomy, L&S Deputy Dean), Stephen Wright (Computer Sciences)
● Data Science Institute Creation Committee (Jan-Mar 2019)
  ○ Brian Yandell, Co-chair (Statistics and Horticulture); Steven Ackerman, Co-chair (Atmospheric & Oceanic Sciences; Associate Vice Chancellor for Physical Sciences)
  ○ Katrina Forest (Bacteriology); Florence Hsia (History of Science; Associate Vice Chancellor for Arts & Humanities); Robert Nowak (Electrical & Computer Engineering); Petra Schroeder (Associate Vice Chancellor for Finance & Administration); Maureen Smith (Population Health Sciences); Alan Sorensen (Economics); Stephen Wright (Computer Sciences)
  ○ approved by University Research Council (6 Mar 2019)
  ○ on calendar for University Academic Planning Council (18 Apr 2019)
DSI overview
● Data Science: study, development, and application of methods that reveal new insights from data
  ○ Data: numbers, text, images, graphs, sounds, code, metadata, ...
● Mission:
  ○ perform cutting-edge research in the fundamentals of data science
  ○ translate this research by partnering with key application areas
  ○ collaborate with researchers across divisions to advance scientific discovery
● Value Proposition
  ○ leverage foundations of data science across all applications
  ○ build on UW-Madison’s considerable strengths in multiple areas
  ○ collaborate with domain scientists to apply new algorithmic and computational tools
  ○ advance key application areas to enhance prospects for large-scale funding
● Scope
  ○ in: world-class data science research; broad partnerships & collaboration across campus
  ○ out: create / maintain research computing & data storage (RTAG white paper)
● Components
  ○ functions: research, translation, and collaboration
  ○ relationships & partnerships
  ○ organization & governance
[Diagram: data science institute = Research + Translation + Collaboration; other labels: data science hub, Computing & Data Infrastructure, Training, Concierge, Events, omics, dark data, deep learning, spatial]
DSI Administration: Director, Asst Director, staff
Funded Research Team: faculty & postdocs
Data Translation Team: data scientists
Hub Collaboration Team: data science facilitators
organization & governance
● leadership and governance model
  ○ (Interim) Director reports to VCRGE
  ○ Assistant Director for Administration reports to Director
  ○ Leadership Team = Director, Assistant Director, Faculty Fellows
  ○ Advisory Board will meet periodically with Director
● organizational structure
  ○ Faculty Fellows (5-year with space in DSI) & Faculty Affiliates
  ○ Scientific staff
    ■ Data Science Facilitators operate Data Science Hub
    ■ Data Scientists (in DSI) & Research Scientists (with research teams)
    ■ Postdocs, graduate students (in DSI or affiliated)
  ○ Administrative staff
    ■ administration, communications, IT
  ○ Advisory board: internal spanning divisions; external advisors
data science case studies
● How to improve data quality in research domains?
  ○ bottlenecks of data curation
● How to best integrate diverse data into systems-level models?
  ○ multiple data sources, models at scale, policy challenges
● How to help nonprofits, governments & companies make better decisions?
  ○ prepare managers to use data to make better decisions
● What are key evolving patterns in human literature?
  ○ text analytics on 17M-volume Hathi Trust
● What chemotherapy works best for this cancer patient?
  ○ data lakes, HIPAA, biomed visualization, and real-time team decisions
● How does genetics influence metabolic disease?
  ○ central dogma DNA-RNA-protein at scale and with microbiomes
[Map labels: HUB Central at Discovery, Aldo's Cafe, Steenbock's On Orchard]
DSI space: McArdle Lab 10th Floor
[Floor plan: 0468 1902 McArdle, 10th floor furniture plan]
(9) faculty
(6) scientists - 2 in faculty spaces
(3) facilitators - touch-down desks w/ mobile storage in collaboration space
(6) admin
(1) director
(1) reception
(10) grad student / postdoc
(1) conference room - 14 seats
16 monitors on this floor
DSI space: McArdle Lab 11th Floor
[Floor plan: 0468 1902 McArdle, 11th floor furniture plan]
(1) seminar room, 75 seats, or 42 at tables & chairs
(1) large conference, 18 seats, mobile tables
(3) breakout rooms, 6 seats each, furniture varies
(1) kitchenette
AV screens throughout: inform, present, share
virtually connect physical spaces
sage2 at U HI and U IL
airport advertising ...