RAPID ANALYTICS AND MODEL PROTOTYPING
Center for Data Science Paris-Saclay
BALÁZS KÉGL, DR / CNRS - UPSaclay LAL & LRI
ALEXANDRE GRAMFORT, MdC / Telecom ParisTech LTCI
AKIN KAZAKÇI, MdC / Mines ParisTech CGS
DJALEL BENBOUZID, Postdoc / UPMC LIP6
OUTLINE
• Paris-Saclay Center for Data Science
• the data science ecosystem
• analytics tools
• data challenges
• rapid analytics and model prototyping
DATA SCIENCE
Design of automated methods to analyze massive and complex data to extract useful information
DATA SCIENCE=
BIG DATA
We are focusing on inference:
data → knowledge
Interfacing with infrastructure, security, production
UNIVERSITÉ PARIS-SACLAY
19 founding partners
UNIVERSITÉ PARIS-SACLAY
+ horizontal multi-disciplinary and multi-partner initiatives (“lidex”) to create cohesion
A multi-disciplinary initiative to define, structure, and manage the data science ecosystem at the Université Paris-Saclay
http://www.datascience-paris-saclay.fr/
Biology & bioinformatics: IBISC/UEvry, LRI/UPSud, Hepatinov, CESP/UPSud-UVSQ-Inserm, IGM-I2BC/UPSud, MIA/Agro, MIAj-MIG/INRA, LMAS/Centrale
Chemistry: EA4041/UPSud
Earth sciences: LATMOS/UVSQ, GEOPS/UPSud, IPSL/UVSQ, LSCE/UVSQ, LMD/Polytechnique
Economy: LM/ENSAE, RITM/UPSud, LFA/ENSAE
Neuroscience: UNICOG/Inserm, U1000/Inserm, NeuroSpin/CEA
Particle physics, astrophysics & cosmology: LPP/Polytechnique, DMPH/ONERA, CosmoStat/CEA, IAS/UPSud, AIM/CEA, LAL/UPSud
The Paris-Saclay Center for Data Science
Data Science for scientific Data
250 researchers in 35 laboratories
Machine learning: LRI/UPSud, LTCI/Telecom, CMLA/Cachan, LS/ENSAE, LIX/Polytechnique, MIA/Agro, CMA/Polytechnique, LSS/Supélec, CVN/Centrale, LMAS/Centrale, DTIM/ONERA, IBISC/UEvry
Visualization: INRIA, LIMSI
Signal processing: LTCI/Telecom, CMA/Polytechnique, CVN/Centrale, LSS/Supélec, CMLA/Cachan, LIMSI, DTIM/ONERA
Statistics: LMO/UPSud, LS/ENSAE, LSS/Supélec, CMA/Polytechnique, LMAS/Centrale, MIA/AgroParisTech
Data science: statistics, machine learning, information retrieval, signal processing, data visualization, databases
Domain science: human society, life, brain, earth, universe
Tool building: software engineering, clouds/grids, high-performance computing, optimization
Data scientist
Applied scientist
Domain scientist
Data engineer
Software engineer
@SaclayCDS
LIST/CEA
THE DATA SCIENCE ECOSYSTEM
Data domains: energy and physical sciences, health and life sciences, Earth and environment, economy and society, brain
Data scientist
Data trainer
Applied scientist
Domain scientist
Software engineer
Data engineer
Data science: statistics, machine learning, information retrieval, signal processing, data visualization, databases
Tool building: software engineering, clouds/grids, high-performance computing, optimization
TOOLS
We are designing and learning to manage tools to accompany data science projects with different needs
TOOLS: LANDSCAPE TO ECOSYSTEM
Data scientist
Data trainer
Applied scientist
Domain expert
Software engineer
Data engineer
Data science: statistics, machine learning, information retrieval, signal processing, data visualization, databases
Tool building: software engineering, clouds/grids, high-performance computing, optimization
Data domains: energy and physical sciences, health and life sciences, Earth and environment, economy and society, brain
• interdisciplinary projects
• matchmaking tool
• design and innovation strategy workshops
• data challenges
• coding sprints
• Open Software Initiative
• code consolidator and engineering projects
• bootcamps / hackathons
• IT platform for linked data
• annotation tool
• SaaS data science platform
TWO ANALYTICS TOOLS
RAPID ANALYTICS AND MODEL PROTOTYPING
DATA CHALLENGES
DATA CHALLENGES
• A data challenge is a recently developed unconventional dissemination and communication tool
• a scientific or industrial data producer arrives with a well-defined problem and a corresponding annotated data set
• defines a quantitative goal
• makes the problem and part of the data set (the training set) public on a dedicated site
• data science experts then take the public training data and submit solutions (predictions) for a test set with hidden annotations
• submissions are evaluated numerically using the quantitative measure
• contestants are listed on a leaderboard
• after a predefined time, typically a couple of months, the final results are revealed and the winners are awarded
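The evaluation loop described on this slide can be sketched in a few lines of Python. This is a minimal illustration and not the code of any actual challenge platform: the accuracy measure, the team names, and the toy annotations are all hypothetical, chosen only to show how submissions for a test set with hidden annotations are scored and ranked.

```python
def accuracy(predictions, annotations):
    """Quantitative measure: fraction of predictions matching the hidden annotations."""
    if len(predictions) != len(annotations):
        raise ValueError("a submission must cover the whole test set")
    return sum(p == a for p, a in zip(predictions, annotations)) / len(annotations)


def build_leaderboard(submissions, hidden_annotations, measure=accuracy):
    """Score every submission and rank contestants, best score first."""
    scores = {name: measure(preds, hidden_annotations)
              for name, preds in submissions.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: six test examples with hidden binary annotations.
hidden_annotations = [1, 0, 1, 1, 0, 0]
submissions = {
    "team_a": [1, 0, 1, 1, 0, 1],  # 5 of 6 correct
    "team_b": [1, 1, 0, 1, 0, 1],  # 3 of 6 correct
}

leaderboard = build_leaderboard(submissions, hidden_annotations)
for rank, (name, score) in enumerate(leaderboard, start=1):
    print(rank, name, round(score, 3))
```

Contestants only ever see their score, never `hidden_annotations`, which is what keeps the comparison fair.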
DATA CHALLENGES
• The HiggsML challenge on Kaggle
• https://www.kaggle.com/c/higgs-boson
• Official ATLAS GEANT4 simulations
• 30 features (variables)
• 250K training: input, label, weight
• 100K public test (AMS displayed real-time), only input
• 450K private test (to determine the winner after the closing of the challenge), only input
• public and private tests are shuffled, participants submit a vector of 550K labels
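The slide does not expand "AMS"; assuming it refers to the approximate median significance used as the HiggsML objective, a minimal sketch of the metric looks as follows. The regularization term of 10 and the 's'/'b' label strings are taken from memory of the challenge setup and should be treated as assumptions, and the four-event example is purely hypothetical.

```python
import math


def ams(s, b, b_reg=10.0):
    """Approximate median significance for selected signal weight s and background weight b.

    b_reg is a regularization term (assumed to be 10 here) that keeps the
    metric stable when very few background events are selected.
    """
    return math.sqrt(2.0 * ((s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s))


def ams_from_predictions(predicted, truth, weights, b_reg=10.0):
    """Sum the weights of true signal and true background among events predicted as signal."""
    s = sum(w for p, t, w in zip(predicted, truth, weights) if p == "s" and t == "s")
    b = sum(w for p, t, w in zip(predicted, truth, weights) if p == "s" and t == "b")
    return ams(s, b, b_reg)


# Hypothetical four-event example: labels are 's' (signal) or 'b' (background).
predicted = ["s", "s", "b", "s"]
truth = ["s", "b", "s", "s"]
weights = [1.0, 2.0, 3.0, 4.0]
score = ams_from_predictions(predicted, truth, weights)
```

Only events predicted as signal contribute, so a submission trades off how much true signal weight it selects against how much background weight it lets through.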
HUGE PUBLICITY
B. Kégl / AppStat@LAL Learning to discover
CLASSIFICATION FOR DISCOVERY
SIGNIFICANT IMPROVEMENT OVER THE BASELINE
CLASSIFICATION FOR DISCOVERY
HUGE PUBLICITY
SIGNIFICANT IMPROVEMENT OVER THE BASELINE
yet partially missing the objectives
DATA CHALLENGES
• Challenges are useful for
• generating visibility in the data science community about novel application domains
• benchmarking state-of-the-art techniques in a fair way on well-defined problems
• finding talented data scientists
• Limitations
• not necessarily adapted to solving complex and open-ended data science problems in realistic environments
• no direct access to solutions and data scientists
• emphasis on competition
We decided to design something better
RAPID ANALYTICS AND MODEL PROTOTYPING
• Single-day coding sessions
• 20-30 participants
• preparation is similar to challenges
• Goals
• focusing and motivating top talents
• promoting collaboration, speed, and efficiency
• solving (prototyping) real problems
ANALYTICS TOOLS TO PROMOTE COLLABORATION AND INNOVATION
ANALYTICS TOOLS TO MONITOR PROGRESS
RESEARCH (BEYOND SOLVING PROBLEMS)
• Algorithm selection and hyperparameter optimization
• studying human problem-solving
• combining human solutions with automatic tools
• comparing and tuning hyperparameter optimizers
• meta-learning: embedding data sets and models, collaborative optimization
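As a rough illustration of what "comparing hyperparameter optimizers" can mean in practice, the sketch below runs two textbook optimizers, grid search and random search, on the same objective. It is not the tooling used by the authors: the objective is a hypothetical stand-in for a cross-validated error, and the parameter names `learning_rate` and `depth` are invented for the example.

```python
import itertools
import random


def toy_validation_error(learning_rate, depth):
    """Hypothetical stand-in for a cross-validated error, minimal at (0.1, 6)."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (depth - 6) ** 2


def grid_search(objective, grid):
    """Evaluate every combination in the grid; return (best_error, best_params)."""
    candidates = (dict(zip(grid, values))
                  for values in itertools.product(*grid.values()))
    return min(((objective(**p), p) for p in candidates), key=lambda pair: pair[0])


def random_search(objective, sampler, n_trials, seed=0):
    """Evaluate n_trials randomly sampled configurations; return (best_error, best_params)."""
    rng = random.Random(seed)
    trials = (sampler(rng) for _ in range(n_trials))
    return min(((objective(**p), p) for p in trials), key=lambda pair: pair[0])


def sample_params(rng):
    """Log-uniform learning rate in [0.001, 1], integer depth in [1, 12]."""
    return {"learning_rate": 10 ** rng.uniform(-3, 0), "depth": rng.randint(1, 12)}


grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 6, 10]}
grid_best = grid_search(toy_validation_error, grid)
random_best = random_search(toy_validation_error, sample_params, n_trials=9)
```

Both optimizers get the same budget of nine evaluations here, which is the kind of controlled setting in which optimizers, or human participants, can be compared.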
RAPID ANALYTICS AND MODEL PROTOTYPING
2015 Jan 15 replaying the HiggsML challenge
RAPID ANALYTICS AND MODEL PROTOTYPING
2015 Feb 9 Mortality prediction in septic patients
RAPID ANALYTICS AND MODEL PROTOTYPING
2015 Apr 10 Classifying variable stars
RAPID ANALYTICS AND MODEL PROTOTYPING
2015 May Drug identification from spectra
RAPID ANALYTICS AND MODEL PROTOTYPING
2015 June Insect classification
IMPLEMENTING RAMPS IN AN INDUSTRIAL CONTEXT
• Short-term, ad-hoc teams assembled for a given task
• Low-engagement consulting job
• Efficient use of scarce data scientists (i.e., your time)
• Developing and practicing marketable skills
• fast-feedback experimentation is also useful in research
• Networking
• Management meta-tools to track your performance, to guide your training
THANK YOU!