![Page 1: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/1.jpg)
The Big Data Platform Initiative of the EC Joint Research Centre
European Commission, Joint Research Centre
Directorate I Competences, Unit I.3 Text and Data Mining EO&SS@BigData Project
Joint Research Centre (JRC)
Data analytics workshop for official statistics (daWos)
Amsterdam. 10/09/2018
URL: https://cidportal.jrc.ec.europa.eu Contact: [email protected]
![Page 2: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/2.jpg)
Outline
• Project background
• JEODPP platform concept
• Data holdings
• Services
• Outreach
• Project evolution
![Page 3: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/3.jpg)
Project background
• Explosion of digital data sources led to the big data paradigm (Volume, Velocity, and Variety of data streams).
• Earth Observation (EO) is entering the big data era thanks to the Copernicus Sentinel satellites (full, free, and open data).
• JRC task force recommended in late 2014 to start a big data pilot project on EO and Social Sensing.
• Initial state: fragmented approach hampering collaborative working and knowledge sharing.
• Project start: January 2015.
![Page 4: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/4.jpg)
Policy context
• REGULATION (EU) No 377/2014 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL 3/4/14 establishing the Copernicus Programme and repealing Regulation (EU) No 911/2010. [JRC also mentioned in proposed new space programme regulation to enter into force by 1.1.2021]
• Communication to the Commission on Data, information and knowledge management at the European Commission (C(2016) 6626 final)
• Communication from the Commission on the European Cloud Initiative (COM(2016) 178 final): The Commission and participating Member States should develop and deploy a large-scale European HPC, data and network infrastructure, including the establishment of a European Big Data centre, e.g. hosted by JRC for multidisciplinary data but focused on INSPIRE/GEOSS/Copernicus spatial data [COM(2016) 178 final].
• Communication from the Commission on Artificial Intelligence for Europe (COM(2018) 237 final).
![Page 5: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/5.jpg)
Project milestones
• 2015: survey of user needs and proposal of solutions addressing their needs; endorsement of the concept of JRC Earth Observation Data and Processing Platform (JEODPP)
• 2016: procurement of hardware and first batch processing service with massive runs
• 2017: release of interactive visualisation/analysis and deployment of remote desktop services
• 2018: multi-petabyte extension, development of machine learning capabilities, JIPlib release, continuously expanding user base
![Page 6: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/6.jpg)
Big geospatial data for policy
Exploit data volume, velocity, and variety to generate policy relevant information
• Using FAIR data principles (findable, accessible, interoperable, reusable)
• With data mining competence in a shared and collaborative environment
• Relying on reproducible workflows
[Diagram labels: Data (Earth Observation, in situ, crowd sourcing, social sensing, text data, web scraping, …); Big data (Volume, Velocity, Variety); Policy relevant information; Indicators; Decisions (directives, legislations, communications, …)]
[Thematic areas shown: atmosphere, marine, land, climate, emergency, security]
![Page 7: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/7.jpg)
JRC Big Data Platform: Conceptual representation
![Page 8: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/8.jpg)
Infrastructure
Based on commodity hardware and open-source software stack:
• Storage
• CERN EOS distributed file system
• Currently 5 PiB net capacity
• 2 more PiB net for development/testing
• Processing servers (batch processing)
• 1,400 cores over 35 nodes
• 3 GPU servers
• extensions including further GPU servers in late 2018
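As a quick sanity check on the figures above, the following sketch works out the average number of cores per node and converts the binary storage units (PiB) into decimal petabytes. Only the input numbers come from the slide; the helper names are illustrative.

```python
# Figures taken from the slide; helper names are illustrative only.
CORES = 1400
NODES = 35
PRODUCTION_PIB = 5
DEV_TEST_PIB = 2


def cores_per_node(cores: int, nodes: int) -> float:
    """Average number of cores per processing node."""
    return cores / nodes


def pib_to_pb(pib: float) -> float:
    """Convert pebibytes (2**50 bytes) to decimal petabytes (10**15 bytes)."""
    return pib * 2**50 / 10**15


if __name__ == "__main__":
    print(f"Average cores per node: {cores_per_node(CORES, NODES):.0f}")
    print(f"Production storage: {pib_to_pb(PRODUCTION_PIB):.2f} PB")
    print(f"Including dev/test: {pib_to_pb(PRODUCTION_PIB + DEV_TEST_PIB):.2f} PB")
```

The distinction matters when comparing capacity figures: a pebibyte is about 12.6% larger than a decimal petabyte.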
![Page 9: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/9.jpg)
JEODPP in
As of September 2017
As of September 2018
![Page 10: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/10.jpg)
Main software stack
Source: Soille et al., Future Generation Computer Systems, 2017, DOI: 10.1016/j.future.2017.11.007 (in press)
![Page 11: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/11.jpg)
JEODPP access modes [WIKI Link]
• EOS CIFS mount from desktop client (read-only)
• Netapp CIFS mount (read/write) for data transfer
• Terminal service (remote desktop) https://cidportal.jrc.ec.europa.eu/apps/terminal/
• Document & data sharing based on NextCloud https://cidportal.jrc.ec.europa.eu/apps/cloud/
planned federation with JRCBox
• FTPS for file transfer to EOS
• JHub https://cidportal.jrc.ec.europa.eu/jhub/ for
• interactive visualisation and analysis
• tailored Docker containers for development
![Page 12: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/12.jpg)
JEODPP current space usage
![Page 13: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/13.jpg)
Connecting storage and processing via cloud sharing services
![Page 14: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/14.jpg)
Low-level batch processing
• Running large-scale data processing tasks in a cluster environment
• Docker containers for flexible management of processing environments
• Custom builds for different requirements
• Facilitates upgrades of processing environment (libraries, tools)
• Run through a workload manager
• HTCondor scheduler
• Extensive use for large scale processing/analysis
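To make the combination of Docker containers and the HTCondor scheduler more concrete, here is a minimal sketch that renders an HTCondor submit description for containerised jobs. It assumes HTCondor's docker universe is used; the actual JEODPP configuration is not described in the slides and may differ. The function name, the executable path, and the resource values are hypothetical.

```python
def render_submit_description(image, executable, arguments,
                              cpus=1, memory="2GB", jobs=1):
    """Return the text of an HTCondor submit description (docker universe).

    All parameter values are supplied by the caller; nothing here reflects
    the real JEODPP setup.
    """
    lines = [
        "universe = docker",
        f"docker_image = {image}",
        f"executable = {executable}",
        f"arguments = {arguments}",
        # $(Cluster) and $(Process) are HTCondor macros expanded per job.
        "output = job_$(Cluster)_$(Process).out",
        "error = job_$(Cluster)_$(Process).err",
        "log = job_$(Cluster).log",
        f"request_cpus = {cpus}",
        f"request_memory = {memory}",
        f"queue {jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical example: 100 jobs, each receiving its job index as argument.
    print(render_submit_description(
        image="jipl-dev:1.0",
        executable="/usr/local/bin/process_tile.sh",  # hypothetical script
        arguments="$(Process)",
        cpus=2,
        memory="4GB",
        jobs=100,
    ))
```

The resulting text would typically be saved to a file and passed to `condor_submit`; each queued job then runs in its own container started from the named image.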
![Page 15: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/15.jpg)
JEODPP Batch Processing System
Diverse user environments originating from different:
• libraries
• tools
• software
• versions
• distros: Debian/CentOS
Docker images are built based on user requirements
Container-based cluster management
| REPOSITORY | TAG | Info | SIZE |
|---|---|---|---|
| jipl_S1toolbox-dev | 2.0 | snap 4.0 | 6.269 GB |
| jipl_S1toolbox-dev | 1.0.1 | snap 2.0.2 | 6.282 GB |
| ghsl_se2cor-dev | 1.0 | snap 2.0 | 4.742 GB |
| critech_ipython_deltares-dev | 1.0 | python 2.7 | 6.939 GB |
| marsec_MCR | 1.0 | MatLab run time 2015b | 3.082 GB |
| jipl-dev | 1.0 | | 3.666 GB |
| marsec_sumo-dev | 1.0 | java 1.8 | 2.842 GB |
| canhemon_grass-dev | 1.0 | debian testing, python 3.0 | 3.397 GB |
| cloudmask-download | v0_2 | 74994254f754, 11 weeks ago | 444.8 MB |
| cloudmask-download | 1.0.1 | | 3.421 GB |
| cloudmask-download | 1.0.0 | | 3.421 GB |
| sentinel-download | 1.0 | | 3.121 GB |
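Building images from user requirements could be automated along the following lines. This is only a sketch of the idea: the slides do not say how the JEODPP images are actually built, and the function name, base image, and package list below are illustrative assumptions.

```python
def render_dockerfile(base_image, apt_packages=()):
    """Return Dockerfile text for a Debian-based image with extra packages.

    Hypothetical helper: it only illustrates turning a list of user
    requirements into a reproducible image definition.
    """
    lines = [f"FROM {base_image}"]
    if apt_packages:
        pkgs = " ".join(sorted(apt_packages))
        lines.append(
            "RUN apt-get update "
            f"&& apt-get install -y --no-install-recommends {pkgs} "
            "&& rm -rf /var/lib/apt/lists/*"
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative requirements for a GRASS/Python environment.
    print(render_dockerfile("debian:testing",
                            ["grass", "python3", "python3-numpy"]))
```

Keeping the requirements in a list and generating the Dockerfile from it makes it straightforward to rebuild an environment with upgraded libraries and to tag the result with a new version.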
![Page 16: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/16.jpg)
Examples of batch processing scientific workflows on JEODPP
![Page 17: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/17.jpg)
JEODPP batch processing monitoring
![Page 18: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/18.jpg)
JEODPP Terminal Service via Web https://cidportal.jrc.ec.europa.eu/apps/terminal/
• A pool of Docker containers running next to the data
• Linux desktop environment
• Standard software installed
• QGIS, GRASS
• IDL/ENVI, Matlab (personalised licenses)
• R (R, R Commander, RStudio)
• Python, JupyterLab, Jupyter Notebook
• Additions on request
• Relies on HTML5 and runs in Firefox, Internet Explorer, and Chrome
• For prototyping, ad hoc analysis/visualisation of products, and launching batch processing
![Page 19: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/19.jpg)
JEODPP users
• 35 use-cases
• From 16 units
• Across 8 directorates
![Page 20: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/20.jpg)
Interactive visualization and analysis with Jupyter
• Web interface to visualize and analyze any kind of data in a single document called a Jupyter notebook
• Jupyter notebooks integrate live code, equations, visualizations, and narrative text.
• Facilitate knowledge sharing, collaborative working, and reproducible workflows.
• Suitable for non-programmers by integrating GUIs based on widgets (buttons, sliders, etc.).
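To illustrate what "a single document" means in practice, the sketch below assembles a minimal notebook as a plain Python dictionary following the public notebook JSON format (version 4) and serialises it with the standard library. The helper names and cell contents are made up for the example; in everyday use the Jupyter interface or the `nbformat` package creates these files.

```python
import json


def markdown_cell(text):
    """Narrative text, which may contain LaTeX equations."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """Live code; outputs are filled in when the notebook is executed."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }


def build_notebook(cells):
    """Wrap cells in the top-level structure of a version 4 notebook."""
    return {
        "cells": list(cells),
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


# Illustrative content: one narrative cell and one code cell.
notebook = build_notebook([
    markdown_cell("# Example analysis\nThe area of a square pixel is $A = r^2$."),
    code_cell("resolution_m = 10\nprint(resolution_m ** 2)"),
])

notebook_json = json.dumps(notebook, indent=1)
```

Writing `notebook_json` to a file with the `.ipynb` extension should give a document that Jupyter can open, run, and share, which is what makes notebooks convenient for reproducible workflows.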
![Page 21: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/21.jpg)
Jupyter ecosystem
http://jupyter.org/
![Page 22: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/22.jpg)
JupyterLab ecosystem (evolution of Jupyter)
![Page 23: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/23.jpg)
ipyleaflet
https://github.com/ellisonbg/ipyleaflet
![Page 24: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/24.jpg)
ipywidgets and bqplot
https://github.com/jupyter-widgets/ipywidgets https://github.com/bloomberg/bqplot
![Page 25: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/25.jpg)
From big data to interactive rendering and analysis
Source: FGCS, 2017, DOI: 10.1016/j.future.2017.11.007
+ in Situ data
![Page 26: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/26.jpg)
Global Human Settlement Layer with Global Surface Water Occurrence on top of Global S1 mosaic
![Page 27: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/27.jpg)
HTML export to facilitate outreach (example with ALOS DEM)
![Page 28: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/28.jpg)
Execution of arbitrary Python code in interactive mode (e.g. for MSPA)
![Page 29: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/29.jpg)
Takeaway messages
• Exponential growth of data and data sources.
• The big data paradigm is permeating all fields.
• FAIR data principles also apply to data analysis.
• Turning data into insights is facilitated by platforms that co-locate data with processing.
• Jupyter notebooks contribute to reproducible analysis as well as knowledge sharing and collaborative working.
• Importance of interactive analysis and visualisation.
• Open standards, including open APIs, are needed to avoid platform lock-in.
![Page 30: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/30.jpg)
Project evolution: Big Data Analytics (2019-2020)
• Innovative approaches (AI/machine learning) for combining large amounts of data originating from different sources
• Enabled by the JRC Big Data Platform (JEODPP)
• Initial focus on geospatial data and their combination with other data sources
• Key enabler of data and knowledge sharing across JRC and towards partners
• Link with DIAS (support to DG GROW and possible partnership with WEkEO DIAS)
• Key role of openEO H2020 project (definition of common API)
![Page 31: The Big Data Platform Initiative of the EC Joint Research ...](https://reader031.fdocuments.net/reader031/viewer/2022013001/61cb83d545dca9780f66d043/html5/thumbnails/31.jpg)
Thank you for your attention!
EO&SS@BigData pilot project, Unit I.3 Text and Data Mining, Directorate I Competences
GEO-WEEK, Washington DC, Oct 2017
https://doi.org/10.1016/j.future.2017.11.007 Publication list: https://cidportal.jrc.ec.europa.eu/home/publications