BIMCV: The Perfect "Big Data" Storm.


Page 1: BIMCV: The Perfect "Big Data" Storm.

e-Infraestructuras Nacionales

BIMCV: The Perfect "Big Data" Storm.

Maria de la Iglesia, PhD. http://ceib.san.gva.es

Collision of Petabytes of Population Image Data, Millions of Hardware Devices and Thousands of Software Tools.

Page 2: BIMCV: The Perfect "Big Data" Storm.

OVERVIEW

• Big Data

• Strategic Vision of Big Data in EU

• Strategic Vision of Big Data in US

• Big Data in Neuroimaging

• Population Imaging

• EuroBioimaging – BIMCV – Valencia Node

• Neuroimaging

• Relevant facts

Page 3: BIMCV: The Perfect "Big Data" Storm.

Big Data

Page 4: BIMCV: The Perfect "Big Data" Storm.
Page 5: BIMCV: The Perfect "Big Data" Storm.

Big data techniques and technologies

• Techniques for analyzing big data

– A/B testing

– Association rule learning

– Classification

– Cluster analysis

– Crowdsourcing

– Data fusion and data integration (e.g. signal processing, natural language processing)

– Data mining

Page 6: BIMCV: The Perfect "Big Data" Storm.

Big data techniques and technologies

• Techniques for analyzing big data

– Ensemble learning

– Genetic algorithms

– Machine learning

– Natural language processing (NLP)

– Neural networks

– Pattern recognition

– Network analysis

Page 7: BIMCV: The Perfect "Big Data" Storm.

Big data techniques and technologies

• Techniques for analyzing big data

– Optimization

– Pattern recognition

– Predictive modeling

– Regression

– Signal processing (e.g. time series analysis, data fusion)

– Spatial analysis

– Statistics

Page 8: BIMCV: The Perfect "Big Data" Storm.

Big data techniques and technologies

• Big Data Technologies

– Big Table. Proprietary distributed database system built on the Google File System. Inspiration for HBase.

– Business intelligence (BI). BI tools are often used to read data that have been previously stored in a data warehouse or data mart.

– Cassandra. An open source (free) database management system designed to handle huge amounts of data on a distributed system. It was originally developed at Facebook and is now managed as a project of the Apache Software Foundation.

Page 9: BIMCV: The Perfect "Big Data" Storm.

Big data techniques and technologies

• Big Data Technologies

– Cloud computing

– Data mart

– Data warehouse (using ETL: extract, transform, and load)

– Distributed system

– Dynamo

– ETL

– Google File System

– Hadoop

– HBase

– MapReduce

– Mashup

Page 10: BIMCV: The Perfect "Big Data" Storm.

Big data techniques and technologies

• Big Data Technologies

– Non-relational database.

– R.

– Relational database.

– Semi-structured data.

– SQL.

– Stream processing.

– Structured data.

– Unstructured data.

– Visualization.

Page 11: BIMCV: The Perfect "Big Data" Storm.

Big data techniques and technologies

• Big Data Technologies

– VISUALIZATION

• Tag cloud

• Clustergram

• History flow

• Spatial information flow

Page 12: BIMCV: The Perfect "Big Data" Storm.

VISUALIZATION: Tag cloud

Page 13: BIMCV: The Perfect "Big Data" Storm.

VISUALIZATION: Clustergram

Page 14: BIMCV: The Perfect "Big Data" Storm.

VISUALIZATION: History flow

Page 15: BIMCV: The Perfect "Big Data" Storm.

VISUALIZATION: Spatial information flow

Page 16: BIMCV: The Perfect "Big Data" Storm.
Page 17: BIMCV: The Perfect "Big Data" Storm.

Strategic Vision of Big Data in EU

Page 18: BIMCV: The Perfect "Big Data" Storm.

How Is the European Union Responding to Big Data?

Page 19: BIMCV: The Perfect "Big Data" Storm.

Panel: Personalized Medicine in the Era of Big Data

EHTEL Symposium

Tapani Piha

• Head of Unit for eHealth and Technology Assessment

European Commission

DG Health and Consumers

Health Systems and Products

Page 20: BIMCV: The Perfect "Big Data" Storm.

How does Big Data link to the Personalized Medicine?

• Big Data refers to a collection of data sets so large and complex that it is impossible to process them with the usual databases and tools

• The data is gathered (most of the time) by people just living their lives (e.g. using mobile phones, the internet, driving cars, paying with banking cards)

• Big data is used in the private sector (e.g. Google) and in the public sector (e.g. NSA)

Page 21: BIMCV: The Perfect "Big Data" Storm.

Big Data use in public health & health care?

• Research: “In the last five years, more scientific data has been generated than in the entire history of mankind” [1]

• Health care: more evidence about personalized treatment, better selection of the right provider, better equipped health care providers (e.g. IBM's Watson)

• Public health: better personalized lifestyle information for citizens, earlier detection of epidemics, more and quicker access to epidemiological information

[1] Winston Hide, The Promise of Big Data, Harvard Public Health, 2012

Page 22: BIMCV: The Perfect "Big Data" Storm.

Commission action on Big Data

• BIG project: multi-sectorial initiative started in 2011 to promote adoption of earlier waves of big data technology and contribute to EU competitiveness

• Green paper on mHealth: to assess the market and further clarify what is needed in the legal framework concerning mHealth

• Study in the health program: to assess the usage and adoption of big data programs for (public) health systems within the EU

Page 23: BIMCV: The Perfect "Big Data" Storm.

Strategic Vision of Big Data in US

Page 24: BIMCV: The Perfect "Big Data" Storm.
Page 25: BIMCV: The Perfect "Big Data" Storm.

How Is the U.S. Responding? National Institute of Standards and Technology (NIST)

NIST is an agency of the U.S. Department of Commerce.


Page 26: BIMCV: The Perfect "Big Data" Storm.

NIST Big Data Public Working Group

Big Data PWG Overview Presentation

September 30, 2013

Wo Chang, NIST

Robert Marcus, ET-Strategies

Chaitanya Baru, UC San Diego

Page 27: BIMCV: The Perfect "Big Data" Storm.

NBD-PWG Overview, 9/30/13

Agenda

• Why Big Data? Why NIST?

• NBD-PWG Charter

• Overall Workplan

• Subgroups Charter and Deliverables

– Use Case and Requirements SG

– Definitions and Taxonomies SG

– Reference Architecture SG

– Security and Privacy SG

– Technology Roadmap SG

• Next Steps


Page 28: BIMCV: The Perfect "Big Data" Storm.


Why Big Data? Why NIST?

• Why Big Data? There is broad agreement among commercial, academic, and government leaders about the remarkable potential of “Big Data” to spark innovation, fuel commerce, and drive progress.

• Why NIST? (a) A recommendation from the January 15-17, 2013 Cloud/Big Data Forum and (b) a lack of consensus on some important, fundamental questions, which is confusing potential users and holding back progress. Questions such as:

– What are the attributes that define Big Data solutions?

– How is Big Data different from the traditional data environments and related applications that we have encountered thus far?

– What are the essential characteristics of Big Data environments?

– How do these environments integrate with currently deployed architectures?

– What are the central scientific, technological, and standardization challenges that need to be addressed to accelerate the deployment of robust Big Data solutions?

NBD-PWG is being launched to address these questions and is charged to develop consensus definitions, taxonomies, secure reference architecture, and technology roadmap for Big Data that can be embraced by all sectors.

Page 29: BIMCV: The Perfect "Big Data" Storm.


NBD-PWG Deliverables

Working Drafts version 1.0 for

1. Big Data Definitions

2. Big Data Taxonomies

3. Big Data Requirements

4. Big Data Security and Privacy Requirements

5. Big Data Architectures White Paper Survey

6. Big Data Reference Architectures

7. Big Data Security and Privacy Reference Architectures

8. Big Data Technology Roadmap


Page 30: BIMCV: The Perfect "Big Data" Storm.


NBD-PWG Workplan


Page 31: BIMCV: The Perfect "Big Data" Storm.

Big Data Ecosystem in One Sentence

• Use Clouds running Data Analytics Collaboratively processing Big Data to solve problems in X-Informatics (or e-X)

• X = Astronomy, Biology, Biomedicine, Business, Chemistry, Climate, Crisis, Earth Science, Energy, Environment, Finance, Health, Intelligence, Lifestyle, Marketing, Medicine, Pathology, Policy, Radar, Security, Sensor, Social, Sustainability, Wealth and Wellness with more fields (physics) defined implicitly

• Spans Industry and Science (research)

• Education: Data Science; see recent New York Times articles: http://datascience101.wordpress.com/2013/04/13/new-york-times-data-science-articles/

Page 32: BIMCV: The Perfect "Big Data" Storm.

Social Informatics

Visual & Decision Informatics

Page 33: BIMCV: The Perfect "Big Data" Storm.

Big Data Definition

• More consensus on Data Science definition than that of Big Data

• Big Data refers to digital data volume, velocity and/or variety that:

– Enables novel approaches to frontier questions previously inaccessible or impractical using current or conventional methods; and/or

– Exceeds the storage capacity or analysis capability of current or conventional methods and systems; and

– Differs by storing and analyzing population data rather than samples.

– Requires management that scales across coupled horizontal resources

Page 34: BIMCV: The Perfect "Big Data" Storm.

Vendor-neutral and Technology-agnostic Proposals

Data Processing Flow (M0039)

Data Transformation Flow (M0017)

IT Stack (M0047)


Page 38: BIMCV: The Perfect "Big Data" Storm.

Electronic Medical Record (EMR) Data I

• Application: Large national initiatives around health data are emerging. They include developing a digital learning health care system to support increasingly evidence-based clinical decisions with timely, accurate, and up-to-date patient-centered clinical information; using electronic observational clinical data to efficiently and rapidly translate scientific discoveries into effective clinical treatments; and electronically sharing integrated health data to improve healthcare process efficiency and outcomes. These key initiatives all rely on high-quality, large-scale, standardized, and aggregate health data. One needs advanced methods for normalizing patient, provider, facility, and clinical concept identification within and among separate health care organizations to enhance models for defining and extracting clinical phenotypes from non-standard discrete and free-text clinical data using feature selection, information retrieval, and machine learning decision models. One must leverage clinical phenotype data to support cohort selection, clinical outcomes research, and clinical decision support.

Tags: PP, Fusion, S/Q, Index; Parallelism over EMR (a set per person), viewers; Streaming

Page 39: BIMCV: The Perfect "Big Data" Storm.

Electronic Medical Record (EMR) Data II

• Current Approach: Clinical data from more than 1,100 discrete logical, operational healthcare sources in the Indiana Network for Patient Care (INPC), the nation's largest and longest-running health information exchange. This describes more than 12 million patients and more than 4 billion discrete clinical observations; > 20 TB raw data. Between 500,000 and 1.5 million new real-time clinical transactions are added per day.

• Futures: Teradata, PostgreSQL and MongoDB supporting information retrieval methods to identify relevant clinical features (tf-idf, latent semantic analysis, mutual information). Natural Language Processing techniques to extract relevant clinical features. Validated features will be used to parameterize clinical phenotype decision models based on maximum likelihood estimators and Bayesian networks. Decision models will be used to identify a variety of clinical phenotypes such as diabetes, congestive heart failure, and pancreatic cancer.
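The tf-idf weighting named among the information retrieval methods can be illustrated with a minimal sketch. The three note fragments are invented toy text, not INPC data, and the whitespace tokenizer and unsmoothed log formula are simplifying assumptions rather than the method used by the project.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: tf-idf score} dict per document.

    tf  = term count / document length
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical free-text fragments, for illustration only.
notes = [
    "patient reports polyuria and elevated glucose",
    "patient reports chest pain",
    "patient denies chest pain elevated troponin",
]
scores = tf_idf(notes)
top_term = max(scores[0], key=scores[0].get)
print(top_term, round(scores[0][top_term], 3))
```

A term that appears in every note ("patient") receives a score of zero, while terms confined to one note rank highest, which is why tf-idf is a common first filter for candidate clinical features.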


Page 40: BIMCV: The Perfect "Big Data" Storm.

Pathology Imaging / Digital Pathology I

• Application: Digital pathology imaging is an emerging field where examination of high-resolution images of tissue specimens enables novel and more effective ways for disease diagnosis. Pathology image analysis segments massive numbers (millions per image) of spatial objects such as nuclei and blood vessels, represented with their boundaries, along with many extracted image features from these objects. The derived information is used for many complex queries and analytics to support biomedical research and clinical diagnosis.

Tags: MR, MRIter, PP, Classification; Parallelism over Images; Streaming

Page 41: BIMCV: The Perfect "Big Data" Storm.

Pathology Imaging / Digital Pathology II

• Current Approach: 1 GB raw image data + 1.5 GB analytical results per 2D image. MPI for image analysis; MapReduce + Hive with spatial extension on supercomputers and clouds. GPUs used effectively. Figure 3 of section 2.12 shows the architecture of Hadoop-GIS, a spatial data warehousing system over MapReduce to support spatial analytics for analytical pathology imaging.

• Futures: Recently, 3D pathology imaging has been made possible through 3D laser technologies or by serially sectioning hundreds of tissue sections onto slides and scanning them into digital images. Segmenting 3D microanatomic objects from registered serial images could produce tens of millions of 3D objects from a single image. This provides a deep “map” of human tissues for next generation diagnosis. 1 TB raw image data + 1 TB analytical results per 3D image and 1 PB data per moderate-sized hospital per year.

[Figure: Architecture of Hadoop-GIS, a spatial data warehousing system over MapReduce to support spatial analytics for analytical pathology imaging]

Page 42: BIMCV: The Perfect "Big Data" Storm.

18: Computational Bioimaging

• Application: Data delivered from bioimaging is increasingly automated, higher resolution, and multi-modal. This has created a data analysis bottleneck that, if resolved, can advance biosciences discovery through Big Data techniques.

• Current Approach: The current piecemeal analysis approach does not scale to a situation where a single scan on emerging machines is 32 TB and medical diagnostic imaging is annually around 70 PB even excluding cardiology. One needs a web-based one-stop shop for high-performance, high-throughput image processing for producers and consumers of models built on bio-imaging data.

• Futures: The goal is to solve that bottleneck with extreme-scale computing and community-focused science gateways that support the application of massive data analysis to massive imaging data sets. Workflow components include data acquisition, storage, enhancement, noise minimization, segmentation of regions of interest, crowd-based selection and extraction of features, object classification, organization, and search. Use ImageJ, OMERO, VolRover, and advanced segmentation and feature detection software.

Tags: MR, MRIter?, PP, Classification; Parallelism over Images; Streaming

Page 43: BIMCV: The Perfect "Big Data" Storm.

22: Statistical Relational Artificial Intelligence for Health Care

• Application: The goal of the project is to analyze large, multi-modal medical data including different data types such as imaging, EHR, genetic, and natural language. This approach employs relational probabilistic models that can handle rich relational data and model uncertainty using probability theory. The software learns models from multiple data types and can possibly integrate the information and reason about complex queries. Users can provide a set of descriptions, for instance MRI images and demographic data about a particular subject. They can then query for the onset of a particular disease (say Alzheimer's), and the system will provide a probability distribution over the possible occurrence of this disease.

• Current Approach: A single server can handle a test cohort of a few hundred patients with associated data of 100's of GB.

• Futures: A cohort of millions of patients can involve petabyte datasets. Issues include the availability of too much data (such as images, genetic sequences, etc.), which complicates analysis. A major challenge lies in aligning the data and merging it from multiple sources in a form that is useful for a combined analysis. Another issue is that sometimes a large amount of data is available about a single subject but the number of subjects is not very high (i.e., data imbalance). This can result in learning algorithms picking up random correlations between the multiple data types as important features in analysis.

Tags: MRIter, EGO; Parallelism over People and their EMR; Streaming

Page 45: BIMCV: The Perfect "Big Data" Storm.

VOLUME, VARIETY, VELOCITY, VALUE

The 4V paradigm in Big Data medicine

Page 46: BIMCV: The Perfect "Big Data" Storm.

Big Data in Neuroimaging

Page 47: BIMCV: The Perfect "Big Data" Storm.
Page 48: BIMCV: The Perfect "Big Data" Storm.

Human neuroimaging is now, officially, a “big data” science

• Among the examples of “big data” featured at the meeting was – no surprise - human neuroimaging

• The Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative

• Initiatives surrounding large-scale brain mapping are also underway in Europe http://www.humanbrainproject.eu

• Organization for Human Brain Mapping (OHBM; http://www.humanbrainmapping.org)

Page 49: BIMCV: The Perfect "Big Data" Storm.

How Big is “Big”?

• While size is a relative term when it comes to data, medical imaging applied to the brain comes in a variety of forms, each generating differing types and amounts of information about neural structure and/or function.

• An analysis of studies published in NeuroImage indicates that since 1995 the amount of data collected has doubled approximately every 26 months. At this rate, by 2015 the amount of acquired neuroimaging data alone, discounting header information and before more files are generated during data processing and statistical analysis, may exceed an average of 20 GB per published research study.
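The doubling claim can be turned into a quick calculation. The sketch below assumes pure exponential growth with a fixed 26-month doubling time; the 1995 baseline it prints is only what that assumption implies when working backward from 20 GB in 2015, not a figure reported on the slide.

```python
def growth_factor(start_year, end_year, doubling_months=26.0):
    """Multiplicative growth between two years for a fixed doubling time."""
    months = (end_year - start_year) * 12
    return 2.0 ** (months / doubling_months)


def implied_baseline_gb(final_gb, start_year, end_year, doubling_months=26.0):
    """Size at start_year that would grow to final_gb by end_year."""
    return final_gb / growth_factor(start_year, end_year, doubling_months)


factor = growth_factor(1995, 2015)
baseline_mb = implied_baseline_gb(20.0, 1995, 2015) * 1000  # decimal units

print(f"Growth 1995-2015: about {factor:.0f}-fold")
print(f"Implied 1995 average per study: about {baseline_mb:.0f} MB")
```

Twenty years contain a little over nine doubling periods, so the growth factor is on the order of several hundred.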

Page 50: BIMCV: The Perfect "Big Data" Storm.

Growth of Neuroimaging Study Size

[Figure: growth of neuroimaging study size. Y-axis: megabytes (0 to 20,000); x-axis: year (1990 to 2020); series: expected, observed, predicted.]

Van Horn and Toga (in press) Brain Imaging and Behavior

Page 51: BIMCV: The Perfect "Big Data" Storm.

Kryder’s law: Exponential Growth of Data

Volume of data: MB = megabyte = 10^6 bytes, GB = gigabyte = 10^9, TB = terabyte = 10^12, PB = petabyte = 10^15

Single cryo brain volume (1600 cm³): storage by voxel resolution

Voxel size | Voxel count | Gray scale, 8 bits | Gray scale, 16 bits | Color, 24 bits
1 cm | 12 x 15 x 9 | 1620 | 3000 | 4860
1 mm | 120 x 150 x 90 | 1.62 MB | 3.24 MB | 4.86 MB
100 µm | 1200 x 1500 x 900 | 1.62 GB | 3.24 GB | 4.86 GB
10 µm | 12000 x 15000 x 9000 | 1.62 TB | 3.24 TB | 4.86 TB
1 µm | 120000 x 150000 x 90000 | 1.62 PB | 3.24 PB | 4.86 PB

Data volume and compute power by period

Years | Neuroimaging (annually) | Genomics (BP/yr) | Compute power: CPU transistor counts (Moore's law)
1985-1989 | 200 GB | 10 MB | 1x10^5
1990-1994 | 1 TB | 100 MB | 1x10^6
1995-1999 | 50 TB | 10 GB | 5x10^6
2000-2004 | 250 TB | 1 TB | 1x10^7
2005-2009 | 1 PB | 30 TB | 8x10^6
2010-2014 | 5 PB | 1 PB | 1x10^9
2015-2019 (estimated) | 10+ PB | 20+ PB | 1x10^11
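The brain-volume figures follow from multiplying the voxel count by the bytes stored per voxel. The sketch below assumes a 12 x 15 x 9 cm bounding box (taken from the 1 cm row), uncompressed storage, and decimal units; the function names are illustrative. Note that the direct product for the 1 cm, 16-bit case is 3,240 bytes, whereas the slide lists 3000.

```python
from math import prod

# Bounding box of the brain volume in centimetres, from the 1 cm row.
BRAIN_CM = (12, 15, 9)


def voxel_count(voxels_per_cm):
    """Number of voxels when each centimetre is split into voxels_per_cm."""
    return prod(side * voxels_per_cm for side in BRAIN_CM)


def volume_bytes(voxels_per_cm, bits_per_voxel):
    """Uncompressed size in bytes of one full brain volume."""
    return voxel_count(voxels_per_cm) * bits_per_voxel // 8


def human(n_bytes):
    """Format a byte count with decimal (power-of-1000) units."""
    value = float(n_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if value < 1000:
            return f"{value:.2f} {unit}"
        value /= 1000
    return f"{value:.2f} PB"


# Voxel sizes 1 cm, 1 mm, 100 um, 10 um, 1 um -> voxels per centimetre.
for label, per_cm in [("1 cm", 1), ("1 mm", 10), ("100 um", 100),
                      ("10 um", 1_000), ("1 um", 10_000)]:
    sizes = [human(volume_bytes(per_cm, bits)) for bits in (8, 16, 24)]
    print(label, sizes)
```

Each tenfold gain in linear resolution multiplies storage by 1,000, which is why the rows step from megabytes to gigabytes to terabytes to petabytes.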

Page 52: BIMCV: The Perfect "Big Data" Storm.

Big Neuroimaging + Big Genetics = REALLY Big Data

• Obtaining genome-wide sets of single nucleotide polymorphism (SNP) information is becoming routine, and the cost of full genomic sequencing is rapidly becoming affordable.

• Next Generation Sequencing (NGS) methods, for major brain imaging studies such as the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (Weiner, Veitch et al. 2012), with its initially available sample of 832 subjects.

• As the bond between neuroimaging and genomics grows tighter, with both areas growing at incredible rates, disk storage, unique data compression techniques

Page 53: BIMCV: The Perfect "Big Data" Storm.

Multisite Consortia and Data Sharing

• Examples of multisite neuroimaging efforts can be found in the ubiquitous application of neuroimaging in health but also in devastating illnesses such as:

• Parkinson's (Evangelou, Maraganore et al. 2009)

• psychiatric disorders (Schumann, Loth et al. 2010)

• the mapping of human brain connectivity (Toga, Clark et al. 2012)

• databases of aging and aging-related diseases, large-scale autism research (NDAR; Hall, Huerta et al. 2012), and the Federal Interagency Traumatic Brain Injury Research (FITBIR; Bushnik and Gordon 2012)

Page 54: BIMCV: The Perfect "Big Data" Storm.

Multisite Consortia and Data Sharing

• The various “grass roots” collections of resting-state fMRI data maintained as part of the “1000 Functional Connectomes” project, http://fcon_1000.projects.nitrc.org/ (see Biswal, Mennes et al. 2010)

• Task-based OpenfMRI, http://www.openfmri.org (Poldrack, Barch et al. 2013), is another notable example.

Page 55: BIMCV: The Perfect "Big Data" Storm.
Page 56: BIMCV: The Perfect "Big Data" Storm.

The Role of Cyberinfrastructure

• Individual desktop computers are no longer suitable for analyzing potentially petabytes' worth of brain and genomics data at a time.

• The National Science Foundation (NSF) has made major investments in the computer architecture needed for physics, weather, and geological data, e.g. XSEDE, https://www.xsede.org/ , and Open Science Grid, https://www.opensciencegrid.org

Page 57: BIMCV: The Perfect "Big Data" Storm.

The Role of Cyberinfrastructure

• The Neuroimaging Informatics Tools and Resources Clearinghouse (NITRC; http://www.nitrc.org ) and the International Neuroinformatics Coordinating Facility (INCF; http://incf.org ) have begun to deploy local clusters with Amazon EC2 server technology toward this goal, but a larger effort will be required, involving dedicated processing centers or distributed grids of linked compute centers.

Page 58: BIMCV: The Perfect "Big Data" Storm.

Many 1,000’s of Software Tools

• Biomedical Imaging

– There are 100's of different types of image processing algorithms and filters

– For each type of process there may be dozens of concrete software products (instance implementations)

• (Example) Neuroimaging

– NITRC lists > 500 openly shared software tools

– For each openly shared tool there may be dozens of proprietary or less commonly used analogues

• Acquisition, processing, storage/DB, service, migration, mining, analysis, visualization, annotation, … “(data-driven) process understanding”

Page 59: BIMCV: The Perfect "Big Data" Storm.

Millions of Dispersed Hardware Devices

• Cisco: “By the end of 2012, the number of mobile-connected devices will exceed the number of people on Earth”

• There will be over 10 billion mobile-connected devices in 2016; i.e., there will be 1.3 mobile devices per capita

– These include phones, tablets, laptops, handheld gaming consoles, e-readers, in-car entertainment systems, digital cameras, and “machine-to-machine modules”

• DBs, Clients, Servers, Compute-Nodes, Web-Services, Interfaces, …

• Solution …

Dinov et al., BMC 2011

Page 60: BIMCV: The Perfect "Big Data" Storm.

[Figure: fMRI analysis workflow.

Inputs: raw fMRI time series; high-resolution anatomical image; standardized brain atlas template; experimental design matrix; study metadata (scanner protocols, subject demographics, stimulus timing, etc.).

Processing steps: slice timing adjustment; image spatial alignment; functional-structural co-registration; spatial normalization to atlas space; image smoothing (Gaussian spatial filtering); statistical modeling (e.g. GLM).

Outputs: statistical results maps; table of statistically significant voxels in atlas space coordinates; graphical overlays.]

Van Horn et al., Nature Neuro, 2004
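A workflow like the one in this figure is naturally expressed as a dependency graph, which is also how graphical pipeline environments schedule work. The sketch below uses the step names from the slide, but the dependency edges are assumptions (a conventional fMRI ordering), since the arrows of the original diagram are not recoverable from the transcript.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of inputs or steps it depends on.
# The edges below are assumed, not read from the original figure.
DEPENDS_ON = {
    "slice timing adjustment": {"raw fMRI time series"},
    "image spatial alignment": {"slice timing adjustment"},
    "functional-structural co-registration": {
        "image spatial alignment",
        "high-resolution anatomical image",
    },
    "spatial normalization to atlas space": {
        "functional-structural co-registration",
        "standardized brain atlas template",
    },
    "image smoothing (Gaussian spatial filtering)": {
        "spatial normalization to atlas space",
    },
    "statistical modeling (GLM)": {
        "image smoothing (Gaussian spatial filtering)",
        "experimental design matrix",
    },
    "statistical results maps": {"statistical modeling (GLM)"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(DEPENDS_ON).static_order())
for position, step in enumerate(order, start=1):
    print(f"{position:2d}. {step}")
```

Any valid execution order places raw inputs before the steps that consume them; steps with no mutual dependency could run in parallel on a grid or cloud.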

Page 61: BIMCV: The Perfect "Big Data" Storm.

Pipeline Version 5.9.1 Features: Graphical Programming Environment

11/17/2014

Page 62: BIMCV: The Perfect "Big Data" Storm.

Perfect Neuroimaging-Computation Storm?

• Single Subject Studies (N=1)

– Genetics (depending on coverage, X): whole genome sequence data > 320 GB (> 80X); requires 2+ TB RAM and 100+ hrs CPU

– Imaging (depending on protocols): diffusion imaging data with 40-512 gradient directions; raw (multimodal) neuroimaging data > 10 GB; derived data > 100 GB; requires 100 GB RAM and 70+ hrs CPU

• Large Subject Studies

– Cohort studies (N>10, typically N~100's)

– Multi-institutional population-wide studies (N>1,000)

– Longitudinal (neuroimaging) studies …
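The per-subject figures scale linearly with cohort size, which is what turns single-subject numbers into a storage problem. The sketch below treats the slide's lower bounds (> 320 GB genome, > 10 GB raw imaging, > 100 GB derived imaging) as exact per-subject sizes and assumes one time point per subject; both are simplifications, so the results are lower-bound estimates.

```python
# Per-subject lower bounds quoted on the slide, in gigabytes.
GENOME_GB = 320
RAW_IMAGING_GB = 10
DERIVED_IMAGING_GB = 100


def cohort_storage_tb(n_subjects, with_genome=True):
    """Lower-bound storage for a cohort, in decimal terabytes."""
    per_subject_gb = RAW_IMAGING_GB + DERIVED_IMAGING_GB
    if with_genome:
        per_subject_gb += GENOME_GB
    return n_subjects * per_subject_gb / 1000


for n in (1, 100, 1000):
    print(f"N={n:>5}: imaging only >= {cohort_storage_tb(n, False):.2f} TB, "
          f"imaging + genome >= {cohort_storage_tb(n):.2f} TB")
```

A longitudinal design multiplies these figures again by the number of time points.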

Page 63: BIMCV: The Perfect "Big Data" Storm.

From Biomedical Challenges to Modeling, Computation, Tools and Curricular Training

• Quantitative Volumetric and Surface-based Stats Analyses

– Interactome: Challenge ↔ Models ↔ Data Analysis ↔ Computation ↔ Education

– Statistics Online Computational Resource: Che, et al., JSS (2009)

[Figure legend: No effect, Marginal, Significant]

Page 64: BIMCV: The Perfect "Big Data" Storm.

Grid & Cloud Computing

• UCLA Grids: Cerebro, Medulla

– 1,200 cores

– 1.4 TB RAM

– 12,000 jobs/day

– 700 users

• Amazon Cloud

– EC2 (Elastic Compute Cloud)

– S3 (Simple Storage Service)

• UC Grid

• Globus GridFTP

• INI Cluster @ USC

– 3,328 cores, 128 GB RAM per 16 cores, 26 TB aggregate memory space. Connectivity is 5 Gbit per 16 cores, roughly 4 Tbit aggregate on compute and another 4.3 Tbit on the storage. 2.43 PB of online storage, currently accelerated by over 50 TB of SSD.

4,300 cores, 9.6 TB RAM (new)

Page 65: BIMCV: The Perfect "Big Data" Storm.

Neuroimaging Applications: 56-ROI Global Shape Analysis (NC vs. IBS/Pain) Group Effects

Data: structural T1 data (NC: 221, IBS: 107)

Results: mean-curvature between-group differences in L_cuneus and R_angular_gyrus

[Figure: data, workflow protocol, and results panels; left view and right view]

Page 66: BIMCV: The Perfect "Big Data" Storm.

Neuroimaging Applications: Stat Mapping of Cortical GM Thickness (Group Effects)

Data: structural T1 data, cortical models

[Figure: data, workflow protocol, and results panels; left anterior insula labeled; color scale: p-value from 0.0 to 1.0]

Page 67: BIMCV: The Perfect "Big Data" Storm.

Pipeline User Community

Page 68: BIMCV: The Perfect "Big Data" Storm.

Population Imaging

Page 69: BIMCV: The Perfect "Big Data" Storm.

Big Data and the Health Sector in Population Imaging

• According to Bonnie Feldman, “the potential of Big Data in medicine lies in the possibility of combining traditional data with other new forms of data, at both the individual and the population level”

• The potential of Big Data suggests that savings can be achieved in the healthcare sector in several ways:

– Turning data into information.

– Support for people's self-care.

– Increased knowledge.

– Awareness of health status.

• Big Data is an open-access methodology for integrating different types of data in population imaging, image quantification, and feature extraction.

Page 70: BIMCV: The Perfect "Big Data" Storm.

Types of Studies

• Individual

• Longitudinal

• Cross-sectional

[Figure: time axis labeled 0, 1, 2, M]

Page 71: BIMCV: The Perfect "Big Data" Storm.

Population Studies

• Population studies

– If no groups are formed within the population, the mean of the parameter or parameters is computed.

– If groups are formed (control and pathological), a hypothesis test must be performed.

• Population modeling

– Model the volumetric degeneration of gray matter and white matter

– Establish degeneration parameters

– Compare the state of an individual against that model.
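The two analyses described above, a between-group hypothesis test and the comparison of one individual against a population reference, can be sketched with the standard library alone. The volumes are invented toy numbers in cm³, and Welch's t statistic and a z-score are stand-ins chosen for illustration; the slide does not say which test or model BIMCV uses.

```python
from math import sqrt
from statistics import mean, stdev


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference in means of two groups."""
    standard_error = sqrt(stdev(group_a) ** 2 / len(group_a)
                          + stdev(group_b) ** 2 / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error


def z_score(value, reference):
    """Distance of one subject from the reference mean, in standard deviations."""
    return (value - mean(reference)) / stdev(reference)


# Hypothetical gray matter volumes (cm^3), for illustration only.
controls = [620, 640, 610, 655, 630]
patients = [560, 585, 570, 590, 575]

print(f"Welch t (controls vs. patients): {welch_t(controls, patients):.2f}")
print(f"z-score of a 575 cm^3 subject vs. controls: {z_score(575, controls):.2f}")
```

In practice the reference model would also account for age and sex, since gray and white matter volumes change across the lifespan.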

Page 72: BIMCV: The Perfect "Big Data" Storm.

Application to Alzheimer's Disease

Page 73: BIMCV: The Perfect "Big Data" Storm.

Application to Real Cases

Results for global parameters

Page 74: BIMCV: The Perfect "Big Data" Storm.

Application to Real Cases

Results for thickness and volume by structure, together with the reference values

Page 75: BIMCV: The Perfect "Big Data" Storm.

Application to Real Cases

• Plot of the volume difference relative to the population

Page 76: BIMCV: The Perfect "Big Data" Storm.

Why can't we combine BEAUTY AND SCIENCE?

Page 77: BIMCV: The Perfect "Big Data" Storm.

BIMCV Objectives

• Develop and implement strategies to prevent or effectively treat diseases through an imaging research infrastructure linked to large population imaging studies.

– The “Population Imaging” concept.

• Provide data, tools, and processing resources for carrying out advanced imaging studies.

Page 78: BIMCV: The Perfect "Big Data" Storm.

volBrain system

Page 79: BIMCV: The Perfect "Big Data" Storm.

volBrain pipeline

Page 80: BIMCV: The Perfect "Big Data" Storm.

Unsupervised Segmentation of Glioblastomas

Page 81: BIMCV: The Perfect "Big Data" Storm.

GIBI230: Luis Martí-Bonmatí, Fernando Aparici, Alexandre Pérez, Roberto Sanz

Carlos Infantes, Jose María Salinas, Cayetano Hernández

NEuro-Bioimaging VLC: Mariam de la Iglesia

IBIME: Juan M García-Gómez, Elies Fuster, Javier Juan-Albarracín

Page 82: BIMCV: The Perfect "Big Data" Storm.

BIMCV

Page 83: BIMCV: The Perfect "Big Data" Storm.

Valencian Node

Euro-BioImaging: European Research Infrastructure for Biomedical and Biological Imaging Technologies.

A project on the ESFRI roadmap for research infrastructures

www.eurobioimaging.eu

Page 84: BIMCV: The Perfect "Big Data" Storm.

• Established in 2006

• Staff: 4.5, incl. 3 Project Managers, 1 assistant

• Provision of services to Network Members + EIBIR bodies

• Monitoring European Affairs + research funding opportunities

• Project management and coordination

• Information activities and media work

• Promotion of Network Membership

• Website and data base updates

• Congress activities

• Scientific Advisory Board

EIBIR key facts and daily work

In the service of research, EIBIR offers to its Network Members:

- Multidisciplinary networking

- Project management

- Research communication

- Research training

- Meeting organisation

EIBIR Office

Page 85: BIMCV: The Perfect "Big Data" Storm.

Timeline & Funding

Preparatory Phase, 2010 - 2013 (funded by the EC)

• Framework

• Definition of the eligibility criteria for the nodes

• Open call for nodes

Construction Phase, 2013 - 2017 (funded by the Member States (MINECO?))

• Evaluation & selection of nodes

• Construction of the nodes

Operational Phase, 2017 - … (funded by the Member States & EC)

• Access and training

• Technology and evaluation to improve the service

Page 86: BIMCV: The Perfect "Big Data" Storm.

[Figure: Euro-BioImaging infrastructure model. Labels: multimodal technology nodes; flagship nodes; hub nodes; imaging infrastructure with open user access; user training; staff training; web-access portal; European life scientists as users; data storage and analysis infrastructure; user returns with results for publication.]

Page 87: BIMCV: The Perfect "Big Data" Storm.
Page 88: BIMCV: The Perfect "Big Data" Storm.

1st Open Call

Euro-BioImaging Nodes – Expression of Interest

The 1st Open Call: 1 February – 30 April 2013

• Multi-Modal Molecular Imaging

• Phase contrast Imaging

• High-field MRI

• MR-PET

• Population Imaging

• Data Infrastructure: Challenges Framework

• The biological imaging community will call for EoIs in 6 technologies

Page 89: BIMCV: The Perfect "Big Data" Storm.

Valencian Node, BIMCV

Page 90: BIMCV: The Perfect "Big Data" Storm.

Results of the 1st Call

Biological Imaging

Biomedical Imaging

9 SPANISH NODES

– 18 Institutions –

Page 91: BIMCV: The Perfect "Big Data" Storm.

MEDICAL IMAGING DATA BANK (BIMCV)

Expression of Interest: Population Imaging

BIG DATA DISEASE SIGNATURES

SINGLE TECHNOLOGY FLAGSHIPS

CONSORTIUM

Page 92: BIMCV: The Perfect "Big Data" Storm.

Evaluation summary and Final ranking

• The node develops and provides access to a large database of imaging data and the associated clinical data records.

• Big Data repository from hospitals in the Valencia region (5 million inhabitants living over an area of 23,255 km², with an average of 5.3 million clinical cases per year, from 210 different imaging modalities).

• Access to such data and tools will be an efficient way of advancing population imaging studies and research.

• The node has the ability to incorporate data from other facilities.
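As a rough sanity check on the scale quoted above, the slide's own figures can be turned into two derived numbers: population density and imaging cases per inhabitant per year. This is only an illustrative sketch. The constants are copied from the slide (reading "23.255 Km2" as 23,255 km²), and the function names are hypothetical, not part of any BIMCV tool.

```python
# Figures quoted on the slide for the Valencia region.
INHABITANTS = 5_000_000      # "5 million inhabitants"
AREA_KM2 = 23_255            # "23.255 Km2", read as 23,255 km²
CASES_PER_YEAR = 5_300_000   # "5.3 million clinical cases per year"


def population_density(inhabitants: int, area_km2: float) -> float:
    """Inhabitants per square kilometre."""
    return inhabitants / area_km2


def cases_per_inhabitant(cases: int, inhabitants: int) -> float:
    """Average number of clinical imaging cases per inhabitant per year."""
    return cases / inhabitants


density = population_density(INHABITANTS, AREA_KM2)
rate = cases_per_inhabitant(CASES_PER_YEAR, INHABITANTS)

print(f"{density:.0f} inhabitants per km2")
print(f"{rate:.2f} cases per inhabitant per year")
```

On these figures the repository grows by roughly one imaging case per inhabitant per year, which is what makes it a population-scale data source.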

Page 93: BIMCV: The Perfect "Big Data" Storm.

Services offered by the node

• BIMCV facility provides a multi-level and multi-ology storage service (Vendor Neutral Archive).

• CEIB-CS node integrates access to high-performance computational services from local and European infrastructures (Principe Felipe Research Centre & UPV-I3M Infrastructure).

• Open access methodology to integrate different data types for population imaging, quantitative resources and feature extraction.

• Comprehensive user training

Page 94: BIMCV: The Perfect "Big Data" Storm.

Other facilities

Single Technology Flagship Node – Population Imaging: Valencia

MEDICAL IMAGING DATA BANK (BIMCV)

Services offered by the node:

• BIMCV facility provides a multi-level and multi-ology storage service (Vendor Neutral Archive).

• CEIB-AVS node integrates access to high-performance computational services from local and European infrastructures (Principe Felipe Research Centre & UPV-I3M Infrastructure).

• Open access methodology to integrate different data types for population imaging, quantitative resources and feature extraction.

• Comprehensive user training.

BIG DATA DISEASE SIGNATURES

Evaluation summary and Final ranking:

• Requires minor improvements (training plan, since corrected).

• The node develops and provides access to a large database of imaging data and the associated clinical data records.

• Big Data repository from hospitals in the Valencia region (5 million inhabitants living over an area of 23,255 km², with an average of 5.3 million clinical cases per year, from 210 different imaging modalities).

• The access to such data and tools will be an efficient way of advancing population imaging studies and research.

• The node has the ability to incorporate data from other facilities.

Page 95: BIMCV: The Perfect "Big Data" Storm.

Valencian Node, BIMCV

Centre of Excellence in Biomedical Imaging (CEIB) of the Conselleria de Sanitat

CEIB clinical site / CEIB computing site

Page 96: BIMCV: The Perfect "Big Data" Storm.

Services

Page 97: BIMCV: The Perfect "Big Data" Storm.

With a Well-Defined Architecture

Page 98: BIMCV: The Perfect "Big Data" Storm.

Other facilities

Page 99: BIMCV: The Perfect "Big Data" Storm.

Neuroimaging: the landscapes of the mind

Page 100: BIMCV: The Perfect "Big Data" Storm.

Human Neuroimaging as a “Big Data” Science

http://prezi.com/sseievn7ujcf/?utm_campaign=share&utm_medium=copy

The mind landscapes

Page 101: BIMCV: The Perfect "Big Data" Storm.

Study of Structure: Morphometry

Page 102: BIMCV: The Perfect "Big Data" Storm.

Study of Structure: Tractography

Page 103: BIMCV: The Perfect "Big Data" Storm.
Page 104: BIMCV: The Perfect "Big Data" Storm.

Study of Function

Page 105: BIMCV: The Perfect "Big Data" Storm.

Study of Function

Page 106: BIMCV: The Perfect "Big Data" Storm.

RESTING STATE

Page 107: BIMCV: The Perfect "Big Data" Storm.
Page 108: BIMCV: The Perfect "Big Data" Storm.

Connectomics

Page 109: BIMCV: The Perfect "Big Data" Storm.

Connectomics

Page 110: BIMCV: The Perfect "Big Data" Storm.

Acknowledgment

Page 111: BIMCV: The Perfect "Big Data" Storm.
Page 112: BIMCV: The Perfect "Big Data" Storm.
Page 113: BIMCV: The Perfect "Big Data" Storm.
Page 114: BIMCV: The Perfect "Big Data" Storm.
Page 115: BIMCV: The Perfect "Big Data" Storm.
Page 116: BIMCV: The Perfect "Big Data" Storm.
Page 117: BIMCV: The Perfect "Big Data" Storm.

Relevant facts

Page 118: BIMCV: The Perfect "Big Data" Storm.
Page 119: BIMCV: The Perfect "Big Data" Storm.

10K Structural Modeling in Neuroimaging of the Valencia Region

• Two grants from the Subdirección General de Sistemas para la Salud of the CS, for computer engineers or telecommunications engineers (DOGV 9-07-2014).

• The main brain structures will be measured.

• In collaboration with LABMAN

• In collaboration with Brain Dynamics

• University of Southern California (Jack Van Horn)

• Possibly with IBIME (volBrain system)
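The slide does not say how the main brain structures will be measured. One common way to measure a structure is to count the voxels carrying its label in a segmented image and multiply by the voxel volume. The sketch below illustrates only that idea on a toy volume. The function name, the label numbers, the structure names and the voxel size are all hypothetical, and nothing here is taken from the project's actual pipeline or from volBrain.

```python
from collections import Counter
from itertools import chain


def structure_volumes_mm3(label_volume, voxel_size_mm, label_names):
    """Return {structure name: volume in mm^3} for a labelled 3D volume.

    label_volume  -- 3D nested list of integer labels (0 = background)
    voxel_size_mm -- (x, y, z) voxel edge lengths in millimetres
    label_names   -- hypothetical mapping {label: structure name}
    """
    voxel_mm3 = voxel_size_mm[0] * voxel_size_mm[1] * voxel_size_mm[2]
    # Flatten slices -> rows -> voxels, then count voxels per label.
    voxels = chain.from_iterable(chain.from_iterable(label_volume))
    counts = Counter(voxels)
    return {name: counts.get(label, 0) * voxel_mm3
            for label, name in label_names.items()}


# Toy 2 x 2 x 2 segmentation; labels and names are illustrative only.
toy_volume = [
    [[0, 1], [1, 1]],
    [[2, 2], [0, 0]],
]
names = {1: "hippocampus", 2: "thalamus"}
volumes = structure_volumes_mm3(toy_volume, (1.0, 1.0, 2.0), names)
print(volumes)
```

Run over thousands of subjects, a table of such per-structure volumes would be the kind of output a "10K" structural study produces. In practice the label map would come from a segmentation tool rather than being typed in by hand.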

Page 120: BIMCV: The Perfect "Big Data" Storm.
Page 121: BIMCV: The Perfect "Big Data" Storm.

Augmented virtual reality prototype: ARiBraiN3T (for Android)