Big Data Challenges: Data Management, Analytics & Security (BigDataChallenges_AAAS_2014.pdf)
![Page 1](https://reader034.fdocuments.net/reader034/viewer/2022051807/6005d7e09445fb23603ca264/html5/thumbnails/1.jpg)
Big Data Challenges: Data Management, Analytics & Security
Ivo D. Dinov
Statistics Online Computational Resource (SOCR), University of Michigan
www.SOCR.umich.edu
![Page 2](https://reader034.fdocuments.net/reader034/viewer/2022051807/6005d7e09445fb23603ca264/html5/thumbnails/2.jpg)
Big Data Challenges
• Availability, Sharing, Aggregation and Services
• Classical Data Science vs. Innovative Big Data Science
  – Amateur Scientists vs. “Experts”
  – Data Scientists vs. Practitioners
  – Domain-specific vs. Trans-disciplinary knowledge
• Commercial vs. Open-source Resourceome
• Rapid Big Data Evolution
• Big Data IT proliferation
• Big Data Security risks
• Centralization won’t work in Big Data Space
• Big Data is incredibly time-, space-, protocol- and context-dependent!
![Page 3](https://reader034.fdocuments.net/reader034/viewer/2022051807/6005d7e09445fb23603ca264/html5/thumbnails/3.jpg)
Big Data Characteristics
* Mixture of quantitative & qualitative estimates
Dinov et al. (2013)
![Page 4](https://reader034.fdocuments.net/reader034/viewer/2022051807/6005d7e09445fb23603ca264/html5/thumbnails/4.jpg)
Availability, Sharing, Aggregation & Services
• Cisco: “By the end of 2012, the number of mobile-connected devices [exceeded] the number of people on Earth”
• There will be over 10 billion mobile-connected devices in 2016; i.e., there will be 1.3 mobile devices per capita
[Figure: bubble chart by industry sector. Axes: Percent Growth and Big Data Value Potential Index. Bubble size ~ relative size of GDP. Sources: U.S. Bureau of Labor Statistics; McKinsey Global Institute.]
Industry sectors: Computer & Electronic Products; Information Services; Manufacturing; Admin, Support & Waste Management; Transportation & Warehousing; Wholesale Trade; Professional Services; Healthcare Providers; Real Estate and Rental; Finance and Insurance; Utilities; Retail Trade; Government; Accommodation & Food; Arts & Entertainment; Corporate Management; Other Services; Construction; Education Services; Natural Resources
![Page 5](https://reader034.fdocuments.net/reader034/viewer/2022051807/6005d7e09445fb23603ca264/html5/thumbnails/5.jpg)
Amateur Scientists vs. “Experts”
• Democratization of Big Data Science
• Doctoral studies or certification are neither mandatory nor a guarantee of appropriate Big Data expertise
• Lower barriers to entry
• Demand for constant “Continuing Education” and self-training
• Dichotomy between theoretical and empirical sciences
• Differences between fundamental knowledge and experimental skills (big data properties closely approximate core scientific principles)
![Page 6](https://reader034.fdocuments.net/reader034/viewer/2022051807/6005d7e09445fb23603ca264/html5/thumbnails/6.jpg)
Big Data Science
Medical Sciences Social Sciences Environmental Sciences ....
Math/Stats Physics Biology Chemistry ....
Engineering Computer Science Bioinformatics Biomath/Biostats ....
Domain-specific vs. Trans-disciplinary knowledge
![Page 7](https://reader034.fdocuments.net/reader034/viewer/2022051807/6005d7e09445fb23603ca264/html5/thumbnails/7.jpg)
Commercial vs. Open-source Resourceome
• There is an explosion of open-data-science resources
  – www.data.gov
  – www.ncbi.nlm.nih.gov/gap
• Spawning of a number of industries and enterprises blending proprietary and open-source data, code, documentation, expert-support, infrastructure and services
• Big Data to Knowledge: www.BD2K.org
• Google Cloud Platform (GCP)
• Amazon Web Services (AWS)
![Page 8](https://reader034.fdocuments.net/reader034/viewer/2022051807/6005d7e09445fb23603ca264/html5/thumbnails/8.jpg)
Commercial vs. Open-source Resourceome
![Page 9](https://reader034.fdocuments.net/reader034/viewer/2022051807/6005d7e09445fb23603ca264/html5/thumbnails/9.jpg)
Rapid Big Data Evolution
• Millions of Grass-Roots initiatives addressing Big Data Challenges
• Big Data complexities require truly innovative, collaborative, trans-disciplinary solutions
• Increase of Data complexity
  – Sources
  – Heterogeneity
  – Datum-elements
  – Incongruent sampling
![Page 10](https://reader034.fdocuments.net/reader034/viewer/2022051807/6005d7e09445fb23603ca264/html5/thumbnails/10.jpg)
Data Scientists vs. Practitioners
• Modelers, Engineers, (Applied) Users
• No one user completely understands the entire pipeline of data provenance, processing protocols, analytic strategies, or results interpretation
• Black-boxes …
  – Accuracy
  – Privacy concerns
  – Consistency
  – Infrastructure
![Page 11](https://reader034.fdocuments.net/reader034/viewer/2022051807/6005d7e09445fb23603ca264/html5/thumbnails/11.jpg)
Big Data Security Risks
• Big Data Fusion provides enormous opportunities … and presents significant challenges
• Privacy, security and legal concerns, authenticity, accuracy, consistency, reliability, availability
• Healthcare
  – Cloud services enable sharing of big data
  – Significant security and privacy concerns exist
  – Health Insurance Portability and Accountability Act (HIPAA)
  – EMR/EHR: federal, state and local regulations/policies (IRBMED)
• Genetics
• Viral: dual-use research of concern (DURC), doi:10.1126/science.1223995
  – de novo synthesis of polio virus, the Australian mousepox experiment, the Penn State aerosolization study
![Page 12](https://reader034.fdocuments.net/reader034/viewer/2022051807/6005d7e09445fb23603ca264/html5/thumbnails/12.jpg)
Kryder’s law: Exponential Growth of Data
Dinov, et al., 2013
[Figure panel: Increase of Imaging Resolution. Series: Gryo_Byte, Cryo_Short, Cryo_Color. Resolution axis: 1 µm, 10 µm, 100 µm, 1 mm, 1 cm. Value axis ticks: 0 to 6E+15.]
[Figure panel: Neuroimaging (GB), Genomics_BP (GB) and Moore’s Law (1000's Trans/CPU) by five-year period, from 1985-1989 through 2015-2019 (estimated). Value axis ticks: 0 to 15000000.]
Data volume increases faster than computational power
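The claim that data volume outpaces computational power can be illustrated with two exponential curves. The sketch below is an illustration only: the function names are hypothetical, the 12-month data doubling period is an assumed value, and the 24-month period is the commonly quoted Moore's-law figure. Neither number is read off the slide's chart.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Multiplicative growth after `years` for a fixed doubling period."""
    return 2.0 ** (12.0 * years / doubling_months)


# Assumed doubling periods, for illustration only (not the slide's data).
DATA_DOUBLING_MONTHS = 12.0     # hypothetical: data volume doubles every year
COMPUTE_DOUBLING_MONTHS = 24.0  # commonly quoted Moore's-law doubling period


def data_to_compute_gap(years: float) -> float:
    """Ratio of data growth to compute growth after `years`."""
    data = growth_factor(years, DATA_DOUBLING_MONTHS)
    compute = growth_factor(years, COMPUTE_DOUBLING_MONTHS)
    return data / compute


if __name__ == "__main__":
    for years in (5, 10, 20):
        print(f"after {years:>2} years the gap factor is {data_to_compute_gap(years):.1f}")
```

Under these assumptions the gap itself grows exponentially, as 2^(t/12 - t/24) for t in months, so any shorter doubling period for data eventually dominates.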
![Page 13](https://reader034.fdocuments.net/reader034/viewer/2022051807/6005d7e09445fb23603ca264/html5/thumbnails/13.jpg)
Alzheimer’s Case Study: Stable-MCI vs. MCI-Converters
Goals: Predictive power of combinations of biomarkers and imaging-derived measures as reliable predictors of conversion from MCI to Alzheimer’s disease
Data: MCI converters to AD (24-month period) and stable non-converters, matched for age, gender, handedness and education level. Imaging (sMRI), Behavioral, Clinical, Neuropsychiatric, Biological data
Approach: Qualitative Exploratory Data Analysis and Quantitative Statistical Analysis (morphometric imaging correlates with clinical and genetics markers)
MCI = Mild Cognitive Impairment (prelude to dementia of Alzheimer’s type)
![Page 14](https://reader034.fdocuments.net/reader034/viewer/2022051807/6005d7e09445fb23603ca264/html5/thumbnails/14.jpg)
![Page 15](https://reader034.fdocuments.net/reader034/viewer/2022051807/6005d7e09445fb23603ca264/html5/thumbnails/15.jpg)
Alzheimer’s Case Study: Stable-MCI vs. MCI-Converters

Column groups: Subject (Index); Demographics (Age, Kg, Sex); Genetics (APOE A1, APOE A2); Clinical (NPISCORE, MMSE, GDTOTAL, CDR, FAQTOTAL); Neuroimaging (remaining columns); …

| Index | Age | Kg | Sex | APOE A1 | APOE A2 | NPISCORE | MMSE | GDTOTAL | CDR | FAQTOTAL | L Gyrus Rectus BL | L Superior Occipital Gyrus BL | R Fusiform Gyrus BL | L Caudate BL | R Caudate BL | L Putamen BL | R Putamen BL | … |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 65 | 59 | F | 3 | 4 | 0 | 23 | 1 | 0.5 | 7 | 1695 | 3976 | 8363 | 1296 | 1992 | 1749 | 2776 | … |
| 2 | 73 | 93 | M | 3 | 3 | 7 | 19 | 1 | 1 | 8 | 1333 | 6016 | 13290 | 835 | 2137 | 2290 | 4327 | … |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
| N | 64 | 63 | F | 3 | 3 | 3 | 29 | 6 | 0.5 | 2 | 2237 | 6887 | 16109 | 1223 | 2222 | 2525 | 4110 | … |
![Page 16](https://reader034.fdocuments.net/reader034/viewer/2022051807/6005d7e09445fb23603ca264/html5/thumbnails/16.jpg)
Alzheimer’s Case Study: Stable-MCI vs. MCI-Converters
Classification Results Using Baseline Data

Rows: Hierarchical Clustering Prediction Ana (7 Regions). Columns: True State (Dx at 24 month follow up).

| Prediction | True Converter | True Stable | Total |
|---|---|---|---|
| Converter | TP | FP | TP+FP |
| Stable | FN | TN | FN+TN |
| Total | TP+FN | FP+TN | N |

| Metric | Top 7 Regions | Top 20 Regions |
|---|---|---|
| Sensitivity | 0.81 | 1.0 |
| Specificity | 0.61 | 0.87 |
| Power to detect Converters | 0.91 | 1.0 |
| Accuracy | 0.70 | 0.93 |
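The sensitivity, specificity and accuracy rows follow from the TP/FP/FN/TN cells by the standard definitions. The sketch below is a minimal illustration of that arithmetic, not the study's analysis code: the class name `ConfusionMatrix` and the cell counts are hypothetical, since the slide reports only the resulting metric values. "Power to detect Converters" is omitted because the slide does not define how it was computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """2x2 cell counts, with 'positive' meaning Converter."""
    tp: int  # predicted Converter, truly Converter
    fp: int  # predicted Converter, truly Stable
    fn: int  # predicted Stable, truly Converter
    tn: int  # predicted Stable, truly Stable

    @property
    def n(self) -> int:
        """Total number of subjects (N in the table)."""
        return self.tp + self.fp + self.fn + self.tn

    def sensitivity(self) -> float:
        """Fraction of true Converters predicted as Converters: TP / (TP + FN)."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Fraction of true Stable subjects predicted as Stable: TN / (FP + TN)."""
        return self.tn / (self.fp + self.tn)

    def accuracy(self) -> float:
        """Fraction of all subjects classified correctly: (TP + TN) / N."""
        return (self.tp + self.tn) / self.n


# Hypothetical counts for illustration only (not the study's data).
example = ConfusionMatrix(tp=8, fp=3, fn=2, tn=7)

if __name__ == "__main__":
    print(f"sensitivity = {example.sensitivity():.2f}")
    print(f"specificity = {example.specificity():.2f}")
    print(f"accuracy    = {example.accuracy():.2f}")
```

Sensitivity and specificity are each computed within one column of the table (true Converters, true Stable), so they do not depend on how many subjects fall in each group, whereas accuracy does.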