Session 30 Powerful Ways to Use Hadoop in your Healthcare...
Session 30
Powerful Ways to Use Hadoop in your Healthcare Big Data Strategy
Bryan Hinton, Senior Vice President, Platform Engineering, Health Catalyst
Sean Stohl, Senior Vice President, Product Development, Health Catalyst
Poll Question #1
What brought you here?
a) Everyone is talking about Big Data/Hadoop: what is it?
b) Searching for use cases: what is the value proposition?
c) Need help implementing it
d) Want to hear others' experiences
e) I got lost
Learning Objectives
Be able to explain
• What Big Data and Hadoop are
• Why we need Big Data and Hadoop in healthcare
• What the challenges to adoption are
• How to get started
• See it in action
History of Hadoop
• Created by Doug Cutting and Mike Cafarella in 2005 and developed further at Yahoo.
• Hadoop was named after Cutting's son's toy elephant.
• "The name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere: those are my naming criteria. Kids are good at generating such. Googol is a kid's term." - Doug Cutting
• Open-source software framework that supports storing and processing large data sets distributed across clusters of commodity hardware.
• MapReduce: parcels out work to the various nodes in the cluster (the "map" step), then collects and reduces the results from each node into a cohesive answer to a query (the "reduce" step).
• HDFS (Hadoop Distributed File System): a file system that distributes data across the cluster so that MapReduce can process it in parallel.
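To make the MapReduce and HDFS bullets concrete, here is a minimal single-process Python sketch of the classic word-count job. It is not Hadoop's API: the helper names (`split_into_blocks`, `map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, the "blocks" are just slices of a Python list standing in for HDFS blocks, and the mappers run one after another rather than in parallel on separate nodes.

```python
from collections import defaultdict


def split_into_blocks(lines: list[str], block_size: int) -> list[list[str]]:
    # Stand-in for HDFS (assumption): chunk the input into fixed-size "blocks" of lines.
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_phase(block: list[str]) -> list[tuple[str, int]]:
    # Mapper: emit a (word, 1) pair for every word in one block.
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped: list[list[tuple[str, int]]]) -> dict[str, list[int]]:
    # Shuffle: group every mapper's intermediate values by key.
    groups: dict[str, list[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    # Reducer: combine all values for a key into a single result.
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: list[str], block_size: int = 2) -> dict[str, int]:
    blocks = split_into_blocks(lines, block_size)
    # Hadoop would run one mapper per block in parallel on different nodes;
    # this sketch simply runs them sequentially.
    mapped = [map_phase(block) for block in blocks]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    notes = [
        "Patient admitted with chest pain",
        "Chest x-ray ordered",
        "Patient discharged",
    ]
    print(word_count(notes))
```

The point of the split into map, shuffle, and reduce is that each mapper only needs its own block, so the work can be spread across as many nodes as hold data; only the grouped intermediate results have to move between machines.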
Poll Question #2
How would you categorize your organization’s involvement with Hadoop?
1) Not interested
2) Interested but no plans to implement
3) Planning implementation
4) Piloting Hadoop
5) Heavily using Hadoop
6) Unsure or not applicable
Why Big Data and Hadoop in Healthcare
• Data growth
• Different types of workload
  • Semi-structured data
  • Archiving
  • Streaming
  • Machine learning
Just Beginning: Digitization of Health
“EMR data represents ~8% of the data we need for population health and precision medicine.” — Alberta Secondary Use Data Project
The Growing Ecosystem of Human Health Data (diagram): healthcare encounter data, 7x24 biometric data, consumer data, genomic & familial data, social data, and outcomes data.
Types of Data
• Structured
  • Data that can be stored relationally in an RDBMS
• Semi-structured
  • Data that has some organizational properties but isn't in a relational database format
  • CSV, XML, X12 (835/837), HL7, JSON
  • Doctor notes: template-generated sections
• Unstructured
  • E-mails, text messages, Word documents, videos, and pictures
  • Doctor notes: free-form sections
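As an illustration of why HL7 counts as semi-structured, the hypothetical Python sketch below splits an HL7 v2 message into segments and fields using the default delimiters ("|" between fields, "^" between components). The sample message, the MRN, and the helper names (`parse_hl7`, `patient_summary`) are invented for illustration; a real pipeline would use a full HL7 parser that handles escape sequences, field repetitions ("~"), and the MSH segment's special field numbering, none of which this sketch attempts.

```python
# Hypothetical sample message (invented data); segments are separated by carriage returns.
SAMPLE_MESSAGE = (
    "MSH|^~\\&|LAB|HOSP|EHR|HOSP|202301011200||ORU^R01|MSG0001|P|2.5\r"
    "PID|1||12345^^^HOSP^MR||Doe^Jane||19800101|F\r"
    "OBX|1|NM|2345-7^Glucose^LN||105|mg/dL\r"
)


def parse_hl7(message: str) -> dict[str, list[list[str]]]:
    # Group each segment's fields under its segment ID (MSH, PID, OBX, ...).
    segments: dict[str, list[list[str]]] = {}
    for raw in message.replace("\n", "\r").split("\r"):
        if not raw.strip():
            continue  # skip empty pieces left by trailing or doubled separators
        fields = raw.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments


def patient_summary(segments: dict[str, list[list[str]]]) -> dict[str, str]:
    # fields[i] is PID-i because fields[0] holds the segment ID itself.
    pid = segments["PID"][0]
    family, given = (pid[5].split("^") + ["", ""])[:2]  # PID-5: patient name components
    return {
        "mrn": pid[3].split("^")[0],  # PID-3: first component of the identifier
        "family_name": family,
        "given_name": given,
        "birth_date": pid[7],  # PID-7: date of birth
        "sex": pid[8],  # PID-8: administrative sex
    }


if __name__ == "__main__":
    parsed = parse_hl7(SAMPLE_MESSAGE)
    print(patient_summary(parsed))
    print(parsed["OBX"][0][5], parsed["OBX"][0][6])  # OBX-5 value, OBX-6 units
```

The data has a predictable shape (segments, fields, components) but no relational schema, which is why it is often landed as-is in a Hadoop cluster and parsed later rather than forced into tables up front.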
Poll Question #3
Which challenge has been or would be the greatest barrier for your organization to adopt Hadoop?
a) People with the right skill sets
b) Funding hardware costs
c) Defining the business value
d) Security concerns
e) Unsure or not applicable
• Administering
• Fewer experienced people
• Lack of best practices
• Myriad of tools
• Open source, yes, but lots of assembly required
• Security?
Meeting in the middle
RDBMS Vendors
• Oracle
• SQL Server
• Teradata
• …
Hadoop Solutions
• Hortonworks
• Cloudera
• MapR
• Cloud
• …
Convergence
Lessons Learned
1. Let use cases drive the need to implement Hadoop. (Be pragmatic.)
2. Think additive.
3. Invest in people now.
4. In general, the cloud will give you the most flexibility in deploying Hadoop.
What You Learned…
After attending this session, write down the key things you've learned related to each of the learning objectives.