Ensuring compliance of patient data with big data
April 10-12, Chicago, IL
Ensuring Compliance of Patient Data with Big Data and BI
Ayad Shammout & Denny Lee
3
Agenda
A Quick Big Data Primer
Healthcare and Big Data
Compliance and Auditing
  SQL Compliance Project
Compliance and Auditing with Big Data and BI
  Big Data: Unstructured Volumes of Data
  Analytics: PowerPivot, Power View
4
What is Big Data?
Volume: Exceeds physical limits of vertical scalability
Velocity: Decision window small compared to data change rate
Variety: Many different formats make integration expensive
Variability: Many options or variable interpretations confound analysis
5
Data explosion
10x increase every five years
85% from new data types
Volume, Velocity, Variety
Hadoop
Cloud
“By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent.”
Source: Gartner, Mark Beyer, “Information Management in the 21st Century”
12
Healthcare and IT
Often the laggard in technology
Yet application of IT to healthcare can radically change what we can do
Genomic Sequencing
Proteomic sequencing
Incidence Prediction
13
Healthcare Big Data Example Scenarios
Clinical Trial Deviations
  Viagra was originally developed to lower blood pressure and treat angina
  Now it's used to help treat newborn pulmonary hypertension and altitude sickness
Incidence Prediction
  Patients who miss 4 or more visits are twice as likely to have an asthmatic incident
  A particular cardiac monitor sine wave points to a high likelihood of heart attack
Campaigns
  Social media and advertising campaigns to understand user behavior and sentiment
Patient Satisfaction
  Social media and advertising campaigns to understand user behavior and sentiment
14
BIDMC Auditing Scenario
Auditing is a critical component of HIPAA in ensuring patient privacy
  1 billion+ rows of audit data
  146 mission-critical clinical applications
  Comprehensive audits yield 300-500k transactions/day
  HIPAA requires an audit system with 20 years of data
Auditing Project
  Available to the community as part of the Compliance SDK
  Updating for SQL Server 2012, HDInsight, Power View, and Mobile BI*
Creating an enterprise tool for consolidated storage, reporting and alerting of all application audit data - that's cool!
John Halamka’s Cool Technology of the Week
(Wellsphere Top Health Blogger, Health Impact Award)
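As a rough sizing check on the audit figures in this slide, the daily transaction range can be projected over the 20-year retention window. This is a back-of-the-envelope sketch and not part of the original deck: the function name `projected_rows` is hypothetical, and it assumes a constant daily rate and 365-day years.

```python
def projected_rows(per_day: int, years: int, days_per_year: int = 365) -> int:
    """Project total audit rows, assuming a constant daily rate (an assumption)."""
    return per_day * days_per_year * years


# Figures from the slide: 300-500k transactions/day, 20 years of data.
low = projected_rows(300_000, 20)
high = projected_rows(500_000, 20)
print(f"{low:,} to {high:,} rows")
```

Under those assumptions the retained volume lands in the low billions of rows, the same order of magnitude as the 1 billion+ rows cited.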
15
BIDMC Compliance Project
HDInsight (Windows)
HDInsight (Azure)
SQL Server 2008/2012
Audit Logs
ETL logs to HDFS
Use Excel 2013 PowerPivot and Power View
SSAS (tabular)
16
Auditing Sensitive Information
Querying Audit Information
  Use PowerPivot / Power View / Analysis Services to query the data.
Security Information
Policy Information
Process Audit Information
  Use SSIS to process SQL2008 All-Actions Audit Information and other CG application audit log data; potentially can use the Management Performance DW framework.
CareGroup Environment
File Server
SQL Audit
Connect/Logic
SSIS
CG Application Data
Intersystems Cache
SQL2005
Oracle
SQL2008 All-Actions Audit Data
SQL 2008 / 2012 R2
SSRS 2008 /Power View
Policy Analysis
Policy Reports
Policy Best Practices
Security Analysis
Security Reports
Compliance Reports
Feedback Action Loop
  Update systems to keep them compliant and secure
18
Storage Infrastructure
Hadoop on Azure
Compute Nodes (Medium VMs)
Azure Storage Vault (ASV)
Azure Blob Storage
Azure Flat Network Storage
Stream data to compute
Push data back to storage
map, sort, shuffle, reduce
http://dennyglee.com/2013/03/18/why-use-blob-storage-with-hdinsight-on-azure/
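The map, sort, shuffle, reduce sequence named on this slide can be illustrated in miniature. The following is a single-process sketch of the idea only, not how Hadoop or HDInsight is implemented; the audit lines, their tab-separated layout, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (user, 1) pair per audit line; the user is the first field."""
    return [(line.split("\t")[0], 1) for line in lines if line.strip()]


def shuffle_phase(pairs):
    """Sort and shuffle: order pairs by key, then group the values per key."""
    grouped = defaultdict(list)
    for key, value in sorted(pairs):
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped):
    """Reduce: sum the grouped values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# Hypothetical tab-separated audit lines: user, then action.
audit_lines = ["alice\tSELECT", "bob\tUPDATE", "alice\tSELECT"]
counts = reduce_phase(shuffle_phase(map_phase(audit_lines)))
print(counts)
```

In a real cluster the map and reduce steps run on the compute nodes while the input and output live in storage, which is the stream-to-compute and push-back-to-storage flow the slide describes.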
23
Hadoop / Auditing: File sizes
Currently testing gz vs. raw
E.g. 12 MB raw text file vs. 633 KB gz file (~20x compression)
20x smaller size, ~same query time
Approx. same map / reduce task utilization
File size is 250 MB-1 GB
SSIS package takes care of the size
Future testing: Avro, protobuf
Query                                     Duration (s)
select count(*) from sql_audit_asv_raw    56.066
select count(*) from sql_audit_asv_gz     58.994
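The gz versus raw comparison on this slide comes down to a size ratio. The sketch below checks the slide's ~20x figure and shows how a similar measurement could be taken with Python's standard `gzip` module. The sample audit line is invented for illustration, the helper name `compression_ratio` is hypothetical, and 1 MB is assumed to be 1024 KB; real audit logs will compress differently.

```python
import gzip


def compression_ratio(raw_size: int, compressed_size: int) -> float:
    """Return how many times smaller the compressed data is than the raw data."""
    return raw_size / compressed_size


# Slide figures in KB, assuming 1 MB = 1024 KB: 12 MB raw vs. 633 KB gz.
slide_ratio = compression_ratio(12 * 1024, 633)
print(f"slide example: {slide_ratio:.1f}x")

# Hypothetical, highly repetitive audit-style text (not real audit data).
sample = ("2013-04-10\tuser1\tselect col1, col2 from tableA\n" * 10_000).encode("utf-8")
sample_gz = gzip.compress(sample)
sample_ratio = compression_ratio(len(sample), len(sample_gz))
print(f"synthetic sample: {sample_ratio:.1f}x")
```

The synthetic sample is far more repetitive than real logs, so its ratio says nothing about what a production audit file would achieve.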
24
Hadoop / Auditing: Formats
For ease of processing, replace carriage returns within embedded SQL statements, e.g.
select col1, col2
from tableA
to
select col1, col2 from tableA
This allows you to create a Hive table using CR as the row delimiter (i.e. the table format does not have things like SQL quoted identifiers)
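The line-break replacement described on this slide could be done in a preprocessing step along these lines. This is a minimal sketch, not the SSIS logic used in the project; the function name `flatten_sql` is hypothetical, and it assumes any run of line breaks plus surrounding whitespace can be collapsed to a single space.

```python
import re

# One or more CR/LF characters, together with any whitespace around them.
_LINE_BREAKS = re.compile(r"\s*[\r\n]+\s*")


def flatten_sql(statement: str) -> str:
    """Collapse embedded line breaks so the statement fits on one row."""
    return _LINE_BREAKS.sub(" ", statement).strip()


print(flatten_sql("select col1, col2\nfrom tableA"))
```

Note that this also collapses line breaks inside string literals and would fold a `--` line comment into the code that follows it, so it suits audit text kept for searching rather than statements meant to be re-executed.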
27
Big Data … Excel-lerated!
2 servers, 3 months
110 GB binary files
SSIS extraction
1.2 GB of text
120 MB gz
Hadoop to PowerPivot
6 MB
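The sizes on this slide imply a reduction factor at each stage of the pipeline. The sketch below works them out; it assumes decimal units (1 GB = 1000 MB), and the stage labels and the function name `reduction_factors` are chosen here for illustration.

```python
# Sizes from the slide, in MB, assuming 1 GB = 1000 MB.
stages = [
    ("binary files", 110_000),
    ("text after SSIS extraction", 1_200),
    ("gz", 120),
    ("PowerPivot", 6),
]


def reduction_factors(stages):
    """Return (from_stage, to_stage, factor) for each consecutive pair of stages."""
    return [
        (src_name, dst_name, src_size / dst_size)
        for (src_name, src_size), (dst_name, dst_size) in zip(stages, stages[1:])
    ]


factors = reduction_factors(stages)
overall = stages[0][1] / stages[-1][1]
for src, dst, factor in factors:
    print(f"{src} -> {dst}: {factor:.1f}x")
print(f"overall: {overall:,.0f}x")
```

By this arithmetic the largest single reduction is the SSIS extraction from binary files to text, with gz compression and the PowerPivot model each contributing a further 10x to 20x.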