Ensuring compliance of patient data with big data

April 10-12, Chicago, IL Ensuring Compliance of Patient Data with Big Data and BI Ayad Shammout & Denny Lee

Agenda

A Quick Big Data Primer

Healthcare and Big Data

Compliance and Auditing: SQL Compliance Project

Compliance and Auditing with Big Data and BI

Big Data: Unstructured Volumes of Data

Analytics: PowerPivot, Power View

What is Big Data?

Volume: Exceeds physical limits of vertical scalability

Velocity: Decision window small compared to data change rate

Variety: Many different formats make integration expensive

Variability: Many options or variable interpretations confound analysis

Data explosion

10x increase every five years

85% from new data types

Volume, Velocity, Variety

Hadoop

Cloud

"By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent."

Gartner, Mark Beyer, “Information Management in the 21st Century”

Big Data Business Value

140,000-190,000

1.5 million

$300 billion

15 out of 17

€250 billion

50-60%

Hadoop: The most visible face of Big Data

HDInsight: Visit HadoopOnAzure.com

Healthcare and Big Data

Healthcare and IT

Often the laggard in technology

Yet application of IT to healthcare can radically change what we can do

Genomic Sequencing

Proteomic sequencing

Incidence Prediction

Healthcare Big Data Example Scenarios

Clinical Trial Deviations

Originally, Viagra was developed to lower blood pressure and treat angina

Now it's used to help newborn pulmonary hypertension and altitude sickness

Incidence Prediction

Missed 4 or more visits: twice as likely to have an asthmatic incident

A particular cardiac monitor sine wave points to a high likelihood of heart attack

Campaigns

Social media and advertising campaigns to understand user behavior and sentiment

Patient Satisfaction

Social media and advertising campaigns to understand user behavior and sentiment

BIDMC Auditing Scenario

Auditing is a critical component of HIPAA in ensuring patient privacy

1 billion+ rows of audit data

146 mission critical clinical applications

Comprehensive audits yield 300-500k transactions/day

HIPAA requires audit system with 20 years of data

Auditing Project

Available to community as part of Compliance SDK

Updating for SQL Server 2012, HDInsight, Power View, and MobileBI*

Creating an enterprise tool for consolidated storage, reporting and alerting of all application audit data - that's cool!

John Halamka’s Cool Technology of the Week

(Wellsphere Top Health Blogger, Health Impact Award)
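
As a rough check on the audit volumes quoted above, the following Python sketch multiplies the daily transaction range by the 20-year retention period. The constant daily rate, the 365-day year, and the function name `rows_retained` are assumptions made for illustration; they are not taken from the Compliance SDK.

```python
def rows_retained(transactions_per_day: int, years: int, days_per_year: int = 365) -> int:
    """Estimate audit rows accumulated over a retention period.

    Assumes a constant daily rate, which is a simplification.
    """
    return transactions_per_day * days_per_year * years


# Daily range quoted on the slide: 300-500k transactions/day, kept for 20 years.
low = rows_retained(300_000, 20)
high = rows_retained(500_000, 20)
print(f"{low:,} to {high:,} rows")
```

Under these assumptions the estimate lands in the low billions of rows, which is in line with the billion-row scale mentioned on the slide.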

BIDMC Compliance Project

HDInsight Windows

HDInsight Azure

SQL Server 2008/2012

Audit Logs

ETL Logs to HDFS

Use Excel 2013 PowerPivot and Power View

SSAS (tabular)

Auditing Sensitive Information

Querying Audit Information: Use PowerPivot / Power View / Analysis Services to query the data.

Security Information

Policy Information

Process Audit Information: Use SSIS to process SQL2008 All-Actions Audit Information and other CG application audit log data; potentially can use Management Performance DW framework.

Caregroup Environment

File Server

SQL Audit

Connect/Logic

SSIS

CG Application Data

Intersystems Cache

SQL2005

Oracle

SQL2008 All-Actions Audit Data

SQL 2008 / 2012 R2

SSRS 2008 /Power View

Policy Analysis

Policy Reports

Policy Best Practices

Security Analysis

Security Reports

Compliance Reports

Feedback Action Loop: Update systems to keep them compliant and secure

Audit Logs

Storage Infrastructure

Transfer files to ASV via AzCopy, CloudExplorer, etc.

Hadoop on Azure

Compute Nodes (Medium VMs)

Azure Storage Vault (ASV)

Azure Blob Storage

Azure Flat Network Storage

Stream data to compute

Push data back to storage

map, sort, shuffle, reduce

http://dennyglee.com/2013/03/18/why-use-blob-storage-with-hdinsight-on-azure/
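
The map, sort, shuffle, reduce sequence named above can be illustrated with a toy, single-process Python sketch. This is not how Hadoop or HDInsight is implemented; it only mirrors the order of the phases. The `user,action` line layout and the sample rows are hypothetical, since the slides do not show the audit log format.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(lines):
    """Emit (user, 1) for each audit line; the 'user,action' layout is hypothetical."""
    for line in lines:
        user, _action = line.split(",", 1)
        yield user.strip(), 1


def shuffle_phase(pairs):
    """Sort the pairs by key, then group the values that share a key."""
    ordered = sorted(pairs, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        yield key, [value for _key, value in group]


def reduce_phase(grouped):
    """Sum the grouped values for each key."""
    return {key: sum(values) for key, values in grouped}


def count_events_per_user(lines):
    """Run the phases in order: map, then sort/shuffle, then reduce."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


sample = ["alice,SELECT", "bob,UPDATE", "alice,DELETE"]  # made-up rows
print(count_events_per_user(sample))
```

In a real cluster the map and reduce phases run on separate compute nodes and the shuffle moves data between them over the network, which is why the location of the storage layer matters.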

SSIS Processing

SSIS to SSAS Partition Management

SSAS Tabular of HoA Audit Data

Hadoop / Auditing: File sizes

Currently testing gz vs. raw

E.g. 12 MB raw text file vs. 633 KB gz file (~20x compression)

20x smaller size, ~same query time

Approx same map / reduce task utilization

File Size is 250MB-1GB

SSIS package takes care of the size

Future testing: avro, protobuf

Query                                      Duration (s)
select count(*) from sql_audit_asv_raw     56.066
select count(*) from sql_audit_asv_gz      58.994
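
To get a feel for the raw vs. gz comparison above, here is a minimal Python sketch that measures a gzip compression ratio with the standard library. The sample text is made up and highly repetitive, so its ratio will not match the ~20x observed on the real audit files; the function name `gzip_ratio` is also only illustrative.

```python
import gzip


def gzip_ratio(raw: bytes) -> float:
    """Return the raw size divided by the gzip-compressed size."""
    compressed = gzip.compress(raw)
    return len(raw) / len(compressed)


# Made-up, highly repetitive audit-style text; real logs will compress differently.
sample = b"2013-04-10 12:00:00,user1,select col1, col2 from tableA\n" * 10_000
ratio = gzip_ratio(sample)
print(f"{len(sample)} bytes raw, about {ratio:.0f}x smaller after gzip")
```

The ratio depends heavily on how repetitive the log content is, so it is worth measuring on a representative file before settling on a format.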

Hadoop / Auditing: Formats

For ease of processing, replace carriage returns within embedded SQL statements, e.g.

select col1, col2

from tableA

to

select col1, col2 from tableA

This allows you to create a Hive table using CR as the row delimiter (i.e. does not have things like SQL quoted identifiers)
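
The replacement described above could be sketched in Python as follows. This is an assumption about one way to do it, not the SSIS logic the project used; the function name `flatten_sql` is hypothetical.

```python
import re


def flatten_sql(statement: str) -> str:
    """Collapse each line break (CR, LF, or CRLF) and adjacent whitespace into one space."""
    return re.sub(r"\s*[\r\n]+\s*", " ", statement).strip()


print(flatten_sql("select col1, col2\r\nfrom tableA"))
```

Note that this simple approach also rewrites line breaks inside string literals, and a `--` line comment in the original statement would swallow everything after it once the lines are joined, so real audit text may need extra handling.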

BI Connectivity

SQOOP, HiveODBC, Templeton, CSV, etc.

Big Data … Excel-lerated!

2 servers, 3 months

110 GB binary files

SSIS extraction

1.2 GB of text

120 MB gz

Hadoop to PowerPivot

6 MB
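
The size reductions listed above can be turned into step-by-step factors with a short Python sketch. It assumes decimal units (1 GB = 1000 MB) and assumes the sizes belong to consecutive stages in the order shown; the stage labels are paraphrased for illustration.

```python
# Sizes quoted on the slide, in megabytes (assumes 1 GB = 1000 MB).
stages = [
    ("binary files", 110_000),
    ("text after SSIS extraction", 1_200),
    ("gz", 120),
    ("PowerPivot", 6),
]


def reduction_factors(stages):
    """Return (from_stage, to_stage, factor) for each consecutive pair of stages."""
    return [
        (a_name, b_name, a_size / b_size)
        for (a_name, a_size), (b_name, b_size) in zip(stages, stages[1:])
    ]


for src, dst, factor in reduction_factors(stages):
    print(f"{src} -> {dst}: about {factor:.0f}x smaller")
print(f"overall: about {stages[0][1] / stages[-1][1]:,.0f}x smaller")
```

Most of the reduction in this reading comes from the first step, where the binary audit files are extracted to text.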

PowerPivot workbook of HoA Audit data

Power View of HoA Audit Data

Thank you!

Diamond Sponsor