Ensuring Compliance of Patient Data with Big Data and BI · 2017-08-10 · Ensuring Compliance of...

31
Global Sponsor: Ensuring Compliance of Patient Data with Big Data and BI Denny Lee, Principal Program Manager at Microsoft Ayad Shammout, Principal Business Intelligence Consultant at BIDMC Big Data Innovations and Integration Track

Transcript of Ensuring Compliance of Patient Data with Big Data and BI · 2017-08-10 · Ensuring Compliance of...

Global Sponsor:

Ensuring Compliance of Patient Data with Big Data and BI

Denny Lee, Principal Program Manager at Microsoft Ayad Shammout, Principal Business Intelligence Consultant at BIDMC

Big Data Innovations and Integration Track

Agenda

A Quick Big Data Primer • Big Data on the Microsoft Platform by Andrew Brust • What is Big Data by Mark Whitehorn

Healthcare and Big Data Compliance and Auditing

• SQL Compliance Project

Compliance and Auditing with Big Data and BI • Big Data: Unstructured Volumes of Data • BI: PowerPivot, Power View

What is Big Data?

Volume Exceeds physical limits of vertical scalability

Velocity Decision window small compared to data change rate

Variety Many different formats makes integration expensive

Variability Many options or variable interpretations confound analysis

4

10x increase every five years

85% from new data types

Volume Velocity Variety

Hadoop

Cloud

By 2015, organizations that build a modern information management system will outperform their peers financially by 20 percent.

– Gartner, Mark Beyer “Information Management in the

21st Century”

Big Data Business Value

7

Data

Hadoop: The most visible face of Big Data

HDInsight: Visit HadoopOnAzure.com

9

Healthcare and Big Data

Healthcare and Big Data

• Often a laggard in technology • Yet, application of technology will be revolutionary to

understanding the human system • Genomic sequencing brings the promise of understanding

human biological systems • Proteomic sequencing brings the promise of building the

protein sequences to build customized drugs • Healthcare Incidence Prediction: Heart Attacks and Asthma

Healthcare Big Data Example Scenarios

• Clinical trials: not just examining existing drugs and efficacy, but also potential deviations • E.g. Originally Viagra was developed to lower blood pressure

and treat Angina; now it also helps with newborn pulmonary hypertension and altitude sickness

• Predicting healthcare incidences issues • Social media campaigns (e.g. advertising drugs) • Pharmaceutical campaign advertising analytics

• Modeling the consumer, trying to understand their user behavior (why are they purchasing this medication, how do they feel about their ailment, related behaviors, etc.)

• Patient Satisfaction Survey 12

Compliance

SQL Server Compliance: http://www.microsoft.com/sqlserver/2008/en/us/compliance.aspx Reaching Compliance Whitepaper: http://www.microsoft.com/en-us/download/details.aspx?id=6808

13

IT Control SOX

PCI

HIPA

A

GLB

A

ID Management

Separation of Duties

Encryption

Key Management

Auditing

Control Testing

Policy Management

Auditing: BIDMC Scenario

Auditing is critical component HIPAA compliance and ensuring patient privacy 1 Billion rows+ of audit data 146 mission critical clinical applications Comprehensive audits yield 300-500k transactions/day HIPAA requires audit system with 20 years of data

Auditing Project Available to community as part of Compliance SDK Collaboration of Caregroup, MCS, SQLCAT

Quote: Creating an enterprise tool for consolidated storage, reporting and alerting

of all application audit data - that's cool! John Halamka’s Cool Technology of the Week (Wellsphere Top Health

Blogger, Health Impact Award)

14

Compliance: Auditing

Audit specific users Typically want to do sysadmin But, many scenarios require auditing of more users because those users

have insert, update access Based on your policies

Audit specific tables Audit all tables that can only be modified or deemed as sensitive

Audit Objects Key and encryption access auditing (Audit action types:

DATABASE_OBJECT_ACCESS_GROUP and DATABASE_OBJECT_CHANGE_GROUP)

Audit everything approach Can grow quite quickly (i.e. lots of data) so may want to limit data Or have your audit reporting system filter out data you do not need

15

BIDMC Compliance Project

16

SSIS SSIS

SSIS

HDInsight Windows

HDInsight Azure

SQL Server 2008/2012 Audit Logs ETL Logs to HDFS

SSAS (tabular)

Use Excel 2013 PowerPivot and Power View.

Centralizing Audit Logs and Reporting

Centralizing Logs Allows you to have one system process all audit logs from your

servers Easier manageability Set files to 250MB in size (less files, but not too large to process) Optimized for Hadoop General Rule of Thumb: 250MB-1GB file sizes

Can also centralize processing … and centralize reporting

Compliance SDK contains the full project Organized by Server, Database, DDL, and DML actions

17

Auditing: Interesting Observations

Backup a user database: Need CREATE permissions on the master database to look at the

backup media The CREATE permission is a misnomer since you are not creating Nevertheless required to do a backup hence the RESTORE

LABELONLY statements in your audit

Server Principal Name is the user name A lot of VIEW SERVER STATE calls but is part of important server audit specification (may want to filter this out) Audit Logs can generate A LOT of data

• 2 medium servers generated 250GB of files in 6 hours!

18

Auditing Sensitive Information

19

Querying Audit InformationUse PowerPivot / Power View / Analysis Services to Query the data.

Security InformationPolicy Information

Process Audit InformationUse SSIS to process SQL2008 All-Actions Audit Information and other CG application audit log data; potentially can use Management Performance DW framework.

Caregroup Environment

File Server

SQL Audit

Connect/Logic

SSIS

CG Application Data

Intersystems Cache

SQL2005

Oracle

SQL2008 All-Actions Audit Data

SQL 2008 / 2012 R2

SSRS 2008 /Power View

Policy Analysis

Policy Reports

Policy Best Practices

Security Analysis

Security Reports

Compliance Reports

Feedback Action LoopUpdate systems to keep them

compliant and secure

Audit Logs

20

Storage Infrastructure

Transfer files to ASV via AzCopy, CloudExplorer, etc.

Storage Infrastructure

21

Hadoop on Azure Compute Nodes (Medium VMs)

Azure Storage Vault (ASV) Azure Blob Storage

Azure Flat Network Storage

22

SSIS Processing

Hadoop / Auditing: File sizes

Currently testing gz vs. raw, • E.g. 12MB raw text file vs. 633Kb gz file (~20x compression)

20x smaller size, ~same query time

• Approx same map / reduce task utilization

File Size is 250MB-1GB • SSIS package takes care of the size

Future testing: avro, protobuf

23

Query Duration (s)

select count(*) from sql_audit_asv_raw 56.066

select count(*) from sql_audit_asv_gz 58.994

Hadoop / Auditing: Formats

For ease of processing, replace carriage returns within embedded SQL statements, e.g. select col1, col2

from tableA

to select col1, col2 from tableA

This allows you to create a Hive table using CR as row delimiter (i.e. does not have things like SQL quoted identifiers)

24

25

SQOOP, HiveODBC, Templeton, CSV, etc

BI Connectivity

PowerPivot workbook of Audit data extracted from HoA via HiveODBC

PowerPivot workbook of Audit data extracted from HoA via HiveODBC

Power View workbook of Audit data extracted from HoA via HiveODBC

Global Sponsor:

Questions?

Global Sponsor:

Thank You for Attending