Transitioning from Traditional DW to Apache® Spark™ in Operating Room Predictive Modeling


Transitioning from Traditional DW to Spark in OR Predictive Modeling

Ayad Shammout and Denny Lee
October 21st, 2015

About Ayad Shammout

• Director of Business Intelligence, Beth Israel Deaconess Medical Center

• Helped build the business intelligence and highly available / disaster recovery infrastructure for BIDMC


About Denny Lee

• Technology Evangelist, Databricks

• Former Sr. Director of Data Sciences Eng, Concur

• Helped bring Hadoop onto Windows and Azure


We are Databricks, the company behind Spark

Founded by the creators of Apache Spark in 2013

Share of Spark code contributed by Databricks in 2014: 75%


Created Databricks on top of Spark to make big data simple.

Why is Operating Room Scheduling Predictive Modeling Important?


$15-$20 per minute for a basic surgical procedure (an idle 10-hour block therefore represents roughly $9,000-$12,000 of OR capacity)

Time is an OR's most valuable resource

Lack of OR availability means loss of patients

OR efficiency depends on OR staffing and allocation (8, 10, 13, or 16 hours), not on the workload (i.e., the number of cases)


“You are not going to get the elephant to shrink or change its size. You need to face the fact that the elephant is 8 ORs tall and 11 hours wide.”

Steven Shafer, MD


Operating Room
• Better utilization = better profit margins
• Reduced support and maintenance costs

Medical Staff
• Better utilization = better profit margins
• Better medical staff efficiencies = better outcomes

Patients
• Shorter wait times and fewer cancellations
• Better medical staff efficiencies = better outcomes

Develop Predictive Model

• Develop a predictive model that would identify available OR time 15 business days in advance.

• Allows us to confirm wait-list cases two weeks in advance, instead of when the blocks normally release four days out.


Forecast OR Schedule

• Case load 15 business days in advance

• Book more cases weeks in advance to prevent under-utilization

• Reduce staff overtime and idle time


Background

• Three surgical groups:
  • GYN, urology, general surgery, colorectal, surgical oncology
  • Eyes, plastics, ENT
  • Orthopedics, podiatry
• Currently built using SQL Server Data Mining


Using Traditional Data Warehousing Techniques

Traditional data warehousing & data mining OR predictive model:

Data Sources → OR DW → SSAS Data Mining → OR Prediction DB → OR Reports

• Data inserts into the OR DW every 3 hours
• Mining model processed every 3 hours
• Prediction results written to the OR Prediction DB for reporting


Original Design

• Multiple data sources pushing data into SQL Server and SQL Server Analysis Services (SSAS) Data Mining

• Hand-built 225 different data mining models (5 days × 15 business days ahead × 3 surgical groups)

• Pipeline process had to run 225 times per day (3 pools × 75 models)


Regression Calculations

Statistics computed in both SSAS Data Mining and T-SQL code:

• Intercept
• Mean
• Coefficients
• Variance
• R²
• Adjusted R²
• Standard Deviation
• Standard Error

Taking advantage of Spark’s DW Capabilities and MLlib

OR predictive model in Spark:

Data Sources → OR DW → OR Predictive Model in Spark → OR Reports

• Data inserts every 3 hours

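To make the Spark-side pipeline concrete, here is a minimal sketch of the 3-hourly load path, assuming a Spark 1.x SQLContext (current at the time of this talk) and hypothetical paths, table, and column names (/landing/..., /warehouse/..., or_cases, surgical_group, block_minutes); the talk itself does not show this code:

```scala
// Minimal sketch (Spark 1.x, hypothetical paths and names): land each
// 3-hour extract as Parquet, then expose it to Spark SQL for reporting.
// `sqlContext` is the SQLContext provided by the shell or notebook.
val batch = sqlContext.read.json("/landing/or_cases/latest/")
batch.write.mode("append").parquet("/warehouse/or_dw/cases")

// Register the warehouse table for ad hoc OR reporting queries.
sqlContext.read.parquet("/warehouse/or_dw/cases").registerTempTable("or_cases")
val utilization = sqlContext.sql(
  """SELECT surgical_group, SUM(block_minutes) AS booked_minutes
     FROM or_cases
     GROUP BY surgical_group""")
```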

Demo: OR Block Scheduling
Extract history data and run linear regression with SGD over multiple variables (see the sketch below).
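A minimal sketch of what the demo’s training step could look like with Spark 1.x MLlib’s LinearRegressionWithSGD; the or_history table and its feature columns are assumptions for illustration, not the actual notebook from the talk:

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.{LabeledPoint, LinearRegressionWithSGD}

// Hypothetical history table: the label is OR minutes actually used,
// the features are scheduling variables known 15 business days ahead.
val history = sqlContext.sql(
  "SELECT used_minutes, day_of_week, block_hours, days_ahead FROM or_history")

// Turn each row into a labeled point (assumes numeric columns).
val points = history.map { row =>
  LabeledPoint(
    row.getDouble(0),
    Vectors.dense(row.getDouble(1), row.getDouble(2), row.getDouble(3)))
}.cache()

// Multivariate linear regression trained with stochastic gradient descent:
// 100 iterations, step size 0.01.
val model = LinearRegressionWithSGD.train(points, 100, 0.01)

// Predict used minutes for one upcoming block (e.g. Tuesday, 10 h, 15 days out).
val predictedMinutes = model.predict(Vectors.dense(2.0, 10.0, 15.0))
```

With SGD, step size and feature scaling matter; in practice the features would be standardized before training.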

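MLlib can also report most of the regression statistics that the Regression Calculations slide above shows being computed by hand in T-SQL. A sketch, reusing `model` and `points` from the training sketch:

```scala
import org.apache.spark.mllib.evaluation.RegressionMetrics

// Pair each prediction with its observed label.
val predictionAndLabel = points.map(p => (model.predict(p.features), p.label))
val metrics = new RegressionMetrics(predictionAndLabel)

println(s"R2        = ${metrics.r2}")
println(s"RMSE      = ${metrics.rootMeanSquaredError}")
println(s"Intercept = ${model.intercept}")  // 0.0 unless the trainer fits an intercept
println(s"Weights   = ${model.weights}")    // per-feature coefficients
```

Adjusted R² and standard errors are not built into RegressionMetrics and would still take a few extra lines of arithmetic.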

OR Schedule Report (example)


Why the model is working

• Can coordinate waitlist scheduling logistics with physicians and patients within two weeks of the surgery

• Plan staff scheduling and resources so there are fewer last-minute staffing issues for nursing and anesthesia

• Utilization metrics are showing us where we can maximize our elective surgical schedule and level demand

Key Learnings when Migrating from Traditional DW to Spark


Transitioning to the CloudBeth Israel Deaconess Medical Center is increasingly moving to cloud infrastructure services with the hopes of closing its data center when the hospital's lease is up in the next five years. CIO John Halamka says he's decommissioning HP and Dell servers as he moves more of his compute workloads to Amazon Web Services, where he's currently using 30 virtual machines to test and develop new applications. "It is no longer cost effective to deal with server hosting ourselves because our challenge isn't real estate, it's power and cooling," he says.


Transitioning to the Cloud

• Need time for engineers, analysts, and data scientists to learn how to build for the cloud

• Build for security right from the start: process-heavy, with a lot of documentation, audits, and reviews

• Differentiate between data engineers and software engineers (REST APIs, services, elasticity, etc.)


Transitioning to Spark

• No more stored procedures or indexes
  • Good for Spark SQL and services design
• Prototype, prototype, prototype
• Leverage existing languages and skill sets
• Leverage the MOOCs and other Spark training
• Break down the silos of data engineers, engineers, data scientists, and analysts


Transitioning DW to Spark

• Understand partitioning, broadcast joins, and Parquet (see the sketch after this list)

• Not all Hive functions are available in Spark’s HiveContext (99% of the time that is okay)

• Don’t limit yourself to building star schemas / snowflake schemas

• Expand outside of traditional DW: machine learning, streaming
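As an illustration of the partitioning and broadcast-join points, a sketch with hypothetical warehouse paths and column names (the broadcast hint lives in org.apache.spark.sql.functions as of Spark 1.5):

```scala
import org.apache.spark.sql.functions.broadcast

// Hypothetical tables: a large fact table of OR cases and a small room
// dimension. Broadcasting the small side ships it to every executor,
// so the large table is joined without a shuffle.
val cases = sqlContext.read.parquet("/warehouse/or_dw/cases")  // large fact table
val rooms = sqlContext.read.parquet("/warehouse/or_dw/rooms")  // small dimension
val joined = cases.join(broadcast(rooms), "room_id")

// Partition the Parquet output on a commonly filtered column so later
// queries prune whole directories instead of scanning everything.
joined.write.partitionBy("surgical_group").parquet("/warehouse/or_dw/cases_by_group")
```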

Thank you. For more information, please contact ayad.shammout@hotmail.com or denny@databricks.com.