March 23-24, 2017
Doing It Right
SYMPOSIUM
The IBM Reference Architecture for
Healthcare and Life Sciences
Janis Landry-Lane, IBM Systems Group
World Wide Program [email protected]
Big Data Symposium – March 24, 2017
The Era of Genomics Represents “BIG DATA”
A New Era of Precision Healthcare
Green, ED et al (2011). Charting a course for genomic medicine from base pairs to bedside. Nature 470: 204-213
Key Client Interests
Clinical Genomics: What does my patient’s genomic information tell me about the treatment I should select?
Completion of the Human Genome Project in 2003 led to an expansion of research on the contributions of genomics in disease diagnosis, treatment, and prevention
Early Discovery: What biological or environmental factors are causing disease? Can we design diagnostics and drugs to improve patient outcomes?
University Research / Pharmaceutical R&D | Hospital Systems
Key Technical IT Challenges
• Data Silos
• Complex Workload
• Big Data
• Evolving Frameworks & Databases
• International Collaboration
Reference Architecture for Healthcare & Life Science Analytics
[Diagram: layered architecture, top to bottom]
• Applications & Frameworks: Industry Applications
• Software-Defined Infrastructure
  • Workload Orchestration: Optimize utilization of compute resources across the enterprise
  • Enterprise Data Management: Improve data access and optimize storage utilization across the enterprise
• Data Repositories and Databases
• Compute & Storage Servers: Disk, Tape, Flash, POWER, x86, VM
• IT Administration
• On / Off premises – Hybrid Cloud
Reference Architecture for Healthcare & Life Science Analytics
[Diagram: the same layered architecture with products and example components]
• Sample Workloads: Genomic Analysis, Image Analysis, Clinical Informatics, Cognitive Analytics
• Workload Orchestration / Workload Management (IBM Spectrum Computing): Workflow Admin, Workload Monitoring, Resource Allocation, Cluster Provisioning, Metadata Collection
• Enterprise Data Management (IBM Spectrum Storage): Information Lifecycle Management, High-Performance I/O, Data Sharing, Metadata Collection
• Data access interfaces: POSIX, HDFS, Object
• Big Data Repository: EMR, EDW / UDMH, LIMS, CPOE, Omics DW, VNA, Knowledge Base, NGS, Ref Databases, Imaging / PACS, RIS, Literature, Ontologies
• Compute & Storage Servers: Disk, Tape, Flash, POWER, x86, VM
• IT Administration
• On / Off premises – Hybrid Cloud
[Diagram: hybrid cloud layout]
• On-premise infrastructure: On-Premise Cluster running Spectrum LSF Workloads, Spectrum Scale (GPFS), IBM Elastic Storage
• Cloud infrastructure: Cloud Resident Cluster, Spectrum Scale (GPFS), IBM Elastic Storage
• Links between the two sites: Secure VPN tunnel, IBM Aspera FASP, Spectrum Scale AFM
IBM designs a hybrid cloud architecture that supports seamless communication of workflows across on-premise and cloud environments
A Hybrid Cloud Architecture
IBM Products Offer Flexibility for Customers
[Diagram: IBM Platform Computing - Resource Orchestration and Monitoring]
Single platform for workload management for automated resource sharing
• Schedulers and frameworks: Platform Symphony Scheduler, Spectrum Conductor for Spark, Spectrum LSF Scheduler, Platform Process Manager
• Workload types: Platform Symphony Applications, Hadoop MapReduce Apps, Spark Applications, MPI/Batch Applications, Workflows/Pipelines (shown in the diagram as example boxes labeled App1, App2, Job 1, Job 2)
Single File System for POSIX, NFS, HDFS access for efficient data sharing
• IBM Spectrum Scale File System / Data Store Connectors: POSIX, NFS, HDFS, Object
• Storage: Storage rich servers, Tape, Flash, Disk
Case Study #1: Major Genomics Provider in NY
Mission: Deliver analysis to support personalized treatment for individual cancer patients where the Standard of Care has failed
Requirements: A data architecture that provides Management, Resiliency, Scalability, Economics, and Long Term Retention
IBM provided:
• Spectrum Scale for performance, data management, and resiliency
• Spectrum Archive to support movement of 1.2 PB per month to tape
• A robust scheduler that supports multi-threaded processing streams to accelerate compute and can efficiently process unpredictable data streams
• The lowest overall cost of managing tens of PB of online data and an archive with site diversity
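The monthly volumes quoted for this site (1.2 PB per month to tape, and a 3 PB per month peak in the timeline) can be turned into average transfer rates. The following is a minimal sketch, assuming decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) and a 30-day month; neither assumption comes from the slides, and the function name `sustained_mb_per_s` is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def sustained_mb_per_s(pb_per_month: float, days_per_month: int = 30) -> float:
    """Convert a monthly data volume in PB to an average rate in MB/s.

    Assumes decimal units (1 PB = 1e15 bytes, 1 MB = 1e6 bytes) and a
    30-day month by default; both are illustrative assumptions.
    """
    bytes_per_month = pb_per_month * 1e15
    seconds_per_month = days_per_month * SECONDS_PER_DAY
    return bytes_per_month / seconds_per_month / 1e6


if __name__ == "__main__":
    # Volumes taken from the case study slides.
    for label, volume_pb in [("tape archive", 1.2), ("peak throughput", 3.0)]:
        rate = sustained_mb_per_s(volume_pb)
        print(f"{label}: {volume_pb} PB/month ~ {rate:.0f} MB/s")
```

Under these assumptions, 1.2 PB per month averages roughly 463 MB/s around the clock and 3 PB per month roughly 1,157 MB/s. These are averages only; because the slides describe the data streams as unpredictable, real infrastructure would need headroom above them.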
The Timeline: Major Genomics Center in NY
[Chart: Ingest Rate (TB/Month) on a 0 to 8,000 axis, plotted at Month 1, Month 4, Month 11, Month 21, and Month 24]
Initial Assessment
• 3 Spectrum Archive
• 3 Spectrum Scale
• TS4500/V7000
• 6 TS1150
• Inc Air Gap Security (one tape copy)
Revised Assessment
• Add 6 TS1150
• Add V7000 disk
Organic Growth
• No change
New Project Demands
• Add 2 Spectrum Archive
• Add 2 Spectrum Scale
• Add TS4500 Exp frame
• Add 8 TS1150
• Add V7000 + SSD for Metadata in Cluster
• 3 PB Peak Throughput/mo
Mission Critical (Planned)
• System critical production operation
• Site diversity
• Second site is at a related institution, with high bandwidth connectivity (Internet 2)
Chart annotations: Volume of Data Known & Growth Predictable; Value of Data Known; Variety of All Data Sources Not Known & Growth Unpredictable; Infrastructure Scales Quickly at Low $ Add; Infrastructure Scales Incrementally for Planned Growth; DR Extension; Site Diversity
Case Study #2: Alberta Children’s Hospital Research Inst.
Mission: Provide Precision Medicine for a variety of childhood diseases, including the “Care for Rare” consortium. Make available a robust platform for new discovery and model organisms.
Requirements: A cost-effective, scalable platform for supporting the “breaking down of silos”
IBM provided:
• Spectrum Scale for performance, data management, and resiliency with an archive
• Compute enhancements to support projects with the ability to aggregate data into a data model for advanced analytics
• A Global namespace that provided the ability to “break down silos” and share amongst many research groups
• A robust scheduler that supports both existing as well as additional compute paradigms
• An overall architecture with the ability to add incrementally
https://www.ibm.com/news/ca/en/2016/06/15/u599716r24585k82.html
Case Study #3: Mount Sinai School of Medicine
Mission: To achieve the most effective solution for genomic workloads without re-architecting the industry-standard software, we performed a rigorous analysis of usage statistics, benchmarks and available technologies to design a system for maximum throughput.
IBM provided:
• Spectrum Scale for performance, data management, and resiliency
• IBM Flash tier supported by Spectrum Scale for optimization of workflow performance, with this tier of SCRATCH to support small file sizes
• A robust scheduler that could handle 700,000 jobs in the queue
• Data management to move data from the Flash tier to disk as soon as the workflow completes
• A Global namespace to provide support for a large and growing faculty/staff user base
• An overall architecture with the ability to add incrementally
https://hpc.mssm.edu/files/SC15-BODE-paper.pdf
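The rule of moving data off the Flash tier as soon as a workflow completes can be illustrated with a small selection routine. This is a sketch of the decision logic only, not Spectrum Scale policy syntax or any IBM API; the `ScratchFile` type, the tier labels, and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class ScratchFile:
    path: str         # hypothetical path on the scratch tier
    tier: str         # "flash" or "disk" (illustrative labels)
    workflow_id: str  # workflow that produced the file


def select_for_migration(files: Iterable[ScratchFile],
                         completed: Set[str]) -> List[ScratchFile]:
    """Return flash-tier files whose producing workflow has completed."""
    return [f for f in files
            if f.tier == "flash" and f.workflow_id in completed]


def migrate(files: Iterable[ScratchFile], completed: Set[str]) -> List[str]:
    """Relabel the selected files as living on disk; return their paths.

    A real system would copy the data; this sketch only updates the label.
    """
    moved = []
    for f in select_for_migration(files, completed):
        f.tier = "disk"
        moved.append(f.path)
    return moved


if __name__ == "__main__":
    files = [
        ScratchFile("/scratch/run1/a.bam", "flash", "wf-1"),
        ScratchFile("/scratch/run2/b.bam", "flash", "wf-2"),
    ]
    # Only wf-1 has finished, so only its file leaves the flash tier.
    print(migrate(files, completed={"wf-1"}))
```

Keying the decision on workflow completion rather than file age keeps the flash tier free for active jobs, which matches the intent stated on the slide.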
Lessons Learned:
Customer Success is derived from:
• Managing explosive growth.
• Managing application performance, especially with clinical diagnostics and patients for whom the standard of care has failed.
• A robust data management/metadata search system that allows scientists to retrieve past data is a must, as they will ultimately update their research with new algorithms or need the data for reproducible results.
• A Global namespace that breaks down silos and provides a single copy of the data.
• A robust scheduler that supports both existing as well as additional compute paradigms is essential to enable sharing of a single IT environment with a myriad of applications.
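The metadata search lesson can be pictured as a small catalog that records where each past analysis lives and which pipeline version produced it. This is an illustrative sketch only; `RunRecord`, `Catalog`, and every field name are hypothetical and are not taken from any IBM product.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RunRecord:
    sample_id: str         # hypothetical sample identifier
    pipeline: str          # name of the analysis pipeline
    pipeline_version: str  # version used, needed for reproducibility
    path: str              # location of the results in the shared namespace


class Catalog:
    """In-memory metadata index over past analysis runs."""

    def __init__(self) -> None:
        self._records: List[RunRecord] = []

    def add(self, record: RunRecord) -> None:
        self._records.append(record)

    def find(self, **criteria: str) -> List[RunRecord]:
        """Return records whose fields equal every given criterion.

        Raises AttributeError if a criterion names a field that does not exist.
        """
        return [r for r in self._records
                if all(getattr(r, key) == value
                       for key, value in criteria.items())]


if __name__ == "__main__":
    catalog = Catalog()
    catalog.add(RunRecord("S001", "variant-calling", "1.0", "/data/S001/v1"))
    catalog.add(RunRecord("S001", "variant-calling", "2.0", "/data/S001/v2"))
    # Retrieve the earlier result for the same sample to compare algorithms.
    print(catalog.find(sample_id="S001", pipeline_version="1.0"))
```

Recording the pipeline version next to the path is what lets a scientist rerun a new algorithm against the same input and compare it with the earlier result.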
APPLICATIONS, they come and go, but ARCHITECTURE ENDURES……………