An Overview of Bionimbus (March 2010)

An Overview of Bionimbus and the Open Cloud Consortium. Robert Grossman, Open Cloud Consortium; Institute for Genomics & Systems Biology, University of Chicago; Laboratory for Advanced Computing, University of Illinois at Chicago.


This is a talk I gave at NHGRI in March 2010.

Transcript of An Overview of Bionimbus (March 2010)

Page 1: An Overview of Bionimbus (March 2010)

An Overview of Bionimbus and the Open Cloud Consortium

Robert Grossman
Open Cloud Consortium

Institute for Genomics & Systems Biology
University of Chicago

Laboratory for Advanced Computing
University of Illinois at Chicago

Page 2: An Overview of Bionimbus (March 2010)

Part 1. Bionimbus

www.bionimbus.org

Page 3: An Overview of Bionimbus (March 2010)

Database Services

Analysis Pipelines & Re-analysis Services

Web Portal & Widgets

Large Data Cloud Services

Data Ingestion Services

Elastic Cloud Services

Scalable data transport

Page 4: An Overview of Bionimbus (March 2010)

Case Study 1: Cistrack

• Resource for cis-regulatory data.
• Integrates databases and large data clouds.
• Open source.
• Contains raw, intermediate, and analyzed data from approximately 300 experiments from Agilent, Affymetrix and Solexa platforms.

Page 5: An Overview of Bionimbus (March 2010)

Flynet Provides Web 2.0 Access to Cistrack

Page 6: An Overview of Bionimbus (March 2010)

Cube is an Elastic Cloud For Re-analysis

Page 7: An Overview of Bionimbus (March 2010)

Case Study 2

SNP concordance:

Alignment against gene models: 46%

TopHat alignment: 91%

71 rare, deleterious SNP genotypes were validated by Sequenom.

• Ran TopHat in Bionimbus using Cube-based VMs.
• Total time went from 25 days to 1 day.

Page 8: An Overview of Bionimbus (March 2010)

Case Study 3: modENCODE Worm/Fly peak calling reanalysis

[Figure: private cloud (Eucalyptus & Cube). Virtual machines (App/OS stacks) run on hypervisors over racks of hardware, with working space and simple persistent storage (glusterfs). Other labels: ftp, ssh.]

Page 9: An Overview of Bionimbus (March 2010)

Hybrid Clouds

[Figure: Bionimbus virtual machine images run in a private / community cloud (virtual machines as App/OS stacks on hypervisors over a hardware cluster) and in a public cloud (ami-efa24c86).]
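As a rough illustration of the public-cloud half of the hybrid setup, the sketch below assembles the parameters one might use to launch the image ID shown on the slide on Amazon EC2. The helper name `build_launch_request`, the instance type `m1.small`, and the region in the comment are assumptions for illustration, not details from the talk, and the image may no longer be available.

```python
def build_launch_request(image_id, instance_type, count=1):
    """Build a parameter dict for launching `count` instances of an EC2 image.

    Hypothetical helper: the key names follow the EC2 RunInstances request,
    but nothing here contacts AWS.
    """
    if not image_id.startswith("ami-"):
        raise ValueError(f"not an AMI id: {image_id!r}")
    if count < 1:
        raise ValueError("count must be at least 1")
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": count,
        "MaxCount": count,
    }


# Image ID from the slide; the instance type is an assumed placeholder.
request = build_launch_request("ami-efa24c86", "m1.small")
print(request)

# With AWS credentials configured, the dict could be passed to boto3
# (region is an assumption):
#   import boto3
#   boto3.client("ec2", region_name="us-east-1").run_instances(**request)
```

Keeping the request as plain data makes it easy to point the same image description at a private Eucalyptus endpoint or a public provider, which is the kind of portability the hybrid-cloud slide describes.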

Page 10: An Overview of Bionimbus (March 2010)

Bionimbus Delivery Mechanisms

• Log in and use the Bionimbus cloud.
• Use Bionimbus Virtual Machine Images in a) your private cloud; b) the Bionimbus cloud; c) public clouds such as Amazon.
• Bionimbus is open source and you can build your own cloud (and interoperate with ours). (First release of integrated system 3Q 2010.)
• Bionimbus data services for genomic data, even for large datasets.

Page 11: An Overview of Bionimbus (March 2010)

HPC goal: Minimize latency and control heat.

Large Data Clouds goal: Maximize data (with matching compute) and control cost.

Elastic Clouds goal: Minimize cost of virtualized machines & provide on-demand.

Page 12: An Overview of Bionimbus (March 2010)

A successful cloud will…

Persist & refresh data over the long term

High speed network to move & share the data

Web 2.0/3.0 user interface

Compute services at the scale of a data center.

Page 13: An Overview of Bionimbus (March 2010)

Part 2.

www.opencloudconsortium.org


Page 14: An Overview of Bionimbus (March 2010)

• 501(c)(3) not-for-profit corporation.
• Develops standards, interoperability frameworks, and reference implementations.
• Operates clouds.
• Develops benchmarks.
• One area of focus: bridge between private and public clouds.

www.opencloudconsortium.org

Page 15: An Overview of Bionimbus (March 2010)

Operates Clouds

• 500 nodes
• 3000 cores
• 1.5+ PB
• Four data centers
• 10 Gbps
• Target to refresh 1/3 each year.

• Open Cloud Testbed
• Open Science Data Cloud
• Cloud-based Disaster Relief Services
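The capacity figures above lend themselves to a quick back-of-the-envelope check. The sketch below derives per-node averages and the yearly refresh count, assuming identical nodes, decimal units (1 PB = 1000 TB), and the slide's "1.5+ PB" taken as a lower bound; none of these assumptions come from the talk itself.

```python
# Figures from the slide.
NODES = 500
CORES = 3000
STORAGE_PB = 1.5           # slide says "1.5+ PB", so treat this as a lower bound
REFRESH_FRACTION = 1 / 3   # "Target to refresh 1/3 each year"

# Averages, assuming all nodes are identical (an assumption for illustration).
cores_per_node = CORES / NODES
storage_tb_per_node = STORAGE_PB * 1000 / NODES   # 1 PB = 1000 TB (decimal)
nodes_refreshed_per_year = round(NODES * REFRESH_FRACTION)

print(f"cores per node:           {cores_per_node:.0f}")
print(f"storage per node (TB):    {storage_tb_per_node:.1f}")
print(f"nodes refreshed per year: {nodes_refreshed_per_year}")
```

Under these assumptions the averages come to 6 cores and at least 3 TB per node, with roughly 167 nodes replaced each year.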

Page 16: An Overview of Bionimbus (March 2010)

OCC Members

• Companies: Yahoo, Cisco, Aerospace Corp., Booz Allen Hamilton, InfoBlox, Open Data Group, Raytheon

• Universities: CalIT2, Johns Hopkins, Northwestern University, University of Chicago, University of Illinois at Chicago

• Government agencies: NASA


Page 17: An Overview of Bionimbus (March 2010)

Open Cloud Consortium Perspective

• Vendor neutral
• Open, interoperable architecture
• Experiment at scale
• Operate infrastructure at the scale of a small data center
• Long term point of view (think like a library not cloud service provider)
• Think public, private & hybrid clouds

Page 18: An Overview of Bionimbus (March 2010)

Raywulf rack

Condo Clouds

Page 19: An Overview of Bionimbus (March 2010)

Open Cloud Testbed

Phase 2
• 9 racks
• 250+ nodes
• 1000+ cores
• 10+ Gb/s

[Figure labels. Networks: MREN, CENIC, Dragon, C-Wave. Software: Hadoop, Sector/Sphere, Thrift, KVM VMs, Eucalyptus, Nova.]

Page 20: An Overview of Bionimbus (March 2010)

Open Science Data Cloud

Astronomical data

Biological data (Bionimbus)

Networking data

Image processing for disaster relief

Page 21: An Overview of Bionimbus (March 2010)

[Figure: layered cloud architecture with layers labeled IaaS, PaaS, and Apps. Components shown: Applications; Data Services; Storage Services; Compute Services; Cloud Metadata Services; Identity Manager; Virtual Machine Manager; Virtual Network Manager; Network Transport.]

Page 22: An Overview of Bionimbus (March 2010)

Standards

Infrastructure as a Service
– Virtual Data Centers (VDC)
– Virtual Networks (VN)
– Virtual Machines (VM)

Platform as a Service
– Cloud Compute Services
– Data/Table Cloud Services
– Cloud Storage Services

Open Virtualization Format (OVF)

Open Cloud Computing Interface (OCCI)

SNIA Cloud Data Management Interface (CDMI)

Large Data Cloud Interoperability Framework

Page 23: An Overview of Bionimbus (March 2010)

OCC Benchmarks

                       MalStone A    MalStone B
Large Data Cloud 1a    455m 13s      840m 50s
Large Data Cloud 1b    87m 29s       142m 32s
Large Data Cloud 2     33m 40s       43m 44s

There are surprises.
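To make the size of the gaps concrete, the sketch below converts the timings in the benchmark table to seconds and computes how much faster each configuration ran relative to Large Data Cloud 1a. The timings are copied from the slide; the helper names `to_seconds` and `speedup`, and the choice of 1a as the baseline, are assumptions for illustration.

```python
def to_seconds(minutes, seconds):
    """Convert a 'XXm YYs' timing to seconds."""
    return minutes * 60 + seconds


# Timings copied from the slide's benchmark table.
TIMINGS = {
    "Large Data Cloud 1a": {"MalStone A": to_seconds(455, 13),
                            "MalStone B": to_seconds(840, 50)},
    "Large Data Cloud 1b": {"MalStone A": to_seconds(87, 29),
                            "MalStone B": to_seconds(142, 32)},
    "Large Data Cloud 2":  {"MalStone A": to_seconds(33, 40),
                            "MalStone B": to_seconds(43, 44)},
}


def speedup(baseline, other, benchmark):
    """How many times faster `other` ran `benchmark` than `baseline`."""
    return TIMINGS[baseline][benchmark] / TIMINGS[other][benchmark]


for bench in ("MalStone A", "MalStone B"):
    for cloud in ("Large Data Cloud 1b", "Large Data Cloud 2"):
        ratio = speedup("Large Data Cloud 1a", cloud, bench)
        print(f"{bench}: {cloud} ran {ratio:.1f}x faster than Large Data Cloud 1a")
```

On these numbers, Large Data Cloud 2 ran MalStone A about 13.5 times faster and MalStone B about 19.2 times faster than Large Data Cloud 1a, while Large Data Cloud 1b was about 5.2 and 5.9 times faster respectively.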

Page 24: An Overview of Bionimbus (March 2010)

Acknowledgements

Page 25: An Overview of Bionimbus (March 2010)

Thank You

• For more information:
– www.bionimbus.org
– www.opencloudconsortium.org
– rgrossman.com (for research papers, etc.)