Bionimbus - An Overview (2010-v6)

An Overview of Bionimbus and the Open Cloud Consortium. Robert Grossman: Open Cloud Consortium; Institute for Genomics & Systems Biology, University of Chicago; Laboratory for Advanced Computing, University of Illinois at Chicago.

description

Bionimbus is an open source cloud based system for managing, analyzing and sharing genomic data.

Transcript of Bionimbus - An Overview (2010-v6)

Page 1: Bionimbus - An Overview (2010-v6)

An Overview of Bionimbus and the Open Cloud Consortium

Robert Grossman
Open Cloud Consortium

Institute for Genomics & Systems Biology
University of Chicago

Laboratory for Advanced Computing
University of Illinois at Chicago

Page 2: Bionimbus - An Overview (2010-v6)

Part 1. Bionimbus

www.bionimbus.org

Page 3: Bionimbus - An Overview (2010-v6)

Database Services

Analysis Pipelines & Re-analysis

Services

Web Portal & Widgets

Large Data Cloud Services

Ingestion Services

Elastic Cloud Services

Scalable data transport

Page 4: Bionimbus - An Overview (2010-v6)

Case Study 1: Cistrack

• Resource for cis-regulatory data.
• Integrates databases and large data clouds.
• Open source.
• Contains raw, intermediate, and analyzed data from approximately 300 experiments on Agilent, Affymetrix (Affy), and Solexa platforms.

Page 5: Bionimbus - An Overview (2010-v6)

Flynet Provides Web 2.0 Access to Cistrack

Page 6: Bionimbus - An Overview (2010-v6)

Cube is an Elastic Cloud For Re-analysis

Page 7: Bionimbus - An Overview (2010-v6)

Case Study 2: modENCODE Worm/Fly peak calling reanalysis

[Diagram: private cloud (Eucalyptus & Cube). Virtual machines (three App/OS stacks) run on hypervisors over racks of hardware. Other labels: Working Space; Simple Persistent Storage (GlusterFS); ftp; ssh.]

Page 8: Bionimbus - An Overview (2010-v6)

Hybrid Clouds

[Diagram: private / community cloud with virtual machines (three App/OS stacks) running on hypervisors over a hardware cluster. Public cloud: ami-efa24c86. Caption: Virtual machine containing (small) data & pipelines.]

Page 9: Bionimbus - An Overview (2010-v6)

Case Study 3

SNP concordance:

Alignment against gene models: 46%

TopHat alignment: 91%

71 rare, deleterious SNP genotypes were validated by Sequenom.

• Ran TopHat in Bionimbus using VMs and Cube.
• Total time went from 25 days to 1 day.
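The reduction in total time reported above can be expressed as a speedup factor. The following is a minimal sketch and not part of the original slides: the function names are illustrative, and the VM count of 32 in the usage lines is a hypothetical value chosen only to show how speedup relates to the number of workers (the slides do not state how many VMs were used).

```python
def speedup(baseline_days: float, parallel_days: float) -> float:
    """Return how many times faster the parallel run was than the baseline."""
    if parallel_days <= 0:
        raise ValueError("parallel_days must be positive")
    return baseline_days / parallel_days


def parallel_efficiency(speedup_factor: float, n_workers: int) -> float:
    """Return the fraction of ideal linear scaling that was achieved."""
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    return speedup_factor / n_workers


# Figures from the slide: 25 days before, 1 day after.
factor = speedup(25, 1)

# Hypothetical worker count, for illustration only.
assumed_vms = 32
efficiency = parallel_efficiency(factor, assumed_vms)

print(f"speedup: {factor:.0f}x")
print(f"efficiency with {assumed_vms} assumed VMs: {efficiency:.0%}")
```

A speedup of 25x implies at least 25 workers if the work parallelizes perfectly; any real run would need more, since efficiency is rarely 100%.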

Page 10: Bionimbus - An Overview (2010-v6)

Bionimbus Delivery Mechanisms

• Log in and use the Bionimbus cloud.
• Use Bionimbus Virtual Machine Images in a) your private cloud; b) the Bionimbus cloud; c) public clouds such as Amazon.

• Bionimbus is open source and you can build your own cloud (and interoperate with ours) (First release of integrated system 3Q 2010)

• Bionimbus data services for genomic data, even for large datasets

Page 11: Bionimbus - An Overview (2010-v6)

HPC. Goal: Minimize latency and control heat.

Large Data Clouds. Goal: Maximize data (with matching compute) and control cost.

Elastic Clouds. Goal: Minimize cost of virtualized machines & provide on-demand.

Page 12: Bionimbus - An Overview (2010-v6)

A successful cloud will…

• Persist & refresh data over the long term
• High speed network to move & share the data
• Web 2.0/3.0 user interface
• Compute services at the scale of a data center

Page 13: Bionimbus - An Overview (2010-v6)

Part 2. Open Cloud Consortium

www.opencloudconsortium.org


Page 14: Bionimbus - An Overview (2010-v6)

• 501(c)(3) not-for-profit corporation.
• Develops standards, interoperability frameworks, and reference implementations.
• Operates clouds.
• Develops benchmarks.
• One area of focus: bridge between private and public clouds.

www.opencloudconsortium.org

Page 15: Bionimbus - An Overview (2010-v6)

Operates Clouds

• 500 nodes
• 3000 cores
• 1.5+ PB
• Four data centers
• 10 Gbps
• Target to refresh 1/3 each year.

• Open Cloud Testbed
• Open Science Data Cloud
• Intercloud Testbed
• Cloud-based Disaster Relief Services
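The infrastructure figures above imply some per-node averages. The sketch below is a back-of-the-envelope calculation, not something stated on the slides. It assumes decimal units (1 PB = 1000 TB), treats "1.5+ PB" as a lower bound of 1.5 PB, and assumes resources are spread evenly across nodes, which the slides do not claim.

```python
from fractions import Fraction

# Figures taken from the slide.
NODES = 500
CORES = 3000
STORAGE_PB = Fraction(3, 2)          # "1.5+ PB", used here as a lower bound
REFRESH_FRACTION = Fraction(1, 3)    # target share of hardware refreshed yearly

# Assumption: decimal units, 1 PB = 1000 TB.
TB_PER_PB = 1000

cores_per_node = Fraction(CORES, NODES)
storage_tb_per_node = STORAGE_PB * TB_PER_PB / NODES
nodes_refreshed_per_year = NODES * REFRESH_FRACTION
years_for_full_refresh = 1 / REFRESH_FRACTION

print(f"average cores per node: {float(cores_per_node):.1f}")
print(f"average storage per node: at least {float(storage_tb_per_node):.1f} TB")
print(f"nodes refreshed per year: about {round(nodes_refreshed_per_year)}")
print(f"years to refresh everything: {years_for_full_refresh}")
```

Exact fractions are used so that the one-third refresh target does not pick up floating-point rounding.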

Page 16: Bionimbus - An Overview (2010-v6)

OCC Members

• Companies: Yahoo, Cisco, Aerospace Corp., Booz Allen Hamilton, InfoBlox, Open Data Group, Raytheon

• Universities: CalIT2, Johns Hopkins, MIT Lincoln Lab, Northwestern Univ., University of Chicago, University of Illinois at Chicago

• Government agencies: NASA


Page 17: Bionimbus - An Overview (2010-v6)

Open Cloud Consortium Perspective

• Vendor neutral
• Open, interoperable architecture
• Experiment at scale
• Operate infrastructure at the scale of a small data center
• Long term point of view (think like a library, not a cloud service provider)

• Think public, private & hybrid clouds

Page 18: Bionimbus - An Overview (2010-v6)

Raywulf rack

Condo Clouds

Page 19: Bionimbus - An Overview (2010-v6)

Open Cloud Testbed

Phase 2
• 9 racks
• 250+ nodes
• 1000+ cores
• 10+ Gb/s

MREN

CENIC Dragon

Hadoop, Sector/Sphere, Thrift, KVM VMs, Eucalyptus VMs

C-Wave

Page 20: Bionimbus - An Overview (2010-v6)

Open Science Data Cloud

Astronomical data
Biological data (Bionimbus)
Networking data
Image processing for disaster relief

Page 21: Bionimbus - An Overview (2010-v6)

Storage Services

Compute Services

Applications

Virtual Network Manager

Data Services

Network Transport

Virtual Machine Manager

Cloud Metadata Services

Identity Manager

IaaS

PaaS

Apps

Page 22: Bionimbus - An Overview (2010-v6)

Standards

Infrastructure as a Service
– Virtual Data Centers (VDC)
– Virtual Networks (VN)
– Virtual Machines (VM)

Platform as a Service
– Cloud Compute Services
– Data/Table Cloud Services
– Cloud Storage Services

Open Virtualization Format (OVF)

Open Cloud Computing Interface (OCCI)

SNIA Cloud Data Management Interface (CDMI)

Large Data Cloud Interoperability Framework

Page 23: Bionimbus - An Overview (2010-v6)

OCC Benchmarks

                      MalStone A   MalStone B
Large Data Cloud 1a   455m 13s     840m 50s
Large Data Cloud 1b   87m 29s      142m 32s
Large Data Cloud 2    33m 40s      43m 44s

There are surprises.
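To compare the benchmark timings above, it helps to convert them to seconds and look at ratios. The following is a minimal sketch, not part of the original slides: the helper name `to_seconds` and the choice of ratios are illustrative assumptions, and only the timing strings themselves come from the table.

```python
import re

# Timings copied from the slide: (MalStone A, MalStone B) per system.
TIMINGS = {
    "Large Data Cloud 1a": ("455m 13s", "840m 50s"),
    "Large Data Cloud 1b": ("87m 29s", "142m 32s"),
    "Large Data Cloud 2": ("33m 40s", "43m 44s"),
}


def to_seconds(text: str) -> int:
    """Convert a string such as '87m 29s' to a number of seconds."""
    match = re.fullmatch(r"(\d+)m\s*(\d+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = (int(part) for part in match.groups())
    return minutes * 60 + seconds


# Use the slowest MalStone A run as the reference for relative speed.
slowest_a = max(to_seconds(a) for a, _ in TIMINGS.values())

for name, (a_text, b_text) in TIMINGS.items():
    a_s, b_s = to_seconds(a_text), to_seconds(b_text)
    print(
        f"{name}: A={a_s}s, B={b_s}s, "
        f"B/A={b_s / a_s:.2f}, A speedup vs slowest={slowest_a / a_s:.2f}x"
    )
```

The B/A ratio shows how much extra time the second benchmark costs on each system, and the last column shows how far apart the systems are on MalStone A.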

Page 24: Bionimbus - An Overview (2010-v6)

Acknowledgements

Page 25: Bionimbus - An Overview (2010-v6)

Thank You

• For more information:
– www.bionimbus.org
– www.opencloudconsortium.org
– rgrossman.com (for research papers, etc.)