Bionimbus - Northwestern CGI Workshop 4-21-2011

Bionimbus: A Cloud-Based Infrastructure for Managing, Analyzing and Sharing Genomics Data

Robert Grossman

Institute for Genomics & Systems Biology (IGSB)

Computation Institute

University of Chicago

and Open Cloud Consortium

April 21, 2011

Background

Growth of Genomic Data

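[Figure: timeline of genomic data growth. Technologies: Sanger sequencing; microarray technology; 454 and Solexa sequencing. Milestones: 2001 HGP; 2003 ENCODE. Goals: sequence species; sequence environment; sequence everything. GenBank: 10^5, 10^8, 10^10. Computing: 2003 GFS; 2006 Hadoop; 2008. Source: Lincoln Stein]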

The Challenge is to Support Cubes of High Throughput Sequence Data

Perturb the environment

Different developmental stages

Each cell in the data cube can be a ChIP-chip, ChIP-seq, RNA-seq, movie, or other data set.

Different pathologies

We Have a Problem

• More and more of your colleagues produce so much data that they cannot easily manage, move, analyze and share it.

• Centers and large projects build their own infrastructure.

• Everyone else is on their own.

Part 1. Using Bionimbus

www.bionimbus.org

Bionimbus is a community cloud for storing, analyzing and sharing genomics and related data.

Enabling a broad community to utilize genome research

Bionimbus Cloud

Sequencing Partner or Center

Step 1. Prepare a Sample

Step 2. Log in to Bionimbus and get a Bionimbus Key.

Step 3. FedEx your sample to CGI.

Step 4. Log in to Bionimbus and view your data.

Step 5. Use Bionimbus to perform standard and custom pipelines.

Using the ability of Bionimbus to launch multiple virtual machines reduced this analysis from 25 days to 1 day.
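
As a rough illustration of where such a speedup comes from, the sketch below estimates wall-clock time when independent tasks are spread across virtual machines. It assumes the work splits into equal, independent tasks with no scheduling or data-transfer overhead; the function name, the task count and the durations are hypothetical and are not taken from the analysis described above.

```python
import math


def wall_clock_days(n_tasks: int, days_per_task: float, n_vms: int) -> float:
    """Estimate wall-clock days to run independent, equal-length tasks.

    Tasks run in rounds of at most `n_vms` at a time, so the estimate is
    the number of rounds times the duration of one task.
    """
    if n_tasks < 0 or n_vms < 1:
        raise ValueError("n_tasks must be >= 0 and n_vms must be >= 1")
    rounds = math.ceil(n_tasks / n_vms)
    return rounds * days_per_task


# Hypothetical workload: 25 one-day tasks (an assumption for illustration).
serial = wall_clock_days(25, 1.0, n_vms=1)     # a single VM
parallel = wall_clock_days(25, 1.0, n_vms=25)  # one VM per task
```

Under these assumptions the estimate falls in steps as VMs are added, because a partly filled final round still takes a full task duration.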

Bionimbus Private Cloud

Bionimbus Community

Bionimbus Private

Cloud XY, Amazon, dbGaP

CGI, Internal Sequencers

BID Generator

Step 1. Get Bionimbus ID (BID), assign project, private/community, public cloud, etc.

Step 2. Send sample to be sequenced.

Step 3a. Return raw reads.

Step 3b. Return variant calls, CNV, annotation…

Step 4. Secure data routing to appropriate cloud based upon BID.

Step 5. Cloud-based analysis using IGSB and 3rd party tools and applications.
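
The routing in Step 4 can be pictured as a lookup keyed on the BID. The following is a minimal sketch under stated assumptions, not the Bionimbus implementation: the `BidRecord` fields, the destination names, the `route` function and the sample BIDs are all hypothetical, chosen to mirror the options named in Step 1.

```python
from dataclasses import dataclass
from typing import Mapping

# Hypothetical destinations, mirroring the cloud types named in Step 1.
DESTINATIONS = {"private", "community", "public"}


@dataclass(frozen=True)
class BidRecord:
    """Illustrative metadata attached to a Bionimbus ID (BID)."""
    bid: str
    project: str
    destination: str  # expected to be one of DESTINATIONS


def route(bid: str, registry: Mapping[str, BidRecord]) -> str:
    """Return the cloud that data tagged with `bid` should be sent to.

    Unknown BIDs and unknown destinations are rejected rather than sent
    to a default, so data never lands in a cloud it was not assigned to.
    """
    record = registry.get(bid)
    if record is None:
        raise KeyError(f"unknown BID: {bid}")
    if record.destination not in DESTINATIONS:
        raise ValueError(f"unknown destination for {bid}: {record.destination}")
    return record.destination


# Example registry with made-up BIDs and projects.
registry = {
    "BID-0001": BidRecord("BID-0001", "project-a", "private"),
    "BID-0002": BidRecord("BID-0002", "project-b", "community"),
}
```

Failing closed on an unrecognized BID is a design choice of this sketch; it reflects the "secure" part of Step 4 rather than anything the slides specify.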

Part 2. Introduction to Clouds

Clouds provide on-demand computing and storage resources at the scale and with the reliability of a data center.

Computer scientists were caught by surprise.

What is a Cloud?

Software as a Service (SaaS)

What Else is a Cloud?

Infrastructure as a Service (IaaS)

Users get one or more virtual machines “on demand”

Are There Other Types of Clouds?

Hadoop was developed for processing Internet-scale data for ad targeting and related applications, but it is now used for processing genomics data and many other applications.

ad targeting
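
Hadoop implements the MapReduce programming model, in which a map step emits key-value pairs and a reduce step combines the values for each key. The sketch below imitates that model in plain Python on a single machine. It does not use Hadoop's API, and the k-mer counting task, the function names and the sample reads are assumptions chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_kmers(read: str, k: int) -> Iterator[Tuple[str, int]]:
    """Map step: emit (k-mer, 1) for every k-mer in one read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def count_kmers(reads: Iterable[str], k: int) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over all reads."""
    pairs = (pair for read in reads for pair in map_kmers(read, k))
    return reduce_counts(shuffle(pairs))


# Made-up reads, for illustration only.
counts = count_kmers(["ACGT", "CGTA"], k=2)
```

On a cluster, the map and reduce steps would run on many machines, each over its own partition of the reads, which is what makes the model suit large sequence data sets.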

What is new about clouds?

Scale is New

Elastic, On-Demand Computing with Usage Based Pricing Is New

1 computer in a rack for 120 hours

costs the same as

120 computers in three racks for 1 hour
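
The arithmetic behind this comparison is that usage-based pricing charges per machine-hour, so cost depends on the product of machines and hours rather than on either alone. A minimal sketch, assuming a flat hourly rate per machine with no minimum charge or volume discount; the `cost` function is hypothetical and the rate of 0.10 is a placeholder, not a quoted price.

```python
def cost(machines: int, hours: int, rate_per_machine_hour: float) -> float:
    """Cost under flat usage-based pricing: machines x hours x rate."""
    return machines * hours * rate_per_machine_hour


RATE = 0.10  # placeholder rate per machine-hour (assumption)

one_machine = cost(1, 120, RATE)    # 1 computer for 120 hours
many_machines = cost(120, 1, RATE)  # 120 computers for 1 hour
```

Both calls bill 120 machine-hours, so the totals match; what changes is that the second finishes in 1 hour instead of 120.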

Data center scale computing often leverages virtualization technologies.

Part 3. Some Bionimbus Cases

Case Study: Public Datasets in Bionimbus

Case Study: modENCODE

• Bionimbus is used to process the modENCODE data from the White lab (over 1000 experiments).

• Bionimbus VMs were used for some of the integrative analysis.

• Bionimbus is used as a backup for the modENCODE DCC.

>300 ChIP datasets: Chromatin/RNA timecourse, CBP, PolII, Pho/silencers, HDACs, Insulators, TFs

Predictions: 537 silencers; 2,307 new promoters; 12,285 enhancers; 14,145 insulators

www.modencode.org

www.cistrack.org

Negre et al. Nature 2011

Case Study: IGSB

• All samples processed by the Institute for Genomics & Systems Biology High-Throughput Genome Analysis Core (HGAC) at the University of Chicago use Bionimbus.

Bionimbus Virtual Machine Releases

Peak Calling: MAT, MA2C, PeakSeq, MACS, SPP

Quality Control: Various

Alignment & Genotyping: Bowtie, TopHat, Samtools, Picard

Part 4. Data Centers for Science

experimental science

simulation science

data science

1609: 30x

1670: 250x

1976: 10x-100x

2004: 10x-100x

Astronomical data

Biological data (Bionimbus)

NSF-PIRE OSDC Data Challenge

Earth science data (& disaster relief)

Open Science Data Cloud

The goal is to build a data center in Chicago for biological, scientific, medical and health care data in 4 to 5 years.

Part 5. More About Bionimbus

Database Services (PostgreSQL)

Analysis Pipelines & Re-analysis Services

GWT-based Front End

Large Data Cloud Services (Hadoop, Sector/Sphere)

Data Ingestion Services

Elastic Cloud Services (Eucalyptus, OpenStack)

Intercloud Services

(IDs, etc.) (UDT, replication)

Bionimbus Deployment Options

Bionimbus Community Cloud (www.bionimbus.org)

Bionimbus AMIs & Amazon hosted applications

Bionimbus Private Clouds

A successful cloud will…

1. Provide long-term persistent storage services at the scale of a data center.

2. Provide compute services at the scale of a data center.

3. Provide high-performance ingestion and transport of data.

4. Support the liberation of data.

5. Peer with public clouds.

6. Peer with private genomics clouds.

Bionimbus satisfies each of these six requirements.

Bionimbus Road Map

Over the next 3 to 4 months, we will:

• Launch Bionimbus (we are in pre-launch)

• Add Galaxy-based workflow to Bionimbus

• Add secure routing of genomes

• Add more public datasets

• Add more pipelines

For More Information

www.bionimbus.org
