Genomic Data Analysis Services Available for PL-Grid Users


Page 1: Genomic Data Analysis Services Available for PL-Grid Users

Domain-oriented services and resources of Polish Infrastructure for Supporting Computational Science in the European Research Space – PLGrid Plus

EUROPEAN UNION, EUROPEAN REGIONAL DEVELOPMENT FUND

INNOVATIVE ECONOMY, NATIONAL COHESION STRATEGY

Genomic Data Analysis Services Available for PL-Grid Users

Tomasz Waller, Tomasz Gubała, Kazimierz Murzyn

Academic Computer Centre Cyfronet AGH, cyfro.net
Klaster LifeScience Kraków, lifescience.pl

Recent Advances in Omics Research, Kraków, October 2014

Page 2: Genomic Data Analysis Services Available for PL-Grid Users

ACC Cyfronet AGH and PL-Grid Infrastructure

Academic Computer Centre Cyfronet AGH
• Established in 1973 (40 years of experience)
• Provides network, computational power and data storage capabilities for Polish science
• ~374 TFlops (Zeus, no. 175 on the TOP500 list), 2.5 PB (disks) and 3.5 PB (tapes)
• 1.7 PFlops (Prometheus) with 10 PB of disks, expected in the first half of 2015
• Regular and bigmem nodes, vSMP, GPGPU, FPGA, MPI over InfiniBand
• Details: http://kdm.cyfronet.pl/

PL-Grid Infrastructure for Polish science
• Five computing centers with Cyfronet as the consortium leader
• Total: ~588 TFlops and ~5.6 PB (disks), soon to grow considerably (see the Prometheus figures above)
• Available free of charge to all Polish scientists and their foreign collaborators
• Details: http://www.plgrid.pl

Page 3: Genomic Data Analysis Services Available for PL-Grid Users

Using PL-Grid Infrastructure

• Register at https://portal.plgrid.pl
• User verification process based on Polish OPI number
• Assistants and foreigners are confirmed by Polish PIs
• Variety of basic and higher level services available after login
  • Local SSH access, cloud computing, middlewares
• Considerable library of installed applications
  • GATK, MACS, SAMTools, Picard, TopHat, Bowtie, (p)BWA, R/Bioconductor, AutoDock/AutoGrid, BLAST, Clustal, CPMD, Gromacs, NAMD, Matlab, Mathematica …
• Free to compile and install own applications using the shell login
• Possibility to use own commercial licenses on HPC resources
• Specific services dedicated to the Life Science domain
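As a rough illustration of what shell access to the installed applications might look like in practice, the sketch below assembles the command lines for a minimal align, sort and index run with BWA and SAMtools. It is not taken from the presentation: the file names and the helper functions `alignment_steps` and `as_shell` are hypothetical, the tools are assumed to be on the PATH, the reference is assumed to be already indexed with `bwa index`, and the `samtools sort -o` form assumes SAMtools 1.3 or later. The code only builds the commands; it does not run them.

```python
import shlex


def alignment_steps(reference, reads, sample, threads=4):
    """Return (argv, stdout_path) pairs for a minimal align-sort-index run.

    Assumes the reference was already indexed with `bwa index`.
    """
    sam = f"{sample}.sam"
    bam = f"{sample}.sorted.bam"
    return [
        # bwa mem writes SAM to standard output, so it is redirected to a file
        (["bwa", "mem", "-t", str(threads), reference, reads], sam),
        # coordinate-sort the alignments into a BAM file (samtools >= 1.3 syntax)
        (["samtools", "sort", "-o", bam, sam], None),
        # build the .bai index next to the sorted BAM
        (["samtools", "index", bam], None),
    ]


def as_shell(steps):
    """Render the steps as shell command lines with safe quoting."""
    lines = []
    for argv, stdout_path in steps:
        line = shlex.join(argv)
        if stdout_path is not None:
            line += " > " + shlex.quote(stdout_path)
        lines.append(line)
    return lines


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration
    for line in as_shell(alignment_steps("ref.fa", "reads.fq", "sample1")):
        print(line)
```

Keeping each step as an argument list rather than a single string leaves the quoting to `shlex`, which matters once sample names or paths contain spaces.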

Page 4: Genomic Data Analysis Services Available for PL-Grid Users

DNA Microarray Integromics Analysis Platform (1/2)

• https://lifescience.plgrid.pl/
• For people who perform biological investigations using DNA microarrays
• Goal: help to analyze gene expression information and correlate it with other clinical data
• Analyses available now: normalization, clustering, SAM, t-test, GO-based enrichment, ANNs, PCA, panel filtering
• "Integromics" analyses in "beta" (testing) stage
  • CCA, PLS (gene expression and lipidomics)
  • Roleswitch, TargetScore (gene expression and miRNA)
• Still in continuous development (Pathways, EBI export etc.)
• Supported models: some Affymetrix, Agilent SurePrint (support for others can be added if there is demand)
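To make the t-test entry in the list of analyses more concrete, here is a minimal sketch of a per-gene two-group comparison using only the Python standard library. The slides do not say which t-test variant the platform applies, so Welch's unequal-variance statistic is used here purely as an assumption; the function names and the toy expression values are hypothetical, and no p-values or multiple-testing correction are computed.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two samples with possibly unequal variances.

    Each group needs at least two values; if both groups have zero
    variance the standard error is 0 and a ZeroDivisionError is raised.
    """
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


def rank_genes(expression, group_a_idx, group_b_idx):
    """Rank genes by |t|, largest first.

    `expression` maps a gene name to a list of expression values,
    one per sample; the index lists select the samples of each group.
    """
    scores = {}
    for gene, values in expression.items():
        a = [values[i] for i in group_a_idx]
        b = [values[i] for i in group_b_idx]
        scores[gene] = welch_t(a, b)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


if __name__ == "__main__":
    # Toy data: three samples per group, two hypothetical genes
    toy = {
        "GENE_A": [1.0, 2.0, 3.0, 2.0, 3.0, 4.0],
        "GENE_B": [1.0, 2.0, 3.0, 11.0, 12.0, 13.0],
    }
    for gene, t in rank_genes(toy, [0, 1, 2], [3, 4, 5]):
        print(gene, round(t, 3))
```

In a real analysis the statistic would be computed on normalized data and followed by a correction for testing thousands of genes at once.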

Page 5: Genomic Data Analysis Services Available for PL-Grid Users

DNA Microarray Integromics Analysis Platform (2/2)

Notable features
• Integration with EBI ArrayExpress (import, MIAME)
• Sharing experiments with others
• Importing own data for further analysis
• Supported languages: PL, EN

Manual: https://docs.cyfronet.pl/x/JpaZ

Cooperation
• Jagiellonian University Medical College, Kraków
• Medical University of Silesia, Katowice
• Institute of Oncology, Gliwice

Page 6: Genomic Data Analysis Services Available for PL-Grid Users

Agilent GeneSpring GX

• RDP: genespring.plgrid.pl
• Used with Windows Remote Desktop
• Integrated with the DNA Integromics Platform for uniform management of microarray files
• 5-year, single-seat license for all registered Polish scientists
• Manual: https://docs.cyfronet.pl/x/JIq1

Page 7: Genomic Data Analysis Services Available for PL-Grid Users

Galaxy NGS Server (1/4)

Page 8: Genomic Data Analysis Services Available for PL-Grid Users

Galaxy NGS Server (2/4)

• https://galaxy.plgrid.pl/
• "Galaxy is an open, web-based platform for data intensive biomedical research."
• Goal: deploy a high-performance, high-throughput NGS data analysis solution on top of HPC resources for PL-Grid users
• Needs a lot of adjustments and in-house add-on development
• Work started in December 2013 and is still at a beta stage, but the server is accessible to anyone willing to test and to help
• Planned integrated tools (list still open): GATK, SAMtools, Bowtie, TopHat, BWA, bedtools, Cufflinks, Picard, SnpEff/SnpSift, Flexbar, FastQC, MACS
• Targeted platforms: Illumina *Seq, Ion Proton, Roche 454

Page 9: Genomic Data Analysis Services Available for PL-Grid Users

Galaxy NGS Server (3/4)

Notable features
• Full integration with Zeus cluster and disk arrays
• PBS and MQ system for effective job queuing
• Secured environment (open for all PL-Grid users, not "public")
• All major Galaxy features (history, sharing, viewers)

Well documented workflows designed by NGS experts
• Basics (alignment and quality control, trimming, filtering)
• DNA-Seq, RNA-Seq, variant calling, SNP calling, methylation, exome analysis with annotations

Manual: https://docs.cyfronet.pl/x/voas

Cooperation
• Institute of Pharmacology, Polish Academy of Sciences, Kraków
• OMICRON, Jagiellonian University Medical College, Kraków
• National Research Institute of Animal Production, Kraków-Balice
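The PBS-based job queuing listed among the notable features can be pictured with a small sketch that renders a batch script for a single analysis step. This is not the Galaxy integration code used on Zeus, which the slides do not show. It assumes Torque-style resource directives (`nodes=1:ppn=N`); the helper `pbs_script`, the queue name and the example command are hypothetical.

```python
def pbs_script(job_name, command, queue, cores=1, walltime="01:00:00"):
    """Render a minimal PBS batch script as a string.

    Uses Torque-style resource syntax; other PBS flavours may expect
    a different `-l` format. The queue name is site-specific.
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",            # job name shown in the queue
        f"#PBS -q {queue}",               # target queue
        f"#PBS -l nodes=1:ppn={cores}",   # one node, `cores` cores on it
        f"#PBS -l walltime={walltime}",   # maximum run time, HH:MM:SS
        'cd "$PBS_O_WORKDIR"',            # run from the submission directory
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical queue and command, for illustration only
    print(pbs_script(
        "align_sample1",
        "bwa mem -t 8 ref.fa reads.fq > sample1.sam",
        queue="example_queue",
        cores=8,
        walltime="02:00:00",
    ))
```

A script produced this way would typically be saved to a file and submitted with `qsub`; a workflow server automates that submission and tracks job state on the user's behalf.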

Page 10: Genomic Data Analysis Services Available for PL-Grid Users

Galaxy NGS Server (4/4)

Current challenges
• Some security issues in the Galaxy code prevent the production deployment
• Cluster integration is there, yet rather unstable and prone to fail (quite an intricate contraption, it is)
• Broad variety of integrated tools and wrappers does not help

Call to action: who is needed
• Users: the bigger the community, the easier to make us visible
• Early adopters: tell us what you need, help us test and integrate the tools and workflows you use
• Programmers: if you'd like to help us bring a dedicated HPC-powered Galaxy for Polish scientists, any assistance is greatly appreciated
• Contact: [email protected]

Page 11: Genomic Data Analysis Services Available for PL-Grid Users

Links, Contact, Partners

These resources, services and tools (and much more) are available after registering with PL-Grid: https://portal.plgrid.pl/

PL-Grid User Manual
• https://docs.plgrid.pl/podrecznik_uzytkownika (PL)
• https://docs.plgrid.pl/display/PLGDoc/User+manual (EN)

Questions, problems, requests about PL-Grid
• https://helpdesk.plgrid.pl or [email protected]

Contact for LifeScience domain services
• [email protected]