Galaxy History: Genome Informatics 2008

  • Galaxy
      • James Taylor, Emory University
  • Galaxy?
  • Galaxy goals
      • Making large-scale computational analysis more accessible
      • Facilitating transparent analysis
      • Ensuring that analyses are reproducible
  • What Galaxy provides
      • An open-source framework for integrating various computational tools and databases into a cohesive workspace
      • A web-based service we provide, integrating many popular tools and resources for comparative genomics
      • A completely self-contained application for building your own Galaxy-style sites
  • So, what about all this data?
  • Tool suites
  • What is a Galaxy Tool?
      • The basic unit of analysis in Galaxy
      • A program, script, external web resource, whatever...
      • Adapted to a standard structured interface
          • Parameters, data inputs, data outputs
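The idea of adapting an arbitrary program to a standard interface can be sketched in a few lines of Python. This is a minimal illustration, not Galaxy's actual tool definition format: the `Tool` class, the `execute` method, and the `length_filter` example are all hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Tool:
    """Hypothetical wrapper: declares parameters, data inputs, and data outputs."""
    name: str
    parameters: List[str]
    inputs: List[str]
    outputs: List[str]
    run: Callable[[Dict[str, object], Dict[str, object]], Dict[str, object]]

    def execute(self, params: Dict[str, object], inputs: Dict[str, object]) -> Dict[str, object]:
        # Check that every declared parameter and input was supplied.
        missing = [p for p in self.parameters if p not in params]
        missing += [i for i in self.inputs if i not in inputs]
        if missing:
            raise ValueError(f"{self.name}: missing {missing}")
        result = self.run(params, inputs)
        # Check that the wrapped program produced exactly the declared outputs.
        if set(result) != set(self.outputs):
            raise ValueError(f"{self.name}: expected outputs {self.outputs}, got {sorted(result)}")
        return result


def filter_by_length(params, inputs):
    # The wrapped "program": keep reads at least min_length long.
    return {"filtered": [r for r in inputs["reads"] if len(r) >= params["min_length"]]}


length_filter = Tool(
    name="length_filter",
    parameters=["min_length"],
    inputs=["reads"],
    outputs=["filtered"],
    run=filter_by_length,
)

result = length_filter.execute({"min_length": 4}, {"reads": ["ACGT", "AC", "ACGTA"]})
```

Because every tool exposes the same three declarations, a framework can build a form, validate a request, and record what was run without knowing anything else about the wrapped program.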
  • Short read sequence analysis
      • Analyzing read quality and filtering
      • Genomic analysis
          • Mapping against assembled genomes
          • Coverage, polymorphism, ...
      • Metagenomic analysis
          • Mapping against sequence databases
          • Taxonomy analysis, visualization, ...
  • Statistical Genetics
      • Quality control and filtering
      • Estimating ancestry and correction
      • Case control analysis
      • ...
  • Data and analysis management
  • The Galaxy History
  • Beyond the history
  • Beyond the History I: Workflows
  • Galaxy workflows
      • Abstract description of an analysis procedure
      • Essentially: what tools to run, and the flow of data between tools
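A workflow in this sense can be sketched as plain data: a list of steps, each naming a tool and where its inputs come from. The sketch below is an assumption-laden illustration, not Galaxy's workflow format; `run_workflow`, `drop_short`, and `count_reads` are hypothetical names, and steps are assumed to be listed in an order where each step's sources already exist.

```python
from typing import Callable, Dict, List, Tuple

# A step is (output name, tool function, names of the datasets it consumes).
Step = Tuple[str, Callable[..., object], List[str]]


def run_workflow(steps: List[Step], datasets: Dict[str, object]) -> Dict[str, object]:
    """Run each step in order, feeding earlier outputs into later steps."""
    results = dict(datasets)  # start from the user-supplied input datasets
    for output_name, tool, sources in steps:
        results[output_name] = tool(*[results[s] for s in sources])
    return results


def drop_short(reads):
    # Hypothetical tool: keep reads of length 4 or more.
    return [r for r in reads if len(r) >= 4]


def count_reads(reads):
    # Hypothetical tool: count the reads in a dataset.
    return len(reads)


# The workflow itself mentions no concrete data, only tools and data flow.
workflow: List[Step] = [
    ("filtered", drop_short, ["reads"]),
    ("n_reads", count_reads, ["filtered"]),
]

first = run_workflow(workflow, {"reads": ["ACGT", "AC", "ACGTA"]})
second = run_workflow(workflow, {"reads": ["A", "ACGTACGT"]})
```

The same `workflow` value is applied to two different inputs, which is the sense in which the description is abstract: it records the procedure, not a particular run.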
  • Beyond the History II: Data Libraries
  • Galaxy Data Libraries
      • Mechanism for storing and organizing shared datasets in a Galaxy instance
      • An instance can have many libraries, each containing datasets organized using folders as well as tags
      • Full type-specific metadata like any other dataset in Galaxy
  • Driving use cases
      • Large shared datasets
          • Genotype data
          • Sequencing reads
              • Direct from the instrument!
      • Data management for distributed projects
  • What about protected data?
  • Galaxy dataset security
      • Fine-grained access controls for Galaxy datasets
      • Different actions on datasets require different permissions
      • Users and groups are granted these permissions
      • Enforced throughout Galaxy
          • e.g. a History can still be shared, but access to individual datasets in the history is controlled
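The permission model described on this slide can be sketched as a per-dataset table mapping each action to the users and groups granted it. This is a minimal sketch under stated assumptions, not Galaxy's security implementation: the `User` and `Dataset` classes, the `access` and `modify` action names, the `user:`/`group:` prefixes, and the deny-by-default rule are all choices made for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class User:
    name: str
    groups: FrozenSet[str] = frozenset()

    def principals(self) -> Set[str]:
        # A user acts as themselves and as a member of each of their groups.
        return {f"user:{self.name}"} | {f"group:{g}" for g in self.groups}


@dataclass
class Dataset:
    name: str
    # Maps an action (e.g. "access", "modify") to the principals granted it.
    grants: Dict[str, Set[str]] = field(default_factory=dict)


def is_allowed(user: User, action: str, dataset: Dataset) -> bool:
    """Deny by default (an assumption of this sketch); allow on any matching grant."""
    return bool(user.principals() & dataset.grants.get(action, set()))


def visible_datasets(user: User, history: List[Dataset]) -> List[str]:
    """A shared history only reveals the datasets the viewer may access."""
    return [d.name for d in history if is_allowed(user, "access", d)]


alice = User("alice", frozenset({"clinical"}))
bob = User("bob")

history = [
    Dataset("public_reads", {"access": {"user:alice", "user:bob"}}),
    Dataset("genotypes", {"access": {"group:clinical"}, "modify": {"user:alice"}}),
]
```

Here both users can open the shared history, but `visible_datasets(bob, history)` omits `genotypes`, which mirrors the slide's example of sharing a History while still controlling access to individual datasets.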
  • Security customization
      • Authentication mechanism can be replaced, or can leverage a single sign-on mechanism (e.g. through a proxying web server)
      • Authorization provider can be customized or replaced
  • Completely integrated with analysis
      • Dataset restrictions propagate through an analysis
      • Analyses that combine datasets also combine their restrictions
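One way to picture this propagation is to attach a set of restrictions to every dataset and give each tool output the union of its inputs' restrictions. The sketch below assumes that reading of "combine"; it is not taken from Galaxy's code, and `Dataset`, `run_tool`, and `can_access` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable


@dataclass(frozen=True)
class Dataset:
    data: tuple
    # Labels a user must hold to see this dataset (assumed model).
    restrictions: FrozenSet[str] = frozenset()


def run_tool(tool: Callable[..., Iterable], *inputs: Dataset) -> Dataset:
    """Run a tool; the output inherits the union of all input restrictions."""
    combined = frozenset().union(*(d.restrictions for d in inputs))
    return Dataset(data=tuple(tool(*(d.data for d in inputs))), restrictions=combined)


def can_access(user_roles: FrozenSet[str], dataset: Dataset) -> bool:
    # The user must hold every restriction label carried by the dataset.
    return dataset.restrictions <= user_roles


study_a = Dataset((1, 2), frozenset({"study_a"}))
study_b = Dataset((3,), frozenset({"study_b"}))

# Concatenating the two datasets yields an output restricted by both studies.
merged = run_tool(lambda x, y: x + y, study_a, study_b)
# A single-input step simply carries its input's restrictions forward.
doubled = run_tool(lambda x: [v * 2 for v in x], study_a)
```

Under this model a user cleared only for `study_a` can see `doubled` but not `merged`, so a derived result is never more widely visible than the data it came from.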
  • Up next...
      • Libraries
          • Sequencer integration
          • Versioning
          • Tagging and annotation
          • Automatic workflow triggering
      • Security
          • Configurable adapters to different authorization providers (e.g. directory services)
  • Acknowledgements
      • Data and browser connections
          • UCSC
          • Biomart
          • GMOD
          • Intermine
      • Funding
          • National Science Foundation
          • Huck Institutes, Pennsylvania Dept. of Health
  • The Galaxy Team
      • Guru Ananda | Penn State
      • Dan Blankenberg | Penn State
      • Wen-Yu Chung | Penn State
      • Nate Coraor | Penn State
      • Greg Von Kuster | Penn State
      • Sergei Kosakovsky | UCSD
      • Ross Lazarus | Harvard MS
      • Anton Nekrutenko | Penn State
