Jillian ms defense-4-14-14-ja-novideo

Bacterial Gene Neighborhood Investigation Environment: A Scalable Genome Visualization for Big Displays. Jillian Aurisano, Master of Science Defense, April 16, 2014


 

Transcript of Jillian ms defense-4-14-14-ja-novideo

Page 1: Jillian ms defense-4-14-14-ja-novideo

Bacterial Gene Neighborhood Investigation Environment: A Scalable Genome Visualization for Big Displays

Jillian Aurisano, Master of Science Defense

April 16, 2014

Page 2: Jillian ms defense-4-14-14-ja-novideo

Science has historically looked like this:

Page 3: Jillian ms defense-4-14-14-ja-novideo

Up until very recently

“Observations!”

Expertise

Explore

Collect samples, catch errors

Page 4: Jillian ms defense-4-14-14-ja-novideo

“No one looks under a microscope anymore. It's all DNA.”

How do scientists make discoveries?

Page 5: Jillian ms defense-4-14-14-ja-novideo

How do we bring experts into the loop?

• From direct collection of data, direct observation of results, direct interpretation and analysis

• To automated data collection, automated filtering and automated analysis

• Need visualization to bring experts into the loop

• But how do we handle big data?

• What’s our Big Data microscope?

“Picard: Computer, scan everything, run diagnostics, and tell us the answer.”

“Computer: Results are inconclusive.”

Page 6: Jillian ms defense-4-14-14-ja-novideo

Can Big Displays help?

• Evidence suggests that these environments can have a positive impact on perception and cognition

• But how do we use them to effectively address big data problems?

• Can existing visualizations simply be ‘scaled-up’ to fit, or are new approaches needed?

Page 7: Jillian ms defense-4-14-14-ja-novideo

In this thesis I will…

• Examine a specific big data visualization problem: comparative gene neighborhood analysis in bacterial genomics

• I worked closely over several years with a team of computational biologists

• This work has led to the design and implementation of a new visualization approach designed to scale to big data and big displays

BactoGeNIE (‘Bact(o)erial Gene Neighborhood Investigation Environment’)

Page 8: Jillian ms defense-4-14-14-ja-novideo

Outline

1) Describe comparative bacterial gene neighborhood analysis to understand how to bring experts into the loop

2) Examine potential impact of Big Displays on Big Data visualization

3) Evaluate scalability in existing comparative genomics visualizations

My work: BactoGeNIE

4/5/6) Describe my design, implementation, results

7) Think about the future

In the process, learn something about scaling up visual approaches to big data and big displays

Page 9: Jillian ms defense-4-14-14-ja-novideo

Warning: Biology is used in this thesis!

Page 10: Jillian ms defense-4-14-14-ja-novideo

Genome sequencing boom

• Sequencing costs decreasing faster than Moore’s Law

• So, we are able to produce massive volumes of sequence data

• Bacterial genomes are small, so we are generating thousands of complete bacterial genome sequences

Wetterstrand K.A., DNA Sequencing Costs: Data from the NHGRI Large-Scale Genome Sequencing Program, 2012 <www.genome.gov/sequencingcosts>

Page 11: Jillian ms defense-4-14-14-ja-novideo

What is a genome? What is a gene?

• Genomes consist of one or more long molecules of ‘DNA’

• DNA consists of chained nucleotide molecules (A, C, T, G), which pair up across the two strands as ‘base pairs’

• All the genes in an organism are in its ‘genome’

• Genes determine traits in an organism

• Genes ‘code’ for proteins, and proteins do the work to make traits happen

 

Page 12: Jillian ms defense-4-14-14-ja-novideo

How are genomes sequenced?

• Sequencing

• Assembly

• Annotation

• Output:

– Genome feature files

– Raw sequence files

Michael Schatz, Cold Spring Harbor

Page 13: Jillian ms defense-4-14-14-ja-novideo

Lots of genome sequences -> opportunity

Big challenge: Hard to figure out what a novel gene does

• Traditionally: do wet-lab research to figure it out, but this is expensive and time-consuming

• Sequence the gene, and use computational methods to predict the function of the protein

– If the gene is novel, this may not provide an answer

• Can complete genome sequences help?

• Comparative gene neighborhood analysis

Page 14: Jillian ms defense-4-14-14-ja-novideo

From genome structure to gene-product function

• In bacteria, genes whose products are involved in similar functions are often placed close to each other in the genome.

• Research suggests that it is possible to predict gene-product function in bacteria based on commonly recurring gene neighbors

• But we need to examine lots of genomes for statistical significance

[Figure: neighboring genes gene1, gene2, gene3 and gene4 linked to a biological process, with one gene's role marked ‘?’]

Page 15: Jillian ms defense-4-14-14-ja-novideo

Comparing gene neighborhoods across different genomes

• Genes with similar sequences likely produce proteins with similar functions

• Orthologs: similar genes from different genomes

• Algorithms to compare genes between different genomes

DeMeo et al. BMC Molecular Biology 2008 9:2 doi:10.1186/1471-2199-9-2

Page 16: Jillian ms defense-4-14-14-ja-novideo

Role for visualization in this problem

• Why not use automated methods to find common sets of genes around gene targets?

• Why visualization?

• 3 E’s: Exploration, Expertise, Errors

Automated methods: Target: gene B

Common subsequences: Strains 1, 2, 3: {A, B, C, D}
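To make the "automated methods" box concrete, here is a minimal Python sketch of one way a common gene set around a target could be computed. It is an illustration only, not the method used in the thesis or by any particular tool: the function names, the `radius` window and the `min_fraction` threshold are all assumptions, and each genome is simplified to an ordered list of ortholog labels.

```python
from collections import Counter

def neighborhood(genes, target, radius):
    """Genes within `radius` positions of the first occurrence of `target`."""
    i = genes.index(target)
    return set(genes[max(0, i - radius): i + radius + 1])

def common_neighbors(genomes, target, radius=2, min_fraction=1.0):
    """Genes found near `target` in at least `min_fraction` of the genomes
    that contain it (hypothetical helper, for illustration)."""
    counts = Counter()
    containing = 0
    for genes in genomes.values():
        if target not in genes:
            continue  # genome lacks the target gene, so it casts no vote
        containing += 1
        counts.update(neighborhood(genes, target, radius))
    if containing == 0:
        return set()
    return {g for g, c in counts.items() if c / containing >= min_fraction}

# The situation on the slide: three strains sharing the same neighborhood.
strains = {
    "Strain 1": ["A", "B", "C", "D"],
    "Strain 2": ["A", "B", "C", "D"],
    "Strain 3": ["A", "B", "C", "D"],
}
result = common_neighbors(strains, "B")
```

A summary like this reports only the shared set. It says nothing about duplications, inversions or truncations inside the neighborhood, which is the gap the following slides argue visualization should fill.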

Page 17: Jillian ms defense-4-14-14-ja-novideo

Exploration

• Patterns and anomalies without knowing in advance what you are looking for

Automated methods: Target: gene B

Common subsequences: Strains 1, 2, 3: {A, B, C, D}

[Figure: four panels (Duplication, Truncation, Deletion, Inversion), each showing the arrangement of genes A, B, C and D in Strains 1, 2 and 3]

Page 18: Jillian ms defense-4-14-14-ja-novideo

Expertise

• Experts make connections that will be missed by automated methods

– Not just the anomaly, but significance of the anomaly

– Knowledge about strains, protein families involved in finding significant anomalies

[Figure: StrainA, StrainB and StrainC, with one anomaly flagged ‘!’]

Page 19: Jillian ms defense-4-14-14-ja-novideo

Errors

• Verify automated methods

• Uncertainty and errors in data generation

 

[Figure: two panels, ‘Breaks in assembly’ and ‘Missed gene boundaries’, each comparing the data, the automated-method output and the ground truth for Strains 1, 2 and 3. In one panel the automated method reports common subsequences Strains 1 and 3: {A, B, C, D}, Strain 2: {A, D}; in the other it reports Strains 1 and 3: {A, B, C, D}, Strain 2: {A, B}.]

Page 20: Jillian ms defense-4-14-14-ja-novideo

To address this problem:

• Visualization must help bring experts into the data mining loop

1) Help experts identify sources of error

2) Allow experts to explore the data

3) Enable researchers to integrate expertise in data analysis

So: an overview visualization is not enough. Need gene-neighborhood details

• Visualization must scale to enable comparisons between hundreds to thousands of genomes

Page 21: Jillian ms defense-4-14-14-ja-novideo

Big displays: Opportunity for big data?

• The question is: can these environments be used to visualize big data sets better?

• Evidence suggests yes:

– Physical navigation over virtual navigation

• Reduced need to pan and zoom

• Reduced need for context switching

• Utilize embodied cognition

• Multiple levels of detail accessible through physical movement

– Externalize more information that can be accessed simultaneously

 

Lance  Long  

Page 22: Jillian ms defense-4-14-14-ja-novideo

Porting from small to big displays

• Maybe porting genome visualizations to these environments is sufficient?

• Ruddle et al. 2013:

– Export high-resolution graphical output from existing genomics visualizations

– Display these large images on big display

– Evidence that this had a positive impact on researcher reasoning

• However, effective visualization on big displays involves more than simply scaling up the representation

Page 23: Jillian ms defense-4-14-14-ja-novideo

Pixel-Density Scalability

• As pixel density increases, does a visual approach take advantage of increased pixels per inch to show more entities and relationships, or to show data at higher detail?

Evaluation:

• High-density representation?

– Use increased pixels per inch to show more entities and relationships at higher detail?

• Simultaneous detail and overview?

– With increased pixel density, representation shows details and overviews at the same time, without relying on Focus+Context

Page 24: Jillian ms defense-4-14-14-ja-novideo

Display-Size Scalability

• As display size increases, does a visual approach take advantage of the increased space to depict more entities or relationships?

Evaluation

• Encode big data spatially

• Cluster related elements:

– spatial memory

– direct, visual comparisons

• Physical navigation over virtual navigation:

– Overviews at a distance, details up-close

 

 

Page 25: Jillian ms defense-4-14-14-ja-novideo

Perceptual and Analytic Task Scalability

• Does a visual approach scale up to enable the performance of an analytic task across more data, more space, more pixels?

• Does perception suffer if you scale the approach up?

• Analytic tasks performed pre-attentively

• Analytic tasks aided by visual queries

• Aids to visual search for performing analytic tasks

Page 26: Jillian ms defense-4-14-14-ja-novideo

Examining current genomic data visualizations

• Does it address this problem?

– Show gene neighborhoods

– Comparative

• Does this visualization allow comparison between more than a few gene neighborhoods?

• If you scale the visual approach up, does it:

– Allow more comparisons of gene neighborhoods (Analytic Task Scalability)

– Take advantage of big displays in size and pixel density (Display Resolution Scalability and Display Size Scalability)

– In the process, remain sensible to a human viewer (Perceptual Scalability)

Page 27: Jillian ms defense-4-14-14-ja-novideo

Line-based comparative approaches

• On load, align 1-2 genes to a chosen gene in a reference genome

• Draw a line or a band to connect orthologs

• In many cases, repurpose genome browsers to be comparative by adding a comparative track

• Tools: PSAT, GBrowse_syn, SynView, ACT, CGAT, Combo, MizBee, Mauve

Pan, X. et al. (2005). SynBrowse: a synteny browser for comparative sequence analysis. Bioinformatics (Oxford, England).

McKay et al. Using the Generic Synteny Browser (GBrowse_syn). Current Protocols in Bioinformatics. Hoboken, NJ, USA: John Wiley & Sons

Page 28: Jillian ms defense-4-14-14-ja-novideo

Line-based approaches expanded: Mauve

• Like parallel coordinates

• Draw lines between orthologs

• Color genes by their block within that genome (not colored by orthology)

• Example shows 9 genomes

Darling, Aaron CE, et al. "Mauve: multiple alignment of conserved genomic sequence with rearrangements." Genome research 14.7 (2004): 1394-140

Page 29: Jillian ms defense-4-14-14-ja-novideo

Line-based approaches: Critique

• Pixel-density scalable?

– Not a high-density representation

– Need space for the ‘comparative track’

• Display size scalable?

– Hard to follow lines across a display

– Hard to compare similar neighborhoods across the display

– No overview from a distance, details up close

• Perceptual scalability for comparing gene neighborhoods?

– Lots of visual clutter

– Comparisons not pre-attentive

– No aid to visual search

• Number of genomes

– Published up to 9

– Private groups have adapted frameworks for 10-50 genomes on big display

Darling, Aaron CE, et al. "Mauve: multiple alignment of conserved genomic sequence with rearrangements." Genome research 14.7 (2004): 1394-140

Page 30: Jillian ms defense-4-14-14-ja-novideo

PSAT: Color and alignment

• PSAT

– Orthologs encoded using color

– Strand on which gene is positioned is encoded by orientation to the center line

– Text is given by default

Fong, Christine, et al. "PSAT: a web tool to compare genomic neighborhoods of multiple prokaryotic genomes." BMC bioinformatics 9.1 (2008): 170.

Page 31: Jillian ms defense-4-14-14-ja-novideo

PSAT: Critique

• Pixel-Density Scalability

– Not high-density representation because of text labels

• Perceptual scalability for comparing gene neighborhoods?

– Can’t scale to large number of genes: not enough colors

Fong, Christine, et al. "PSAT: a web tool to compare genomic neighborhoods of multiple prokaryotic genomes." BMC bioinformatics 9.1 (2008): 170.

Page 32: Jillian ms defense-4-14-14-ja-novideo

GeneRiViT: Alignment and color

• GeneRiViT

– Align against arbitrary gene

– Color by presence/absence

– Examples show 4 genomes

– Critique:

• No discussion of scalability

• Overview visualization

• Doesn’t address our problem

Price, A. et al. "Gene-RiViT: A visualization tool for comparative analysis of gene neighborhoods in prokaryotes." Biological Data Visualization (BioVis), 2012 IEEE Symposium on. IEEE, 2012.

Page 33: Jillian ms defense-4-14-14-ja-novideo

Dot plots

• Coordinates of genes in two genomes are used as the x and y axes

• Orthologous genes in other genomes are plotted

• Each genome given a unique color

• Critique:

– Doesn’t provide ‘gene-neighborhood’ view

– Overview tool

– Hard to follow beyond a few genomes

Price, A. et al. "Gene-RiViT: A visualization tool for comparative analysis of gene neighborhoods in prokaryotes." Biological Data Visualization (BioVis), 2012 IEEE Symposium on. IEEE, 2012.

Page 34: Jillian ms defense-4-14-14-ja-novideo

Overview Visualization: Sequence Surveyor

• Not this domain problem, but interesting approach

• Each gene is drawn as a rectangle

• Several possible variables for position: Ordinal position

• Several possible variables for color:

– Position in one reference genome

– Use a color ramp, for wide range of colors

Albers, D. et al. "Sequence surveyor: Leveraging overview for scalable genomic alignment visualization." Visualization and Computer Graphics, IEEE Transactions on 17.12 (2011): 2392-2401.

Page 35: Jillian ms defense-4-14-14-ja-novideo

Overview Visualization: Sequence Surveyor

• Pixel-density scalable

– High-density representation

– High-detail representation

• Display size scalability

– May be difficult to compare patterns from one side of display to another

• Perceptual Scalability

– Colors allow for pre-attentive identification of patterns

– Avoids visual clutter

Albers, D. et al. "Sequence surveyor: Leveraging overview for scalable genomic alignment visualization." Visualization and Computer Graphics, IEEE Transactions on 17.12 (2011): 2392-2401.

Page 36: Jillian ms defense-4-14-14-ja-novideo

Copy number variations on big displays

• Orchestral:

– Visualization of a different data type

– Effective use of color to enable pre-attentive identification of similarities across genomes

– High-density representation

– Details up close, overview from a distance

Ruddle, Roy A., et al. "Leveraging wall-sized high-resolution displays for comparative genomics analyses of copy number variation." Biological Data Visualization (BioVis), 2013 IEEE Symposium on. IEEE, 2013.

Page 37: Jillian ms defense-4-14-14-ja-novideo

BactoGeNIE Demo

• Video at: https://www.youtube.com/watch?v=yrSyi1RWcUw

Page 38: Jillian ms defense-4-14-14-ja-novideo

Program details

• Implemented in C++ using Qt and the QGraphicsView framework

• Upload:

– genome feature files

– FASTA files (raw gene sequences)

• CD-HIT algorithm processes sequence files to compute ortholog ‘clusters’

• MySQL database to store big datasets

– Loads 1000 contigs into memory, rest stored in database

• Optimized for PubMed datasets

• Prototyped on E. coli draft genomes

– Capable of displaying any contigs from thousands of E. coli draft genomes

• On EVL Cyber-commons wall, around 400 contigs in view
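The "1000 contigs in memory, rest in the database" arrangement can be sketched as a small store that preloads a fixed number of contigs and falls back to a query for anything else. This is a hedged illustration, not BactoGeNIE's code: the tool itself is C++ with MySQL, while this sketch uses Python's built-in `sqlite3` as a stand-in, and the `ContigStore` class, the `contigs` table and its columns are hypothetical.

```python
import sqlite3

class ContigStore:
    """Keeps the first `in_memory_limit` contigs cached; fetches others on demand."""

    def __init__(self, conn, in_memory_limit=1000):
        self.conn = conn
        rows = conn.execute(
            "SELECT id, genes FROM contigs ORDER BY id LIMIT ?",
            (in_memory_limit,),
        ).fetchall()
        # Genes are stored as a comma-separated string in this toy schema.
        self.cache = {cid: genes.split(",") for cid, genes in rows}

    def get(self, contig_id):
        """Return the gene list for a contig, or None if it does not exist."""
        if contig_id in self.cache:
            return self.cache[contig_id]
        row = self.conn.execute(
            "SELECT genes FROM contigs WHERE id = ?", (contig_id,)
        ).fetchone()
        return row[0].split(",") if row else None

# Tiny demonstration database with five identical contigs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contigs (id INTEGER PRIMARY KEY, genes TEXT)")
conn.executemany(
    "INSERT INTO contigs VALUES (?, ?)", [(i, "A,B,C") for i in range(1, 6)]
)
store = ContigStore(conn, in_memory_limit=2)
```

With a limit of 2, contigs 1 and 2 are served from memory while `store.get(5)` goes to the database. The same split would let a viewer keep what is on screen resident and page the rest in as the user sorts or scrolls.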

Page 39: Jillian ms defense-4-14-14-ja-novideo

BactoGeNIE: High density representation

• Compressed genome encoding

• No text labels, instead ‘on-demand’

• No ‘comparative track’

• Encode orthology using

– User applied color: pre-attentive orthology identification

– Coordinated highlighting: scalable visual query

– Alignment: use space to encode similarity

Page 40: Jillian ms defense-4-14-14-ja-novideo

Use space to encode similarity

• Goals:

– Make it easier to perform comparisons across many genomes (Analytic task scalability)

– Accommodate increased display size (Display Size Scalability)

– Make similarities and differences easy to see (Perceptual Scalability)

• Sorting and Alignment

– Sort by contig length

– Sort by gene content

– Dynamically align against any gene
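The sorting and alignment operations listed above can be illustrated with a short Python sketch. It rests on simplifying assumptions that are not from the slides: each contig is an ordered list of ortholog labels, positions are counted in gene slots rather than base pairs, and the function names are hypothetical.

```python
def sort_by_length(contigs):
    """Contig names ordered from most genes to fewest."""
    return sorted(contigs, key=lambda name: len(contigs[name]), reverse=True)

def alignment_offsets(contigs, target):
    """Horizontal shift, in gene slots, that puts `target` in the same
    column for every contig containing it (first occurrence is used)."""
    positions = {
        name: genes.index(target)
        for name, genes in contigs.items()
        if target in genes
    }
    if not positions:
        return {}
    column = max(positions.values())  # rightmost occurrence sets the column
    return {name: column - pos for name, pos in positions.items()}

contigs = {
    "c1": ["A", "B", "C", "D"],
    "c2": ["B", "C"],
    "c3": ["X", "A", "B"],
}
order = sort_by_length(contigs)
offsets = alignment_offsets(contigs, "B")
```

Here `c2` is shifted two slots and `c1` one slot so that gene B lines up under its position in `c3`. Contigs lacking the target receive no offset and could be drawn below the aligned group.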

 

Page 41: Jillian ms defense-4-14-14-ja-novideo

Interactivity

• On hovering, contig expands in height, so easier to select genes of interest in high-density view

• ‘Pop-up’ menu for each gene that gives info and allows for:

– application of color:

• ‘tagging’ operation

• Scalable query

– ‘targeting’ operation (described next)

• User can sort genomes by:

– Gene target

– Contig length

Page 42: Jillian ms defense-4-14-14-ja-novideo

‘Gene Targeting’ Function to create high resolution, comparative ‘maps’

• User selects a gene of interest

• This gene is given a base color

• Two color ramps are applied to adjacent genes, one ‘upstream’ and one ‘downstream’

• Orthologous genes in related genomes are given the same colors

• Contigs containing this gene are brought to the top

• The target gene is centered

• Orthologs are aligned to the target
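The coloring steps of gene targeting can be sketched in a few lines of Python. This is an illustrative reading of the bullets, not the tool's implementation: the specific RGB values, the linear interpolation and the function names are assumptions, and gene labels stand in for ortholog cluster identifiers so that orthologs share a color by sharing a label.

```python
def ramp(start, end, steps):
    """`steps` RGB colors interpolated linearly from `start` toward `end`."""
    colors = []
    for k in range(1, steps + 1):
        t = k / steps
        colors.append(tuple(round(s + (e - s) * t) for s, e in zip(start, end)))
    return colors

def target_colors(reference, target,
                  base=(255, 255, 0),      # hypothetical base color
                  up_end=(0, 0, 255),      # hypothetical upstream ramp end
                  down_end=(255, 0, 0)):   # hypothetical downstream ramp end
    """Map each gene on the reference contig to a color: the target gets the
    base color, neighbors get ramp colors by distance from the target."""
    i = reference.index(target)
    colors = {target: base}
    upstream = reference[:i][::-1]      # nearest upstream neighbor first
    downstream = reference[i + 1:]
    for gene, color in zip(upstream, ramp(base, up_end, len(upstream))):
        colors.setdefault(gene, color)
    for gene, color in zip(downstream, ramp(base, down_end, len(downstream))):
        colors.setdefault(gene, color)
    return colors

colors = target_colors(["A", "B", "C", "D"], "B")
# Orthologs in another contig pick up the same colors; unknown genes get None.
other = [colors.get(g) for g in ["D", "C", "B", "Q"]]
```

Because color depends on position relative to the target in the reference contig, a contig whose neighborhood is inverted or missing a gene shows a reversed or interrupted ramp, which is the kind of difference the approach aims to make visible at a glance.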

Page 43: Jillian ms defense-4-14-14-ja-novideo

Gene targeting function

• Clustering to promote direct comparisons

• Overviews at a distance

• Details up close

• Pre-attentive identification of similarities and differences between gene neighborhoods

Lance  Long  

Page 44: Jillian ms defense-4-14-14-ja-novideo

Examples  

Page 45: Jillian ms defense-4-14-14-ja-novideo

Pixel-density Scalability

BactoGeNIE fits the pixel-density scalability criteria: High-density data display, identifier display and orthology encoding

Page 46: Jillian ms defense-4-14-14-ja-novideo

Display Size Scalability

• BactoGeNIE is the only approach to use clustering and show multiple levels of detail

Page 47: Jillian ms defense-4-14-14-ja-novideo

Perceptual Scalability and Analytic Tasks

BactoGeNIE:

• Similarity is pre-attentively accessible

• Avoids visual clutter

• Visual query for orthologs

Page 48: Jillian ms defense-4-14-14-ja-novideo

Graphical Scalability: Display Resolution vs Number of Genomes

[Chart: number of genomes (y-axis, 0 to 1000) against display resolution in pixels (x-axis: 480, 720, 1080, 1440, 2160, 2880, 3240, 4320) for BactoGeNIE, GeneRiViT, SynBrowse, SynView, PSAT, Geco and Mauve]

Page 49: Jillian ms defense-4-14-14-ja-novideo

Preliminary User Feedback

• A version of BactoGeNIE used by computational biology team on NxN pixels and MxM inches resolution tiled display wall

• “This tool has been widely used by members of the team to show the comparative analyses of genomic context for several bacterial genomes”

• “Genome browsers such as JBrowse enable researchers to do comparative genome analyses for nearly 10-50 genomes. But fail to work when we are studying several hundreds of genomes of interest.

• This tool is really unique and it’s the only tool that I am aware of that can scale up to any number of genome comparisons.

• The ability to load multiple tracks of genomes, and the zoom in and out options with color coding, annotation tracks makes it very convenient for scientists to quickly look at patterns.

• This tool has a potential to serve both for visualization as well as data mining needs.”

Usage of a version without the gene targeting approach. Future study will concentrate on this feature with a wider community of users

Page 50: Jillian ms defense-4-14-14-ja-novideo

Summary of contributions

• A novel design that is the first to enable direct comparisons between hundreds of gene neighborhoods in one view

• First interactive, large-scale comparative gene neighborhood approach, with on-the-fly sorting, dynamic alignment, user-selected color and color ramps

• First to show overviews with gene neighborhood details that can be accessed through physical movement

• Introduces a novel visualization approach, ‘gene targeting’, that translates genomic data into high-resolution genomic maps

 

Page 51: Jillian ms defense-4-14-14-ja-novideo

What’s next?

Design

• Integration with different levels of detail

• Multiple color ramps

• Advanced ordering in y, based on similarity to target or strain phylogeny

Implementation

• Scalability in rendering using parallelization on the GPU

• Port to SAGE

Evaluation

• User studies and evaluations of perceptual scalability

Page 52: Jillian ms defense-4-14-14-ja-novideo

Scalable Design, Big Data, Big Displays

• Need visualization to provide an interface between automated analysis and the expert

• Porting existing visual approaches to big data and big displays will not always work

• Need to design for increased

– pixel density

– display size

– volume of analytical tasks

 

Page 53: Jillian ms defense-4-14-14-ja-novideo

Thanks!  

• Acknowledgements:

– Jason Leigh, Andy Johnson, Khairi Reda, Lance Long, Uthman Shabazz, and everyone in the Electronic Visualization Laboratory

– Barry Goldman, David Bush, Niran Iyer, Shawn Stricklin and the rest of the computational biology team at Monsanto

• Questions?