Salmon Genome Project (SGP) Resources Bjørn Høyheim Norwegian School of Veterinary Science Department of Basic Sciences and Aquatic Medicine


Salmon Genome Project (SGP)

- Genetic map

- BAC library/physical map

- EST-sequencing

- Expression profiling

- QTL studies

- Genomic sequencing

- Bioinformatics

www.salmongenome.no

SGP resources: Summary

Genetic map for Atlantic salmon
- 450 markers on the consensus map, on average 1 marker per 2-3 cM
- A bank of approx. 1200 microsatellites

BAC library
- A well characterized library
- 18 fold genome coverage
- Screened for over 200 genes/markers, resulting in more than 4000 BACs identified

cDNA libraries
- 23 libraries consisting of 160 000 gridded clones
- 68 000 sequences from these
- Trimmed and clustered
- 30% annotated


SGP resources: Summary

A bioinformatics infrastructure
- Database for sequences
- Database for genetic maps
- Pipelines for sequence manipulations
- Pipelines for clustering/annotation
- BLAST search locally

Microarray chip being developed
- 17 500 genes in duplicate (2005)


Atlantic salmon: Genetic maps

www.salmongenome.no
- Markers with descriptions
- Graphical view

Genotype database


Atlantic salmon: Genetic markers


MapView

All markers “clickable”

Markerview


BAC library

BAC library construction:

- Constructed by Jim Thorsen
- Collaboration with Pieter de Jong (USA) and the Canadian Genomic Research on Atlantic salmon Project (GRASP)
- 313 000 clones picked and gridded
- Average insert size 190 kb
- CHORI-214 (~18x coverage)
- High density filters
- Copies in Norway and Canada
- Main distribution: Dr. Pieter de Jong, bacpac.chori.org

Size distribution of insert

Thorsen et al 2005 BMC Genomics 6:50
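
The ~18x figure follows from the usual relation coverage = clones x average insert size / genome size. The slides do not state which genome size was assumed, so this small sketch runs the relation backwards and reports the genome size implied by the three quoted numbers. The function name is illustrative only.

```python
def implied_genome_size_bp(n_clones: int, mean_insert_bp: float, coverage: float) -> float:
    """Genome size implied by coverage = n_clones * mean_insert / genome_size."""
    return n_clones * mean_insert_bp / coverage


# Figures quoted on the slide for CHORI-214.
genome_bp = implied_genome_size_bp(n_clones=313_000, mean_insert_bp=190_000, coverage=18)
print(f"Implied genome size: {genome_bp / 1e9:.2f} Gb")
```

With the slide's rounded inputs this comes out at roughly 3.3 Gb, which should be read as an approximation rather than a measured genome size.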

Fingerprinting of BAC DNA

96 samples on one gel, separated by marker lanes every five lanes.

- Most of the fingerprinting done in Canada - Genome Sciences Centre (BCCA)
- 186 000 clones HindIII fingerprinted
- 37 285 BACs as singletons
- 4 354 contigs

Ng et al 2005 Genomics 86: 396-404
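
The fingerprinting numbers above can be turned into two quick summary statistics: how many clones ended up in contigs, and how many clones an average contig holds. This is plain arithmetic on the slide's figures; the 186 000 total is a rounded number, so the results are approximate.

```python
fingerprinted = 186_000  # clones fingerprinted (rounded figure from the slide)
singletons = 37_285      # clones not placed in any contig
contigs = 4_354

clones_in_contigs = fingerprinted - singletons
mean_clones_per_contig = clones_in_contigs / contigs
singleton_fraction = singletons / fingerprinted

print(f"Clones in contigs: {clones_in_contigs}")
print(f"Mean clones per contig: {mean_clones_per_contig:.1f}")
print(f"Singleton fraction: {singleton_fraction:.1%}")
```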


Sequencing pipeline

Sequencing

Verifying results / “cleaning up”
- removing bad sequence

Clustering
- how many are from the same gene?

BLAST/Annotation
- Which genes do we have?

cDNA-libraries

- Developed tissue- and development-specific libraries
- 15 tissues from pre- and post-smolt

Gel lanes: λ A B C D E F G H I J λ

Tissues
- Spleen
- Kidney: head kidney, kidney
- Heart
- Brain
- Swimbladder
- Skin
- Muscle: white, red
- Liver
- Ovaries
- Testes
- Eye
- Gills
- Intestine

- All directionally cloned
- Picked and gridded primary clones
- Approx. 11 500 clones from each tissue

Sequencing

Sequenced approx. 68 000 ESTs from pre-smolt
- Primarily 5’ sequences

Sequenced over 1 100 full length cDNAs

Developed a “preAssemble” pipeline for automatic sequence processing

- A chance to take out bad quality sequences prior to clustering
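
The slides do not show how the “preAssemble” pipeline decides what counts as bad quality, so the following is only a minimal sketch of one common approach: trim low-quality bases from both ends of a read using its Phred scores, then discard reads that end up too short. The function name and both thresholds (`min_q`, `min_len`) are assumptions made for illustration, not SGP settings.

```python
from typing import List, Optional


def trim_read(seq: str, quals: List[int], min_q: int = 20, min_len: int = 100) -> Optional[str]:
    """Trim low-quality bases from both ends; return None if too little is left."""
    start = 0
    while start < len(seq) and quals[start] < min_q:
        start += 1
    end = len(seq)
    while end > start and quals[end - 1] < min_q:
        end -= 1
    trimmed = seq[start:end]
    return trimmed if len(trimmed) >= min_len else None


# Toy read: only the four middle bases pass the quality threshold.
print(trim_read("ACGTACGT", [5, 10, 30, 30, 30, 30, 8, 3], min_len=3))
```

Reads for which the function returns None would be left out of the clustering input.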

Automatic Sequence processing

Assembly

Clustering

Clustered 56 000 sequences
- 6 203 contigs
- 13 220 singlets
- 19 423 total

Currently clustering and annotating all salmon ESTs available
- 189 000 (will be 300 000 in near future)

Also doing tissue specific clustering
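
As a quick check on the clustering figures for the 56 000 sequences, contigs plus singlets should give the quoted total, and the remaining numbers give a rough idea of redundancy in the EST set. The 56 000 input is rounded, so the derived values are approximate.

```python
total_ests = 56_000  # sequences clustered (rounded figure from the slide)
contigs = 6_203
singlets = 13_220

unique_sequences = contigs + singlets        # should match the slide's total
ests_in_contigs = total_ests - singlets      # ESTs that assembled into contigs
mean_ests_per_contig = ests_in_contigs / contigs
redundancy = 1 - unique_sequences / total_ests

print(f"Unique sequences: {unique_sequences}")
print(f"Mean ESTs per contig: {mean_ests_per_contig:.1f}")
print(f"Redundancy: {redundancy:.0%}")
```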

Scripts for pipelines for automatic clustering, BLAST against pdb, swissprot, nr and nt, and annotation based on GO, with output in HTML format
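
The SGP scripts themselves are not shown. As a hedged illustration of the BLAST step only, the sketch below builds one command line per database using the present-day NCBI BLAST+ programs (`blastx` for the protein databases, `blastn` for nt), which may differ from what was run at the time. The database names are taken from the slide and would need to match locally installed databases; the output file names and E-value cutoff are assumptions.

```python
from typing import Dict, List

# Database name -> BLAST+ program for nucleotide (EST) queries.
# Protein databases need a translated search (blastx); nt is nucleotide (blastn).
DATABASES: Dict[str, str] = {
    "pdb": "blastx",
    "swissprot": "blastx",
    "nr": "blastx",
    "nt": "blastn",
}


def blast_commands(query_fasta: str, evalue: float = 1e-5) -> List[List[str]]:
    """Return one BLAST+ argument list per database, without running anything."""
    commands = []
    for db, program in DATABASES.items():
        commands.append([
            program,
            "-query", query_fasta,
            "-db", db,
            "-evalue", str(evalue),
            "-outfmt", "6",          # tabular output
            "-out", f"{db}.tsv",     # hypothetical output file name
        ])
    return commands


for cmd in blast_commands("contigs.fasta"):
    print(" ".join(cmd))
```

Each argument list can be passed to `subprocess.run` on a machine where BLAST+ and the databases are installed. Converting the tabular hits to HTML is left out of this sketch.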

SNP identification pipeline

Identification of putative SNPs by running the Phred, Phrap and PolyBayes software on the data produced by sequencing machines (raw sequence data). This data is stored in the SGP database.

Results produced by the Pre-assembly sequence processing pipeline are used to create a “good” data subset
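
To make the idea of a putative SNP concrete, here is a deliberately simplified stand-in: a counting heuristic over one column of an assembled alignment. It is not the Bayesian model used by PolyBayes. The function name and both thresholds are assumptions for illustration.

```python
from collections import Counter
from typing import List, Tuple


def putative_snp(column: List[Tuple[str, int]], min_q: int = 20, min_minor_reads: int = 2) -> bool:
    """Flag an alignment column as a putative SNP.

    column holds (base, Phred quality) for each read covering the position.
    The column is flagged when, among high-quality base calls, the second
    most common base is seen in at least min_minor_reads reads.
    """
    counts = Counter(base for base, q in column if q >= min_q)
    if len(counts) < 2:
        return False
    minor_count = counts.most_common(2)[1][1]
    return minor_count >= min_minor_reads


# Two high-quality reads support each allele; the low-quality call is ignored.
print(putative_snp([("A", 40), ("A", 35), ("G", 30), ("G", 25), ("A", 10)]))
```

Requiring the minor allele in at least two high-quality reads is a simple guard against single sequencing errors being reported as SNPs.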

Annotation

Automatic annotation based on Gene Ontology (GO) using swissprot

Organised according to:
- Molecular function
- Biological process
- Cellular component
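
One way to picture annotations organised by the three GO categories is a nested mapping from category to sequence to GO terms. The sketch below is an assumed data layout, not the SGP database schema. The sequence identifiers are made up, while the three GO identifiers are real terms chosen as examples.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (sequence id, GO id, GO category); sequence ids are hypothetical.
annotations: List[Tuple[str, str, str]] = [
    ("contig_0001", "GO:0005524", "molecular_function"),  # ATP binding
    ("contig_0001", "GO:0005634", "cellular_component"),  # nucleus
    ("contig_0002", "GO:0006412", "biological_process"),  # translation
]


def group_by_category(rows: List[Tuple[str, str, str]]) -> Dict[str, Dict[str, List[str]]]:
    """Group GO ids first by category, then by sequence id."""
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for seq_id, go_id, category in rows:
        grouped[category][seq_id].append(go_id)
    return grouped


grouped = group_by_category(annotations)
for category in sorted(grouped):
    print(category, dict(grouped[category]))
```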


SGP computational resources

Sun server
- Disk space 420 GB
- SGP database (Oracle)
- SGP web server

HP Super Dome
- SGP: 4 processors
- Disk space 10x73 GB
- EMBL database
- GenBank database
- Batch jobs

Workstations (data transfer)
- Linux workstation
- Linux workstation
- Windows workstation

HP Itanium server
- 4 CPU
- Disk space 2500 GB
- BLAST and pipelines

SGP web resources

- Web site
- Database (Oracle)
- Query system and interface
- Data loading
- Sequence processing and analysis
- User access
- Project information
- BLAST service


SGP BLAST service

- Public BLAST
- Internal BLAST

- Up to 100 sequences
- Up to 36 hours on the HP Super Dome supercomputer
- Results can be received by email
- Results can be saved as HTML
- Run BLAST directly from the database query results page
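
The 100-sequence limit is the kind of rule a submission form would check before queuing a job. The sketch below shows one way such a check might look for FASTA input. It is not the SGP web server's code, and the function names are hypothetical.

```python
MAX_SEQUENCES = 100  # per-submission limit quoted on the slide


def count_fasta_records(text: str) -> int:
    """Count FASTA records by their '>' header lines."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


def check_submission(text: str) -> None:
    """Raise ValueError if the submission is empty or exceeds the limit."""
    n = count_fasta_records(text)
    if n == 0:
        raise ValueError("no FASTA records found")
    if n > MAX_SEQUENCES:
        raise ValueError(f"{n} sequences submitted; the limit is {MAX_SEQUENCES}")


check_submission(">est1\nACGTACGT\n>est2\nGGCCTTAA\n")  # passes silently
```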

Blast service

Jon K. Laerdahl

www.notur.org

www.biotek.uio.no

www.salmongenome.no

All EMBL databases

All NCBI polynucleotide databases

All NCBI polypeptide databases

Local SGP database (currently only “Private BLAST”)


Microarray: Atlantic salmon

Progress:
- 3 700 gene preliminary Canadian chip
- Tested in Canada and Norway

Tested for:
- Cross-species
- Various disease challenged fish
- Smoltification
- Environmental issues (polluted rivers)

Canadian cDNA chip
- Atlantic salmon and rainbow trout sequences

Next generation:
- 16 000 gene chip (Canadian)
- June 2004

A second chip has been developed as a collaboration between the UK Salmon Traits Project and SGP
- 17 500 genes in duplicate

Acknowledgement

SGP:
Bjørn Høyheim, Heidi Hagen-Larsen, Jim Thorsen, Sigbjørn Lunner, Ole Albert Guttersrud, Kristin Vekterud, Jill Andersen, Agate Hansen, Katrine Hånes

Alexei Adzhubei, Anna Sadym, Jon Lærdahl

Rune Male, Sigve Ladstein

Hee-Chan Seo, Sutada Mungpakdee

Knut Jørstad

Inge Jonassen, Bjarte Dysvik, Ketil Malde

Finn Drabløs, Heather Wieman

GRASP:
William S. Davidson, Ben F. Koop

SALMAP:
Bjørn Høyheim, Monica Skogen, Sven Martin Jørgensen

John Taggart, Margaret Cairney

Lars-Erik Holm

Richard Powell, Gavin Cloherty

René Guyomard, Karim Gharbi

Roy Danzmann, Moira Fergusson

Nobuaki Okamoto, Takashi Sakamoto

SALGENE:
Richard Powell, Grace Davey

John Taggart, Margaret Cairney

Bjørn Høyheim, Heidi Hagen-Larsen, Agate Hansen

Christian Bendixen, Lars-Erik Holm, Frank Panitz

CHORI:
Pieter de Jong, Dr. Kazutoyo Osoegawa, Dr. Baoli Zhu, Gery Vessere

Former coworkers

TRAITS (UK):
Alan Teale, John Taggart, Chris Secombes, Sam Martin, Glen Sweeney, Paul Dear

United Gene Institute, Shanghai:
Ying Kang, Co-workers