Comparative Assembly for Cancer Human Genome

Gao Song, 2010/02/03


Page 1: Comparative Assembly for Cancer Human Genome

Gao Song

2010/02/03

Page 2: Content

◦ Background Knowledge
◦ Problem Description
◦ Framework of Solution
◦ Own Methods
◦ Results

Page 3: Background Knowledge

Paired-End Tag (PET)

Page 4: Background Knowledge

Concordant PET (CPET)

Discordant PET (DPET)
◦ The two tags map with an incorrect distance or orientation
◦ The two tags map to different chromosomes

DPET Cluster
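The CPET/DPET distinction can be sketched as a small classifier. This is only an illustration: the `Tag` record, the expected orientation ("+" then "-"), and the span bounds `min_span` and `max_span` are assumptions, not values given in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """One mapped end of a PET (hypothetical record, not from the slides)."""
    chrom: str
    pos: int
    strand: str  # "+" or "-"


def classify_pet(left: Tag, right: Tag, min_span: int, max_span: int) -> str:
    """Return "CPET" or "DPET" for one paired-end tag."""
    # Ends on different chromosomes are always discordant.
    if left.chrom != right.chrom:
        return "DPET"
    # Assumed expected orientation: left tag on "+", right tag on "-".
    if (left.strand, right.strand) != ("+", "-"):
        return "DPET"
    # The mapped distance must fall inside the assumed library span range.
    span = right.pos - left.pos
    if not (min_span <= span <= max_span):
        return "DPET"
    return "CPET"
```

For a 10k library one might call `classify_pet(a, b, 8000, 12000)`; the bounds would in practice come from the observed span distribution of the library.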

Page 5: Problem Description

Given:
◦ Frequency of DPET and CPET along the reference genome
◦ DPET Cluster

Requirement:
◦ Find rearrangements of the cancer genome compared to the normal human genome
◦ Now focus on amplicons

Page 6: Framework of Solution

Cut the reference genome wherever the CPET count is 0 => some big contigs
Find the breakpoints according to DPET
Use CPET to check whether there is a connection between breakpoints
Convert DPET clusters into edges in the graph
Use high-copy edges to form subgraphs of amplicons
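The first step, cutting the reference wherever CPET coverage drops to 0, can be sketched as follows. The input layout is an assumption: `cpet_counts` is taken to be a list of per-bin CPET counts along one chromosome, and the function name is hypothetical.

```python
from typing import List, Tuple


def cut_into_contigs(cpet_counts: List[int]) -> List[Tuple[int, int]]:
    """Return half-open (start, end) bin ranges where the CPET count is > 0."""
    contigs = []
    start = None
    for i, count in enumerate(cpet_counts):
        if count > 0 and start is None:
            start = i                     # a covered run begins
        elif count == 0 and start is not None:
            contigs.append((start, i))    # zero coverage ends the run
            start = None
    if start is not None:
        contigs.append((start, len(cpet_counts)))  # run reaches the last bin
    return contigs


print(cut_into_contigs([0, 3, 5, 0, 0, 2, 1]))  # [(1, 3), (5, 7)]
```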

Page 7: Framework of Solution

[Flowchart. Labels: DPET, Start and End Breakpoint, CPET, Filtered Breakpoints, Reference Genome, Original Contigs, Small Contigs, Nodes, Edges, Graph]
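The last two steps of the framework (DPET clusters become edges, high-copy edges form amplicon subgraphs) can be sketched as a connected-components pass. Several things here are assumptions: each edge is taken to carry a copy count such as the number of DPETs in its cluster, `min_copies` is a hypothetical cutoff, and a subgraph is read as a connected component of the high-copy edges.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable, int]  # (contig_a, contig_b, copy_count)


def amplicon_subgraphs(edges: Iterable[Edge], min_copies: int) -> List[Set[Hashable]]:
    """Connected components of the graph restricted to high-copy edges."""
    adjacency: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for a, b, copies in edges:
        if copies >= min_copies:          # keep only high-copy edges
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen: Set[Hashable] = set()
    components = []
    for node in list(adjacency):
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:                      # iterative depth-first search
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            component.add(current)
            stack.extend(adjacency[current] - seen)
        components.append(component)
    return components
```

Contigs touched only by low-copy edges do not appear in any component, which matches the idea of focusing on amplicons first.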

Page 8: Own Methods - Naive

DPET Frequency Curve
Use DPET directly: choose a threshold to select the breakpoints

Problem:
◦ How to choose the threshold
◦ Within an amplicon region it is hard to find the breakpoint: the baseline frequency is too high

[Figure: Chromosome 9]
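A minimal sketch of the naive thresholding, assuming `dpet_counts` holds per-bin DPET counts and `threshold` is chosen by hand (both names are hypothetical). The second call shows the amplicon problem: once the baseline is already above the threshold, a much larger jump inside the region goes unreported.

```python
from typing import List


def naive_breakpoints(dpet_counts: List[int], threshold: int) -> List[int]:
    """Return indices of bins where the DPET count first rises to the threshold."""
    hits = []
    above = False
    for i, count in enumerate(dpet_counts):
        if count >= threshold and not above:
            hits.append(i)                # report only the start of each run
        above = count >= threshold
    return hits


print(naive_breakpoints([0, 1, 6, 7, 2, 0, 9], 5))  # [2, 6]
print(naive_breakpoints([6, 7, 8, 20, 7], 5))       # [0]
```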

Page 9: Own Methods - Slope

Using slope (differentiation)

Problem:
◦ How to define the threshold
◦ Too many false positives
◦ Also misses some DPET clusters

[Figure: Chromosome 9]
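The slope idea can be sketched with a first difference of the DPET counts. This assumes the slope is taken between adjacent bins and that only rising slopes are of interest; the slides do not specify either, and `slope_threshold` is a hypothetical parameter.

```python
def slope_breakpoints(dpet_counts, slope_threshold):
    """Return indices i where counts[i] - counts[i - 1] >= slope_threshold."""
    return [
        i
        for i in range(1, len(dpet_counts))
        if dpet_counts[i] - dpet_counts[i - 1] >= slope_threshold
    ]


# The jump inside a high-baseline region is now detected at index 3.
print(slope_breakpoints([6, 7, 8, 20, 7], 5))  # [3]
```

A first difference reacts to any sharp rise, including noise, which is consistent with the false positives noted on the slide.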

Page 10: Own Method - Consider Ratio

At a breakpoint, DPET increases and CPET decreases
This can be used as another criterion

Problem:
◦ Another parameter!
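The slide does not say which ratio is used. One plausible reading, sketched here as an assumption, is the per-bin fraction DPET / (DPET + CPET), which rises when DPET increases and CPET decreases. The cutoff `min_ratio` is the extra parameter the slide complains about, and both function names are hypothetical.

```python
def dpet_ratio(dpet_counts, cpet_counts):
    """Per-bin DPET / (DPET + CPET); bins with no PETs get 0.0."""
    ratios = []
    for d, c in zip(dpet_counts, cpet_counts):
        total = d + c
        ratios.append(d / total if total else 0.0)
    return ratios


def ratio_breakpoints(dpet_counts, cpet_counts, min_ratio):
    """Indices of bins whose DPET fraction reaches min_ratio (assumed cutoff)."""
    ratios = dpet_ratio(dpet_counts, cpet_counts)
    return [i for i, r in enumerate(ratios) if r >= min_ratio]
```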

Page 11: New Method of Finding Breakpoints

Use the slope to find the threshold
The previously missed points can be found

Page 12: Own Method - Hypothesis Testing

Localized checking
Using two consecutive windows (window1, window2)
◦ Each window has: μ, σ
◦ Null hypothesis: σ2 is not significantly larger than σ1
◦ Using binomial testing
◦ Significance level: 0.05
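The slide does not say how the binomial test is applied to σ, so the sketch below substitutes a sign-test-style check on spread and should not be read as the author's procedure. Assumptions: the reference spread is the median absolute deviation of window1 around its mean, and under the null hypothesis each window2 value exceeds that reference with probability 0.5. All function names are hypothetical.

```python
from math import comb
from statistics import mean, median


def binomial_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def spread_increased(window1, window2, alpha=0.05):
    """Return (reject_null, p_value) for 'window2 is not more spread out than window1'."""
    mu1, mu2 = mean(window1), mean(window2)
    # Reference spread: median absolute deviation of window1 around its mean.
    ref = median(abs(x - mu1) for x in window1)
    # Count window2 values that deviate from their own mean by more than ref.
    k = sum(1 for x in window2 if abs(x - mu2) > ref)
    # Assumed null: about half the deviations exceed ref, so p = 0.5.
    p_value = binomial_tail(k, len(window2), 0.5)
    return p_value < alpha, p_value
```

With the 0.05 significance level from the slide, a window2 whose values scatter far more widely than window1 is flagged as a candidate breakpoint region.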

Page 13: Own Method - Hypothesis Testing

Some details:
◦ Check if the cluster region is included in the window

Not finished yet
Calculating σ is time-consuming
◦ It has to be recalculated after each step
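One common way to avoid recomputing σ from scratch at each step, which the slides do not describe, is to keep running sums of x and x² for the current window. The sketch below assumes a fixed-size window that slides one value at a time; the class name is hypothetical.

```python
from collections import deque
from math import sqrt


class RollingStd:
    """Population standard deviation over the last `size` values, updated in O(1)."""

    def __init__(self, size: int):
        self.size = size
        self.values = deque()
        self.total = 0.0      # running sum of x
        self.total_sq = 0.0   # running sum of x * x

    def push(self, x: float) -> float:
        """Add one value, drop the oldest if the window is full, return sigma."""
        self.values.append(x)
        self.total += x
        self.total_sq += x * x
        if len(self.values) > self.size:
            old = self.values.popleft()
            self.total -= old
            self.total_sq -= old * old
        n = len(self.values)
        mean = self.total / n
        # max() guards against tiny negative values caused by rounding.
        return sqrt(max(self.total_sq / n - mean * mean, 0.0))
```

The sum-of-squares form can lose precision when counts are very large relative to their spread; for PET counts per bin this is usually acceptable, but that is a judgment call rather than a guarantee.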

Page 14: Results (slope)

                                       10k     20k
# of subgraphs                         72      35
Max chromosomes in one subgraph        4       4
Average chromosomes in one subgraph    1.18    1.23
Max edges in one subgraph              42      44
Average edges in one subgraph          5.47    5.77

Page 15: One Special Case

Page 16: 10k Lib

Page 17: 20k Lib

Page 18: Another example

[Figure panels: 10k lib, 20k lib]