MPL

The DNA sequence of chimpanzee chromosome 22 and comparative analysis with its human ortholog, chromosome 21

Bioinformatics

Dae-Soo Kim


Comparative analysis of human and chimpanzee genomes

Human-chimpanzee comparative genome research is essential for narrowing down the genetic changes involved in the acquisition of unique human features.

We report the high-quality DNA sequence of 33.3 Mb of chimpanzee chromosome 22.

Single-base substitutions account for 1.44% of the chromosome, in addition to nearly 68,000 INDELs.

83% of the 231 coding sequences show differences at the amino acid sequence level.


Introduction

Estimates of nucleotide substitution rates in aligned sequences vary widely, from 1.23% by BAC end sequencing to about 2% by molecular analysis.

Molecular analysis of HSA21 and its genes is of central medical interest because of trisomy 21, the most common genetic cause of mental retardation in the human population.


Mapping, sequencing and global view of chimpanzee chromosome 22

Genomic DNA originated from three male chimpanzee individuals.

Sequence coverage of the euchromatic portion of the long arm of chromosome 22 is 98.6%.

Accuracy was calculated as 99.99% from the overlapping clone sequences.
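The slides do not say how the 99.99% figure was computed. A minimal sketch, assuming accuracy is simply the fraction of identical bases in the region where two independently sequenced clones overlap (the function name and the toy sequences are illustrative, not from the study):

```python
def overlap_accuracy(seq_a: str, seq_b: str) -> float:
    """Fraction of identical bases between two aligned overlap sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("overlap sequences must have the same length")
    if not seq_a:
        raise ValueError("overlap is empty")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return matches / len(seq_a)


# Toy overlap: one discrepancy in 10,000 bases.
clone_1 = "ACGT" * 2500
clone_2 = "T" + clone_1[1:]  # first base differs (A in clone_1, T in clone_2)
accuracy = overlap_accuracy(clone_1, clone_2)
print(f"{accuracy:.2%}")
```

One discrepancy per 10,000 overlapping bases corresponds to the 99.99% level quoted on the slide.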


Overall differences

The overall structural features of PTR22 are almost the same as those of HSA21.

HSA21 is about 400 kb (1.2%) larger than PTR22 (ISRs: 53.7%; simple repeats: 9.54%).

The pericentromeric copy of a 200 kb region found duplicated in HSA21 is missing in PTR22.

We also detected apparently human-specific sequences (first intron of PFKL on HSA21q).


Two large INDEL hot spots were found around 9.5-11.5 Mb and 16.5-17.5 Mb from the centromere.

We found large human insertions/chimpanzee deletions in the first introns of NCAM2 (~10 kb) and GRIK1 (~4 kb), both genes with neural functions.

One of the largest structural changes identified here is a 54 kb region located 11.4 Mb from the centromere in HSA21 but absent in PTR22. It is flanked by an HSAT5 satellite repeat and consists of 164 fragments from 64 different LTRs.


                                  HSA21q        PTR22q
Size (bp)*1                       33,127,944    32,799,845
Unaligned sites*2                 25,242        101,709
# of sequencing gaps              2             14
# of clone gaps*3                 3             22
Estimated total clone gap size    73,108        74,311
G+C%                              41.01%        40.94%
CG dinucleotide                   361,259       358,450
CpG islands                       950           885
Nucleotide diversity              0.072%        0.14%

                  HSA21q                    PTR22q
Repeats           bp            #           bp            #
SINEs             3,649,153     15,137      3,614,825     15,048
  Young Alus*4    21,557        75          2,606         10
LINEs             5,853,821     8,737       5,736,911     8,673
  Young L1s*5     82,493        48          78,657        55
LTRs              3,621,501     7,282       3,550,807     7,180
Transposons       949,215       3,363       945,129       3,350
RNAs*6            8,830         100         8,722         99
Satellite         19,327        21          14,773        18
Others            30,452        38          34,776        43
Total             14,132,299    34,678      13,905,943    34,411
Total (%)         42.7%                     42.4%

*1 Size of the contig data after the site where the first base of the PTR22q contig is aligned

*2 Regions extended into HSA21q clone gaps and subtelomeric unmatched regions

*3 Excluding pericentromeric and subtelomeric gaps

*4 AluYa5, AluYa8, AluYb8 and AluYb9

*5 L1HS and L1PA2

*6 snRNA, scRNA, 5S rRNA, tRNA, 7SL RNA and other small RNA genes
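The repeat totals in the table above can be cross-checked by summing the class rows. The sketch below assumes that the first bp/count pair in each repeat row belongs to HSA21q and the second to PTR22q, and that the "Young Alus" and "Young L1s" rows are subsets of SINEs and LINEs and therefore excluded from the totals. Both assumptions are inferred from the numbers, not stated on the slide.

```python
# (bp, count) per repeat class: first pair HSA21q, second pair PTR22q (assumed order).
REPEATS = {
    "SINEs": ((3_649_153, 15_137), (3_614_825, 15_048)),
    "LINEs": ((5_853_821, 8_737), (5_736_911, 8_673)),
    "LTRs": ((3_621_501, 7_282), (3_550_807, 7_180)),
    "Transposons": ((949_215, 3_363), (945_129, 3_350)),
    "RNAs": ((8_830, 100), (8_722, 99)),
    "Satellite": ((19_327, 21), (14_773, 18)),
    "Others": ((30_452, 38), (34_776, 43)),
}
# Contig sizes (bp) from the "Size" row of the table.
SIZES = {"HSA21q": 33_127_944, "PTR22q": 32_799_845}


def totals(column: int) -> tuple[int, int]:
    """Sum bp and element counts over all repeat classes for one column."""
    bp = sum(row[column][0] for row in REPEATS.values())
    count = sum(row[column][1] for row in REPEATS.values())
    return bp, count


hsa_bp, hsa_n = totals(0)
ptr_bp, ptr_n = totals(1)
hsa_pct = 100 * hsa_bp / SIZES["HSA21q"]
ptr_pct = 100 * ptr_bp / SIZES["PTR22q"]
print(hsa_bp, hsa_n, f"{hsa_pct:.1f}%")
print(ptr_bp, ptr_n, f"{ptr_pct:.1f}%")
```

Under these assumptions the sums reproduce the "Total" row, and the percentages round to the 42.7% and 42.4% shown, which supports reading the first column as HSA21q.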


Base substitutions

The overall nucleotide substitution level in aligned regions between PTR22 and HSA21 is about 1.44% (excluding INDELs).

The most conserved region was around 12.5Mb corresponding to the distal boundary region of the gene desert.
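A substitution level that excludes INDELs can be sketched as the proportion of mismatching columns among alignment columns where neither sequence has a gap. The function and the toy alignment below are illustrative only; the study's actual alignment pipeline is not described on the slides.

```python
def substitution_rate(aln_a: str, aln_b: str, gap: str = "-") -> float:
    """Mismatches per compared site, skipping columns with a gap in either row."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == gap or b == gap:
            continue  # INDEL column: excluded from the substitution estimate
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return mismatches / compared


# Toy alignment: 13 columns, 2 INDEL columns, 1 substitution.
human = "ACGTAC-GTTAGC"
chimp = "ACGTTCAGT-AGC"
rate = substitution_rate(human, chimp)
print(f"{rate:.4f}")
```

Applied over sliding windows along the chromosome, the same calculation would locate conserved regions such as the one reported near 12.5 Mb.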


Repetitive elements

HSA21 is about 1.2% longer than PTR22

Five LTR subfamilies are more abundant in HSA21

All MER4A1-int and MER83B-int elements are specific to HSA21

All of the seven AluYb9's found in HSA21 and the one in PTR22 are lineage specific

Although the AluYa8 subfamily is thought to be a recent derivative of AluYa5


Lineage specific insertions and deletions

We identified about 68,000 INDELs in total

Greater than 99% of the INDELs were shorter than 300 bp

These sites should have been produced either through human insertions/chimpanzee deletions (h-ins/p-dels) or chimpanzee insertions/human deletions (p-ins/h-dels)

We tested 567 INDELs larger than 300 bp in size using DNA samples from 5 humans, 5 chimpanzees, 1 gorilla and 2 orangutans

Insertions were mostly produced by the integration of Alu and L1 elements
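The outgroup samples are what allow an h-ins/p-del site to be resolved into either a human insertion or a chimpanzee deletion. A minimal parsimony sketch follows, assuming each sample is scored only as "segment present" or "segment absent" and that the ancestral state is taken as the majority state among the outgroups. The function name and the scoring scheme are assumptions for illustration.

```python
from typing import Sequence


def classify_indel(human_has: bool, chimp_has: bool,
                   outgroup_has: Sequence[bool]) -> str:
    """Assign an INDEL to a lineage using the majority outgroup state as ancestral."""
    if human_has == chimp_has:
        return "no difference"
    present = sum(1 for state in outgroup_has if state)
    absent = len(outgroup_has) - present
    if present == absent:
        return "unresolved"  # no outgroup data, or outgroups disagree evenly
    ancestral_present = present > absent
    if ancestral_present:
        # The segment is old, so the lineage lacking it has lost it.
        return "p-del" if human_has else "h-del"
    # The segment is new, so the lineage carrying it has gained it.
    return "h-ins" if human_has else "p-ins"


# Human carries the segment; gorilla and both orangutans lack it.
example = classify_indel(True, False, [False, False, False])
print(example)
```

With one gorilla and two orangutan samples, as on the slide, an odd number of outgroup calls means a tie can only arise when a sample fails to give a result.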

[Figure: x axis 50 to 500, y axis 0 to 400]

[Figure legend: lineage-specific insertion; lineage-specific deletion]

[Figure: series HSA21q insertion, PTR22q insertion, HSA21q deletion and PTR22q deletion; x axis on a log10 scale from 2.4 to 3.6 (251 to 3,981); y axis 0 to 70]
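The two sets of x-axis values on this figure (2.4 to 3.6 and 251 to 3981) appear to be the same axis in log10 and linear units. A quick check, assuming the linear labels are rounded powers of ten:

```python
# Log10 tick positions and the linear labels shown on the figure.
log_ticks = [2.4, 2.6, 2.8, 3.0, 3.2, 3.4, 3.6]
linear_labels = [251, 398, 631, 1000, 1585, 2512, 3981]

# Convert each log10 tick back to linear units and round to an integer.
converted = [round(10 ** tick) for tick in log_ticks]
print(converted)
```

The converted values match the printed labels, so the axis most likely shows a size in base pairs binned in steps of 0.2 log10 units.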


Deletions were not related to particular repetitive structures except in a few cases.

We found that most of the insertions 300-350 bp in length were members of the AluY family in both chromosomes.

Between 370 and 1,000 bp there were only a small number of insertions, mostly L1 and LTR elements.

We observed that the distributions of newly integrated Alu elements are quite different between HSA21 and PTR22 (HSA21: 56% in high G+C regions; PTR22: 70% in low G+C regions).
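Assigning an Alu insertion to a high or low G+C region requires the G+C fraction of the sequence flanking the insertion site. The sketch below is illustrative only: the slides give neither the window size nor the cutoff, so `HIGH_GC_CUTOFF` is a hypothetical value set near the chromosome-wide G+C content of about 41%.

```python
HIGH_GC_CUTOFF = 0.41  # hypothetical cutoff, not taken from the study


def gc_fraction(seq: str) -> float:
    """G+C fraction of a sequence, ignoring anything that is not A, C, G or T."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(1 for b in bases if b in "GC") / len(bases)


def gc_class(flank: str) -> str:
    """Label a flanking window as 'high' or 'low' G+C using the assumed cutoff."""
    return "high" if gc_fraction(flank) >= HIGH_GC_CUTOFF else "low"


# Toy flanking windows around two hypothetical insertion sites.
flank_a = "GCGCGCATGC"
flank_b = "ATATATGCAT"
print(gc_class(flank_a), gc_class(flank_b))
```

Counting the "high" and "low" labels over all lineage-specific Alu insertions would give proportions comparable to the 56% and 70% quoted above.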


Unlike the insertions, deletions do not exactly correspond to any ISR elements, indicating that deletion events are independent of ISRs.

The deletion of these elements may have also been generated by homologous recombination between these relatively short identical or similar flanking segments.

HSA21 gained 32 kb but lost 39 kb, while PTR22 gained 25 kb and lost 53 kb (INDELs of 300-5,000 bp).

PTR22 has suffered more losses than HSA21 since speciation.
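The net change implied by these gain and loss figures is simple arithmetic, shown here with the values from the slide (in kb, for INDELs of 300 to 5,000 bp):

```python
# Gained and lost sequence in kb, as quoted on the slide.
changes_kb = {
    "HSA21": {"gained": 32, "lost": 39},
    "PTR22": {"gained": 25, "lost": 53},
}

# Net change per chromosome: gains minus losses.
net_kb = {name: c["gained"] - c["lost"] for name, c in changes_kb.items()}
print(net_kb)
```

Both chromosomes show a net loss in this size class, with PTR22 losing about four times as much as HSA21 on balance.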


A neighbor-joining analysis shows that such AluY elements can be largely separated into chimpanzee and human groups, as expected (AluY was inserted after speciation).

Humans seem to have experienced such expansions more frequently and more recently than chimpanzees.
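The slides do not describe how the tree was built beyond naming neighbor joining. As a reminder of what the method does, the sketch below implements only its pair-selection step: for n taxa, the pair (i, j) minimising Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k) is joined first. The labels and distances are a toy example, not Alu data.

```python
from itertools import combinations


def nj_pair_to_join(labels, dist):
    """Return the pair of taxa with the smallest neighbor-joining Q value."""
    n = len(labels)
    if n < 3:
        raise ValueError("need at least three taxa")
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, None
    for i, j in combinations(range(n), 2):
        q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
        if best_q is None or q < best_q:
            best_pair, best_q = (labels[i], labels[j]), q
    return best_pair, best_q


# Toy symmetric distance matrix for five hypothetical elements.
labels = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, q_value = nj_pair_to_join(labels, dist)
print(pair, q_value)
```

A full implementation would replace the joined pair with a new node, recompute distances and repeat until the tree is resolved. Lineage-specific AluY copies are expected to cluster by species because they share substitutions acquired after the human-chimpanzee split.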

[Figure: tree of AluY elements from HSA21 and PTR22; scale bar 0.01]

Leaves, in the order shown:

HSA21 120.AluY, PTR22 033.AluY, PTR22 097.AluY, PTR22 075.AluY, PTR22 063.AluY, PTR22 140.AluY, PTR22 058.AluY, PTR22 153.AluY, PTR22 069.AluY, PTR22 147.AluY, PTR22 096.AluY, PTR22 010.AluY, HSA21 211.AluY, PTR22 192.AluY, HSA21 172.AluY

HSA21 197.AluYa5, HSA21 121.AluYa5, HSA21 045.AluYa5, HSA21 216.AluYa5, HSA21 017.AluYa5, HSA21 131.AluYa5, HSA21 166.AluYa5

PTR22 098.AluY, HSA21 215.AluY, HSA21 201.AluY, HSA21 148.AluY, HSA21 188.AluY, HSA21 132.AluY, HSA21 106.AluY, HSA21 208.AluY

HSA21 218.AluYb8, HSA21 018.AluYb8, HSA21 034.AluYb8, HSA21 174.AluYb9, HSA21 135.AluYb9, HSA21 020.AluYb8, HSA21 036.AluYb8, HSA21 025.AluYb8, HSA21 187.AluYb8, HSA21 206.AluYb8, HSA21 076.AluYb8, HSA21 013.AluYb8, HSA21 168.AluYb8, HSA21 244.AluYb8

HSA21 213.AluY, PTR22 082.AluY, HSA21 153.AluY

Numeric labels on the tree: 96, 60, 96, 83, 52, 83, 54, 85, 65, 64, 54, 75, 69, 54, 58


Gene catalogue and structural characterization of coding sequences

We have annotated 284 protein coding genes and 98 pseudogenes for HSA21 and 272 genes and 89 pseudogenes for PTR22

All the conserved pseudogenes showed the same size except for KRTAP21P1, which is non-processed in HSA21 but processed in PTR22

Six HSA21 genes showing hallmarks of retrogenes were not found in PTR22 and are likely to have been inserted during human evolution (H2BFS, histone family S, and 5 keratin-associated protein genes)

The minimum nucleotide sequence identity is 83% (KRTAP6-3) and the maximum is 100%

We compared the human and chimpanzee coding sequences in 231 genes (41 were omitted)


Among the 231 genes associated with a canonical ORF, 179 show a coding sequence of identical length in human and chimpanzee and exhibit similar intron-exon boundaries.

39 genes show identical amino acid and nucleotide sequences between human and chimpanzee (biological process 5, metabolic enzymes 5, signal transduction 8, protein folding 2).

One hundred and forty of these 179 genes show amino acid replacements but no gross structural changes, as expected.
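The gene counts on this slide are mutually consistent, and they also appear to account for the 83% figure quoted at the start of the talk. The check below assumes that the 39 fully identical genes are the only ones without amino acid differences. This reading fits the numbers but is not stated explicitly on the slides.

```python
compared_genes = 231      # genes with a canonical ORF that were compared
same_length = 179         # identical coding-sequence length in both species
identical = 39            # identical amino acid and nucleotide sequence
with_replacements = 140   # amino acid replacements, no gross structural change

# Genes of identical length split into fully identical and replaced ones.
length_check = identical + with_replacements

# Genes whose coding-sequence length differs between the species.
different_length = compared_genes - same_length

# Share of compared genes with any amino acid difference, under the assumption above.
differing_pct = 100 * (compared_genes - identical) / compared_genes
print(length_check, different_length, f"{differing_pct:.1f}%")
```

Under this reading, 192 of the 231 genes differ at the amino acid level, which rounds to 83%.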


Ka/Ks analysis

10% of the genes had Ka/Ks ratios > 1, with the highest value being 3.37 for a human hair keratin-associated protein.

Relatively rapidly evolving genes may be estimated from Ka, Ka+Ks or just nucleotide divergence values (3 KRTAP genes; KCNE1, potassium channel protein; TCP10L, t-complex protein; B3GALT5, galactosyltransferase; IGSF5, immunoglobulin).
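Ka/Ks is the ratio of the nonsynonymous substitution rate (Ka) to the synonymous rate (Ks). Values above 1 are conventionally read as a sign of positive selection, and values well below 1 as purifying selection. The sketch below only forms and interprets the ratio. The gene names and the Ka and Ks values are made up for illustration, and estimating Ka and Ks from codon alignments is a separate step not shown here.

```python
from typing import Optional


def ka_ks_ratio(ka: float, ks: float) -> Optional[float]:
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if ka < 0 or ks < 0:
        raise ValueError("rates must be non-negative")
    if ks == 0:
        return None
    return ka / ks


def interpret(ratio: Optional[float]) -> str:
    """Conventional reading of a Ka/Ks ratio."""
    if ratio is None:
        return "undefined (no synonymous changes)"
    if ratio > 1:
        return "candidate for positive selection"
    return "consistent with purifying selection or neutrality"


# Hypothetical (Ka, Ks) pairs; not values from the study.
toy_genes = {"gene_A": (0.020, 0.010), "gene_B": (0.002, 0.010), "gene_C": (0.005, 0.0)}
ratios = {name: ka_ks_ratio(ka, ks) for name, (ka, ks) in toy_genes.items()}
for name, ratio in ratios.items():
    print(name, ratio, interpret(ratio))
```

The undefined case matters for human-chimpanzee comparisons, because many short genes carry no synonymous differences at all. That is one reason the slide suggests Ka, Ka+Ks or plain nucleotide divergence as alternatives.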


Promoter analysis

Computational analysis of transcription factor binding sites (TFBSs) within the 1-kb upstream region of each gene.

All of the species-specific TFBSs were caused by base substitutions in either human or chimpanzee.

These may not clearly account for the expression changes observed in this study.
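How a single base substitution creates a species-specific TFBS can be illustrated with a very simple scan. The sketch assumes exact matching against consensus strings and an ungapped alignment of the two upstream regions, so that positions are directly comparable. Real analyses typically use position weight matrices. The motif names and sequences below are hypothetical.

```python
def find_sites(seq: str, motif: str) -> list:
    """Start positions (0-based) of exact motif matches in seq."""
    seq, motif = seq.upper(), motif.upper()
    width = len(motif)
    return [i for i in range(len(seq) - width + 1) if seq[i:i + width] == motif]


def compare_sites(human: str, chimp: str, motifs: dict) -> dict:
    """Split motif hits into human-only, chimp-only and shared positions."""
    result = {}
    for name, motif in motifs.items():
        h = set(find_sites(human, motif))
        c = set(find_sites(chimp, motif))
        result[name] = {
            "human_only": sorted(h - c),
            "chimp_only": sorted(c - h),
            "shared": sorted(h & c),
        }
    return result


# Toy upstream regions differing by one substitution (G to A at position 13).
human_upstream = "AATGACTCATTGGGCGGAA"
chimp_upstream = "AATGACTCATTGGACGGAA"
motifs = {"motif_A": "TGACTCA", "motif_B": "GGGCGG"}  # illustrative consensus strings
sites = compare_sites(human_upstream, chimp_upstream, motifs)
print(sites)
```

Here the substitution removes the only match to `motif_B` in the chimpanzee sequence, so the site is reported as human-only, while `motif_A` is shared.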


Red: TF binding sites found only in human

Blue: TF binding sites found only in chimpanzee

Yellow: TF binding sites common in human, chimpanzee and mouse

Grey: TF binding sites common in human and mouse.

Position 1 is located 1,000 bases upstream of the coding sequence of the gene


Conclusion

This study shows, for the first time, a chromosome-wide comparison between human and chimpanzee using high-quality sequence.