1 An introduction to bioinformatics for glycomics research Kiyoko F. Aoki-Kinoshita Soka University, Japan


1

An introduction to bioinformatics for glycomics research

Kiyoko F. Aoki-Kinoshita

Soka University, Japan

2

Introduction

Biological role of carbohydrates as information-containing molecules

3

[Figure: the glycome in the context of post-genomics and the glycosciences]

Post-Genomics: Genomes, Transcriptome, Proteome (ORF counts shown: > 6600, > 7200, > ???, > ???)

Glycosylation by gene products: Glycosyltransferases, Glycosidases, Carbohydrate-modifying enzymes

Gene products, interactions with carbohydrate environment: Lectins, CBPs, growth factors

Glycome: Glycoproteins (N-Glycans, O-Glycans, GPI-Anchor), Glycosaminoglycans, Proteoglycans, Glycolipids, Microbial Polysaccharides

Fundamental biol. processes: Aging, Brain development, Embryonic Development, Fertilization

Diseases: Infections, AIDS, Malaria, Tuberculosis, Cancer/Metastasis, CDGs, Allergy

Diagnostic Tools: Genetic Tests, Microbial Antigens, Allergy Markers, Cancer Markers

Treatments: Synthetic Vaccines, Immunotherapy, Anti-inflammatory Drugs, new Antibiotics

CDG = Congenital Disorder of Glycosylation

CBP = Carbohydrate Binding Protein

4

Introduction

Nomenclature of Carbohydrates

5

Glyceraldehyde, the simplest aldose, contains one chiral carbon atom carrying four different substituents and therefore has two different enantiomers.

6

7

Some common and biologically important monosaccharides

Glc, Gal, Xyl, Man, Fuc, Fru, GlcNAc, Neu5Ac, GalNAc, GalA, IdoA, Rib

8

Reference Database of Monosaccharides

http://www.monosaccharidedb.org

9

Oligosaccharide description

▪ Tree structures of monosaccharides and linkages
▪ Nodes = sugars/monosaccharides
▪ Edges = bonds/linkages

Root node

[Figure: example glycan drawn as a tree, with edges labeled by linkage (α6, α3, β4, β2)]
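The tree view of an oligosaccharide described on this slide can be sketched in a few lines of Python. This is only an illustration: the class name `Residue`, its fields, and the short linkage strings such as "b4" are assumptions made for the example, not a format used by any of the databases discussed later. The example structure is the common N-glycan core (two GlcNAc and three Man residues).

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Residue:
    """One node of the glycan tree: a monosaccharide plus the bond to its parent."""
    name: str                       # e.g. "Man", "GlcNAc"
    linkage: Optional[str] = None   # bond to the parent, e.g. "b4"; None for the root
    children: List["Residue"] = field(default_factory=list)

    def add(self, name: str, linkage: str) -> "Residue":
        """Attach a child residue through the given linkage and return it."""
        child = Residue(name, linkage)
        self.children.append(child)
        return child


def count_residues(node: Residue) -> int:
    """Number of nodes (monosaccharides) in the subtree rooted at node."""
    return 1 + sum(count_residues(c) for c in node.children)


def edges(node: Residue) -> Iterator[Tuple[str, str, str]]:
    """Yield (child, linkage, parent) for every edge, in depth-first order."""
    for child in node.children:
        yield (child.name, child.linkage, node.name)
        yield from edges(child)


# N-glycan core: the root node is the reducing-end GlcNAc.
root = Residue("GlcNAc")
second_glcnac = root.add("GlcNAc", "b4")
core_man = second_glcnac.add("Man", "b4")
core_man.add("Man", "a3")
core_man.add("Man", "a6")

for child, linkage, parent in edges(root):
    print(f"{child} -{linkage}-> {parent}")
```

Keeping the linkage on the child node means every edge is stored exactly once, which is convenient for the tree comparisons shown in later slides.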

10

Introduction

Glyco-related pathways

11

Overview of glycan biosynthetic pathways.

Hudson H. Freeze, Genetic defects in the human glycome. Nat Rev Genet. 2006 Jul;7(7):537-51

12

Glycan classes: functions and biosynthesis

Hudson H. Freeze, Genetic defects in the human glycome. Nat Rev Genet. 2006 Jul;7(7):537-51

13

N-linked glycan biosynthetic pathway

Hudson H. Freeze, Genetic defects in the human glycome. Nat Rev Genet. 2006 Jul;7(7):537-51

Steps in the pathway at which genetic disorders occur are indicated, with the associated genes underneath, as are steps at which an animal model is available. MPDU1 encodes a protein that enables the utilization of dolichol-P-mannose and dolichol-P-glucose, but does not catalyse the reactions.

14

Human diseases caused by genetic defects in N-glycosylation pathways

▪ Congenital disorders of glycosylation (19 distinct genes)
• Mental retardation, seizures, epilepsy, ...
▪ Mucolipidosis I & II
• Coarsening features, organomegaly, joint stiffness, ...
▪ Congenital dyserythropoietic anaemia (CDA II)
• Anaemia, jaundice, splenomegaly, gall bladder disease

15

O-mannose and O-xylose biosynthetic pathways

NS, 2S, 3S, 4S and 6S represent 2-N-, 2-O-, 3-O-, 4-O- and 6-O-sulphate, in that order.

16

Human diseases caused by genetic defects in O-glycosylation pathways

▪ Walker-Warburg syndrome
▪ Fukuyama muscular dystrophy
▪ Ehlers-Danlos syndrome
▪ Chondrodysplasias
▪ Macular corneal dystrophy
▪ Tn syndrome
▪ others

Human diseases caused by genetic defects in glycolipid synthesis

▪ Paroxysmal nocturnal haemoglobinuria
▪ Amish infantile epilepsy

17

Calnexin and calreticulin are related proteins that comprise an ER chaperone system that ensures the proper folding and quality control of newly synthesized glycoproteins.

Williams DB, Beyond lectins: the calnexin/calreticulin chaperone system of the endoplasmic reticulum. J Cell Sci. 2006 Feb 15;119(Pt 4):615-23.

18

Glyco-databases and data formats

19

Carbohydrate Structure Databases

▪ CarbBank
▪ SWEET-DB / glycosciences.de
▪ KEGG GLYCAN
▪ Consortium for Functional Glycomics
▪ BCSDB
▪ EuroCarbDB
▪ Commercial databases:
• GlycoSuite (Proteome Systems, Ltd.)
• Glycomics DB (Glycominds, Ltd.)

20

CarbBank

▪ Developed by the Complex Carbohydrate Research Center, University of Georgia
▪ Community database of carbohydrates
▪ Project ended in 1996 due to lack of funding

21

GLYCOSCIENCES.de DB

▪ http://www.glycosciences.de
▪ Combines CarbBank and Sugabase using a common web-based interface
▪ Provides searching by bibliography, structure, NMR and MS, as well as by LINUCS ID

22

SWEET-DB (2)

23

SWEET-DB (2)

24

SWEET-DB (3)

25

KEGG GLYCAN

▪ http://www.genome.jp/kegg/glycan/
▪ Based on CarbBank as well as input from scientists
▪ All data are linked with KEGG’s other resources: GENES, PATHWAY, KO and literature databases
▪ Several analysis tools are available

26

KEGG Glycan Page

▪ http://www.genome.jp/kegg/glycan/

27

28

CSM

29

KEGG Glycan Page

▪ http://www.genome.jp/kegg/glycan/

30

31

32

KEGG’s Glycan Biosynthesis and Metabolism Pathways

N-Glycan biosynthesis
High-mannose type N-glycan biosynthesis
N-Glycan degradation
O-Glycan biosynthesis
Chondroitin / heparan sulfate biosynthesis
Keratan sulfate biosynthesis
Glycosaminoglycan degradation
Lipopolysaccharide biosynthesis
Peptidoglycan biosynthesis
Glycosylphosphatidylinositol (GPI)-anchor biosynthesis
Glycosphingolipid metabolism
Blood group glycolipid biosynthesis - lactoseries
Blood group glycolipid biosynthesis - neo-lactoseries
Globoside metabolism
Ganglioside biosynthesis
Glycan structures - biosynthesis 1
Glycan structures - biosynthesis 2
Glycan structures - degradation

33

34

KEGG Glycan Structure Map

35

36

G00078

37

“View Structure”

38

39

40

41

42

Consortium for Functional Glycomics (CFG)

▪ Consortium home page: http://www.functionalglycomics.org/
▪ Consortium of major universities and research institutes worldwide
▪ Aim: to provide a central resource for glycomics research
▪ Also provides requested resources to promote participating investigators’ research
• Glycan arrays and data
• Mass spectra analysis…
▪ CFG glycan database

43

CFG (2)

44

CFG (3)

45

CFG (4)

46

CFG (5)

47

CFG (6)

48

CFG (7)

49

CFG (7)

50

BCSDB: Bacterial Carbohydrate Structure DataBase

http://www.glyco.ac.ru/bcsdb/start.shtml

▪ Provides structural, bibliographic, taxonomic and related information on bacterial carbohydrate structures.
▪ Data based on CarbBank and manual data posting (structures published after 1995, approx. 3000 records).
▪ > 95% coverage of the scope of bacterial carbohydrates.
• Bacterial = structure has been found in bacteria or obtained by modification of those found in bacteria.
• Carbohydrate = structure composed of any residues linked by glycosidic, ester, amidic, ketal, phospho- or sulpho-diester bonds, in which at least one residue is a sugar or its derivative.
▪ Each record includes structure, bibliography, abstract, keywords, biological source, methods used to elucidate the structure, bioactivity, NMR assignment tables, etc.
▪ Search by IDs, bibliographic data and keywords, biological source, structure fragment and NMR data.
▪ Data cross-linked with GLYCOSCIENCES.de DB

51

BCSDB

52

53

54

55

56

57

EuroCarbDB – Design Study

▪ http://www.eurocarbdb.org/
▪ Based in Europe, but with participants from universities and research groups worldwide
▪ Distributed infrastructure to integrate multiple resources with a single interface

58

Data Modeling

▪ Foremost issue in handling glycan structures for comparison and analysis
▪ A few models/formats currently available:
• LINUCS
• KCF
• Linear Code©
• GLYDE (XML)
• GlycoCT

59

Glycome informatics

▪ Glycome: the repertoire of glycans in a cell, tissue, or organism
▪ Glycome informatics: algorithms, methods and computational models for the study of the glycome

60

Current glycome informatics

▪ Glycomics:
• Automated mass spectrometry annotation
▪ Computer-theoretic algorithms for tree alignments
▪ Probabilistic models (mining) for patterns in glycans
▪ Kernel methods for glycan classification

61

Glycomics Techniques

▪ Mass spectrometry of glycoproteins: prediction/annotation
• Mizuno et al., Anal. Chem., 1999
• GlycoMod (Cooper et al., Proteomics, 2001)
• STAT (Gaucher et al., Anal. Chem., 2000)
• StrOligo (M. Ethier et al., Methods Mol Biol., 2006)
• Cartoonist (D. Goldberg et al., Proteomics, 2005)
• Glyco-Peakfinder (K. Maas, R. Ranzinger et al., Proteomics, 2007)
• GlycoWorkbench (A. Ceroni et al., 2007)
• GLYCH (H. Tang et al., Bioinformatics, 2005)

62

Automated Annotation of Mass

Spectrometry Data

63

GlycoMod

▪ http://www.expasy.ch/tools/glycomod/
▪ Predicts the possible oligosaccharide structures that occur on proteins from their experimentally determined masses.
▪ Can be used for free or derivatized oligosaccharides and for glycopeptides
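The idea of predicting compositions from an experimental mass can be illustrated with a brute-force search. This is a minimal sketch, not GlycoMod's implementation: the function name `compositions`, the four residue classes chosen, the tolerance and the count limit are all assumptions for the example. The residue masses are the usual monoisotopic values (monosaccharide minus water), rounded to four decimals, and the search covers free, underivatized oligosaccharides only.

```python
from itertools import product

# Monoisotopic residue masses in Da (monosaccharide minus one water).
RESIDUE_MASS = {
    "Hex": 162.0528,     # hexose, e.g. Man, Gal, Glc
    "HexNAc": 203.0794,  # N-acetylhexosamine, e.g. GlcNAc, GalNAc
    "dHex": 146.0579,    # deoxyhexose, e.g. Fuc
    "NeuAc": 291.0954,   # N-acetylneuraminic acid
}
WATER = 18.0106  # one water is added back for the free reducing end


def compositions(mass, tol=0.01, max_count=10):
    """Return every residue composition whose mass is within tol of mass."""
    names = list(RESIDUE_MASS)
    hits = []
    for counts in product(range(max_count + 1), repeat=len(names)):
        if sum(counts) == 0:
            continue  # skip the empty composition
        total = WATER + sum(n * RESIDUE_MASS[k] for n, k in zip(counts, names))
        if abs(total - mass) <= tol:
            hits.append(dict(zip(names, counts)))
    return hits


# Example: the mass of a high-mannose glycan with 5 Hex and 2 HexNAc.
target = WATER + 5 * RESIDUE_MASS["Hex"] + 2 * RESIDUE_MASS["HexNAc"]
for hit in compositions(target):
    print(hit)
```

A composition is not a structure: several isomeric glycans share one composition, which is why the later slides combine composition analysis with fragment matching.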

64

Experimental workflow for (semi-)automatic determination of glycan structures from raw data to fully assigned spectrum via composition analysis (GlycoPeakFinder) and fragment matching (GlycoWorkbench).

65

Nomenclature of MS fragments of carbohydrates as defined by Domon and Costello

66

GlycoWorkbench

MS: Annotation of fragments

http://www.eurocarbdb.org/applications

67

GlycoWorkbench

MS: Annotation of fragments

http://www.eurocarbdb.org/applications

68

Current glycome informatics

▪ Automated mass spectrometry annotation
▪ Computer-theoretic algorithms for tree alignments
▪ Probabilistic models (mining) for patterns in glycans
▪ Kernel methods for glycan classification

69

Computer Theoretic Techniques

▪ KCaM: K.F. Aoki et al., NAR, 2004
▪ Score matrix for glycan linkages: K.F. Aoki et al., Bioinformatics, 2005
▪ Least common supertree approximation algorithm for reconstructing glycans from spectral data: K.F. Aoki-Kinoshita et al., ISAAC 2006

70

Glycan structure comparison

▪ Calculating glycan “similarity”
• Efficiency
• Biologically meaningful
▪ Data mining techniques
▪ Prediction:
• In layman’s terms: determining whether or not a given glycan belongs to a particular class

71

Glycan structure comparison:

KCaM

▪ KEGG Carbohydrate Matcher
▪ Glycan alignment tool for KEGG GLYCAN
▪ Maximum Common Subtree algorithm
▪ Dynamic programming approach
• Smith-Waterman
• Needleman-Wunsch
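To make the maximum common subtree idea concrete, here is a small recursive sketch for unordered labeled trees. It is not the KCaM algorithm: it ignores gaps and scoring matrices, and it matches children by trying every assignment, which is only practical because glycan residues have very few children. The helper names (`node`, `rooted_score`, `max_common_subtree`) and the label strings are assumptions made for the example.

```python
from itertools import permutations


def node(label, *children):
    """A tree node as (label, [children]); the label holds linkage and residue."""
    return (label, list(children))


def rooted_score(a, b):
    """Size of the largest common subtree in which nodes a and b are matched."""
    if a[0] != b[0]:
        return 0
    small, large = sorted((a[1], b[1]), key=len)
    best = 0
    # Try every way of pairing the shorter child list with distinct
    # children from the longer list and keep the best total.
    for assignment in permutations(large, len(small)):
        best = max(best, sum(rooted_score(x, y) for x, y in zip(small, assignment)))
    return 1 + best


def all_nodes(tree):
    """Yield every node of the tree, root first."""
    yield tree
    for child in tree[1]:
        yield from all_nodes(child)


def max_common_subtree(t1, t2):
    """Local variant: the best match over every pair of nodes in the two trees."""
    return max(rooted_score(a, b) for a in all_nodes(t1) for b in all_nodes(t2))


# Two N-glycan-like trees that share the five-residue core.
core = node("GlcNAc", node("b4 GlcNAc", node("b4 Man", node("a3 Man"), node("a6 Man"))))
extended = node(
    "GlcNAc",
    node("b4 GlcNAc",
         node("b4 Man",
              node("a6 Man", node("a2 Man")),
              node("a3 Man", node("b2 GlcNAc")))),
)
print(max_common_subtree(core, extended))
```

Taking the maximum over all node pairs plays the role of a local alignment; restricting the comparison to the two roots would give the global counterpart.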

72

KCaM: KEGG Carbohydrate

Matcher

▪ Smith-Waterman sequence alignment algorithm (global and local)
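For reference, the sequence version of the dynamic programme that the tree alignment builds on can be written compactly. The sketch below computes only the best local alignment score (Smith-Waterman); the scoring values and the function name are assumptions for the example. Dropping the floor at zero and scoring the final cell instead of the maximum cell would give the global (Needleman-Wunsch) variant.

```python
def smith_waterman(s, t, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences s and t (linear gap cost)."""
    rows, cols = len(s) + 1, len(t) + 1
    # H[i][j] holds the best score of a local alignment ending at s[i-1], t[j-1].
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = H[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            H[i][j] = max(0, diagonal, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


# Sequences may be any indexable collection, here lists of residue names.
print(smith_waterman(["Gal", "GlcNAc", "Man"], ["Fuc", "Gal", "GlcNAc"]))
```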

73

KCaM: KEGG Carbohydrate Matcher

▪ Maximum Common Subtree Algorithm

[Figure: glycan tree with edges labeled by linkage (β2, β4, α6, α3)]

74

KCaM Example

[Figure: two example trees, one with nodes 1 to 8 and one with nodes A to F, and the dynamic programming matrix R of scores over their node pairs]

75

Glycan Score Matrix

▪ Like PAM or BLOSUM for proteins
▪ Improved KCaM using score matrix
▪ Similarity measures of matrix components (glycan components)
▪ Statistical insight into glycan composition

76

Method

▪ Matrix entries: “link” = monosaccharides + bond type
▪ “Families” determined by hierarchically clustering KEGG GLYCAN based on KCaM similarity scores
▪ Calculations performed similarly to the BLOSUM matrix for protein sequences
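The BLOSUM-style calculation referred to here is a log-odds ratio of observed to expected pair frequencies. The sketch below shows that calculation for counts of aligned link pairs; it follows the general BLOSUM recipe (without the final scaling and rounding) and is not the published glycan score matrix procedure. The function name and the toy link labels are assumptions for the example.

```python
import math
from collections import Counter


def log_odds_scores(pair_counts):
    """Log2 odds of observed versus expected frequency for each aligned pair.

    pair_counts maps an unordered pair of links (a, b) to how often the two
    links were aligned with each other within a family of glycans.
    """
    counts = Counter()
    for (a, b), n in pair_counts.items():
        counts[tuple(sorted((a, b)))] += n  # (a, b) and (b, a) are the same pair
    total = sum(counts.values())
    observed = {pair: n / total for pair, n in counts.items()}

    # Background frequency of each link, derived from the pair frequencies.
    background = Counter()
    for (a, b), freq in observed.items():
        if a == b:
            background[a] += freq
        else:
            background[a] += freq / 2
            background[b] += freq / 2

    scores = {}
    for (a, b), freq in observed.items():
        expected = background[a] ** 2 if a == b else 2 * background[a] * background[b]
        scores[(a, b)] = math.log2(freq / expected)
    return scores


toy_counts = {("Man a3", "Man a3"): 6, ("Man a6", "Man a6"): 2, ("Man a3", "Man a6"): 2}
for pair, score in sorted(log_odds_scores(toy_counts).items()):
    print(pair, round(score, 3))
```

Pairs that are aligned more often than chance predicts receive positive scores, and rarely aligned pairs receive negative ones.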

77

Improved alignments

78

Individual Matrix Entries

Aligned Linkage Child    Aligned Linkage Parent   Score
Fuc1, a6 GlcNAc          Fuc1, a6 GlcNAc          2.45254
GlcNAc1, b4 GlcNAc       GlcNAc1, b4 GlcNAc       2.37549
Man1, b4 GlcNAc          Man1, b4 GlcNAc          2.32516
Glc1, b4 GlcNAc          Glc1, b4 GlcNAc          2.08472
Man1, a4 Glc             Man1, a6 Glc             2.03222
Man1, a3 Glc             Man1, b3 Glc             1.9977
Glc1, a2 Glc             Glc1, a2 Glc             1.99001
Glc1, a3 Glc             Glc1, a3 Glc             1.98493
GlcNAc1, b6 GalNAc       GlcNAc1, b6 GalNAc       1.96005

79

Current glycome informatics

▪ Automated mass spectrometry annotation
▪ Computer-theoretic algorithms for tree alignments
▪ Probabilistic models (mining) for patterns in glycans
▪ Kernel methods for glycan classification

80

Mining in Glycome Informatics

▪ Probabilistic Models
• PSTMM, N. Ueda et al., TKDE, 2005
• Profile PSTMM, K.F. Aoki-Kinoshita et al., ISMB 2006
• OTMM, Hashimoto et al., KDD 2006
▪ Previous work on probabilistic trees
• Hidden Tree Markov Model, HTMM (Diligenti et al., 2003) for image classification

81

HTMM Cannot Capture Sibling

Dependencies!

82

Probabilistic Sibling Tree Markov Model (PSTMM)

HTMM

83

Inference and learning

▪ Estimating the parameters:
• To “learn” patterns found in given data
▪ Calculating the likelihood of a set of trees:
• To determine which data are considered to belong to the same class as the learned data
▪ Finding the most likely state transition:
• To retrieve the learned patterns
• To apply to multiple tree alignments

84

Learned Classification

High Mannose

Hybrid

Complex

85

Summary of PSTMM Results

▪ There indeed seem to exist sibling-dependent relationships in glycans!
▪ Statistical analysis of glycans seems appropriate considering the noisiness of the data
• Prediction of missing information
• Further classification groups based on patterns found within a class of glycans

86

Profile PSTMM

▪ Given binding affinity data for a specific lectin, compute the most likely structure being recognized
▪ Statistically compute the key patterns of sulfation in GAGs based on various biological measurements (i.e. inhibition)

87

Glycan recognition

▪ Glycans are modified, degraded, and recognized by various types of proteins
• Much research focuses on understanding the structure of the lectins that bind to glycans
• Recognition of the substructures at the leaves

88

Lectin-glycan experiment

▪ Many classes of lectins (glycan-binding proteins)
• Recognize specific monosaccharides at the leaves
▪ Galectins recognize galactose residues
▪ FAC analysis has enabled high-throughput binding affinity analysis of galectins and glycans (J. Hirabayashi et al., 2002)

89

Lectin-glycan experiment

90

J. Hirabayashi et al. Oligosaccharide specificity of galectins: a search by frontal affinity chromatography. Biochim Biophys Acta, 1572(2–3):232–54, 2002.

91

Lectin-binding glycan profiles

            Gal-3   Gal-9N
Accuracy    .847    .910
Precision   1.0     .918
AUC         .930    .931

92

Current glycome informatics

▪ Automated mass spectrometry annotation
▪ Computer-theoretic algorithms for tree alignments
▪ Probabilistic models (mining) for patterns in glycans
▪ Kernel methods for glycan classification

93

Kernel Methods

▪ Machine learning method
• e.g. Support Vector Machines (SVM)
▪ Can handle features in high dimensions
• e.g. expression data, pathway information, localization information, etc.
▪ Statistically computes commonalities by reducing the dimensions of the data
• Data classification
• Feature extraction

http://www-kairo.csce.kyushu-u.ac.jp/~norikazu/research.en.html

94

Leukemia-specific features

▪ Hizukuri et al., Carbohydr. Res. 340, 2270-2278 (2005).
▪ Used KEGG GLYCAN data:
• Entries whose CarbBank annotations were related to leukemic cells, erythrocytes, plasma and serum
• Predicted possible glycan markers
• Correlated well with experimental data
▪ Assessed CarbBank data and retrieved leukemia-specific glycans via annotations
▪ Found that glycan substructures of three residues (trimers) produced the best accuracy
▪ Also used the fact that structures at the leaves should be distinguished from those at the root

95

Leukemia Kernel

▪ Layer-specific trimers for each glycan

Hizukuri et al., Carbohydrate Research, 2005.

96

Leukemia Kernel

▪ A vector over all n possible trimers, where x_i is the number of times trimer i appears in a particular glycan: G = G(x_1, x_2, ..., x_n)
▪ Glycans X and Y are compared by the following function:
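The comparison function itself is not preserved in this transcript. As an illustration only, the sketch below builds the trimer count vector for a glycan tree and compares two glycans with a plain dot product. The dot product is an assumption standing in for the missing function and is not the kernel of Hizukuri et al.; the sketch also counts only parent-child-grandchild chains and ignores the layer information mentioned on the previous slide. All names are hypothetical.

```python
from collections import Counter


def node(label, *children):
    """A tree node as (label, [children])."""
    return (label, list(children))


def trimers(tree):
    """Count every parent-child-grandchild chain of three residue labels."""
    counts = Counter()

    def walk(t):
        for child in t[1]:
            for grandchild in child[1]:
                counts[(t[0], child[0], grandchild[0])] += 1
            walk(child)

    walk(tree)
    return counts


def dot_kernel(x, y):
    """Dot product of two sparse count vectors (a stand-in comparison function)."""
    return sum(n * y[key] for key, n in x.items())


glycan_x = node("GlcNAc", node("GlcNAc", node("Man", node("Man"), node("Man"))))
glycan_y = node("GlcNAc", node("Man", node("Man")))
print(dot_kernel(trimers(glycan_x), trimers(glycan_y)))
```

Because most trimers never occur in a given glycan, a sparse counter is a natural representation of the vector G.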

97

Leukemia Markers

▪ Supported experimental results

98

Gram distribution kernel

▪ Kuboyama et al., Genome Informatics, 2006.
▪ Took the distribution of dimers, trimers, tetramers, etc. to represent a glycan
▪ Able to extract features of any size
▪ Used the concept of q-grams

99

Q-gram

100

Gram distribution kernel

▪ It is possible to count all q-grams for rooted ordered trees in linear time (Kuboyama et al., LLLL 2006)
▪ By calculating the distribution of q-grams in a tree, this kernel is able to capture more information, including a variety of q for various path lengths
▪ To verify the performance of the gram distribution kernel, the same data set was used as for testing the Layered-Trimer Kernel
▪ Also tested a data set of glycans related to the keywords “cystic fibrosis,” “bronchial mucin,” and “respiratory mucin”
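A gram distribution can be sketched by counting label paths of every length up to some limit. The version below is a simplification and an assumption: it counts only downward (ancestor to descendant) paths of q nodes, which may be narrower than the q-gram definition used by Kuboyama et al., and it does not reproduce their linear-time algorithm. Function names are hypothetical.

```python
from collections import Counter


def node(label, *children):
    """A tree node as (label, [children])."""
    return (label, list(children))


def path_qgrams(tree, q):
    """Count downward label paths that contain exactly q nodes."""
    counts = Counter()

    def walk(t, trail):
        # Keep only the last q labels on the path from the root to this node.
        trail = (trail + (t[0],))[-q:]
        if len(trail) == q:
            counts[trail] += 1
        for child in t[1]:
            walk(child, trail)

    walk(tree, ())
    return counts


def gram_distribution(tree, max_q):
    """Combine the q-gram counts for q = 1 .. max_q into one feature vector."""
    distribution = Counter()
    for q in range(1, max_q + 1):
        distribution.update(path_qgrams(tree, q))
    return distribution


example = node("A", node("B", node("C")), node("B"))
print(gram_distribution(example, 3))
```

Since tuples of different lengths never collide, the 1-grams, 2-grams and 3-grams can share a single counter, which gives features of every size at once.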

101

Results: Features extracted

102

Results: performance

▪ Gram distribution vs. Leukemia kernel (layered trimer kernel)

103

Results: marker size

104

Results: marker size

105

Systems approach to unveiling

structure-function relationship

106

Glycan synthesis is a non-template-driven process. We can never be sure that the complete structural space of glycans is represented in the databases.

Theoretical Number of Isomers = E^n × 2^n (anomer) × 2^n (conf) × 4^(n-1)

Oligosaccharide    n   Isomers
Monosaccharide     1   4
Disaccharide       2   256
Trisaccharide      3   27,648
Tetrasaccharide    4   4,194,304
Pentasaccharide    5   819,200,000
Hexasaccharide     6   195,689,447,424
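The isomer counts can be checked directly from the formula. The slide does not define E; the tabulated values are reproduced when E is set equal to n, so the sketch below uses that as its default, which is an inference from the numbers rather than something stated in the source. The function name is hypothetical.

```python
def theoretical_isomers(n, e=None):
    """E^n * 2^n (anomer) * 2^n (conf) * 4^(n-1) for an oligosaccharide of n residues.

    If e is not given it defaults to n, which reproduces the values in the table.
    """
    if e is None:
        e = n
    return e ** n * 2 ** n * 2 ** n * 4 ** (n - 1)


for n in range(1, 7):
    print(n, f"{theoretical_isomers(n):,}")
```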

Which glycan structures really exist in certain species? What do the databases say?

Unknown structural space for glycan structure

107

Occurrence of monosaccharide residues (CarbBank nomenclature)

 #  Monosaccharide name     Mammalian #  Mammalian [%]  Human #  Human [%]
 1  B-D-GLCPNAC                    7319           26.1     4705      26.69
 2  B-D-GALP                       6389           22.8     4178      23.70
 3  A-D-MANP                       3659           13.1     2073      11.76
 4  A-D-NEUP5AC                    2101            7.5     1465       8.31
 5  A-L-FUCP                       1971            7.0     1461       8.29
 6  B-D-MANP                       1486            5.3      900       5.10
 7  D-GLCNAC                        675            2.4      403       2.29
 8  D-GLCNAC-OL                     598            2.1      399       2.26
 9  D-GALNAC-OL                     511            1.8      355       2.01
10  B-D-GLCP                        423            1.5      244       1.38
11  B-D-GALPNAC                     431            1.5      230       1.30
12  SULFATE                         450            1.6      198       1.12
13  A-D-GALPNAC                     248            0.9      171       0.97
14  D-GLC                           197            0.7      151       0.86
15  A-D-GALP                        287            1.0      103       0.58
16  D-GALNAC                        116            0.4       91       0.52
17  A-D-GLCP                        161            0.6       68       0.39
18  B-D-GLCPA                        94            0.3       54       0.31
19  D-GAL                            56            0.2       37       0.21
20  D-GLC-OL                         37            0.1       34       0.19
21  A-D-GLCPNAC                      89            0.3       31       0.18
22  D-GAL-OL                         38            0.1       27       0.15
23  D-GLCPNAC                        37            0.1       17       0.10
24  A-D-NEUP5GC                     132            0.5       16       0.09
25  B-D-XYLP                         23            0.1       13       0.07
26  D-GALP                           22            0.1       12       0.07
27  A-L-4-EN-THRHEXPA                40            0.1       12       0.07
28  ?-D-GALPNAC                      15            0.1       11       0.06
29  P                                20            0.1       11       0.06
30  D-2,5-ANHYDRO-MAN-OL             13            0.0        9       0.05

Mammalian: 5339; Human: 2128

10: 88.4%; 20: 97.5%; 30: 99.1%

Total number of different residues: Mammalian: 86; Human: 83

Stephan Herget / Rene Ranzinger

108

Occurrence of disaccharide residues (CarbBank nomenclature)

 #  Parent            from  to  Child              #
 1  B-D-GLCP-2NAC     4     1   B-D-GALP        2837
 2  A-D-MANP          2     1   B-D-GLCP-2NAC   1382
 3  B-D-GALP          3     1   B-D-GLCP-2NAC    860
 4  B-D-MANP          6     1   A-D-MANP         776
 5  B-D-MANP          3     1   A-D-MANP         771
 6  B-D-GALP          3     2   A-D-NEUP-5AC     742
 7  B-D-GLCP-2NAC     4     1   B-D-MANP         732
 8  B-D-GALP          6     2   A-D-NEUP-5AC     467
 9  B-D-GALP          2     1   A-L-FUCP         436
10  B-D-GLCP-2NAC     3     1   A-L-FUCP         418
11  A-D-MANP          4     1   B-D-GLCP-2NAC    340
12  B-D-GLCP-2NAC     3     1   B-D-GALP         300
13  A-D-MANP          6     1   B-D-GLCP-2NAC    255
14  B-D-GLCP          4     1   B-D-GALP         219
15  B-D-GALP          6     1   B-D-GLCP-2NAC    186
16  B-D-GLCP-2NAC     4     1   B-D-GLCP-2NAC    175
17  A-D-MANP          2     1   A-D-MANP         156
18  B-D-GALP          3     1   A-D-GALP-2NAC    119
19  B-D-GLCP-2NAC     4     1   A-L-FUCP         117
20  B-D-GLCP-2NAC     4     1   B-D-GALP-2NAC    110
21  A-D-MANP          3     1   A-D-MANP          92
22  A-D-MANP          6     1   A-D-MANP          88
23  B-D-GLCP-2NAC     6     1   A-L-FUCP          86
24  B-D-MANP          4     1   B-D-GLCP-2NAC     78
25  B-D-GALP          3     1   A-D-GALP          68
26  B-D-GALP          4     1   B-D-GALP-2NAC     62
27  B-D-GALP-2NAC     3     1   B-D-GALP          45
28  A-D-GALP-2NAC     3     1   B-D-GALP          39
29  A-D-NEUP-5AC      8     2   A-D-NEUP-5AC      31

H: 2128

10: 71.7%; 20: 89.9%; 30: 95.5%

Total number of different disaccharides: Human: 171 (once: 65; twice: 20; three times: 10)

Stephan Herget / Rene Ranzinger

109

Topologies of Glycans

[Figure: axes are Size of Glycan (Residues) and Number of Branching points]

Stephan Herget / Rene Ranzinger

110

Mathematical Modelling to explore the structural space of glycans using information from carbohydrate-active enzymes

A Mathematical Model of N-Linked Glycosylation
Frederick J. Krambeck, Michael J. Betenbaugh; Biotechnol Bioeng. 2005 Dec 20;92(6):711-28.

The full model generates 7565 N-glycan structures in a network of 22,871 reactions

Enzymes included

Abbreviation   EC No.
ManI           3.2.1.113
ManII          3.2.1.114
FucT           2.4.1.68
GnTI           2.4.1.101
GnTII          2.4.1.143
GnTIII         2.4.1.144
GnTIV          2.4.1.145
GnTV           2.4.1.155
GnTE           2.4.1.149
GalT           2.4.1.38
SiaT           2.4.99.6

Enzyme reaction rule tables to model reaction networks:
Parameters: spatial distribution of enzymes, transport, reaction kinetics, donor concentrations.

Glycoform description scheme

Man   Number of mannose residues
Fuc   Number of fucose residues
Gnb   Number of bisecting GlcNAc residues
Gal   Number of galactose residues
Sia   Number of sialic acid (NeuAc) residues
Br1   Extension level of branch 1
Br2   Extension level of branch 2
Br3   Extension level of branch 3
Br4   Extension level of branch 4

111

Distribution of carbohydrate chains in PDB (Release September 2004)

Chain     N-glycan       O-glycans      ligands        total
length    #      %       #     %        #      %       #      %
1         1977   58.5    555   91.4     2093   59.0    4625   61.4
2         693    20.5    29    4.8      812    22.9    1534   20.4
3         310    9.2     10    1.7      329    9.3     649    8.6
4         83     2.5     2     0.3      149    4.2     234    3.1
5         98     2.9     5     0.8      83     2.3     186    2.5
6         98     2.9     2     0.3      42     1.2     142    1.9
7         55     1.7     4     0.7      15     0.4     74     1.0
8         37     1.1     0     0        15     0.4     52     0.7
total     3381           607            3550           7538

112

Recommendation 1: Development of a robust, centralized, and thoroughly curated glycan structures database

“We need to be able to search databases for what is out there. Imagine genomics and proteomics without GenBank”

The current state of glyco-related databases can be characterized as “the biggest defect in the field” (Ajit Varki).

To smooth the way for a central carbohydrate structure database, the active larger initiatives agreed to immediately start with the necessary preparatory steps for the conversion of CarbBank data into the GLYDE-II format.

113

Summary

▪ Understanding protein modifications such as glycosylation is crucial to understanding function
▪ Databases for glyco-informatics research are starting to come together
• XML standardization
• Major databases (Glycosciences.de, KEGG, CFG)
▪ More advanced informatics approaches can be applied to various facets of glyco-research
▪ Goal: to get the true overall picture of cellular processes

114

For further questions:

▪ Kiyoko F. Aoki-Kinoshita

[email protected]