An introduction to bioinformatics for glycomics research
Kiyoko F. Aoki-Kinoshita
Soka University, Japan
[Figure: from genomics to the glycosciences]
- Genomics and post-genomics: genomes, transcriptome, proteome
- Gene products acting on carbohydrates: glycosyltransferases, glycosidases and other carbohydrate-modifying enzymes; lectins, CBPs, growth factors
- ORF counts shown in the figure: > 6600 ORF, > 7200 ORF, > ??? ORF, > ??? ORF
- Glycosylation by gene products; interactions of gene products with the carbohydrate environment
- Glycosciences / glycome: glycoproteins (N-glycans, O-glycans, GPI-anchor), glycosaminoglycans, proteoglycans, glycolipids, microbial polysaccharides
- Fundamental biological processes: aging, brain development, embryonic development, fertilization
- Diseases: infections, AIDS, malaria, tuberculosis, cancer/metastasis, CDGs, allergy
- Diagnostic tools: genetic tests, microbial antigens, allergy markers, cancer markers
- Treatments: synthetic vaccines, immunotherapy, anti-inflammatory drugs, new antibiotics
CDG = Congenital Disorder of Glycosylation
CBP = Carbohydrate Binding Protein
Glyceraldehyde, the simplest aldose, contains one chiral carbon atom carrying four different substituents and therefore has two different enantiomers.
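The same counting argument extends to longer aldoses. Each additional chiral carbon doubles the number of stereoisomers, so an open-chain aldose with n carbons has 2^(n-2) of them. The following is a minimal sketch under that assumption: only open-chain forms are counted, the anomeric centre created by ring closure is ignored, and the function name is hypothetical.

```python
def aldose_stereoisomers(n_carbons: int) -> int:
    """Number of open-chain aldose stereoisomers with n_carbons carbons.

    C1 (the aldehyde) and the terminal CH2OH carbon are not stereocentres,
    so n_carbons - 2 chiral carbons remain, each with two configurations.
    """
    if n_carbons < 3:
        raise ValueError("the smallest aldose, glyceraldehyde, has 3 carbons")
    return 2 ** (n_carbons - 2)


for name, n in [("triose (glyceraldehyde)", 3), ("tetrose", 4), ("pentose", 5), ("hexose", 6)]:
    print(f"{name}: {aldose_stereoisomers(n)} stereoisomers")
```

For glyceraldehyde this gives the two enantiomers described above. For the aldohexoses it gives 16 (8 D and 8 L).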
Some common and biologically important monosaccharides
Glc, Gal, Xyl, Man, Fuc, Fru, GlcNAc, Neu5Ac, GalNAc, GalA, IdoA, Rib
Oligosaccharide description
- Tree structures of monosaccharides and linkages
- Nodes = sugars/monosaccharides
- Edges = bonds/linkages
[Figure: example oligosaccharide tree with the root node marked and edges labeled by linkage (α3, α6, β2, β4)]
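The tree model above translates directly into a small data structure. Each residue is a node, and each child is stored together with the linkage that attaches it to its parent. This is a minimal sketch: the class and function names are hypothetical, and the linkage string ("b4", "a6") is assumed to encode the child's anomeric configuration plus the attachment position on the parent. The example is the common N-glycan trimannosyl core.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Residue:
    """A node: one monosaccharide, e.g. 'GlcNAc' or 'Man'."""
    name: str
    # (linkage, child) pairs; linkage = child's anomer + position on this parent
    children: list[tuple[str, Residue]] = field(default_factory=list)

    def add(self, linkage: str, child: Residue) -> Residue:
        self.children.append((linkage, child))
        return child  # returned so chains can be built step by step


def size(node: Residue) -> int:
    """Number of residues in the subtree rooted at node."""
    return 1 + sum(size(child) for _, child in node.children)


def leaves(node: Residue) -> list[str]:
    """Names of the non-reducing-end (leaf) residues."""
    if not node.children:
        return [node.name]
    return [name for _, child in node.children for name in leaves(child)]


# N-glycan core: Man(a1-3)[Man(a1-6)]Man(b1-4)GlcNAc(b1-4)GlcNAc
root = Residue("GlcNAc")  # reducing end = root node
core_man = root.add("b4", Residue("GlcNAc")).add("b4", Residue("Man"))
core_man.add("a3", Residue("Man"))
core_man.add("a6", Residue("Man"))

print(size(root))    # 5
print(leaves(root))  # ['Man', 'Man']
```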
Overview of glycan biosynthetic pathways.
Hudson H. Freeze, Genetic defects in the human glycome. Nat Rev Genet. 2006 Jul;7(7):537-51
Glycan classes: functions and biosynthesis
Hudson H. Freeze, Genetic defects in the human glycome. Nat Rev Genet. 2006 Jul;7(7):537-51
N-linked glycan biosynthetic pathway
Hudson H. Freeze, Genetic defects in the human glycome. Nat Rev Genet. 2006 Jul;7(7):537-51
Steps in the pathway at which genetic disorders occur are indicated, with the associated genes underneath, as are steps at which an animal model is available. MPDU1 encodes a protein that enables the utilization of dolichol-P-mannose and dolichol-P-glucose, but does not catalyse the reactions.
Human diseases caused by genetic defects in N-glycosylation pathways
- Congenital disorders of glycosylation (19 distinct genes)
  • Mental retardation, seizures, epilepsy, ...
- Mucolipidosis I & II
  • Coarsening of features, organomegaly, joint stiffness, ...
- Congenital dyserythropoietic anaemia (CDA II)
  • Anaemia, jaundice, splenomegaly, gall bladder disease
O-mannose and O-xylose biosynthetic pathways
NS, 2S, 3S, 4S and 6S represent 2-N-, 2-O-, 3-O-, 4-O- and 6-O-sulphate, in that order.
Human diseases caused by genetic defects in O-glycosylation pathways
- Walker-Warburg syndrome
- Fukuyama muscular dystrophy
- Ehlers-Danlos syndrome
- Chondrodysplasias
- Macular corneal dystrophy
- Tn syndrome
- others
Human diseases caused by genetic defects in glycolipid synthesis
- Paroxysmal nocturnal haemoglobinuria
- Amish infantile epilepsy
Calnexin and calreticulin are related proteins that comprise an ER chaperone system that ensures the proper folding and quality control of newly synthesized glycoproteins.
Williams DB, Beyond lectins: the calnexin/calreticulin chaperone system of the endoplasmic reticulum. J Cell Sci. 2006 Feb 15;119(Pt 4):615-23.
Carbohydrate Structure Databases
- CarbBank
- SWEET-DB / glycosciences.de
- KEGG GLYCAN
- Consortium for Functional Glycomics
- BCSDB
- EuroCarbDB
- Commercial databases:
  • GlycoSuite (Proteome Systems, Ltd.)
  • Glycomics DB (Glycominds, Ltd.)
CarbBank
- Developed by the Complex Carbohydrate Research Center, University of Georgia
- Community database of carbohydrates
- Project ended due to lack of funding in 1996
GLYCOSCIENCES.de DB
- http://www.glycosciences.de
- Combines CarbBank and Sugabase using a common web-based interface
- Provides searching by bibliography, structure, NMR and MS, as well as by LINUCS ID
KEGG GLYCAN
- http://www.genome.jp/kegg/glycan/
- Based on CarbBank as well as input from scientists
- All data are linked with KEGG’s other resources: GENES, PATHWAY, KO and literature databases
- Several tools for analysis available
KEGG’s Glycan Biosynthesis and Metabolism Pathways
- N-Glycan biosynthesis
- High-mannose type N-glycan biosynthesis
- N-Glycan degradation
- O-Glycan biosynthesis
- Chondroitin / heparan sulfate biosynthesis
- Keratan sulfate biosynthesis
- Glycosaminoglycan degradation
- Lipopolysaccharide biosynthesis
- Peptidoglycan biosynthesis
- Glycosylphosphatidylinositol (GPI)-anchor biosynthesis
- Glycosphingolipid metabolism
- Blood group glycolipid biosynthesis - lactoseries
- Blood group glycolipid biosynthesis - neo-lactoseries
- Globoside metabolism
- Ganglioside biosynthesis
- Glycan structures - biosynthesis 1
- Glycan structures - biosynthesis 2
- Glycan structures - degradation
Consortium for Functional Glycomics (CFG)
- Consortium home page: http://www.functionalglycomics.org/
- Consortium of major universities and research institutes worldwide
- Aim: to provide a central resource for glycomics research
- Also provides requested resources to promote participating investigators’ research
  • Glycan arrays and data
  • Mass spectra analysis…
- CFG glycan database
BCSDB: Bacterial Carbohydrate Structure DataBase
http://www.glyco.ac.ru/bcsdb/start.shtml
- Provides structural, bibliographic, taxonomic and related information on bacterial carbohydrate structures.
- Data based on CarbBank and manual data posting (structures published after 1995, approx. 3000 records).
- > 95% coverage of the scope of bacterial carbohydrates.
  • Bacterial = structure has been found in bacteria or obtained by modification of those found in bacteria.
  • Carbohydrate = structure composed of any residues linked by glycosidic, ester, amidic, ketal, phospho- or sulpho-diester bonds, in which at least one residue is a sugar or its derivative.
- Each record includes structure, bibliography, abstract, keywords, biological source, methods used to elucidate the structure, bioactivity, NMR assignment tables, etc.
- Search by IDs, bibliographic data and keywords, biological source, structure fragments and NMR data.
- Data cross-linked with GlycoSCIENCES.DB
EuroCarbDB – Design Study
- http://www.eurocarbdb.org/
- Based in Europe, but with participants from universities and research groups worldwide
- Distributed infrastructure to integrate multiple resources with a single interface
Data Modeling
- Foremost issue in handling glycan structures for comparison and analysis
- A few models/formats currently available:
  • LINUCS
  • KCF
  • Linear Code©
  • GLYDE (XML)
  • GlycoCT
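To show what a machine-readable glycan format looks like in practice, here is a minimal sketch of reading the NODE and EDGE sections of a KCF record into node and edge lists. The field layout is an assumption based on KEGG's KCF convention:
- NODE lines hold id, residue name, x and y.
- EDGE lines hold id, child:anomer+carbon and parent:position.

The sample record is hand-written for illustration and is not a real KEGG GLYCAN entry. A production parser should follow KEGG's own KCF documentation.

```python
def parse_kcf(text):
    """Return (nodes, edges) from a KCF record.

    nodes: {node_id: residue_name}
    edges: [(child_id, child_link, parent_id, parent_position), ...]
    """
    nodes, edges = {}, []
    section = None
    for line in text.splitlines():
        if line.startswith("///"):           # end of record
            break
        if line[:1].strip():                 # keyword in column 1 opens a section
            section = line.split()[0]
            continue
        fields = line.split()
        if not fields:
            continue
        if section == "NODE":                # id  name  x  y
            nodes[int(fields[0])] = fields[1]
        elif section == "EDGE":              # id  child:link  parent:position
            child, child_link = fields[1].split(":")
            parent, _, parent_pos = fields[2].partition(":")
            edges.append((int(child), child_link, int(parent), parent_pos))
    return nodes, edges


# Hand-written example record: Neu5Ac(a2-3)Gal(b1-4)GlcNAc
SAMPLE_KCF = """\
ENTRY       EXAMPLE                     Glycan
NODE        3
            1   GlcNAc     0     0
            2   Gal       -8     0
            3   Neu5Ac   -16     0
EDGE        2
            1     2:b1    1:4
            2     3:a2    2:3
///
"""

nodes, edges = parse_kcf(SAMPLE_KCF)
for child, link, parent, pos in edges:
    print(f"{nodes[child]}({link}-{pos}){nodes[parent]}")
```

The nodes-plus-edges layout makes the tree model explicit. That is what lets tools such as KCaM compare structures without parsing a linear string notation.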
Glycome informatics
- Glycome: the repertoire of glycans in a cell, tissue, or organism
- Glycome informatics: algorithms, methods and computational models for the study of the glycome
Current glycome informatics
- Glycomics:
  • Automated mass spectrometry annotation
- Computer-theoretic algorithms for tree alignments
- Probabilistic models (mining) for patterns in glycans
- Kernel methods for glycan classification
Glycomics Techniques
- Mass spectrometry of glycoproteins: prediction/annotation
  • Mizuno et al., Anal. Chem., 1999
  • GlycoMod (Cooper et al., Proteomics, 2001)
  • STAT (Gaucher et al., Anal. Chem., 2000)
  • StrOligo (M. Ethier et al., Methods Mol. Biol., 2006)
  • Cartoonist (D. Goldberg et al., Proteomics, 2005)
  • Glyco-Peakfinder (K. Maass, R. Ranzinger et al., Proteomics, 2007)
  • GlycoWorkbench (A. Ceroni et al., 2007)
  • GLYCH (H. Tang et al., Bioinformatics, 2005)
GlycoMod
- http://www.expasy.ch/tools/glycomod/
- Predicts the possible oligosaccharide structures that occur on proteins from their experimentally determined masses.
- Can be used for free or derivatized oligosaccharides and for glycopeptides
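The core idea of predicting compositions from a measured mass can be sketched as a brute-force search. The search enumerates residue counts and keeps those whose summed mass falls within a tolerance of the observed mass. This is not GlycoMod's implementation, and it makes several assumptions:
- Only a neutral, free, underivatized glycan is handled (residue masses plus one water).
- Ions, adducts, charge states, derivatization and peptide masses are all ignored.
- The function name and default limits are hypothetical.

```python
from itertools import product

# Monoisotopic residue masses (Da) of common glycan building blocks.
RESIDUE_MASS = {
    "Hex": 162.05282,     # e.g. Man, Gal, Glc
    "HexNAc": 203.07937,  # e.g. GlcNAc, GalNAc
    "dHex": 146.05791,    # e.g. Fuc
    "NeuAc": 291.09542,
}
WATER = 18.01056  # added once for a free (reducing-end) glycan


def compositions(observed_mass, tolerance=0.02, max_count=10):
    """Yield residue compositions whose neutral mass matches observed_mass."""
    names = list(RESIDUE_MASS)
    for counts in product(range(max_count + 1), repeat=len(names)):
        if sum(counts) == 0:
            continue
        mass = WATER + sum(n * RESIDUE_MASS[r] for n, r in zip(counts, names))
        if abs(mass - observed_mass) <= tolerance:
            yield dict(zip(names, counts))


# Man5GlcNAc2 (Hex5HexNAc2) as a free glycan: 5*162.05282 + 2*203.07937 + 18.01056
for hit in compositions(1234.433, tolerance=0.01):
    print(hit)
```

Several compositions can match within a loose tolerance. This is one reason composition tools report candidate lists rather than a single answer.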
Experimental workflow for (semi-)automatic determination of glycan structures, from raw data to a fully assigned spectrum, via composition analysis (GlycoPeakFinder) and fragment matching (GlycoWorkbench).
Nomenclature of MS fragments of carbohydrates as defined by Domon and Costello
Computer Theoretic Techniques
- KCaM: K.F. Aoki et al., NAR, 2004
- Score matrix for glycan linkages: K.F. Aoki et al., Bioinformatics, 2005
- Least common supertree approximation algorithm for reconstructing glycans from spectral data: K.F. Aoki-Kinoshita et al., ISAAC 2006
Glycan structure comparison
- Calculating glycan “similarity”
  • Efficiency
  • Biologically meaningful
- Data mining techniques
- Prediction:
  • In layman’s terms: determining whether or not a given glycan belongs to a particular class
Glycan structure comparison: KCaM
- KEGG Carbohydrate Matcher
- Glycan alignment tool for KEGG GLYCAN
- Maximum Common Subtree algorithm
- Dynamic programming approach
  • Smith-Waterman
  • Needleman-Wunsch
KCaM: KEGG Carbohydrate Matcher
- Maximum Common Subtree Algorithm
[Figure: two glycan trees aligned by KCaM, edges labeled by linkage (α3, α6, β2, β4)]
KCaM Example
[Figure: worked example of the KCaM dynamic-programming table R, aligning a glycan tree with nodes 1-8 against a tree with nodes A-F]
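The maximum common subtree idea behind KCaM can be illustrated with a small recursive score. Two residues anchored to each other earn a point if their names match. Their children are then paired so that the summed child scores are as large as possible, with a bonus when the linkages also agree.

This is a sketch, not KCaM itself. The weights (1.0 per residue, 0.5 per linkage) are hypothetical, children are paired exhaustively instead of by bipartite matching, and scores are not memoized in a dynamic-programming table as KCaM does.

```python
from itertools import permutations

# Glycan tree as nested tuples: (residue, [(linkage, child_tree), ...]),
# where linkage is e.g. "b4" (anomer + position on the parent residue).
RESIDUE_SCORE = 1.0   # hypothetical weight for a matched monosaccharide
LINKAGE_SCORE = 0.5   # hypothetical bonus when the linkage also matches


def anchored_score(a, b):
    """Best score of a common subtree whose roots are mapped to a and b."""
    (name_a, kids_a), (name_b, kids_b) = a, b
    if name_a != name_b:
        return 0.0
    if len(kids_a) > len(kids_b):  # pair the shorter child list into the longer one
        kids_a, kids_b = kids_b, kids_a
    best = 0.0
    # Exhaustive child pairing; acceptable for the small fan-out of glycans.
    for chosen in permutations(kids_b, len(kids_a)):
        total = 0.0
        for (link_a, child_a), (link_b, child_b) in zip(kids_a, chosen):
            sub = anchored_score(child_a, child_b)
            if sub > 0 and link_a == link_b:
                sub += LINKAGE_SCORE
            total += sub
        best = max(best, total)
    return RESIDUE_SCORE + best


def all_nodes(tree):
    yield tree
    for _, child in tree[1]:
        yield from all_nodes(child)


def max_common_subtree_score(a, b):
    """Best anchored score over every pair of nodes (one from each glycan)."""
    return max(anchored_score(x, y) for x in all_nodes(a) for y in all_nodes(b))


core = ("GlcNAc", [("b4", ("GlcNAc", [("b4", ("Man", [("a3", ("Man", [])), ("a6", ("Man", []))]))]))])
partial = ("GlcNAc", [("b4", ("GlcNAc", [("b4", ("Man", [("a6", ("Man", []))]))]))])
print(max_common_subtree_score(core, partial))  # 4 residues + 3 linkages -> 5.5
```

Anchoring at every node pair, rather than only at the roots, lets a shared substructure score even when it sits at different depths in the two glycans.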
Glycan Score Matrix
- Like PAM or BLOSUM for proteins
- Improved KCaM using score matrix
- Similarity measures of matrix components (glycan components)
- Statistical insight into glycan composition
Method
- Matrix entries: “link” = monosaccharide + bond type
- “Families” determined by hierarchically clustering KEGG GLYCAN based on KCaM similarity scores
- Calculations performed similarly to the BLOSUM matrix for protein sequences
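A BLOSUM-style calculation turns counts of aligned "link" pairs into log-odds scores. The score is log2 of the observed pair frequency divided by the frequency expected from each link's background frequency. The following is a minimal sketch with several assumptions:
- The aligned columns per family are assumed to be already extracted; the KCaM alignment and the clustering step are not shown.
- Scores stay unscaled and unrounded, in bits; BLOSUM itself scales and rounds.
- The exact procedure of Aoki et al. may differ.
- The function name and sample links are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_scores(aligned_columns):
    """BLOSUM-style log-odds scores (in bits) for pairs of 'link' labels.

    aligned_columns: list of columns; each column holds the links that were
    aligned to one another within one family of glycans.
    """
    pair_counts = Counter()
    for column in aligned_columns:
        for a, b in combinations(column, 2):
            pair_counts[tuple(sorted((a, b)))] += 1
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}  # observed pair frequencies

    # Background frequency of each link: p_i = q_ii + sum_{j != i} q_ij / 2
    p = Counter()
    for (a, b), freq in q.items():
        p[a] += freq / 2
        p[b] += freq / 2

    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = math.log2(freq / expected)
    return scores


A, B = "Man1,a3Man", "Man1,a6Man"
columns = [[A, A, A], [B, B, B], [A, B]]
for pair, s in sorted(log_odds_scores(columns).items()):
    print(pair, round(s, 3))
```

Links that are conserved within families get positive scores. Links that rarely replace each other get negative scores, exactly as for amino acids in BLOSUM.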
Individual Matrix Entries
Score | Aligned linkages
1.96005 | GlcNAc1,b6GalNAc / GlcNAc1,b6GalNAc
1.98493 | Glc1,a3Glc / Glc1,a3Glc
1.99001 | Glc1,a2Glc / Glc1,a2Glc
1.9977 | Man1,b3Glc / Man1,a3Glc
2.03222 | Man1,a6Glc / Man1,a4Glc
2.08472 | Glc1,b4GlcNAc / Glc1,b4GlcNAc
2.32516 | Man1,b4GlcNAc / Man1,b4GlcNAc
2.37549 | GlcNAc1,b4GlcNAc / GlcNAc1,b4GlcNAc
2.45254 | Fuc1,a6GlcNAc / Fuc1,a6GlcNAc
Mining in Glycome Informatics
- Probabilistic Models
  • PSTMM, N. Ueda et al., TKDE, 2005
  • Profile PSTMM, K.F. Aoki-Kinoshita et al., ISMB 2006
  • OTMM, Hashimoto et al., KDD 2006
- Previous work on probabilistic trees
  • Hidden Tree Markov Model, HTMM (Diligenti et al., 2003) for image classification
Inference and learning
- Estimating the parameters:
  • To “learn” patterns found in given data
- Calculating the likelihood of a set of trees:
  • To determine which data are considered to belong to the same class as the learned data
- Finding the most likely state transition:
  • To retrieve the learned patterns
  • To apply to multiple tree alignments
Summary of PSTMM Results
- There indeed seem to exist sibling-dependent relationships in glycans!
- Statistical analysis of glycans seems appropriate considering the noisiness of the data
  • Prediction of missing information
  • Further classification groups based on patterns found within a class of glycans
Profile PSTMM
- Given binding affinity data for a specific lectin, compute the most likely structure being recognized
- Statistically compute the key patterns of sulfation in GAGs based on various biological measurements (e.g., inhibition)
Glycan recognition
- Glycans are modified, degraded, and recognized by various types of proteins
  • Much research focuses on understanding the structure of the lectins that bind to glycans
  • Recognition of the substructures at the leaves
Lectin-glycan experiment
- Many classes of lectins (glycan-binding proteins)
  • Recognize specific monosaccharides at the leaves
- Galectins recognize galactose residues
- FAC (frontal affinity chromatography) analysis has enabled high-throughput binding affinity analysis of galectins and glycans (J. Hirabayashi et al., 2002)
J. Hirabayashi, et al. Oligosaccharide specificity of galectins: a search by frontal affinity chromatography. Biochim Biophys Acta, 1572(2–3):232–54, 2002.
Kernel Methods
- Machine learning method
  • e.g. Support Vector Machines (SVM)
- Can handle features in high dimensions
  • e.g. expression data, pathway information, localization information, etc.
- Statistically computes commonalities by reducing the dimensions of the data
  • Data classification
  • Feature extraction
http://www-kairo.csce.kyushu-u.ac.jp/~norikazu/research.en.html
Leukemia-specific features
- Hizukuri et al., Carbohydr. Res. 340, 2270-2278 (2005).
- Used KEGG GLYCAN data:
  • Entries whose CarbBank annotations were related to leukemic cells, erythrocytes, plasma and serum
  • Predicted possible glycan markers
  • Correlated well with experimental data
- Assessed CarbBank data and retrieved leukemia-specific glycans via annotations
- Found that glycan substructures of three residues (trimers) produced the best accuracy
- Also used the fact that structures at the leaves should be distinguished from those at the root
Leukemia Kernel
- Layer-specific trimers for each glycan
Hizukuri et al., Carbohydrate Research, 2005.
Leukemia Kernel
- Each glycan is represented by a vector over all n possible trimers, $G = G(x_1, x_2, \ldots, x_n)$, where $x_i$ is the number of times trimer $i$ appears in that glycan
- Glycans X and Y are compared by the following function:
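The comparison function itself is not reproduced here. As a stand-in, the following sketch builds layered trimer count vectors and compares two glycans by the plain inner product of those vectors. The assumptions are:
- A trimer is a parent-child-grandchild path of three residues, with linkages ignored.
- Its "layer" is the depth of its top residue from the root, which keeps root-side and leaf-side trimers distinct.
- Hizukuri et al.'s exact trimer definition and kernel may differ.
- The function names are hypothetical.

```python
from collections import Counter


def layered_trimers(tree, depth=0, counts=None):
    """Count parent-child-grandchild paths, tagged with the depth of the top residue.

    tree: (residue_name, [child, ...]); depth 0 is the root (reducing end).
    """
    if counts is None:
        counts = Counter()
    name, children = tree
    for child_name, grandchildren in children:
        for grandchild_name, _ in grandchildren:
            counts[(depth, name, child_name, grandchild_name)] += 1
    for child in children:
        layered_trimers(child, depth + 1, counts)
    return counts


def trimer_kernel(x, y):
    """Inner product of the two glycans' layered-trimer count vectors."""
    cx, cy = layered_trimers(x), layered_trimers(y)
    return sum(n * cy[key] for key, n in cx.items())


core = ("GlcNAc", [("GlcNAc", [("Man", [("Man", []), ("Man", [])])])])
print(layered_trimers(core))
print(trimer_kernel(core, core))
```

Any function of two count vectors that is a valid kernel (for example, a normalized inner product) could be dropped into an SVM in place of `trimer_kernel`.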
Gram distribution kernel
- Kuboyama et al., Genome Informatics, 2006.
- Took the distribution of dimers, trimers, tetramers, etc. to represent a glycan
- Able to extract features of any size
- Used the concept of q-grams
Gram distribution kernel
- Possible to count all q-grams for rooted ordered trees in linear time (Kuboyama et al., LLLL 2006)
- By calculating the distribution of q-grams in a tree, this kernel is able to capture more information, including a variety of q for various path lengths
- To verify the performance of the gram distribution kernel, the same data set used for testing the Layered-Trimer Kernel was used
- Also tested a data set of glycans related to the keywords “cystic fibrosis,” “bronchial mucin,” and “respiratory mucin”
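The idea of representing a glycan by the distribution of its q-grams for several q can be sketched as follows. This is a simplification, with these assumptions:
- Only path-shaped q-grams (downward label paths of q residues) are counted.
- The counting uses a naive recursion, not the linear-time algorithm of Kuboyama et al.
- Their q-grams may also include branched substructures.
- Combining several q into one vector and normalizing with cosine similarity are my choices.
- The function names are hypothetical.

```python
import math
from collections import Counter


def path_qgrams(tree, q):
    """Count downward label paths of exactly q residues (tree: (name, [child, ...]))."""
    counts = Counter()

    def paths_from(node, remaining):
        # All label sequences of length `remaining` starting at this node
        name, children = node
        if remaining == 1:
            return [(name,)]
        return [(name,) + rest for child in children
                for rest in paths_from(child, remaining - 1)]

    def visit(node):
        counts.update(paths_from(node, q))
        for child in node[1]:
            visit(child)

    visit(tree)
    return counts


def gram_kernel(x, y, qs=(1, 2, 3, 4)):
    """Cosine similarity of the combined q-gram distributions for several q."""
    cx, cy = Counter(), Counter()
    for q in qs:  # tuples of different lengths never collide, so one Counter suffices
        cx.update(path_qgrams(x, q))
        cy.update(path_qgrams(y, q))
    dot = sum(n * cy[g] for g, n in cx.items())
    norm = math.sqrt(sum(n * n for n in cx.values()) * sum(n * n for n in cy.values()))
    return dot / norm if norm else 0.0


core = ("GlcNAc", [("GlcNAc", [("Man", [("Man", []), ("Man", [])])])])
galactosylated = ("GlcNAc", [("GlcNAc", [("Man", [("Man", [("GlcNAc", [("Gal", [])])]), ("Man", [])])])])
print(path_qgrams(core, 2))
print(round(gram_kernel(core, galactosylated), 3))
```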
Glycan synthesis is a non-template-driven process. We can never be sure that the complete structural space of glycans is represented in the databases.
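Theoretical number of isomers, for an oligosaccharide of n residues:
$$N_{\text{isomers}} = n^{n} \times \underbrace{2^{n}}_{\text{anomer}} \times \underbrace{2^{n}}_{\text{conf}} \times 4^{n-1}$$
Oligosaccharide (n) | Theoretical number of isomers
Monosaccharide (1) | 4
Disaccharide (2) | 256
Trisaccharide (3) | 27,648
Tetrasaccharide (4) | 4,194,304
Pentasaccharide (5) | 819,200,000
Hexasaccharide (6) | 195,689,447,424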
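A quick arithmetic check that n^n × 2^n (anomer) × 2^n (configuration) × 4^(n-1) reproduces the isomer counts in the table uses exact integer arithmetic. The function name is hypothetical.

```python
def theoretical_isomers(n: int) -> int:
    """n^n x 2^n (anomer) x 2^n (configuration) x 4^(n-1) for n residues."""
    return n ** n * 2 ** n * 2 ** n * 4 ** (n - 1)


for n in range(1, 7):
    print(n, f"{theoretical_isomers(n):,}")
```

The count grows super-exponentially: going from five to six residues multiplies it by more than 200. This is why database coverage, not enumeration, decides which structures are actually studied.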
Which glycan structures really exist in certain species? What do the databases say?
[Figure: unknown structural space for glycan structures]
Occurrence of monosaccharide residues (CarbBank nomenclature)
Rank | Monosaccharide name | # mammalian | Mammalian [%] | # human | Human [%]
1 | B-D-GLCPNAC | 7319 | 26.1% | 4705 | 26.69%
2 | B-D-GALP | 6389 | 22.8% | 4178 | 23.70%
3 | A-D-MANP | 3659 | 13.1% | 2073 | 11.76%
4 | A-D-NEUP5AC | 2101 | 7.5% | 1465 | 8.31%
5 | A-L-FUCP | 1971 | 7.0% | 1461 | 8.29%
6 | B-D-MANP | 1486 | 5.3% | 900 | 5.10%
7 | D-GLCNAC | 675 | 2.4% | 403 | 2.29%
8 | D-GLCNAC-OL | 598 | 2.1% | 399 | 2.26%
9 | D-GALNAC-OL | 511 | 1.8% | 355 | 2.01%
10 | B-D-GLCP | 423 | 1.5% | 244 | 1.38%
11 | B-D-GALPNAC | 431 | 1.5% | 230 | 1.30%
12 | SULFATE | 450 | 1.6% | 198 | 1.12%
13 | A-D-GALPNAC | 248 | 0.9% | 171 | 0.97%
14 | D-GLC | 197 | 0.7% | 151 | 0.86%
15 | A-D-GALP | 287 | 1.0% | 103 | 0.58%
16 | D-GALNAC | 116 | 0.4% | 91 | 0.52%
17 | A-D-GLCP | 161 | 0.6% | 68 | 0.39%
18 | B-D-GLCPA | 94 | 0.3% | 54 | 0.31%
19 | D-GAL | 56 | 0.2% | 37 | 0.21%
20 | D-GLC-OL | 37 | 0.1% | 34 | 0.19%
21 | A-D-GLCPNAC | 89 | 0.3% | 31 | 0.18%
22 | D-GAL-OL | 38 | 0.1% | 27 | 0.15%
23 | D-GLCPNAC | 37 | 0.1% | 17 | 0.10%
24 | A-D-NEUP5GC | 132 | 0.5% | 16 | 0.09%
25 | B-D-XYLP | 23 | 0.1% | 13 | 0.07%
26 | D-GALP | 22 | 0.1% | 12 | 0.07%
27 | A-L-4-EN-THRHEXPA | 40 | 0.1% | 12 | 0.07%
28 | ?-D-GALPNAC | 15 | 0.1% | 11 | 0.06%
29 | P | 20 | 0.1% | 11 | 0.06%
30 | D-2,5-ANHYDRO-MAN-OL | 13 | 0.0% | 9 | 0.05%
Mammalian: 5339; Human: 2128
Top 10: 88.4%; top 20: 97.5%; top 30: 99.1%
Total number of different residues: Mammalian: 86; Human: 83
(Stephan Herget / Rene Ranzinger)
Occurrence of disaccharide residues (CarbBank nomenclature)
Rank | Parent | From | To | Child | #
1 | B-D-GLCP-2NAC | 4 | 1 | B-D-GALP | 2837
2 | A-D-MANP | 2 | 1 | B-D-GLCP-2NAC | 1382
3 | B-D-GALP | 3 | 1 | B-D-GLCP-2NAC | 860
4 | B-D-MANP | 6 | 1 | A-D-MANP | 776
5 | B-D-MANP | 3 | 1 | A-D-MANP | 771
6 | B-D-GALP | 3 | 2 | A-D-NEUP-5AC | 742
7 | B-D-GLCP-2NAC | 4 | 1 | B-D-MANP | 732
8 | B-D-GALP | 6 | 2 | A-D-NEUP-5AC | 467
9 | B-D-GALP | 2 | 1 | A-L-FUCP | 436
10 | B-D-GLCP-2NAC | 3 | 1 | A-L-FUCP | 418
11 | A-D-MANP | 4 | 1 | B-D-GLCP-2NAC | 340
12 | B-D-GLCP-2NAC | 3 | 1 | B-D-GALP | 300
13 | A-D-MANP | 6 | 1 | B-D-GLCP-2NAC | 255
14 | B-D-GLCP | 4 | 1 | B-D-GALP | 219
15 | B-D-GALP | 6 | 1 | B-D-GLCP-2NAC | 186
16 | B-D-GLCP-2NAC | 4 | 1 | B-D-GLCP-2NAC | 175
17 | A-D-MANP | 2 | 1 | A-D-MANP | 156
18 | B-D-GALP | 3 | 1 | A-D-GALP-2NAC | 119
19 | B-D-GLCP-2NAC | 4 | 1 | A-L-FUCP | 117
20 | B-D-GLCP-2NAC | 4 | 1 | B-D-GALP-2NAC | 110
21 | A-D-MANP | 3 | 1 | A-D-MANP | 92
22 | A-D-MANP | 6 | 1 | A-D-MANP | 88
23 | B-D-GLCP-2NAC | 6 | 1 | A-L-FUCP | 86
24 | B-D-MANP | 4 | 1 | B-D-GLCP-2NAC | 78
25 | B-D-GALP | 3 | 1 | A-D-GALP | 68
26 | B-D-GALP | 4 | 1 | B-D-GALP-2NAC | 62
27 | B-D-GALP-2NAC | 3 | 1 | B-D-GALP | 45
28 | A-D-GALP-2NAC | 3 | 1 | B-D-GALP | 39
29 | A-D-NEUP-5AC | 8 | 2 | A-D-NEUP-5AC | 31
Human: 2128
Top 10: 71.7%; top 20: 89.9%; top 30: 95.5%
Total number of different disaccharides (human): 171; occurring once: 65; twice: 20; three times: 10
(Stephan Herget / Rene Ranzinger)
Topologies of Glycans
[Figure: size of glycan (residues) plotted against number of branching points]
(Stephan Herget / Rene Ranzinger)
Mathematical modelling to explore the structural space of glycans using information from carbohydrate-active enzymes
A Mathematical Model of N-Linked Glycosylation. Frederick J. Krambeck, Michael J. Betenbaugh; Biotechnol Bioeng. 2005 Dec 20;92(6):711-28.
The full model generates 7565 N-glycan structures in a network of 22,871 reactions.
Enzymes included:
Abbreviation | EC No.
ManI | 3.2.1.113
ManII | 3.2.1.114
FucT | 2.4.1.68
GnTI | 2.4.1.101
GnTII | 2.4.1.143
GnTIII | 2.4.1.144
GnTIV | 2.4.1.145
GnTV | 2.4.1.155
GnTE | 2.4.1.149
GalT | 2.4.1.38
SiaT | 2.4.99.6
Enzyme reaction rule tables are used to model the reaction networks.
Parameters: spatial distribution of enzymes, transport, reaction kinetics, donor concentrations.
Glycoform description scheme:
Man | Number of mannose residues
Fuc | Number of fucose residues
Gnb | Number of bisecting GlcNAc residues
Gal | Number of galactose residues
Sia | Number of sialic acid (NeuAc) residues
Br1 | Extension level of branch 1
Br2 | Extension level of branch 2
Br3 | Extension level of branch 3
Br4 | Extension level of branch 4
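How enzyme rule tables generate a reaction network can be sketched as a closure computation. The procedure applies every rule to every reachable glycoform and records each reaction, until no new glycoform appears.

The three rules and the three-field glycoform below are hypothetical toys loosely named after enzymes in the table. They are not Krambeck and Betenbaugh's rules, which track full structures, compartments and kinetics. The sketch only shows the network-building loop.

```python
from collections import deque
from typing import Callable, NamedTuple, Optional


class Glycoform(NamedTuple):
    man: int  # number of mannose residues
    gal: int  # number of galactose residues
    sia: int  # number of sialic acid residues


# A rule returns the product glycoform, or None if the enzyme cannot act.
Rule = Callable[[Glycoform], Optional[Glycoform]]

# Hypothetical toy rules (not the published model's rules).
TOY_RULES: dict[str, Rule] = {
    "ManI": lambda g: g._replace(man=g.man - 1) if g.man > 5 else None,
    "GalT": lambda g: g._replace(gal=g.gal + 1) if g.man == 5 and g.gal < 2 else None,
    "SiaT": lambda g: g._replace(sia=g.sia + 1) if g.sia < g.gal else None,
}


def build_network(start: Glycoform, rules: dict[str, Rule]):
    """Apply every rule to every reachable glycoform until no new ones appear."""
    species = {start}
    reactions = []  # (enzyme, substrate, product)
    queue = deque([start])
    while queue:
        substrate = queue.popleft()  # each glycoform is processed exactly once
        for enzyme, rule in rules.items():
            product = rule(substrate)
            if product is None:
                continue
            reactions.append((enzyme, substrate, product))
            if product not in species:
                species.add(product)
                queue.append(product)
    return species, reactions


species, reactions = build_network(Glycoform(man=9, gal=0, sia=0), TOY_RULES)
print(len(species), "glycoforms,", len(reactions), "reactions")
```

Even with three toy rules, the network branches as soon as two enzymes compete for the same substrate. This is how a realistic rule set reaches thousands of structures and tens of thousands of reactions.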
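Chain length distribution of carbohydrate chains in PDB (Release September 2004)
Chain length | Total (%) | Total (#) | Ligands (%) | Ligands (#) | O-glycans (%) | O-glycans (#) | N-glycans (%) | N-glycans (#)
1 | 61.4 | 4625 | 59.0 | 2093 | 91.4 | 555 | 58.5 | 1977
2 | 20.4 | 1534 | 22.9 | 812 | 4.8 | 29 | 20.5 | 693
3 | 8.6 | 649 | 9.3 | 329 | 1.7 | 10 | 9.2 | 310
4 | 3.1 | 234 | 4.2 | 149 | 0.3 | 2 | 2.5 | 83
5 | 2.5 | 186 | 2.3 | 83 | 0.8 | 5 | 2.9 | 98
6 | 1.9 | 142 | 1.2 | 42 | 0.3 | 2 | 2.9 | 98
7 | 1.0 | 74 | 0.4 | 15 | 0.7 | 4 | 1.7 | 55
8 | 0.7 | 52 | 0.4 | 15 | 0 | 0 | 1.1 | 37
Total | | 7538 | | 3550 | | 607 | | 3381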
Recommendation 1: Development of a robust, centralized, and thoroughly curated glycan structures database
“We need to be able to search databases for what is out there. Imagine genomics and proteomics without GenBank”
The current state of glyco-related databases can be characterized as “the biggest defect in the field” (Ajit Varki).
To smooth the way for a central carbohydrate structure database, the larger active initiatives agreed to immediately start with the necessary preparatory steps for the conversion of CarbBank data into the GLYDE-II format.
Summary
- Understanding protein modifications such as glycosylation is crucial to understanding function
- Databases for glyco-informatics research are starting to come together
  • XML standardization
  • Major databases (Glycosciences.de, KEGG, CFG)
- More advanced informatics approaches can be applied to various facets of glyco-research
- Goal: to get the true overall picture of cellular processes