TO A GENERAL APPLICABLE MONOSACCHARIDE IDENTIFICATION USING TOCSY … · 2017. 8. 3. · TO A...
TO A GENERAL APPLICABLE
MONOSACCHARIDE IDENTIFICATION
USING TOCSY MATCHING
Yannick Dandois Student number: 01109010
Supervisor(s): Prof. Dr. José Martins, Prof. Dr. Peter Dawyndt
A dissertation submitted to Ghent University in partial fulfilment of the requirements for the degree of
Master of Science in Chemistry
Academic year: 2016 - 2017
Acknowledgements
The road of a student is long and certainly not without hills. Yet thanks to a few people, being
a student remains enjoyable. The chance to work on a project that involves not only pure chemistry
but also other aspects of science, from programming to the mathematics that always hides behind
it, is one I owe entirely to my two exceptional supervisors. They not only gave me the opportunity
to work on a project that both they and I could fully stand behind; they also made time every
single week to guide me personally. For this I thank my supervisors José Martins and Peter
Dawyndt.
I also thank Niels Geudens, who always had an answer to my questions. Working without a daily
supervisor gives you freedom, but now and then leaves you with countless questions. Niels was
always there to answer them, and the other members of the NMRSTR group deserve thanks as well.
I also thank Sofie Van Damme for her excellent help with the conformational analyses.
People outside UGent deserve thanks too: the close group of friends who have supported one
another for years, Chiel, Anthony, Matthias, Rutger and co.
Finally, I thank P. d. W.; it is thanks to you that I went on to study chemistry.
Table of Contents
Chapter 1: Introduction
1.1 Context of this project
1.2 Sugars, saccharides and carbohydrates: not only a source of energy
1.3 Oligo- and polysaccharides
1.4 Monosaccharides structure and conformation – a refresher
1.4.1 Different forms of monosaccharides
1.4.2 Variations in monosaccharides: the α vs β and D vs L
1.4.3 Variations in conformations
1.5 Introduction towards TOCSY
1.6 Content of this project
Chapter 2: Current analysis methods
2.1 Mass Spectroscopy methods
2.2 HPLC methods
2.3 NMR standard methods
2.4 The TOCSY-Matching approach (Gheysen et al.)
2.5 Comparison of the techniques
Chapter 3: Designing a new experiment
3.1 TOCSY pulse sequence
3.2 A novel experimental setup
3.3 Achieving selectivity
3.4 p2D-sel TOCSY
3.5 p3D-Bsel TOCSY
3.6 Conclusion: p2D-sel TOCSY versus p3D-Bsel TOCSY
Chapter 4: Data processing
4.1 Chunkification
4.1.1 Chunkification of the p2D-sel TOCSY
4.1.2 Chunkification of the p3D-Bsel TOCSY
4.2 Integration of the spectra
4.2.1 Peak determination and noise size
4.2.2 Determination of the integral filter and integration of the spectra
4.3 TQD filter
Chapter 5: Saccharide comparison and clustering
5.1 Introduction
5.2 Towards a monosaccharide database
5.3 Monosaccharide comparison
5.3.1 Curve comparison: the integration method
5.3.2 Curve comparison: the Fréchet method
5.4 Clustering of the monosaccharides
5.4.1 The first cluster: β-galactopyranose and α-L-arabinopyranose
5.4.2 The second and third cluster: manno-, L-rhamno- and β-lyxopyranose
5.4.3 The fourth cluster: α-xylopyranose and α-glucopyranose
5.5 The usability of the technique on furanoses
5.5.1 A computational analysis of arabinofuranose
5.5.2 A spectral analysis of ribofuranose
5.6 Significance level of the technique
5.6.1 The repeated measurement approach
5.6.2 The cluster approach
5.6.3 The significance level: conclusion
Chapter 6: Proof of concept & operation
6.1 Sucrose (p2D)
6.2 A mixture of monosaccharides
6.2.1 The John Doe sample (p2D and p3D)
6.2.2 The honey sample (p2D)
6.3 Capsular polysaccharides
Chapter 7: Conclusion and further research
7.1 General conclusion
7.2 Conclusion: the analysis of a furanose
7.3 Usability of the current processing tools
7.4 Further research and required software updates
Chapter 8: Appendix
8.1 GitHub manual
8.2 Cluster analysis data for the integration method
8.3 Cluster analysis data for the Fréchet method
8.4 Ten starting conformations of beta L-arabinofuranose
Chapter 9: References
Chapter 10: Dutch summary (Nederlandstalige samenvatting)
10.1 A new experimental approach
10.2 Automated processing of the spectra
10.3 Conclusion
Scientific article: To A General Applicable Monosaccharide Identification Using TOCSY Matching
List of abbreviations and technical terms
Abbreviated monosaccharides and chemical compounds are not included in this list.
1D-sel TOCSY: One-dimensional selective TOCSY
CCM: Curve Comparison Method
Chunk: Part of the spectrum coming from one monosaccharide
Curve: The curve resulting from the integration of a multiplet; sometimes called integration curve or mixing time curve
ESI-ITMS: Electrospray ionization ion trap mass spectrometry
fqlist: Parameter used by the pulse program containing all frequencies (p2D-sel TOCSY only)
GAG: Glycosaminoglycans
GC-MS: Gas Chromatography Mass Spectroscopy
HMBC: Heteronuclear multiple-bond correlation spectroscopy
HPLC: High Performance Liquid Chromatography
HSQC: Heteronuclear single quantum coherence spectroscopy
JD #1: John Doe sample #1; a sample containing three different monosaccharides
Max Curve: Projection curve coming from one Chunk
m%: Mass percentage notation
MS: Mass Spectroscopy
NMR: Nuclear Magnetic Resonance
p2D: Refers to the p2D-sel TOCSY experiment
p3D: Refers to the p3D-Bsel TOCSY experiment
p2D-sel TOCSY: 1D-sel TOCSY with a pseudo-dimension in the mixing time
p3D-Bsel TOCSY: 2D band-selective TOCSY with a pseudo-dimension in the mixing time
Resx: Resolution in the x-dimension; number of points in the array
rga: Receiver Gain Acquisition (automatic determination)
S/N-ratio: Signal over Noise ratio, also abbreviated as ‘sino’
SOx/SFOx: Spectral (Field) Offset; Nyquist frequency in the x-dimension; the center of the acquisition window
SWx: Width of the spectral window in the x-dimension
TOCSY: TOtal Correlation Spectroscopy
vclist: Variable counter list; used for the different mixing times
List of figures
Figure 1.1 Total sugar consumption worldwide (statista.com)
Figure 1.2 Starch; a branched polysaccharide
Figure 1.3 Step by step identification
Figure 1.4 The pyranose form of hexose saccharides (left) and the pentose saccharides (right)
Figure 1.5 Cyclization of the acyclic aldehyde to both the furanose and pyranose
Figure 1.6 The cyclization of D-Glucose
Figure 1.7 The alpha (left) and beta (right) D-Glucose form
Figure 1.8 D (left) and L (right) glucose
Figure 1.9 The Haworth projection and both chair conformations of β-D-glucopyranose
Figure 1.10 Comparison of a 1D-sel TOCSY of the non-anomeric region of α-glucose with different mixing times (blue: 43ms; green: 95ms)
Figure 1.11 α-D-Mannopyranose; note that the angle between H1 and H2 will cause a bottleneck
Figure 1.13 The Karplus-relation
Figure 1.13 Dihedral angle between four atoms
Figure 2.1 1D HNMR spectrum of Galactose
Figure 2.2 NMR flowchart for monosaccharide determination (Touckach, 2013)
Figure 2.3 Gheysen determination table (60ms)
Figure 2.4 A 2D-TOCSY spectrum of sucrose (700MHz; 60ms). The signal corresponding to the anomeric hydrogen and its cross peaks have been indicated.
Figure 3.1 Default pulse program for a 2D TOCSY
Figure 3.2 An overlay of a regular 1D-HNMR (blue) and a 1D-HNMR using a 90° selective pulse (red) centered on the anomeric signal (yet partly exciting water). To improve the 90° selective pulse results, one can lengthen the pulse to increase selectivity and further reduce the intensity of the water signal. Last, a 1D-HNMR using a selective spin-echo (green) to excite the entire anomeric region.
Figure 3.3 2D-TOCSY spectrum of JD #1 (100ms – 700MHz) with a superposed band-selective indication. The anomeric hydrogen region on the diagonal has been indicated.
Figure 3.4 1D proton NMR of sucrose (green); 1D-sel TOCSY of sucrose (100ms), selective on the anomeric hydrogen, using both the DIPSI-2 (red) and MLEV-17 (blue) sequences at 700MHz. Only the non-anomeric region is shown. The effect of the z-filter in the DIPSI-2 sequence is clearly visible around 3.45ppm upon comparison of the DIPSI-2 and MLEV-17 sequences.
Figure 3.5 A 3D (left) and 2D (right) representation of a p2D-sel TOCSY of β-glucopyranose (the time domain axes are scaled incorrectly)
Figure 3.6 Data resulting from a p3D experiment (JD sample). Data identical to the p2D experiment has been indicated.
Figure 4.1 Flowchart of the processing.
Figure 4.2 p2D-sel TOCSY taken on alpha (bottom) and beta (top) glucose directly from Topspin
Figure 4.3 The data structure of the p3D-Bsel TOCSY as a list of 2D-TOCSY after the dimension flip (the commas indicate that there are multiple elements in the array)
Figure 4.4 The diagonal extracted automatically from JD#1 as previously shown in Figure 3.2
Figure 4.5 Projection of one chunk to achieve optimal signal to noise ratio for each individual peak
Figure 4.6 Peak determination on the diagonal of the p3D of JD#1
Figure 4.7 Peak region determination for the middle peak of a triplet, only depicted for the right side. (The noise limit is exaggerated and is only for demonstration purposes)
Figure 4.8 The integral filter obtained from the spectrum (α-glucose)
Figure 4.9 The application of the TQD-filter on Sucrose
Figure 4.10 Chunk Plot of α-Glucose
Figure 5.1 Integration method applied on two curves
Figure 5.2 The Fréchet distance calculation (left) and two different curves with high Fréchet distance (right), yet still a high probability of similarity using the integration method
Figure 5.3 Frequency (left) and scatter (top right) plots, for both methods, of the probability that two monosaccharides of the database are identical.
Figure 5.4 HCA on the Integration Method Matrix with a significance level of 0.75; the determined clusters are indicated in blue.
Figure 5.5 β-galactopyranose (left) and α-L-arabinopyranose (right)
Figure 5.6 Plot of the TOCSY cross correlation intensities vs mixing time of the first cluster
Figure 5.7 β-manno-, β-L-rhamno- and β-lyxopyranose respectively
Figure 5.8 α-manno-, α-L-rhamno- and α-lyxopyranose respectively (top) and both chair conformations of α-lyxose (bottom)
Figure 5.9 Chunk plot of the monosaccharides in the second cluster
Figure 5.10 Chunk plot of α-glucose and α-xylose of the non-anomeric region.
Figure 5.11 Polar Cremer-Pople plot of the initial conformations (orange dots), with an arrow to their eventual conformation (blue globe). The lower the energy, the bigger the globe. The final conformations of 2 and 7 have been drawn at the top left.
Figure 5.12 A chunk plot of both ribofuranoses.
Figure 5.13 Frequency plot of both methods of the similarity matrix
Figure 6.1 Results for sucrose using the integration method with the minima criterion, showing all curves but the anomeric signal.
Figure 6.2 Two chunks of the p2D spectra of the JD sample showing overlap.
Figure 6.3 Chunk plot of β-glucose (red), β-galactose (green) and β-xylose (blue)
Figure 6.4 1D proton NMR of the 19F (blue) and 22F (red) serotype. The 22F serotype has been shifted 0.1 ppm to the right.
Figure 7.1 1D proton NMR (red) and the result of a selective pulse as used by the p3D experiment (selgpse), with identical number of scans and scaling on the 19F sample.
Figure 7.2 p2D spectra of ribofuranose taken on a 700MHz spectrometer.
List of Tables
Table 1 All monosaccharides in the database
Table 2 Standard deviation of the techniques on sucrose.
Table 3 Processing hits for both the p2D and p3D experiment using the integration method and the minima criterion.
Table 4 Experimental settings for the CPS for both the p2D and p3D experiment.
Chapter 1: Introduction
1.1 Context of this project
Monosaccharide identification is used not only to identify and quantify the monosaccharide
composition of a sample; it is also a critical step in the analysis of large carbohydrates. The
analysis of these carbohydrates is of high importance in the food industry as well as in medicinal
chemistry. In medicinal chemistry, interactions between a saccharide compound and proteins are
often studied, yet the monosaccharide type and sequence are not always completely determined,
owing to the high complexity of the compound and the cost of analysis.
This work, Towards a general applicable monosaccharide identification using TOCSY matching,
is a collaboration between two research groups at UGent, the NMRSTR Research group
(Department of Molecular and Macromolecular Chemistry) and the Computational Biology Lab
(Department of Applied Mathematics, Computer Science and Statistics), with the aim of proposing
a fast, automatic method for identifying and processing monosaccharides and polysaccharide
composition. This work is not connected to an ongoing doctoral thesis; however, the foundations
were laid by K. Gheysen (Gheysen, 2011), whose doctorate was partially devoted to this subject.
The TOCSY-matching approach, the concept introduced by Gheysen et al., is revisited in this work
and extended by developing an automated analysis. The TOCSY-matching approach allows fast
monosaccharide identification without a full analysis of the spectrum. In its current
implementation, analysis typically proceeds using a single 2D TOCSY at a specific mixing time.
This analysis is performed manually, in a user-defined mode, and is therefore prone to user bias.
In addition, only a limited number of monosaccharides were investigated, which limits the number
of different monosaccharides that can be identified and could cause misinterpretation of the
spectra. In this work, a quantitative algorithm driven by TOCSY matching is developed in order to
avoid these drawbacks. In addition to building a larger pyranose database, the method has been
extended to include a furanose sugar in order to assess the strengths and limitations of the
technique itself. To achieve this, a novel experimental approach was developed, giving enhanced
spectra without necessarily extending the measuring time as sample complexity increases.
1.2 Sugars, saccharides and carbohydrates: not only a source of energy
A world without sugars is unthinkable in the 21st century. Every year the world population
consumes about 170 million metric tons of sugar (statista.com, n.d.). Sugars are present in
nearly every food source and are often added to sweeten food further (these are referred to as
added sugars).
Added sugars are found in almost every food product, from yoghurt to soda drinks. A can of
cola, for example, easily contains 10 m% carbohydrates, and the sugar composition differs
depending on the soda. Normal Coca-Cola® contains high fructose corn syrup, a combination of
different corn syrups that have been processed using enzymes. The coke in the ‘green bottle’,
also known as ‘Mexican Coke®’, is sweetened with sucrose (a word of French origin: sucre +
ose). Sucrose is a disaccharide extracted from Nature (from sugar cane, for example) and is
often referred to as table sugar. Because these sugars are added to food artificially, their
carbohydrate composition is well known.
When sugars are naturally present in a food substance, knowing their carbohydrate composition,
i.e. the type and sequence of the monosaccharides, is no longer straightforward. Natural
variation occurs, caused by different origins or even weather conditions. A complex example is
honey, which is harvested directly from Nature and consists almost entirely of sugars (mostly
mono- and disaccharides). Foraging bees collect sucrose-rich nectar and partially digest the
sucrose into glucose and fructose (using digestive enzymes) in a process called regurgitation.
While this process is well known, sugars that do not originate from sucrose, i.e. sugars other
than glucose and fructose, are found in honey as well. Maltose, a disaccharide of glucose, is a
frequently occurring example. This variation in the monosaccharide units of the higher sugars
(di- and trisaccharides) results in a partially unknown composition.
Figure 1.1 Total sugar consumption worldwide (statista.com)
But sugars are also found and used outside the food industry, although in that context they are
usually referred to as saccharides or carbohydrates, as their taste is no longer of importance.
They are often the subject of research in the life sciences, as they interact with proteins and
antibodies. A ‘saccharide’ search on Web of Science resulted in 2500 hits for biochemistry and
molecular biology alone (for the years 1955-2017). A common example of the use of saccharides
is cancer treatment in the form of chemotherapy (Calvaresi, 2013). Cancer cells can be targeted
because aggressive tumors consist of rapidly dividing cells and therefore show high glycolytic
rates. Saccharides tagged with toxic molecules are used as a ‘Trojan horse’ to bring the toxic
molecules inside the cancer cell.
In medicinal chemistry, one can find countless other applications of sugars, as they are (in
general) benign for the human body. Typical fillers for drugs are starch and sugars such as
lactose. These are added to give a pill a larger size for user comfort, as the active compound
is usually present in extremely small doses. The science of these saccharides, named
‘glycobiology’ in the late 1980s, combines the disciplines of chemistry and biochemistry in the
study of saccharides. Unfortunately, these saccharides all have highly similar chemical
structures, with most of their differences resulting from differences in stereochemistry.
1.3 Oligo- and polysaccharides
Oligosaccharides are built up from monosaccharides: they contain between two and eight
monosaccharides linked to each other by glycosidic bonds. Due to their limited length, they
have fewer branching possibilities than polysaccharides; nevertheless, they often occur in a
branched state. They are found in food, where they serve as a prebiotic in the colon, but are
not used directly as a source of energy by our body.
Polysaccharides are highly abundant compounds in Nature. They are polymeric carbohydrates
consisting of long chains of monosaccharides linked together by glycosidic bonds.
Polysaccharides are divided into three main groups: food storage polysaccharides (such as
starch and glycogen), structural polysaccharides (chitin and cellulose) and
mucopolysaccharides (also known as glycosaminoglycans or GAGs).
Starch can be found in corn, potatoes, grains, etc., while glycogen is only found in animals
(and is often referred to as animal starch). Both starch and glycogen are essential for
nutrition (as they are our main source of energy) and are homopolysaccharides, since they
contain only one type of monosaccharide, namely glucose. Because their monosaccharide
composition is simple and well known, they are of minor importance to this project.
Structural polysaccharides such as chitin and cellulose are found in both plants and animals.
Cellulose is responsible for the rigidity of plants, while chitin is found in the shell of
crustaceans (shrimps and lobsters, for example). Cellulose is a homopolysaccharide of glucose,
linearly linked through β(1→4) bonds, which allows it to form long, straight chains. This
straight-chain conformation results in strong fibers, giving plants their structural strength.
Chitin is similar to cellulose, but its monosaccharide is a derivative of glucose, namely
N-acetylglucosamine.
Pneumococcal disease, a potentially lethal infection caused by the bacterium Streptococcus
pneumoniae, shows the importance of polysaccharide analysis. This bacterium causes different
infections, but vaccines do exist (ATCC, 2015). These vaccines are based on the serotype of the
bacterium, which depends on its capsular polysaccharide (CPS). Knowledge of the monosaccharide
composition and the glycosidic bonds of the capsular polysaccharide is crucial for formulating
vaccines. While these CPS have been studied extensively, some remain unknown (Geno & Gilbert,
2015).
Figure 1.2 Starch; a branched polysaccharide (retrieved from nutrientsreview.com)
To identify an oligo- or polysaccharide, two kinds of information are required. First, the
monosaccharide composition must be determined, as the monosaccharides are the building blocks
of the oligo- or polysaccharide. Second, the order of the monosaccharides and the way they are
interconnected by glycosidic bonds must be determined as well. In this thesis, only the
identification of the monosaccharides themselves is tackled. No attempt is made to determine
the order in which they are connected or which hydroxyl functions are involved in the
connection. The latter typically occurs at a later stage of the analysis. The identification of
monosaccharides is referred to as step I (Figure 1.3).
1.4 Monosaccharides structure and conformation – a refresher
Monosaccharides have the chemical formula $(\mathrm{CH_2O})_n$, which shows the origin of the
group name (carbohydrates: carbon and water). They are classified into four main groups
according to the number of carbon atoms in the chain: triose, tetrose, pentose and hexose
monosaccharides. Only the latter two are the focus of this project, as they occur most often in
Nature.
Figure 1.3 Step by step identification
Figure 1.4 The pyranose form of hexose saccharides (left) and the pentose saccharides (right)
Within each group, various saccharide structures exist as a result of differences in
cyclization and stereochemistry. In addition, different conformations may need to be taken into
account when considering cyclic pentose and hexose structures. As the masses of the
monosaccharides are identical within each group, most standard chemical methods can only
differentiate between a pentose and a hexose, not between the different pentoses (or hexoses)
themselves, as these differ only in their stereochemistry.
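The identical mass within each group follows directly from the general formula. A minimal sketch, assuming rounded standard atomic masses (the function name is illustrative and not part of the thesis software):

```python
# Rounded standard atomic masses in g/mol (assumed values for illustration).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}


def monosaccharide_molar_mass(n_carbons: int) -> float:
    """Molar mass in g/mol of a monosaccharide with general formula (CH2O)n."""
    unit = ATOMIC_MASS["C"] + 2 * ATOMIC_MASS["H"] + ATOMIC_MASS["O"]
    return n_carbons * unit


# Every pentose shares one mass and every hexose another, whatever the
# stereochemistry, so mass alone cannot separate glucose from galactose.
for name, n in [("pentose", 5), ("hexose", 6)]:
    print(f"{name}: {monosaccharide_molar_mass(n):.2f} g/mol")
```

The two printed values differ between the groups but are shared by all stereoisomers within a group, which is why a method sensitive to geometry is needed.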
1.4.1 Different forms of monosaccharides
Monosaccharides can occur in both cyclic and acyclic forms, which are in chemical equilibrium.
This equilibrium is possible thanks to the aldehyde function, which is only present in the
acyclic form. Through intramolecular addition of one of the hydroxyl functions to the aldehyde,
a hemiacetal is formed, leading to a cyclic structure. An animated representation of the
cyclization can be found on Wikimedia.1
As multiple hydroxyl functions are present in each molecule, this reversible reaction can
result in different ring sizes. For pyranose-type monosaccharides the ring has six atoms; for
furanose-type monosaccharides it has five. The geometrical structure of the two forms is
therefore fundamentally different, which is exploited in this project.
_________________________ 1 https://upload.wikimedia.org/wikipedia/commons/a/af/Glucose_Fisher_to_Haworth.gif
Figure 1.5 Cyclization of the acyclic aldehyde to both the furanose and pyranose
Figure 1.6 The cyclization of D-Glucose
Generally speaking, pyranose is the dominant form because of the added ring strain in the
furanose ring, giving the furanose forms typical mass percentages between 2 and 15%.
1.4.2 Variations in monosaccharides: the α vs β and D vs L
Hemiacetal formation creates a new stereocenter, leading to two diastereoisomers referred to as
alpha and beta. The newly formed hydroxyl function can be either cis or trans relative to the
C6 carbon (Figure 1.7). A mixture of both is formed, with the ratio depending on their relative
free energies of formation. The α and β designation is based on this mutual relationship: trans
and cis, respectively, for the alcoholic functions on the C1 and the C5 carbon. In the Haworth
projection of all D-type sugars, the α-type monosaccharides have the anomeric hydrogen in the
upward position. For the L-type monosaccharides it is in the downward position.
The D and L notation refers to the Fischer representation and the orientation of the hydroxyl
function indicated in blue (Figure 1.8). The D-monosaccharides are generally the most abundant
in Nature; there are only a handful of exceptions (such as rhamnose). The L-type sugar is the
mirror image of the D-type; both are depicted for glucose in Figure 1.8. As it is not possible
to differentiate between two mirror images using NMR, the D-type monosaccharide is always
investigated unless otherwise mentioned.
Figure 1.7 The alpha (left) and beta (right) D-Glucose form
Figure 1.8 D (left) and L (right) glucose
1.4.3 Variations in conformations
Throughout this project, two different representations are used. The Haworth projection is
often used to compare the structure and stereochemistry of different monosaccharides. However,
multiple conformations remain possible for a single Haworth projection. Pyranoses may occur in
two chair conformations2, and when these are important, both are shown.
The most stable chair conformation is the one in which the majority of the hydroxyl functions
are positioned equatorially rather than axially on the ring. This is because axial groups have
steric interactions with each other, which destabilize the molecule. Therefore, the chair
conformation with the smallest axial groups (often hydrogen atoms) is expected to be the more
stable one.3 Figure 1.9 illustrates both chair conformations for β-D-glucose: the left chair
conformation is expected to be more stable, as it has four hydroxyl functions and C6 in the
equatorial plane, while the right conformation only has hydrogens in the equatorial plane.
1.5 Introduction towards TOCSY
TOCSY, or TOtal Correlation SpectroscopY, is an NMR technique in which magnetization
associated with a particular spin can be passed along the entire spin system to which it
belongs in a molecule. Transfer occurs between protons that are not directly scalar coupled, as
long as there are scalar couplings between the intervening protons in the spin system. In
polysaccharides, every monosaccharide unit defines a separate spin system, as the glycosidic
linkage involves too many bonds to allow significant scalar coupling between protons of
neighboring monosaccharides.
_________________________ 2 Other conformations exist as well (boat conformation, twist-boat conformation), yet they have higher relative energies.
3 One must also take the anomeric effect into account. For certain monosaccharides it might be less apparent which chair conformation has the lower energy. Multiple other effects might need to be taken into account.
Figure 1.9 The Haworth projection and both chair conformations of β-D-glucopyranose
Considering the chemical structure of monosaccharides, the scalar coupling networks mostly
involve $^3J_{HH}$ scalar couplings between vicinal hydrogens in the spin system, resulting in
a linear network (Eq. 1). The rate of magnetization transfer within such a network depends on
two main factors: the mixing time (the time given to pass on the magnetization) and the
coupling constants between each pair of vicinal hydrogens ($^3J_{HH}$). The coupling constant
determines the speed of magnetization transfer: small couplings slow the transfer down, while a
large $^3J_{HH}$ coupling constant leads to rapid transfer.
Due to the cyclic nature of monosaccharides, a monosaccharide diastereomer will be in one of
the two chair conformations (for the pyranose form), meaning that the dihedral angles between
the vicinal hydrogens are fixed and not subject to averaging over conformations. As the
dihedral angles determine the size of the couplings, the set of coupling constants along the
network is also conformation specific. Each monosaccharide is therefore expected to have a
specific set of $^3J_{HH}$ coupling constants between its vicinal hydrogens. This leads to a
characteristic set of couplings through which the magnetization is passed along from H1 to H2,
to H3 and so on. For a hexose monosaccharide, the following couplings are obtained starting
from the anomeric hydrogen:
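$$
H_1 \xrightarrow{^3J_{12}} H_2 \xrightarrow{^3J_{23}} H_3 \xrightarrow{^3J_{34}} H_4 \xrightarrow{^3J_{45}} H_5
\begin{cases}
\xrightarrow{^3J_{56'}} H_{6'} \\
\xrightarrow{^3J_{56}} H_6
\end{cases}
\qquad \text{(Eq. 1)}
$$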
Figure 1.10 Comparison of a 1D-sel TOCSY of the non-anomeric region of α-glucose with different mixing times (blue: 43 ms; green: 95 ms)
This means that the appearance of a signal or TOCSY correlation for H2 through TOCSY transfer
from H1 is, to a good approximation, only dependent on the mixing time and the $^3J_{12}$
coupling. The signal of H3 depends on both the $^3J_{12}$ and the $^3J_{23}$ coupling, and so
on. Differences in scalar coupling values along the transfer path from H1 to H6 lead to a
different pattern and intensity of TOCSY correlations depending on the particular
monosaccharide. When a particular $^3J_{HH}$ is small, this leads to a transfer bottleneck,
preventing significant transfer of magnetization to the remainder of the spin system. This is
for instance the case in α-D-mannopyranose (Figure 1.11), where the $^3J_{12}$ scalar coupling
is very small (approximately 1 Hz), causing a bottleneck for the entire spin system.
Unfortunately, this bottleneck effect makes it hard to analyze certain monosaccharides that
have an identical conformation before the bottleneck in the spin system (see 5.4.2).
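The bottleneck can be made concrete with a rough sketch. It assumes an isolated two-spin system under ideal isotropic mixing without relaxation, for which the transferred fraction is $\sin^2(\pi J \tau)$. This is a textbook idealization and not the model used in the thesis; the 8 Hz value is an illustrative large coupling, while 1 Hz and the 43 and 95 ms mixing times are taken from the text.

```python
import math


def two_spin_transfer(j_hz: float, mixing_time_s: float) -> float:
    """Fraction of magnetization transferred between two coupled spins.

    Idealized: isolated two-spin system, ideal isotropic mixing, no
    relaxation, so that transfer = sin^2(pi * J * tau).
    """
    return math.sin(math.pi * j_hz * mixing_time_s) ** 2


# Compare a bottleneck-sized coupling with an illustrative large coupling.
for label, j_hz in [("J = 1 Hz (bottleneck)", 1.0), ("J = 8 Hz (illustrative)", 8.0)]:
    for tau_ms in (43, 95):
        fraction = two_spin_transfer(j_hz, tau_ms / 1000)
        print(f"{label}, {tau_ms} ms: {fraction:.3f}")
```

With a 1 Hz coupling only a few percent of the magnetization leaves H1 within these mixing times, whereas the larger coupling already transfers most of it at 43 ms. In a real monosaccharide the magnetization moves on to further spins, so these numbers only indicate the trend.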
The size of the individual couplings therefore has a large influence on how the TOCSY pattern
changes upon varying the mixing time. As previously mentioned, the size of an individual
coupling depends on the dihedral angle between the vicinal hydrogens. The Karplus relation
describes the dependence of the vicinal coupling ($^3J_{HH}$) on the dihedral angle $\theta$:
$$^3J_{HH} = A\cos^2\theta + B\cos\theta + C \qquad \text{(Eq. 2)}$$
The parameters A, B and C are 7.76, -1.1 and 1.4 respectively for unsubstituted monosaccharides
(Haasnoot, Deleeuw, & Altona, 1980). As A, B and C are known, the only unknown variable of the
$^3J_{HH}$ coupling is the dihedral angle (the concept of a dihedral angle is shown in Figure
1.13).
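As a worked example, Eq. 2 can be evaluated for a few dihedral angles. The sketch below uses the A, B and C values quoted above; the 180° and 60° angles are idealized chair geometries assumed for illustration, not measured values.

```python
import math

# Karplus parameters quoted in the text for unsubstituted monosaccharides (Hz).
A, B, C = 7.76, -1.1, 1.4


def karplus_3j(theta_deg: float) -> float:
    """Vicinal coupling 3J_HH in Hz for a dihedral angle given in degrees (Eq. 2)."""
    theta = math.radians(theta_deg)
    return A * math.cos(theta) ** 2 + B * math.cos(theta) + C


# Idealized chair geometries (assumed): axial-axial hydrogens at 180 degrees,
# axial-equatorial or equatorial-equatorial hydrogens near 60 degrees.
for label, angle in [("axial-axial", 180.0), ("axial-equatorial", 60.0), ("perpendicular", 90.0)]:
    print(f"{label:>16} ({angle:5.1f} deg): {karplus_3j(angle):5.2f} Hz")
```

Under these assumptions an axial-axial pair gives roughly 10 Hz and a 60° pair less than 3 Hz, which is the kind of difference that makes the set of couplings conformation specific.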
Figure 1.11 α-D-Mannopyranose; note the angle between the H1 and H2 will cause a bottleneck
Figure 1.12 The Karplus relation
Figure 1.13 Dihedral angle between four atoms
Using the TOCSY experiment, the rate at which magnetization propagates to the other hydrogen
atoms in the spin system can be mapped. Because of the bottlenecks, different correlations are
visible depending on the mixing time. To compensate for this and to achieve a high
signal-to-noise ratio for all signals, long mixing times (100 ms) are most often used. However,
due to technical limitations of the hardware (mostly the probe), a safety limit of 130 ms has
been set on the mixing time.
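A small helper can guard a planned series of mixing times against this limit. This is a sketch only: the even spacing, the 10 ms starting value and the function name are assumptions made for illustration and do not reflect the list format used on the spectrometer.

```python
MAX_MIXING_MS = 130.0  # safety limit on the mixing time stated in the text


def mixing_time_series(start_ms: float, stop_ms: float, n_points: int) -> list:
    """Evenly spaced mixing times in ms, rejected if they exceed the safety limit."""
    if n_points < 2:
        raise ValueError("at least two mixing times are needed")
    if stop_ms > MAX_MIXING_MS:
        raise ValueError(f"mixing time {stop_ms} ms exceeds the {MAX_MIXING_MS} ms limit")
    step = (stop_ms - start_ms) / (n_points - 1)
    return [round(start_ms + i * step, 3) for i in range(n_points)]


# Ten points up to 100 ms (start value and spacing are assumptions).
print(mixing_time_series(10.0, 100.0, 10))
```

Raising an error rather than silently clipping keeps an unsafe value from reaching the experiment setup unnoticed.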
1.6 Content of this project
During this project, a new experimental setup is proposed in order to reduce the setup time of
the NMR experiments and to automate the setup further. This is done using two different
experimental methods (pseudo 2D and pseudo 3D) and is discussed in chapter three. To automate
the processing and provide a quantitative comparison between monosaccharides, a fully automatic
processing script was written in Python (chapter four). It must be noted that chapter four is
very technical, as it describes the processing in detail. Next, chapter five compares the
measured monosaccharides against each other in a reference database. Chapter two contains a
short description of other techniques used to analyze monosaccharides and their
(dis)advantages, including NMR and the current TOCSY-matching approach. The thesis ends with a
few case studies in chapter six and a conclusion in chapter seven.
Chapter 2: Current analysis methods
There are many simple methods to quantify carbohydrates in substances such as food. However,
methods that aim to identify chemical structure and composition are few and more involved.
Depending on the chain length (polysaccharides or oligosaccharides), these techniques involve
partial or total hydrolysis to split the saccharides into their individual monosaccharides, and
they require an extended knowledge of organic chemistry. When total hydrolysis is used,
information on the glycosylation pattern and overall structure is lost. Some techniques also
require modification of the saccharides, which adds another step to the analysis process.
2.1 Mass Spectroscopy methods
Using mass spectroscopy, it does not seem trivial to differentiate between diastereoisomers, as
these have an identical molar mass. However, it has proven possible to differentiate between
three different hexoses (glucose, galactose and mannose) using ESI-ITMS in positive ion mode.
It was not possible, however, to differentiate between α and β monosaccharides (Zhu & Sato,
2007). In that research only the monosaccharide composition was investigated; no attempt was
made to determine the saccharide sequence and the nature of the glycosidic linkages, although
this has proven possible using MS-MS. The data given by Zhu show that the technique requires
extensive use of wet chemistry. It also remains unclear which monosaccharides can be
differentiated, as the number of monosaccharides investigated was fairly limited.
Using GC-MS, a full but destructive analysis of the monosaccharide composition is possible.
Step I of the analysis (chapter 1.3) does, however, require a full hydrolysis, which can be
done with hydrochloric acid, for example. After the hydrolysis, many derivatization methods can
be applied, such as silylation and fluoroacylation (Sassaki & Souza, 2013), to enable GC-MS
analysis. The identity is typically confirmed using suitable monosaccharide reference
standards.
2.2 HPLC methods
This approach is similar to GC-MS in that it also requires hydrolysis in the case of oligo- or
polysaccharides. Once a raw mixture of monosaccharides is obtained, the retention time of each
monosaccharide is determined using chromatography. The retention time is monosaccharide
specific, which provides a method for monosaccharide identification (Saddic & Ebert, n.d.). As
this method relies on ‘wet chemistry’, the exact recipes for the analysis can be found in the
paper of Saddic. Like the MS methods, this technique is destructive for the sample, yet it is
able to differentiate between multiple monosaccharides.
2.3 NMR standard methods
For the analysis of a sample containing a single monosaccharide, a 1D proton NMR spectrum is
sufficient. But once the sample consists of either multiple monosaccharides or
oligo-/polysaccharides, a 1D proton NMR spectrum generally no longer suffices, as there is a
significant amount of spectral overlap between the signals of the different units.
Unfortunately, chemical shifts are not completely compound specific, as they also depend on the
chemical environment. Their use for identification is therefore not advised, except for the
anomeric proton.4 The anomeric proton has a high chemical shift (4.5-6 ppm) due to the vicinity
of the alcohol and ether oxygen atoms. The other hydrogens of the monosaccharides are all found
in the same region of the spectrum (3-4.2 ppm) and, as small deviations can occur, their
chemical shifts cannot be used for a correct annotation of the spectra.
_________________________ 4 The anomeric proton is responsible for the alpha or beta notation; it is indicated with H1.
Figure 2.1 1D HNMR spectrum of Galactose
For these more complex samples, two-dimensional experiments are required. Using the classical
homonuclear experiments (only one type of nucleus is measured in both dimensions: COSY, TOCSY,
NOESY/ROESY) and heteronuclear experiments (HSQC and HMBC, involving 1H/13C), it is possible to
identify monosaccharides. This process first requires a full annotation of the spectrum, after
which a conformational analysis determines which monosaccharide has been measured. As chemical
shifts cannot be used for the non-anomeric hydrogens, scalar coupling constants between the
different hydrogens and heteronuclear experiments such as HSQC and HMBC are required to
complete the structural analysis. To show the complexity of this process, a possible workflow
is shown in Figure 2.2. This process for monosaccharide and polysaccharide determination has
been described extensively by Guus & Gotfredsen and will not be repeated in this project.
2.4 The TOCSY-Matching approach (Gheysen et al.)
As the current NMR technique for identifying carbohydrates is very extensive and time
consuming, and requires manual examination of the data, Gheysen et al. have proposed a new
approach called TOCSY matching. While TOCSY matching does not allow a full annotation of the
spectra, it offers a quick way to identify the monosaccharide content, which considerably
facilitates a full annotation in the subsequent analysis. The TOCSY-matching approach consists
of taking one 2D TOCSY at a chosen mixing time (often 100 ms) of an oligosaccharide (or a
mixture of monosaccharides). The TOCSY trace from the anomeric signals is analyzed along either
the F1 or the F2 dimension (the 2D TOCSY is symmetrical by design and both directions should
give an identical result, although there is a resolution difference).
Figure 2.2 NMR flowchart for monosaccharide determination (Touckach, 2013)
Figure 2.3 Gheysen determination table (60ms)
Using this trace, the peak intensities of each monosaccharide can be compared with the diagrams
provided by Gheysen (the 60 ms chart is shown in Figure 2.3). These matching charts were made
using multiple 2D TOCSY spectra for each monosaccharide (at 30, 60 and 100 ms mixing time). The
monosaccharide in Figure 2.4 shows one intense peak (the diagonal peak) and four
medium-intensity peaks. Comparison with the diagram tells us that this monosaccharide in the
sucrose sample is one of the two glucose anomers. However, it is not possible to determine the
stereochemistry of the anomeric position (α or β) in this case. To differentiate between α- and
β-glucose, a second 2D TOCSY must be taken at a mixing time of 30 ms. Only then can
α-D-glucose be identified correctly. While taking extra 2D TOCSY spectra at different mixing
times works in the case of glucose, none of the mixing times used in the matching charts makes
it possible to differentiate between the α and β forms of galactose, for instance.
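The chart lookup described above can be sketched in a few lines. Everything in this example is hypothetical: the intensity thresholds, the chart entries (named `sugar_A` to `sugar_C` on purpose) and the trace values are placeholders chosen to mimic the "one intense, four medium peaks" case, and they are not taken from the Gheysen charts.

```python
def intensity_classes(trace, strong=0.5, medium=0.1):
    """Label each peak as strong (S), medium (M) or absent (-) relative to the largest peak.

    The thresholds are hypothetical placeholders, not values from the thesis.
    """
    reference = max(trace)
    labels = []
    for value in trace:
        ratio = value / reference
        if ratio >= strong:
            labels.append("S")
        elif ratio >= medium:
            labels.append("M")
        else:
            labels.append("-")
    return "".join(labels)


def match_pattern(trace, chart):
    """Return every chart entry whose pattern equals the observed one."""
    observed = intensity_classes(trace)
    return sorted(name for name, pattern in chart.items() if pattern == observed)


# Placeholder chart for one mixing time; the entries are invented for illustration.
chart = {"sugar_A": "SMMMM-", "sugar_B": "SMMMM-", "sugar_C": "SM----"}
trace = [1.00, 0.30, 0.25, 0.20, 0.15, 0.02]  # diagonal peak first, then cross peaks

print(match_pattern(trace, chart))
```

Two candidates come back because their patterns coincide at this mixing time, which mirrors the α/β ambiguity described above: a chart at another mixing time would be needed to separate them.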
Figure 2.4 A 2D TOCSY spectrum of sucrose (700 MHz; 60 ms). The signal corresponding to the anomeric hydrogen and its cross peaks have been indicated.
There are three main concerns about this implementation of the TOCSY-matching approach. First,
it only becomes apparent during the analysis and processing of the spectra that an extra 2D
TOCSY is required. This can delay the analysis of the entire sample5. Even with the extra 2D
TOCSY spectrum, it may remain impossible to differentiate between the monosaccharides. The
second and main concern is the limited number of monosaccharides investigated previously. As a
result, it is unknown whether monosaccharides that are not present in the charts might show
identical patterns. Xylose, a pentose, might for instance show identical peaks and be
misidentified as glucose. Finally, the method only provides a qualitative determination and not
a quantitative analysis. The analysis is therefore prone to user bias.
2.5 Comparison of the techniques
While multiple techniques exist, all have their own advantages and disadvantages. Both mass
spectroscopy and HPLC are able to analyze a mixture of monosaccharides and the chemical
composition of polysaccharides (step I). It is not possible, however, to analyze the glycosidic
bonds interconnecting the monosaccharides in an oligo- or polysaccharide.6 The main advantage
of both techniques is the small amount of sample required. NMR is able to identify every
monosaccharide and the correct glycosidic bond, yet it requires a much larger amount of sample.
A second disadvantage of NMR is the high cost of an NMR spectrometer.
Another important factor is that it is unknown which monosaccharides can be differentiated from
each other. Mass spectroscopy and chromatography studies often include only three or four
monosaccharides to test whether the technique can differentiate between them; such studies
serve as a proof of concept. A universal method to identify any monosaccharide (whether or not
it is part of a polysaccharide) is not always possible or does not exist, apart from a full
NMR analysis without the TOCSY-matching approach. It is our goal to reinvestigate the
TOCSY-matching approach in order to expand the coverage of different mixing times, expand the
number of monosaccharides investigated, automate the processing and explore the option of a
quantitative monosaccharide analysis.
_________________________ 5 In the case that extra measurement time has to be requested.
6 It is possible to determine the order of monosaccharides, yet not by which hydroxyl function they are interconnected. For this, NMR is required.
Chapter 3: Designing a new experiment
3.1 TOCSY pulse sequence
The default pulse program for a TOCSY experiment is shown in Figure 3.1.7 The FID yields a
frequency dimension after Fourier transformation, often referred to as the direct dimension. To
obtain the extra information generated by the TOCSY sequence, two main variants exist. In the
2D experiment, the t1 time period is systematically incremented, yielding a series of 1D
measurements from which the indirect dimension is constructed and subsequently Fourier
transformed. The second option is to replace the non-selective 90° pulse by a selective pulse
while keeping t1 fixed, resulting in a 1D selective TOCSY. The selective excitation region can
be placed anywhere in the 1H spectrum, giving the TOCSY correlation pattern of the selected
hydrogen.
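How a time-domain signal turns into a frequency axis can be illustrated with a synthetic FID. The sketch below uses a plain discrete Fourier transform from the standard library; the single 125 Hz resonance, the decay constant and the sampling parameters are arbitrary illustration values, not acquisition settings from this project.

```python
import cmath
import math


def synthetic_fid(freq_hz, t2_s, n_points, dwell_s):
    """Complex, exponentially decaying signal of a single resonance."""
    return [
        cmath.exp(2j * math.pi * freq_hz * k * dwell_s) * math.exp(-k * dwell_s / t2_s)
        for k in range(n_points)
    ]


def dft_magnitude(signal):
    """Magnitude spectrum from a plain O(N^2) discrete Fourier transform."""
    n = len(signal)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * m / n) for m, s in enumerate(signal)))
        for k in range(n)
    ]


n_points, dwell_s = 128, 0.001  # 1 kHz spectral width (illustrative)
fid = synthetic_fid(125.0, 0.05, n_points, dwell_s)
spectrum = dft_magnitude(fid)

# The largest bin maps back to the frequency of the resonance.
peak_bin = max(range(n_points), key=spectrum.__getitem__)
peak_hz = peak_bin / (n_points * dwell_s)
print(peak_bin, peak_hz)
```

In the 2D experiment the same transformation is applied a second time along the incremented t1 period, which is why the number of t1 increments sets the resolution along F1.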
_________________________ 7 d1: relaxation delay; t1: time period, corresponds to the F1 axis after Fourier transform; fid: free induction decay.
Figure 3.1 Default pulse program for a 2D TOCSY
3.2 A novel experimental setup
For the new experimental setup, the transfer of magnetization from the anomeric signal through
the entire spin system must be sampled for each saccharide. This would require a series of 2D
TOCSY experiments covering mixing times from 0 to approximately 100 ms, and therefore a large
amount of measurement time. Indeed, depending on the instrument resolution and sensitivity,
recording a single 2D spectrum at a single TOCSY mixing time takes between 2 and 12 hours. It
should be noted, however, that most of this time is spent recording the 250 to 500 1D
experiments that sample the indirect time domain t1 of the 2D spectrum, so as to achieve the
resolution required for analysis of the TOCSY traces along F1. Repeating this for 10 different
mixing times would take prohibitively long. This can in principle be alleviated by using 1D
selective TOCSY experiments. Here an individual anomeric resonance is excited and its
magnetization is then channeled into the TOCSY sequence. This directly generates the TOCSY
trace of that particular anomeric signal in a 1D spectrum. The measuring time is now reduced to
that of a single 1D experiment, repeated for as many mixing times as one wishes to sample the
TOCSY transfer. Thus only about ten 1D spectra are now required. However, this needs to be
repeated for each individual monosaccharide. Even so, using multiple 1D selective TOCSY
experiments with varying mixing time per monosaccharide reduces the measurement time compared
to the 2D approach. To facilitate analysis, the associated pulse program was set up so that all
1D selective TOCSY measurements for a particular monosaccharide are recorded in a single
experiment, generating a 2D-like presentation with one frequency axis and one mixing time axis.
As this needs to be repeated for every anomeric signal, a loop is added so that each anomeric
signal is targeted one by one, generating a series of 2D spectra in a single file. This will be
referred to as the p2D sequence.
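The time argument can be checked with a back-of-the-envelope count of 1D acquisitions. This sketch assumes every 1D acquisition costs roughly the same time, which ignores differences in the number of scans; the four anomeric signals are a hypothetical sample, while the 250 increments and 10 mixing times come from the text.

```python
def acquisitions_full_2d(n_t1_increments: int, n_mixing_times: int) -> int:
    """1D acquisitions needed when a complete 2D TOCSY is recorded per mixing time."""
    return n_t1_increments * n_mixing_times


def acquisitions_selective_1d(n_anomeric_signals: int, n_mixing_times: int) -> int:
    """1D acquisitions needed when each anomeric signal is excited selectively."""
    return n_anomeric_signals * n_mixing_times


n_mixing = 10
full = acquisitions_full_2d(250, n_mixing)         # lower bound of the 250-500 range
selective = acquisitions_selective_1d(4, n_mixing)  # 4 anomeric signals: hypothetical
print(full, selective, full / selective)
```

The selective scheme scales with the number of anomeric signals rather than with the number of t1 increments, which is where the saving comes from as long as the anomeric signals are well separated.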
Unfortunately, the 1D selective approach requires a minimum separation between the individual
anomeric resonances, a condition which is generally not satisfied. When it is not, the TOCSY
traces of two or more signals overlap, compromising the analysis. To address this issue, a
band-selective version of the 2D TOCSY was developed, which combines the best of both worlds at
little to no extra time cost. The resulting p3D sequence generates a 3D spectrum consisting of
two frequency axes and a mixing time axis. Information on how to set up both the p2D and the
p3D experiment can be found on GitHub8, an online code repository. Pulse programs, vclists and
the processing software have been made fully available for download there.
3.3 Achieving selectivity
As previously mentioned, the TOCSY experiment consists of a 90° pulse followed by a spin-lock
sequence. To limit the range of signals along the F1 dimension to the anomeric signals and
suppress all other peaks, as shown in Figure 3.3, it is sufficient to avoid exciting the
signals outside the region of interest prior to the t1 evolution period of the TOCSY
experiment. Two options present themselves to achieve this:
_________________________ 8 https://github.ugent.be/ydandois/Thesis-Source-Code or https://github.com/FramedYannick/Thesis-Source-Code
The first option is to apply a highly selective 90° pulse, as is done in the standard selective
TOCSY experiment. However, using this technique for both the p2D and p3D introduces
off-resonance effects that cause phasing problems and non-uniform excitation, especially for
the p3D experiment, where the anomeric excitation region is quite wide. A second problem is
that these selective pulses often have a Gaussian (or similar) shape, so that they still excite
non-anomeric hydrogens (causing aliasing). Highly selective 90° pulses can therefore be useful
for the p2D experiment, but not for exciting an entire band of the 2D TOCSY plane as is done in
the p3D experiment.
A second possibility is to apply a hard 90° pulse followed by a short delay, after which a
selective 180° refocusing pulse is applied to the anomeric region only, followed by a delay of
equal length. This results in all magnetization evolving in the xy-plane, yet only the
magnetization of the inverted anomeric hydrogens is refocused by the selective inversion pulse.
By bracketing this pulse between two pulsed field gradients, only the inverted signals will
yield a coherent signal after the spin-lock sequence, with refocusing of chemical shift evolution.
The non-anomeric hydrogen signals remain in the xy-plane; however, they are not refocused and
the gradients reduce their signal contribution to almost zero. This technique is referred to as the
gradient-enhanced selective spin-echo.
Figure 3.2 An overlay of a regular 1D 1H NMR spectrum (blue), a 1D 1H NMR spectrum using a selective 90° pulse centered on an anomeric signal, yet partly exciting water (red), and a 1D 1H NMR spectrum using the selective spin-echo to excite the entire anomeric region (green). To improve the results of the selective 90° pulse, one can lengthen the pulse to increase selectivity and further reduce the intensity of the water signal.
Figure 3.3 2D TOCSY spectrum of JD#1 (100 ms, 700 MHz) with a superposed band-selective indication. The anomeric hydrogen region on the diagonal has been indicated.
Both excitation methods are compared in Figure 3.2 using a standard 1D 1H NMR experiment (i.e. without
TOCSY). The selective 90° pulse (27 ms, red spectrum) was applied to the anomeric
hydrogen of β-galactose; the α-galactose anomeric hydrogen is not excited. Yet the solvent
peak (water) is still present and out of phase. Making the selective 90° pulse longer increases
the selectivity, but also the phase distortion of the selectively excited signal. For band-selective
excitation, a shorter selective 90° pulse is required, but this causes non-uniform excitation and
the need for large phase corrections. Therefore, the 180° refocusing pulse approach is preferred.
Using a 15 ms 180° BURP pulse, both the α and β anomeric hydrogens can be seen with the expected relative
intensity and without phasing problems. Both techniques show no signals of the non-anomeric
hydrogens, so folding and aliasing are prevented in our band-selective TOCSY
experiment. As our goal is a general experiment with limited setup, the spin-echo
technique is used throughout. It makes it possible to create a generally applicable method
for assigning all monosaccharide units from their anomeric resonance, even when the chemical
shift of these anomeric hydrogen resonances changes with the precise chemical circumstances
of the sample studied. Because the spin-echo sequence can be used with much longer
pulses (and wider refocusing regions) without any phasing problems (as seen in Figure 3.3), the
spin-echo has the clear advantage for the p3D experiment.
3.4 p2D-sel TOCSY
In principle, the 1D-sel TOCSY is acquired by selectively applying a 90° pulse to one anomeric hydrogen
and then executing a spin-lock, using either the MLEV or the DIPSI pulse sequence, which prevents
the resonances from evolving under chemical shift in the xy-plane, while keeping the t1 parameter fixed.
This causes the magnetization to be passed on along the network of vicinal hydrogens. An example of
a 1D-sel TOCSY using the spin-echo technique mentioned previously can be found in Figure 3.4,
together with a 1D proton spectrum.
With the goal of combining multiple selective TOCSY experiments with varying mixing times on the same
anomeric resonance into one single experiment, a pseudo-2D selective TOCSY pulse program was
set up as a 1D selective TOCSY with one extra loop over the mixing time. The spectrometer
only requires one setup per experiment, so setup time is reduced in comparison to the
previous workflow. For this, two items must be combined:
The 1D experiments must loop over all required mixing times. Gheysen et al. showed that
the DIPSI-2 and MLEV sequences are equally effective for the TOCSY transfer. However, because
the DIPSI-2 sequence can include zero-quantum filtering and as such provides cleaner
spectra for an automated analysis (Figure 3.4), the DIPSI-2 sequence was used throughout this
project.9
The mixing times used were fixed to the same series of values (in milliseconds) for each
experiment, facilitating an automatic comparison later on:10
8.63, 17.26, 25.90, 34.53, 43.16, 51.80, 60.43, 69.06, 77.70, 86.33, 94.96, 103.60, 112.23
These values are not chosen arbitrarily: the length of the 90° pulse (p6) used in the DIPSI-2 sequence was
set to 25 µs and the DIPSI-2 sequence was executed in blocks of three loops, resulting in steps of 8.63 ms.
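The step size can be checked with a short calculation. The sketch below assumes the standard DIPSI-2
composite element (nine pulses with flip angles of 320°, 410°, 290°, 285°, 30°, 245°, 375°, 265° and
370°) repeated in a four-element supercycle, and it reads "three loops" as three such supercycles per
increment. Neither detail is spelled out in the text, and the function names are purely illustrative.

```python
from fractions import Fraction

# Assumption: flip angles (degrees) of the standard DIPSI-2 composite element.
DIPSI2_ANGLES = (320, 410, 290, 285, 30, 245, 375, 265, 370)
ELEMENTS_PER_SUPERCYCLE = 4  # assumption: element repeated four times per supercycle
LOOPS_PER_STEP = 3           # "three loops" per mixing time increment
P90_US = 25                  # length of the 90 degree pulse (p6) in microseconds


def step_us():
    """Duration of one mixing time increment in microseconds, as an exact fraction."""
    supercycle_in_p90 = Fraction(ELEMENTS_PER_SUPERCYCLE * sum(DIPSI2_ANGLES), 90)
    return LOOPS_PER_STEP * supercycle_in_p90 * P90_US


def mixing_times_ms(n_steps=13):
    """Mixing times in ms, truncated (not rounded) to two decimals."""
    step = step_us()
    # Hundredths of a millisecond are units of 10 microseconds.
    return [int(k * step // 10) / 100 for k in range(1, n_steps + 1)]


print(mixing_times_ms())
```

Under these assumptions one increment lasts about 8633 µs, and truncating the multiples to two decimals
gives the same thirteen values as the series listed above.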
_________________________
9 This can clearly be seen in Figure 3.4 upon comparison of the triplet (3.3 ppm) and the doublet of doublets (3.45 ppm) in the MLEV-17 (blue) and DIPSI-2 (red) spectra.
10 Using curve interpolation, it is possible to make the processing uniform, as long as enough mixing times are used. However, this complicates the processing and increases the experimental error on the data. Therefore these mixing time values were also used in the p3D experiment.
Figure 3.4 1D proton NMR spectrum of sucrose (green) and 1D-sel TOCSY spectra of sucrose (100 ms, 700 MHz) with selective excitation of the anomeric hydrogen, using the DIPSI-2 (red) and the MLEV-17 (blue) sequence. Only the non-anomeric region is shown. The effect of the z-filter in the DIPSI-2 sequence is clearly visible around 3.45 ppm upon comparison of the DIPSI-2 and MLEV-17 spectra.
Since the p2D experiment performs the selective excitation per anomeric hydrogen (meaning per
monosaccharide) present in the sample, it must loop over all hydrogen frequencies in the
anomeric region of the spectrum. As a result, the time required to measure the p2D version
scales linearly with the number of monosaccharides in the sample (approximately 20 min per
monosaccharide for default resolution and sixteen scans). An extra loop
over all the frequencies to be excited is added. These frequencies are stored in the fqlist parameter,
which must be set by the operator. In principle, this could also be
automated by introducing a 1D proton scan and automatic peak determination. However, manual
examination of the hydrogen spectrum for the setup remains highly advised, as errors would have
a high impact on the spectra at this stage.
It would also be difficult to automatically predict the selective pulse length required for selective
excitation of a certain bandwidth, as this depends on the distribution of signals in the anomeric
region. Higher selectivity requires longer pulse lengths. This means that a manual setup is
required for each signal anyway. This also allows the receiver gain, which amplifies the signal
during detection, to be optimized for each individual anomeric signal (using ‘rga’). This is
especially of interest when monosaccharide units appear with different S/N ratios in the spectrum.
As the most intense signal defines the gain, the monosaccharide with the lowest signal
contribution may have a less than optimal receiver gain, resulting in a lower signal-to-noise ratio
than can be achieved by the hardware with optimal settings. In view of
this and the subsequent development of the p3D sequence, it was opted not to automate this step
of the acquisition, and the user must manually enter the different frequencies.
Figure 3.5 A 3D (left) and 2D (right) representation of a p2D-sel TOCSY of β-glucopyranose (the time domain axes are scaled incorrectly)
Extensive documentation and details on how to set up an experiment can be found on
GitHub. The total measurement time of this experiment depends on the number of
frequencies set in the fqlist: the more monosaccharides in the mixture that should be
analyzed, the longer the experiment takes (a linear correlation). This approach
works well for samples where the number of monosaccharides is limited and the anomeric peaks
are well separated. Unfortunately, when the anomeric hydrogen peaks overlap in the spectrum,
it is not possible to select between the different anomeric hydrogens and an automated
analysis is no longer possible. Selective excitation of an anomeric hydrogen peak always
excites a region around the anomeric hydrogen. If two signals overlap, the second anomeric
hydrogen signal contributes to the signal measured after selective excitation. As this
influences the profile of the signals over the different mixing times, the processing software is not
able to identify the correct monosaccharide. In such cases, it is advised to use a different
spectral approach, namely the p3D-Bsel TOCSY.
3.5 p3D-Bsel TOCSY
In Figure 3.3 a 2D TOCSY spectrum of JD#1 is depicted.11 Recording this spectrum at multiple
mixing times yields the same experimental results as the p2D-sel TOCSY experiment.
But as this would be an array of 2D experiments, it would be very time-consuming to measure.
As the blacked-out regions in the regular 2D TOCSY spectrum are not required for the
automated analysis of the spectra, the parameters are set so as to measure only the white part
of the spectrum. The resolution in the F1 dimension depends on
two items: the spectral width that must be covered and the number of sampled points. The
former is typically 7 to 8 ppm, while the latter varies from 250 to 500, leading to minimal and
maximal resolutions (before zero filling) of 31.25 points per ppm and 71.43 points per ppm
respectively. Limiting the sampling along F1 to a band of 1 ppm (shown as the white region
in Figure 3.3) reduces the required number of points in the F1 dimension by a factor of 7 to 8 while
keeping the same resolution. Also, as signals are more dispersed in the anomeric region (in
comparison to the non-anomeric region), the resolution can be further reduced.
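The resolution figures quoted here follow from simple arithmetic, which the sketch below reproduces.
The function names are illustrative only; the inputs are the spectral widths and point counts given
in the text.

```python
def points_per_ppm(n_points, spectral_width_ppm):
    """Digital resolution along F1, expressed in points per ppm."""
    return n_points / spectral_width_ppm


def points_for_band(n_points, full_width_ppm, band_width_ppm=1.0):
    """Points needed to cover a narrower band at the same points-per-ppm resolution."""
    return points_per_ppm(n_points, full_width_ppm) * band_width_ppm


for n_points, width in ((250, 8.0), (500, 7.0)):
    resolution = points_per_ppm(n_points, width)
    band_points = points_for_band(n_points, width)
    print(f"{n_points} points over {width} ppm: {resolution:.2f} points/ppm, "
          f"{band_points:.2f} points for a 1 ppm band "
          f"(reduction factor {n_points / band_points:.0f})")
```

The two cases bracket the range in the text: 250 points over 8 ppm is the coarsest resolution and
500 points over 7 ppm the finest, and a 1 ppm band needs 8 or 7 times fewer points respectively.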
_________________________ 11 JD#1 is a sample with the mixture of 3 different saccharides, resulting in 6 different monosaccharides
3.6 Conclusion: p2D-sel TOCSY versus p3D-Bsel TOCSY
In general, the goals for a new experimental analysis were to (1) constrain all experiments into
one experimental setup making not only the setup but also the analysis easier, (2) reduce the
required measurement time through a selective approach and (3) create a uniform mixing time
dimension for all experiments in order to facilitate an automatic processing.
Comparing the measurement time of the p3D-Bsel TOCSY with that of the p2D-sel TOCSY, it
becomes immediately apparent that the measurement time of the p3D experiment is much
longer (about 7 hours for a default resolution with a spectral window of 1ppm in the F1
dimension). It would seem the p2D experiment has the advantage on samples with a low
monosaccharide content (up to approximately 15 monosaccharides).
But the p2D-sel TOCSY does have its limitations. Because it uses selective 90° pulses on
each anomeric hydrogen, interference can occur when two anomeric hydrogens have a
similar chemical shift. Both will be excited, which causes problems for the processing
software, as it is only designed to process each monosaccharide one by one and does not use
deconvolution. This choice was made because the p3D experiment solves this issue.
Figure 3.6 Data resulting from a p3D experiment (JD sample). Data identical to the p2D experiment has been indicated.
In conclusion, the time factor is of little importance when choosing between the p2D and
the p3D experiment. As long as all anomeric hydrogen signals are well separated in terms of
chemical shift, the operator is better off using the p2D-sel TOCSY. The p3D-Bsel TOCSY only
gains the advantage when the anomeric hydrogen region is more complex. As long as
the peaks are slightly separated, the p3D experiment is able to differentiate between the
different monosaccharides thanks to the extra dimension. In chapter six both experiments are
demonstrated on a mixture of multiple monosaccharides and on oligosaccharides to show this in
further detail.
Chapter 4: Data processing
The goal is to obtain monosaccharide specific curves of the peak intensities versus the mixing
time independent of the experimental setup and the experiment itself. As two different
experimental approaches were used, the data has to be put in a uniform format.
The data is first manually processed using the NMR spectrometer software Topspin® in order to
ensure optimal spectra.12 The data is read in using NMRGlue, an open source Python framework
(Helmus & Jaroniec, 2013) designed to be compatible with multiple NMR formats. Thanks to
NMRGlue, the data can almost immediately be processed using Python.
The data finds itself in a format depending on the experiment that was used. Both the p2D and the p3D
experiment contain data coming from multiple monosaccharides. This data is split up and reorganized
into a data format that is independent of the experiment and only contains data corresponding to one
monosaccharide. This is called a ‘chunk’.13 These chunks are analyzed completely separately from each
other. In each chunk, the peak of the anomeric hydrogen and the correlations of the remaining
hydrogens must be found, and their signal must be integrated over the width of the peak in the
chemical shift dimension. This results in curves of the intensity as a function of the mixing time for
each correlation and for the anomeric signal. These curves are normalized,14 after which various
filters are applied to them in order to remove multiplets (doublets, triplets, quadruplets etc.).
Chemical shift filters and threshold filters are also used in order to filter out remaining noise and
solvent peaks. The remaining curves are monosaccharide specific and can be used for
monosaccharide comparison (chapter five).
Figure 4.1 Flowchart of the processing.
_________________________
12 More information on the processing using Topspin® can be found in the GitHub manual: https://github.ugent.be/ydandois/Thesis-Source-Code or https://github.com/FramedYannick/Thesis-Source-Code
13 The chunkification is the only step that differs between the p2D and the p3D experiment.
14 They are all normalized to the anomeric hydrogen, as this signal always has the highest intensity.
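The normalization and threshold steps described in this overview can be sketched as follows. This is a
minimal illustration, assuming that normalizing to the anomeric hydrogen means dividing every curve by
the maximum of the anomeric curve. The threshold value and the function names are hypothetical and are
not taken from the processing software.

```python
def normalize_curves(anomeric, correlations):
    """Scale all build-up curves by the maximum of the anomeric curve (assumed convention)."""
    reference = max(anomeric)
    if reference <= 0:
        raise ValueError("anomeric curve has no positive intensity")

    def scale(curve):
        return [value / reference for value in curve]

    return scale(anomeric), [scale(curve) for curve in correlations]


def threshold_filter(curves, threshold=0.05):
    """Keep curves whose maximum normalized intensity reaches the (hypothetical) threshold."""
    return [curve for curve in curves if max(curve) >= threshold]


# Toy integrated intensities at three mixing times (made-up numbers for illustration).
anomeric = [100.0, 80.0, 60.0]
correlations = [[0.0, 10.0, 20.0], [0.0, 1.0, 2.0]]

norm_anomeric, norm_correlations = normalize_curves(anomeric, correlations)
kept = threshold_filter(norm_correlations)
print(norm_anomeric, kept)
```

In this toy input the second correlation never exceeds 2 % of the anomeric maximum, so it is treated
as noise and dropped, while the first one is kept.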
4.1 Chunkification
Both the p2D and the p3D experiment contain data of multiple monosaccharides. First this data must be
‘chunkified’. A chunk is the amount of data corresponding to one monosaccharide. This step is
the only step where the p2D and its p3D counterpart have different processing methods, as the
chunks are designed to be in an identical format for both experiments. As a result,
the rest of the data processing happens in an identical manner, independent of the type of
experiment.
Due to the way the data is saved (in Bruker’s Topspin software), the data arrays in the F1, F2 and
F3 dimensions always contain 2^n elements.15 This must be taken into account for the
processing in the mixing time dimension, as our experiments only contain 13 elements in the
mixing time dimension. The end of an array is filled up with zero rows until a power
of two is reached. For the p2D-sel TOCSY the chunks are stored sequentially: if two
different frequencies were measured, rows 0-25 will be data and rows 26-31 will be zero rows.
_________________________ 15 With n being a positive integer, this is due to the usage of the Cooley & Tukey FFT.
Figure 4.2 p2D-sel TOCSY taken on alpha (bottom) and beta (top) glucose directly from Topspin
For the p3D-Bsel TOCSY, there are always thirteen different mixing times, meaning planes 13-15
will be filled with zero rows. Removing these zero rows is important, as it reduces
calculation times upon manipulation of the data.
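The padding described above can be illustrated with a few lines of Python. The helper names are
hypothetical; the sketch only reproduces the row and plane numbers mentioned in the text.

```python
def padded_size(n_rows):
    """Smallest power of two that is greater than or equal to n_rows."""
    size = 1
    while size < n_rows:
        size *= 2
    return size


def zero_row_range(n_rows):
    """Indices of the padding rows that follow n_rows rows of real data."""
    return range(n_rows, padded_size(n_rows))


# p2D with two frequencies: 2 x 13 = 26 data rows.
print(padded_size(26), list(zero_row_range(26)))
# p3D: 13 mixing time planes.
print(padded_size(13), list(zero_row_range(13)))
```

For 26 data rows the array is padded to 32 rows (rows 26-31 are zero), and for 13 planes it is padded
to 16 (planes 13-15 are zero), in line with the examples in the text.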
4.1.1 Chunkification of the p2D-sel TOCSY
As mentioned above, the p2D-sel TOCSY experiment is set up so that each chunk
always corresponds to sequential mixing time elements in the array.16 As we measure with
thirteen different mixing times, the first chunk consists of elements 0-12, the second chunk of
elements 13-25, and so on. Each new chunk therefore starts at an index that is a multiple of 13.
The chunkification of the p2D spectra is straightforward, as it just amounts to
slicing up the data array. This can be seen in Figure 4.2 where two chunks are present (rows 26-31,
which are empty rows that fill the data up to a power of 2, have already been removed from the figure).
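The slicing can be sketched on plain Python lists as follows. This is not the thesis code: the function
names are illustrative, and the sketch assumes that padding rows contain only zeros while the last real
data row does not.

```python
N_MIXING_TIMES = 13  # number of mixing times per chunk, as used in the experiments


def strip_zero_rows(rows):
    """Drop trailing rows that contain only zeros (the power-of-two padding)."""
    end = len(rows)
    while end > 0 and not any(rows[end - 1]):
        end -= 1
    return rows[:end]


def chunkify_p2d(rows, n_mix=N_MIXING_TIMES):
    """Split the p2D rows into one chunk of n_mix spectra per anomeric frequency."""
    data = strip_zero_rows(rows)
    if len(data) % n_mix:
        raise ValueError("number of data rows is not a multiple of the number of mixing times")
    return [data[start:start + n_mix] for start in range(0, len(data), n_mix)]


# Toy data: two frequencies (26 spectra of two points each) padded to 32 rows.
rows = [[1.0, 2.0] for _ in range(26)] + [[0.0, 0.0] for _ in range(6)]
chunks = chunkify_p2d(rows)
print(len(chunks), [len(chunk) for chunk in chunks])
```

Each chunk then holds the thirteen spectra of one monosaccharide, starting at row 0, 13, 26 and so on.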
4.1.2 Chunkification of the p3D-Bsel TOCSY
For the p3D-Bsel TOCSY, chunkification is less direct. The experimental data consists of three
dimensions: the F3 dimension or direct dimension, the F2 dimension or mixing time
dimension, and the F1 dimension or indirect dimension corresponding to the band-selective
region. As the order of dimensions in the data matrix is not identical to that of the p2D experiment,
the F2 and F1 dimensions need to be switched to ensure compatibility. After the switch, the p3D
data has F3 as the direct dimension, F2 as the indirect dimension and F1 as the mixing time
dimension, similar to the p2D experiment.17 The setup of the data matrix can be seen in Figure
4.3. Res stands for resolution, SW for spectral window and index for the index of the array
(this is only used in formulas).
Figure 4.3 The data structure of the p3D-Bsel TOCSY as a list of 2D TOCSY spectra after the dimension flip (the commas indicate that there are multiple elements in the array).
_________________________
16 As previously mentioned, the mixing time dimension is a pseudo dimension.
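The dimension switch amounts to exchanging two axes of the data array. The sketch below shows this on
nested Python lists, assuming the array is indexed as cube[a][b][c] and that the two axes to be
exchanged are the first two; the actual axis order of the loaded data is not stated here, and the
function name is illustrative. With a NumPy array, `numpy.swapaxes(cube, 0, 1)` performs the same
operation.

```python
def swap_first_two_axes(cube):
    """Return a nested list with the first two axes exchanged.

    The innermost rows are reused, not copied.
    """
    n_outer = len(cube)
    n_inner = len(cube[0])
    return [[cube[i][j] for i in range(n_outer)] for j in range(n_inner)]


# Toy array of shape 2 x 3 x 4 in which every point is labelled with its own indices.
cube = [[[(i, j, k) for k in range(4)] for j in range(3)] for i in range(2)]
swapped = swap_first_two_axes(cube)
print(len(swapped), len(swapped[0]), len(swapped[0][0]))
```

After the swap, the element that was at cube[i][j][k] is found at swapped[j][i][k], while the direct
dimension (the innermost axis) is left untouched.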
In order to know which part of the spectrum contains the diagonal peaks, and therefore the
anomeric hydrogen signals, the diagonal must be extracted. To extract the diagonal, the chemical
shift is calculated for each data point in the F2 dimension, and then the index of the
data point in the F1 dimension with that same chemical shift is calculated. Combining
these two operations results in the following formula:18
$$x = \frac{SO_1 - \frac{1}{2}SW_1 - \left(\frac{Res_2 - y}{Res_2} \cdot SW_2 + SO_1 - \frac{1}{2}SW_2\right)}{SW_1} \cdot Res_1 + Res_1 \qquad \text{(Eq. 3)}$$
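Eq. 3 can be written out as a short function. The sketch below is an interpretation rather than the
thesis code: it assumes that the term in brackets is the chemical shift of point y in F2, with a minus
sign in front of the half spectral window, and it uses the same offset SO1 for both dimensions, as the
equation does. Rounding to the nearest index and discarding points outside the F1 window are additional
assumptions, and all names and numbers are illustrative.

```python
def chemical_shift_f2(y, so1, sw2, res2):
    """Chemical shift (ppm) of index y in F2; index 0 lies at the high-ppm edge."""
    return (res2 - y) / res2 * sw2 + so1 - 0.5 * sw2


def diagonal_index_f1(y, so1, sw1, res1, sw2, res2):
    """F1 index x that has the same chemical shift as F2 index y (Eq. 3)."""
    shift = chemical_shift_f2(y, so1, sw2, res2)
    return (so1 - 0.5 * sw1 - shift) / sw1 * res1 + res1


def diagonal_points(so1, sw1, res1, sw2, res2):
    """List of (x, y) points on the diagonal that fall inside the F1 window."""
    points = []
    for y in range(res2):
        x = round(diagonal_index_f1(y, so1, sw1, res1, sw2, res2))
        if 0 <= x < res1:
            points.append((x, y))
    return points


# Illustrative parameters: a 1 ppm band of 64 points in F1, 8 ppm of 1024 points in F2.
points = diagonal_points(so1=5.0, sw1=1.0, res1=64, sw2=8.0, res2=1024)
print(points[0], points[-1], len(points))
```

When both dimensions have the same spectral window and resolution the function returns x = y, which is
a quick sanity check on the reconstruction.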
Looping over every index of the F2 dimension (index y) results in a list of coordinates of the
diagonal (list of points (x, y) located on the diagonal). This is executed on the first plane in the
p3D, as the lowest