TO A GENERAL APPLICABLE MONOSACCHARIDE IDENTIFICATION USING TOCSY … · 2017. 8. 3. · TO A...

TO A GENERAL APPLICABLE

MONOSACCHARIDE IDENTIFICATION

USING TOCSY MATCHING

Yannick Dandois Student number: 01109010

Supervisor(s): Prof. Dr. José Martins, Prof. Dr. Peter Dawyndt

A dissertation submitted to Ghent University in partial fulfilment of the requirements for the degree of

Master of Science in Chemistry

Academic year: 2016 - 2017

I

Dankwoord

De weg van de student is lang en zeker niet zonder heuvels. Maar door die enkelen blijft student

zijn toch nog altijd leuk. De kans om aan een project te werken; die niet enkel pure chemie bevat

maar ook andere aspecten in de wetenschap, van het programmeren tot de wiskunde die zich er

telkens achter verschuilt, is een kans die ik enkel gekregen heb dankzij mijn twee uitzonderlijke

promotoren. Zij hebben mij niet enkel de kans gegeven te werken aan een project waar zij beiden

en ik volledig konden achterstaan, mijn promotoren hebben elke week opnieuw tijd vrijgemaakt

om me persoonlijk te begeleiden. Hiervoor dank ik mijn promotoren José Martins en Peter

Dawyndt.

Tevens dank ik ook Niels Geudens; die me telkens een antwoord bood op de vragen die ik had.

Het werken zonder begeleider geeft je vrijheid; maar laat je wel af en toe achter met talloze

vragen. Niels was altijd aanwezig om me een antwoord te bieden, maar ook de andere mensen

van de NMRSTR groep verdienen een bedanking.

Ik dank ook Sofie Van Damme, voor de uitstekende hulp tijdens de conformatie-analyses.

Ook mensen van buiten de UGent verdienen een bedanking; de dichte vriendengroep die al jaren

elkaar bijstaat; Chiel, Anthony, Matthias, Rutger en co.

Tot slot dank ik P. d. W.; dankzij jou ben ik chemie gaan studeren.

II

Table of Contents

Chapter 1: Introduction .................................................................................................................. 1

1.1 Context of this project ...................................................................................................... 1

1.2 Sugars, saccharides and carbohydrates: not only a source of energy ............................. 2

1.3 Oligo- and polysaccharides .............................................................................................. 3

1.4 Monosaccharides structure and conformation – a refresher .......................................... 5

1.4.1 Different forms of monosaccharides ........................................................................ 6

1.4.2 Variations in monosaccharides: the α vs β and D vs L .............................................. 7

1.4.3 Variations in conformations ..................................................................................... 8

1.5 Introduction towards TOCSY ............................................................................................ 8

1.6 Content of this project ................................................................................................... 11

Chapter 2: Current analysis methods ........................................................................................... 12

2.1 Mass Spectroscopy methods ......................................................................................... 12

2.2 HPLC methods ................................................................................................................ 13

2.3 NMR standard methods ................................................................................................. 13

2.4 The TOCSY-Matching approach (Gheysen et al.) ........................................................... 14

2.5 Comparison of the techniques ....................................................................................... 16

Chapter 3: Designing a new experiment ...................................................................................... 17

3.1 TOCSY pulse sequence ................................................................................................... 17

3.2 A novel experimental setup ........................................................................................... 17

3.3 Achieving selectivity ....................................................................................................... 18

3.4 p2D-sel TOCSY ................................................................................................................ 20

3.5 p3D-Bsel TOCSY .............................................................................................................. 23

3.6 Conclusion: p2D-sel TOCSY versus p3D-Bsel TOCSY ...................................................... 24

III

Chapter 4: Data processing ........................................................................................................... 26

4.1 Chunkification ................................................................................................................ 27

4.1.1 Chunkification of the p2D-sel TOCSY ............................................................................ 28

4.1.2 Chunkification of the p3D-Bsel TOCSY ......................................................................... 28

4.2 Integration of the spectra .............................................................................................. 30

4.2.1 Peak determination and noise size ......................................................................... 31

4.2.2 Determination of the integral filter and integration of the spectra ....................... 31

4.3 TQD filter ....................................................................................................................... 33

Chapter 5: Saccharide comparison and clustering ....................................................................... 36

5.1 Introduction.................................................................................................................... 36

5.2 Towards a monosaccharide database ............................................................................ 36

5.3 Monosaccharide comparison ......................................................................................... 37

5.3.1 Curve comparison: the integration method ........................................................... 38

5.3.2 Curve comparison: the Fréchet method ................................................................. 39

5.4 Clustering of the monosaccharides ................................................................................ 40

5.4.1 The first cluster: β-galactopyranose and α-L-arabinopyranose ............................. 42

5.4.2 The second and third cluster: manno-, L-rhamno- and β-lyxopyranose ................ 43

5.4.3 The fourth cluster: α-xylopyranose and α-glucopyranose ..................................... 45

5.5 The usability of the technique on furanoses.................................................................. 46

5.5.1 A computational analysis of arabinofuranose ........................................................ 46

5.5.2 A spectral analysis of ribofuranose ......................................................................... 47

5.6 Significance level the of technique ................................................................................ 48

5.6.1 The repeated measurement approach ................................................................... 48

5.6.2 The cluster approach .............................................................................................. 49

IV

5.6.3 The significance level: conclusion ........................................................................... 50

Chapter 6: Proof of concept & operation ..................................................................................... 51

6.1 Sucrose (p2D) ................................................................................................................. 51

6.2 A mixture of monosaccharides ...................................................................................... 52

6.2.1 The John Doe sample (p2D and p3D) ..................................................................... 52

6.2.2 The honey sample (p2D) ......................................................................................... 53

6.3 Capsular polysaccharides ............................................................................................... 54

Chapter 7: Conclusion and further research ................................................................................ 57

7.1 General conclusion ......................................................................................................... 57

7.2 Conclusion: the analysis of a furanose ........................................................................... 58

7.3 Usability of the current processing tools ....................................................................... 59

7.4 Further research and required software updates ......................................................... 59

Chapter 8: Appendix ..................................................................................................................... 60

8.1 GitHub manual ............................................................................................................... 60

8.2 Cluster analysis data for the integration method .......................................................... 65

8.3 Cluster analysis data for the Fréchet method ................................................................ 66

8.4 Ten starting conformations of beta L-arabinofuranose ................................................. 67

Chapter 9: References................................................................................................................... 68

Chapter 10: Dutch summary – Nederlandstalige samenvatting .................................................. 70

10.1 Een nieuwe experimentele aanpak ................................................................................ 70

10.2 Geautomatiseerde verwerking van de spectra .............................................................. 71

10.3 Besluit ............................................................................................................................. 71

Scientific article: To A General Applicable Monosaccharide Identification Using TOCSY Matching

V

List of abbreviations and technical terms

Abbreviated monosaccharides and chemical compounds are not included in this list.

1D-sel TOCSY One Dimensional selective TOCSY

CCM Curve Comparison Method

Chunk Part of the spectrum coming from one monosaccharide

Curve Refers to the curve resulting after the integration of a multiplet. Sometimes called integration or mixing time curve.

ESI-ITMS electrospray ionization ion trap mass spectrometry

fqlist Parameter used by the pulse program containing all frequencies (p2D-sel TOCSY only)

GAG GlycosAminoGlycans

GC-MS Gas Chromatography Mass Spectroscopy

HMBC Heteronuclear multiple-bond correlation spectroscopy

HPLC High Performance Liquid Chromatography

HSQC Heteronuclear single quantum coherence spectroscopy

JD #1 John Doe sample #1; sample containing three different monosaccharides

Max Curve Projection curve coming from one Chunk

m% Mass percentage notation

MS Mass Spectroscopy

NMR Nuclear Magnetic Resonance

p2D Refers to the p2D-sel TOCSY experiment

p3D Refers to the p3D-Bsel TOCSY experiment

p2D-sel TOCSY 1D-SEL TOCSY with a pseudo dimension in the mixing time

p3D-Bsel TOCSY 2D band selective TOCSY with a pseudo dimension in the mixing time

Resx Resolution in the x-dimension; number of points in the array

rga Receiver Gain Acquisition (automatic determination)

S/N-ratio Signal over Noise ratio, also abbreviated as ‘sino’

SOx/SFOx Spectral (Field) Offset; Nyquist frequency in the x-dimension; the center of the acquisition window

SWx Width of the spectral window in the x-dimension

TOCSY TOtal Correlation Spectroscopy

vclist Variable counter list; used for the different mixing times

VI

List of figures

Figure 1.1 Total sugar consumption worldwide (statista.com) ...................................................... 2

Figure 1.2 Starch; a branched polysaccharide ................................................................................ 4

Figure 1.3 Step by step identification ............................................................................................. 5

Figure 1.4 The pyranose form of hexose saccharides (left) and the pentose saccharides (right) . 5

Figure 1.5 Cyclization of the acyclic aldehyde to both the furanose and pyranose ....................... 6

Figure 1.6 The cyclization of D-Glucose .......................................................................................... 6

Figure 1.7 The alpha (left) and beta (right) D-Glucose form .......................................................... 7

Figure 1.8 D (left) and L (right) glucose........................................................................................... 7

Figure 1.9 The Haworth projection and both chair conformations of β-D-glucopyranose ............ 8

Figure 1.10 Comparison of a 1D-sel TOCSY of the non-anomeric region of α-glucose with

different mixing times (blue:43ms; green: 95ms) .......................................................................... 9

Figure 1.11 α-D-Mannopyranose; note the angle between the H1 and H2 will cause a bottleneck

....................................................................................................................................................... 10

Figure 1.13 The Karplus-relation ................................................................................................... 10

Figure 1.13 Dihedral angle between four atoms .......................................................................... 10

Figure 2.1 1D HNMR spectrum of Galactose ................................................................................ 13

Figure 2.2 NMR flowchart for monosaccharide determination (Touckach, 2013) ...................... 14

Figure 2.3 Gheysen determination table (60ms) .......................................................................... 14

Figure 2.4 A 2D-TOCSY spectra of sucrose (700MHz; 60ms). The signal corresponding to the

anomeric hydrogen and its cross peaks have been indicated. ..................................................... 15

Figure 3.1 Default pulse program for a 2D TOCSY ........................................................................ 17

Figure 3.2 An overlay of a regular 1D-HNMR (blue); a 1D-HNMR using 90° selective pulse (red)

centered anomeric signal (yet partly exciting water). In order to improve the 90° selective pulse

results, one can lengthen the pulse in order to increase selectivity to further reduce the

intensity of the water signal. Last, a 1D-HNMR using selective spin-echo (green) in order to

excite the entire anomeric region. ............................................................................................... 19

Figure 3.3 2D-TOCSY spectra of JD #1 (100ms – 700MHz) with a superposed band selective

indication. The anomeric hydrogen region on the diagonal has been indicated. ........................ 19

file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439957file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439958file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439959file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439960file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439961file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439962file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439963file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439964file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439965file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439966file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439966file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439967file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439967file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439968file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439969file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439970file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439971file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439972file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439973file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439973file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439974file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439975file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439975file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439975file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439975file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439975file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439976file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439976

VII

Figure 3.4 1D proton nmr of sucrose (green); 1D-sel TOCSY of sucrose (100ms) using selectivity

on the anomeric hydrogen using both the dipsi-2 (red) as mlev-17 (blue) sequence on 700MHz.

Only the non-anomeric region is shown. The effect of the z-filter in the DIPSI-2 sequence is

clearly visible around 3.45ppm upon comparison of the DIPSI-2 and mlev-17 sequences. ........ 21

Figure 3.5 A 3D (left) and 2D (right) representation of a p2D-sel TOCSY of β-glucopyranose (the

time domain axes are scaled incorrectly) ..................................................................................... 22

Figure 3.6 Data resulting from a p3D experiment (JD sample). Data identical to the p2D

experiment has been indicated. ................................................................................................... 24

Figure 4.1 Flowchart of the processing. ........................................................................................ 26

Figure 4.2 p2D-sel TOCSY taken on alpha (bottom) and beta (top) glucose directly from Topspin

....................................................................................................................................................... 27

Figure 4.3 The data structure of the p3D-Bsel TOCSY as a list of 2D-TOCSY after the dimension

flip (the commas indicate that there are multiple elements in the array) .................................. 28

Figure 4.4 The diagonal extracted automatically from JD#1 as priviously shown in Figure 3.2 .. 29

Figure 4.5 Projection of one chunk to achieve optimal signal to noise ratio for each individual

peak ............................................................................................................................................... 30

Figure 4.6 Peak determination on the diagonal of the p3D of JD#1 ............................................ 31

Figure 4.7 Peak region determination for the middle peak of a triplet, only depicted for the right

side. (The noise limit is exaggerated and is only for demonstration purpoces) .......................... 32

Figure 4.8 The integral filter obtained form the spectrum (α-glucose) ....................................... 33

Figure 4.9 The application of the TQD-filter on Sucrose .............................................................. 34

Figure 4.10 Chunk Plot of α-Glucose ............................................................................................ 35

Figure 5.1 Integration method applied on two curves ................................................................. 39

Figure 5.2 The Fréchet distance calculation (left) and two different curves with high Fréchet

distance (right), yet still a high probability of similarity using the integration method ............... 39

Figure 5.3 Frequency (left) and scatter (right top) plot of both methods of the probability two

monosaccharides of the database arre identical. ........................................................................ 41

Figure 5.4 HCA on the Integration Method Matrix, with the significance level of 0.75, the

determined clusters are indicated in blue. ................................................................................... 42

file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439977file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439977file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439977file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439977file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439978file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439978file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439979file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439979file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439980file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439981file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439981file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439982file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439982file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439983file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439984file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439984file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439985file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439986file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439986file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439987file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439988file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439989file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439990file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439991file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439991file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439992file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439992file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439993file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439993

VIII

Figure 5.5 β-galactopyranose (left) and α-L-arabinopyranose (right) .......................................... 43

Figure 5.6 Plot of the TOCSY cross correlation intensities vs mixing time of the first cluster ..... 43

Figure 5.7 β-manno-, β-L-rhamno- and β-lyxopyranose respectively .......................................... 43

Figure 5.8 α-manno-, α-L-rhamno- and α-lyxopyranose respectively (top) and both chair

conformations of α-lyxose (bottom) ............................................................................................ 44

Figure 5.9 Chunk plot of the monosaccharides in the second cluster ......................................... 44

Figure 5.10 Chunk plot of α-glucose and α-xylose of the non-anomeric region. ......................... 45

Figure 5.11 Polar Cremer-Pople plot of the initial (orange dots) conformations, with an arrow to

there eventual conformation (blue globe). The lower the energy, the bigger the globe.The final

conformaiton of 2 and 7 has been drawn on the left top. ........................................................... 47

Figure 5.12 A chunk plot of both ribofuranoses. .......................................................................... 48

Figure 5.13 Frequency plot of both methods of the similarity matrix ......................................... 49

Figure 6.1 Results of a Sucrose; using integration method with the minima criterion, showing all

curves but the anomeric signal. .................................................................................................... 51

Figure 6.2 Two chunks of the p2D spectra of the JD sample showing overlap. ........................... 52

Figure 6.3 Chunk plot of β-glucose (red), β-galactose (green) and β-xylose (blue) ..................... 53

Figure 6.4 1D proton NMR of the 19F (blue) and 22F (red) seroptype. The 22F seroptype has

been shifted 0.1 ppm to the right. ................................................................................................ 55

Figure 7.1 1D proton NMR (red) and the result of a selective pulse as used by the p3D

experiment (selgpse), with identical number of scans and scaling on the 19F sample. .............. 57

Figure 7.2 p2D spectra of ribofuranose taken on a 700MHz spectrometer. ............................... 58

file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439994file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439995file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439996file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439997file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439997file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439998file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484439999file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484440000file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484440000file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484440000file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484440001file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484440002file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484440003file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484440003file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484440004file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484440005file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484440006file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484440006file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484440007file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484440007file:///C:/Users/yannick/Documents/_Documenten/UGent/Thesis/Writing/Thesis/Thesis.docx%23_Toc484440008

IX

List of Tables

Table 1 All monosaccharides in the database .............................................................................. 37

Table 2 Standard deviation of the techniques on sucrose. .......................................................... 49

Table 3 Processing hits for both the p2D and p3D experiment using the integration method and

the minima critereon. ................................................................................................................... 52

Table 4 Experimental settings and for the CPS for both the p2D and p3D experiment. ............. 55

1

Chapter 1: Introduction

1.1 Context of this project

Monosaccharide identification is not only used to identify and quantify monosaccharide

composition, it is also a critical step in the analysis of large carbohydrates. The analysis of these

carbohydrates is not only of high importance in the food industry, but also in medicinal chemistry.

In medicinal chemistry, it is often the case that interactions between a saccharide compound and

proteins are studied; yet the monosaccharide type and sequence is not always completely

determined due to the high complexity of the compound and the costs of analysis.

This work: Towards a general applicable monosaccharide identification using TOCSY matching

is a collaboration between two research groups at the UGent, the NMRSTR Research group

(department of Molecular and Macromolecular Chemistry) and the Computational Biology Lab

(Department of Applied Mathematics, Computer Science and Statistics) with the aim to propose

a fast, automatic identification and processing method for monosaccharides and polysaccharide

composition. This work is not connected to an ongoing doctoral thesis, however the foundations

have been laid by K. Gheysen (Gheysen, 2011), who has written her doctorate partially on this

subject. The TOCSY-matching approach, the concept introduced by Gheysen et al., is revisited in

this work and extended by developing an automated analysis. The TOCSY matching approach

allows a fast monosaccharide identification without a full analysis of the spectrum. In its current

implementation, analysis typically proceeds using a single 2D TOCSY at a specific mixing time.

This analysis occurs in a manual, user defined mode and is therefore not only prone to user bias,

but also limits the amount of different monosaccharides that can be examined. This is due to the

fact that the amount of monosaccharides that were investigated was limited, which could cause

misinterpretation of the spectra. In this work, a quantitative algorithm driven by the TOCSY

matching is developed in order to avoid these drawbacks. In addition to having a larger pyranose

database, the method has been extended to include a furanose sugar in order to assess the

strengths and limitations of the technique itself. In order to achieve this, a novel experimental

approach was developed, giving enhanced spectra that does not necessarily extends measuring

time upon increased complexity.

2

1.2 Sugars, saccharides and carbohydrates: not only a source of energy

A world without sugars is unthinkable in the 21st century. Every year the world population

consumes about 170 million metric tons of sugar (statista.com, n.d.). Sugars are in nearly every

food source, often added in order to further sweeten food (these are often referred to as added

sugars).

These added sugars are in almost every food product, going from yoghurt to soda drinks, a can

of cola for example easily contains 10m% carbohydrates and depending of the soda, the sugar

composition will be different. Normal Coca-Cola® contains high fructose corn syrup. This is a

combination of different corn syrups that have been processed using enzymes. The coke in the

‘green bottle’, also known as the ‘Mexican Coke®’ is sweetened with sucrose (of which the word

has a French origin: sucre + ose). Sucrose is a disaccharide extracted from Nature (from sugar

canes for example) and often referred to as table sugar. These added sugars are added to food

artificially, resulting in an extended knowledge of the carbohydrate composition.

When sugars are naturally present in the food substance, knowing their carbohydrate

composition i.e. the type and sequence of the monosaccharides is no longer straightforward.

Natural variation occurs caused by different origins or even weather conditions. A complex

example is honey, which is directly harvested from Nature and is almost entirely made from

sugars (mostly mono- and disaccharides). The foraging bees collect sucrose rich nectar, of which

they partially digest the sucrose into glucose and fructose (using digestive enzymes) in a process

which is called regurgitation. While this process is well known, sugars not originating from

Figure 1.1 Total sugar consumption worldwide (statista.com)

3

sucrose i.e. glucose and fructose are found in honey as well. Maltose for instance, a disaccharide

of glucose, is an often occurring example. These variations in monosaccharide units in these

higher sugars (di- and trisaccharides) result in a partially unknown composition.

But sugars are also found and used outside the food industry, although in that context they are

usually referred to as saccharides or carbohydrates as their taste is no longer of importance. They

are often the subject of scientific research in the life sciences, as they interact with proteins and

antibodies. Upon checking Web of Science, a ‘saccharide’ search resulted in 2500 hits for

biochemistry and molecular biology alone (for the years 1955-2017). A common example for the

usage of saccharides is cancer treatment under the form of chemotherapy (Calvaresi, 2013). The

cancer cells can be targeted due to the fact that aggressive tumors demonstrate high glycolytic

rates as they are rapidly dividing cells. By tagging saccharides with toxic molecules, they are used

as a ‘Trojan horse’ to bring the toxic molecules inside the cancer cell.

In medicinal chemistry, one can find countless other applications of the usage of sugars as they

are (in general) benign for the human body. Typical fillers for drugs are starch and sugars like

lactose. These are added to give a pill a bigger size for user comfort as the active compound is

usually present in extreme small doses. The science of these saccharides, baptized to

‘glycobiology’ somewhere in the late 1980s combines the chemistry and biochemistry discipline

on the usage of saccharides. Unfortunately, these saccharides all have a highly similar chemical

structure with most of their differences resulting from differences in stereochemistry.

1.3 Oligo- and polysaccharides

Oligosaccharides are build up by monosaccharides and have between two and eight

monosaccharides linked to each by glycosidic bonds. Due to their limited length, the different

branching possibilities are limited compared to polysaccharides, however they often occur in a

branched state. They are found in food, where they serve as a prebiotic in the colon, but aren’t

used as a source of energy by our body directly.

Polysaccharides are highly occurring compounds in Nature. They are polymeric carbohydrates,

containing long chains of monosaccharides linked together by glycosidic bonds. These

polysaccharides are divided in three main groups: food storage polysaccharides (such as starch

4

and glycogen), structural polysaccharides (chitin and cellulose) and mucopolysaccharides (also

known as glycosaminoglycans or GAG’s).

Starch can be found in corn, potatoes, and grains etc. while glycogen is only found in animals

(and is often referred to as animal starch). Both starch and glycogen are essential for nutrition

(as they are our main source of energy) and are homopolysaccharides since only one type of

monosaccharide is present in the polysaccharide. For both starch and glycogen this

monosaccharide is glucose. Due to their simplicity in monosaccharide composition, they are of

small importance during this project, as these polysaccharides only have one type of

monosaccharide, which is well known.

Structural polysaccharides such as chitin and cellulose are found in both plants and animals. They

are responsible for the rigidity of plants and for animals, chitin is found in the shell of crustaceans

(shrimps and lobsters for example). Cellulose is a homopolysaccharide of glucose, linearly linked

over a beta 1-4 bond allowing to form long and straight chains. This straight chain conformation

results in strong fibers, giving plants there structural strength. Chitin is similar to cellulose, but

has a derivative of glucose as its monosaccharide, namely N-acetyl-glucosamine.

Pneumococcal disease (Streptococcus pneumoniae), a potentially lethal bacterial infection,

shows the importance of the analysis of polysaccharides. This bacteria causes different infections,

however a vaccine does exist (ATCC, 2015). These vaccines are based on the serotype of the

bacteria which is depending of the polysaccharide capsule of the bacteria. Knowledge with

regards to the monosaccharide composition and the glycosidic bonds of the polysaccharide

capsule are crucial in order to formulate vaccines. While these CPS have been extensively studied,

some still remain unknown (Geno & Gilbert, 2015).

Figure 1.2 Starch; a branched polysaccharide (retrieved from nutrientsreview.com)

5

Upon identification of the oligo- or polysaccharide, two kinds of information are required. First,

the monosaccharide composition must be determined, as it is the building block of the oligo- or

polysaccharide. Second, the way the monosaccharides are interconnected with the glycosidic

bond and their order must be determined as well. In this thesis; only the identification of the

monosaccharides themselves is being tackled. No attempt is made in order to know in which

order they are connected and which hydroxyl functions are used in this connection. The latter

typically occurs in a later stage of the analysis. The identification of monosaccharides is referred

to as step I (Figure 1.3).

1.4 Monosaccharides structure and conformation – a refresher

Monosaccharides have a chemical formula (𝐶𝐻2𝑂)𝑛 where the origin of the group name can be

seen (carbohydrates: carbon and water). They are classified into four main groups according to

the number of carbon atoms in the chain, leading to triose, tetrose, pentose and hexose

monosaccharides of which only the latter two are at the focus of this project due to the fact that

they are the ones that mostly occur in Nature.

Within each group, various saccharide structures exist as a result of differences in cyclization and

stereochemistry. In addition different conformations may need to be taken into account when

considering cyclic pentose and hexose structures. As the mass of the monosaccharides are (per

Figure 1.3 Step by step identification

Figure 1.4 The pyranose form of hexose saccharides (left) and the pentose saccharides (right)

6

group) identical; most standard chemical methods are only able to differentiate between a

pentose and hexose but not between the different pentoses (or hexoses) themselves as

differences are only found in the stereochemistry.

1.4.1 Different forms of monosaccharides

Monosaccharides can both occur in cyclic and acyclic forms in a chemical equilibrium. This

equilibrium is possible thanks to the aldehyde function only present in the acyclic form. Through

intramolecular addition of one of the hydroxyl functions with the aldehyde a hemiacetal is

formed, leading to a cyclic structure. An interactive representation of the cyclization can be

found on Wikimedia.1

As multiple hydroxyl functions are present on each molecule; this reversible reaction can result

in different ring sizes. For pyranose type monosaccharides the ring has six atoms; for furanose

the ring has five atoms. This means the geometrical structure of the molecule is fundamentally

different, which is exploited in this project.

_________________________ 1 https://upload.wikimedia.org/wikipedia/commons/a/af/Glucose_Fisher_to_Haworth.gif

Figure 1.5 Cyclization of the acyclic aldehyde to both the furanose and pyranose

Figure 1.6 The cyclization of D-Glucose

https://upload.wikimedia.org/wikipedia/commons/a/af/Glucose_Fisher_to_Haworth.gif

7

Generally speaking, pyranose is the dominant form due to the presence of added ring strain for

the furanose cycle, giving the furanose forms typical mass percentages between 2 and 15%.

1.4.2 Variations in monosaccharides: the α vs β and D vs L

Due to the hemiacetal formation a new stereo center is created leading to the formation of two

diastereoisomers referred to as alpha and beta. The newly formed hydroxyl function can be

either in the cis or trans relationship in comparison to the C6 carbon (Figure 1.7). A mixture of

both will be formed with the ratio depending on their relative free energies of formation. The α

and β denomination is defined based on the mutual relationship: trans and cis respectively for

the alcoholic functions on the C1 and the C5 carbon. In the Haworth-projection, for all D-type

sugars, the α-type monosaccharides have the anomeric hydrogen in the upward position. For the

L-type monosaccharides it will be in the downward position.

The D and L notation refers to the Fisher representation and the orientation of the hydroxyl

function indicated in blue (Figure 1.8). The D-monosaccharides are (in general) always the most

occurring in Nature. There are only a handful exceptions (such as for rhamnose). The L type sugar

is the mirror image of the D type, both are depicted in Figure 1.8 for glucose. As it is not possible

to differentiate between two mirror images using NMR, the D-type monosaccharide will (unless

otherwise mentioned) always be investigated.

Figure 1.7 The alpha (left) and beta (right) D-Glucose form

Figure 1.8 D (left) and L (right) glucose

8

1.4.3 Variations in conformations

Throughout this project; two different representations will be used; the Haworth projection will

be used often to compare structure and stereochemistry of different monosaccharides. However,

multiple conformations still remain possible for a single Haworth projection. Pyranoses may

occur in two chair conformations2, and when these are important, both will be shown.

The most stable chair conformation will be the one where a majority of hydroxyl functions are

positioned in the equatorial plane and are not axial on the ring. This is due to the fact that axial

groups have spherical interactions between each other destabilizing the molecule. Therefor the

chair conformation with the smallest axial groups (often a hydrogen atom) is expected to be the

more stable conformation.3 Figure 1.9 illustrates both chair conformations for α-D-glucose, the

left chair conformation is expected to be more stable as it has four hydroxyl functions and C6 in

the equatorial plane while the right conformation only has hydrogens in the equatorial plane.

1.5 Introduction towards TOCSY

TOCSY or TOtal Correlation SpectroscopY; is an NMR technique where magnetization associated

with a particular spin can be passed along the entire spin system of which it is part in a molecule.

Transfer occurs between non scalar coupled protons as long as there are scalar couplings

between intervening protons in the spin system. In poly saccharides, every monosaccharide unit

defines such a separate spin system as the glycosidic bond involves too many bonds to allow for

significant scalar coupling to occur between protons of neighboring monosaccharides.

_________________________ 2 Other conformations exist as well, yet they have higher relative energies (boat conformation, twist-boat conformation) 3 One must also take the anomeric effect in account. For certain monosaccharides it might be less apparent which chair conformation shows a lower energy. Multiple other effects might be required to be taken into account.

Figure 1.9 The Haworth projection and both chair conformations of β-D-glucopyranose

9

Considering the chemical structure of monosaccharides, the scalar coupling networks mostly

involve 3JHH scalar couplings between vicinal hydrogens in the spin system, resulting in a linear

network (Eq. 1). The transfer rate of magnetization within such a network is dependent of two

main factors: the mixing time (the time that is given to pass on the magnetization) and the

coupling constant values between each of the vicinal hydrogens (3JHH). The coupling constant will

determine the magnetization transfer speed. Small couplings will reduce the speed of

magnetization transfer; while large a 3JHH coupling constant will lead to rapid transfer.

Due to the cyclic nature of monosaccharides, a monosaccharide diastereomer will be in one of

both chair conformations (for the pyranose form) meaning the dihedral angles between the

vicinal hydrogens are fixed and not prone to averaging over conformations. As the dihedral angles

determine the size of the coupling, the set of coupling constants along the network is therefore

also conformation specific. This has the result that each monosaccharide is expected to have a

specific set of 3JHH-coupling constants between each of the vicinal hydrogens. This leads to a

characteristic set of couplings at which the magnetization can be passed along from the H1 to H2,

to H3 and so on. For a hexose monosaccharide; the following couplings would be obtained

starting from the anomeric hydrogen:

𝑯𝟏 𝟑𝑱𝟏𝟐→ 𝑯𝟐

𝟑𝑱𝟐𝟑→ 𝑯𝟑

𝟑𝑱𝟑𝟒→ 𝑯𝟒

𝟑𝑱𝟒𝟓→ 𝑯𝟓 𝟑𝑱𝟓𝟔′

→ 𝑯𝟔′

𝟑𝑱𝟓𝟔→ 𝑯𝟔

Eq. 1

This means the appearance of a signal or TOCSY correlation for H2 through TOCSY transfer from

H1 is to a good approximation only dependent on the mixing time and the 3𝐽12 coupling. The

signal of H3 is dependent on both the 3𝐽12 and the 3𝐽23 coupling and so on. Differences in scalar

coupling values along the transfer path from H1 to H6 will lead to a different pattern and intensity

Figure 1.10 Comparison of a 1D-sel TOCSY of the non-anomeric region of α-glucose with different mixing times (blue:43ms; green: 95ms)

10

of TOCSY correlations depending on the particular monosaccharide. When a particular 3JHH is

small, this will lead to a transfer bottle-neck, preventing significant transfer of magnetization to

the remainder of the spin system. This is for instance the case in α-D-mannopyranose (Figure

1.11), where the 3J12 scalar coupling is very low (approximately 1Hz) causing a bottleneck for the

entire spin system. Unfortunately, this bottleneck effect will make it hard to analyze certain

monosaccharides that have an identical conformation before the bottleneck in the spin system

(see 5.4.2).

The size of all the individual couplings therefor has a big influence on how the TOCSY pattern

changes upon varying the mixing time. As previously mentioned, the size of the individual

coupling is dependent on the dihedral angle between the vicinal hydrogens. The Karplus-relation

attempts to describe the correlations between each vicinal hydrogens (3JHH):

3𝐽𝐻𝐻 = 𝐴𝑐𝑜𝑠2𝜃 + 𝐵𝑐𝑜𝑠𝜃 + 𝐶 Eq. 2

The A, B and C are 7.76; -1.1 and 1.4 respectively for unsubstituted monosaccharides (Haasnoot,

Deleeuw, & Altona, 1980). As the A, B and C are known, the only unknown variable of the 3JHH-

coupling is the dihedral angle (the concept of a dihedral angle is shown in Figure 1.13).

Using the TOCSY experiment, the rate at which magnetization propagates to the other hydrogen

atoms in the spin system can be mapped. Different correlations will be visible depending on the

mixing time due to the bottlenecks. In order to compensate against this and to achieve a high

Figure 1.13 The Karplus-relation Figure 1.13 Dihedral angle between four atoms

Figure 1.11 α-D-Mannopyranose; note the angle between the H1 and H2 will cause a bottleneck

11

signal to noise ratio for all signals, high mixing times (100ms) are most often used. However, due

to technical limitations of the hardware (mostly the probe); a safety limit on the mixing time has

been set to 130ms.

1.6 Content of this project

During this project, a new experimental setup is proposed in order to reduce the setup time of

the NMR experiments and automate the setup further. This is done using two different

experimental methods (pseudo 2D and pseudo 3D) and is discussed in chapter three. In order to

automate the processing and provide a quantitative comparison between monosaccharides, a

fully automatic processing script is written in Python (chapter four). It must be noted that chapter

four is very technical, as it describes the processing in detail. Next, chapter five compares the

measured monosaccharides against each other in a reference database. Chapter two contains a

short description of other techniques used in order to analyze monosaccharides and their (dis-)

advantages, including NMR and the current TOCSY-matching approach. The thesis is concluded

with a few case studies in chapter six and a conclusion in chapter seven.

12

Chapter 2: Current analysis methods

There are many simple methods to quantify carbohydrates in substances such as food. However,

methods having identification of chemical structure and composition as a goal are few and more

involved. Depending of the chain length (polysaccharides or oligosaccharides), techniques

involve partial or total hydrolysis in order to split oligosaccharides into the individual

monosaccharides and require an extended knowledge of organic chemistry. When total

hydrolysis is used, information on the glycosylation pattern and overall structure is lost. Some

techniques also require modification of the saccharides, which adds another step to the analysis

process.

2.1 Mass Spectroscopy methods

Using mass-spectroscopy, it does not seem arbitrary to differentiate between diastereoisomers

as these have an identical molar mass. However, it has been proven to be possible to differentiate

between three different hexoses (glucose, galactose and mannose) using ESI-ITMS in positive ion

mode. It was however not possible to differentiate between α and β monosaccharides (Zhu &

Sato, 2007). In this research only the monosaccharide composition was investigated, no attempt

was made to determine the saccharide sequence and the nature of the glycosidic linkages.

However it has been proven possible to do so using MS-MS. The data given by Zhu shows that

the technique requires extended use of wet chemistry and also lacks information on which

monosaccharides can be differentiated, as the amount of monosaccharides investigated was

fairly limited.

Using GC-MS, it is possible to do a full but destructive analysis of monosaccharide composition.

In order to do step I of the analysis (chapter 1.3), a full hydrolysis is however required. This can

be done with hydrochloric acid for example. After the hydrolysis, many derivatization methods

can be applied, such as silylation and fluoroacylation (Sassaki & Souza, 2013) to enable GC-MS

analysis. The identity is typically confirmed by using suitable monosaccharide reference

standards.

13

2.2 HPLC methods

This approach shows similarities to GC-MS as it also requires hydrolysis in the case of oligo- or

polysaccharides. Once a raw mixture of monosaccharides is obtained; the retention time of each

monosaccharide is determined using chromatography. The retention time is monosaccharide

specific giving a method for monosaccharide identification (Saddic & Ebert, n.d.). The exact

recipes for this analysis can be found in the paper of Saddic as this method uses ‘wet chemistry’.

Similar to MS methods, this technique is destructive for the sample yet it is able to differentiate

between multiple monosaccharides.

2.3 NMR standard methods

Upon the analysis of a single monosaccharide sample, a 1D proton NMR will be sufficient. But

once the sample consists of either multiple monosaccharides or oligo-/polysaccharides, a 1D

proton NMR spectrum will generally no longer suffice as there will be a significant amount of

spectral overlap between the signals of the different units. Unfortunately, chemical shifts are not

completely compound specific, as they are dependent of the chemical environment as well. As

such their usage is not advised for the identification accept for the anomeric proton.4 The

anomeric proton has a high chemical shift (4.5-6 ppm) due to the vicinity of the alcohol and ether

oxygen atom. The rest of the hydrogens on the monosaccharides are all in the same area (3-4.2

ppm) of the spectrum and as small deviations can occur, it is not possible to use this for a correct

annotation of the spectra.

_________________________ 4 The anomeric proton is responsible for the alpha or beta notation; it is indicated with H1.

Figure 2.1 1D HNMR spectrum of Galactose

14

For these more complex samples, two dimensional experiments are required. Using the classical

homonuclear (only one type of atom is measured in two dimensions: COSY, TOCSY,

NOESY/ROESY) and heteronuclear experiments (HSQC, HMBC involving 1H/13C) it is possible to

identify monosaccharides. This process requires first a full annotation of the spectrum, after

which the conformational analysis will determine which monosaccharide has been measured. As

the usage of chemical shift is not possible for non-anomeric hydrogens, the usage of scalar

coupling constants between the different hydrogens and heteronuclear experiments such as the

HSQC and HMBC are required in order to complete the structural analysis. To show the

complexity of this process, a possible work flowchart has been shown in Figure 2.2. This process

for monosaccharide and polysaccharide determination has been described extensively by Guus

& Gotfredsen and will not be repeated in this project.

2.4 The TOCSY-Matching approach (Gheysen et al.)

As the current NMR-technique to identify different carbohydrates is

very extensive, time consuming and requires manual examination of

the data, Gheysen et al. has proposed a new approach called TOCSY-

matching. While TOCSY-matching does not allow a full annotation of

the spectra, it gives a new approach to quickly identify the content of

monosaccharide, considerably facilitating a full annotation in the

subsequent analysis. The TOCSY-matching approach consists of

Figure 2.2 NMR flowchart for monosaccharide determination (Touckach, 2013)

Figure 2.3 Gheysen determination table (60ms)

15

taking one 2D-TOCSY at a chosen mixing time (often 100ms) of an oligosaccharide (or a mixture

of monosaccharides. The TOCSY trace from the anomeric signals is analyzed either along the F1

or the F2 dimension (the 2D TOCSY is symmetrical by design and both directions should bring an

identical result although there is a resolution difference).

Using this it is possible to compare the peak intensity per monosaccharide with the diagrams

provided by Gheysen (the 60ms chart is shown in Figure 2.3). These were made using multiple

2D TOCSY spectra for each monosaccharide (at 30, 60 and 100ms mixing time) in the matching

charts. The monosaccharide in Figure 2.4 shows one intense peak (the diagonal peak) and four

medium intensity peaks. When comparing this with the diagram, it tells us that this

monosaccharide in the sucrose sample, is one out of both glucoses. However it is not possible to

determine the stereochemistry of the anomeric hydrogen (α or β) in this case. This specific case

results in the fact that to differentiate between the alpha and beta glucose, a second 2D-TOCSY

must be taken at a mixing time of 30ms. Only then will we be able to correctly identify α-D-

glucose. While taking extra 2D TOCSY spectra at different mixing times works in the case of

glucose, none of the mixing times used in the matching charts enable us to differentiate between

the α and β form of galactose for instance.

There are three main concerns about this implementation of the TOCSY-matching approach.

First, it will only become apparent during the analysis and the processing of the spectra that an

Figure 2.4 A 2D-TOCSY spectra of sucrose (700MHz; 60ms). The signal corresponding to the anomeric hydrogen and its cross peaks have been indicated.

16

extra 2D TOCSY is required. This can cause delays in the analysis of the entire sample5. Upon

taking the extra 2D TOCSY spectrum, it is probable that it is impossible to differentiate between

the monosaccharides. The second and main concern is the limited amount of monosaccharides

investigated previously. As a result, it is unknown if monosaccharides not present in the charts

might show identical patterns. Xylose for instance, a pentose, might show identical peaks in the

charts and be misidentified as glucose. Finally, a third item is that the method only provides a

qualitative determination and does not provide a quantitative analysis. The analysis is therefore

prone to user bias.

2.5 Comparison of the techniques

While multiple techniques do exist, all have their own advantages and disadvantages. Both mass

spectroscopy and HPLC are able to analyze a mixture of monosaccharides and the chemical

composition of polysaccharides (step I). It not possible however, to analyze the glycosidic bonds

interconnecting the monosaccharides in an oligo- or polysaccharide.6 The main advantage of

both these techniques is the low sample amount required. NMR is able to identify every

monosaccharide and the correct glycosidic bond, yet it requires a much higher amount of sample.

A second disadvantage of NMR is the high cost of an NMR apparatus.

Another important factor that must be taken into account is that it is unknown which

monosaccharides can all be differentiated from each other. Different mass spectroscopy

methods and chromatography methods often include three or four monosaccharides in order to

see if the technique is able to differentiate between these and are a perfect proof of concept. A

universal identification method in order to identify any monosaccharide (independent whether

or not it is in a polysaccharide) is not always possible or does not exist, apart from a full analysis

using NMR, without the TOCSY-matching approach. It is our goal to reinvestigate the TOCSY-

matching approach in order to: expand the coverage of different mixing times, expand the

amount of monosaccharides investigated, automate the processing and explore the option for a

quantitative monosaccharide analysis.

_________________________ 5 In the case that extra measurement time has to be requested. 6 It is possible to determine the order of monosaccharides, yet not by which hydroxyl function they are interconnected. For this, NMR is required.

17

Chapter 3: Designing a new experiment

3.1 TOCSY pulse sequence

The default pulse program for a TOCSY experiment can be seen in Figure 3.1.7 The fid gives us a

frequency dimension after Fourier transformation and is often referred to as the direct

dimension. In order to obtain the extra information generated by the TOCSY sequence, two main

variants exist. In the 2D experiment the t1 time period is systematically increased, yielding a

series of 1D measurements from which the indirect dimension may be constructed and

subsequently Fourier transformed. The second option is to change the 90° universal pulse to a

selective pulse while keeping the t1 fixed resulting in a 1D selective TOCSY. This selective

excitation region can be placed anywhere in the 1H spectrum, resulting in the TOCSY cross

correlation pattern of the selected hydrogen.

3.2 A novel experimental setup

For the new experimental setup, the transfer of magnetization from the anomeric signal through

the entire spin system must be sampled for each saccharide. This would require a setup of

multiple 2D-TOCSY experiments, covering a mixing time of 0 to approximately 100ms and

therefore a high amount of measurement time. Indeed, depending on the instrument resolution

and sensitivity, a single 2D spectrum requires between 2 and 12 hours to record for a single

TOCSY mixing time. It should be noted however, that most of the time spent is used to record

250 to 500 1D experiments to sample the indirect time domain t1 of the 2D spectrum, so as to

achieve the required resolution for analysis of the TOCSY traces along F1. Repeating this for d 10

different mixing times would be prohibitively long. This can in principle be alleviated by using 1D-

selective TOCSY experiments. Here an individual anomeric resonance is excited and its

magnetization is then channeled in the TOCSY sequence. This results in direct generation of the

_________________________ 7 d1: relaxation delay; t1: time period, corresponds to the F1 axis after Fourier transform; fid: free induction decay.

Figure 3.1 Default pulse program for a 2D TOCSY

18

TOCSY trace in a 1D spectrum for that particular anomeric signal. The measuring time is now

reduced to that of a single 1D experiment repeated for as many mixing times as one wishes to

sample to record the TOCSY transfer. Thus only about 10 1D’s are now required. However, this

needs to be repeated for each individual monosaccharide. This notwithstanding, using multiple

1D-selective TOCSY experiments with changing mixing time per monosaccharide, reduces the

measurement time compared to the 2D approach. To facilitate analysis the associated pulse

program was set up so that all 1D selective TOCSY measurements for a particular monosaccharide

are recoded in a single experiment, generating a 2D like presentation with one frequency and

one mixing time axis. However, as this needs to be repeated for every anomeric signal a loop is

added so that each anomeric signal is targeted one by one generating a series of 2D’s in a single

file. This will be referred to as the p2D sequence.

Unfortunately, a minimum separation between individual anomeric resonances needs to occur

throughout, a condition which is generally not satisfied. In this case, the TOCSY traces of two or

more signals will overlap, compromising analysis. In order to address this issue, a band selective

version of the 2D TOCSY was developed, which marries the best of both worlds with a limited to

no extra time cost. The resulting p3D sequence, thus generates a 3D spectrum, consisting of two

frequency axis and a mixing time axis. Information on how to actually set up an experiment, both

the p2D as the p3D experiment, can be found online on the GitHub8, an online code repository.

Pulse programs, vclists and the processing software have been made fully available for download

in this environment.

3.3 Achieving selectivity

As previously mentioned; the TOCSY experiment consists of a 90° pulse; followed by a spinlock

sequence. To limit the range of signals along the F1 dimension to the anomeric signals and

achieve a suppression of all other peaks as shown in Figure 3.3, it is sufficient to avoid exciting

the signals outside the region of interest prior to the t1 evolution period of the TOCSY spectrum.

To achieve this; two options present themselves:

_________________________ 8 https://github.ugent.be/ydandois/Thesis-Source-Code or https://github.com/FramedYannick/Thesis-Source-Code

https://github.ugent.be/ydandois/Thesis-Source-Codehttps://github.com/FramedYannick/Thesis-Source-Code

19

It is possible to give a high selective 90° pulse as is done in the standard selective TOCSY

experiment. However, upon using this technique for both the p2D and p3D off-resonance effects

will be introduced causing phasing problems and non-uniform excitation, especially for the p3D

experiment where the anomeric hydrogen excitation region is quite wide. A second problem is

that the shape of these selective pulses are often Gaussians (or have a similar shape) as such that

they will still excite non-anomeric hydrogens (causing aliasing). The high selective 90° pulses can

be useful for the p2D experiment but not to excite an entire band from the 2D TOCSY plane as is

done in the p3D experiment.

A second possibility is to give a hard 90° pulse followed by a short delay, after which a

selective 180° refocusing pulse is applied to the anomeric region only, followed by a delay of

equal length. This will result in all magnetization evolving in the xy-plane, yet only the

magnetization of the inverted anomeric hydrogens are refocused by the selective inversion pulse.

By bracketing this pulse between two pulsed field gradient pulses, only the inverted signals will

Figure 3.3 2D-TOCSY spectra of JD #1 (100ms – 700MHz) with a superposed band selective indication. The anomeric hydrogen region on the diagonal has been indicated.

Figure 3.2 An overlay of a regular 1D-HNMR (blue); a 1D-HNMR using 90° selective pulse (red) centered anomeric signal (yet partly exciting water). In order to improve the 90° selective pulse results, one can lengthen the pulse in order to increase

selectivity to further reduce the intensity of the water signal. Last, a 1D-HNMR using selective spin-echo (green) in order to excite the entire anomeric region.

20

yield a coherent signal after the spin-lock sequence, with refocusing of chemical shift evolution.

The non-anomeric hydrogen signals remain in the xy-plane, however they are not refocused and

the gradients will almost reduce their signal contribution to zero. This technique is referred to as

gradient enhanced selective spin-echo.

Both excitation methods are compared in Figure 3.2 using a standard 1D 1H NMR (i.e. without

TOCSY). The selective 90° 27ms pulse approach (red spectrum) was given on the anomeric

hydrogen of β-galactose and the α-galactose anomeric hydrogen is not excited. Yet the solvent

peak (water) is still present and out of phase. Making the 90° selective pulse longer will increase

the selectivity but also the phase distortion of the selectively excited signal. For band-selective

excitation, a shorter selective 90° pulse is required, but this causes non-uniform excitation and

the need for large phase corrections. Therefore, the 180° refocusing pulse approach is preferred.

Using a 180° BURP 15ms pulse, both the α and β-hydrogen can be seen with expected relative

intensity and without phasing problems. Both techniques show no signals of the non-anomeric

hydrogens; and as such folding and aliasing will be prevented in our band selective TOCSY

experiment. As it is our goal to have a general experiment with limited setup, the spin-echo

technique is used throughout. It offers us the possibility to create a general applicable method

for assigning all monosaccharide units from their anomeric resonance even when the chemical

shift of these anomeric hydrogen resonances change due to the precise chemical circumstances

of the sample studied. Due to the fact the spin-echo sequence can be used with much longer

pulses (and wider refocusing regions) without any phasing problems (as seen in Figure 3.3), the

spin-echo has the clear advantage for the p3D experiment.

3.4 p2D-sel TOCSY

In principle, the 1D-sel TOCSY is taken by selectively give a 90° pulse to one anomeric hydrogen

and executing a spin-lock preventing resonances from evolving chemical shift in the xy-plane

using either the MLEV or DIPSI pulse sequence, while keeping the t1 parameter fixed. This will

cause the magnetization to be passed on towards the vicinal hydrogen networks. An example of

a 1D-sel TOCSY using the spin echo technique as previously mentioned can be found in Figure 3.4

with a 1D proton spectrum.

21

With the goal of combining multiple selective TOCSY with varying mixing time on the same

anomeric resonance into one single experiment, a pseudo 2D selective TOCSY pulse program was

set up as a 1D selective TOCSY adding one extra loop over the mixing time. The spectrometer

only requires one setup per experiment and as such setup time is reduced in comparison to the

previous workflow. For this, two items must be combined:

The 1D experiments must loop over all required mixing times. While Gheysen et al. showed that

both the DIPSI-2 and MLEV sequence are equally effective for the TOCSY transfer, due to the fact

that the DIPSI-2 sequence can have zero quantum coupling filtering and as such provides cleaner

spectra for an automated analysis (Figure 3.4); the DIPSI-sequence was used throughout this

project.9

The mixing times used were fixed to the same series of values (in milliseconds) for each

experiment, facilitating an automatic comparison later on:10

8.63, 17.26, 25.90, 34.53, 43.16, 51.80, 60.43, 69.06, 77.70, 86.33, 94.96, 103.60, 112.23

These are not chosen arbitrary, the length of the 90° pulse (p6) used in the DIPSI-2 sequence was

set to 25 µs and the DIPSI-2 was executed per three loops resulting in steps of 8.63ms.

_________________________ 9 This can clearly be seen in Figure 3.4 upon comparison of the triplet (3.3ppm) and doublet of doublets (3.45ppm) in the mlev-17 (blue) and dipsi-2 (red). 10 Using curve interpolation, it is possible to make the processing uniformed, as long as enough mixing times are used. This however complicates the issue and this process increases the experimental error on the data. Therefore these mixing time values were also used in the p3D experiment.

Figure 3.4 1D proton nmr of sucrose (green); 1D-sel TOCSY of sucrose (100ms) using selectivity on the anomeric hydrogen using both the dipsi-2 (red) as mlev-17 (blue) sequence on 700MHz. Only the

non-anomeric region is shown. The effect of the z-filter in the DIPSI-2 sequence is clearly visible around 3.45ppm upon comparison of the DIPSI-2 and mlev-17 sequences.

22

Since the p2D experiment does the selective excitation per anomeric hydrogen (meaning per

monosaccharide) present in the sample, it must loop over all hydrogen frequencies in the

anomeric region of the spectrum. Due to this, the time required for the p2D version to be

measured scales linearly with the amount of monosaccharides in the sample (+-20min per

monosaccharide for default resolution and sixteen scans). In principle, this could also be

automated, by introducing an a 1D proton scan and automatic peak determination. An extra loop

for all the frequencies required to be excited is added. This is stored in the fqlist parameter and

must then also be set by the operator. However, manual examination of the hydrogen spectrum

for the setup remains highly advised as errors would have a high impact on the spectra in this

stage.

It would also be difficult to automatically predict the selective pulse length required for selective

excitation of a certain band-width, as this depends on the distribution of signals in the anomeric

region. Higher selectivity requires longer pulse lengths. This means that a manual set up is

required for each signal any way. This also allows the receiver gain setting to amplify the signal

during detection to be optimized for each individual anomeric signal (using ‘rga’). This is

especially of interest when monosaccharide units appear in different S/N-ratio’s in the spectrum.

As the most intense signal defines the gain, the monosaccharide with the lowest signal

contribution may have a less than optimal receiver gain resulting in a reduced signal to noise ratio

in comparison to what can be achieved using the hardware with optimal settings. In the view of

this and subsequent development of the p3D sequence, it was opted not to automatize this step

of the acquisition and the user must manually enter the different frequencies.

Figure 3.5 A 3D (left) and 2D (right) representation of a p2D-sel TOCSY of β-glucopyranose (the time domain axes are scaled incorrectly)

23

Extensive documentation and details on how to set up an experiment can be found on the

GitHub. The total measurement time of this experiment is dependent on the amount of

frequencies set in the fqlist, meaning the more monosaccharides in the mixture that should be

analyzed, the longer the experiment will take (a linear correlation is present). This approach

works well for samples where the amount of monosaccharides is limited and the anomeric peaks

are well separated. Unfortunately, upon overlap of the anomeric hydrogen peaks in the spectra,

it will not be possible to be selective between different anomeric hydrogens and an automated

analysis will no longer by possible. Selective excitation of an anomeric hydrogen peak will always

excite a region around the anomeric hydrogen. If two signals overlap, the second anomeric

hydrogen signal will contribute to the signal measured after selective excitation. As this will

influence the profile of the signals over different mixing times, the processing software will not

be able to identify the correct monosaccharide. In such cases, it is advised to use a different

spectral approach, namely the p3D-Bsel TOCSY.

3.5 p3D-Bsel TOCSY

In Figure 3.3 a 2D TOCSY spectra is depicted of JD#111. Upon taking this spectrum over multiple

mixing times, the same experimental results will be achieved as in the p2D-sel TOCSY experiment.

But as this would be an array of 2D experiments, it would be very measurement time consuming.

As the blacked out regions in the regular 2D TOCSY spectrum are regions not required for the

automated analysis of the spectra, the parameters are set as such to only measure the white part

of the spectrum. It becomes apparent that the resolution in the F1 dimension is dependent of

two items, the spectral width that must be covered and the number of sampled points. The

former is typically 7 to 8 ppm, while the latter varies from 250 to 500, leading to minimal and

maximal resolutions (before zero filling) of 31.25 points per ppm and 71.43 points per ppm

respectively. Limiting the sample along the F1 to a band of 1ppm (as shown as the white region

in Figure 3.3), reduces the required amount of points in the F1 dimension with a factor 7-8 upon

keeping the same resolution. Also, as signals are more dispersed in the anomeric region (in

comparison to the non-anomeric region) the resolution can be further reduced.

_________________________ 11 JD#1 is a sample with the mixture of 3 different saccharides, resulting in 6 different monosaccharides

24

3.6 Conclusion: p2D-sel TOCSY versus p3D-Bsel TOCSY

In general, the goals for a new experimental analysis were to (1) constrain all experiments into

one experimental setup making not only the setup but also the analysis easier, (2) reduce the

required measurement time through a selective approach and (3) create a uniform mixing time

dimension for all experiments in order to facilitate an automatic processing.

Comparing the measurement time of the p3D-Bsel TOCSY with that of the p2D-sel TOCSY, it

becomes immediately apparent that the measurement time of the p3D experiment is much

longer (about 7 hours for a default resolution with a spectral window of 1ppm in the F1

dimension). It would seem the p2D experiment has the advantage on samples with a low

monosaccharide content (up to approximately 15 monosaccharides).

But the p2D-sel TOCSY does have its limitations. Due to the fact it uses selective 90° peaks on

each anomeric hydrogen, interference could present itself when two anomeric hydrogens have

similar chemical shift. Both will be excited and this will cause problems for the processing

software, which is only designed to process each monosaccharide one by one and is not using

deconvolution. This choice was made as the p3D experiment solves this issue.

Figure 3.6 Data resulting from a p3D experiment (JD sample). Data identical to the p2D experiment has been indicated.

25

In conclusion, the time factor is only from little importance upon choosing between the p2D and

the p3D experiment. As long as all anomeric hydrogen signals are well separated in the terms of

chemical shift, the operator is better off using the p2D-sel TOCSY. The p3D-Bsel TOCSY only

receives the advantage upon a higher complexity of the anomeric hydrogen region. As long as

the peaks are slightly separated, the p3D experiment will be able to differentiate between the

different monosaccharides due to the extra dimension. In chapter six both experiments will be

demonstrated on a mixture of multiple monosaccharides and on oligosaccharides to show this in

further detail.

26

Chapter 4: Data processing

The goal is to obtain monosaccharide specific curves of the peak intensities versus the mixing

time independent of the experimental setup and the experiment itself. As two different

experimental approaches were used, the data has to be put in a uniform format.

The data is first manually processed using the NMR spectrometer software Topspin® in order to

ensure optimal spectra.12 The data is read in using NMRGlue, an open source Python framework

(Helmus & Jaroniec, 2013) designed to be compatible with multiple NMR formats. Thanks to

NMRGlue, the data can almost immediately be processed using Python.

The data finds itself in a format depending on the experiment

that was used. Both p2D and p3D experiment will contain

data coming from multiple monosaccharides. This data is

split up and reorganized in a data format independent of the

experiments and only contains data corresponding to one

monosaccharide. This is called a ‘chunk’.13 These chunks will

be analyzed completely separated from each other. On each

chunk, the peak of the anomeric hydrogen and the

correlations of the remaining hydrogens must be found, their

signal must be integrated over the width of the peak in the

chemical shift dimension. This results in curves with the

intensity as a function of the mixing time for each correlation

and the anomeric signals. These curves are normalized,14

after which various filters are applied to these curves, in

order to remove multiplets (doublets, triplets and

quadruplets etc.). Chemical shift filters and threshold filters

are also used in order to filter out remaining noise and

_________________________ 12 More information on the processing using Topspin® can be found in the Github Manual. https://github.ugent.be/ydandois/Thesis-Source-Code or https://github.com/FramedYannick/Thesis-Source-Code 13 The chunkification is the only step that is different for the p2D and the p3D experiment. 14 They will all be normalized towards the anomeric hydrogen, as this signal always has the highest intensity.

Figure 4.1 Flowchart of the processing.

https://github.ugent.be/ydandois/Thesis-Source-Codehttps://github.com/FramedYannick/Thesis-Source-Code

27

solvent peaks. The remaining curves are monosaccharide specific and can be used for

monosaccharide comparison (chapter five).

4.1 Chunkification

Both p2D and p3D experiment contains data of multiple monosaccharides. First this must be

‘chunkified’. A chunk is the amount of data corresponding to one monosaccharide. This step is

the only step where the p2D and its p3D counterpart have different processing methods as the

chunks are designed to be in an identical format for both experiments. This has the effect that

the rest of the data processing will happen in identical manner, independent of the type of

experiment.

Due to the way the data is saved (in Bruker’s Topspin software); the data arrays in the F1, F2 and

F3 dimension will always contain 2n elements.15 This must be taking into account for the

processing in the mixing time dimension; as our experiment only contain 13 elements in the

mixing time dimension. The last elements of an array will be filled up with zero rows until a power

of two is reached. For the p2D-sel TOCSY the chunks are sequentially behind each other: if two

different frequencies were measured, rows 0-25 will be data and 26-31 will be zero rows.

_________________________ 15 With n being a positive integer, this is due to the usage of the Cooley & Tukey FFT.

Figure 4.2 p2D-sel TOCSY taken on alpha (bottom) and beta (top) glucose directly from Topspin

28

For the p3D-Bsel TOCSY; we will always have thirteen different mixing times; meaning plane 13-

15 will be filled with zero rows. Removal of these zero rows is important; as it will reduce

calculation times upon manipulation of the data.

4.1.1 Chunkification of the p2D-sel TOCSY

As mentioned above, for the p2D-sel TOCSY; the experiment is set up so that the different chunks

are always corresponding to sequential mixing time elements in the array.16 As we measure with

thirteen different mixing times, the first chunk consists of elements 0-12; the second chunk of

13-25 and so on. This shows that chunks will be lined up with a multiplication of 13 as the first

spectrum from a new chunk. The chunkification of the p2D spectra is straightforward, as it is just

slicing up the data array. This can be seen in Figure 4.2 where two chunks are present (rows 26-

31 which are empty rows to fill the data up to a power of 2 are already removed from the figure).

4.1.2 Chunkification of the p3D-Bsel TOCSY

For the p3D-Bsel TOCSY chunkification is less direct. The experimental data consists of three

dimensions: the F3 dimension or the direct dimension, the F2 dimension or the mixing time

dimension and the F1 dimension or the indirect dimension corresponding to the band-selective

region. As the order of dimensions in the data matrix is not identical to the on p2D experiment,

_________________________ 16 As previously mentioned, the mixing time dimension is a pseudo dimension.

Figure 4.3 The data structure of the p3D-Bsel TOCSY as a list of 2D-TOCSY after the dimension flip (the commas indicate that there are multiple elements in the array)

29

the F2 and F1 dimension need to be switched to ensure compatibility. After the switch, the p3D

data will have F3 as the direct dimension, F2 as the indirect dimension and F1 as the mixing time

dimension, similar to the p2D experiment.17 The setup of the data matrix can be seen in Figure

4.3. Res stands for resolution, SW for spectral window and index stands for the index of the array

(this is only used in formulas).

In order to know which part of the spectra contains the diagonal peaks and therefore the

anomeric hydrogen signals, the diagonal must be extracted. To extract the diagonal, the chemical

shift is calculated for each data point in the F2 dimension and then the index of the corresponding

data point in the F1 dimension (with the obtained chemical shift) is calculated. The ensemble of

these two operations result in the following formula:18

𝑥 =𝑆𝑂1−

1

2𝑆𝑊1−(

𝑅𝑒𝑠2−𝑦

𝑅𝑒𝑠2∗𝑆𝑊2+𝑆𝑂1

1

2𝑆𝑊2)

𝑆𝑊1∗ 𝑅𝑒𝑠1 + 𝑅𝑒𝑠1 Eq. 3

Looping over every index of the F2 dimension (index y) results in a list of coordinates of the

diagonal (list of points (x, y) located on the diagonal). This is executed on the first plane in the

p3D, as the lowest

TO A GENERAL APPLICABLE MONOSACCHARIDE IDENTIFICATION USING TOCSY … · 2017. 8. 3. · TO A...

Documents

Transcript of TO A GENERAL APPLICABLE MONOSACCHARIDE IDENTIFICATION USING TOCSY … · 2017. 8. 3. · TO A...