Data analysis in two-dimensional chromatography

Gabriel Vivó Truyols, Analytical-chemistry group, Van ‘t Hoff Institute for Molecular Sciences, University of Amsterdam, [email protected]

Transcript of Data analysis in two-dimensional chromatography

Page 1: Data analysis in two-dimensional chromatography

Data analysis in two-dimensional chromatography

Gabriel Vivó Truyols
Analytical-chemistry group
Van ‘t Hoff Institute for Molecular Sciences, University of Amsterdam
[email protected]

Page 2: Data analysis in two-dimensional chromatography

Introduction

From 1D chromatography to 2D chromatography: what changes?

[Figure: one-dimensional chromatogram; x-axis: time (min), y-axis: intensity.]

Page 3: Data analysis in two-dimensional chromatography

The changes from 1D to 2D: The complexity of the data changes

Instruments can be classified according to the order of the tensor of data used to represent a single experiment:

• Zero-order instruments produce a zero-order tensor (e.g. a number). Example: pH meter.
• First-order instruments produce a first-order tensor (e.g. a vector). Example: UV-VIS spectrometer.
• Second-order instruments produce a second-order tensor (e.g. a matrix). Examples: LC-MS, GCxGC.
• Third-order instruments produce a third-order tensor (e.g. a “cube” of data). Example: GCxGC-MS.
• nth-order instruments exist, but they are rare.
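The classification above can be illustrated with array dimensions. The following is a minimal sketch, assuming NumPy; the array sizes and variable names are made up for illustration, only the number of dimensions matters.

```python
import numpy as np

# Hypothetical data containers; the sizes are arbitrary placeholders.
ph_reading = np.array(7.2)                 # zero-order: a single number
uv_spectrum = np.zeros(200)                # first-order: one value per wavelength
lc_ms_run = np.zeros((600, 200))           # second-order: time x m/z
gcxgc_ms_run = np.zeros((300, 100, 200))   # third-order: 1tR x 2tR x m/z

# The tensor order of one experiment equals the number of array dimensions.
orders = {
    "pH meter": ph_reading.ndim,
    "UV-VIS": uv_spectrum.ndim,
    "LC-MS": lc_ms_run.ndim,
    "GCxGC-MS": gcxgc_ms_run.ndim,
}
print(orders)
```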

Page 4: Data analysis in two-dimensional chromatography

The changes from 1D to 2D: The concept of “peak vicinity” changes

[Figure: the peak of interest and its neighbors in a one-dimensional chromatogram (axis: tR) and in a two-dimensional chromatogram (axes: ¹tR, ²tR).]

S. Peters, G. Vivó-Truyols, P. Marriott, P.J. Schoenmakers, J. Chromatogr. A. 1146 (2007), 232-241.

Page 5: Data analysis in two-dimensional chromatography

The changes from 1D to 2D: The concept of “peak resolution” is different

[Figure: two overlapping peaks A and B. One-dimensional case (axis: tR), with the distances f and g marked at the valley; two-dimensional case (axes: ¹tR, ²tR).]

One-dimensional: valley-to-peak ratio (between A and B): $P = f/g$

Two-dimensional: valley-to-peak ratio (between A and B)?

S. Peters, G. Vivó-Truyols, P. Marriott, P.J. Schoenmakers, J. Chromatogr. A. 1146 (2007), 232-241.

Page 6: Data analysis in two-dimensional chromatography

The changes from 1D to 2D: The concept of “peak” changes

[Figure: a peak in a one-dimensional chromatogram (axis: tR) and in a two-dimensional chromatogram.]

Page 7: Data analysis in two-dimensional chromatography

The changes from 1D to 2D: The concept of “peak” changes

[Figure: a peak in a one-dimensional chromatogram (axis: tR) and in a two-dimensional chromatogram.]

GCxGC, Riva - 2014

Page 8: Data analysis in two-dimensional chromatography

The changes from 1D to 2D: Do the main steps in data processing change?

Step 1: View

Step 2: Pre-process
• Base-line correction
• Noise filtering
• Spike filtering
• Alignment
• … etc.

Step 3: Measure
• Peak detection / integration
• Calibration
• Deconvolution
• Pattern recognition

Page 9: Data analysis in two-dimensional chromatography

The changes from 1D to 2D: Do the main steps in data processing change?

Step 1: View
• Folding
• Phasing

Step 2: Pre-process
• Base-line correction
• Noise filtering
• Spike filtering
• Alignment
• … etc.

Step 3: Measure
• Peak detection / integration
• Calibration
• Deconvolution
• Pattern recognition
• Class separation

[Slide labels next to the steps: Univariate, Multivariate.]

Basically, the steps are the same, but the algorithms used for each step may be different.

Page 10: Data analysis in two-dimensional chromatography

First step: visualization

Step 1: View
• Folding
• Phasing

Page 11: Data analysis in two-dimensional chromatography

Visualisation: Raw data in 2D chromatography

[Figure: raw detector trace; x-axis: time (min), y-axis: absorbance (AU).]
Chromatogram of a glycine preparation (254 nm) (data courtesy of University of Valencia)

Two-dimensional chromatography: GCxGC chromatogram of diesel (FID detector). First dimension: non-polar; second dimension: polar.

Page 12: Data analysis in two-dimensional chromatography

Visualisation: “Folding” the chromatogram

Modulation time: 8 seconds

[Figure: folded chromatogram as a 3D surface; axes: ¹tR (s), ²tR (s), FID signal.]

Page 13: Data analysis in two-dimensional chromatography

Visualisation: “Folding” the chromatogram

Modulation time: 8 seconds

[Figure: folded chromatogram as a 3D surface and as a contour plot; axes: ¹tR (s, 2700 to 2740), ²tR (s, 0 to 8), FID signal.]
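Folding amounts to cutting the one-dimensional detector trace into pieces of one modulation time each and stacking them side by side. A minimal sketch, assuming NumPy and a constant sampling rate; the function name `fold` and its parameters are hypothetical, not taken from any particular software package.

```python
import numpy as np

def fold(signal, sampling_rate_hz, modulation_time_s):
    """Reshape a 1D detector trace into a matrix (rows: 2tR, columns: 1tR)."""
    points_per_mod = int(round(sampling_rate_hz * modulation_time_s))
    n_mods = len(signal) // points_per_mod
    # Drop the incomplete modulation at the end, if any.
    trimmed = np.asarray(signal)[: n_mods * points_per_mod]
    # Each row of the reshaped array is one modulation; transpose so that
    # each column holds one second-dimension chromatogram.
    return trimmed.reshape(n_mods, points_per_mod).T

# Toy trace: 24 points sampled at 1 Hz with an 8 s modulation time.
raw = np.arange(24.0)
image = fold(raw, sampling_rate_hz=1.0, modulation_time_s=8.0)
print(image.shape)
```

Column k of `image` then holds the samples recorded during modulation k.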

Page 14: Data analysis in two-dimensional chromatography

Visualisation: “Folding” the chromatogram: phasing

Modulation time: 8 seconds

Phase = 0.3

[Figure: folded chromatogram as a 3D surface and as a contour plot; axes: ¹tR (s, 2700 to 2740), ²tR (s, about 3 to 10), FID signal.]

Page 15: Data analysis in two-dimensional chromatography

Visualisation: “Folding” the chromatogram: phasing

Modulation time: 8 seconds

Phase = 0.5

[Figure: folded chromatogram as a 3D surface and as a contour plot; axes: ¹tR (s, 2700 to 2740), ²tR (s, about 4 to 12), FID signal.]

Page 16: Data analysis in two-dimensional chromatography

Visualisation: “Folding” the chromatogram: phasing

Modulation time: 8 seconds

Phase = 0.75

[Figure: folded chromatogram as a 3D surface and as a contour plot; axes: ¹tR (s, 2700 to 2740), ²tR (s, about 6 to 14), FID signal.]
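Phasing can be sketched as moving the point at which the trace is cut before folding. The code below assumes that the phase is the fraction of the modulation time by which the cut is shifted, which is consistent with the ²tR axes above starting near 3, 4 and 6 s for phases 0.3, 0.5 and 0.75 at an 8 s modulation time. The function name and arguments are hypothetical.

```python
import numpy as np

def fold_with_phase(signal, points_per_mod, phase):
    """Fold a 1D trace after shifting the cut position by `phase` modulations."""
    shift = int(round(phase * points_per_mod))   # assumed meaning of "phase"
    shifted = np.asarray(signal)[shift:]
    n_mods = len(shifted) // points_per_mod
    trimmed = shifted[: n_mods * points_per_mod]
    return trimmed.reshape(n_mods, points_per_mod).T

# Toy trace: 32 points, 8 points per modulation, phase = 0.5.
raw = np.arange(32.0)
image = fold_with_phase(raw, points_per_mod=8, phase=0.5)
print(image[:, 0])   # first column now starts 4 points later
```

A peak that is split across the top and bottom edges of the image at phase 0 can be brought into the middle of the image by choosing a suitable phase.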

Page 17: Data analysis in two-dimensional chromatography

Visualisation: Cylindrical coordinates. An alternative way to represent the data.

[Figure: chromatogram drawn in cylindrical coordinates; axis labels: ¹tR, ²tR.]

J.J.A.M. Weusten, E.P.P.A. Derks, J.H.M. Mommers, S. van der Wal, Anal. Chim. Acta 726 (2012), 9

Page 18: Data analysis in two-dimensional chromatography

Visualisation: Interpolation

[Figure: folded chromatogram as a 3D surface; axes: ¹tR (s), ²tR (s), FID signal.]

Page 19: Data analysis in two-dimensional chromatography

Visualisation: Interpolation

[Figure: pictorial equation (image + image = image).]

Welcome to the magic world of chemometrics!

Page 20: Data analysis in two-dimensional chromatography

Visualisation: “Folding” the chromatogram: final result

Page 21: Data analysis in two-dimensional chromatography

Visualisation: Conclusions

• Visualising is simple, and gives a lot of information.

• Folding the (one-dimensional) data into a (2D) image introduces discontinuities at the edges. Other visualisation methods (e.g. cylindrical coordinates) are possible.

• Phasing can be of great help.

• Be careful with “cosmetic” effects!

Page 22: Data analysis in two-dimensional chromatography

Second step: Pre-processing

Step 2: Pre-process
• Base-line correction
• Noise filtering
• Spike filtering
• Alignment
• … etc.

Page 23: Data analysis in two-dimensional chromatography

Pre-processing: Typical problems: base-line drifts and noise

[Figure: chromatogram (x-axis: time, min; y-axis: absorbance, AU) with two zoomed views, labelled “Base-line drifts” and “Noise”.]

Page 24: Data analysis in two-dimensional chromatography

Pre-processing: Base-line drifts.

Weighted least squares fitting

[Figure: original chromatogram, fitted base-line correction and corrected chromatogram; x-axis: time (min), y-axis: absorbance (AU).]

Page 25: Data analysis in two-dimensional chromatography

Pre-processing: Base-line drifts.

Weighted least squares fitting

[Figure: original chromatogram, fitted base-line correction and corrected chromatogram.]

Page 26: Data analysis in two-dimensional chromatography

Pre-processing: Base-line drifts.

Weighted least squares fitting

[Figure: original chromatogram, fitted base-line correction and corrected chromatogram.]

Other (more sophisticated) options:

- Use splines
- Base-line correction coupled to peak detection
- Fourier-transform based approaches
- Wavelet-based approaches
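One way to sketch a weighted least squares base-line fit is to fit a low-order polynomial repeatedly, giving a small weight to points that stand well above the current fit (presumably peaks). The slides do not state the weighting rule; the rule below (robust noise estimate, threshold of three standard deviations, polynomial of degree 2) is an assumption made for illustration, and `fit_baseline` is a hypothetical name.

```python
import numpy as np

def fit_baseline(t, y, degree=2, n_iter=10, peak_weight=1e-3):
    """Iteratively reweighted polynomial fit that largely ignores peak points."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    weights = np.ones_like(y)
    for _ in range(n_iter):
        coeffs = np.polyfit(t, y, degree, w=weights)
        baseline = np.polyval(coeffs, t)
        residual = y - baseline
        centred = residual - np.median(residual)
        # Robust noise estimate (median absolute deviation scaled to sigma).
        sigma = 1.4826 * np.median(np.abs(centred))
        # Assumed rule: points far above the fit are treated as peak points.
        weights = np.where(centred > 3.0 * sigma, peak_weight, 1.0)
    return baseline

# Synthetic example: linear drift + one Gaussian peak + noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 30, 301)
drift = 0.05 + 0.01 * t
peak = np.exp(-0.5 * ((t - 15) / 0.3) ** 2)
y = drift + peak + rng.normal(scale=0.01, size=t.size)

baseline = fit_baseline(t, y)
corrected = y - baseline
```

The corrected chromatogram is the original minus the fitted base-line, as in the figures above.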

Page 27: Data analysis in two-dimensional chromatography

Pre-processing: Base-line drifts.

• Base-line is reached at the (half) upper part
• Base-line is reached at the (half) bottom part

S.E. Reichenbach, M. Ni, D. Zhang, E.B. Ledford Jr., J. Chromatogr. A, 985 (2003) 47 - 56

Page 28: Data analysis in two-dimensional chromatography

Pre-processing: Base-line drifts.

• Consider the positions with the smallest values in each half
• Estimate local background parameters using neighboring values
• Interpolate the main background trend and subtract it

S.E. Reichenbach, M. Ni, D. Zhang, E.B. Ledford Jr., J. Chromatogr. A, 985 (2003) 47 - 56

Page 29: Data analysis in two-dimensional chromatography

Pre-processing: Noise removal. Smoothing and derivatives.

The Savitzky-Golay filter is the most common method.

Two parameters should be optimised:

• Window size
• Polynomial degree

These parameters govern how much correlated noise is removed:

• Large window sizes and low polynomial degrees: too much noise is removed (chromatograms appear deformed)
• Small window sizes and large polynomial degrees: too much noise remains
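The effect of the two parameters can be tried out with `scipy.signal.savgol_filter`. This is a minimal sketch on a synthetic peak; the peak shape, the noise level and the three window sizes are illustrative choices, and the same call with `deriv=1` gives a smoothed first derivative.

```python
import numpy as np
from scipy.signal import savgol_filter

# Synthetic chromatogram: one Gaussian peak (sigma = 20 points) plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 1001)
clean = np.exp(-0.5 * ((t - 5) / 0.2) ** 2)
noisy = clean + rng.normal(scale=0.05, size=t.size)

# Smooth with a second-degree polynomial and three window sizes (in points).
smoothed = {
    window: savgol_filter(noisy, window_length=window, polyorder=2)
    for window in (11, 41, 251)
}

# Smoothed first derivative, scaled to the time axis.
first_derivative = savgol_filter(
    noisy, window_length=41, polyorder=2, deriv=1, delta=t[1] - t[0]
)

# A window much wider than the peak flattens it.
peak_heights = {window: s.max() for window, s in smoothed.items()}
print(peak_heights)
```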

Page 30: Data analysis in two-dimensional chromatography

Pre-processing: Noise removal. Smoothing and derivatives.

[Figure: original and corrected chromatograms (FID signal) for three settings: window size = 11, polynomial = 2; window size = 41, polynomial = 2; window size = 251, polynomial = 2.]

G. Vivó-Truyols, P.J. Schoenmakers, "Automatic selection of optimal Savitzky-Golay smoothing”, Anal. Chem. 78 (2006) 4598-4608.

Page 31: Data analysis in two-dimensional chromatography

Pre-processing: Noise removal. Spikes.

A good way of removing spikes consists of passing a median filter (before the Savitzky-Golay filter).

Processing blocks shown on the slide:

• Original data
• Base-line correction
• Median filter (parameter to tune: window size)
• Savitzky-Golay filter (parameters to tune: window size and polynomial degree)

In total, three parameters are optimised.
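The median-then-Savitzky-Golay sequence can be sketched with SciPy as follows. The spike positions, spike height and the three parameter values are made up for illustration.

```python
import numpy as np
from scipy.signal import medfilt, savgol_filter

# Synthetic trace: one Gaussian peak with three single-point spikes added.
t = np.linspace(0, 10, 1001)
signal = np.exp(-0.5 * ((t - 5) / 0.2) ** 2)
spiky = signal.copy()
spiky[[100, 400, 800]] += 5.0     # hypothetical spike positions

# Parameter 1: window size of the median filter (must be odd).
despiked = medfilt(spiky, kernel_size=5)

# Parameters 2 and 3: window size and polynomial degree of the smoother.
smoothed = savgol_filter(despiked, window_length=21, polyorder=2)

print(spiky.max(), despiked.max(), smoothed.max())
```

Running the median filter first keeps the spikes from being smeared into the neighbouring points by the polynomial smoother.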

Page 32: Data analysis in two-dimensional chromatography

Pre-processing: Alignment.

Two types of alignment:

• Between-chromatogram alignment
• Between-modulation alignment

[Slide labels attached to the two types: “Rarely done”; “Using 2D techniques (folded data)”; “Using 1D techniques (unfolded data)”.]

Alignment is not always necessary, depending on the final objective of the analysis.

Page 33: Data analysis in two-dimensional chromatography

Pre-processing: Alignment. COW (1D)

[Figure: 15 different (but related) chromatograms before and after alignment; x-axis: time (min), 13.5 to 13.8.]

Page 34: Data analysis in two-dimensional chromatography

Pre-processing: Alignment. Using (truly) 2D algorithms

• Score alignment in GCxGC-MS: S. Castillo, I. Mattila, J. Miettinen, M. Orešič, T. Hyötyläinen, Anal. Chem. 83 (2011) 3058–3067
• COW adapted to GCxGC-MS (using a single channel): D. Zhang, X. Huang, F.E. Regnier, M. Zhang, Anal. Chem. 80 (2008) 2664–2671

Page 35: Data analysis in two-dimensional chromatography

Pre-processing: Conclusions

• Pre-processing methods are almost the same in one-dimensional and two-dimensional chromatography. They are normally applied to the raw (pre-folded) data.

• Every case needs a particular solution (it always exists, but some care should be taken!)

Page 36: Data analysis in two-dimensional chromatography

Third step: measure

Step 3: Measure
• Peak detection / integration
• Calibration
• Deconvolution
• Pattern recognition
• Class separation

Page 37: Data analysis in two-dimensional chromatography

Peak detection: Peak detection in one step: the watershed algorithm

Most common software programs use the watershed algorithm to detect peaks in 2D chromatography:

[Figure: the watershed algorithm illustrated in four stages (1 to 4).]

J. De Bock et al., doi 10.1007/11558484
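The idea behind watershed segmentation can be sketched with a simplified hill-climbing variant: every pixel follows the steepest ascent to a local maximum, and all pixels that reach the same summit form one region (one catchment basin of the inverted signal). This is an illustration only, not the implementation used in commercial software; the function name and the toy image are made up.

```python
import numpy as np

def hill_climb_labels(image):
    """Label each pixel by the local maximum reached through steepest ascent."""
    n_rows, n_cols = image.shape
    labels = np.empty(image.shape, dtype=int)
    summits = {}                       # (row, col) of summit -> label
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            r, c = r0, c0
            while True:
                # 3x3 neighbourhood, clipped at the image edges.
                r_lo, r_hi = max(r - 1, 0), min(r + 2, n_rows)
                c_lo, c_hi = max(c - 1, 0), min(c + 2, n_cols)
                window = image[r_lo:r_hi, c_lo:c_hi]
                dr, dc = np.unravel_index(np.argmax(window), window.shape)
                nr, nc = r_lo + int(dr), c_lo + int(dc)
                if image[nr, nc] <= image[r, c]:
                    break              # no higher neighbour: summit reached
                r, c = nr, nc
            labels[r0, c0] = summits.setdefault((r, c), len(summits))
    return labels, summits

# Toy image with two well-separated Gaussian peaks (rows: 2tR, columns: 1tR).
rr, cc = np.mgrid[0:20, 0:30]
blob = lambda r, c: np.exp(-((rr - r) ** 2 + (cc - c) ** 2) / 18.0)
image = 1000 * blob(10, 8) + 600 * blob(10, 22)
labels, summits = hill_climb_labels(image)
print(sorted(summits))
```

With noise or with shifted second-dimension profiles, additional local maxima appear and one chromatographic peak is split into several regions.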

Page 38: Data analysis in two-dimensional chromatography

Peak detection: Peak detection in one step: the watershed algorithm

[Figure: two-dimensional peak marked with question marks.] Single catchment basin?

G. Vivó-Truyols, H.G. Janssen, J. Chromatogr. A, doi:10.1016/j.chroma.2009.12.063

Page 39: Data analysis in two-dimensional chromatography

Peak detection: Peak detection in one step: the watershed algorithm

The first problem using the watershed algorithm.

Page 40: Data analysis in two-dimensional chromatography

Automated peak detection in 2D chromatography: The watershed algorithm

Does a two-dimensional chromatographic peak form a single basin?

[Figure: a 2D peak built from the second-dimension slices y-3 … y3, shown as a 3D surface, as a contour plot (axes: ¹tR, AU; ²tR, AU) and as overlaid slices versus ²tR (AU).]

Watershed algorithm works

G. Vivó-Truyols, H.G. Janssen, J. Chromatogr. A, doi:10.1016/j.chroma.2009.12.063

Page 41: Data analysis in two-dimensional chromatography

Automated peak detection in 2D chromatography: The watershed algorithm

Does a two-dimensional chromatographic peak form a single basin?

[Figure: a 2D peak built from the second-dimension slices y-3 … y3, shown as a 3D surface, as a contour plot (axes: ¹tR, AU; ²tR, AU) and as overlaid slices versus ²tR (AU). A saddle point appears between the slices y-1 and y0.]

Saddle point: watershed algorithm fails!

G. Vivó-Truyols, H.G. Janssen, J. Chromatogr. A, doi:10.1016/j.chroma.2009.12.063

Page 42: Data analysis in two-dimensional chromatography

Peak detection: Peak detection in one step: the watershed algorithm

Critical variability in second-dimension retention time:

[Equation for P_fail: garbled in this transcript; the surviving symbols are p, q, m and tR,crit.]

[Figure: P_fail versus tR,crit (min, 0 to 0.2) for p = 2.5, 5, 10 and 20. Regions marked: “Current instruments” and “Current chromatographic practice”.]

• q = 0.5 (eight cuts per peak)
• q = 1 (four cuts per peak)
• q = 2 (two cuts per peak)
• q = 4 (one cut per peak)

G. Vivó-Truyols, H.G. Janssen, J. Chromatogr. A, doi:10.1016/j.chroma.2009.12.063

Page 43: Data analysis in two-dimensional chromatography

Peak detection: Peak detection in two steps.

Step 1: Detect peaks as in one-dimensional chromatography. Use information from derivatives (pre-processing step).

[Figure: one-dimensional signal with detected peaks; x-axis: time, arbitrary units (AU).]

S. Peters, G. Vivó-Truyols, P.J. Marriott and P.J. Schoenmakers, J. Chromatogr. A 1156 (2007) 14.
E.J.C. van der Klift, G. Vivó-Truyols, F.W. Claassen, F.L. van Holthoon, T.A. van Beek, J. Chromatogr. A, 1178 (2008) 43.

Page 44: Data analysis in two-dimensional chromatography

Peak detection: Peak detection in two steps.

Step 2: Merge peaks that belong to the same compound according to 2nd-dimension retention time differences.

T: tolerance criterion

[Figure: a 2D peak shown as a 3D surface, as a contour plot (axes: ¹tR, AU; ²tR, AU) and as overlaid second-dimension slices versus ²tR (AU), with the tolerance T marked.]

Page 45: Data analysis in two-dimensional chromatography

Peak detection: Peak detection in two steps.

Step 2: Merge peaks that belong to the same compound according to 2nd-dimension retention time differences.

D > T ?

[Figure: a 2D peak shown as a 3D surface, as a contour plot (axes: ¹tR, AU; ²tR, AU) and as overlaid second-dimension slices versus ²tR (AU), with the retention-time difference D between slices marked.]
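The merging step can be sketched as a simple clustering of the 1D peaks found in consecutive modulations. The sketch below assumes that two peaks are merged when they lie in adjacent modulations and their second-dimension retention times differ by no more than T (that is, D ≤ T); the data layout and the function name are hypothetical.

```python
def merge_1d_peaks(peaks, tolerance):
    """Group 1D peaks into 2D peaks.

    peaks: list of (modulation_index, second_dimension_retention_time).
    Returns a list of clusters, each a list of such tuples.
    """
    clusters = []
    for mod, rt in sorted(peaks):
        for cluster in clusters:
            last_mod, last_rt = cluster[-1]
            # Assumed rule: adjacent modulation and difference D <= T.
            if mod - last_mod == 1 and abs(rt - last_rt) <= tolerance:
                cluster.append((mod, rt))
                break
        else:
            clusters.append([(mod, rt)])
    return clusters

# Made-up 1D peaks from two compounds eluting over modulations 10 to 12.
peaks = [(10, 5.00), (11, 5.02), (12, 5.01), (11, 6.50), (12, 6.52)]
clusters = merge_1d_peaks(peaks, tolerance=0.1)
print(clusters)
```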

Page 46: Data analysis in two-dimensional chromatography

Peak detection: Peak detection in two steps.

Step 3: Check unimodality

[Figure: a 2D peak shown as a 3D surface, as a contour plot (axes: ¹tR, AU; ²tR, AU) and as overlaid second-dimension slices versus ²tR (AU).]
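A unimodality check on a merged peak can be sketched as follows: the heights (or areas) of the 1D peaks, taken in order of modulation, should rise to a single maximum and then fall. The function name and the example numbers are made up.

```python
def is_unimodal(heights):
    """True if the sequence rises to a single maximum and then falls."""
    n = len(heights)
    i = 0
    # Climb while the values do not decrease.
    while i + 1 < n and heights[i + 1] >= heights[i]:
        i += 1
    # Descend while the values do not increase.
    while i + 1 < n and heights[i + 1] <= heights[i]:
        i += 1
    # Unimodal only if the whole sequence was consumed.
    return i == n - 1

print(is_unimodal([100, 800, 1500, 700, 90]))   # single maximum
print(is_unimodal([100, 800, 300, 900, 90]))    # two maxima
```

A merged cluster that fails the check would be split, for example at the minimum between the two maxima.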

Page 47: Data analysis in two-dimensional chromatography

Automated peak detection in 2D chromatography: Alternative algorithms?

[Figure: two-dimensional chromatogram; x-axis: 1st dimension ret. time (s), Ag column; y-axis: 2nd dimension ret. time (s), RP; colour scale: 0 to 10 × 10⁷.]

Data courtesy of Teris van Beek, University of Wageningen (NL)

Page 48: Data analysis in two-dimensional chromatography

Automated peak detection in 2D chromatography: Alternative algorithms?

Each of these dots corresponds to a detected peak.

Page 49: Data analysis in two-dimensional chromatography

Automated peak detection in 2D chromatography: Alternative algorithms?

[Figure: the same region of the chromatogram (¹tR: 7400 to 8200 s; ²tR: 48 to 54 s; signal intensity, AU) with the detected 1D peaks assigned to Peak A and Peak B in two different ways: Possibility 1 and Possibility 2.]

Page 50: Data analysis in two-dimensional chromatography

Automated peak detection in 2D chromatography: Alternative algorithms?

[Figure: two-dimensional chromatogram; x-axis: 1st dimension ret. time (s); y-axis: 2nd dimension ret. time (s); colour scale: 0 to 10 × 10⁷.]

In general, any group of 1D peaks may exhibit x possibilities of arrangement in 2D peaks that do not violate the rules of unimodality and Δ²tR < T (tolerance criterion)!

Page 51: Data analysis in two-dimensional chromatography

Automated peak detection in 2D chromatography: The Bayesian approach

Step 1: Let’s consider all possible solutions of peak arrangement.

[Figure: detected 1D peaks in the (¹tR, ²tR) plane and candidate groupings into 2D peaks (Peak A, Peak B, Peak C): Sol 1, Sol 2, Sol 3, Sol 4, …, Sol n.]

Page 52: Data analysis in two-dimensional chromatography

Automated peak detection in 2D chromatography: The Bayesian approach

Step 2: Discard those solutions that violate the unimodality criterion. Also discard those solutions that imply a chromatographic peak that is too fragmented.

[Figure: detected 1D peaks in the (¹tR, ²tR) plane and candidate groupings into 2D peaks (Peak A, Peak B, Peak C): Sol 1, Sol 2, Sol 3, Sol 4, …, Sol n.]

Page 53: Data analysis in two-dimensional chromatography

Automated peak detection in 2D chromatography: The Bayesian approach

Step 3: Apply Bayes’ theorem to calculate the probability of each solution.

Hypotheses: H1, H2, …, Hn (one per remaining solution: Sol 1, Sol 2, …, Sol n)

$$p(H_n \mid D) = \frac{p(D \mid H_n)\, p(H_n)}{p(D)}$$

[Figure: remaining candidate groupings of Peak A and Peak B in the (¹tR, ²tR) plane.]

Page 54: Data analysis in two-dimensional chromatography

Automated peak detection in 2D chromatography: The Bayesian approach

$$p(H_n \mid D) = \frac{p(D \mid H_n)\, p(H_n)}{p(D)}$$

I’m interested only in a relative value of p(Hn|D):

$$p(H_n \mid D) \propto p(D \mid H_n)\, p(H_n)$$

All the priors have the same probability:

$$p(H_n \mid D) \propto p(D \mid H_n)$$
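Since only relative values matter, the posterior probabilities can be obtained by normalising likelihood × prior over the candidate solutions. A minimal sketch; the likelihood values are made up, and computing real likelihoods for peak arrangements is outside the scope of this example.

```python
def posterior(likelihoods, priors=None):
    """Normalise likelihood x prior over a finite set of hypotheses."""
    n = len(likelihoods)
    if priors is None:
        priors = [1.0 / n] * n          # all priors equal
    unnormalised = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(unnormalised)           # plays the role of p(D)
    return [u / total for u in unnormalised]

# Made-up likelihoods p(D|H1), p(D|H2), p(D|H3) for three arrangements.
probabilities = posterior([2.0, 1.0, 1.0])
print(probabilities)
```

With equal priors the result depends only on the ratios between the likelihoods, which is the simplification made on the slide.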

Page 55: Data analysis in two-dimensional chromatography

Automated peak detection in 2D chromatography: The Bayesian approach

HPLC-2011

[Equation for the likelihood: garbled in this transcript. Labels attached to its terms: peak phase, 1st dimension peak width, total peak area, prior probabilities.]

• How much does your 1D peak profile look like a peak?
• Are the 2nd dimension retention times too far away?

The computer will do the work for you!

Page 56: Data analysis in two-dimensional chromatography

Automated peak detection in 2D chromatography: The Bayesian approach

[Figure: the same region of the chromatogram (¹tR: 7400 to 8200 s; ²tR: 48 to 54 s; signal intensity, AU) with the detected 1D peaks assigned to Peak A and Peak B in two different ways.]

• Possibility 1: probability = 51%
• Possibility 2: probability = 49%

Page 57: Data analysis in two-dimensional chromatography

Deconvolution: Deconvolution methods.

[Figure: 2D-FID chromatogram; axes: ¹tR, ²tR.]

> 5000 peaks (or peak clusters)

Analyse each spot using deconvolution.

Page 58: Data analysis in two-dimensional chromatography

Deconvolution: Bilinear model

Bilinear model (for a single compound)

$$\mathbf{A} = \mathbf{x}\mathbf{y}^{\mathsf T} + \boldsymbol{\varepsilon} = \mathbf{H} + \boldsymbol{\varepsilon}$$

• A: total response (matrix)
• H = x yᵀ: modelled variance (outer product of two vectors)
• ε: unmodelled variance (matrix)

Page 59: Data analysis in two-dimensional chromatography

Deconvolution: Bilinear model

Bilinear model (for n compounds)

$$\mathbf{A} = \sum_{i=1}^{n} \mathbf{x}_i\mathbf{y}_i^{\mathsf T} + \boldsymbol{\varepsilon} = \mathbf{H} + \boldsymbol{\varepsilon}$$

• A: total response (matrix)
• H = x1 y1ᵀ + x2 y2ᵀ + ... + xn ynᵀ: modelled variance
• ε: unmodelled variance (matrix)
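The bilinear model for n compounds can be written out numerically as a sum of outer products. A minimal sketch, assuming NumPy; the profiles are random placeholders, and the convention that column i of X is x_i and row i of Y is y_i is an assumption of this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rows, n_cols, n_compounds = 40, 25, 3

# Made-up non-negative profiles: column i of X is x_i, row i of Y is y_i.
X = rng.uniform(0.0, 1.0, size=(n_rows, n_compounds))
Y = rng.uniform(0.0, 1.0, size=(n_compounds, n_cols))

# Modelled part H: one outer product per compound, summed.
H = sum(np.outer(X[:, i], Y[i]) for i in range(n_compounds))

# Total response A: modelled part plus unmodelled part (noise here).
noise = rng.normal(scale=0.01, size=(n_rows, n_cols))
A = H + noise

print(np.allclose(H, X @ Y))
```

The sum of outer products is the same as the matrix product X Y, which is the form used when the model is solved by alternating least squares.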

Page 60: Data analysis in two-dimensional chromatography

Deconvolution: Solving the bilinear model. OPA-ALS

[Figure: 3D plot of a measured data matrix containing two overlapping components, labelled “Derivatisation agent” and “Amino-acid”; axis ranges: 4.0 to 4.6, 225 to 375, and -5 to 25.]

Page 61: Data analysis in two-dimensional chromatography

Deconvolution: OPA-ALS

$$\mathbf{A} = \sum_{i=1}^{n} \mathbf{x}_i\mathbf{y}_i^{\mathsf T} + \boldsymbol{\varepsilon} = \mathbf{H} + \boldsymbol{\varepsilon} = \mathbf{X}\mathbf{Y} + \boldsymbol{\varepsilon}$$

[Diagram labels: y1, y2; x1, x2; “Peak profiles in the first order of measurement (spectra)”; “Peak profiles in the second order of measurement (chromatograms)”.]

Alternating least squares loop:

• Start from initial estimates of X (or Y)
• $\mathbf{Y} = \mathbf{X}^{+}\mathbf{A}$, then apply constraints on Y
• $\mathbf{X} = \mathbf{A}\mathbf{Y}^{+}$, then apply constraints on X
• Repeat until the final X and Y are obtained
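The alternating loop can be sketched with pseudo-inverses and a simple non-negativity constraint (negative values are set to zero). The constraint choice, the fixed number of iterations and the synthetic data are assumptions made for illustration; the slides do not state which constraints were applied.

```python
import numpy as np

def als(A, X0, n_iter=50):
    """Alternating least squares for A ~ X @ Y with non-negativity by clipping."""
    X = np.clip(X0, 0.0, None)
    for _ in range(n_iter):
        Y = np.clip(np.linalg.pinv(X) @ A, 0.0, None)   # Y = X+ A, constrain Y
        X = np.clip(A @ np.linalg.pinv(Y), 0.0, None)   # X = A Y+, constrain X
    return X, Y

# Synthetic two-component data: overlapping Gaussian profiles in X,
# strictly positive random profiles in Y.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 50)
X_true = np.column_stack(
    [np.exp(-0.5 * ((t - centre) / 0.08) ** 2) for centre in (0.4, 0.6)]
)
Y_true = rng.uniform(0.1, 1.0, size=(2, 30))
A = X_true @ Y_true

# Initial estimate: the true X perturbed slightly (in practice, e.g. from OPA).
X0 = X_true + rng.normal(scale=0.01, size=X_true.shape)
X, Y = als(A, X0)
relative_residual = np.linalg.norm(A - X @ Y) / np.linalg.norm(A)
print(relative_residual)
```

A small residual shows that the data are reproduced, not that the recovered profiles are unique; that depends on the constraints and on the initial estimates.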

Page 62: Data analysis in two-dimensional chromatography

Deconvolution: OPA-ALS. Dissimilarity

$$d_i = \det\left(\mathbf{D}_i^{\mathsf T}\mathbf{D}_i\right), \qquad \mathbf{D}_i = \left[\,\mathbf{R}\;\;\mathbf{x}_i\,\right]$$

R: reference(s); x1, x2, …, xi: spectrum at the ith retention time (xi).

Iteration 1. Reference (R): mean spectrum.

• The user selects the retention time of maximum dissimilarity.
• Consider the spectrum at the selected time as the new R matrix.

[Figure: dissimilarity d_i (mAU) versus time (min, 3.8 to 4.6), and the reference spectrum, absorbance (mAU) versus lambda (nm, 200 to 400).]
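The dissimilarity can be computed directly from its definition: for each retention time, stack the reference spectra and the spectrum x_i as columns of D_i and take the determinant of D_iᵀD_i. A minimal sketch, assuming NumPy; normalisation of the spectra, which practical implementations often apply, is left out, and the toy spectra are made up.

```python
import numpy as np

def dissimilarity(data, references):
    """d_i = det(D_i^T D_i) with D_i = [references, x_i] as columns.

    data: array (n_times, n_channels), one spectrum per row.
    references: list of 1D arrays of length n_channels.
    """
    d = np.empty(data.shape[0])
    for i, x_i in enumerate(data):
        D_i = np.column_stack(list(references) + [x_i])
        d[i] = np.linalg.det(D_i.T @ D_i)
    return d

# Toy example with two "pure" spectra over three channels.
s1 = np.array([1.0, 0.0, 0.0])
s2 = np.array([0.0, 1.0, 0.0])
data = np.vstack([2.0 * s1, s2, s1 + s2])   # spectra at three retention times

d = dissimilarity(data, references=[s1])
print(d)
```

The dissimilarity is zero where the spectrum is proportional to the reference and grows as the spectrum departs from it.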

Page 63: Data analysis in two-dimensional chromatography

Deconvolution: OPA-ALS. Dissimilarity

[Figure: for each iteration, the dissimilarity d_i (mAU) versus time (min, 3.8 to 4.6) and the reference(s) (R), absorbance (mAU) versus lambda (nm, 200 to 400).]

• Iteration 2
• Iteration 3: only noise

The selected references give the initial spectra (X).

Page 64: Data analysis in two-dimensional chromatography

Deconvolution: OPA-ALS

Alternating least squares loop, starting from the initial X:

• $\mathbf{Y} = \mathbf{X}^{+}\mathbf{A}$, then apply constraints on Y
• $\mathbf{X} = \mathbf{A}\mathbf{Y}^{+}$, then apply constraints on X

[Figure: initial X (absorbance, mAU, versus lambda, nm); final X (normalised absorbance versus lambda, nm); final Y (absorbance, mAU, versus time, min). In the final profiles the two components are labelled “Derivatisation agent” and “Amino-acid”.]

Page 65: Data analysis in two-dimensional chromatography

Deconvolution: Let’s reflect…

What happens with the model if… peak profiles in the second dimension are not exactly the same (e.g. due to retention time misalignments…)?

[Figure: a 2D peak built from the second-dimension slices y-3 … y3, shown as a 3D surface and as overlaid slices versus ²tR (AU).]

Model: matrix A = product of two vectors (x, y) + matrix.

Page 66: Data analysis in two-dimensional chromatography

Deconvolution: What to do then in practice?

Misalignments in the 2nd dimension:

• If you have multichannel detection: matrix unfolding
• If you don’t have multichannel detection… you could align… but forget about it!

Page 67: Data analysis in two-dimensional chromatography

Deconvolution: Matrix unfolding

[Figure: matrix unfolding of a chromatographic region. Axes: 1tR (region), 2tR (region), tR and m/z. The slices at m/z = 50, m/z = 51, …, m/z = 750 are shown stacked and then side by side. Diagram: matrix A = outer product of two vectors (x, y) + matrix.]
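As a rough sketch, unfolding is a reshape of the data cube. The array sizes and the choice to merge the two retention axes into a single tR axis (rows) against m/z (columns) are assumptions for illustration; the slide may use a different layout.

```python
import numpy as np

# Hypothetical sizes: 1tR points (modulations), 2tR points, m/z channels.
n1, n2, n_mz = 4, 5, 3
cube = np.arange(n1 * n2 * n_mz, dtype=float).reshape(n1, n2, n_mz)

# Unfold: place the modulations one after another along a single tR axis.
A = cube.reshape(n1 * n2, n_mz)      # rows: tR, columns: m/z

# Row i * n2 + j of A is the mass spectrum at 1tR index i and 2tR index j.
i, j = 2, 3
spectrum = A[i * n2 + j]

# Folding back recovers the original cube.
refolded = A.reshape(n1, n2, n_mz)
```

Because each modulation simply occupies its own block of rows, a retention shift between modulations does not have to be corrected before the bilinear model is fitted.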

Page 68: Data analysis in two-dimensional chromatography

Deconvolution: Matrix unfolding. Example.

Injection of corn oil in LCxLC-MS (zoom in)

[Figure: LCxLC chromatogram. 1st dimension: Ag column. 2nd dimension: RP. Labelled peaks: PLO and POL.]

Data courtesy of Teris van Beek, University of Wageningen (NL). M. Navarro et al., presented at HPLC-Geneva.

Page 69: Data analysis in two-dimensional chromatography

Deconvolution: Matrix unfolding. Example.

[Figure: deconvolution result for PLO and POL. Panels: 2tR vs. 1tR maps, and mass spectra (abundance vs. m/z, range 570-615, with labelled ions at m/z 575, 577 and 601) compared with the reference library.]

Data courtesy of Teris van Beek, University of Wageningen (NL). M. Navarro et al., presented at HPLC-Geneva.

Page 70: Data analysis in two-dimensional chromatography

Deconvolution: Matrix unfolding & matrix augmentation

H. Parastar et al., Anal. Chem. 83 (2011) 9289

Page 71: Data analysis in two-dimensional chromatography

Deconvolution: Finding some unique masses. AMDIS.

$$\mathbf{A} = \sum_{i=1}^{n} \mathbf{x}_i \mathbf{y}_i + \boldsymbol{\varepsilon} = \mathbf{H} + \boldsymbol{\varepsilon} = \mathbf{X}\mathbf{Y} + \boldsymbol{\varepsilon}$$

y1, y2: peak profiles in the first order of measurement (spectra)

x1, x2: peak profiles in the second order of measurement (chromatograms)

[Figure: ALS scheme. Initial estimates of X (or Y); then $\mathbf{Y} = \mathbf{X}^{+}\mathbf{A}$ and apply constraints on Y; then $\mathbf{X} = \mathbf{A}\mathbf{Y}^{+}$ and apply constraints on X; repeat; final X and Y.]

If I can find a unique mass for each compound…

Page 72: Data analysis in two-dimensional chromatography

Deconvolution: Finding some unique masses. AMDIS.

If I can find a unique mass for each compound…

… I could find the peak profiles (X) for each compound…

$$\mathbf{Y} = \mathbf{X}^{+}\mathbf{A}$$
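The idea can be sketched with a toy example. This is not how AMDIS is implemented; it only illustrates the step from unique masses to spectra. The two elution profiles, the four-channel stick spectra and the positions of the unique channels are hypothetical.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 51)
x1 = np.exp(-0.5 * ((t - 0.45) / 0.08) ** 2)   # elution profile, compound 1
x2 = np.exp(-0.5 * ((t - 0.55) / 0.08) ** 2)   # elution profile, compound 2

# Hypothetical stick spectra over four m/z channels.
# Channel 0 is unique to compound 1; channel 2 is unique to compound 2.
y1 = np.array([1.0, 0.5, 0.0, 0.3])
y2 = np.array([0.0, 0.4, 1.0, 0.3])
A = np.outer(x1, y1) + np.outer(x2, y2)        # overlapped, noise-free data

# Peak profiles X are read directly from the unique-mass channels...
X = A[:, [0, 2]]

# ...and the full spectra follow from Y = X^+ A.
Y = np.linalg.pinv(X) @ A
```

In this noise-free case the rows of `Y` reproduce `y1` and `y2`. In general each spectrum is recovered only up to a scale factor set by the intensity of its unique channel.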

Page 73: Data analysis in two-dimensional chromatography

Deconvolution: Let’s reflect…

What happens with the model if two peaks are completely resolved in one dimension but their retention times are the same in the other dimension?

$$\mathbf{A} = \sum_{i=1}^{n} \mathbf{x}_i \mathbf{y}_i + \boldsymbol{\varepsilon} = \mathbf{H} + \boldsymbol{\varepsilon} \qquad \text{(for } n \text{ compounds)}$$

“Mathematical compound” is not the same as “chemical compound”!!
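A small numerical check makes the distinction concrete. In this sketch (Gaussian profiles, retention values and the tolerance are assumptions), two chemical compounds that share the same second-dimension profile collapse into a single term of the bilinear model.

```python
import numpy as np

t1 = np.linspace(0.0, 10.0, 60)    # first-dimension axis
t2 = np.linspace(0.0, 5.0, 50)     # second-dimension axis

def gauss(axis, center, width):
    return np.exp(-0.5 * ((axis - center) / width) ** 2)

def n_terms(A, tol=1e-8):
    """Number of bilinear terms needed: singular values above tol * largest."""
    s = np.linalg.svd(A, compute_uv=False)
    return int(np.sum(s > tol * s[0]))

x1, x2 = gauss(t1, 3.0, 0.5), gauss(t1, 7.0, 0.5)   # resolved in the 1st dimension

# Same retention in the 2nd dimension: x1*y + x2*y = (x1 + x2)*y.
y_same = gauss(t2, 2.5, 0.3)
A_same = np.outer(x1, y_same) + np.outer(x2, y_same)

# Different retention in the 2nd dimension.
A_diff = np.outer(x1, gauss(t2, 2.0, 0.3)) + np.outer(x2, gauss(t2, 3.0, 0.3))

print(n_terms(A_same))   # 1: two chemical compounds, one mathematical compound
print(n_terms(A_diff))   # 2
```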

Page 74: Data analysis in two-dimensional chromatography

Deconvolution: Matrix unfolding

[Figure: matrix unfolding of a chromatographic region. Axes: 1tR (region), 2tR (region), tR and m/z. The slices at m/z = 50, m/z = 51, …, m/z = 750 are shown stacked and then side by side. Diagram: matrix A = outer product of two vectors (x, y) + matrix.]

Could we avoid this unfolding and use true third-order data?

Page 75: Data analysis in two-dimensional chromatography

Deconvolution: Using third-order data

Trilinear model (for n compounds)

What are the implications of this?

Profiles are unique in the three orders of measurement

Solving the trilinear model

Via PARAFAC, the three profiles are unique

Via PARAFAC2, one of the conditions of uniqueness can be relaxed (normally the alignment between 1st and 2nd dimensions)

Other algorithms are possible (e.g., TLD), but rarely used…
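As an illustration, assume the usual PARAFAC form of the trilinear model, in which each data point is a sum over the n compounds of the product of three profile values (first-dimension profile, second-dimension profile, spectrum). The names `X`, `Y`, `Z`, the array sizes and the random profiles below are hypothetical. The sketch builds such a cube and checks that unfolding it gives the bilinear model again, with the two retention profiles merged into one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2                                   # number of compounds
X = rng.random((6, n))                  # first-dimension profiles  (1tR x n)
Y = rng.random((5, n))                  # second-dimension profiles (2tR x n)
Z = rng.random((4, n))                  # spectra                   (m/z x n)

# Trilinear model: cube[i, j, k] = sum over r of X[i, r] * Y[j, r] * Z[k, r]
cube = np.einsum("ir,jr,kr->ijk", X, Y, Z)

# Unfolding the two retention axes gives a bilinear model A = XY_merged @ Z.T,
# where each column of XY_merged is the unfolded outer product of the
# corresponding columns of X and Y.
A = cube.reshape(6 * 5, 4)
XY_merged = np.einsum("ir,jr->ijr", X, Y).reshape(6 * 5, n)
same = np.allclose(A, XY_merged @ Z.T)
```

The trilinear model is therefore the stricter of the two: it additionally requires every merged retention profile to factor into one first-dimension and one second-dimension profile, which is why it needs aligned modulations.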

Page 76: Data analysis in two-dimensional chromatography

Deconvolution methods: an example of PARAFAC

A.E. Sinha, J.L. Hope, B.J. Prazen, C.G. Fraga, E.J. Nilsson, R.E. Synovec, J. Chromatogr. A, 1056 (2004) 145-154

Page 77: Data analysis in two-dimensional chromatography

Deconvolution methods. Summary.

Using 1D (unfolded) data
• ALS or rank annihilation methods
• Does not need between-modulation alignment
• Similar to AMDIS

Using truly (folded) 2D data
• PARAFAC (needs between-modulation alignment)
• PARAFAC2 (more robust against between-modulation misalignment)

Main problem: determining the number of components behind the peak cluster
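A common starting point for estimating the number of components is to inspect the singular values of the data matrix. The sketch below uses simulated data (two Gaussian compounds, noise with a standard deviation of 0.01, and a cut-off of 5 % of the largest singular value); all of these are assumptions, and the cut-off in particular is the kind of manual choice that makes this step difficult with real data.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(3.8, 4.6, 81)            # retention axis
ch = np.linspace(200.0, 400.0, 101)      # detector channels

def gauss(axis, center, width):
    return np.exp(-0.5 * ((axis - center) / width) ** 2)

# Two overlapping compounds plus measurement noise.
X = np.column_stack([gauss(t, 4.1, 0.06), gauss(t, 4.3, 0.08)])
Y = np.vstack([gauss(ch, 240.0, 15.0), gauss(ch, 340.0, 20.0)])
A = X @ Y + 0.01 * rng.standard_normal((t.size, ch.size))

# Singular values relative to the largest one; components stand out from
# the noise floor as values well above the rest.
s = np.linalg.svd(A, compute_uv=False)
ratios = s / s[0]
n_estimated = int(np.sum(ratios > 0.05))   # threshold chosen by hand
print(np.round(ratios[:5], 3), n_estimated)
```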

Page 78: Data analysis in two-dimensional chromatography

Deconvolution methods. Discussion.

Trilinear model (for n compounds)

Bilinear model (for n compounds)

What are the implications of having high-resolution MS instead of nominal mass?

Page 79: Data analysis in two-dimensional chromatography

Third step: measure

Step 1: View
• Folding
• Phasing

Step 2: Pre-process
• Base-line correction
• Noise filtering
• Spike filtering
• Alignment
• … etc.

Step 3: Measure
• Peak detection / integration
• Calibration
• Deconvolution
• Pattern recognition
• Class separation

Page 80: Data analysis in two-dimensional chromatography

Pattern recognition as a variable reduction

From digital variables (chromatogram) to process variables

Step 1: Obtain chromatogram(s) (HPLC-MS/MS, GCxGC-MS, GC-MS). Digital variables: raw data.

Step 2: Peak detection (base-line correction, alignment, peak detection). Chemical variables: features.

Step 3: Pattern recognition. Process/biological variables: healthy/sick.

Step 4: Unknown sample characterized

Page 81: Data analysis in two-dimensional chromatography

Pattern recognition: Is peak detection necessary?

Chromatographic data → Information, using raw data directly

… for example: analysis of flame accelerants using GCxGC

[Figure: GCxGC chromatograms of Source A and Source B.]

Page 82: Data analysis in two-dimensional chromatography

Pattern recognition as a variable reduction

From digital variables (chromatogram) to process variables

Step 1: Obtain chromatogram(s) (HPLC-MS/MS, GCxGC-MS, GC-MS). Digital variables: raw data.

Step 2: Peak detection (base-line correction, alignment, peak detection). Chemical variables: features.

Step 3: Pattern recognition. Process/biological variables: healthy/sick.

Step 4: Unknown sample characterized

Page 83: Data analysis in two-dimensional chromatography

Pattern recognition in GCxGC. The options

Using raw data
• Alignment is critical
• Less chance to miss important compounds
• Normally done with the unfolded (raw) data, but not always (e.g. N-PLS)

Using peak tables
• Alignment not important, but peak tracking is essential (normally MS should be present)
• Chance to miss important compounds (those close to the S/N limit)
• (Truly) 1D method

Page 84: Data analysis in two-dimensional chromatography

Pattern recognition in GCxGC. Supervised methods.

In supervised pattern recognition of GCxGC, a tremendous reduction of variables is performed (from millions to a few tens/hundreds)

Any method will be prone to overfitting

Any variable pre-reduction (e.g. using Fisher ratios) should be done within a cross-validation loop

Otherwise the results will be optimistic (a method that seems to work, when in fact it only works for that data)
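For reference, a Fisher ratio can be computed per variable as sketched below. Several definitions are in use; this one (squared difference of the class means divided by the sum of the class variances, for two classes) is an assumption, and the six-sample data set is made up for the example.

```python
import numpy as np

def fisher_ratio(X, labels):
    """Two-class Fisher ratio for every column (variable) of X."""
    X0, X1 = X[labels == 0], X[labels == 1]
    between = (X1.mean(axis=0) - X0.mean(axis=0)) ** 2
    within = X1.var(axis=0, ddof=1) + X0.var(axis=0, ddof=1)
    return between / within

# Toy data: 3 samples per class, 2 variables.
# Variable 0 separates the classes; variable 1 does not.
X = np.array([[1.0, 5.0], [2.0, 5.1], [3.0, 4.9],
              [11.0, 5.0], [12.0, 5.2], [13.0, 4.8]])
labels = np.array([0, 0, 0, 1, 1, 1])

ratios = fisher_ratio(X, labels)
print(ratios)   # variable 0 gives 50, variable 1 gives essentially 0
```

Variables are then ranked by this ratio and only the highest-ranking ones are kept, which is the pre-reduction step that has to sit inside the cross-validation loop.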

Page 85: Data analysis in two-dimensional chromatography

Pattern recognition in GCxGC. Example of a wrong strategy

Objective: discovering metabolites responsible for cancer tumor

Step 1: Obtain GCxGC chromatograms for sick (50) and healthy (50)

Step 2: Variable selection: Fisher ratio on the raw data

Step 3: Supervised pattern recognition: PLS-DA to separate sick from healthy

Step 4: Consider the coefficients from PLS-DA as indicators of potential metabolites

Aren’t you overfitting? No, I’ve been cross-validating the PLS-DA

… but the variable pre-selection has been done with the full data set!!

Page 86: Data analysis in two-dimensional chromatography

Pattern recognition in GCxGC. Example of a wrong strategy

Objective: discovering metabolites responsible for cancer tumor

Step 1: Obtain GCxGC chromatograms for sick (50) and healthy (50)

Step 2: Variable selection: Fisher ratio on the raw data

Step 3: Supervised pattern recognition: PLS-DA to separate sick from healthy

Step 4: Consider the coefficients from PLS-DA as indicators of potential metabolites

Aren’t you overfitting? No, I’ve been cross-validating the variable selection and the PLS-DA

Correct

Page 87: Data analysis in two-dimensional chromatography

Pattern recognition in GCxGC. Example of a correct strategy

“model” = “variable pre-selection (Fisher ratio) + PLS-DA”

Step 1: Divide the original data set in k subsets (randomly)

Step 2: Withdraw subset 1, fit the model with the rest, and test the model with subset 1

Step 3: Do the same for the other k − 1 subsets

Step 4: Sum up the error of the model in all validation sets

[Figure: the original data set split into subsets 1, 2, …, k. In each round one subset is the validation set ("test the model here") and the remaining subsets are used for calibration.]
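The difference between the wrong and the correct strategy can be demonstrated on data that contain no class information at all. The sketch below is an illustration under stated assumptions: the data are pure random noise, the sizes (50 + 50 samples, 2000 variables, 20 kept, k = 5) are arbitrary, and a nearest-class-mean classifier stands in for PLS-DA to keep the example dependency-free.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_class, n_vars, n_keep, k = 50, 2000, 20, 5
X = rng.standard_normal((2 * n_per_class, n_vars))   # noise: no real class difference
y = np.repeat([0, 1], n_per_class)                   # "healthy" / "sick" labels

def fisher_ratio(X, y):
    a, b = X[y == 0], X[y == 1]
    return (a.mean(0) - b.mean(0)) ** 2 / (a.var(0, ddof=1) + b.var(0, ddof=1))

def select(X, y):
    """Indices of the n_keep variables with the highest Fisher ratio."""
    return np.argsort(fisher_ratio(X, y))[-n_keep:]

def nearest_mean_predict(X_cal, y_cal, X_val):
    """Stand-in classifier (not PLS-DA): assign to the closest class mean."""
    m0, m1 = X_cal[y_cal == 0].mean(0), X_cal[y_cal == 1].mean(0)
    d0 = ((X_val - m0) ** 2).sum(1)
    d1 = ((X_val - m1) ** 2).sum(1)
    return (d1 < d0).astype(int)

def cross_validate(X, y, select_inside):
    folds = np.array_split(rng.permutation(len(y)), k)
    preselected = select(X, y)          # wrong strategy: uses the full data set
    correct = 0
    for val in folds:
        cal = np.setdiff1d(np.arange(len(y)), val)
        keep = select(X[cal], y[cal]) if select_inside else preselected
        pred = nearest_mean_predict(X[cal][:, keep], y[cal], X[val][:, keep])
        correct += np.sum(pred == y[val])
    return correct / len(y)

acc_wrong = cross_validate(X, y, select_inside=False)
acc_right = cross_validate(X, y, select_inside=True)
print(f"selection outside the loop: {acc_wrong:.2f}")
print(f"selection inside the loop:  {acc_right:.2f}")
```

Since the data are noise, an honest estimate should be close to chance level. The run with the selection outside the loop reports a much higher accuracy, which is exactly the optimism the slides warn about.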

Page 88: Data analysis in two-dimensional chromatography

Multivariate methods: Conclusions

• Deconvolution: normally done with the unfolded data (fewer problems with between-modulation alignment)

• Deconvolution: establishing the number of compounds is a problem (normally done manually)

• Two ways for pattern recognition: with raw data (normally preferred) or with a peak table.

• Be careful with the validation of supervised pattern recognition. Variable pre-selection should be included in the validation loop.

Page 89: Data analysis in two-dimensional chromatography

Multivariate methods: Further debate…

This presentation has been uploaded to my blog: www.tecnometrix.com

Feel free to download it and generate debate if you wish!