Implementation of an Advanced Workflow to Enhance the ......Implementation of an Advanced Workflow...
Transcript of Implementation of an Advanced Workflow to Enhance the ......Implementation of an Advanced Workflow...
Implementation of an Advanced Workflow to Enhancethe Confidence in Compound Identification for Non-
Targeted GC-HR-MS Applications
40th ISCC Symposium, May 29 - June 04, 2016
Eric Dossin, Antonio Castellon, Pierrick Diana, Pavel Pospisil, Mark Bentley, Philippe Guy
Philip Morris International R&D
Reduced-Risk Products (“RRPs”) is the term the company uses to refer to products with the potentialto reduce individual risk and population harm in comparison to smoking combustible cigarettes.
PMI’s RRPs are in various stages of development and commercialization, and we are conductingextensive and rigorous scientific studies to determine whether we can support claims for suchproducts of reduced exposure to harmful and potentially harmful constituents in smoke, andultimately claims of reduced disease risk, when compared to smoking combustible cigarettes.
Before making any such claims, we will rigorously evaluate the full set of data from the relevantscientific studies to determine whether they substantiate reduced exposure or risk. Any such claimsmay also be subject to government review and approval, as is the case in the US today.
2
Outline
3
• Smoke as complex matrix
• GC-HR-MS instrumentation
• Our workflow for complex matrix characterization
• Modeling of Linear Retention Index
• Creation of library with predicted LRI
• Case study (2 examples)
• Conclusion and next steps
PMI Science
4
• PMI is working on various Reduced Risk Products deliveringnicotine containing aerosols
• In this context it is important to fully characterize thechemical composition of these aerosols, in particular aerosolsproduced by heating tobacco compared to smoke fromcigarettes
• For analytical method development purpose, we use areference cigarette (3R4F)
5
TPM - Total Particulate Matter Whole smoke
• Smoke sample generated on a linear smoking machine• The cambridge filter is combined with the impingers
Generation of complex matrix
• Reference cigarette: 3R4F*• Smoking regimen: Health Canada 2 sticks Puff volume: 55mL Puff duration: 2 sec Puff interval: 30 sec Puff count (butt length)
* University of Kentucky (Kentucky Tobacco R&D Center). http://www2.ca.uky.edu/refcig/
Filter is extracted
2 cold impingers in series
GVP - Gas Vapor Phase
Whole smoke
6
Goal is to screen the broadest range of smoke constituents.Non-Targeted Screening
Analytical Technique: GC-High Resolution (GC-HR-MS)
GC-HR-MS_1(7200A Agilent Q-TOF-MS)
Volatile and semi-volatilesLRI from 500 to 1,900
GC-HR-MS_2(7200B Agilent Q-TOF-MS)
Apolar and polarLRI from 1,000 to 3,000
• Stop when identified: DB‐624 Accurate Mass Library
(In house PMI Library ≈700 spectra) NIST 14 or Wiley
• Export .cef and review• Purchase of ref std if available
Automated Workflow for Identification in Complex Matrices
7
MassHunterData Acquisition
(EI mode)
DeconvolutionIdentification
MassHunterUnknown Analysis
8x10
0
0.4
0.8
1.2
1.6
+EI TIC Scan 160323_3R4F_004.D
1
Counts vs. Acquisition Time (min)6 8 10 12 14 16 18 20 22 24 26
Acquisition Time (min)6 8 10 12 14 16 18 20 22 24 26
3R4F_Dil 5 (160323_3R4F_004.D)
Cou
nts
8x10
0
0.4
0.8
1.2
1.6
18.6
245
13. 9
501
8.04
61
12. 9
472
23. 7
301
10. 1
395
21. 9
162
6.58
80
19.6
303
17. 0
571
11. 8
957
15. 9
458
24. 8
099
26. 1
756
14. 7
062
20. 6
514
• Stop when identified: DB‐624 Accurate Mass Library Compound library NIST 14 or Wiley
• Export .cef and review• Purchase of ref std if available
8
The present work focuses on building a relevant library containing LRI prediction.
• To predict LRI we used two softwares:1. RapidMiner-Dragon2. ACD/Labs ChromGenius
• To build a relevant LRI prediction system 552 molecules were used:1. Experimental LRI2. Quantitative Structure-Property Relationship (QSPR) and structure similarities
The experimental linear retention indices were randomly split as training (n=401) and test (n=151) sets.Validation set (n=23) confirmed the great performance of both prediction models.
Purpose
Assessment of the Prediction Models (Test Set and Validation Set)
9
LRI p
redi
cted
800
1,200
1,600
1,6001,200800
LRI experimental
RapidMiner values
Manuscript submitted peer‐reviewed journal
1,6001,200800
LRI experimental
800
1,200
1,600
ACD/ChromGenius values
LRI p
redi
cted
Acc
urac
y(A
CD
/Chr
omG
eniu
s vs
. exp
erim
enta
l)1,6001,200800
180%
160%
140%
120%
100%
80%
60%
40%
LRI experimental
Acc
urac
y(R
apid
Min
er v
s. e
xper
imen
tal)
180%
160%
140%
120%
100%
80%
60%
1,6001,200800
40%
LRI experimental
RapidMiner values
R2 : 0.949 R2 : 0.976
ACD/ChromGenius values
n=151 reference standards (test set)n=23 reference standards (validation set)
Based on structure similarity
Based on structural descriptors (QSPR)
10
UCSD is our in-house database that contains 11567 molecules:
7000 chemicals reported as present in tobacco plant and smoke1
3000 molecules associated with flavor properties2,3
1 The Chemical Components of Tobacco and Tobacco Smoke, A. Rodgman, T.A. Perfetti, 2013, 2nd Ed. CRC press.2 Leffingwell, J. C.; Young, H. J.; Bernasek, E. Tobacco flavoring for Smoking Products, R. J. Reynolds Tobacco Company, Winston‐Salem, 1972.3 EFSA flavoring substances database
Unique Compounds & Spectra Database
1013 accurate massspectra and LRI areregistered in UCSD
11
UCSD Web Interface
Creation of the UCSD Compound Library
12
UCSDDatabase (n=11,567)
RapidMiner software
ChromGeniussoftware
Model 2
Model 1
LRI mean value
(within LRI)
Commercial libraryNIST and Wiley (EI)
Extraction of MS spectra
UCSD.xml n = 3,646 molecules
MassHunter Library Editor
• Predicted LRI of flavors and tobacco-related compounds database when thereimplemented in our current identification workflow.• 6,053 molecules were predicted with LRI values between 500 and 1,900• 3,646 molecules have an available nominal EI Mass Spectra (NIST or Wiley)
13
UCSD Compound Library
LRI predictions were associated with the 3,646 nominal EI mass spectraextracted from commercial libraries.
LRI Prediction
• Stop when identified: DB‐624 Accurate Mass Library(In house PMI Library≈700 spectra)
NIST 14 or Wiley• Export .cef and review• Purchase of ref std if available
Different Output With the New Workflow
14
Data Acquisition
(+EI)
DeconvolutionIdentification
MassHunterUnknown Analysis
Software
Implementation of predicted LRI allows alternative proposals.
•
••••
••
•
•••
••
••••••••
••
• Stop when identified: DB‐624 Accurate Mass Library(In house PMI Library≈700 spectra) UCSD Library (LRI predicted) NIST 14 or Wiley
• Export .cef and review• Purchase of ref std if available
1st Example / Evaluation
15
14.50 15.00 15.50 16.00
16
.36
7
15
.14
5
15
. 94
6
15
. 43
3
14
.70
6
15
.58
8
15
.78
4
14
.94
6
14
.52
4
16
.08
61
6.2
22
16
.47
1
Acquisition Time (min)13.00 14.00 15.00 16.00 17.00 18.00
3R4F_Rep 4 (160323_3R4F_004_duplicate.D)
Co
un
ts
7x10
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
13
.95
0
12
.94
7
16
.91
6
13
.82
6
17
.86
5
12
.79
5
16
.36
7
15
.14
5
15
.94
6
12
.54
2
17
.57
7
15
.43
3
13
.13
0
17
.36
3
14
.70
6
13
.39
9
14
.20
9
18
.15
2
LRI1204.02
LRI1214.03
3-Methyl-1H-indene
Smoke sample
Confirmation
1st Example / Evaluation
16
14.50 15.00 15.50 16.00
16
.36
7
15
.14
5
15
. 94
6
15
. 43
3
14
.70
6
15
.58
8
15
.78
4
14
.94
6
14
.52
4
16
.08
61
6.2
22
16
.47
1
Acquisition Time (min)13.00 14.00 15.00 16.00 17.00 18.00
3R4F_Rep 4 (160323_3R4F_004_duplicate.D)
Co
un
ts
7x10
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
13
.95
0
12
.94
7
16
.91
6
13
.82
6
17
.86
5
12
.79
5
16
.36
7
15
.14
5
15
.94
6
12
.54
2
17
.57
7
15
.43
3
13
.13
0
17
.36
3
14
.70
6
13
.39
9
14
.20
9
18
.15
2
LRI Prediction
Alternative hits proposal
LRI1204.02
LRI1214.03
1st Example / Confirmation
17
Same RT Similar EI Mass Spectra
2-Methyl-1H-indeneCommercial
Smoke sample
14.50 15.00 15.50 16.00
16
.36
7
15
.14
5
15
.94
6
15
. 43
3
14
.70
6
15
.58
8
15
.78
4
14
.94
6
14
.52
4
16
.08
61
6.2
22
16
.47
1
Acquisition Time (min)13.00 14.00 15.00 16.00 17.00 18.00
3R4F_Rep 4 (160323_3R4F_004_duplicate.D)
Co
un
ts
7x10
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
5
13
.95
0
12
.94
7
16
.91
6
13
.82
6
17
.86
5
12
.79
5
16
.36
7
15
.14
5
15
.94
6
12
.54
2
17
.57
7
15
.43
3
13
.13
0
17
.36
3
14
.70
6
13
.39
9
14
.20
9
18
.15
2
3-Methyl-1H-indene is confirmed
2-Methyl-1H-indene is confirmed1214.31 0.282
LRI1204.02
LRI1214.03
81.32177‐47‐1
Evaluation of NIST Proposals
18
The combination of mass spectral similarity and LRI modeling enhances theconfidence in compound identification.
Alternative ranking using the LRI model!!!
19
2nd Example / Evaluation
Acquisition Time (min)11.00 11.50 12.00 12.50
Co
un
ts
6x10
0
1
2
12
.79
5
11
.64
8
11
.89
6
11
.43
4
12
.54
2
12
.2
25
12
.63
3
11
.12
9
12
.01
5
11
.76
35
12
.40
8
12
.15
1
11
.84
27
12
.07
5
12
.70
2
12
.29
1
11
.54
4
11
.01
8
11
.21
2
11
.33
7
12
.35
1
3R4F_Rep 4 (160323_3R4F_004_duplicate.D)
Benzene,1-ethyl-4-methyl
3R4F
As Match Factor > 90%, the confidence in identification is very high.
LRI992.25
Let’s go further to identify this co-eluting compound!!!
20
Again, the combination of mass spectral similarity and LRI modeling improvesthe identification process
Alternative ranking using the LRI model !!!
Already parts of our accurate mass library
Evaluation of NIST Proposals
21
Acquisition Time (min)11.00 11.50 12.00 12.50
Co
un
ts
6x10
0
1
2
12
.79
5
11
.64
8
11
.89
6
11
.43
4
12
.54
2
12
.2
25
12
.63
3
11
.12
9
12
.01
5
11
.76
35
12
.40
8
12
.15
1
11
.84
27
12
.07
5
12
.70
2
12
.29
1
11
.54
4
11
.01
8
11
.21
2
11
.33
7
12
.35
1
3R4F_Rep 4 (160323_3R4F_004_duplicate.D)
Benzene,1-ethyl-4-methyl
3R4F
Benzene,1-ethyl-3-methyl
Final Confirmation
Both ethyl-toluene isomers wereconfirmed in the smoke sampleexplaining the co-eluting peaks.
22
Conclusion & Next Steps
Implement the LRI prediction in the MassHunter Unknown Analysis software
Dossin, E. et al. Prediction Models of Retention Indices for Increased Confidencein Structural Elucidation during Complex Matrix Analysis: Application to GasChromatography Coupled with High Resolution Mass Spectrometry coming inAnal. Chem. in 2016
• Combination of state-of-the-art instrumentation with advanced chemoinformatictools (Predicted LRIs using QSPR methods) enhance the confidence level incompound identification
• This combined approach reduces the list of putative compounds to purchase forfinal confirmation, leading to
• Shortened time for compound identification• Reduced total cost for ordering chemicals• Reduced rate of false positive compound identification
• Automation of the workflow reduces the time for complex matrix characterization
Implementation of an Advanced Workflow to Enhance the Confidence in Compound Identification for Non-Targeted GC-HR-MS Applications
40th ISCC Symposium, May 29 - June 04, 2016
Eric Dossin, Antonio Castellon, Pierrick Diana, Pavel Pospisil, Mark Bentley, Philippe GuyPhilip Morris International R&D
THANK YOU [email protected]
24
GC-HR-MS_2(7200B Agilent Q-TOF-MS)
Goal is to screen the broadest range of smoke constituents
Apolar & PolarLRI from 1000 to 3000
Column HP-5MS (30m, Ø=0.25mm)5% phenyl‐arylene95% dimethyl‐polysiloxane
Carrier gas: He
Injection volume: 0.5 µL
Injection mode split 5:1
Sample spiked with several deuterated IS
Full scan acquisition mode (m/z 40‐620 amu)
BFTSA derivatization (labile hydrogens, alcohols, carboxylic acids, amides…)
GC-HR-MS_1(7200A Agilent Q-TOF-MS)
Volatile and semi‐volatilesLRI from 500 to 1900
Column DB-624 (30m, Ø=0.25mm) 6% cyanopropyl‐phenyl94% dimethyl‐polysiloxane
Carrier gas: He
Injection volume: 250 µL (headspace)1 µL (liquid)
Injection mode split 5:1
Full scan acquisition mode (m/z 20‐500 amu)
Analytical Technique: GC-High Resolution (GC-HR-MS)
Accuracy Data of Predicted vs Experimental LRI Values
25
Accu
racy
(AC
D/C
hrom
Gen
ius
vs. e
xper
imen
tal)
1,6001,200800
LRI experimental
180%
160%
140%
120%
100%
80%
60%
40%
Accu
racy
(Rap
id M
iner
vs.
exp
erim
enta
l)
180%
160%
140%
120%
100%
80%
60%
1,6001,200800
LRI experimental
40%
RapidMiner ACD/ChromGenius
n=151 reference standards (test set)n=23 reference standards (validation set)
C5H5NOC6H10
C2HBrClF3
C2H3N C12H20O7
C5H10OC6H14O C7H6O2
Manuscript in revision
r2 : 0.949r2 : 0.976
26
Mathematical Formula
Exp =
Exp+ ( Pred‐ Exp)2