Virscidian Poster Pittcon 2010


Summary of Study Results

Using the combined query approach, it is possible to construct extensive query conditions that evaluate many different aspects of the results. The implementation is easily extensible to further customized queries. Only a snapshot is reported here.

A quantitative assessment of large scale data processing for LC/UV & MS based compound QC 

Mark A. Bayliss, Joseph D. Simpkins,  Virscidian Inc., Raleigh NC 27601 

Abstract

In our experience, a significant number of situations force us to QC a much greater percentage of our LC/MS, UV and ELSD compound QC results than we feel should be necessary. This often means a 100% QC. The main reasons are:

• Target(s) found (green), but the purity or concentration of the sample is too low to be of practical use.
• Targets found but eluting in a region with a significant level of impurities, and therefore more challenging for auto-purification.
• Targets eluting within the solvent front or at the end of the chromatographic run, typically with poor integration.
• Targets poorly classified as found, maybe or not found due to challenges in signal processing, baselining, peak integration, MS peak classification, poor assignment of adducts and so on.

The major issue was that we were not sure how prevalent these issues were, or to what extent they were causing us to over-QC results. To better understand these effects, we undertook a relatively large-scale review of our results to determine where most of the problem situations occur and to remedy as many as possible. We were also looking to increase the trust we have in our processing and to be able to trap those situations where an analyst needs to make informed decisions and communicate them effectively. This presentation summarizes some of our findings and how we have attempted to address these needs.

Results

Conclusions

This study set out to quantitatively answer three basic questions:

1) Do we need additional tools to visualize hidden content within the results deck?
2) Do we need workflow-based visualization tools that can assist a scientist through the process of results review, reporting and publishing?
3) Is it possible to reliably reduce the need for 100% results review?

 

1) Results Review – Visualization of the hidden content

Results analysis using only the traditional traffic light approach answers a very limited question and can therefore under- or over-represent the true state of the results, as highlighted in this example study. For both effective targeted review and discovery of the true nature of the results, we propose that additional ways to visualize the results deck are necessary and advantageous.

We found that by focusing the review on one result aspect at a time, the overall quality of results review was much improved. It is important to state that this has not been quantified and may depend more on how the individual reviewer works. It would be interesting to study this more deeply.

 

2) Workflow-based results review

Effective review of results depends as much on the design of the tools for visualization as on the tools required to generate the results in the first place. Reviewing large quantities of results requires a complementary yet different implementation, one that can guide the user to problem areas in the results that are often hidden.

For this type of sample analysis, chemists are really interested in those samples that are “Found” AND “Pure” (AND, optionally, with a concentration above some minimum acceptable level). These are typical requirements for target substance activity screening, for example.

To answer the first two parts of the question, analysts need to understand which samples are “Found” AND “Pure (>80% Area by Detector X)” AND “not eluting in the solvent front” AND “not eluting at the end of the chromatography”, and so on.

By embedding specific information tags in the results and coupling them with a flexible graphical query system, an analyst can easily generate any number of combinations of test conditions that expose the hidden detail of the results. This makes for very effective targeted decision making.
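
The combined condition described above can be sketched as a simple predicate over tagged results. This is only an illustrative sketch, not the Analytical Studio implementation: the `SampleResult` fields, the function name, the 0.2 minute solvent front cutoff and the assumption that the run length equals the 2 minute gradient are all hypothetical. Only the 80% area threshold and the 0.2 minute end-of-run window come from the poster.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleResult:
    """Hypothetical per-sample result tags; field names are illustrative."""
    well: str
    status: str            # "Found", "Maybe" or "Not Found"
    uv_area_pct: float     # integrated %area of the target on the UV trace
    target_rt_min: float   # retention time of the target peak, in minutes


def passes_combined_query(s, run_length_min=2.0, solvent_front_end_min=0.2,
                          end_window_min=0.2, min_area_pct=80.0):
    """Found AND Pure AND NOT solvent front AND NOT end of chromatography."""
    found = s.status == "Found"
    pure = s.uv_area_pct >= min_area_pct
    # Assumed cutoff: anything eluting this early counts as solvent front.
    in_solvent_front = s.target_rt_min <= solvent_front_end_min
    # Within end_window_min of the (assumed) end of the run.
    at_end = s.target_rt_min >= run_length_min - end_window_min
    return found and pure and not in_solvent_front and not at_end


# Made-up example data, one sample per failure mode.
samples = [
    SampleResult("A1", "Found", 92.5, 1.10),   # meets every condition
    SampleResult("A2", "Found", 61.0, 1.05),   # found but not pure
    SampleResult("A3", "Found", 88.0, 0.10),   # elutes in the solvent front
    SampleResult("A4", "Found", 95.0, 1.95),   # elutes at the end of the run
    SampleResult("A5", "Maybe", 90.0, 1.00),   # not classified as found
]

hits = [s.well for s in samples if passes_combined_query(s)]
print(hits)  # ['A1']
```

Keeping each condition as a named boolean makes it straightforward to recombine them into other queries, such as "Found but not pure" for purification candidates.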

 

3) Reducing the need for 100% results review

This study found that, even under difficult analysis conditions, it is not necessary to perform a 100% data review; a more targeted, workflow-based approach can be used with good accuracy. While our goal is a 100% accurate evaluation, 97% accuracy under these conditions represents a very respectable and usable level of performance.

Final Comments

We have consistently found that the quality of processing and the quality of interface design should be considered equally important. We continue to improve both and aim to achieve 100% accuracy.

 

For Further Information

www.virscidian.com

Contact Joseph Simpkins at [email protected]
Contact Mark Bayliss at [email protected]

Virscidian Inc., 7330 Chapel Hill Road, Suite 201, Raleigh, NC 27607, USA
(919) 809-7651 or (919) 655 8050

Method

• A batch of 1015 random crude synthesis data sets was selected, representing what we can refer to as challenging samples. The data were originally acquired using an Agilent Technologies ion trap, with the following data streams: MS1 (+ve), UV310 and ELSD. A fast chromatographic gradient over 2 minutes was used to separate the substances.

Data Processing 

• The data were analysed using Virscidian’s Analytical Studio Professional-Compound QC software, beta version 1.2, with a new statistical data collection plug-in.
• The processing method was optimized on a small random batch of the samples.
  • Attention in optimization was given to: baselining, peak integration regions, solvent front exclusion from calculations, peak selection and rejection criteria, adduct classification criteria, peak demotion criteria and detector offset alignment.
• Batches of data were then selected from different non-consecutive days of sample acquisition to make up the 1015-sample test collection.
• All samples were processed using the same processing method, with no changes allowed.

Evaluation of results 

• Results were captured for the following conditions:
  • Number of samples with status = Found, before and after manual review
  • Number of wells with status = Maybe
  • Number of wells containing any maybe peak(s)
  • Number of wells with status = Multiple Target Substances, before and after review
  • Number of wells with status = Found AND Pure AND NOT eluting in the solvent front AND NOT eluting at the end of the chromatography
• Comparisons of the traditional “Traffic Light” approach were made against a potentially more appropriate and practical “Combined Query” approach.
• A review of the workflow optimization for large batch-wise results review is also reported.

Is a 100% Results QC Necessary to Ensure Accuracy?

As noted already, the sample datasets used in this study were chosen from synthetic crude mixtures. There also happened to be significant chromatographic column contamination, with a large baseline disturbance and some baseline resonance towards the end of the chromatography, as shown in the accompanying figure. We were able to deal with this effectively in almost all samples analyzed.

As part of this study, we wanted to determine whether a combined query system could be used as a workflow and whether the quality of the answers it provided could be relied on. Of the 1015 samples analyzed, 110 met the following compound query criteria before a complete results review: “Found” AND “Pure {Integrated %Area of UV310}” AND NOT “eluting within the solvent front” AND NOT “eluting at the end of the chromatography (within 0.2 minutes)”. After review, the same compound query returned 107 samples meeting the same criteria. This query is consistent with the requirements for substance screening for activity.

For the 3% of results that were updated during the review, the main reason for the change was that the target substance had been classified as found when it should have been not found {false positive}. This happened exclusively in the analysis of weak secondary adducts {e.g. [M+Na]+, [2M+H]+, [2M+Na]+ etc.}, whose peaks were often very spiky, with few data points across the peak. Our approach was to downgrade these to target not found.
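
As a quick check of the figures quoted above, the short calculation below reproduces the approximate 97% and 3% values from the before and after counts. It assumes the percentages were derived from the net change in the number of samples returned by the compound query; the poster does not state how they were calculated.

```python
# Counts reported for the compound query before and after full review.
before_review = 110
after_review = 107

# Net number of samples whose result changed on review (assumed basis
# for the quoted percentages).
changed = before_review - after_review

agreement_pct = 100.0 * after_review / before_review
changed_pct = 100.0 * changed / before_review

print(f"{agreement_pct:.1f}% retained, {changed_pct:.1f}% changed")
# 97.3% retained, 2.7% changed
```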

A Workflow Based Approach to Results Review

Review of samples within a workflow

Global visualization is at the plate level. Local visualization of individual samples is achieved with a results summary display and an automated auto-advance (Play button) that visualizes only what remains unfiltered. A reviewer can therefore quickly evaluate large quantities of results without needing to use the mouse or keyboard.
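
The auto-advance behaviour can be pictured as a loop that steps through the plate and stops only on wells that the active query leaves unfiltered. The sketch below is a hypothetical illustration of that idea; `auto_advance`, its parameters and the example well names are assumptions, not part of the software described here.

```python
import time


def auto_advance(wells, is_unfiltered, show, delay_s=0.0):
    """Visit only the wells that remain unfiltered, pausing between each."""
    visited = []
    for well in wells:
        if not is_unfiltered(well):
            continue  # skip wells hidden by the active query
        show(well)    # display the results summary for this well
        visited.append(well)
        time.sleep(delay_s)  # timed advance to the next position
    return visited


plate = ["A1", "A2", "A3", "A4", "A5", "A6"]
flagged = {"A2", "A5"}  # hypothetical wells matching the active query

visited = auto_advance(plate, lambda w: w in flagged, print)
```

In a real review session `delay_s` would be set to a few seconds so the reviewer has time to inspect each sample before the display moves on.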

Exposing Hidden Detail Using a “Combined Query” Approach

Using a query-based approach to results evaluation exposes the hidden detail in the samples and allows a reviewer to remain focused on a single review task. We found that this appears to improve the accuracy of the final results, though this has not been quantified at this time and was not the initial aim of the experiment.

Example query and visualization criteria (workflow figure labels; a timed auto-advance moves to the next unfiltered position at each step)

Step 1: Traditional traffic light approach
Step 2: Review as Area%
Step 3: Maybe peaks present
Step 4: Found, Area% ≥ 80%, no solvent front elution, does not elute at end of chromatography
Step 5: Requires purification
Step 6: Purification – slow gradient
Step …: Custom query 1
Step N: Custom query N

Using the Combined Query Approach to find all "Maybe" peaks {sample size 1015 samples}

Approach                   Maybe
Traffic Light Approach     47
Combined Query Approach    217

Traffic Light Approach     Before Review    After Review
Found                      690              715
Maybe                      47               0
Not Found                  278              300
Total                      1015             1015

Limited Visibility Data Review using the Traditional Traffic Light Approach

Combined Query Approach                                                             Before Review    After Review
“Found” AND “%Area UV310 >= 80%” AND NOT “Solvent Front” AND NOT “End of Chrom”     110              107
Any Maybe Peaks Present                                                             217              0
Not Found                                                                           278              300
Isobaric substances present                                                         140              163
Substance Elutes in Solvent Front                                                   6                6

Combined Query Approach that allows you to ask many varied questions of the results

Left – The Traditional Visualization of Found/Maybe/Not Found
Right – Visualization of all samples containing any maybe peaks
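
The gap between the 47 "Maybe" samples in the traffic light view and the 217 samples with any maybe peak arises because the first counts only the overall sample status, while the second looks at every peak classification within the sample. The sketch below illustrates the difference on made-up data; the dictionary layout and well names are hypothetical.

```python
# Hypothetical results: an overall status per well plus the
# classification of each individual peak found in that well.
samples = {
    "A1": {"status": "Found", "peaks": ["Found", "Maybe"]},
    "A2": {"status": "Maybe", "peaks": ["Maybe"]},
    "A3": {"status": "Found", "peaks": ["Found", "Found", "Maybe"]},
    "A4": {"status": "Found", "peaks": ["Found"]},
}

# Traffic light view: only the overall sample status is considered.
traffic_light_maybe = [w for w, s in samples.items() if s["status"] == "Maybe"]

# Combined query view: any peak classified as "Maybe" flags the sample.
any_maybe_peak = [w for w, s in samples.items() if "Maybe" in s["peaks"]]

print(traffic_light_maybe)  # ['A2']
print(any_maybe_peak)       # ['A1', 'A2', 'A3']
```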