Chemoface User Guide


  • Chemoface

    User guide

    Cleiton A. Nunes

    Universidade Federal de Lavras

    Lavras MG Brasil

    Feb/2013

  • 2

    Contents

    Installing 3
      Installation on Microsoft Windows 3
      Installation on Unix-like systems 3
    Updating 3
    Running 3
    Experimental Design 4
      Designing the experiment 4
      Inserting the responses 5
      Analyzing the design (not applicable to Mixture Design) 5
      Building model and response surfaces (not applicable to Fractional and Plackett-Burman designs) 5
      Additional features 6
    Pattern Recognition 7
      Inserting data set 7
      Data preprocessing 8
      Performing PCA 8
      Performing HCA 9
      Additional features 10
    Multivariate calibration 11
      Inserting data set 11
      Data preprocessing 11
      Selecting test set (optional) 12
      Performing leave-one-out cross-validation 12
      Performing the calibration (obtaining the model) 12
      Performing external validation using test set (optional) 14
      Predicting y for new samples 14
      Performing Discriminant Analysis (PLS-DA, PCR-DA, MLR-DA) 14
      Additional features 14
    Data Plot 15
      Inserting data set 15
      Data preprocessing 15
      Plotting data 15
      Additional features 16
    Data Organization 17
      Importing files (only .txt, .dat, .csv, .bmp) 17
      Exporting the dataset 19
      Additional features 19
    Figure handling 20
      Saving figure with high resolution 20
      Tools on the figure window 21
    References 22

  • 3

    Installing

    Chemoface [1] was developed in the MATLAB [2] environment. It is a stand-alone application and does not require a MATLAB license to run. Only the MATLAB Compiler Runtime (MCR) must be installed; it is freely available along with Chemoface. MCR is a set of shared libraries that provides complete support for all the features of MATLAB.

    Installation on Microsoft Windows

    Download and install MCR.

    Download and install Chemoface.

    Installation on Unix-like systems

    Install MCR and Chemoface using Wine (www.winehq.org).

    For better font rendering, install corefonts and enable fontsmooth.

    Installing corefonts using Winetricks:

    Select the default wineprefix > Install a font > check corefonts > OK

    Enabling fontsmooth using Winetricks:

    Select the default wineprefix > Change settings > check fontsmooth=rgb > OK

    Updating

    On the Chemoface home screen, access menu Help > Check for update. If there is a new version, download and install Chemoface as described above. When updating, MCR installation is not required.

    Running

    Access the modules from Chemoface home screen (Figure 1).

    A screen resolution lower than 1024x768 is not recommended.

    Chemoface uses dot (.) as decimal mark.

    Figure 1. Chemoface home screen
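
    Spreadsheets configured for some locales export numbers with a decimal comma, which conflicts with the dot decimal mark required above. A minimal sketch of a conversion step that could be run on copied text before pasting it into Chemoface is shown below. The function name to_dot_decimal is hypothetical, and the sketch assumes the values contain no thousands separators and that columns are separated by tabs or spaces, not commas.

```python
def to_dot_decimal(cells: str) -> str:
    """Replace decimal commas with dots in a block of copied cells.

    Assumption: commas appear only as decimal marks (no thousands
    separators, no comma-separated columns).
    """
    return cells.replace(",", ".")


# Hypothetical block copied from a spreadsheet (tab-separated columns)
raw = "1,5\t2,25\n3,0\t4,75"
converted = to_dot_decimal(raw)
print(converted)
```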

  • 4

    Experimental Design

    Figure 2. Designing experiments

    Designing the experiment

    Select the design type (1).

    Select the number of factors (2).

    Select the number of replications for the response (dependent variable) (3).

    If appropriate, add central point (4) and select the number of replications (5).

    If Fractional Factorial is used, select the size of the fraction (6).

    If Mixture Design is used, select the Simplex type (7).

    Set the label of the response and the labels and levels for the factors (8).

    Note: If Full, Fractional, Central Composite or Plackett-Burman is used, provide only the values for -1 and +1; additional levels (e.g. axial and central points) are calculated automatically. If Mixture Design is used, provide values ranging from 0 to 1; constraints can be applied by providing values >0 and/or <1. Coded and uncoded variables can be alternated. The model will be fitted according to the provided levels (coded or uncoded).
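
    The relation between coded levels (-1, +1) and the real values entered for them can be sketched as follows. This is an illustration of the usual two-level coding convention, not Chemoface's own code; the function names full_factorial and decode and the temperature example are hypothetical, and the run order produced here may differ from the order Chemoface displays.

```python
from itertools import product


def full_factorial(n_factors):
    """All combinations of coded levels -1 and +1 (2**n_factors runs)."""
    return [list(run) for run in product((-1, 1), repeat=n_factors)]


def decode(coded, low, high):
    """Convert a coded level to the real (uncoded) value.

    low and high are the real values entered for levels -1 and +1.
    """
    center = (high + low) / 2
    half_range = (high - low) / 2
    return center + coded * half_range


design = full_factorial(2)
# Hypothetical factor: temperature, 20 at level -1 and 60 at level +1
temperatures = [decode(run[0], 20, 60) for run in design]
central_point = decode(0, 20, 60)  # coded level 0 is the central point
```

    The same decode function also gives the real value of an axial point once its coded distance from the center is known.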


  • 5

    Inserting the responses

    Insert the responses (dependent variable) by typing them in the table (10) or pasting them from a spreadsheet using menu File > Paste responses.

    Analyzing the design (not applicable to Mixture Design)

    Select the significance level (11).

    Select how the error is calculated (12). If the option is selected, the error is calculated from the replications at the central points (if available). If it is unselected, the error is calculated from the replications in each assay (if available). All errors are calculated as pure error. If Plackett-Burman is used and the option is unselected, the error is calculated from dummy factors.

    Press Effects button (13) to obtain the effects table (Figure 3A).

    Press Paretos Chart button (14) to obtain a Pareto chart of the effects (Figure 3B).

    Figure 3. Table (A) and graph (B) for the effects
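
    For a two-level design, an effect is conventionally the mean response at level +1 minus the mean response at level -1. The sketch below illustrates that textbook calculation; it is not taken from Chemoface, and the function name main_effect, the design and the response values are hypothetical.

```python
def main_effect(levels, responses):
    """Mean response at +1 minus mean response at -1 for one column."""
    high = [y for x, y in zip(levels, responses) if x == 1]
    low = [y for x, y in zip(levels, responses) if x == -1]
    return sum(high) / len(high) - sum(low) / len(low)


# Hypothetical 2^2 design in coded levels and its responses
design = [[-1, -1], [1, -1], [-1, 1], [1, 1]]
y = [10.0, 14.0, 11.0, 17.0]

effect_a = main_effect([run[0] for run in design], y)
effect_b = main_effect([run[1] for run in design], y)
# The interaction column is the product of the two factor columns
effect_ab = main_effect([run[0] * run[1] for run in design], y)
```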

    Building model and response surfaces (not applicable to Fractional and Plackett-Burman designs)

    Select the model type (15).

    Select the significance level (16).

    Press Coefficients Stats button (17) to obtain a table with the model coefficients (Figure 4A).

    Choose if experimental points should be plotted (18) and if only significant coefficients (19) should be

    used for model fit.

    Select the graph type (20).

    If the design has more than two factors, select the factors to be plotted (21) and set the level of the factor(s) not plotted (Figure 4B).

    Press Surface Plot button (22) to obtain the graph and the statistics for the model (Figure 4C-D).

  • 6

    Figure 4. Tables and graph for the model (panels A to D)

    Additional features

    All obtained tables can be copied using menu Edit > Copy.

    A designed experiment can be saved using menu File > Save data. Afterwards, this file can be opened using menu File > Open data.

    Experimental assays for a Mixture design can be plotted using menu +plot > Experimental points.

  • 7

    Pattern Recognition

    Figure 5. Pattern recognition by PCA and HCA

    Inserting data set

    Type data in the table (1) or paste it from a spreadsheet (or from the Data Organization module) using menu File > Paste data. Only numeric data should be inserted. Place samples in rows and variables in columns. In the example, samples are propolis from different seasons and variables are specific m/z signals from their mass spectra [3].

    Note: see the Data Organization chapter for details on how to import data sets such as spectra, bitmaps, etc.

    If appropriate, insert sample labels using menu Edit > Sample labels or classes. Sample classes, if any, can be inserted in the same way (Figure 6A).

    If appropriate, insert variable labels using menu Edit > Variable labels. If the data set is spectroscopic, use > Spectroscopic data (Figure 6C) and insert the spectral range. For all other data sets, use > Generic data (Figure 6B).

    Note: if labels are not provided, samples and variables are numbered.


  • 8

    Figure 6. Inserting sample and variable labels (panels A to D)

    Data preprocessing

    Select the preprocessing method (2).

    Press Apply button (3).

    Note: Multiple preprocessing methods can be applied in sequence by selecting each one and pressing Apply.

    To restore the original data after preprocessing, select No preprocess and press Apply.
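
    This guide does not list the preprocessing methods offered in (2). As an illustration only, the sketch below shows two preprocessing steps that are common in chemometrics, mean centering and autoscaling, applied column by column. The function names are hypothetical, and autoscale assumes that no column has zero standard deviation.

```python
from statistics import mean, stdev


def mean_center(X):
    """Subtract the column mean from every value (samples in rows)."""
    means = [mean(col) for col in zip(*X)]
    return [[v - m for v, m in zip(row, means)] for row in X]


def autoscale(X):
    """Mean-center each column and divide it by its standard deviation."""
    means = [mean(col) for col in zip(*X)]
    sds = [stdev(col) for col in zip(*X)]  # sample standard deviation
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


# Hypothetical data set: 3 samples, 2 variables
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
centered = mean_center(X)
scaled = autoscale(X)
```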

    Performing PCA

    Press Run PCA button (4).

    Select the PCs to be plotted on the X and Y axes (5). For a 3D plot (Figure 7D), also select the PC to be plotted on the Z axis; for a 2D plot, set it to none.

    Note: If sample classes are provided, they can be colored on the scores plot: select Colored by sample classes (6). Optionally, sample labels can also be shown: provide the labels when prompted (Figure 6D).

    For scores graph, press Scores plot button (7) (Figure 7A).

    For loadings graph (Figure 7B), select the plot type: points or vectors (8).

    Press Loadings plot button (9).

    For a scores and loadings graph (Figure 7C), press Biplot button (10).
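
    To make the terms scores, loadings and explained variance concrete, the sketch below extracts the first principal component of a small data set by power iteration on the covariance matrix. It is a didactic stand-in, not the algorithm Chemoface uses (which this guide does not describe). The function name is hypothetical; the sketch assumes mean-centered PCA, data with non-zero variance, and a starting vector that is not orthogonal to the first component.

```python
import math


def first_principal_component(X, iterations=200):
    """Return (loadings, scores, explained_fraction) for PC1 of X.

    X is a list of rows (samples) with variables in columns.
    """
    n, p = len(X), len(X[0])
    means = [sum(col) / n for col in zip(*X)]
    Xc = [[v - m for v, m in zip(row, means)] for row in X]  # mean centering
    # Covariance matrix of the variables (p x p)
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(p)]
           for a in range(p)]
    w = [1.0] * p  # starting vector for the power iteration
    for _ in range(iterations):
        w = [sum(cov[a][b] * w[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    # Scores are the projections of the centered samples on the loadings
    scores = [sum(v * l for v, l in zip(row, w)) for row in Xc]
    eigenvalue = sum(w[a] * cov[a][b] * w[b]
                     for a in range(p) for b in range(p))
    explained = eigenvalue / sum(cov[a][a] for a in range(p))
    return w, scores, explained


# Hypothetical data: the second variable is exactly twice the first
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
loadings, scores, explained = first_principal_component(X)
```

    In this example all the variance lies along one direction, so the first component explains essentially 100% of it and the loadings are proportional to (1, 2).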

  • 9

    Figure 7. PCA plots (panels A to D)

    Performing HCA

    Select the distance type (11).

    Select the linkage type (12).

    Note: Optionally, PCA can be applied to the data before HCA: select Apply PCA to data (13) and select the number of PCs to be used when prompted.

    Press Dendrogram button (14) to run HCA and obtain the dendrogram (Figure 8B).

    Insert the threshold to color the similar groups (Figure 8A). If not applicable, press Cancel.
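
    The roles of the distance type and the linkage type can be illustrated with a small agglomerative clustering sketch. It uses Euclidean distance and single linkage purely as examples; this guide does not list the options available in (11) and (12), and the function names below are hypothetical. Each merge records the two clusters joined and the distance at which they were joined, which is the information a dendrogram displays.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(X):
    """Agglomerative clustering; returns a list of (cluster, cluster, distance)."""
    clusters = [[i] for i in range(len(X))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Single linkage: distance between the closest members
            d = min(euclidean(X[i], X[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges


# Hypothetical one-variable data set with two obvious groups
X = [[0.0], [1.0], [10.0], [11.0]]
merges = single_linkage(X)
```

    A threshold placed between the small merge distances and the last, large one would separate the two groups.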

  • 10

    Figure 8. HCA plot (panels A and B)

    Additional features

    Additional columns and rows can be inserted on the table using + buttons (15).

    Columns and rows can be deleted using menu Edit > Delete.

    Data set can be transposed using menu Edit > Transpose.

    Data set can be saved using menu File > Save data. Afterwards, this file can be opened using menu File > Open data.

    Numeric values used to build the graphs can be copied by pressing Copy plot data button (16).

    PC variance table can be copied by pressing Copy table data button (17).

  • 11

    Multivariate calibration

    Figure 9. Performing multivariate calibration

    Inserting data set

    To insert the independent variables (matrix X), paste them from a spreadsheet (or from the Data Organization module) using menu File > Paste X. Place samples in rows and variables in columns. In the example, the matrix X is formed by transmittances from 4000 to 600 cm⁻¹ for coffee samples (1) [4].

    To insert the dependent variable (vector y), paste it from a spreadsheet using menu File > Paste Y. In the example, y is the content of husk (%) in coffee samples (2).

    Note: It is possible to insert multiple y and build the respective models; for this, provide Y as a matrix with samples in rows and each y in a column.

    Optionally, sample and variable labels for X (3), as well as the property label for y (4), can be inserted using menu Edit, as shown in Figure 6. If labels are not provided, these are numbered.

    Data preprocessing

    Select the preprocessing method (5).

    Press Apply button (6).

    Note: Multiple preprocessing methods (7) can be applied in sequence by selecting each one and pressing Apply.

    To restore the original data after preprocessing, select No preprocess and press Apply.


  • 12

    Selecting test set (optional)

    Samples can be selected and used as a test set (external validation); for this, use menu Edit > Select sample for test set and select the method (Figure 10A). The test set samples can be entered manually (Figure 10B) or selected automatically by the Kennard-Stone algorithm [5] (Figure 10C). Test samples are marked with a t in the tables (1, 2).

    Figure 10. Selecting samples for test set (panels A to C)
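
    The selection rule of the Kennard-Stone algorithm can be sketched as follows: start with the two most distant samples, then repeatedly add the sample whose distance to its nearest already-selected sample is largest. This is a generic sketch of the algorithm with Euclidean distance, not Chemoface's implementation; the function names are hypothetical, and how Chemoface assigns the selected samples to the test or calibration set is not described in this guide.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kennard_stone(X, k):
    """Return the indices of k samples chosen by the Kennard-Stone rule.

    Assumes 2 <= k <= len(X).
    """
    n = len(X)
    dist = [[euclidean(X[i], X[j]) for j in range(n)] for i in range(n)]
    # Start with the two samples that are farthest apart
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    while len(selected) < k:
        remaining = [i for i in range(n) if i not in selected]
        # Pick the sample farthest from its nearest selected neighbour
        nxt = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(nxt)
    return selected


# Hypothetical one-variable data set
X = [[0.0], [1.0], [2.0], [10.0]]
chosen = kennard_stone(X, 3)
```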

    Performing leave-one-out cross-validation

    Select the regression method (PLS, PCR or MLR) (8).

    Select the maximum number of latent variables (for PLS) or principal components (for PCR) to be tested

    on the cross-validation (9).

    Press Run CV button (10).

    To build the graphs (such as RMSECV, R, and others) (Figure 11A), select the data to be plotted in X

    and Y axis (11) and press Plot button (12).
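
    Leave-one-out cross-validation removes one sample at a time, fits the model to the remaining samples, and predicts the removed one; RMSECV summarizes those prediction errors. The sketch below shows the procedure with a one-variable least-squares line standing in for PLS, PCR or MLR, so it illustrates the validation loop only. The function names and the data are hypothetical.

```python
import math


def fit_line(x, y):
    """Least-squares slope and intercept for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def loo_rmsecv(x, y):
    """Root mean square error of leave-one-out cross-validation."""
    errors = []
    for i in range(len(x)):
        # Fit without sample i, then predict sample i
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        errors.append(y[i] - (slope * x[i] + intercept))
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical data: an exact line and a noisy version of it
x = [1.0, 2.0, 3.0, 4.0]
rmsecv_exact = loo_rmsecv(x, [3.0, 5.0, 7.0, 9.0])
rmsecv_noisy = loo_rmsecv(x, [3.1, 4.8, 7.3, 8.9])
```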

    Performing the calibration (obtaining the model)

    If PLS or PCR is used, select the adequate number of LV or PC to be used in the model (13).

    Press Fit button (14).

    To build the graphs (such as measured x predicted and others), select the data to be plotted in X and Y

    axis (15) and press Plot button (16).

    To obtain the performance parameters of the model (such as RMSEC, R and others) (Figure 11B),

    select Model stats in X axis or Y axis (15) and press Plot button (16).

    Note: For details on the statistical parameters for model performance, see references [6, 7].
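
    For orientation, commonly used definitions of the root mean square error and of the correlation coefficient between measured values y and predicted values ŷ are given below. RMSEC, RMSECV and the prediction error differ only in which predictions are used (calibration fit, cross-validation, or test set). These are general textbook forms; the exact expressions used by Chemoface, including the denominator of RMSEC, are not stated in this guide and should be checked in the cited references.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}},
\qquad
R = \frac{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)\left(\hat{y}_i - \bar{\hat{y}}\right)}
         {\sqrt{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}\,
          \sqrt{\sum_{i=1}^{n}\left(\hat{y}_i - \bar{\hat{y}}\right)^{2}}}
```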

  • 13

    Figure 11. Plots for multivariate calibration (panels A to D)

  • 14

    Performing external validation using test set (optional)

    If the test set samples were previously provided (entered manually or selected by the Kennard-Stone algorithm [5]), press Predict button (17).

    If the test set was not provided as above, insert the X and y (or Y if multiple y) for external validation using menu File > Paste new X for prediction and menu File > Paste new Y for prediction test, respectively.

    To build the graphs (such as measured x predicted and others), select the data to be plotted in X and Y

    axis (18) and press Plot button (19).

    To obtain the performance parameters of the prediction, select Pred stats in X axis or Y axis (18) and

    press Plot button (19).

    Predicting y for new samples

    Insert the new X for which y is to be predicted using menu File > Paste new X for prediction.

    Press Predict button (17).

    View the predicted values by selecting Samples in X axis and Predicted in Y axis (18) and pressing Plot button (19).

    Performing Discriminant Analysis (PLS-DA, PCR-DA, MLR-DA)

    If discriminant analysis is used, provide the classes as numbers in dependent variables (Y data, 2);

    e.g.: Class A = 1, Class B = 2, Class C = 3, etc.

    Enable Classes (For DA) (20).

    Perform leave-one-out cross validation as previously described.

    Perform calibration as previously described.

    Perform external validation using test set as previously described (optional).

    Predict the classes for new samples as described in Predicting y for new samples.
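
    Coding the classes as numbers, as required above, can be done with a small helper before pasting the Y data. The sketch below maps class names to 1, 2, 3, ... and, as a separate and purely illustrative step, assigns a continuous prediction to the class with the closest code. The function names are hypothetical, and the nearest-code rule is an assumption: this guide does not describe how Chemoface converts predictions into classes.

```python
def encode_classes(labels):
    """Map class names to integers 1, 2, 3, ... in order of first appearance."""
    codes = {}
    for label in labels:
        codes.setdefault(label, len(codes) + 1)
    return [codes[label] for label in labels], codes


def nearest_class(predicted, codes):
    """Assign a continuous prediction to the class with the closest code.

    Assumption for illustration only; not a documented Chemoface rule.
    """
    return min(codes, key=lambda name: abs(codes[name] - predicted))


# Hypothetical sample classes
y, codes = encode_classes(["A", "B", "A", "C"])
assigned = nearest_class(2.2, codes)
```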

    Additional features

    Outlier samples can be checked (Figure 11C). After performing the calibration (14), select Sample leverages in X axis and Studentized residuals in Y axis (15) and press Plot button (16).

    If multiple y is used, a property (22) should be selected before any plot.

    Data plotted on the graphs can be copied and used outside Chemoface; use Copy plot data buttons (23).

    Additional columns and rows can be inserted on the tables using + buttons (24).

    Columns and rows can be deleted using menu Edit > Delete.

    Data set can be transposed using menu Edit > Transpose.

    Measured x Predicted values for the cross-validation, calibration and test sets can be plotted in a single graph (Figure 11D) using menu Multiplot > Measured x Predicted. Scores for PLS-DA or PCR-DA can be plotted similarly.

    Data set can be saved using menu File > Save data. Afterwards, this file can be opened using menu File > Open data.

    Obtained model can be saved using menu File > Save data. Afterwards, this file can be opened using menu File > Open model and the calibrated model can be used for further predictions.
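
    The outlier check listed among the features above plots sample leverages against Studentized residuals. A leverage measures how far a sample lies from the center of the calibration data. The sketch below computes leverages for the simplest case, a straight-line model with one predictor and an intercept; for PLS or PCR the leverages are obtained from the scores instead, and the exact formula used by Chemoface is not given in this guide. The function name and the data are hypothetical.

```python
def leverages(x):
    """Leverage of each sample for a one-predictor model with intercept."""
    n = len(x)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    # 1/n comes from the intercept, the second term from the slope
    return [1 / n + (xi - mx) ** 2 / sxx for xi in x]


# Hypothetical predictor values; the last sample lies far from the others
x = [1.0, 2.0, 3.0, 4.0, 10.0]
h = leverages(x)
most_influential = h.index(max(h))
```

    The leverages of this model always sum to 2, the number of fitted parameters, so a single sample holding a large share of that total deserves a closer look.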

  • 15

    Data Plot

    Figure 12. Plotting data

    Inserting data set

    Type data in the table (1) or paste it from a spreadsheet (or from the Data Organization module) using menu File > Paste data. Only numeric data should be inserted. Place samples in rows and variables in columns. In the example, the data set is formed by transmittances from 4000 to 600 cm⁻¹ (1) [4].

    Note: see the Data Organization chapter for details on how to import data sets such as spectra, bitmaps, etc.

    Optionally, sample and variable labels (2) can be inserted using menu Edit, as shown in Figure 6. If labels are not provided, these are numbered.

    Data preprocessing

    Select the preprocessing method (3).

    Press Apply button (4).

    Note: Multiple preprocessing methods (5) can be applied in sequence by selecting each one and pressing Apply.

    To restore the original data after preprocessing, select No preprocess and press Apply.

    Plotting data

    Insert the X axis label (6).

    Insert the Y axis label (7).

    If appropriate, insert the Y axis label for the preprocessed graph (8).


  • 16

    If appropriate, set reverse X axis (9).

    To plot both non-preprocessed and preprocessed data (Figure 13), check Plot data without preprocessing (10).

    Press Plot data button (11).

    Figure 13. Non-preprocessed and preprocessed spectra

    Additional features

    Additional columns and rows can be inserted on the tables using + buttons (12).

    Columns and rows can be deleted using menu Edit > Delete.

    Data set can be transposed using menu Edit > Transpose.

    Data set can be saved using menu File > Save data. Afterwards, this file can be opened using menu File > Open data.

  • 17

    Data Organization

    The Data Organization module imports numerical data from .txt, .dat and .csv files, and images from .bmp files. Multiple files, such as spectra files, can be imported simultaneously. To import an image (.bmp), it is first converted into a three-way array containing the RGB values for each pixel. The R, G and B values are then summed for each pixel, resulting in a two-way array (matrix). Finally, this matrix is unfolded to generate a vector. This is particularly useful for importing molecular figures to be used as descriptors in MIA-QSAR models [8]. In the example, the files are MIR spectra of adulterated coffee [4].
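
    The image import steps described above (RGB array, per-pixel sum, unfolding) can be sketched with plain nested lists. Reading the .bmp file itself is not shown; the image is assumed to be already available as rows of (R, G, B) tuples. The function name is hypothetical, and unfolding row by row is an assumption, since this guide does not state the order Chemoface uses.

```python
def image_to_vector(rgb):
    """Sum R, G and B for each pixel, then unfold the matrix into a vector.

    rgb is a list of pixel rows; each pixel is an (R, G, B) tuple.
    """
    # Three-way array (rows x columns x 3) -> two-way array (rows x columns)
    summed = [[r + g + b for (r, g, b) in row] for row in rgb]
    # Two-way array -> vector, unfolded row by row (assumed order)
    return [value for row in summed for value in row]


# Hypothetical 2 x 2 image
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(10, 20, 30), (1, 2, 3)],
]
vector = image_to_vector(image)
```

    Each imported image then contributes one such vector, that is, one row of the data set.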

    Figure 14. Importing data set

    Importing files (only .txt, .dat, .csv, .bmp)

    Menu File > Import data.

    Select the file type (only .txt, .dat, .csv or .bmp) (Figure 15A).

    Select the files to be imported (use Shift or Ctrl keys to select multiple files) (Figure 15B).

    Press Open.

    Note:

    Each imported file becomes a row, and its file name is shown as the sample label (1).

    If the data set is very large, Chemoface asks whether to show the data in the table (Figure 15C). If No is selected, the files are imported and a message is shown (2) (Figure 15D).


  • 18

    Figure 15. Selecting files to be imported (panels A to D)

  • 19

    Exporting the dataset

    To use the obtained data set in Chemoface, use menu File > Copy data set. Then paste it into the tables of the chosen module using menu File > Paste, as previously described in Inserting data set. The data set can also be pasted into other software that supports spreadsheets.

    The data set can also be saved as .txt (ASCII with space as column separator) using menu File > Save data. This saved file cannot be opened in the Chemoface modules using menu File > Open, but the numerical data can be copied and pasted into the Chemoface modules using menu File > Paste, as previously described.

    Additional features

    Specific, odd, even or null standard deviation columns and rows can be deleted using menu Edit > Delete.

    Data set can be transposed using menu Edit > Transpose.

    Columns and rows can be numbered using menu Edit > Numbered.

  • 20

    Figure handling

    Saving figure with high resolution

    Access menu File > Export setup in the figure window.

    In the setup window (Figure 16A), access Rendering (1).

    Set the color mode to be saved, e.g., RGB color, gray scale, etc. (2).

    Set the resolution to be saved, e.g., 600 dpi (3).

    Press Export button (4).

    Set the file name (5) (Figure 16B).

    Select the file type, e.g., .tif (6).

    Press Save button (7).

    Note: If .fig (MATLAB figure) is selected as the file type, the file can be opened only using menu File > Open Figure on the Chemoface home screen or in MATLAB. Figures (.fig) can be edited (e.g., font size, font color) only in MATLAB.


  • 21

    Tools on the figure window

    Save figure (low resolution)

    Print figure

    Zoom

    Move graph

    Rotate 3D graph

    Read data: click over the graph to read values

    Enable/disable color bar

    Enable/disable legend*

    *To move the legend, click it with the left mouse button and drag.

  • 22

    References

    1. NUNES, C. A.; FREITAS, M. P.; PINHEIRO, A. C. M.; BASTOS, S. C. Chemoface: a novel free user-friendly interface for Chemometrics. Journal of the Brazilian Chemical Society, 23, 2003-2010, 2012.

    2. MATLAB. The MathWorks, Inc.: Natick, MA, USA.

    3. NUNES, C. A.; GUERREIRO, M. C. Characterization of Brazilian green propolis throughout the seasons by Headspace-GC/MS and ESI-MS. Journal of the Science of Food and Agriculture, 92, 433-438, 2012.

    4. TAVARES, K. M.; PEREIRA, R. G. F.; PINHEIRO, A. C. M.; NUNES, C. A.; GUERREIRO, M. C.; RODARTE, M. P. Mid-infrared spectroscopy and sensory analysis applied to detection of adulteration in roasted coffee by addition of coffee husks. Química Nova, 35, 1164-1168, 2012.

    5. DE GROOT, P. J.; POSTMA, G. J.; MELSSEN, W. J.; BUYDENS, L. M. C. Selecting a representative training set for the classification of demolition waste using remote NIR sensing. Analytica Chimica Acta, 392, 67-75, 1999.

    6. KIRALJ, R.; FERREIRA, M. M. C. Basic validation procedures for regression models in QSAR and QSPR studies: theory and application. Journal of the Brazilian Chemical Society, 20, 770-787, 2009.

    7. ROY, P. P.; PAUL, S.; MITRA, I.; ROY, K. On Two Novel Parameters for Validation of Predictive QSAR Models. Molecules, 14, 1660-1701, 2009.

    8. NUNES, C. A.; FREITAS, M. P. Introducing new dimensions in MIA-QSAR: a case for chemokine receptor inhibitors. European Journal of Medicinal Chemistry, 62, 297-300, 2013.