Chemoface User Guide

Chemoface

User guide

Cleiton A. Nunes

Universidade Federal de Lavras

Lavras MG Brasil

Feb/2013


Contents

Installing
  Installation on Microsoft Windows
  Installation on Unix-like systems
  Updating
  Running
Experimental Design
  Designing the experiment
  Inserting the responses
  Analyzing the design (not applicable to Mixture Design)
  Building model and response surfaces (not applicable to Fractional and Plackett-Burman designs)
  Additional features
Pattern Recognition
  Inserting data set
  Data preprocessing
  Performing PCA
  Performing HCA
  Additional features
Multivariate calibration
  Inserting data set
  Data preprocessing
  Selecting test set (optional)
  Performing leave-one-out cross-validation
  Performing the calibration (obtaining the model)
  Performing external validation using test set (optional)
  Predicting y for new samples
  Performing Discriminant Analysis (PLS-DA, PCR-DA, MLR-DA)
  Additional features
Data Plot
  Inserting data set
  Data preprocessing
  Plotting data
  Additional features
Data Organization
  Importing files (only .txt, .dat, .csv, .bmp)
  Exporting the dataset
  Additional features
Figure handling
  Saving figure with high resolution
  Tools on the figure window
References


Installing

Chemoface [1] was developed in the MATLAB [2] environment. It is a stand-alone application and does not require a MATLAB license to run. Only the MATLAB Compiler Runtime (MCR) must be installed; it is freely available along with Chemoface. MCR is a set of shared libraries that provides complete support for all the features of MATLAB.

Installation on Microsoft Windows

Download and install MCR.

Download and install Chemoface.

Installation on Unix-like systems

Install MCR and Chemoface using Wine (www.winehq.org).

For better viewing, install the corefonts and enable fontsmooth.

Installing corefonts using Winetricks:

Select the default wineprefix > Install a font > check corefonts > OK

Enabling fontsmooth using Winetricks:

Select the default wineprefix > Change settings > check fontsmooth=rgb > OK

Updating

On the Chemoface home screen, access menu Help > Check for update. If there is a new version, download and install Chemoface as described above. MCR does not need to be reinstalled for an update.

Running

Access the modules from Chemoface home screen (Figure 1).

A screen resolution lower than 1024x768 is not recommended.

Chemoface uses dot (.) as decimal mark.

Figure 1. Chemoface home screen
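
Because Chemoface expects a dot as the decimal mark, data copied from a spreadsheet that uses decimal commas has to be converted before pasting. The following is a minimal Python sketch of such a conversion; the function name commas_to_dots is hypothetical, and the sketch assumes tab-separated cells with no thousands separators.

```python
def commas_to_dots(block):
    """Replace decimal commas with dots in tab-separated text.

    Assumes columns are separated by tabs (not commas) and that the
    numbers contain no thousands separators.
    """
    return "\n".join(
        "\t".join(cell.strip().replace(",", ".") for cell in line.split("\t"))
        for line in block.splitlines()
    )


# Made-up values, for illustration only.
converted = commas_to_dots("1,5\t2,25\n3\t4,0")
print(converted)
```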


Experimental Design

Figure 2. Designing experiments

Designing the experiment

Select the design type (1).

Select the number of factors (2).

Select the number of replications for the response (dependent variable) (3).

If appropriate, add central point (4) and select the number of replications (5).

If Fractional Factorial is used, select the size of the fraction (6).

If Mixture Design is used, select the Simplex type (7).

Set the label of the response and the labels and levels for the factors (8).

Note: If Full, Fractional, Central Composite or Plackett-Burman is used, provide values only for the -1 and +1 levels; additional levels (e.g. axial and central points) are calculated automatically. If Mixture Design is used, provide values ranging from 0 to 1; constraints can be applied by providing values >0 and/or <1. Use Alternate coded-uncoded variables to switch between coded and uncoded levels. The model will be fitted according to the provided levels (coded or uncoded).
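
The relation between coded levels and real factor values can be sketched in Python as follows. This is only an illustration of the usual linear coding, not Chemoface's own code; the function names and the example factors are hypothetical.

```python
from itertools import product


def decode(coded, low, high):
    """Convert a coded level to a real value.

    low and high are the values entered for the coded levels -1 and +1.
    A coded 0 gives the central point; values beyond +/-1 give axial points.
    """
    center = (low + high) / 2
    half_range = (high - low) / 2
    return center + coded * half_range


def full_factorial(levels):
    """Build a two-level full factorial design.

    levels maps each factor label to its (low, high) pair.
    Returns a list of (coded_run, real_run) tuples.
    """
    names = list(levels)
    runs = []
    for coded in product((-1, 1), repeat=len(names)):
        real = tuple(decode(c, *levels[n]) for n, c in zip(names, coded))
        runs.append((coded, real))
    return runs


# Hypothetical factors and levels, for illustration only.
design = full_factorial({"temperature": (30, 50), "pH": (5.0, 7.0)})
for coded, real in design:
    print(coded, real)
```

With two factors the sketch lists four assays; decode(0, 30, 50) gives the central point of the first factor.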


Inserting the responses

Insert the responses (dependent variable) by typing them on the table (10) or by pasting them from a spreadsheet using menu File > Paste responses.

Analyzing the design (not applicable to Mixture Design)

Select the significance level (11).

Select how the error is calculated (12). If the option is selected, the error is calculated from the replicates at the central point (if available). If it is unselected, the error is calculated from the replicates of each assay (if available). All errors are calculated as pure error. If Plackett-Burman is used and the option is unselected, the error is calculated from dummy factors.

Press Effects button (13) to obtain the effects table (Figure 3A).

Press Paretos Chart button (14) to obtain a Pareto chart of the effects (Figure 3B).

Figure 3. Table (A) and graph (B) for the effects
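
For a two-level design, a main effect is conventionally the difference between the mean response at the +1 level and the mean response at the -1 level. The Python sketch below applies this textbook definition; it is not taken from Chemoface, and the function name and response values are made up for illustration.

```python
def main_effects(design, responses):
    """Main effect of each factor: mean(y at +1) minus mean(y at -1)."""
    n_factors = len(design[0])
    effects = []
    for j in range(n_factors):
        high = [y for run, y in zip(design, responses) if run[j] == 1]
        low = [y for run, y in zip(design, responses) if run[j] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects


# Coded 2^2 full factorial design with made-up responses.
design = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
responses = [10.0, 14.0, 11.0, 15.0]
effects = main_effects(design, responses)
print(effects)
```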

Building model and response surfaces (not applicable to Fractional and Plackett-Burman designs)

Select the model type (15).

Select the significance level (16).

Press Coefficients Stats button (17) to obtain a table with the model coefficients (Figure 4A).

Choose if experimental points should be plotted (18) and if only significant coefficients (19) should be

used for model fit.

Select the graph type (20).

If the design has more than two factors, select the factors to be plotted (21) and set the level of the factor(s) not plotted (Figure 4B).

Press Surface Plot button (22) to obtain the graph and the statistics for the model (Figure 4C-D).

Figure 4. Tables and graph for the model (panels A-D)

Additional features

All obtained tables can be copied using menu Edit > Copy.

A designed experiment can be saved using menu File > Save data. Afterwards, this file can be opened using menu File > Open data.

Experimental assays for a Mixture design can be plotted using menu +plot > Experimental points.


Pattern Recognition

Figure 5. Pattern recognition by PCA and HCA

Inserting data set

Type the data on the table (1) or paste it from a spreadsheet (or from the Data Organization module) using menu File > Paste data. Only numeric data should be inserted. Place samples in rows and variables in columns. In the example, samples are propolis from different seasons and variables are specific m/z signals from their mass spectra [3].

Note: see the Data Organization chapter for details on how to import data sets such as spectra, bitmaps, etc.

If appropriate, insert sample labels using menu Edit > Sample labels or classes. Sample classes, if any, can be inserted in the same way (Figure 6A).

If appropriate, insert variable labels using menu Edit > Variable labels. If the data set is spectroscopic, use > Spectroscopic data (Figure 6C) and insert the spectral range. For all other data sets, use > Generic data (Figure 6B).

Note: if labels are not provided, samples and variables are numbered automatically.

Figure 6. Inserting sample and variable labels (panels A-D)

Data preprocessing

Select the preprocessing method (2).

Press Apply button (3).

Note: Multiple preprocessing methods can be applied in sequence by selecting each one and pressing Apply.

To return to the original data after preprocessing, select No preprocess and press Apply.
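
As an illustration of what a preprocessing step does to the table, the Python sketch below autoscales a data set, i.e. mean-centers each variable and divides it by its standard deviation. Autoscaling is chosen here only as a common example; this guide does not list the methods offered by Chemoface, and the function name is hypothetical.

```python
from statistics import mean, stdev


def autoscale(data):
    """Mean-center each column and scale it to unit standard deviation.

    data has samples in rows and variables in columns. A column with
    null standard deviation would cause a division by zero.
    """
    columns = list(zip(*data))
    means = [mean(col) for col in columns]
    stds = [stdev(col) for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)]
            for row in data]


# Made-up data: 3 samples, 2 variables.
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
scaled = autoscale(data)
for row in scaled:
    print(row)
```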

Performing PCA

Press Run PCA button (4).

Select the PCs to be plotted on the X and Y axes (5). For a 3D plot (Figure 7D), also select the PC to be plotted on the Z axis; for a 2D plot, set it to none.

Note: If sample classes are provided, the samples can be colored by class on the scores plot: select Colored by sample classes (6). Optionally, sample labels can also be shown: provide the labels when prompted (Figure 6D).

For scores graph, press Scores plot button (7) (Figure 7A).

For loadings graph (Figure 7B), select the plot type: points or vectors (8).

Press Loadings plot button (9).

For a scores and loadings graph (Figure 7C), press Biplot button (10).

Figure 7. PCA plots (panels A-D)
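
To make the quantities behind the scores plot, the loadings plot and the PC variance table concrete, the Python sketch below computes the first principal component of a small data set. It uses power iteration on the covariance matrix of mean-centered data, which is a simple stand-in chosen for brevity; the algorithm used by Chemoface is not described in this guide, and the function name and data are hypothetical.

```python
import math


def first_pc(data, iterations=200):
    """Return (loadings, scores, explained_fraction) for PC1.

    Power iteration on the covariance matrix of mean-centered data.
    Assumes the starting vector of ones is not orthogonal to PC1.
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the variance captured by PC1.
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p))
                     for a in range(p))
    total_variance = sum(cov[a][a] for a in range(p))
    scores = [sum(r[j] * v[j] for j in range(p)) for r in centered]
    return v, scores, eigenvalue / total_variance


# Made-up data: 4 samples, 2 strongly correlated variables.
data = [[2.0, 1.9], [4.0, 4.1], [6.0, 6.2], [8.0, 7.8]]
loadings, scores, explained = first_pc(data)
print(loadings, scores, explained)
```

The scores are the sample coordinates on PC1, the loadings are the variable weights, and the explained fraction corresponds to one entry of a PC variance table.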

Performing HCA

Select the distance type (11).

Select the linkage type (12).

Note: Optionally, PCA can be applied before HCA: select Apply PCA to data (13) and enter the number of PCs to be used when prompted.

Press Dendrogram button (14) to run HCA and obtain the dendrogram (Figure 8B).

Insert the threshold to color the similar groups (Figure 8A). If not applicable, press Cancel.

Figure 8. HCA plot (panels A-B)
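
The role of the distance type, the linkage type and the threshold can be illustrated with a small agglomerative clustering sketch in Python. It assumes Euclidean distance and single linkage, which are only one possible combination of the options above, and it stops merging once the closest clusters are farther apart than the threshold. The function names and data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(samples, threshold):
    """Merge the two closest clusters until their distance exceeds threshold.

    Returns a list of clusters, each a sorted list of sample indices.
    """
    clusters = [[i] for i in range(len(samples))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(euclidean(samples[i], samples[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


# Made-up data: two tight pairs and one isolated sample.
samples = [[1.0, 1.0], [1.2, 0.9], [5.0, 5.1], [5.2, 4.9], [9.0, 1.0]]
groups = single_linkage(samples, threshold=1.0)
print(groups)
```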

Additional features

Additional columns and rows can be inserted on the table using + buttons (15).

Columns and rows can be deleted using menu Edit > Delete.

Data set can be transposed using menu Edit > Transpose.

Data set can be saved using menu File > Save data. Afterwards, this file can be opened using menu File > Open data.

Numeric values used to build the graphs may be copied pressing Copy plot data button (16).

PC variance table can be copied pressing Copy table data button (17).


Multivariate calibration

Figure 9. Performing multivariate calibration

Inserting data set

To insert the independent variables (matrix X), paste them from a spreadsheet (or from the Data Organization module) using menu File > Paste X. Place samples in rows and variables in columns. In the example, the matrix X is formed by transmittances from 4000 to 600 cm⁻¹ for coffee samples [4] (1).

To insert the dependent variable (vector y), paste it from a spreadsheet using menu File > Paste Y. In the example, y is the content of husk (%) in coffee samples (2).

Note: It is possible to insert multiple y and build the respective models; for this, provide Y as a matrix with

samples in rows and each y in a column.

Optionally, sample and variable labels for X (3), as well as the property label for y (4), can be inserted using menu Edit, similarly to Figure 6. If labels are not provided, samples and variables are numbered automatically.

Data preprocessing



Note: Multiple preprocessing methods (7) can be applied in sequence by selecting each one and pressing Apply.



Selecting test set (optional)

Samples can be selected and used as a test set (external validation); for this, use menu Edit > Select sample for test set and select the method (Figure 10A). The test set samples can be specified manually (Figure 10B) or selected automatically by the Kennard-Stone algorithm [5] (Figure 10C). Test samples are marked with a t on the tables (1,2).

Figure 10. Selecting samples for test set (panels A-C)
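
For readers unfamiliar with Kennard-Stone selection, the Python sketch below shows the classic form of the algorithm: start from the two most distant samples, then repeatedly add the sample whose nearest already-selected neighbour is farthest away. It assumes Euclidean distance; how Chemoface implements the algorithm and assigns the selected samples is not detailed in this guide, and the function name and data are hypothetical.

```python
import math


def kennard_stone(samples, n_select):
    """Return the indices of n_select samples spread over the data space.

    Requires 2 <= n_select <= len(samples).
    """
    def dist(i, j):
        return math.sqrt(sum((a - b) ** 2
                             for a, b in zip(samples[i], samples[j])))

    n = len(samples)
    # Start with the two samples that are farthest apart.
    first, second = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                        key=lambda pair: dist(*pair))
    selected = [first, second]
    while len(selected) < n_select:
        remaining = [i for i in range(n) if i not in selected]
        # Pick the sample whose nearest selected neighbour is farthest away.
        nxt = max(remaining, key=lambda i: min(dist(i, s) for s in selected))
        selected.append(nxt)
    return selected


# Made-up one-variable data set.
samples = [[0.0], [1.0], [2.0], [5.0], [10.0]]
chosen = kennard_stone(samples, 3)
print(chosen)
```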

Performing leave-one-out cross-validation

Select the regression method (PLS, PCR or MLR) (8).

Select the maximum number of latent variables (for PLS) or principal components (for PCR) to be tested

on the cross-validation (9).

Press Run CV button (10).

To build the graphs (such as RMSECV, R, and others) (Figure 11A), select the data to be plotted in X

and Y axis (11) and press Plot button (12).
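
The idea of leave-one-out cross-validation can be sketched in Python as follows: each sample is left out in turn, a model is fitted to the remaining samples, and the left-out sample is predicted. For brevity the sketch uses a univariate least-squares line in place of PLS, PCR or MLR, so it only illustrates how RMSECV is obtained; function names and data are hypothetical.

```python
import math


def fit_line(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx


def rmsecv_loo(x, y):
    """Root mean square error of leave-one-out cross-validation."""
    errors = []
    for i in range(len(x)):
        # Fit without sample i, then predict sample i.
        slope, intercept = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        errors.append(y[i] - (slope * x[i] + intercept))
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Made-up data: an exact line and a noisy version of it.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_exact = [2.0, 4.0, 6.0, 8.0, 10.0]
y_noisy = [2.1, 3.9, 6.2, 7.8, 10.1]
rmsecv_exact = rmsecv_loo(x, y_exact)
rmsecv_noisy = rmsecv_loo(x, y_noisy)
print(rmsecv_exact, rmsecv_noisy)
```

In a latent-variable method the same loop would be repeated for each number of LV or PC up to the selected maximum, giving one RMSECV value per model size.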

Performing the calibration (obtaining the model)

If PLS or PCR is used, select the adequate number of LV or PC to be used in the model (13).

Press Fit button (14).

To build the graphs (such as measured x predicted and others), select the data to be plotted in X and Y

axis (15) and press Plot button (16).

To obtain the performance parameters of the model (such as RMSEC, R and others) (Figure 11B),

select Model stats in X axis or Y axis (15) and press Plot button (16).

Note: For details about the statistical parameters for model performance, please see the cited references [6, 7].
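
Two of the performance parameters mentioned above, the root mean square error and the correlation coefficient R between measured and predicted values, can be sketched in Python as below. The sketch divides the squared errors by the number of samples; other conventions (for example, dividing by degrees of freedom) exist, and the exact definitions used by Chemoface should be taken from the cited references. Function names and data are hypothetical.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2
                         for m, p in zip(measured, predicted)) / n)


def correlation(measured, predicted):
    """Pearson correlation coefficient R."""
    n = len(measured)
    mm, mp = sum(measured) / n, sum(predicted) / n
    cov = sum((m - mm) * (p - mp) for m, p in zip(measured, predicted))
    var_m = sum((m - mm) ** 2 for m in measured)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


# Made-up values: predictions with a constant bias of 0.5.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
error = rmse(measured, predicted)
r = correlation(measured, predicted)
print(error, r)
```

The example is chosen so that R is 1 while the error is not zero, which shows why both parameters are worth checking.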

Figure 11. Plots for multivariate calibration (panels A-D)

Performing external validation using test set (optional)

If the test set samples were previously provided (manually or by the Kennard-Stone algorithm [5]), press Predict button (17).

If the test set was not provided as above, insert the X and y (or Y if multiple y) for external validation using menu File > Paste new X for prediction and menu File > Paste new Y for prediction test, respectively.

To build the graphs (such as measured x predicted and others), select the data to be plotted in X and Y

axis (18) and press Plot button (19).

To obtain the performance parameters of the prediction, select Pred stats in X axis or Y axis (18) and

press Plot button (19).

Predicting y for new samples

Insert the new X to have the y predicted using menu File > Paste new X for prediction.

Press Predict button (17).

View the predicted values selecting Samples in X axis and Predicted in Y axis (18) and press Plot

button (19).

Performing Discriminant Analysis (PLS-DA, PCR-DA, MLR-DA)

If discriminant analysis is used, provide the classes as numbers in dependent variables (Y data, 2);

e.g.: Class A = 1, Class B = 2, Class C = 3, etc.

Enable Classes (For DA) (20).

Perform leave-one-out cross validation as previously described.

Perform calibration as previously described.

Perform external validation using test set as previously described (optional).

Predict the classes for new samples as described in Predicting y for new samples.
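
Since the classes must be provided as numbers, class names kept in a spreadsheet have to be encoded first. A minimal Python sketch of such an encoding is shown below; the function name and labels are hypothetical.

```python
def encode_classes(labels):
    """Map class names to 1, 2, 3, ... in order of first appearance.

    Returns the numeric y vector and the name-to-number mapping.
    """
    codes = {}
    for label in labels:
        codes.setdefault(label, len(codes) + 1)
    return [codes[label] for label in labels], codes


# Made-up class labels, one per sample.
labels = ["A", "A", "B", "C", "B"]
y, mapping = encode_classes(labels)
print(y, mapping)
```

Keeping the mapping makes it possible to translate predicted class numbers back to the original names.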

Additional features

Outlier samples can be checked (Figure 11C). After performing the calibration (14), select Sample leverages in X axis and Studentized residuals in Y axis (15) and press Plot button (16).

If multiple y is used, a property (22) should be selected before any plot.

Data plotted on the graphs can be copied and used out of Chemoface; use Copy plot data buttons (23).

Additional columns and rows can be inserted on the tables using + buttons (24).



Measured x Predicted values for cross-validation, calibration and test sets can be plotted in a single

graph (Figure 11D) using menu Multiplot > Measured x Predicted. Scores for PLS-DA or PCR-DA can

be plotted similarly.


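Data set can be saved using menu File > Save data. Afterwards, this file can be opened using menu File > Open data.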

The obtained model can be saved using menu File > Save data. Afterwards, this file can be opened using menu File > Open model and the calibrated model can be used for further predictions.


Data Plot

Figure 12. Plotting data

Inserting data set

Type the data on the table (1) or paste it from a spreadsheet (or from the Data Organization module) using menu File > Paste data. Only numeric data should be inserted. Place samples in rows and variables in columns. In the example, the data set is formed by transmittances from 4000 to 600 cm⁻¹ [4] (1).

Note: see the Data Organization chapter for details on how to import data sets such as spectra, bitmaps, etc.

Optionally, sample and variable labels (2) can be inserted using menu Edit, similarly to Figure 6. If labels are not provided, samples and variables are numbered automatically.

Data preprocessing



Note: Multiple preprocessing methods (5) can be applied in sequence by selecting each one and pressing Apply.


Plotting data

Insert the X axis label (6).

Insert the Y axis label (7).

If appropriate, insert the Y axis label for the preprocessed graph (8).


If appropriate, set reverse X axis (9).

To plot both the non-preprocessed and the preprocessed data (Figure 13), check Plot data without preprocessing (10).

Press Plot data button (11).

Figure 13. Non-preprocessed and preprocessed spectra

Additional features

Additional columns and rows can be inserted on the tables using + buttons (12).




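Data set can be saved using menu File > Save data. Afterwards, this file can be opened using menu File > Open data.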


Data Organization

The Data Organization module imports numerical data from .txt, .dat and .csv files, and images in .bmp format. Multiple files, such as spectra files, can be imported simultaneously. Images (.bmp) are imported by converting them into a three-way array containing the RGB values of each pixel. The R, G and B values are then summed for each pixel, resulting in a two-way array (matrix). Finally, this matrix is unfolded to generate a vector. This is particularly useful for importing molecular figures to be used as descriptors in MIA-QSAR models [8]. In the example, the files are MIR spectra of adulterated coffee [4].
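
The image import described above (three-way RGB array, summed per pixel, then unfolded) can be sketched in Python as follows. The image is represented as nested lists rather than read from a .bmp file, and the row-by-row unfolding order is an assumption; the order used by Chemoface is not stated in this guide. The function name is hypothetical.

```python
def image_to_vector(pixels):
    """Turn an RGB image into a single row of descriptors.

    pixels is a rows x columns grid of (R, G, B) tuples.
    Step 1: sum R + G + B for each pixel, giving a matrix.
    Step 2: unfold the matrix row by row into one vector.
    """
    summed = [[r + g + b for (r, g, b) in row] for row in pixels]
    return [value for row in summed for value in row]


# Made-up 2 x 2 image.
image = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 0, 0), (10, 20, 30)],
]
vector = image_to_vector(image)
print(vector)
```

Each imported image then contributes one row (sample) to the data set, with one column per pixel.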

Figure 14. Importing data set

Importing files (only .txt, .dat, .csv, .bmp)

Access menu File > Import data.

Select the file type (only .txt, .dat, .csv or .bmp) (Figure 15A).

Select the files to be imported (use Shift or Ctrl keys to select multiple files) (Figure 15B).

Press Open.

Note:

The file names are shown as the sample labels of the respective rows (1).

If the data set is very large, you are asked whether the data should be shown in the table (Figure 15C). If No is selected, the files are imported and a message is shown (2) (Figure 15D).

Figure 15. Selecting files to be imported (panels A-D)

Exporting the dataset

To use the obtained data set in Chemoface, use menu File > Copy data set. Then paste it into the Chemoface tables using menu File > Paste of the module in use, as previously described in Inserting data set. The data set can also be pasted into other software that supports spreadsheets.

The data set can also be saved as .txt (ASCII with space as column separator) using menu File > Save data. This saved file cannot be opened in the Chemoface modules using menu File > Open, but the numerical data can be copied and pasted using menu File > Paste in the Chemoface modules, as previously described.
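
A saved .txt file with space as column separator can also be read outside Chemoface. The Python sketch below parses such text into a matrix and rewrites it as tab-separated text, the form commonly used when copying from spreadsheets. The function names are hypothetical, and the sketch assumes every row contains only numbers.

```python
def parse_space_separated(text):
    """Parse space-separated numeric text into a list of rows of floats."""
    return [[float(v) for v in line.split()]
            for line in text.splitlines() if line.strip()]


def to_tab_separated(matrix):
    """Write a matrix as tab-separated text, one row per line."""
    return "\n".join("\t".join(repr(v) for v in row) for row in matrix)


# Made-up file content, for illustration only.
matrix = parse_space_separated("1.0 2.5\n3 4\n")
tabbed = to_tab_separated(matrix)
print(tabbed)
```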

Additional features

Specific, odd, even or null standard deviation columns and rows can be deleted using menu Edit > Delete.


Columns and rows can be numbered using menu Edit > Numbered.


Figure handling

Saving figure with high resolution

Access menu File > Export setup in the figure window.

In the setup window (Figure 16A), access Rendering (1).

Set the color to be saved, e.g., RGB color, gray scale, etc. (2).

Set the resolution to be saved, e.g., 600 dpi (3).

Press Export button (4).

Set the file name (5) (Figure 16B).

Select the file type, e.g., .tif (6).

Press Save button (7).

Note: If .fig (MATLAB figure) is selected as the file type, the file can be opened only using menu File > Open Figure on the Chemoface home screen or in MATLAB. Figures (.fig) can be edited (e.g., font size, font color) only in MATLAB.

Figure 16. Export setup window (A) and file saving window (B)

Tools on the figure window

Save figure (low resolution)

Print figure

Zoom

Move graph

Rotate 3D graph

Read data: click over the graph to read values

Enable/disable color bar

Enable/disable legend*

*To move the legend, click with the left mouse button and drag.


References

1. NUNES, C. A.; FREITAS, M. P.; PINHEIRO, A. C. M.; BASTOS, S. C. Chemoface: a novel free user-friendly interface for Chemometrics. Journal of the Brazilian Chemical Society, 23, 2003-2010, 2012.

2. MATLAB. The MathWorks, Inc.: Natick, MA, USA.

3. NUNES, C. A.; GUERREIRO, M. C. Characterization of Brazilian green propolis throughout the seasons by Headspace-GC/MS and ESI-MS. Journal of the Science of Food and Agriculture, 92, 433-438, 2012.

4. TAVARES, K. M.; PEREIRA, R. G. F.; PINHEIRO, A. C. M.; NUNES, C. A.; GUERREIRO, M. C.; RODARTE, M. P. Mid-infrared spectroscopy and sensory analysis applied to detection of adulteration in roasted coffee by addition of coffee husks. Química Nova, 35, 1164-1168, 2012.

5. DE GROOT, P. J.; POSTMA, G. J.; MELSSEN, W. J.; BUYDENS, L. M. C. Selecting a representative training set for the classification of demolition waste using remote NIR sensing. Analytica Chimica Acta, 392, 67-75, 1999.

6. KIRALJ, R.; FERREIRA, M. M. C. Basic validation procedures for regression models in QSAR and QSPR studies: theory and application. Journal of the Brazilian Chemical Society, 20, 770-787, 2009.

7. ROY, P. P.; PAUL, S.; MITRA, I.; ROY, K. On Two Novel Parameters for Validation of Predictive QSAR Models. Molecules, 14, 1660-1701, 2009.

8. NUNES, C. A.; FREITAS, M. P. Introducing new dimensions in MIA-QSAR: a case for chemokine receptor inhibitors. European Journal of Medicinal Chemistry, 62, 297-300, 2013.
