Detection and quantification of adulteration in sandalwood oil through near infrared spectroscopy†

Saji Kuriakose,‡a Xavier Thankappan,a Hubert Joe*a and Venkateswaran Venkataramanb

Received 23rd April 2010, Accepted 21st June 2010

DOI: 10.1039/c0an00261e

The confirmation of authenticity of essential oils and the detection of adulteration are problems of

increasing importance in the perfumes, pharmaceutical, flavor and fragrance industries. This is

especially true for ‘value added’ products like sandalwood oil. A methodical study is conducted here to

demonstrate the potential use of Near Infrared (NIR) spectroscopy along with multivariate calibration

models like principal component regression (PCR) and partial least square regression (PLSR) as rapid

analytical techniques for the qualitative and quantitative determination of adulterants in sandalwood

oil. After suitable pre-processing of the raw NIR spectral data, the models are built using cross-validation. The lowest Root Mean Square Error of Cross-Validation and of Calibration (RMSECV and RMSEC, % v/v) are used as decision criteria to fix the optimal number of factors. The coefficient of determination (R²) and the Root Mean Square Error of Prediction (RMSEP, % v/v) in the prediction sets are used as the evaluation parameters (R² = 0.9999 and RMSEP = 0.01355). The overall result leads to the conclusion that NIR spectroscopy with chemometric techniques can be used successfully as a rapid, simple and non-destructive method for detecting adulterants in high-quality sandalwood oil, even at a level of 1% of low-grade oil.

Introduction

Essential oils are complex mixtures of various terpenoids, aldehydes, ketones, alcohols, esters and other aromatic substances. Most of the oils are used for flavoring foodstuffs, in perfume compositions or in mouth care products.1 Some essential oils that contain phenols are also used in phyto-pharmaceutical products or as additives on account of their antibiotic properties.

Sandalwood oil is a volatile essential oil obtained by steam

distillation of the dried wood from the trunk and roots of the

plant Santalum album L (Indian sandalwood) (Kingdom –

Plantae, Class – Magnoliopsida, Family – Santalaceae, Genus –

Santalum L). This oil appears as a pale yellow/yellow liquid with

a characteristic soft, warm, woody odor and a slightly bitter

resinous taste (FCC 2003). Sandalwood oil is used as a flavor

ingredient, with a daily consumption of 0.0074 mg/kg and as an

adjuvant in the food industry. In perfumery also, it is used

extensively. The heartwood of mature trees (>10 years old) contains oils whose main constituents are sesquiterpene alcohols, cis-α-santalol, cis-β-santalol etc.2 This oil is approved for food usage by the United States Food and Drug Administration (FDA), the Flavor and Extract Manufacturers Association (FEMA) and the Council of Europe (COE).3,4

Sandalwood oil has been found to contain more than 100 constituents. α-Santalol (�$60% of total santalol) and β-santalol (�$33% of total santalol) are mainly responsible for the odor, depending on the source species,5 although 2-furfuryl pyrrole may also contribute.6 The oil also contains sesquiterpene hydrocarbons (�60%)7 that are mostly α-santalene, β-santalene and epi-β-santalene, as well as α-curcumene, β-curcumene, γ-curcumene, β-bisabolene and α-bisabolol.8 The other constituents reported are dihydro-β-agarofuran, santene, teresantol, borneol, teresantalic acid, tricycloekasantalal, santalone and santanol.9 Three new neolignans and a new aromatic ester have recently been isolated from the heartwood of S. album L.10

Sandalwood oil and its major constituents have low acute oral and dermal toxicity in laboratory animals. Sandalwood oil is found to have antiviral, anticarcinogenic and bactericidal activity, and it is not mutagenic in the spore Rec assay.11 Sanskrit manuscripts reveal that sandalwood has been in use for over 4000 years. The commercial use of sandalwood oil in the USA began in the early 1800s.

Due to its sensory quality, extensive use and the steep rise in its price, sandalwood oil is often adulterated with low-grade, cost-effective oils and synthetic or semi-synthetic substitutes such as Sandalore®.12,13 Adulteration of sandalwood oil is a serious problem for regulatory agencies and oil suppliers, and a threat to the health of consumers. Substitution and synthetic additives influence the chemical composition and physical properties of the oil; these factors may affect the oil quality and its allergic potential. The common adulterants reported include castor oil, cedarwood oil and low-grade oil from ‘sandalwood’ species other than S. album.12,14 The most common adulterant is castor oil (from Ricinus communis of the family Euphorbiaceae).

Various authorities have recommended that the oil from S. album should not contain less than 90% w/w of (free) alcohols, calculated as santalols.14–17 The acetylation methods18,19 described to assess the santalol content of sandalwood oil generally lack specificity and accuracy. More recently, the ISO (2002) has suggested the analysis of S. album oil using gas chromatography (GC). However, these reports do not address the detection of adulterants. Taking the above facts into consideration, there is an increasing demand for a new, rapid and non-destructive method to replace traditional, time-consuming and expensive analysis techniques. To date, no standard method for detecting the adulteration of sandalwood oil has been explored or reported.

a Centre for Molecular and Biophysics, Department of Physics, Mar Ivanios College, Thiruvananthapuram, 695 015, Kerala, India. Fax: +91 471 2530023; Tel: +91 471 2531053

b Central Electronics Engineering Research Institute, Chennai Centre, CSIR Complex, India

† Electronic supplementary information (ESI) available: supplementary table of percentages of castor oil and sandalwood oil in 0–100% adulterated mixtures. See DOI: 10.1039/c0an00261e

‡ Permanent address: St Thomas H.S.S., Pala, Kerala, India.

2676 | Analyst, 2010, 135, 2676–2681. This journal is © The Royal Society of Chemistry 2010

PAPER www.rsc.org/analyst | Analyst

Downloaded by CSIR MADRAS COMPLEX (CSIRM) on 23 September 2010. Published on 03 September 2010 on http://pubs.rsc.org | doi:10.1039/C0AN00261E

The application of near infrared (NIR) spectroscopy

combined with chemometric techniques is a relatively new

approach to determine authenticity and to quantify the adul-

teration of essential oils. Recent reports reveal that NIR spec-

troscopy along with chemometrics is widely applied for the rapid

quantitative analysis of a wide range of vital constituents in food

and agricultural products.20 Oliveira et al. proposed partial least

square regression calibration models based on Fourier Trans-

form NIR measurements to evaluate the quality of hydrated

ethyl alcohol fuel and to detect its adulteration with methanol.21

Christy et al. studied NIR spectroscopy to detect and quantify

adulteration of olive oil with soybean, sunflower, corn, walnut

and hazelnut oils.22 Bewig et al.23 and Chen et al.24 used the NIR

spectral profiles to predict quality parameters of vegetable oils.

Multivariate analyses like Principal Component Regression

(PCR)25 and Partial Least Square Regression (PLSR)26,27 have

been applied to NIR spectrometry for quantitative analysis to

extract vital information through non-destructive methods.28

In the present study, both PCR and PLSR methods are applied to NIR spectra of pure sandalwood oil and of oil adulterated with various proportions of castor oil. These two multivariate techniques could provide better accuracy, precision and significantly more information in considerably less time than previous data analysis methods. To the best of our knowledge, this is the first attempt to use near infrared spectroscopy (NIRS) along with multivariate regression methods for estimating the quantity of an adulterant, viz. castor oil, in sandalwood oil.

Sample collection and experimental analysis

Samples

Pure sandalwood oil and castor oil (batch nos. 5BB 801622, 5BB

900502, 231) were procured from Khadi Gramodyog Bhavan,

Khadi and Village Industries Commission, Govt. of India.

Source of procurement from Govt. of India food and oil regu-

latory agencies ensures the authenticity of samples.

Chemicals

The solvent (carbon tetrachloride) used in this study was

obtained from Merck. The reagent used is analytical grade

without further purification.

Instrumentation

A Cary 5000 UV/Vis-NIR spectrophotometer (Sl. No. EL03127331, www.varianinc.com) with a PbS detector, a wavelength range of ca. 175–3300 nm and 0.01 nm resolution is used to capture the spectra. A Camloc cell with quartz windows and a 1 mm path length is used as the sample holder. Serial port communication is used to capture the raw spectral data. The monochromator and sample compartments have separate nitrogen purging capabilities, allowing the sample compartment to be purged at a higher rate than the instrument.

Sample preparations

The samples are stored in hermetically sealed aluminum bottles in the dark at 4 °C and are brought to an ambient temperature of 20 °C eight hours prior to measurement. Using an electromagnetic stirrer, the sandalwood oil and castor oil samples are each homogenized with the solvent (1 : 10 v/v) for 15 min in two separate stoppered conical flasks. Proper precautions are taken to avoid loss or change during the process. Blends are prepared by adding the required percentage of the low-grade oil in solvent to the standard sandalwood oil in the same solvent. The relative castor oil fraction in the samples varies from 0 to 100% v/v (refer to the supplementary table provided†). The oil samples are blended under normal temperature and pressure. Thus, a set of 56 samples ranging from 0 to 100% (v/v) is prepared. Of these, 45 samples are used for calibration, and an independent set of 11 samples with castor oil fractions of 0, 1, 5, 8, 12, 20, 25, 50, 70, 85 and 90% is used for prediction.

The samples are labeled as ‘calibration set’ and ‘prediction set’ separately. Care is taken, following the norms of the chemical sample-preparation procedure, to ensure that the samples cover a wide range. All the samples are kept in glass bottles and stored in the dark at 3–4 °C. All measurements are carried out at 20 °C in closed rooms.
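The blending arithmetic described above can be sketched as follows. This is only an illustration: the function name and the 10 mL total volume are hypothetical, and the sketch assumes that both oils are diluted in the solvent at the same ratio, so that mixing the two diluted stocks by volume reproduces the intended relative castor oil fraction.

```python
def blend_volumes(total_ml, castor_percent):
    """Return (castor_stock_ml, sandalwood_stock_ml) for a blend of the given % v/v."""
    if not 0 <= castor_percent <= 100:
        raise ValueError("castor_percent must lie between 0 and 100")
    castor_ml = total_ml * castor_percent / 100.0
    return castor_ml, total_ml - castor_ml


# Prediction-set levels quoted in the text (% v/v castor oil).
PREDICTION_LEVELS = [0, 1, 5, 8, 12, 20, 25, 50, 70, 85, 90]

if __name__ == "__main__":
    for level in PREDICTION_LEVELS:
        # 10 mL total is a hypothetical choice, not a value from the paper.
        castor, sandal = blend_volumes(10.0, level)
        print(f"{level:3d}% v/v: {castor:.2f} mL castor stock + {sandal:.2f} mL sandalwood stock")
```

The full list of the 56 blend levels is given only in the supplementary table, so it is not reproduced here.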

Spectral acquisition

Thirty-two scans are performed at 1 nm intervals within the wavelength range of 700–2200 nm to capture the spectra. The time to acquire the scans is approximately 28 s. The mean spectrum is computed from the collected data. A background spectrum with the reference sample is collected for every sample immediately before the sample single-beam spectrum is collected. The sample spectrum is automatically ratioed against the background spectrum, and the resulting spectrum is automatically stored in the computer. The spectral data are converted into ASCII format by the Varian software supplied with the spectrometer. All of the spectra are recorded in absorbance mode.
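The averaging and ratioing steps can be illustrated with a short NumPy sketch. The instrument performs these steps internally, so the function below is an assumption about the arithmetic rather than the vendor's implementation; in particular, averaging the scans before taking the ratio is a choice made here for simplicity, and `mean_absorbance` is a hypothetical name.

```python
import numpy as np


def mean_absorbance(sample_scans, background_scans):
    """Average repeated single-beam scans and convert their ratio to absorbance.

    Both inputs have shape (n_scans, n_wavelengths) and hold positive intensities.
    """
    sample = np.asarray(sample_scans, dtype=float).mean(axis=0)
    background = np.asarray(background_scans, dtype=float).mean(axis=0)
    transmittance = sample / background
    return -np.log10(transmittance)  # A = -log10(I / I0)


# Wavelength axis described in the text: 700-2200 nm at 1 nm spacing.
wavelengths_nm = np.arange(700, 2201, 1)
```

With this axis each spectrum contributes 1501 variables to the X-block before any region selection.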

During the experiment, the sample cell components are cleaned with hexane, then with warm water, rinsed with deionised water and finally with CCl4 at room temperature to avoid oil build-up on the cell windows. The components are dried using tissue paper. The quartz cell is also dried by exposing it to a natural source of light so that no water film from washing remains on it, since the presence of OH groups would influence the shape of the spectra and obscure the spectral features.

Calibration and quantitative analysis are performed using

PLSR and PCR methods. The root mean square error of cross-

validation (RMSECV) values are calculated for each factor with

the ‘leave-one-out’ cross-validation to determine the optimal

number of factors to be included in the calibration model.


Chemometrics

Data analysis. Chemometric analysis, including detection and quantification, is performed on a Pentium 4 laptop computer using PLS Toolbox 5.8.1, March 2010 (Eigenvector Research, Inc.) running under the Matlab 7.0.1 environment (Mathworks, Natick, USA). Detection is performed by the principal component regression technique. The castor oil adulteration levels are quantified by partial least square regression. Both involve a calibration step, in which the relationship between spectra and component concentrations is estimated from a set of reference (measured) samples, and a prediction step, in which the results of the calibration are used to estimate the component concentrations from an unknown sample spectrum.28

Principal component regression (PCR). PCR is the combination of Principal Component Analysis (PCA) and Multiple Linear Regression (MLR). Through PCA, the large number of original variables is reduced to a small number of principal components that contain most of the information.29 This is a well-known technique of multivariate analysis.30–33 In the second step of PCR, a multiple linear regression is performed on the scores obtained in the PCA step.
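The two steps of PCR can be sketched in a few lines of NumPy. This is a minimal illustration, not the PLS Toolbox routine used in the study; the function names are hypothetical, and mean centring is assumed as the only pre-processing.

```python
import numpy as np


def pcr_fit(X, y, n_components):
    """Fit PCR: PCA on mean-centred X, then least squares of y on the scores."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    # PCA step via SVD: the rows of Vt are the loading vectors.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T          # shape (n_wavelengths, k)
    scores = Xc @ loadings                  # shape (n_samples, k)
    # MLR step: regress the centred concentrations on the scores.
    q, *_ = np.linalg.lstsq(scores, yc, rcond=None)
    coef = loadings @ q                     # one coefficient per wavelength
    return {"coef": coef, "x_mean": x_mean, "y_mean": y_mean}


def pcr_predict(model, X):
    """Predict concentrations for new spectra with a fitted PCR model."""
    X = np.asarray(X, dtype=float)
    return (X - model["x_mean"]) @ model["coef"] + model["y_mean"]
```

Folding the score regression back into a single coefficient vector means that prediction for an unknown spectrum needs only one dot product.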

Partial least square regression (PLSR). PLSR is another well-known regression technique for multivariate data, principally applied for prediction.34 This method is especially useful when (i) the number of predictor variables is similar to or higher than the number of observations and (ii) the predictors are highly correlated. This tool is applicable when there is partial knowledge of the data, an example being the measurement of protein in wheat by NIR spectroscopy. Interference and overlap in the spectral information may be overcome by PLS techniques to a certain extent. PLS is a method that uses the full selected spectral region and is based on the use of latent variables.
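For a single response such as the castor oil fraction, the latent variables can be extracted with the classical PLS1 (NIPALS-style) recursion. The sketch below is a generic textbook form under the assumption of mean-centred data; it is not taken from the paper or from PLS Toolbox, and the function names are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """PLS1 regression: extract latent variables one at a time by deflation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    weights, loadings, y_loadings = [], [], []
    for _ in range(n_components):
        w = E.T @ f                      # direction of maximal covariance with y
        w = w / np.linalg.norm(w)
        t = E @ w                        # latent-variable scores
        tt = t @ t
        p = E.T @ t / tt                 # X loadings
        q = (f @ t) / tt                 # y loading
        E = E - np.outer(t, p)           # deflate X
        f = f - q * t                    # deflate y
        weights.append(w)
        loadings.append(p)
        y_loadings.append(q)
    W = np.column_stack(weights)
    P = np.column_stack(loadings)
    q_vec = np.array(y_loadings)
    # Regression vector in the original wavelength space: W (P^T W)^-1 q.
    coef = W @ np.linalg.solve(P.T @ W, q_vec)
    return {"coef": coef, "x_mean": x_mean, "y_mean": y_mean}


def pls1_predict(model, X):
    """Predict concentrations for new spectra with a fitted PLS1 model."""
    X = np.asarray(X, dtype=float)
    return (X - model["x_mean"]) @ model["coef"] + model["y_mean"]
```

Unlike PCR, each weight vector here is computed from the covariance between the spectra and the concentrations, which is why the latent variables are oriented toward prediction.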

Model selection. The models are built using cross-validation during calibration development. The optimum number of factors is selected by leave-one-out cross-validation, in which one sample at a time is left out. The root mean square error of cross-validation (RMSECV) is plotted against the number of factors, and from this plot the optimum number of factors is selected28,35 for both the PCR and PLSR models.

The best model selected is used to determine the concentration of the samples in the independent prediction set. The relative performance of the established model is assessed by the root mean square error of calibration (RMSEC), the RMSECV and the coefficient of determination (R²).36 The predictive ability of the model is evaluated from the root mean square error of prediction (RMSEP).37 The lower the RMSEP value, the higher the accuracy of the prediction provided by the calibration model.38
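The leave-one-out procedure for choosing the number of factors can be sketched as below. A compact PCR is included only to keep the example self-contained; all function names are hypothetical, and any regression model with a fit and a predict step could be substituted.

```python
import numpy as np


def fit_pcr(X, y, k):
    """Compact PCR, used here only to make the sketch self-contained."""
    xm, ym = X.mean(axis=0), y.mean()
    _, _, Vt = np.linalg.svd(X - xm, full_matrices=False)
    V = Vt[:k].T
    q, *_ = np.linalg.lstsq((X - xm) @ V, y - ym, rcond=None)
    return xm, ym, V @ q


def predict(model, X):
    xm, ym, coef = model
    return (X - xm) @ coef + ym


def rmse(y_true, y_pred):
    """Root mean square error between two sequences of equal length."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def loo_rmsecv(X, y, max_factors):
    """RMSECV for 1..max_factors factors, leaving one sample out at a time."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    result = {}
    for k in range(1, max_factors + 1):
        held_out = np.empty(n)
        for i in range(n):
            keep = np.arange(n) != i
            model = fit_pcr(X[keep], y[keep], k)
            held_out[i] = predict(model, X[i:i + 1])[0]
        result[k] = rmse(y, held_out)
    return result


def best_factor_count(rmsecv):
    """Pick the factor count with the lowest RMSECV."""
    return min(rmsecv, key=rmsecv.get)
```

RMSEC uses the same `rmse` function applied to the calibration samples predicted by the model that was fitted on all of them, and RMSEP applies it to the independent prediction set.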

Modeling and data pre-processing are carried out using PLS

toolbox 5.8.1, Eigenvector Research39 supported on Matlab.40

The NIRS data from the spectrometer may contain background

information and noise in addition to sample information. Hence,

to obtain reliable, accurate and stable calibration models, it is

necessary to pre-process spectral data before modeling. The pre-

processing methods, in this study, are chosen based on prior

knowledge for each spectroscopic technique combined with

different permutations.31,37,41

Results and discussion

NIR spectra

Fig. 1 shows the average response of the acquired NIR absorp-

tion spectra for pure and blended mixtures of sandalwood oil

over the spectral range of 700–2200 nm at 1 nm spacing. (Spectra

of 45 samples with different relative fractions 0–100% (v/v) of

castor oil in clean sandalwood oil.)

It could be observed that the oil spectra are nearly identical

which makes the calibration problem non-trivial. However, there

are a few subtle but systematic differences in these spectra that

might be amplified by various pre-processing techniques. Fig. 2

shows the NIR spectra of pure sandal oil and castor oil.

Spectra investigation

According to former studies performed on various essential oils,

the NIR spectra of the analyzed oil samples are dominated by

overtones and different combinations of CH stretching and

bending vibrations occurring between 1000 and 2498 nm.42 There

has been much debate as to the importance of finding those

wavelengths that contain significant information, thus reducing

the number of wavelengths, variables, and model complexity. In

this work, the spectral region 700–2200 nm is selected to reduce

the number of insignificant variables and hence the model

complexity.

Fig. 1 NIR spectra of pure and blended mixtures of sandalwood oil (at 20 °C; relative castor oil fraction 0–100% (v/v) in pure sandal oil; the top spectrum represents 0% adulteration, the bottom spectrum represents 100% adulteration and the 1–99% adulterations are in order from top to bottom).

From Fig. 2, it is observed that peaks are present at 1179.78, 1387, 1693, 1730 and 1861 nm. The absorption band observed at 1179 nm is due to methylene (CH) stretching [2nd (3ν) overtone and 2ν combination bands (1135–1215 nm)]. The peak at 1387 nm is related to the methyl (CH) stretching and bending combination [2nd (3ν) overtone and 2ν combination bands (1375–1399 nm)]. The two peaks around 1693 and 1730 nm are associated with methyl and methylene asymmetric stretching, respectively. The peak centred near 1861 nm is unique to the molecular water (OH) combination vibration.43,44 The cited reports indicate that the 1st (2ν) overtone CH stretching bands lie at 1690–1695 nm and 1725–1731 nm.
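The overtone assignments can be checked with simple arithmetic. The sketch below converts the reported peak positions to wavenumbers and estimates overtone positions in the harmonic approximation. The 2900 cm^-1 fundamental is a typical textbook value for a C-H stretch and is an assumption, not a figure from the paper.

```python
def nm_to_wavenumber(wavelength_nm):
    """Convert a wavelength in nm to a wavenumber in cm^-1."""
    return 1.0e7 / wavelength_nm


def overtone_wavelength_nm(fundamental_cm1, order):
    """Harmonic estimate of an overtone position (order=2 is the first overtone, 2v)."""
    return 1.0e7 / (order * fundamental_cm1)


# Peak positions reported in the text (nm).
for peak in (1179.78, 1387, 1693, 1730, 1861):
    print(f"{peak} nm corresponds to {nm_to_wavenumber(peak):.0f} cm^-1")

# Assumed C-H stretch fundamental near 2900 cm^-1 (illustrative value only).
print(f"first overtone (2v): about {overtone_wavelength_nm(2900.0, 2):.0f} nm")
print(f"second overtone (3v): about {overtone_wavelength_nm(2900.0, 3):.0f} nm")
```

Real bands appear at somewhat longer wavelengths than these harmonic estimates because of anharmonicity, which is in line with the first-overtone bands at 1690–1731 nm and the second-overtone band near 1179 nm.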

Spectral pre-processing

The data set is loaded as a matrix with an X-block and a Y-block. Most of the peaks are observed in the wavelength range 1100–1900 nm. The changes in the spectral regions 1350–1450 nm and 1550–1850 nm are exploited since significant differences between the NIR spectra of sandal and castor oils are observed in these regions. Hence, more emphasis is given to these regions for extracting the required information through the optimal calibration model.

In this study, several spectral pretreatments are investigated: autoscale, mean centre, none (without pre-processing), multiplicative scatter correction (msc), and smoothing (Savitzky–Golay filters) coupled with autoscale and with mean centre (see Table 1). The root mean square error of calibration (RMSEC), the root mean square error of cross-validation (RMSECV), the root mean square error of prediction (RMSEP) and the coefficient of determination (R²) are used to compare the methods and for model development.

The quality of the results is compared using the RMSEC/RMSECV, RMSEP and R² values. Since the Savitzky–Golay method (window 15 pts, order 2) coupled with autoscale produced the lowest RMSEC/RMSECV and RMSEP values and the highest R² value, this pre-processing method is chosen as the best. The other pre-processing methods did not yield good results for this application, since they produced comparatively high RMSEP/RMSECV values (lower is better) and low R² values (higher is better).
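The selected pre-processing, Savitzky–Golay smoothing with a 15-point window and a second-order polynomial followed by autoscaling, can be sketched with NumPy alone. This is an illustrative implementation, not the PLS Toolbox one; leaving the edge points unsmoothed and using the sample standard deviation (N - 1) for autoscaling are assumptions made here.

```python
import numpy as np


def savgol_weights(window, order):
    """Savitzky-Golay smoothing weights from a local polynomial least-squares fit."""
    if window % 2 == 0 or order >= window:
        raise ValueError("window must be odd and larger than the polynomial order")
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    A = np.vander(offsets, order + 1, increasing=True)   # columns: 1, x, x^2, ...
    # The first row of the pseudo-inverse gives the fitted value at the window centre.
    return np.linalg.pinv(A)[0]


def savgol_smooth(spectrum, window=15, order=2):
    """Smooth one spectrum; the first and last window // 2 points are left unchanged."""
    spectrum = np.asarray(spectrum, dtype=float)
    weights = savgol_weights(window, order)
    half = window // 2
    smoothed = spectrum.copy()
    # The smoothing weights are symmetric, so convolution equals correlation here.
    smoothed[half:-half] = np.convolve(spectrum, weights, mode="valid")
    return smoothed


def autoscale(X):
    """Centre each wavelength column and scale it to unit standard deviation."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0          # leave constant columns unscaled
    return (X - mean) / std, mean, std
```

The mean and standard deviation returned by `autoscale` should be computed on the calibration set only and then reused for the prediction set, so that the prediction samples stay independent of the model.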

Calibration and cross-validation

Optimum number of components. Calibration and quantitative analysis are performed using the PCR and PLS methods. Forty-five samples are used to develop the calibration, and eleven independent samples are used as a prediction set for both methods. To determine the optimal number of factors to be included in the calibration model, the RMSECV/RMSEC values are calculated using Leave-One-Out (LOO) cross-validation. The number of principal components (PCs)/latent variables (LVs) to be used in each case is determined by the lowest RMSECV and RMSEC.37,45 The PC/LV vs. RMSEC plots are shown in Fig. 3.

From Fig. 3, it is clear that the optimum number of PCs/LVs is 2 for both the PCR and PLS models. Table 1 shows the RMSECV and RMSEP values with 2 PCs for PCR and 2 LVs for PLS.

Fig. 2 NIR spectra of pure sandal oil and pure castor oil (as an adulterant).

Table 1 RMSECV/RMSEP values (% v/v) for PCR and PLS with various pre-processing methods

Pre-processing method                        PCR                  PLSR
Msc (mean)                                   0.0906/0.07791       0.0993/0.0700
Autoscale                                    0.03445/0.03170      0.03440/0.0315
Mean centre                                  0.03426/0.03158      0.03420/0.03148
None                                         0.03378/0.02935      0.03371/0.02951
Smoothing (Savitzky–Golay) + mean centre     0.002794/0.0213      0.003314/0.00998
Smoothing (Savitzky–Golay) + autoscale       0.002592/0.01364     0.002888/0.01355

Fig. 3 (a) Principal components and RMSECV, RMSEC through PCR for sandal oil, and (b) latent variables and RMSECV, RMSEC through PLS for sandal oil.

Fig. 4 (a) Trends in principal components in adulteration (relative percentile 0–100% v/v), and (b) trends in latent variables in adulteration (relative percentile 0–100% v/v).

Model building using pre-processed data. Two calibration models, PCR and PLSR, are built from the pre-processed data in order to predict the adulterant content in blends with sandalwood oil. The first two PCs/LVs account for 99.99% of the variation in the spectra, and this cumulative variance is the same for both models. In PCR, PC1 explains 96.62% and PC2 explains 3.37%. In PLSR, LV1 explains 98.19% and LV2 explains 1.80% of the total variance between the samples. Fig. 4 shows the first two PC/LV scores plotted in a scatter diagram for both models.

From Fig. 4, it is observed that the 45 samples with different adulterant concentrations (0–100%, v/v) in pure sandalwood oil are grouped into two classes. The first group, with negative score values, represents the samples with less adulterant contamination and more sandal oil. The second group, with positive score values, indicates samples with more adulterant contamination (>50%) and less sandal oil character. It is seen that the PLSR model separates the samples (pure and blended) better; even 1% of adulteration in sandalwood oil could be identified.
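The explained-variance figures can be related to the singular values of the centred spectral matrix. The sketch below shows the PCA case only; the per-LV variance reported for PLSR is computed differently by the software and is not reproduced here. The function name is hypothetical.

```python
import numpy as np


def explained_variance_percent(X):
    """Percentage of the total variance captured by each principal component."""
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    singular_values = np.linalg.svd(Xc, compute_uv=False)
    variance = singular_values ** 2
    return 100.0 * variance / variance.sum()


# Percentages quoted in the text for the first two factors of each model.
pcr_reported = [96.62, 3.37]
plsr_reported = [98.19, 1.80]
print(round(sum(pcr_reported), 2), round(sum(plsr_reported), 2))
```

Both reported pairs add up to the 99.99% cumulative variance stated for the two-factor models.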

Prediction/validation by the models

Fig. 5 reflects the accuracy and performance of the models. The plot of the measured concentrations against the predicted concentrations shows how well the models account for the data.

The statistics of the results obtained from the calibration models are shown in Table 2.

The coefficient of determination (R²) measures the strength of the correlation between the measured values and the values predicted by the model. It may range from 0 to +1; the closer the value is to +1, the higher the correlation between the data.38 Both the PCR and PLSR models in this study show very good correlation between the real and predicted concentrations, with R² values equal to 0.99985 and 0.99986 respectively, indicating a good linear fit (see Fig. 5). For the two models presented here, the number of variables is reduced to 2 principal/latent variables that explain 99.99% of the total variance. The RMSEP value for each model (0.01364 for PCR and 0.01355 for PLSR) is also at its minimum. A lower RMSEP value indicates a higher accuracy of prediction by the model.37,38 Both PCR and PLSR give almost the same R² value. On close examination of the scores plot and the RMSEP and R² values, the PLSR model is found to be the best.
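The evaluation statistics can be computed from paired measured and predicted values as sketched below. The definitions used (bias as the mean of predicted minus measured, R² as one minus the ratio of residual to total sum of squares) are common conventions and are assumptions here, since the paper does not spell out its formulas.

```python
import math


def prediction_metrics(measured, predicted):
    """Return RMSEP, bias and R^2 for paired measured/predicted values."""
    n = len(measured)
    residuals = [p - m for m, p in zip(measured, predicted)]
    ss_res = sum(r * r for r in residuals)
    mean_measured = sum(measured) / n
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return {
        "rmsep": math.sqrt(ss_res / n),
        "bias": sum(residuals) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Prediction-set levels from the text (% v/v); predictions would come from a fitted model.
measured_levels = [0, 1, 5, 8, 12, 20, 25, 50, 70, 85, 90]
```

Applied to the calibration samples or to the cross-validated predictions, the same function yields RMSEC and RMSECV and the corresponding R² values listed in Table 2.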

Conclusion

In this work, near infrared (NIR) spectroscopy combined with chemometric techniques is used as a screening analysis to identify sandal oil samples adulterated with low-cost, low-grade oils such as castor oil. The method is accurate and reliable for detecting such fraud and can assist laboratories and the inspection and quality control services for essential oils. For the models proposed in this work, PCR and PLSR show the lowest RMSECV and RMSEP values and a high correlation between the measured and predicted concentrations. NIR spectroscopy combined with PCR and PLS techniques is shown to be suitable as a practical analytical tool to predict the adulterant content in sandal oil in the range 0–100% (v/v). Even 1% of contamination can be measured precisely. This technique may be used to discriminate counterfeit oil effectively. It is also observed that the shift of samples from one quadrant to another in the scores plot reflects the actual percentage of adulteration. This information is very helpful in determining the percentage of adulteration in a non-destructive manner.

Based on the above findings, we suggest NIR spectroscopy with chemometrics as a future detection tool for quantitative as well as qualitative analysis of adulteration in essential oils.

Fig. 5 (a) PCR model measured vs. predicted sandal oil, and (b) PLS

model measured vs. predicted sandal oil.

Table 2 The prediction summary of PCR and PLSR models (a)

Statistical parameter     PCR          PLSR
RMSEC                     0.0011       0.002051
RMSECV                    0.002592     0.002888
RMSEP                     0.01364      0.01355
Bias                      0.002361     0.001920
R² Cal                    0.99982      0.99931
R² CV                     0.99978      0.99937
R² Pred                   0.99985      0.99986

(a) PCR: Principal Component Regression; PLSR: Partial Least Square Regression. RMSEC: Root Mean Square Error of Calibration; RMSECV: Root Mean Square Error of Cross-Validation; RMSEP: Root Mean Square Error of Prediction; R²: coefficient of determination.



Acknowledgements

The authors would like to thank Mr Jeremy M. Shaver, Chief of

Technology Development, Help desk, Eigenvector Research,

Inc. and Mr S. Valiathan, Quaero, CSG systems, Inc, Bellevue,

WA, USA for the technical assistance and help.

References

1 H. Schulz, M. Baranska, H. H. Belz, P. Rosch, M. A. Strehle and J. Popp, Vib. Spectrosc., 2004, 35, 81.
2 J. Verghese, T. P. Sunny and K. V. Balakrishnan, Flavour Fragrance J., 1990, 5, 223.
3 C. G. Jones, E. L. Barbour, E. L. Ghisalberti and J. A. Plummer, Phytochemistry, 2006, 67, 2463.
4 G. A. Burdock and I. G. Carabin, Food Chem. Toxicol., 2008, 46, 421.
5 B. M. Lawrence, Perfumer and Flavorist, 1991, 16, 49.
6 Anonymous, Sandalwood, in The Review of Natural Products, ed. C. Dombek, Wolters Kluwer Co., St. Louis, MO, 1993.
7 G. A. Burdock, Sandalwood white, Fenaroli's Handbook of Flavor Ingredients, CRC Press, Boca Raton FL, 4th edn, 2002, p. 1684.
8 N. A. Braun, M. Meier and W. M. Pickenhagen, J. Essent. Oil Res., 2003, 15, 63.
9 A. Y. Leung and S. Foster, Sandalwood Oil, Encyclopedia of Common Natural Ingredients used in Food, in Drugs and Cosmetics, John Wiley and Sons Inc, New York, 2nd edn, 1996.
10 T. H. Kim, H. Ito, K. Hayashi, T. Hasegawa, T. Machiguchi and T. Yoshida, Chem. Pharm. Bull., 2005, 53, 641.
11 S. Arctander, Sandalwood Oil in East India, in Perfume and Flavor Materials of Natural Origin, Denmark, 1960, p. 574.
12 D. P. Anonis, Perfumer Flavorist, 1998, 23, 19.
13 R. E. Naipawer, in Flavors and Fragrances, A World Perspective Proceedings of the 10th International Conference of Essential Oils, Flavor Fragrance, ed. B. M. Lawrence, B. D. Mookherjee and B. J. Willis, Elsevier, Amsterdam, 1988, p. 805.
14 U.S. Dispensatory, Lippincott, Philadelphia, 25th edn, 1955, p. 1836.
15 British Pharmaceutical Codex, The Pharmaceutical Society, London, 1949, p. 612.
16 Food Chemicals Codex, National Academy Press, Washington DC, 3rd edn, 1981, p. 268.
17 International Organisation for Standardisation, 1st edn, ISO 3518, 1979.
18 International Organization for Standardisation, ISO 3793, 1976.
19 International Organization for Standardisation, 2nd edn, ISO 3518, 2002.
20 M. D. Guillen and N. Cabo, J. Agric. Food Chem., 1997, 45, 1.
21 J. S. Oliveira, R. Montalvao, L. Daher, P. A. Z. Suarez and J. C. Rubim, Talanta, 2006, 69, 1278.
22 A. A. Christy, S. Kasemsumran, Y. P. Du and Y. Ozaki, Anal. Sci., 2004, 20, 935.
23 K. M. Bewig, A. D. Clarke, C. Roberts and N. Unklesbay, J. Am. Oil Chem. Soc., 1994, 71, 195.
24 Y. S. Chen and A. O. Chen, in Proceedings of the 6th International Conference on Near Infrared Spectroscopy, ed. G. D. Batten, P. C. Flinn, L. A Welsh and A. B Blakeney, J. Crainger of Johnzart Pty Ltd, Victoria, Australia, 1995, p. 316.
25 M. Marjoniemi, Appl. Spectrosc., 1992, 46, 1908.
26 T. Fearn, NIR News, 2002, 13, 12.
27 J. Guiteras, J. L. Beltran and R. Ferrer, Anal. Chim. Acta, 1998, 361, 233.
28 H. Martens and T. Naes, Multivariate Calibration, Wiley, Chichester, 1989.
29 J. E. Jackson, J. Qual. Tech., 1980, 12, 201.
30 R. G. Brereton, Chemometrics Data Analysis for the Laboratory and Chemical Plant, Wiley, Chichester, 2003.
31 D. L. Massart, B. G. M. Vandeginste, S. N. Deming, Y. Michotte and L. Kaufman, Chemometrics. A Textbook, Elsevier Science Publishers, Amsterdam, 1988.
32 T. Naes and H. Martens, J. Chemom., 1988, 2, 155.
33 S. Wold, K. Esbensen and P. Geladi, Chemom. Intell. Lab. Syst., 1987, 2, 37.
34 P. Geladi and B. Kowalski, Anal. Chim. Acta, 1986, 185, 1.
35 T. Naes, T. Isaksson, T. Fearn and A. M. Davies, A User Friendly Guide to Multivariate Calibration and Classification, NIR publications, UK, 2002.
36 L. Wang, F. S. C. Lee, X. R. Wang and I.E., Food Chem., 2006, 95, 529.
37 O. Divya and A. K. Mishra, Talanta, 2007, 72, 43.
38 C. N. C. Corgozinho, V. M. D. Pasa and P. J. S. Barbeira, Talanta, 2008, 76, 479.
39 B. M. Wise, N. B. Gallagher, R. Bro and J. M. Shaver, PLS Toolbox 5.8.1, Eigenvector Research, 2010.
40 Matlab 7.0.1, Mathworks, Natick, USA.
41 K. Kramer and S. Ebel, Anal. Chim. Acta, 2000, 420, 155.
42 H. Schulz, H. H. Drews and H. Kruger, J. Essent. Oil Res., 1999, 11, 185.
43 J. J. Workman, Technology and Applications Development for Molecular Spectroscopy and Microanalysis, Thermo Electron Corporation, NIR correlation chart: How to interpret NIR for a variety of applications.
44 F. Westad, A. Schmidt and M. Kemit, J. Near Infrared Spectrosc., 2008, 16, 265.
45 Q. Chen, J. Zhao, H. Zhang and X. Wang, Anal. Chim. Acta, 2006, 572, 77.




Analytical Methods

Qualitative and quantitative analysis in sandalwood oils using near infrared

spectroscopy combined with chemometric techniques

Saji Kuriakose 1, Hubert Joe ⇑

Centre for Molecular and Biophysics, Mar Ivanios College, Thiruvananthapuram, Kerala, India

Article info

Article history:

Received 30 April 2011

Received in revised form 16 May 2011

Accepted 15 April 2012

Available online 21 April 2012

Keywords:

Essential oils

Sandalwood oil

Near infrared spectroscopy

Chemometric techniques

Support vector machine regression

Qualitative

Quantitative differences

Abstract

Sandalwood oil is an essential oil that finds very wide application in the flavor, fragrance and pharmaceutical industries. The objective of this study is to use near infrared spectroscopy as a rapid analytical technique for the qualitative and quantitative assessment of purity in sandalwood oils. The quality and efficacy of sandalwood oils, even when they come from the same species, differ somewhat according to growing conditions (origin) and poor extraction methods. Classification of sandal oils based on their NIR spectra is performed by principal component analysis, hierarchical cluster analysis and self organising map (Kohonen neural network). All these techniques clearly differentiate the oils according to the area from which the sandalwood has been cut. Support vector machine regression (SVM R) is used to predict the purity of the oils.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

Sandalwood oil is an essential oil obtained by distillation of the heartwood and roots of the plant Santalum album (family Santalaceae). S. album is a small hemiparasitic tree of great economic value, growing in Southern India, Sri Lanka, Australia and Indonesia. Its trunk contains resins and essential oils, particularly the α- and β-santalols, santalenes and many other minor sesquiterpenoids (Jones, Ghisalberti, Plummer, & Barbour, 2006). These sesquiterpenoids are responsible for the unique sandalwood fragrance. Sandalwood oil is used in the food industry as a flavour ingredient, and it serves as a fixative for many high-end perfumes. A number of aromatic and phenolic compounds have also been identified in the oil of S. album (Kim et al., 2005). The quantity of oil produced in a tree varies considerably with location (environmental factors) and age of the tree, even under nearly identical growing conditions (Jones, Plummer, & Barbour, 2007). Santalol composition can also vary depending on the method of oil extraction (Piggott, Ghisalberti, & Trengove, 1997). There has been a serious decline in the population of Santalum in India due to complex cultivation requirements and non-stop harvesting (especially from smuggling) associated with limited regeneration (Fox, 2000; Radomiljac, Ananthapadmanabha, Welbourn, & Rao, 1998).

Sandalwood oil is approved for food and flavour uses by the Council of Europe (CoE, 2000), the Flavor and Extract Manufacturers Association (FEMA) and the United States Food and Drug Administration (FDA). Sandalwood oil specifications have been reported in the Food Chemicals Codex (FCC, 2003). The international standard for sandalwood oil (ISO 3518, 2002) and similar authorities stipulate a minimum of 90% w/w santalol (as free alcohol) in the oil, with α-santalol comprising approximately 60% and β-santalol approximately 33% of total santalol (British Pharmaceutical Codex, 1949; ISO 3518, 2002). Sandalwood oils with santalol levels below these specifications are of inferior quality owing to poor extraction methods, adulteration with synthetic or semi-synthetic substitutes, or some other counterfeit substitution.

Kumar and Maddan (1979) have reported a recommended iodine value of 283–288 for identifying adulteration in pure sandalwood oil. Using gas chromatography (GC), Verghese, Sunny, and Balakrishnan (1990) have suggested 40–55% of α-santalol and 17–27% of β-santalol in the total santalol content (w/w). GC analysis of S. album oil according to ISO 3518 (2002) specifies similar proportions of Z-α-santalol (41–55%) and Z-β-santalol (16–24%). However, these reports do not discuss the potential variation in S. album oil composition depending on the origin or age of the tree, nor do they address the evaluation of counterfeits.

The quality and efficacy of sandalwood oil, even from the same species, differ somewhat according to growing conditions based on geographical origin. This calls for a rapid and accurate analytical method for correct value estimation based on origin and for the prevention of illegal distribution. However, the existing analytical tools are not sufficient to determine the geographical origin clearly, as they are time-consuming, complex and tedious. Since sandalwood oil contains more than 100 major components that differ slightly according to growing conditions (viz. geographical origin), we cannot select a few specific components as essential criteria. In view of the current issues associated with sandalwood oil, we conduct an investigative study to develop an appropriate tool to assess the quality of sandalwood essential oils.

0308-8146/$ - see front matter © 2012 Elsevier Ltd. All rights reserved.

http://dx.doi.org/10.1016/j.foodchem.2012.04.073

⇑ Corresponding author. Address: Centre for Molecular and Biophysics Research, Department of Physics, Mar Ivanios College, Thiruvananthapuram 695 015, Kerala, India. Tel.: +91 471 2531053; fax: +91 471 2530023.

E-mail addresses: [email protected], [email protected] (H. Joe).

1 Present address: St. Thomas H.S.S., Pala, Kerala, India.

Food Chemistry 135 (2012) 213–218

journal homepage: www.elsevier.com/locate/foodchem

Near infrared spectroscopy (NIRS) in combination with sophisticated chemometric algorithms can be an excellent tool for quality control purposes and for the selection of high quality materials. NIR pattern recognition methods have been applied to the discrimination of roasted coffees (Martin, Pablos, & Gonzalez, 1996) and vegetable oils (Bewig, Clarke, Roberts, & Unklesbay, 1994). These studies were based on the classification of samples with various chemical constituents. In our previous study, we developed two reliable, accurate and non-destructive models, principal component regression (PCR) and partial least square regression (PLSR), to detect and quantify castor oil adulteration in pure sandalwood oil from near infrared spectral data collected from blended sandal oil samples (Kuriakose, Thankappan, Joe, & Venkataraman, 2010).

The objective of the present work is to apply NIRS along with chemometric techniques such as principal component analysis (PCA) (Wold, Esbensen, & Geladi, 1987) to discriminate between sandal oils. In addition, the pattern recognition techniques hierarchical cluster analysis (HCA) (Armenta, Garrigues, & Guardia, 2007) and self organising map (SOM) (Kohonen, Oja, Simula, Visa, & Kangas, 1996) are applied to classify sandal oils from the same species (chemotypes) but of different geographical origin. Support vector machine regression (SVM R) (Vapnik, 1995) is used to quantify the percentage level of counterfeits in the oils.

2. Materials and experimental methods

2.1. Materials

Five sandalwood oil samples of the same species are procured from three different geographical origins (three regions of two states of India, Kerala and Karnataka). The samples are classified into five groups named A–E. Sample A is acquired from Kairali, Arts and Crafts, Kerala Government. Samples B and C are obtained from two different sandalwood factories in Mysore, Karnataka State. Samples D and E are collected from two sandalwood industries in Bangalore, Karnataka State. All samples are stored at 4 °C in aluminium bottles and protected from light until they are analysed. None of them is subjected to any treatment, as this may change its composition.

2.2. Sample preparation

At least 8 h prior to spectroscopic measurement, the samples are brought to ambient temperature (20 °C). Each sample is diluted to 10% (v/v) with a suitable solvent (carbon tetrachloride) and homogenised using an electromagnetic stirrer. Stringent protocols are applied to avoid any sample loss or change. A total of 49 samples are prepared from all the classes (10 samples each from the four groups A–D and 9 samples from group E).

2.3. NIR spectra collection

Near infrared spectra of the 49 samples are recorded over the 800–2500 nm spectral region at 1 nm data intervals with a Cary 5000 NIR spectrophotometer (SI No. EL 03127331) with a resolution of 0.01 nm. For each sample, an average of a number of spectra is obtained. All the spectra are recorded in absorbance units. The sample spectra are divided into a training set (n = 39) and a test set (n = 10), each comprising spectra from every group.

2.4. Chemometrics and data analysis

PCA, HCA, SOM and SVM R are performed using algorithms from PLS Toolbox 6.0.1 running in the Matlab environment (Wise, Gallagher, Bro, & Shaver, 2010; Matlab, 2010a, 2010). In this study, the unsupervised methods PCA, HCA and SOM are used for model comparison and confirmation of the results obtained. A preprocessing technique, namely smoothing (Savitzky–Golay filters) coupled with mean centring, is used to remove background noise and to increase spectral resolution (Savitzky & Golay, 1964). Leave-one-out (LOO) cross-validation is used to calibrate the model.

3. Results and discussion

3.1. Near infrared absorbance spectra

The near infrared absorption spectra of the 5 classes of sandalwood oils (collected from 3 different geographical origins) are measured over the spectral range 800–2500 nm at 1 nm spacing and converted to ASCII files using Varian software (spectra of 49 samples: 10 samples each from classes A–D and 9 samples from class E).

3.1.1. Spectra investigation

The near infrared region (780–2498 nm) is dominated by overtones and combination bands arising from the anharmonic nature of molecular vibrations. Absorption in the NIR region arises from the vibrational motion of molecules. According to earlier studies performed on essential oils, the majority of the absorption bands in the near infrared spectra of the analysed oil samples arise from overtones of hydrogenic stretching vibrations or from combinations involving stretching and bending modes (Schulz, Prews, & Kruger, 1999). The absorption bands observed at 1135–1215 nm are due to methylene (CH) stretching (2nd (3ν) overtone and 2ν combination bands). The peaks at 1375–1399 nm are related to methyl (CH) stretch and bending combinations (2nd (3ν) overtone and 2ν combination bands). The peak around 1690–1695 nm is associated with methyl asymmetric stretching (1st (2ν) overtone of the CH stretch) (Westad, Schmidt, & Kemit, 2008). The intense peaks centred near 2308 and 2348 nm are related to combinations of CH stretching vibrations and deformation tones (Hourant, Baeten, Morales, Meurens, & Aparicio, 2000).
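As a rough plausibility check of these assignments, the harmonic approximation places the first and second overtones at about two and three times the fundamental wavenumber. The sketch below assumes a generic C–H stretching fundamental near 2900 cm⁻¹ (an illustrative textbook value, not a measurement from this study) and ignores anharmonicity, so real bands deviate from the computed positions. The function names are hypothetical.

```python
def wavenumber_to_nm(wavenumber_cm1):
    """Convert a wavenumber in cm^-1 to a wavelength in nm."""
    return 1.0e7 / wavenumber_cm1


def harmonic_band_nm(fundamental_cm1, multiple):
    """Wavelength (nm) of the band at `multiple` times the fundamental wavenumber."""
    return wavenumber_to_nm(multiple * fundamental_cm1)


# Assumed, illustrative C-H stretching fundamental (not from the study).
CH_FUNDAMENTAL_CM1 = 2900.0

first_overtone_nm = harmonic_band_nm(CH_FUNDAMENTAL_CM1, 2)   # 2v band
second_overtone_nm = harmonic_band_nm(CH_FUNDAMENTAL_CM1, 3)  # 3v band
```

Under this assumption the 2ν and 3ν bands fall near 1724 nm and 1149 nm, in the neighbourhood of the 1690–1695 nm and 1135–1215 nm bands discussed above.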

3.1.2. Wavelength selection

There has been much debate as to the importance of selecting those few wavelengths that contain significant information for optimal model development, thus reducing the number of wavelengths (variables) and the model complexity. Recent research investigates the importance of combining some (synergic) wavelengths with those containing problem-dependent information (descriptive wavelengths) to improve model performance. Considerable differences are observed among the spectra of all samples in the wavelength range 1650–2500 nm. Water (20 °C) has overtones at 1450, 970 and 760 nm, and a combination of OH stretching and bending occurs around 1940 nm. The presence of water hinders the model performance for this application. Taking these circumstances into account, two regions of the full spectrum, 1650–1900 and 2000–2500 nm, are selected and combined. This combined spectral region is used for model development.
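The region selection can be sketched as a simple wavelength mask. The example assumes the 800–2500 nm grid at 1 nm spacing described earlier and treats both region limits as inclusive (the source does not say whether the end points are kept); the absorbance values are placeholders, and `select_regions` is a hypothetical helper.

```python
def select_regions(wavelengths, spectrum, regions):
    """Keep only points whose wavelength lies in one of the inclusive (low, high) regions."""
    kept_wl, kept_abs = [], []
    for wl, absorbance in zip(wavelengths, spectrum):
        if any(low <= wl <= high for low, high in regions):
            kept_wl.append(wl)
            kept_abs.append(absorbance)
    return kept_wl, kept_abs


wavelengths = list(range(800, 2501))      # 800-2500 nm at 1 nm spacing
spectrum = [0.0] * len(wavelengths)       # placeholder absorbances, not measured data
REGIONS = [(1650, 1900), (2000, 2500)]    # regions named in the text

sel_wl, sel_abs = select_regions(wavelengths, spectrum, REGIONS)
```

With inclusive limits the combined region holds 251 + 501 = 752 variables out of the 1701 recorded, and the water combination band near 1940 nm is excluded.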

3.2. Principal component analysis (PCA)

Spectral data contain variables that hold redundant information. These variables are collinear (i.e. highly correlated). Because of this high collinearity, a new co-ordinate system based on variance is required to represent the original data.

Principal component analysis (PCA) can be thought of as an algorithm that finds principal components (PCs) that either maximise variance or minimise the sum of squares of the residuals of the samples (Esbensen, Schonkopf, & Midtgaard, 1994). PCA continues to generate PCs that are orthogonal to each other. The orthogonality of the PCs removes collinearity among the spectral variables without eliminating spectral information.

PCA is done here to graphically determine grouping patterns (based on geographical origin) in an effort to classify sandalwood oil samples (qualitative difference) according to percentage of adulterants. In addition, this pattern recognition technique is used to observe the similarities among different oil samples. PCA is carried out on the selected spectral regions 1650–1900 and 2000–2500 nm at 1 nm intervals, as discussed earlier. Out of the 49 sample spectra recorded, 39 spectra involving all classes are randomly selected for PCA.

3.2.1. Spectral preprocessing

Before performing PCA, all of the sample spectra are pre-processed to reduce noise and enhance the spectral features. Various pre-processing techniques, namely first derivative, second derivative, none (without pre-processing), smoothing (Savitzky–Golay filters), autoscale, mean centre, multiplicative scatter correction and their various combinations, are investigated. The root mean square error of calibration (RMSEC) and the root mean square error of cross-validation (RMSECV) are used as tools to compare the pre-processing methods and to select the best. The quality of the results is compared using RMSEC values. Since smoothing (Savitzky–Golay filter of order 2 with a 15-point window) coupled with autoscale produced the lowest RMSECV/RMSEC values, this method is chosen as the best. The other pre-processing techniques did not yield good results for this application.
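A minimal sketch of the smoothing step, assuming a quadratic (order 2) Savitzky–Golay filter with a 15-point window as stated above. The closed-form weights are the standard ones for quadratic smoothing; the function names, and the choice to leave the first and last seven points unsmoothed, are illustrative assumptions rather than the PLS Toolbox implementation.

```python
def savgol_quadratic_weights(half_width):
    """Smoothing weights of a quadratic Savitzky-Golay filter, window = 2*half_width + 1."""
    m = half_width
    denom = (2 * m + 3) * (2 * m + 1) * (2 * m - 1)
    return [3.0 * (3 * m * m + 3 * m - 1 - 5 * i * i) / denom
            for i in range(-m, m + 1)]


def smooth(signal, half_width=7):
    """Smooth interior points; the first and last `half_width` points are left unchanged
    (a simplification made for this sketch)."""
    weights = savgol_quadratic_weights(half_width)
    out = list(signal)
    for k in range(half_width, len(signal) - half_width):
        window = signal[k - half_width: k + half_width + 1]
        out[k] = sum(w * x for w, x in zip(weights, window))
    return out


# A noise-free quadratic is reproduced exactly by a quadratic local fit.
quadratic = [0.5 * k * k - 3.0 * k + 2.0 for k in range(40)]
smoothed = smooth(quadratic)
```

Because the filter fits a local quadratic, it passes slowly varying band shapes almost unchanged while averaging out point-to-point noise, which is why the window length and polynomial order are reported together.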

3.2.2. Calibration and cross validation – optimum number of

components

In order to determine the optimum number of PCs, one must examine the percent variance captured (high is good) or the RMSEC values (low is good). Here, 2 components are enough to reach a stable minimum RMSEC value (0.413% v/v). Hence 2 PCs are used to optimise the PCA model performance and to minimise the model errors caused by underfitting and overfitting the data.

3.2.3. Model development using pre-processed data

A PCA model with 2 PCs is built from the pre-processed data to group the 5 classes of sandalwood oils of different origin. The first 2 PCs account for 82.49% of the data matrix variance: PC1 explains 52.60% and PC2 explains 29.89%. The eigenvalue of the covariance matrix is 3.94 × 10² for PC1 and 2.24 × 10² for PC2. Fig. 1 shows the scores of the first 2 PCs plotted in a scatter diagram.
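The relation between scores, eigenvalues and percent variance captured can be sketched with a small NIPALS-style PCA. NIPALS is one classical way of computing principal components; whether the toolbox used in this study relies on it is not stated, so the algorithm choice is an assumption. The toy matrix (5 samples, 3 variables) is invented for illustration and the function names are hypothetical.

```python
import math


def autoscale(X):
    """Centre each column to zero mean and scale it to unit sample standard deviation."""
    n, p = len(X), len(X[0])
    cols = []
    for c in range(p):
        col = [row[c] for row in X]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        cols.append([(v - mean) / sd for v in col])
    return [[cols[c][i] for c in range(p)] for i in range(n)]


def nipals_pca(X, n_components, tol=1e-20, max_iter=500):
    """Return (scores, loadings) of the leading principal components via NIPALS."""
    E = [row[:] for row in X]                  # residual matrix, deflated per component
    n, p = len(E), len(E[0])
    scores, loadings = [], []
    for _ in range(n_components):
        # Start from the residual column with the largest sum of squares.
        start = max(range(p), key=lambda c: sum(row[c] ** 2 for row in E))
        t = [row[start] for row in E]
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            load = [sum(E[i][c] * t[i] for i in range(n)) / tt for c in range(p)]
            norm = math.sqrt(sum(v * v for v in load))
            load = [v / norm for v in load]    # unit-length loading vector
            t_new = [sum(E[i][c] * load[c] for c in range(p)) for i in range(n)]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change < tol:
                break
        for i in range(n):                     # deflate: remove this component
            for c in range(p):
                E[i][c] -= t[i] * load[c]
        scores.append(t)
        loadings.append(load)
    return scores, loadings


# Toy data (not from the study): two correlated variables and one noisy one.
X_toy = [[2.0, 4.1, 1.0],
         [3.0, 6.0, 0.8],
         [4.0, 7.9, 1.1],
         [5.0, 10.2, 0.9],
         [6.0, 11.8, 1.2]]

Xs = autoscale(X_toy)
scores, loadings = nipals_pca(Xs, 2)
total_ss = sum(v * v for row in Xs for v in row)
eigenvalues = [sum(v * v for v in t) / (len(Xs) - 1) for t in scores]
explained = [sum(v * v for v in t) / total_ss for t in scores]
```

Each eigenvalue is the variance of the corresponding score vector, and dividing it by the total variance gives the fraction captured, which is how figures such as 52.60% and 29.89% are obtained from the eigenvalues quoted above.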

From Fig. 1, it is observed that the five classes A–E of sandalwood oils are distributed over the four quadrants. Oils from class E lie in the bottom left quadrant (negative PC1); oil classes B and D lie in the bottom right quadrant (positive PC1); oils from class A lie in the top right quadrant (positive PC2); and oil class C lies in the top left quadrant (positive PC2).

Class E is also the most distinct group in the data set: the distance between E and its nearest neighbour is greater than the nearest-neighbour distance of any other class on the score plot. A trend that stretches from class C to class B is also seen. We expect classes with similar scores to be similar. For example, classes B and D are close together, implying that they are similar. Classes that are diametrically opposed are ‘negatively correlated’, which suggests that they measure opposite properties. For instance, a high E value corresponds to a low B value and vice versa.

On assessment, the PCA is able to classify the sandalwood oil samples into four classes instead of the actual five, indicating that this model is not sufficient to distinguish the five different regions effectively. Hence unsupervised pattern recognition techniques such as HCA or SOM are suggested.

3.3. Support vector machine regression (SVM R)

Quantification of the adulterant level is carried out by SVR. The same spectral regions used for the PCA are also used for the SVR model. 39 samples are selected for calibration and 10 independent samples involving all classes are chosen for prediction. Savitzky–Golay smoothing (order 2, 15-point window) coupled with mean centring is used as the pretreatment method for denoising.

The support vector machine (SVM) is a novel machine learning method based on statistical learning theory. It aims to minimise both the training error (empirical risk) and the model complexity, thereby providing high generalisation ability (estimation accuracy) (Vapnik, 1995). SVM provides nonlinear and robust solutions by mapping the input space into a higher dimensional feature space using kernel functions. SVM has been extended to nonlinear regression estimation problems in the form of support vector regression (SVR), which has been shown to exhibit excellent performance (Smola, 1996). Detailed descriptions of SVR can be found in Vapnik (1995) and Smola (1996).

It is well known that the selection of the kernel function and the corresponding parameters plays an important role in obtaining good forecasts. There are many types of kernel functions. In this study, the commonly used radial basis function (RBF) is adopted. The accuracy of SVR depends on the selection of the hyperparameters (cost C and epsilon ε) and the kernel parameter (gamma γ). ε is the precision parameter; a bigger ε leads to fewer support vectors. C is the penalty factor. It determines the trade-off between the model complexity and the degree to which deviations larger than ε are tolerated. If C is too small, it may cause underfitting; if it is too large, it may cause overfitting.
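The roles of gamma (γ) and epsilon (ε) can be made concrete with the two functions below. The kernel is written in the parameterisation exp(−γ‖x − z‖²) that is common in SVM software such as LIBSVM; that this is the exact form used in the study is an assumption, and the function names are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Loss that is zero inside the epsilon tube and grows linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


same_point = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.001)   # identical inputs
far_points = rbf_kernel([0.0, 0.0], [3.0, 4.0], gamma=0.001)   # squared distance 25
inside_tube = epsilon_insensitive_loss(1.0, 1.005, epsilon=0.01)
outside_tube = epsilon_insensitive_loss(1.0, 1.05, epsilon=0.01)
```

Samples whose prediction error stays within ε contribute no loss and therefore do not become support vectors, which is why a larger ε yields fewer of them; C then weights the remaining errors against the smoothness of the fitted function.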

Suitable values of the parameters C, ε and γ are obtained after several trials. The root mean square errors (RMSEC/RMSECV) are compared (a low value is good) to arrive at a suitable choice of these parameters; in the present study, the values giving RMSEC = 0.007293 and RMSECV = 0.007159 are chosen. In this algorithm C = 31.6228, ε = 0.01 and γ = 0.001. The number of support vectors (SVs) is 3. Leave-one-out cross-validation is used to generate the SVM model on the input data set. Fig. 2 shows the cross-validation optimisation.
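Leave-one-out cross-validation can be sketched as follows. To keep the example self-contained, a straight-line least-squares fit stands in for the SVR model, which is a deliberate substitution; in the study the model refitted in each round would be the SVR with one candidate set of C, ε and γ. The data and function names are illustrative.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b*x (stand-in model, not SVR)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope


def loo_rmsecv(xs, ys):
    """Leave-one-out RMSECV: refit without sample i, then predict sample i."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        squared_errors.append((intercept + slope * xs[i] - ys[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys_exact = [1.0 + 2.0 * x for x in xs]     # perfectly linear toy response
ys_noisy = [1.1, 2.9, 5.2, 6.8, 9.1]       # same trend with small deviations

rmsecv_exact = loo_rmsecv(xs, ys_exact)
rmsecv_noisy = loo_rmsecv(xs, ys_noisy)
```

Repeating this calculation for every candidate parameter combination and keeping the one with the lowest RMSECV is the usual way such a search is run. The reported C = 31.6228 equals 10^1.5, which would be consistent with a logarithmically spaced grid, although the source does not state how the trial values were chosen.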

The correlation coefficient (R) and the root mean square error of prediction (RMSEP) are used to judge the performance of the SVR model in predicting the percentage of adulteration. A coefficient of determination of R2 = 0.9998 (RMSEP = 0.00804) is achieved using the RBF kernel. The plot of actual against predicted concentrations in Fig. 3, together with the high R2 value, suggests good performance of the RBF kernel for this NIR data set.
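The two evaluation figures can be computed as below. The source does not give its formulas, so the usual definitions are assumed: RMSEP as the root of the mean squared prediction error over the test set, and R2 as one minus the ratio of residual to total sum of squares. The numbers are toy values, not results from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0]         # toy reference concentrations
predicted = [1.1, 1.9, 3.1, 3.9]      # toy model predictions

toy_rmsep = rmse(actual, predicted)
toy_r2 = r_squared(actual, predicted)
```

The same `rmse` function gives RMSEC when applied to the calibration samples and RMSECV when applied to cross-validated predictions; only the set of predictions changes.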

3.4. Hierarchical cluster analysis (HCA)

HCA is an unsupervised technique for solving classification problems. Its aim is to sort cases into clusters such that the degree of association is strong between members of the same cluster and weak between members of different clusters. It is a statistical tool for selecting relatively homogeneous groups of cases based on measured variables. It starts with each case in a separate cluster and then combines the clusters sequentially, reducing the number of clusters at each step until only one cluster is left (Massart, Vandeginste, Buydens, Jony, & Lewi, 1997).

In this study, HCA is carried out on the spectral regions 1650–1900 and 2000–2500 nm, at 1 nm spacing, of the NIR spectra of the sandalwood oils. The algorithm used to classify the oil samples from their NIR spectra is the K-nearest neighbour linkage method, which calculates the Mahalanobis distance.

Fig. 4 shows the results of the HCA. Five clusters can be identified. Clusters A–D contain 8 oil samples each, and cluster E consists of 7 samples.

The dendrogram shows that the five clusters of oil samples are well separated from each other. Cluster E is distinct and is separated from the other four classes. There is no overlap between the classes of oils. This means that 100% correct classification is possible by HCA.
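The agglomerative procedure described in this section can be sketched with nearest-neighbour (single) linkage. For brevity the sketch swaps in the Euclidean distance in place of the Mahalanobis distance used in the study, and it stops at a requested number of clusters instead of building the full dendrogram. The points and function names are illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(points, n_clusters):
    """Merge the two closest clusters repeatedly until `n_clusters` remain.
    Returns the clusters as sorted lists of point indices."""
    clusters = [[i] for i in range(len(points))]    # every case starts alone
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(euclidean(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy 2-D points (not spectra): two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
groups = single_linkage(points, 3)
```

Recording the distance at which each merge happens, instead of stopping early, gives the heights drawn in a dendrogram such as Fig. 4.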

3.5. Self organising map (SOM)

The self-organising map proposed by Kohonen is also suitable and efficient for performing unsupervised clustering. A SOM consists of a two-dimensional grid of computational units called neurons. SOM is similar to PCA in that it performs dimensionality reduction and classification. The difference between the two approaches is that the SOM performs a nonlinear lower dimensional mapping, while PCA is a linear mapping technique. A SOM is a “map” of the training data, dense where there is a lot of data and thin where the data density is low. The map consists of neurons located on a regular map grid. The lattice of the grid can be either hexagonal or rectangular. Each neuron (hexagon) has an associated prototype vector (weight). The algorithm trains the SOM iteratively. After training, neighbouring neurons have similar prototype vectors.

Fig. 1. Trends in principal components in classification of sandal oils.

Fig. 2. Contour plot of cross validation accuracy for SVM regression.

Fig. 3. Actual vs. predicted concentration by using radial basis kernel of support vector machines.

In each training iteration, a sample vector X is selected from the input data set and the grid node that is nearest to X (also called the “best matching unit”, BMU) is determined. The BMU of a data vector is the unit on the map whose model vector best resembles the data vector. In practice, the similarity is measured as the minimum distance (commonly evaluated using the Euclidean metric) between the data vector and each model vector on the map. In the next step, the weight vector of the BMU and those of its grid neighbours are moved closer to the input vector X using the Kohonen learning rule. The result of such reorganisation is that similar weight vectors are brought closer to each other while the dissimilar ones are left apart. Implementing this procedure iteratively forces the randomly initialised weight vectors to mimic the distribution of input data patterns in the output space.
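One training iteration of this kind can be sketched as follows. The sketch assumes a small rectangular grid, a Gaussian neighbourhood function and a fixed learning rate; the study itself reports a 10 × 10 map trained for 10,000 iterations, and its neighbourhood function and learning-rate schedule are not given, so these choices and the function names are assumptions.

```python
import math


def best_matching_unit(weights, x):
    """Return (row, col) of the node whose weight vector is closest to x."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def som_update(weights, x, learning_rate, sigma):
    """One Kohonen step: pull the BMU and its grid neighbours towards x (in place)."""
    bmu_r, bmu_c = best_matching_unit(weights, x)
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            grid_d2 = (r - bmu_r) ** 2 + (c - bmu_c) ** 2
            h = math.exp(-grid_d2 / (2.0 * sigma ** 2))   # Gaussian neighbourhood
            row[c] = [wi + learning_rate * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu_r, bmu_c


# Toy 2 x 2 map with 2-dimensional weight vectors (not the 10 x 10 map of the study).
weights = [[[0.0, 0.0], [1.0, 0.0]],
           [[0.0, 1.0], [1.0, 1.0]]]
sample = [0.9, 0.2]
bmu = som_update(weights, sample, learning_rate=0.5, sigma=1.0)
```

In a full training run the learning rate and the neighbourhood width are normally reduced over the iterations, so that the map first orders itself globally and then fine-tunes locally.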

A visual inspection of the trained SOM is provided by the widely used method described below. The unified distance matrix (UDM) provides significant information in the form of distances between nodes of the SOM grid. In this method, a matrix of distances (called the “U-matrix”) between the d-dimensional weight vectors of neighbouring nodes of the two-dimensional SOM is computed. The U-matrix distances can be used to unravel the structure of the data clusters present in the data set under study. The density of the weight vectors is illustrative of the density of the input data patterns. Accordingly, the UDM measuring distances between the weight vectors is indicative of this density, and a suitable representation such as grey level or colour imaging can be designed to interpret the distances between two neighbouring grid nodes. The optimum size of the two-dimensional SOM grid is selected by training the SOM with different grid sizes and a pre-specified number of training iterations. The optimum grid size obtained thereby contains an array of [10 × 10] nodes. Here, the SOM algorithm is run for 10,000 training iterations. The results of the SOM-based classification are portrayed in the form of a U-matrix plot in Fig. 5. In this figure, the actual data points are also plotted as dark coloured hexagons. In the U-matrix plot, a dark coloured node indicates that its weight vector is at a greater distance from those of the adjoining light coloured nodes.
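A simplified U-matrix can be sketched by averaging, for every node, the distances between its weight vector and those of its grid neighbours. The sketch assumes a rectangular grid with four neighbours per node, whereas the map in this study is drawn with hexagons, and it returns one value per node, which is only one of several U-matrix variants. The weights and function name are illustrative.

```python
import math


def u_matrix(weights):
    """Mean Euclidean distance from each node's weight vector to its grid neighbours."""
    rows, cols = len(weights), len(weights[0])
    u = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    d = math.sqrt(sum((a - b) ** 2
                                      for a, b in zip(weights[r][c], weights[rr][cc])))
                    dists.append(d)
            u[r][c] = sum(dists) / len(dists)
    return u


# Toy 1 x 4 map with 1-dimensional weights: two groups separated by a large gap.
toy_weights = [[[0.0], [0.1], [5.0], [5.1]]]
toy_u = u_matrix(toy_weights)
```

Nodes on either side of the gap receive large values, which correspond to the dark boundary nodes of a U-matrix plot, while nodes inside a group receive small values.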

The labeled matrix is shown in Fig. 6.

Both Figs. 5 and 6 show distinct differentiation of the 5 classes

of sandalwood oils based on their origin.

4. Conclusion

The results of this research confirm that near infrared spectroscopy can be used for the qualitative as well as quantitative analysis of the composition of sandalwood oils. Sandalwood oils of the same species, produced by various extraction methods (either developed or underdeveloped) and collected from different regions (origin), can be easily discriminated by the differences in their NIR spectra. Near infrared spectroscopy assisted by multivariate chemometric techniques, viz. principal component analysis, hierarchical cluster analysis or self organising map, can be successfully applied to the classification of sandalwood oils according to their quality. Quantification of the constituents, especially adulterants if any, in sandalwood oils is achieved by support vector machine regression with proper combinations of data pretreatments.

In a nutshell, the NIR spectroscopy technique has high potential to determine the qualitative differences and, simultaneously, to quantify the composition of essential oils.

Acknowledgements

The authors are grateful to Mr. Xavier T. S., Research Scholar, Centre for Molecular and Biophysics Research, Mar Ivanios College, Thiruvananthapuram, Kerala, India for valuable suggestions, and also to STIC, Cochin University of Science and Technology (CUSAT), Kerala, India for technical assistance and help.

Fig. 4. HCA dendrogram of sandal oils using Mahalanobis distances (A–E indicate various classes of oils).

Fig. 5. Unified distance matrix or SOM neighbour distances.

Fig. 6. Labeled matrix or SOM sample hits.

References

Armenta, S., Garrigues, S., & Guardia, M. I. (2007). Determination of edible oil parameters by near infrared spectrometry. Analytica Chimica Acta, 596(2), 330–337.

Bewig, K. M., Clarke, A. D., Roberts, C., & Unklesbay, N. (1994). Discriminant analysis of vegetable oils using near infrared spectroscopy. Journal of American Oil Chemists Society, 71, 195–201.

British Pharmaceutical Codex (1949). The Pharmaceutical Society, London, pp. 611–612.

CoE (2000). Santalum album L., Natural sources of flavorings, Council of Europe publishing, Strasbourg Cedex, France, pp. 235–236.

Esbensen, K., Schonkopf, S., & Midtgaard, T. (1994). Multivariate analysis in practice. CAMO AS, Wennbergs Trykkeri AS, Trondheim, Norway.

FCC (2003). Sandalwood oil, East Indian type, Food Chemicals Codex (5th ed.), National Academy Press, Washington DC, pp. 395–397.

Fox, J. E. D. (2000). Sandalwood: The royal tree. Biologist (London), 47(6), 31–34.

Hourant, P., Baeten, V., Morales, M. T., Meurens, M., & Aparicio, R. (2000). Oil and fat classification by selected bands of near infrared spectroscopy. Applied Spectroscopy, 54(8), 1168–1174.

ISO 3518 (2002). Oil of sandalwood (Santalum album L.) (2nd ed.), International Organization for Standardization.

Jones, C. G., Ghisalberti, E. L., Plummer, J. A., & Barbour, E. L. (2006). Quantitative co-occurrence of sesquiterpenes; a tool for elucidating their biosynthesis in Indian sandalwood Santalum album. Phytochemistry, 67(22), 2463–2468.

Jones, C. G., Plummer, J. A., & Barbour, E. L. (2007). Non-destructive sampling of Indian sandalwood, Santalum album L. for oil content and composition. Journal of Essential Oil Research, 19, 157–164.

Kim, T. H., Ito, H., Hayashi, K., Hasegawa, T., Machiguchi, T., & Yoshida, T. (2005). Aromatic constituents from the heartwood of Santalum album L. Chemical Pharmaceutical Bulletin, 53, 641–644.

Kohonen, T., Oja, E., Simula, O., Visa, A., & Kangas, J. (1996). Engineering applications of the self organizing map. Proceedings of the IEEE, 84(10), 1358–1384.

Kumar, S., & Maddan, T. R. (1979). A rapid method for detecting adulteration in essential oils. Research and Industry, 24, 180–182.

Kuriakose, S., Thankappan, X., Joe, H., & Venkataraman, V. (2010). Detection and quantification of adulteration in sandalwood oil through near infrared spectroscopy. Analyst, 135, 2676–2681. http://dx.doi.org/10.1039/C0AN00261E.

Martin, M. J., Pablos, F., & Gonzalez, A. G. (1996). Application of pattern recognition to the discrimination of roasted coffees. Analytica Chimica Acta, 320, 191–197.

Massart, D.L., Vandeginste, L.M.C., Buydens, S. Jony, P.J.D., & Lewi, S.V. (1997). Handbook of Chemometrics and Qualimetrics: Part A, Amsterdam, Elsevier.

Matlab 7.0.1 (2010). Natick, USA, Mathworks Inc.

Piggott, M. J., Ghisalberti, E. L., & Trengove, R. D. (1997). Western Australian sandalwood oil extraction by different techniques and variations of the major components in different sections of a single tree. Flavour and Fragrance Journal, 12, 43–46.

Radomiljac, A.M., Ananthapadmanabha, H.S., Welbourn, R.M., & Rao, K.S. (1998), Eds., Sandal and its products, ACIAR Proceedings No. 84, Australian Centre for International Agricultural Research, Canberra.

Savitzky, A., & Golay, M. J. E. (1964). Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry, 36, 1627–1639.

Schulz, H., Prews, H. H., & Kruger, H. (1999). Rapid NIRS determination of quality parameters in leaves and isolated essential oils of menthe species. Journal of Essential Oil Research, 11, 185–190.

Smola, A. J. (1996). Regression estimation with support vector learning machines. Master’s thesis: Technische Universität München.

Vapnik, V. N. (1995). The nature of statistical learning theory. New York: Springer-Verlag.

Verghese, J., Sunny, T. P., & Balakrishnan, K. V. (1990). (+)-α-santalol and (−)-β-santalol (Z) concentration, a new quality determinant of East Indian sandalwood oil. Flavour and Fragrance Journal, 5, 223–226.

Westad, F., Schmidt, A., & Kemit, M. (2008). Incorporating chemical band assignment in near spectroscopy regression models. Journal of Near Infrared Spectroscopy, 16, 265–273.

Wise, B.M., Gallagher, N.B., Bro, R., & Shaver, J.M. (2010). PLS Tool box 5.8.2, Eigenvector Research.

Wold, S., Esbensen, K., & Geladi, P. (1987). Principal component analysis-a tutorial. Chemometrics and Intelligent Laboratory Systems, 2, 37–52.


Feasibility of using near infrared spectroscopy to detect

and quantify an adulterant in high quality sandalwood oil

Saji Kuriakose 1, I. Hubert Joe ⇑

Centre for Molecular and Biophysics Research, Department of Physics, Mar Ivanios College, Thiruvananthapuram 695 015, Kerala, India

Highlights

• Sequential spectra approach is used to observe the wavelength relevance on prediction.

• Synergic and descriptive relations among consecutive wavelength regions are studied.

• Oil authenticity and adulteration can be detected with less than 0.0001% error.


Article info

Article history:

Received 20 March 2013

Received in revised form 13 June 2013

Accepted 19 June 2013

Available online 1 July 2013

Keywords:

Oil authenticity

Economic adulteration

NIR spectroscopy

PLSR

LWR

Abstract

Determination of the authenticity of essential oils has become more significant in recent years, following some illegal adulteration and contamination scandals. The present investigative study focuses on the application of near infrared spectroscopy to detect sample authenticity and quantify economic adulteration of sandalwood oils. Several data pre-treatments are investigated for calibration and prediction using partial least square regression (PLSR). The quantitative data analysis is done using a new spectral approach, full spectrum or sequential spectrum. The optimum number of PLS components is obtained according to the lowest root mean square error of calibration (RMSEC = 0.00009% v/v). The lowest root mean square error of prediction (RMSEP = 0.00016% v/v) in the test set and the highest coefficient of determination (R2 = 0.99989) are used as the evaluation tools for the best model. A nonlinear method, locally weighted regression (LWR), is added to extract nonlinear information and to compare with the linear PLSR model.

© 2013 Elsevier B.V. All rights reserved.

Introduction

Sandalwood oil, a volatile essential oil obtained by the steam distillation of the dried wood from the trunk and roots of a small hemi-parasitic tree, is an economically important product. The main constituents of the oil (70–90%) are alcohols, predominantly α-santalol and β-santalol. It also contains resins, santalenes and many other minor sesquiterpenoids [1]. α-Santalol and β-santalol are responsible for the unique sandalwood fragrance. Sandalwood oil serves as a fixative for many perfumes. It is used in cosmetic products and as a flavor component in many food products. This essential oil is found to have antiviral, anti-carcinogenic and bactericidal activity. Hence the oil is used medicinally for common colds, fever, bronchitis, inflammation of the mouth and pharynx, infection of the urinary tract, liver and gall bladder complaints and other maladies [2].

In recent years, almost all of the sandalwood oil traded internationally is called East Indian Sandalwood oil. It is distilled from the trunk and roots of Santalum album (family Santalaceae). Santalum spicatum sandalwood and its oil are obtained from a small, wild growing shrub. Although similar to East Indian sandalwood oil, top notes of S. spicatum sandalwood oil are characteristically different [3].

Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 115 (2013) 568–573

journal homepage: www.elsevier.com/locate/saa

1386-1425/$ - see front matter © 2013 Elsevier B.V. All rights reserved.

http://dx.doi.org/10.1016/j.saa.2013.06.076

⇑ Corresponding author. Tel.: +91 471 2531053; fax: +91 471 2530023.

E-mail addresses: [email protected] (S. Kuriakose), hubertjoe@gmail.com (I.H. Joe).

1 Present address: St. Thomas H.S.S., Pala, Kerala 686 575, India.

The Indian sandalwood oil (species S. album) contains 71.2–90% total santalol (46.6–59.9% α-santalol and 24.6–30.1% β-santalol), and the sandalwood oil from S. spicatum contains 38.7–46.1% total santalol (27.9–35.3% α-santalol and 10.8% β-santalol) [4]. The minimum level of total santalol content stipulated by regulatory agencies is nearly 90%. Sandalwood oils having total santalol content around this level are considered superior or high quality oils, and those much below this level are considered inferior or low quality oils.

Due to its sensory quality, extensive use and steep rise in price, sandalwood oil is subjected to two types of adulteration [5,6]. The first is the blending of higher grade, high cost sandalwood oil with lower grade sandalwood oil (containing a low percentage of santalol). The second is the mixing of sandalwood oil with synthetic or semi-synthetic substitutes such as Sandalore®. Adulteration with low cost or low quality oils may afford considerable profits from the economic point of view. It is a serious problem for regulatory agencies and oil suppliers and can be a threat to the health of consumers. The issue of oil authenticity is mainly related to improper labeling and substitution, in part or whole, of cheaper products for high cost ones. Low cost oil substitution has recently become the main form of sandalwood oil adulteration (as per details collected from local vendors).

The determination of authenticity for sandalwood oils is traditionally a time-consuming and laborious process, typically using chromatographic methods like HPLC and GC. Near infrared spectroscopy (NIRS) aided by chemometric methods can be an effective analytical technique to verify the authenticity of oils due to its simplicity, rapidity and ease of sample preparation. NIRS has been used in the quantitative measurement of adulteration of virgin olive oils by different vegetable oils or by olive pomace oils [7–10]. The study by Christy et al. [7] revealed that the models could predict the adulterants corn oil, sunflower oil, soya oil, walnut oil and hazelnut oil in olive oil with error limits of ±0.57, ±1.32, ±0.96, ±0.56 and ±0.57% weight/weight, respectively. Although sandalwood oil has been used for centuries in cosmetics, food and flavors, no studies other than ours [11,12] evaluating its adulteration level using NIRS are found in the scientific literature so far.

The aim of this study is to investigate the use of near infrared

spectroscopy combined with chemometric data analysis to address

the issue of oil authenticity and economic adulteration in sandal-

wood oil. Partial least square regression (PLSR), a factor based lin-

ear multivariate calibration method, is used to resolve the spectral

data into ‘loadings’ and ‘scores’ and to build the corresponding cal-

ibration model from the new variables. Also, locally weighted

regression (LWR) using local PLS algorithm (a nonlinear method)

is used to detect and quantify the adulteration with other low cost

oils in high quality, high cost sandalwood oils.

Materials and methods

Materials

Sandalwood oils of two different species (higher grade and lower grade) were procured. Indian sandalwood oil (S. album) was obtained from Mysore Sandalwood Industries, Karnataka state, India. This oil sample was highly concentrated and expensive, and satisfied the conditions stipulated by Indian regulatory authorities. The sandalwood oil of S. spicatum was provided by a local vendor. It was thin, colorless, odorless and less expensive. Both oil samples were stored at 4 °C in aluminum bottles and protected from light until they were analyzed. The oils were analyzed without any prior treatment. The sandalwood oil of S. spicatum was chosen as the adulterant because of its year-round availability and low cost compared to S. album (as per details collected from local vendors at ‘Marayoor’, a place in Idukki district, Kerala, India, where sandal trees are widely cultivated and harvested).

Samples preparation

Prior to sample preparation and spectroscopic measurement, both oils are brought to ambient temperature (20 °C). Each oil is diluted to 10% (v/v) with a solvent, carbon tetrachloride (see footnote 2), in separate flasks and homogenized using an electromagnetic stirrer. Stringent protocols (normal temperature and pressure, non-exposure to direct light, accurate measurement using very precise instruments, etc.) are applied to avoid any sample loss or change during preparation. Samples are prepared by mixing a given percentage of the diluted low cost oil (S. spicatum) with the diluted high cost oil (S. album), both in the same solvent. The relative S. spicatum oil fraction (% v/v) in the samples varies from 0% to 100%. The samples prepared represent neat S. spicatum (taken as 100% adulteration), authentic, neat S. album (taken as 0% adulteration) and a range of blended samples containing 1% and 5–95% of S. spicatum oil in 5% increments by volume. Thus a set of 22 classes of oil samples is prepared and used for calibration (class 0: 0% S. spicatum and 100% S. album; class 1: 1% S. spicatum and 99% S. album; class 5: 5% S. spicatum and 95% S. album; and so forth until the 22nd class: 100% S. spicatum and 0% S. album). An independent set of 22 classes of sandalwood oil samples collected from vendors at different regions of ‘Marayoor’ is used for prediction. The samples are labeled as ‘training set’ (n = 22) and ‘test set’ (n = 22) separately. All the samples are kept in glass bottles. All measurements are carried out at 20 °C in air-conditioned rooms.
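The blending scheme described above can be written out as a short check on the class count. This is a minimal sketch; the function name and the tuple layout are illustrative choices, not part of the original study.

```python
def adulteration_levels():
    """Return the % v/v S. spicatum levels named in the text:
    0, 1, then 5 to 95 in steps of 5, and 100."""
    return [0, 1] + list(range(5, 100, 5)) + [100]


levels = adulteration_levels()

# Each blend as (S. spicatum %, S. album %); the two fractions sum to 100.
blends = [(spicatum, 100 - spicatum) for spicatum in levels]

print(len(levels))   # number of calibration classes
print(blends[:3])    # first three blends
```

Counting the levels this way gives the 22 calibration classes stated in the text.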

Near infrared spectra acquisition

NIR spectra of the 44 samples are recorded over the 900–2000 nm spectral region at 1 nm intervals, for a total of 1100 wavelengths, using a JASCO V-570 NIR spectrophotometer (JASCO INTERNATIONAL CO., Ltd., 4-21, Sennin-cho 2-chome, Hachioji, Tokyo 193-0835, Japan) with a resolution of 0.05 nm. The blending of sandalwood oils is carefully carried out in a 4 mm quartz cuvette, starting with pure sandalwood oil in the cuvette first. Oil samples at 20 °C are scanned and one spectrum per sample (an average of eight scans) is recorded. All the spectra are recorded in absorbance units.

Chemometrics

Since spectral data contain noise and extra information irrelevant to the calibration, a robust model is necessary to extract relevant information for the prediction of adulteration percentage (response variable). Several spectral data pre-treatments, namely first and second derivatives using the Savitzky–Golay algorithm, smoothing (window 15 pt, order 2), mean centre, autoscale, smoothing + mean centre and smoothing + autoscale, are explored and the results are compared to assess the best pre-treatment and regression model combination for resolving the oil authenticity. PLSR [13] is performed using the algorithm from PLS Toolbox 7.0.3 [14] running in the MATLAB environment [15] to develop the model.

2 Disclaimer: The authors disclaim any obligation, responsibility or liability which concerns or relates to the handling of the chemical.

Two spectral approaches are studied to build the calibration models: the full spectrum, and a sequential spectrum in which 100 nm wavelength regions are added sequentially to the previous model to determine the optimal solution. The purpose is to investigate whether combining some wavelengths that are not necessarily important by themselves (synergic wavelengths) with those containing problem-dependent information (descriptive wavelengths) improves the model performance. The wavelengths are added ‘sequentially’ to the previous model so as to check the improvement in the performance of the new model over the previous one and also to determine the effective wavelength range of the best, most reliable model.
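The smoothing + mean centre pre-treatment named in this section can be sketched in plain Python. The study itself used PLS Toolbox under MATLAB, so the code below is only an illustration under stated assumptions: the function names are hypothetical, the Savitzky–Golay weights come from the standard closed form for a quadratic fit, and the first and last seven points of each spectrum are simply copied unchanged, which may differ from the toolbox's edge handling.

```python
def savgol_weights(half_width):
    """Savitzky-Golay smoothing weights for a quadratic (order 2) fit
    over a window of 2 * half_width + 1 points (closed-form expression)."""
    k = half_width
    norm = (2 * k + 3) * (2 * k + 1) * (2 * k - 1)
    return [3 * (3 * k * k + 3 * k - 1 - 5 * i * i) / norm
            for i in range(-k, k + 1)]


def smooth(spectrum, half_width=7):
    """Smooth one spectrum with a 15-point window by default.
    Assumption: edge points without a full window are left unchanged."""
    weights = savgol_weights(half_width)
    out = list(spectrum)
    for c in range(half_width, len(spectrum) - half_width):
        window = spectrum[c - half_width:c + half_width + 1]
        out[c] = sum(w * x for w, x in zip(weights, window))
    return out


def mean_centre(spectra):
    """Subtract the per-wavelength mean, taken over all samples, from each spectrum."""
    n = len(spectra)
    means = [sum(column) / n for column in zip(*spectra)]
    return [[x - m for x, m in zip(row, means)] for row in spectra]


def preprocess(spectra):
    """Smoothing (15 pt, order 2) followed by mean centring."""
    return mean_centre([smooth(row) for row in spectra])
```

In practice the mean used for centring would be computed on the calibration spectra and then reused for the test spectra; that bookkeeping is omitted here for brevity.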

The effects of the data pre-treatment methods on the performance of the PLS models are evaluated in terms of the root mean square error of calibration (RMSEC), which indicates how well the selected model fits the calibration data; the root mean square error of prediction (RMSEP), the error expected when the model is used in future predictions; and the coefficients of determination for calibration (R2 cal.) and prediction (R2 pred.), which describe the relationship between measured and predicted values.
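The evaluation parameters above can be made concrete with a small sketch. The exact formulas used by PLS Toolbox are not given in the text, so two conventions here are assumptions: the root mean square error divides by the number of samples, and R2 is computed as one minus the ratio of residual to total sum of squares. The same `rmse` function yields RMSEC when applied to calibration samples and RMSEP when applied to test samples. The numbers in the usage lines are made up for illustration and are not data from the study.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values (divisor n)."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    """Coefficient of determination, computed as 1 - SS_res / SS_tot."""
    mean = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Illustrative values only (% v/v adulteration); not taken from the paper.
measured = [0.0, 50.0, 100.0]
predicted = [1.0, 50.0, 99.0]
print(rmse(measured, predicted), r_squared(measured, predicted))
```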

Generally, the NIR spectral data contain both linear and nonlin-

ear information. The PLSR model extracts only the linear informa-

tion from the spectral data. The nonlinear method, LWR (locally

weighted regression) is added to extract nonlinear information

and to compare with the PLSR model.

LWR [15] calculates a single locally weighted regression model using the given number of principal components to predict a dependent variable Y (adulteration percentage) from a set of independent variables X (spectral data). The LWR model is useful for performing predictions when the dependent variable y has a nonlinear relationship with the measured independent variables x. This model works by choosing a subset of the calibration data to create a local model for a given new sample. Once the samples are selected, one of three algorithms, namely global principal component regression, principal component regression or partial least square regression, is used to calculate the local model. In this study, the raw data of the selected samples are used to create a weighted PLS model (local PLS). The resulting models are more adaptable to highly varying nonlinearity.
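The idea of building a local model from the calibration samples most similar to a new sample can be illustrated with a deliberately simplified stand-in. The sketch below is not the local PLS algorithm of PLS Toolbox: it uses a single predictor (for example one score value) instead of spectra, a weighted straight-line fit instead of PLS, and tricube distance weights, all of which are assumptions made for illustration. The function name is hypothetical; the default of 7 local samples mirrors the number reported later in the text.

```python
def lwr_predict(x_cal, y_cal, x_new, n_local=7):
    """Predict y at x_new from a weighted straight-line fit to the
    n_local calibration points nearest to x_new (simplified LWR)."""
    # Select the most similar calibration samples.
    nearest = sorted(zip(x_cal, y_cal), key=lambda p: abs(p[0] - x_new))[:n_local]
    # Scale slightly beyond the farthest neighbour so every weight stays positive.
    d_max = max(abs(x - x_new) for x, _ in nearest) * 1.001 or 1.0
    # Tricube weights: near 1 close to x_new, near 0 at the farthest neighbour.
    pts = [(x, y, (1 - (abs(x - x_new) / d_max) ** 3) ** 3) for x, y in nearest]
    sw = sum(w for _, _, w in pts)
    xm = sum(w * x for x, _, w in pts) / sw
    ym = sum(w * y for _, y, w in pts) / sw
    sxx = sum(w * (x - xm) ** 2 for x, _, w in pts)
    sxy = sum(w * (x - xm) * (y - ym) for x, y, w in pts)
    slope = sxy / sxx if sxx > 0 else 0.0
    return ym + slope * (x_new - xm)
```

Because each prediction refits the line on a different neighbourhood, the fitted slope can change along the predictor axis, which is how a locally linear model follows a globally nonlinear relationship.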

Results and discussion

NIR spectra of sandalwood oils

The absorbance spectra for pure and blended mixtures of

S. album and S. spicatum (used as adulterant) over the spectral

range 900–2000 nm are shown in Fig. 1. (Spectra of 22 sample clas-

ses with different relative fractions 0–100% v/v of S. spicatum in S.

album oil).

Overall, a lower absorbance is visible for samples containing 100% S. album, whereas the higher absorbance values represent samples containing 100% S. spicatum oil. As the percentage of adulteration increases, the sample absorbance also increases at every wavelength.

According to our former studies performed on sandalwood oils,

the near infrared spectra (900–2000 nm) of the analyzed oil sam-

ples are dominated by overtones and different combinations of

C–H stretching and bending vibrations [11,12]. Fig. 2 shows the

representative spectra of pure and neat oil of S. album and adulter-

ant oil.

Peaks are observed at 1152 nm, 1205 nm (due to the adulterant only), 1365 nm, 1411 nm, 1650 nm, 1693 nm, 1712 nm, 1755 nm (due to the adulterant only), 1836 nm and 1891 nm. The 1152 and 1205 nm peaks are related to methylene, methyl and ethenyl C–H stretch (2nd overtone bands); the 1365 and 1411 nm peaks are assigned to methylene and methyl C–H stretch and bending combinations (2nd overtone and 2n combination bands); the 1650, 1693, 1712, 1755 and 1836 nm peaks are attributed to methylene, methyl and ethenyl asymmetric stretching (1st overtone C–H stretch bands); the 1891 nm peak is a combination band from moisture (H2O) and carries little information for the characterization of oils [16,17]. Hence the 1891 nm peak is excluded from the wavelength selection. When any of these specific peaks alone is targeted for calibration, the performance of the models is worse. This means that the information incorporated into the models is due to the combined effect of wavelengths that contain relevant information, which makes it necessary to apply a ‘spectral approach’ for selecting the effective wavelength range.

Partial least square regression (PLSR)

The orthogonality property of the latent variables removes the collinearity among spectral variables without eliminating spectral information.

Fig. 1. NIR absorbance spectra of 22 classes of sandalwood oils – relative fraction

0–100% v/v of S. spicatum in S. album oil. Samples are named according to % of

adulteration of S. spicatum oil. (0% = 0% S. spicatum and 100% S. album; 1% = 1% S.

spicatum and 99% S. album; 100% = 100% S. spicatum and 0% S. album; 5–95%

according to 5% increment in adulteration percentage of S. spicatum). The bottom

spectrum represents 0% class and the topmost spectrum represents 100% class. The

intermediate spectra represent 1%, 5–95% classes. (1836 nm indicates biomarker

peak).

Fig. 2. Representative spectra of neat high quality oil of S. album and adulterant oil.


Spectral approach for wavelength selection and data pre-treatment

Two approaches are investigated for each data pre-treatment (none or untreated data, 1st and 2nd derivatives, smoothing (Savitzky–Golay), mean centre, autoscale, smoothing (window 15 pt, order 2) + mean centre and smoothing + autoscale) to observe wavelength relevance on model prediction. The ‘full spectrum approach’ uses the full wavelength region, 1850–1100 nm (750 data points). In the ‘sequential spectrum approach’, a 100 nm region is added to the previous consecutive spectral region to determine a new model. Although the ‘sequential spectrum’ method may seem identical to interval PLS (iPLS), it is quite different from the standard iPLS approach [18]. The first model is developed on the 1850–1800 nm wavelength range. The remaining models are developed by adding a 100 nm wavelength range at a time, starting from the longer wavelength side since prominent peaks are seen there. Hence the second model corresponds to 1850–1700 nm, the third to 1850–1600 nm, and so on until the 8th model, which uses the 1850–1100 nm range. The wavelength region (the region of the spectra having the strongest correlation with the property of interest) corresponding to the model, out of the 8 models developed, that produces the least errors (RMSEC and RMSEP) is selected. On this basis, the spectral region 1850–1800 nm (containing the biomarker peak at 1836 nm) is chosen as the best wavelength region for this study, as is observed and confirmed in the results and discussion. If a 100 nm or 50 nm wavelength region is added starting from the shorter wavelength side, the performance of the models is comparatively low.
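The eight wavelength ranges of the sequential spectrum approach follow directly from the description above. The sketch below only enumerates the ranges; the function and parameter names are illustrative, and model fitting on each range is left out.

```python
def sequential_ranges(start=1850, first_stop=1800, final_stop=1100, step=100):
    """List the (long, short) wavelength limits in nm of each sequential model:
    the first range ends at first_stop, then each model extends the short
    wavelength limit by a further step until final_stop is reached."""
    ranges = [(start, first_stop)]
    stop = first_stop
    while stop > final_stop:
        stop -= step
        ranges.append((start, stop))
    return ranges


for number, (long_nm, short_nm) in enumerate(sequential_ranges(), start=1):
    print(f"model {number}: {long_nm}-{short_nm} nm")
```

The last range coincides with the full spectrum region, so the 8th sequential model and the full spectrum model use the same wavelengths.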

Optimum number of latent variables in model development

If the entire set of latent variables is used, there is no clear distinction between the structure part (problem-dependent information) and the noise. Hence, to determine the optimum number of latent variables to be used in the regression model, the relation ‘RMSEC vs. number of latent variables’ is evaluated. The number of latent variables corresponding to minimum RMSEC or maximum percent variance is selected as the optimum number of latent variables.
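The selection rule stated above (take the number of latent variables at which RMSEC is lowest) can be sketched as follows. The function name is hypothetical, the tie-breaking rule in favour of fewer latent variables is an assumption, and the error values in the usage line are invented for illustration rather than taken from the study.

```python
def optimum_latent_variables(rmsec_by_lv):
    """Return the 1-based number of latent variables with the smallest RMSEC.
    rmsec_by_lv[i] is the RMSEC of the model built with i + 1 latent variables.
    Ties are resolved in favour of the model with fewer latent variables."""
    best_index = min(range(len(rmsec_by_lv)), key=lambda i: rmsec_by_lv[i])
    return best_index + 1


# Illustrative RMSEC values for models with 1 to 4 latent variables.
print(optimum_latent_variables([0.0021, 0.0009, 0.0009, 0.0012]))
```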

Quantification of oil authenticity and adulteration

The results based on RMSEC and RMSEP of all the models ((8 models for 8 partitioned spectra × 7 pre-treatments) + 8 models for untreated data = 64 models) of the various data pre-treatments are prepared and compared (not shown).

Table 1 shows the results of the models with least errors, opti-

mum number of latent variables and maximum coefficients of

determination for both full spectrum and partitioned spectrum of

each pretreatment. The first row of the table indicates the results

of the untreated data.

The model with the least errors in terms of RMSEC and RMSEP for each data pre-treatment, for both the full spectrum and the partitioned spectrum, is then compared with the corresponding results for the untreated data (see Table 1). The first and second derivatives do not improve the model performance when their RMSEC and RMSEP values are compared to those obtained from the untreated data. In general, the RMSEC values of the models range from 0.00009% v/v (15 pt smoothing + mean centre using the 1850–1800 nm range) to 0.04435% v/v (2nd derivative using the 1850–1600 nm range). The choice of an effective data pre-treatment is problem dependent and depends on the wavelength range used in the model calibration. If the full spectrum 1850–1100 nm is used, the mean centre (RMSEC = 0.00028) performs almost equally with the smoothing + mean centre (RMSEC = 0.00029), and both perform better than the untreated data (RMSEC = 0.00079). Here the number of latent variables required is 3. If the sequential/partitioned spectrum is used, the lowest RMSEC (0.00009) across the spectrum is generated using smoothing (15 pt, order 2) + mean centre at a wavelength range of 1850–1800 nm. Here the optimum number of latent variables is only 2.

The RMSEP of the 15 pt smoothing + mean centre data pre-treatment method shows a trend similar to that observed in the corresponding RMSEC. For the full spectrum (1850–1100 nm), the mean centre method produces a minimum RMSEP of 0.00049. The lowest RMSEP of 0.00016% v/v is observed using smoothing + mean centre for the sequential spectrum in the 1850–1800 nm wavelength range. The first and second derivative methods in this study perform worse, whereas all other pre-treatment methods generate almost equal and comparatively low RMSEP values in all regions.

Table 1 also shows the ‘best model’ performance in terms of

RMSEC, RMSEP and the coefficient of determination (R2 cal. or R2

pred.).

For the full spectral approach at 1850–1100 nm, the mean centre pre-treatment provides the best model performance in terms of RMSEC (0.00028), RMSEP (0.00049) and the number of latent variables (3) in the model. The R2 prediction value is 0.99894. The best model for the sequential spectral approach is generated using smoothing + mean centre in the 1850–1800 nm wavelength range, in terms of RMSEC (0.00009%), RMSEP (0.00016%) and the number of latent variables (2). The R2 prediction is 0.99989. (For comparison, Downey et al. [9] could predict the adulterant content in extra virgin olive oils from the eastern Mediterranean only with a standard error equal to 0.8% w/w using 1st derivative data between 1100 and 2498 nm. Of the spectroscopic methods studied on oil adulteration by Yang et al. [10], the selected Fourier transform-Raman spectroscopy could give a correlation coefficient (R2 prediction) of 0.997 and a standard error of prediction of 1.72%.) As observed, the first and second derivative methods do not give a better model performance for either approach. This exception aside, the models using the sequential/partitioned spectrum produce lower RMSEC and RMSEP values than those using the full spectrum (see Table 1). This suggests that the addition of synergic wavelengths to predictive wavelengths enhances the detection of sample authenticity and the model performance. Even though the untreated, smoothing, mean centre, autoscale and smoothing + autoscale pre-treatment methods generate low RMSEC and RMSEP values on the full spectrum, i.e. the 1850–1100 nm range, the 15 pt, order 2 smoothing + mean centre data pre-treatment on the sequential spectrum 1850–1800 nm range is preferred because it generates the lowest RMSEC using the minimum number of factors (2) and gives the lowest RMSEP (% v/v) with the highest R2 pred. (0.99989) value.

To ensure and verify that the ‘best model’ chosen (NIR sequential spectrum in the 1850–1800 nm wavelength range using smoothing (15 window pt, order 2) + mean centre data pre-treatment) is able to describe and predict new data, the residual variance must also be considered. To determine the optimal number of latent variables to be included in the best model, the lowest RMSEC value is considered. Accordingly, for this particular model, the optimum number of latent variables is 2. The cumulative variance for the first two latent variables is 99.99%; LV1 explains 99.72% and LV2 explains 0.27%. Bias = 0; Prediction Bias = −3.4003e−005.

Fig. 3 confirms the accuracy and robust performance of the best

model.

The coefficient of determination is the intensity measure of the

correlation between the measured values and the values predicted

by the chosen model. The model presented here has a very good

correlation with R2 pred. equal to 0.99989, showing a good linear

fit (the closer the R2 pred. value to +1, the higher the correlation

and hence the accuracy).

Locally weighted regression (LWR)

LWR is performed on the NIR spectral data in the 1850–1800 nm wavelength range using smoothing (15 window pt, order 2) + mean centre data pre-treatment. Twenty-two samples are taken as the training set and another 22 independent samples (new raw data) as the test set. The model is built with the local PLS algorithm. The number of principal components used in the model is 2 since it provides the lowest RMSEC value; the number of similar samples selected from the calibration set for the local regression model is 7 because this selection gives the maximum percent variance and minimum errors.

When the model is built, the first two principal components account for 99.99% of the total variance; component 1 explains 99.75% and component 2 explains 0.24%. The residual variance is RMSEC = 0.00004 (R2 cal. = 0.99999). Bias: −2.31857e−006; Prediction Bias: −4.79679e−005.

The estimated Y values from the LWR model are basically iden-

tical to those of the PLS model. The low RMSEP (0.00022) and high

R2 (0.99982) indicate that the results of the LWR model are also

reliable.

Comparison of models developed

On closer inspection of both models (PLSR and LWR) in terms of RMSEC, RMSEP, the corresponding R2 values and reduced complexity, the ‘best model’ to detect sample authenticity is the PLSR model that uses the sequential/partitioned spectrum approach, i.e. the 1850–1800 nm wavelength range with the biomarker peak at 1836 nm, using smoothing (window 15 pt, order 2) + mean centre as data pre-treatment. In terms of the evaluation parameters, the ‘best model’ developed in the present study to detect and quantify adulteration in oils achieves an accuracy better than that of the most successful models available in the scientific literature.

Conclusion

Adulteration of high quality and high valued essential oils (for example, sandalwood oils) is a common problem affecting the quality and commercial value of the product. The overall results of this study demonstrate the feasibility of utilizing NIR spectroscopy combined with chemometric techniques to detect sample authenticity and economic adulteration of sandalwood oils.

In the present research, a spectral data approach, ‘full or sequential/partitioned spectrum’, is used in generating a robust regression model. The mean centre data pre-processing generates a low error for the full spectrum (1850–1100 nm) approach (RMSEC = 0.00028 and RMSEP = 0.00049) and a high coefficient of determination (R2 = 0.99894). Using the sequential/partitioned spectral approach, a data set containing the 1850–1800 nm wavelength region with the smoothing (Savitzky–Golay) + mean centre pre-treatment generates the ‘best model’ to determine sample authenticity and to quantify the adulteration with far less than 1% error (RMSEC = 0.00009%, RMSEP = 0.00016%, R2 = 0.99989).

The nonlinear method LWR for multivariate y using the local

PLS algorithm is employed to detect and quantify the adulteration

and to compare with the PLS model. This nonlinear model is also

stable, defined by two principal components with low errors and

high prediction values.

In brief, the results from this study (Figs. 1–3 and Table 1, which can be chosen as a basis for a new exercise) indicate that it is possible to detect sample authenticity and counterfeits by exploring near infrared spectral data in the wavelength range 1850–1800 nm, with the prominent biomarker peak at 1836 nm attributed to methylene, methyl and ethenyl asymmetric stretching (1st overtone C–H stretch bands), of sandalwood oil or any other essential oil samples.

Acknowledgements

The authors wish to thank the efforts of Mr. Joseph Kuriakose, Bloomfield, Gillen, Alice spring, NT 0870, Australia for his contributions to this manuscript. The authors are also grateful to Mr. Sony George, Research scholar, Department of Photonics, Cochin University of Science and Technology (CUSAT), Kochi, Kerala, India for the technical assistance and help. The authors also express their sincere thanks to Manonmaniam Sundaranar University, Thirunelveli, Tamil Nadu, India for having given an opportunity to do the research.

Table 1
Model performance for full and sequential spectrum wavelength data for each data pre-treatment. Best models are in bold.

| Data pretreatment | Full spectrum: wavelength range (nm) | Full spectrum: LV (a) | Full spectrum: RMSEC (b) (% v/v) (R2 cal. (d)) | Full spectrum: RMSEP (c) (% v/v) (R2 pred. (e)) | Partitioned/sequential spectrum: wavelength range (nm) | Partitioned/sequential spectrum: LV (a) | Partitioned/sequential spectrum: RMSEC (b) (% v/v) (R2 cal.) | Partitioned/sequential spectrum: RMSEP (c) (% v/v) (R2 pred.) |
| None/untreated | 1850–1100 | 3 | 0.00079 (0.99635) | 0.00131 (0.99194) | 1850–1700 | 3 | 0.00025 (0.99964) | 0.00021 (0.99987) |
| 1st Deri. | 1850–1100 | 3 | 0.04380 (0.20036) | 0.04379 (0.25413) | 1850–1600 | 3 | 0.04380 (0.21798) | 0.04379 (0.26757) |
| 2nd Deri. | 1850–1100 | 3 | 0.04435 (0.20736) | 0.04433 (0.26782) | 1850–1600 | 2 | 0.04435 (0.21672) | 0.04433 (0.26506) |
| Smoothing | 1850–1100 | 4 | 0.00079 (0.99636) | 0.00132 (0.99193) | 1850–1800 | 2 | 0.00062 (0.99988) | 0.00069 (0.99987) |
| Mean centre | 1850–1100 | 3 | 0.00028 (0.99953) | 0.00049 (0.99894) | 1850–1800 | 2 | 0.00010 (0.99993) | 0.00017 (0.99987) |
| Autoscale | 1850–1100 | 3 | 0.00032 (0.99939) | 0.00049 (0.99896) | 1850–1800 | 2 | 0.00011 (0.99993) | 0.00018 (0.99986) |
| Smoothing + mean centre | 1850–1100 | 3 | 0.00029 (0.99952) | 0.00050 (0.99893) | 1850–1800 | 2 | 0.00009 (0.99995) | 0.00016 (0.99989) |
| Smoothing + autoscale | 1850–1100 | 3 | 0.00032 (0.99939) | 0.00053 (0.99891) | 1850–1800 | 2 | 0.00013 (0.99992) | 0.00019 (0.99980) |

a LV – Latent Variables.
b RMSEC – Root Mean Square Error of Calibration.
c RMSEP – Root Mean Square Error of Prediction.
d R2 cal. – Coefficient of Determination for Calibration.
e R2 pred. – Coefficient of Determination for Prediction.

Fig. 3. Measured versus predicted adulteration levels; PLSR model with 2 factors showing the prediction data (pre-treatment smoothing + mean centre, spectrum 1850–1800 nm range).

References

[1] C.G. Jones, E.L. Ghisalberti, J.A. Plummer, E.L. Barbour, Phytochemistry 67 (2006) 2463–2468.

[2] P.D.R. Herbal, Sandalwood, Santalum album, PDR for Herbal Medicine, third ed., Medical Economics Company, Montvale, NJ, 2004.

[3] Anonymous, Sandalwood, in: C. Dombek (Ed.), The Review of Natural Products, Wolters Kluwer Co., St. Louis, MO, 1993.

[4] B.M. Lawrence, Recent progress in essential oils, Perfum. Flavor. 16 (1991) 49–58.

[5] D.P. Anonis, Sandalwood and sandalwood compounds, Perfum. Flavor. 23 (1998) 19–20.

[6] R.E. Naipawer, in: B.M. Lawrence, B.D. Mookherjee, B.J. Willis (Eds.), Flavors and fragrances: A world Perspective Proceedings of the 10th International Conference of Essential Oils, Flavor fragrance, Elsevier, Amsterdam, 1988, p. 805.

[7] A.A. Christy, S. Kasemsumran, Y.P. Du, Y. Ozaki, Anal. Sci. 20 (2004) 935–940.

[8] I.J. Wesley, R.J. Barnes, A.E.J. Mc Gill, J. Am. Oil Chem. Soc. 72 (1995) 289–292.

[9] G. Downey, P. Mc Intyre, A.N. Davies, J. Agric. Food Chem. 50 (2002) 5520–5525.

[10] H. Yang, J. Irudayaraj, J. Am. Oil Chem. Soc. 78 (2001) 889–895.

[11] S. Kuriakose, T. Xavier, Hubert Joe, V. Venkataraman, Analyst 135 (2010) 2676–2681.

[12] S. Kuriakose, Hubert Joe, Food Chem. 135 (2012) 213–218.

[13] R.G. Brereton, Chemometrics Data Analysis for the Laboratory and Chemical Plant, Wiley, Chichester, 2003.

[14] B.M. Wise, N.B. Gallagher, R. Bro, J.M. Shaver, PLS Toolbox 7.0.3, Eigenvector Research, 2013.

[15] MATLAB 7.10.0.499 (R 2010a) MathWorks Inc., Natick, USA 2010.

[16] B.G. Osborn, T. Fearn, P.H. Hindle, Practical NIR Spectroscopy with Applications in Food and Beverage Analysis, in: Conference, second ed., Longman Scientific and Technical, Harlow, Essex, UK, 1993.

[17] J. Workman Jr., L. Weyer, Practical Guide to Interpretive Near-Infrared Spectroscopy, CRC Press, Boca Raton, 2007.

[18] Norgaard et al., Appl. Spec. 54 (3) (2000) 413–420.

