Detection and quantification of adulteration in sandalwood oil through near infrared spectroscopy†
Saji Kuriakose,‡a Xavier Thankappan,a Hubert Joe*a and Venkateswaran Venkataramanb
Received 23rd April 2010, Accepted 21st June 2010
DOI: 10.1039/c0an00261e
The confirmation of authenticity of essential oils and the detection of adulteration are problems of
increasing importance in the perfume, pharmaceutical, flavor and fragrance industries. This is
especially true for ‘value added’ products like sandalwood oil. A methodical study is conducted here to
demonstrate the potential use of Near Infrared (NIR) spectroscopy along with multivariate calibration
models like principal component regression (PCR) and partial least square regression (PLSR) as rapid
analytical techniques for the qualitative and quantitative determination of adulterants in sandalwood
oil. After suitable pre-processing of the raw NIR spectral data, the models are built using cross-
validation. The lowest Root Mean Square Error of Cross-Validation and of Calibration (RMSECV and
RMSEC, % v/v) are used to fix the optimal number of factors. The
coefficient of determination (R2) and the Root Mean Square Error of Prediction (RMSEP, % v/v) in the
prediction set are used as the evaluation parameters (R2 = 0.9999 and RMSEP = 0.01355). The overall
result leads to the conclusion that NIR spectroscopy with chemometric techniques can be
used as a rapid, simple and non-destructive method for the detection of adulterants, even 1% of
low-grade oil, in high-quality sandalwood oil.
Introduction
Essential oils are complex mixtures of various terpenoids, alde-
hydes, ketones, alcohols, esters and other aromatic substances.
Most of the oils are used for flavoring of foodstuffs, in perfume
compositions or in mouth care products.1 Some essential oils
with phenolic content are also used in phyto-pharmaceutical
products or as additives for their antibiotic properties.
Sandalwood oil is a volatile essential oil obtained by steam
distillation of the dried wood from the trunk and roots of the
plant Santalum album L (Indian sandalwood) (Kingdom –
Plantae, Class – Magnoliopsida, Family – Santalaceae, Genus –
Santalum L). This oil appears as a pale yellow/yellow liquid with
a characteristic soft, warm, woody odor and a slightly bitter
resinous taste (FCC 2003). Sandalwood oil is used as a flavor
ingredient, with a daily consumption of 0.0074 mg/kg and as an
adjuvant in the food industry. In perfumery also, it is used
extensively. The heartwood of mature trees (>10 years old)
contains oils whose main constituents are sesquiterpene alcohols,
cis-α-santalol, cis-β-santalol etc.2 This oil is approved for food
usage by the United States Food and Drug Administration
(FDA), Flavor and Extract Manufacturers Association (FEMA)
and Council of Europe (COE).3,4
It is identified that sandalwood oil consists of more than
100 constituents. The α-santalol (�$60% of total santalol) and
β-santalol (�$33% of total santalol) are mainly responsible for
the odor depending on the sourced species,5 although 2-furfuryl
pyrrole may also contribute.6 It also contains sesquiterpene
hydrocarbons (�60%)7 that are mostly α-santalene, β-santalene,
epi-β-santalene, as well as α-curcumene, β-curcumene, γ-curcumene,
β-bisabolene and α-bisabolol.8 The other constituents
reported are dihydro-β-agarofuran, santene, teresantol, borneol,
teresantalic acid, tricycloekasantalal, santalone and santanol.9
Three new neolignans and a new aromatic ester have been iso-
lated from the heartwood of S. album L recently.10
Sandalwood oil and its major constituents have short sensitive
oral and dermal toxicity in laboratory animals. Sandalwood oil is
found to have antiviral, anticarcinogenic and bactericidal
activity. It is also not mutagenic in spore Rec assay.11 Sanskrit
manuscripts reveal that sandalwood has been in use for over
4000 years. The commercial use of sandalwood oil in the USA
began in the early 1800s.
Due to its sensory quality, extensive use, and steep rise in the
price, sandalwood oil is often adulterated with low grade cost-
effective oils and synthetic or semi-synthetic substitutes such as
Sandalore®.12,13 Adulteration of sandalwood oil is a serious
problem for regulatory agencies and oil suppliers, and a threat to the
health of consumers. Substitution and synthetic additives would
influence the chemical composition and physical properties of the
oil; these factors may affect oil quality and the allergic potential.
The common adulterants reported include castor oil, cedarwood
oil and low-grade oil from ‘sandalwood’ species other than
S. album.12,14 The most common adulterant is castor oil
(from Ricinus communis, family Euphorbiaceae).
Various authorities have recommended that the oil from
S. album should not contain less than 90% w/w of (free) alcohols,
a Centre for Molecular and Biophysics, Department of Physics, Mar Ivanios College, Thiruvananthapuram, 695 015, Kerala, India. E-mail: [email protected]; [email protected]; Fax: +91 471 2530023; Tel: +91 471 2531053
b Central Electronics Engineering Research Institute, Chennai Centre, CSIR Complex, India
† Electronic supplementary information (ESI) available: supplementary table of percentages of castor oil and sandalwood oil in 0–100% adulterated mixtures. See DOI: 10.1039/c0an00261e
‡ Permanent address: St Thomas H.S.S., Pala, Kerala, India.
2676 | Analyst, 2010, 135, 2676–2681 This journal is © The Royal Society of Chemistry 2010
PAPER www.rsc.org/analyst | Analyst
Downloaded by CSIR MADRAS COMPLEX (CSIRM) on 23 September 2010
Published on 03 September 2010 on http://pubs.rsc.org | doi:10.1039/C0AN00261E
calculated as santalols.14–17 The acetylation methods18,19
described to assess the santalol content of sandalwood oil generally
lack specificity and accuracy. More recently, the ISO (2002)
has suggested the analysis of S. album oil using gas chromatog-
raphy (GC). However, these reports do not address the detection
of adulterants. Given these facts, there is
an increasing demand for a new, rapid and
non-destructive method to replace traditional, time-consuming
and expensive analysis techniques. To date, no standard
method has been reported for detecting
the adulteration of sandalwood oil.
The application of near infrared (NIR) spectroscopy
combined with chemometric techniques is a relatively new
approach to determine authenticity and to quantify the adul-
teration of essential oils. Recent reports reveal that NIR spec-
troscopy along with chemometrics is widely applied for the rapid
quantitative analysis of a wide range of vital constituents in food
and agricultural products.20 Oliveira et al. proposed partial least
square regression calibration models based on Fourier Trans-
form NIR measurements to evaluate the quality of hydrated
ethyl alcohol fuel and to detect its adulteration with methanol.21
Christy et al. studied NIR spectroscopy to detect and quantify
adulteration of olive oil with soybean, sunflower, corn, walnut
and hazelnut oils.22 Bewig et al.23 and Chen et al.24 used the NIR
spectral profiles to predict quality parameters of vegetable oils.
Multivariate analyses like Principal Component Regression
(PCR)25 and Partial Least Square Regression (PLSR)26,27 have
been applied to NIR spectrometry for quantitative analysis to
extract vital information through non-destructive methods.28
In the present study, both PCR and PLSR methods are applied
to NIR spectra of pure sandalwood oil and oil adulterated with
various proportions of castor oil. These two multivariate tech-
niques could provide better accuracy, precision and significantly
more information in considerably less time than previous data
analysis methods. To the best of our knowledge, no previous
attempt has been made to use near infrared spectroscopy
(NIRS) along with multivariate regression methods for esti-
mating the quantity of adulterants, viz. castor oil, in sandalwood
oil.
Sample collection and experimental analysis
Samples
Pure sandalwood oil and castor oil (batch nos. 5BB 801622, 5BB
900502, 231) were procured from Khadi Gramodyog Bhavan,
Khadi and Village Industries Commission, Govt. of India.
Procurement from Govt. of India food and oil regu-
latory agencies ensures the authenticity of the samples.
Chemicals
The solvent (carbon tetrachloride) used in this study was
obtained from Merck. The reagent is of analytical grade and is used
without further purification.
Instrumentation
A Cary 5000 UV/VIS/NIR spectrophotometer (Sl. No:
EL03127331, www.varianinc.com) with a PbS detector, a
wavelength range of ca. 175–3300 nm and 0.01 nm resolution
is used to capture the spectra. A Camloc cell with quartz windows and 1 mm path
length is used as the sample holder. Serial port
communication is used to capture the raw spectral data. The
monochromator and sample compartments have separate
nitrogen purging capabilities, allowing the sample compartment
to be purged at a higher rate than the instrument.
Sample preparations
The samples are stored in hermetically sealed aluminum bottles
in the dark at 4 °C. The samples are brought to an ambient
temperature of 20 °C eight hours prior to measurement. Using an
electromagnetic stirrer, the sandalwood and castor oil samples
are homogenized with the solvent (1 : 10 v/v) for 15 min in
two separate stoppered conical flasks. Precautions are
taken to avoid loss or change during the process. Samples are
prepared by mixing the required percentage of the low-grade oil solution
with the sandalwood oil solution in the same solvent. The relative
castor oil fraction (% v/v) in the samples varies from 0 to 100%
(refer to the supplementary table provided†). The oil samples are
blended under normal temperature and pressure. Thus, a set of
56 samples ranging from 0 to 100% (v/v) is prepared.
Out of these, 45 samples are used for calibration and an inde-
pendent set of 11 samples containing 0, 1, 5, 8, 12, 20,
25, 50, 70, 85 and 90% castor oil is used for prediction.
The samples are labeled as ‘calibration set’ and ‘prediction set’
separately. To ensure a wide range of coverage, care is
taken to follow the norms of the chemical sample-preparation
procedure. All the samples are kept in glass bottles and stored in
the dark at 3–4 °C. All measurements are carried out at 20 °C in
closed rooms.
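The blending arithmetic described above can be sketched as follows. This is a minimal illustration, assuming that both stock solutions share the same 1 : 10 v/v dilution, so that the volume ratio of the stocks equals the volume ratio of the oils. The function name `blend_volumes` and the 10 mL total volume are hypothetical and are not taken from the paper.

```python
def blend_volumes(castor_percent, total_ml=10.0):
    """Return (castor_stock_ml, sandalwood_stock_ml) for a target % v/v.

    Assumes both stocks share the same oil:solvent dilution, so the
    volume ratio of the stocks equals the volume ratio of the oils.
    """
    if not 0.0 <= castor_percent <= 100.0:
        raise ValueError("castor_percent must lie between 0 and 100")
    castor_ml = total_ml * castor_percent / 100.0
    return castor_ml, total_ml - castor_ml


# Prediction-set levels listed in the text (% v/v castor oil).
prediction_levels = [0, 1, 5, 8, 12, 20, 25, 50, 70, 85, 90]

# 56 samples in total, 45 of them used for calibration.
n_total, n_calibration = 56, 45
n_prediction = n_total - n_calibration

for level in prediction_levels:
    castor_ml, sandal_ml = blend_volumes(level)
    print(f"{level:>3}% v/v: {castor_ml:.2f} mL castor stock, "
          f"{sandal_ml:.2f} mL sandalwood stock")
```

The remaining 45 calibration levels are given only in the supplementary table, so they are not reproduced here.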
Spectral acquisition
Thirty-two scans are performed at 1 nm intervals within the
wavelength range of 700–2200 nm to capture the spectra. The
time to acquire scans is approximately 28 s. The mean spectrum
is computed from the collected data. Background spectra with
reference sample are collected for every sample immediately
before the collection of the sample single-beam spectrum. The
sample spectrum is automatically ratioed against the background
spectrum and that spectrum is automatically stored in the
computer. The spectral data are transformed into ASCII format
by the Varian software supplied with the spectrometer. In the
experiment, all of the spectra are recorded in absorbance mode.
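The acquisition steps above (a 1 nm grid from 700 to 2200 nm, averaging of 32 scans, and ratioing against a background) can be illustrated with a short sketch. In the study the instrument software performs these steps automatically; the code below only assumes the usual definition of absorbance, A = -log10(I/I0), and that averaging is done on single-beam intensities. The data are synthetic placeholders.

```python
import math


def wavelength_grid(start_nm=700, stop_nm=2200, step_nm=1):
    """Wavelength axis in nm, inclusive of both end points."""
    return list(range(start_nm, stop_nm + 1, step_nm))


def mean_spectrum(scans):
    """Point-wise mean of equally long scans."""
    n_scans = len(scans)
    return [sum(values) / n_scans for values in zip(*scans)]


def absorbance(sample_intensity, background_intensity):
    """A = -log10(I / I0), computed at every wavelength."""
    return [-math.log10(i / i0)
            for i, i0 in zip(sample_intensity, background_intensity)]


grid = wavelength_grid()

# Synthetic single-beam data: 32 identical scans that transmit
# 10 % of the background intensity at every wavelength.
background = [1.0] * len(grid)
scans = [[0.1] * len(grid) for _ in range(32)]

spectrum = absorbance(mean_spectrum(scans), background)
print(len(grid), round(spectrum[0], 6))
```

Each stored spectrum therefore becomes one row of the X-block used later for calibration, with one column per wavelength.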
During the experiment, the sample cell components are cleaned
with hexane, then with warm water, rinsed with deionised
water and then with CCl4 at room temperature to avoid oil build-
up on the cell windows. Components are dried using tissue paper.
During experimentation, the quartz cell is dried by exposing it to
a natural source of light to remove any water film left on it
after washing, since the presence of OH groups would influence
the shape of the spectra and obscure the spectral features.
Calibration and quantitative analysis are performed using
PLSR and PCR methods. The root mean square error of cross-
validation (RMSECV) values are calculated for each factor with
the ‘leave-one-out’ cross-validation to determine the optimal
number of factors to be included in the calibration model.
Chemometrics
Data analysis. Chemometric analysis, including detection and
quantification, is performed on a Pentium 4 laptop computer
using PLS Toolbox 5.8.1, March 2010 (Eigenvector Research, Inc.)
running under the Matlab 7.0.1 environment (Mathworks,
Natick, USA). Detection is performed by the principal component
regression technique. Quantification of castor oil adulteration
levels is performed by partial least square regression. These
involve a calibration step in which the relationship between
spectra and component concentrations is estimated from a set of
reference (measured) samples and a prediction step in which the
results of the calibration are used to estimate the component
concentrations from an unknown sample spectrum.28
Principal component regression (PCR). PCR is the combina-
tion of Principal Component Analysis and Multiple Linear
Regression. Through PCA, the large number of variables is
reduced to a few components, called principal
components, that contain most of the information.29 This is a well-
known technique of multivariate analysis.30–33 In the second step
of PCR, a multiple linear regression is performed on the scores
obtained in the PCA step.
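The two steps of PCR can be sketched in plain Python as follows. This is not the PLS Toolbox implementation used in the study; it is a minimal illustration that extracts principal components with the NIPALS iteration and then regresses the concentration on the scores. The four-channel "spectra", the offsets and all function names are synthetic assumptions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center_columns(X):
    """Return column means and the mean-centred copy of X (rows = samples)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    centred = [[v - m for v, m in zip(row, means)] for row in X]
    return means, centred


def nipals_pca(Xc, n_components, max_iter=500, tol=1e-14):
    """Extract scores and unit-length loadings one component at a time."""
    E = [row[:] for row in Xc]  # residual matrix, deflated after each component
    scores, loadings = [], []
    for _component in range(n_components):
        # Start from the column with the largest sum of squares.
        t = list(max(zip(*E), key=lambda col: dot(col, col)))
        for _iteration in range(max_iter):
            p = [dot(col, t) for col in zip(*E)]      # p proportional to E't
            norm = math.sqrt(dot(p, p))
            p = [v / norm for v in p]
            t_new = [dot(row, p) for row in E]        # t = E p
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change <= tol * dot(t, t):
                break
        E = [[e - ti * pj for e, pj in zip(row, p)] for row, ti in zip(E, t)]
        scores.append(t)
        loadings.append(p)
    return scores, loadings


def pcr_fit(X, y, n_components):
    x_means, Xc = center_columns(X)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    scores, loadings = nipals_pca(Xc, n_components)
    # Scores are mutually orthogonal, so each regression coefficient
    # can be computed independently of the others.
    coefs = [dot(t, yc) / dot(t, t) for t in scores]
    return {"x_means": x_means, "y_mean": y_mean,
            "loadings": loadings, "coefs": coefs}


def pcr_predict(model, x):
    xc = [v - m for v, m in zip(x, model["x_means"])]
    return model["y_mean"] + sum(
        b * dot(xc, p) for b, p in zip(model["coefs"], model["loadings"]))


# Synthetic four-channel spectra: linear mixtures plus a small baseline offset.
sandal = [0.9, 0.5, 0.3, 0.7]
castor = [0.2, 0.8, 0.6, 0.4]
fractions = [0.0, 0.25, 0.5, 0.75, 1.0]
offsets = [0.02, -0.01, 0.03, 0.0, -0.02]
X = [[c * a + (1 - c) * s + d for a, s in zip(castor, sandal)]
     for c, d in zip(fractions, offsets)]
y = [100.0 * c for c in fractions]  # castor oil, % v/v

model = pcr_fit(X, y, n_components=2)
unknown = [0.4 * a + 0.6 * s + 0.01 for a, s in zip(castor, sandal)]
predicted = pcr_predict(model, unknown)
print(round(predicted, 3))
```

Because the synthetic data contain exactly two sources of variation (mixing ratio and baseline offset), two components are enough to recover the castor oil fraction of the unknown mixture.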
Partial least square regression (PLSR). PLSR is another well-
known regression technique for multivariate data, principally
applied for prediction.34 This method is especially useful when
(i) the number of predictor variables is similar to or higher than
the number of observations and (ii) predictors are highly corre-
lated. This tool is applicable when there is partial knowledge of
data, an example being the measurement of protein in wheat by
NIR spectroscopy. The interference and overlapping of the
spectral information may be overcome by PLS techniques to
a certain extent. PLS is a method that uses the full spectral region
selected and is based on the use of latent variables.
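For a single response such as the castor oil fraction, the latent variables of PLSR can be computed with the NIPALS PLS1 algorithm. The sketch below is a plain-Python illustration under that assumption, not the PLS Toolbox routine used in the study; the synthetic four-channel spectra and all names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_latent):
    """NIPALS PLS1 for a single response; rows of X are samples."""
    n = len(X)
    x_means = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    E = [[v - m for v, m in zip(row, x_means)] for row in X]  # X residual
    f = [v - y_mean for v in y]                               # y residual
    weights, loadings, inner = [], [], []
    for _latent in range(n_latent):
        w = [dot(col, f) for col in zip(*E)]       # weights proportional to E'f
        norm = math.sqrt(dot(w, w))
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]             # scores
        tt = dot(t, t)
        p = [dot(col, t) / tt for col in zip(*E)]  # X loadings
        q = dot(f, t) / tt                         # y loading
        # Deflate both blocks before extracting the next latent variable.
        E = [[e - ti * pj for e, pj in zip(row, p)] for row, ti in zip(E, t)]
        f = [fi - q * ti for fi, ti in zip(f, t)]
        weights.append(w)
        loadings.append(p)
        inner.append(q)
    return {"x_means": x_means, "y_mean": y_mean,
            "weights": weights, "loadings": loadings, "inner": inner}


def pls1_predict(model, x):
    e = [v - m for v, m in zip(x, model["x_means"])]
    y_hat = model["y_mean"]
    for w, p, q in zip(model["weights"], model["loadings"], model["inner"]):
        t = dot(e, w)                               # score of the new sample
        e = [ei - t * pj for ei, pj in zip(e, p)]   # deflate the new sample
        y_hat += q * t
    return y_hat


# Synthetic four-channel spectra: linear mixtures plus a small baseline offset.
sandal = [0.9, 0.5, 0.3, 0.7]
castor = [0.2, 0.8, 0.6, 0.4]
fractions = [0.0, 0.25, 0.5, 0.75, 1.0]
offsets = [0.02, -0.01, 0.03, 0.0, -0.02]
X = [[c * a + (1 - c) * s + d for a, s in zip(castor, sandal)]
     for c, d in zip(fractions, offsets)]
y = [100.0 * c for c in fractions]  # castor oil, % v/v

model = pls1_fit(X, y, n_latent=2)
unknown = [0.12 * a + 0.88 * s - 0.005 for a, s in zip(castor, sandal)]
predicted = pls1_predict(model, unknown)
print(round(predicted, 3))
```

Unlike PCA, the weight vector of each latent variable is built from the covariance between the spectra and the concentration, which is why the predictors may be numerous and highly correlated.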
Model selection. The model is built by the cross-validation method
during calibration development. The optimum number of
principal factors is selected by cross-validation, leaving
out one sample at a time. This is done by plotting
the number of factors against the root mean square error of
cross-validation (RMSECV) and from this, the optimum number
of factors is selected28,35 for both PCR and PLSR models.
The best model selected is used to determine the concentration
of the samples in the independent prediction set. The relative
performance of the established model is assessed by the root
mean square error of calibration (RMSEC), RMSECV and the
multiple coefficient of determination (R2).36
The predictive ability of the model is evaluated from the
root mean square error of prediction (RMSEP).37 The lower the
RMSEP value, the higher the degree of accuracy of the predic-
tion result provided by the calibration model.38
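The leave-one-out procedure can be sketched generically. In the sketch below the model is a pluggable callable; two deliberately simple stand-ins (a mean-only predictor and a single-wavelength straight line) take the place of PCR/PLSR models with 0 and 1 factors, and the absorbance values are synthetic. All names are hypothetical.

```python
import math


def loo_rmsecv(X, y, fit_predict):
    """Leave each sample out once, predict it from the rest, pool the errors."""
    squared_errors = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        prediction = fit_predict(train_X, train_y, X[i])
        squared_errors.append((prediction - y[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Stand-in for a model with no factors: always predict the training mean.
def mean_only(train_X, train_y, x_new):
    return sum(train_y) / len(train_y)


# Stand-in for a one-factor model: straight line through one absorbance value.
def straight_line(train_X, train_y, x_new):
    xs = [row[0] for row in train_X]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(train_y) / len(train_y)
    slope = (sum((x - x_bar) * (v - y_bar) for x, v in zip(xs, train_y))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar + slope * (x_new[0] - x_bar)


# Synthetic single-wavelength absorbances and castor oil levels (% v/v).
X = [[0.10], [0.21], [0.29], [0.41], [0.50], [0.61]]
y = [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]

candidates = {0: mean_only, 1: straight_line}
rmsecv = {k: loo_rmsecv(X, y, model) for k, model in candidates.items()}
best = min(rmsecv, key=rmsecv.get)  # complexity with the lowest RMSECV
print(rmsecv, best)
```

Plotting `rmsecv` against the number of factors and picking the minimum corresponds to the selection step described above.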
Modeling and data pre-processing are carried out using PLS
toolbox 5.8.1, Eigenvector Research39 supported on Matlab.40
The NIRS data from the spectrometer may contain background
information and noise in addition to sample information. Hence,
to obtain reliable, accurate and stable calibration models, it is
necessary to pre-process spectral data before modeling. The pre-
processing methods, in this study, are chosen based on prior
knowledge for each spectroscopic technique combined with
different permutations.31,37,41
Results and discussion
NIR spectra
Fig. 1 shows the average response of the acquired NIR absorp-
tion spectra for pure and blended mixtures of sandalwood oil
over the spectral range of 700–2200 nm at 1 nm spacing. (Spectra
of 45 samples with different relative fractions 0–100% (v/v) of
castor oil in clean sandalwood oil.)
It could be observed that the oil spectra are nearly identical
which makes the calibration problem non-trivial. However, there
are a few subtle but systematic differences in these spectra that
might be amplified by various pre-processing techniques. Fig. 2
shows the NIR spectra of pure sandal oil and castor oil.
Spectra investigation
According to former studies performed on various essential oils,
the NIR spectra of the analyzed oil samples are dominated by
overtones and different combinations of CH stretching and
bending vibrations occurring between 1000 and 2498 nm.42 There
has been much debate as to the importance of finding those
wavelengths that contain significant information, thus reducing
the number of wavelengths, variables, and model complexity. In
this work, the spectral region 700–2200 nm is selected to reduce
the number of insignificant variables and hence the model
complexity.
From Fig. 2, it is observed that peaks are present at
1179.78, 1387, 1693, 1730, and 1861 nm. The absorption band
observed at 1179 nm is due to methylene (CH) stretching [2nd
(3ν) overtone and 2ν combination bands (1135–1215 nm)]. The
peak at 1387 nm is related to methyl (CH) stretching and
bending combination [2nd (3ν) overtone and 2ν combination
Fig. 1 NIR spectra of pure and blended mixture of sandalwood oil
(at 20 °C; relative castor oil fraction 0–100% (v/v) in pure sandal oil; the
top spectrum represents 0% adulteration, the bottom spectrum represents
100% adulteration and 1–99% adulterations are in order from top to
bottom).
bands (1375–1399 nm)]. The two peaks around 1693 and
1730 nm are associated with methyl and methylene asymmetric
stretching respectively. The peak centred near 1861 nm is unique
to molecular water (OH) combination vibration.43,44 The report
reveals that the 1st (2ν) overtone CH stretching bands are at
1690–1695 nm and 1725–1731 nm.
Spectral pre-processing
The data set is loaded as a matrix with an X-block and a Y-block.
Most of the peaks are observed in the wavelength range
1100–1900 nm. The changes in the spectral regions
1350–1450 nm and 1550–1850 nm are exploited since significant
differences between the NIR spectra of sandal and castor oils are
observed in these regions. Hence, more emphasis is given to these
regions for extracting the required information through the optimal
calibration model.
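One way to give emphasis to the two regions named above is to keep only the corresponding columns of the X-block. The sketch below assumes the 1 nm grid from 700 to 2200 nm; whether the authors actually discarded the remaining wavelengths is not stated, so this is an illustration only, with placeholder absorbances and a hypothetical function name.

```python
def select_regions(wavelengths_nm, spectrum, regions):
    """Keep the points whose wavelength falls inside any (low, high) region."""
    kept = [(w, a) for w, a in zip(wavelengths_nm, spectrum)
            if any(low <= w <= high for low, high in regions)]
    return [w for w, _ in kept], [a for _, a in kept]


wavelengths = list(range(700, 2201))      # 1 nm grid, 700-2200 nm
spectrum = [0.0] * len(wavelengths)       # placeholder absorbances
regions = [(1350, 1450), (1550, 1850)]    # regions named in the text

sub_wavelengths, sub_spectrum = select_regions(wavelengths, spectrum, regions)
print(len(sub_wavelengths))
```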
In this study, several spectral pretreatments including auto-
scale, mean centre, none (without pre-processing), multiplicative
scatter correction (msc) and smoothing (Savitzky–Golay filters)
coupled with autoscale and with mean centre are investigated
(see Table 1). The root mean square error of calibration
(RMSEC), the root mean square error of cross-validation
(RMSECV), the root mean square error of prediction (RMSEP)
and the coefficient of determination (R2) are used to investigate
the methods and for model development.
The quality of the results is compared using RMSEC/
RMSECV, RMSEP and R2 values. Since the Savitzky–Golay
method (window 15 pts, order 2) coupled with autoscale
produced the lowest RMSEC/RMSECV and RMSEP values and
the highest R2 value, this pre-processing method is chosen as the
best. Other pre-processing methods did not yield good results
for this application, since they produced comparatively high
RMSEP/RMSECV values and low R2 values (lower errors and a
higher R2 indicate a better model).
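The selected pre-processing (Savitzky–Golay smoothing with a 15-point window and order 2, followed by autoscaling) can be sketched without any toolbox. The weights below come from the standard closed-form expression for quadratic Savitzky–Golay smoothing; leaving the first and last seven points unsmoothed and using the sample standard deviation (n - 1) for autoscaling are assumptions of this sketch, not details confirmed by the paper.

```python
import math


def savgol_coefficients(half_window=7):
    """Quadratic Savitzky-Golay smoothing weights for 2*half_window + 1 points."""
    n = half_window
    denom = (2 * n + 3) * (2 * n + 1) * (2 * n - 1)
    return [3 * (3 * n * n + 3 * n - 1 - 5 * i * i) / denom
            for i in range(-n, n + 1)]


def savgol_smooth(values, half_window=7):
    """Smooth interior points; the first and last half_window points are kept."""
    coeffs = savgol_coefficients(half_window)
    smoothed = list(values)
    for centre in range(half_window, len(values) - half_window):
        window = values[centre - half_window: centre + half_window + 1]
        smoothed[centre] = sum(c * v for c, v in zip(coeffs, window))
    return smoothed


def autoscale(X):
    """Centre each column to zero mean and scale to unit sample std deviation.

    A constant column would give a zero standard deviation; such columns
    are not handled in this sketch.
    """
    n = len(X)
    columns = list(zip(*X))
    means = [sum(col) / n for col in columns]
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
            for col, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


coeffs = savgol_coefficients()

# A quadratic signal passes through an order-2 filter unchanged.
quadratic = [0.001 * k * k for k in range(40)]
smoothed = savgol_smooth(quadratic)

# Autoscaling acts column-wise, i.e. across samples at each wavelength.
scaled = autoscale([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
print(len(coeffs), scaled[0])
```

Smoothing is applied along each spectrum (row), whereas autoscaling is applied to each wavelength (column) across the calibration samples.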
Calibration and cross-validation
Optimum number of components. A calibration and quantita-
tive analysis is performed using PCR and PLS methods. Forty-
five samples are used to develop the calibration and eleven
independent samples are used as a prediction set for both
methods. To determine the optimal number of factors to be
included in the calibration model, the RMSECV/RMSEC values
are calculated using the Leave-One-Out (LOO) cross-validation.
The number of principal components (PCs)/latent variables
(LVs) to be used in each case is determined by the lowest
Fig. 2 NIR spectra of pure sandal oil and pure castor oil (as an adul-
terant).
Table 1 RMSECV/RMSEP values (% v/v) for PCR and PLS with various pre-processing methods

Pre-processing method                        PCR (RMSECV/RMSEP)    PLSR (RMSECV/RMSEP)
Msc (mean)                                   0.0906/0.07791        0.0993/0.0700
Autoscale                                    0.03445/0.03170       0.03440/0.0315
Mean centre                                  0.03426/0.03158       0.03420/0.03148
None                                         0.03378/0.02935       0.03371/0.02951
Smoothing (Savitzky–Golay) + mean centre     0.002794/0.0213       0.003314/0.00998
Smoothing (Savitzky–Golay) + autoscale       0.002592/0.01364      0.002888/0.01355
Fig. 3 (a) Principal components and RMSECV, RMSEC through PCR
for sandal oil, and (b) latent variables and RMSECV, RMSEC through
PLS for sandal oil.
Fig. 4 (a) Trends in principal components in adulteration (relative
percentile 0–100% v/v), and (b) trends in latent variables in adulteration
(relative percentile 0–100% v/v).
RMSECV, RMSEC.37,45 The PC/LV vs. RMSEC plots are
shown in Fig. 3.
From Fig. 3, it is clear that the optimum number of PCs/LVs
that could be suggested is 2 for both PCR and PLS models.
Table 1 shows the RMSECV, RMSEP values with 2 PCs for
PCR and 2 LVs for PLS.
Model building using pre-processed data. Two calibration
models are built in order to predict the adulterant content in
blends with sandalwood oil using the pre-processed data, namely
PCR and PLSR. The cumulative variance for the first two
components (99.99%) is found to be the same for both models.
The first two PCs/LVs account for 99.99% of the variation in
the spectra. In PCR, PC1 explains 96.62% and PC2 explains
3.37%. In PLSR, LV1 explains 98.19% and LV2 explains 1.80%
of the total variance between the samples. Fig. 4 shows the first
two PC/LV scores plotted in a scatter diagram for both the
models.
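The cumulative figures quoted above follow directly from the per-component percentages, as the short check below shows. Only the four percentages are taken from the text; the function name is hypothetical.

```python
def cumulative_percent(percentages):
    """Running total of explained variance, rounded to two decimals."""
    total, running = 0.0, []
    for value in percentages:
        total += value
        running.append(round(total, 2))
    return running


pcr_variance = [96.62, 3.37]    # PC1, PC2 (% of variance, from the text)
plsr_variance = [98.19, 1.80]   # LV1, LV2 (% of variance, from the text)

print(cumulative_percent(pcr_variance), cumulative_percent(plsr_variance))
```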
From Fig. 4, it is observed that 45 samples with different
adulterant concentrations (0–100%, v/v) in pure sandalwood oil
are grouped into two classes. The first group, with negative score
values, represents the samples with less adulterant contamination
and more sandal oil.
The second group, with positive score values, indicates
samples with more adulterant contamination (>50%) and less
sandal oil character. It is seen that the PLSR model can be
used to separate the samples (pure and blended) in a better
way; even 1% of adulteration in sandalwood oil could be
identified.
Prediction/validation by the models
Fig. 5 reflects the accuracy and performance of the
models. The plot of the measured concentrations
against the predicted concentrations shows the
reliability of the models.
The statistics of the results obtained from the calibration models are
shown in Table 2 below.
The coefficient of determination (R2) measures the strength of
the correlation between the measured values and the values
predicted by the model. It may range from 0 to +1. The
closer the value is to +1, the higher the correlation between the
data.38 Both the PCR and PLSR models in this study
show very good correlation between the real and predicted
concentrations, with coefficient of determination (R2) values
equal to 0.99985 and 0.99986 respectively, indicating a good linear fit
(see Fig. 5). For the two models presented here, the number
of variables is reduced to 2 principal/latent variables
that explain 99.99% of the total variance. The
RMSEP value (0.01364 for PCR and
0.01355 for PLSR) is also low for each model. A lower
RMSEP value indicates a higher degree of accuracy of prediction
by the model.37,38 Both PCR and PLSR give almost the same
R2 value. On closer examination of the scores plot,
RMSEP and R2 values, the PLSR model is found to be the
best.
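The evaluation statistics can be written out explicitly. The sketch below uses the common definitions (RMSEP as the root of the mean squared prediction error, bias as the mean of predicted minus measured, and R2 as one minus the ratio of residual to total sum of squares); the paper does not spell out its formulas or sign convention, so these are assumptions. The measured levels are the eleven prediction-set levels from the text, while the prediction errors are hypothetical.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values."""
    return math.sqrt(
        sum((p - m) ** 2 for m, p in zip(measured, predicted)) / len(measured))


def bias(measured, predicted):
    """Mean signed error (predicted minus measured); sign is a convention."""
    return sum(p - m for m, p in zip(measured, predicted)) / len(measured)


def r_squared(measured, predicted):
    """Coefficient of determination, 1 - SS_residual / SS_total."""
    mean_measured = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Prediction-set levels from the text (% v/v castor oil).
measured = [0, 1, 5, 8, 12, 20, 25, 50, 70, 85, 90]

# Hypothetical prediction errors, for illustration only.
errors = [0.01, -0.02, 0.01, 0.0, -0.01, 0.02, 0.0, -0.01, 0.01, 0.0, -0.01]
predicted = [m + e for m, e in zip(measured, errors)]

print(rmse(measured, predicted), bias(measured, predicted),
      r_squared(measured, predicted))
```

The same `rmse` function gives RMSEC when applied to the calibration fit and RMSECV when applied to the cross-validated predictions.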
Conclusion
In this work, near infrared (NIR) spectroscopy combined with
chemometric techniques is used for screening analysis to
identify sandal oil samples adulterated with low-cost and low-
grade oils like castor oil. This method is accurate and reliable
for detecting adulteration and can assist laboratories and
inspection and quality control services for essential oils. For the models
proposed in this work, PCR and PLSR show low
RMSECV and RMSEP values and high correlation between
the measured and predicted concentrations. The methodology
of NIR spectra associated with PCR and PLS techniques is
shown to be suitable as a practical analytical tool to predict
the adulterant content in sandal oil in the range 0–100% (v/v).
Even 1% of contamination can be measured precisely. This
technique may be used for identifying counterfeit oil
effectively. It is also observed that the shift of samples from one
quadrant to another in the scores plot reflects the actual
percentage of adulteration. This information is very helpful in
determining the percentage of adulteration in a non-destructive
manner.
Based on the above findings, for the future we suggest NIR
spectroscopy through chemometrics as a detection tool for
quantitative as well as qualitative analysis of adulterations in
essential oils.
Fig. 5 (a) PCR model measured vs. predicted sandal oil, and (b) PLS
model measured vs. predicted sandal oil.
Table 2 The prediction summary of PCR and PLSR models (a)

Statistical parameters    PCR         PLSR
RMSEC                     0.0011      0.002051
RMSECV                    0.002592    0.002888
RMSEP                     0.01364     0.01355
Bias                      0.002361    0.001920
R2 Cal                    0.99982     0.99931
R2 CV                     0.99978     0.99937
R2 Pred                   0.99985     0.99986

(a) PCR: Principal Component Regression; PLSR: Partial Least Square Regression. RMSEC: Root Mean Square Error of Calibration; RMSECV: Root Mean Square Error of Cross-Validation; RMSEP: Root Mean Square Error of Prediction; R2: coefficient of determination.
Acknowledgements
The authors would like to thank Mr Jeremy M. Shaver, Chief of
Technology Development, Help desk, Eigenvector Research,
Inc. and Mr S. Valiathan, Quaero, CSG systems, Inc, Bellevue,
WA, USA for the technical assistance and help.
References
1 H. Schulz, M. Baranska, H. H. Belz, P. Rosch, M. A. Strehle and J. Popp, Vib. Spectrosc., 2004, 35, 81.
2 J. Verghese, T. P. Sunny and K. V. Balakrishnan, Flavour Fragrance J., 1990, 5, 223.
3 C. G. Jones, E. L. Barbour, E. L. Ghisalberti and J. A. Plummer, Phytochemistry, 2006, 67, 2463.
4 G. A. Burdock and I. G. Carabin, Food Chem. Toxicol., 2008, 46, 421.
5 B. M. Lawrence, Perfumer and Flavorist, 1991, 16, 49.
6 Anonymous, Sandalwood, in The Review of Natural Products, ed. C. Dombek, Wolters Kluwer Co., St. Louis, MO, 1993.
7 G. A. Burdock, Sandalwood white, Fenarolis Handbook of Flavor Ingredients, CRC Press, Boca Raton FL, 4th edn, 2002, p. 1684.
8 N. A. Braun, M. Meier and W. M. Pickenhagen, J. Essent. Oil Res., 2003, 15, 63.
9 A. Y. Leung and S. Foster, Sandalwood Oil, Encyclopedia of Common Natural Ingredients used in Food, in Drugs and Cosmetics, John Wiley and Sons Inc, New York, 2nd edn, 1996.
10 T. H. Kim, H. Ito, K. Hayashi, T. Hasegawa, T. Machiguchi and T. Yoshida, Chem. Pharm. Bull., 2005, 53, 641.
11 S. Arctander, Sandalwood Oil in East India, in Perfume and Flavor Materials of Natural Origin, Denmark, 1960, p. 574.
12 D. P. Anonis, Perfumer Flavorist, 1998, 23, 19.
13 R. E. Naipawer, in Flavors and Fragrances, A World Perspective Proceedings of the 10th International Conference of Essential Oils, Flavor Fragrance, ed. B. M. Lawrence, B. D. Mookherjee and B. J. Willis, Elsevier, Amsterdam, 1988, p. 805.
14 U.S. Dispensatory, Lippincott, Philadelphia, 25th edn, 1955, p. 1836.
15 British Pharmaceutical Codex, The Pharmaceutical Society, London, 1949, p. 612.
16 Food Chemicals Codex, National Academy Press, Washington DC, 3rd edn, 1981, p. 268.
17 International Organisation for Standardisation, 1st edn, ISO 3518, 1979.
18 International Organization for Standardisation, ISO 3793, 1976.
19 International Organization for Standardisation, 2nd edn, ISO 3518, 2002.
20 M. D. Guillen and N. Cabo, J. Agric. Food Chem., 1997, 45, 1.
21 J. S. Oliveira, R. Montalvao, L. Daher, P. A. Z. Suarez and J. C. Rubim, Talanta, 2006, 69, 1278.
22 A. A. Christy, S. Kasemsumran, Y. P. Du and Y. Ozaki, Anal. Sci., 2004, 20, 935.
23 K. M. Bewig, A. D. Clarke, C. Roberts and N. Unklesbay, J. Am. Oil Chem. Soc., 1994, 71, 195.
24 Y. S. Chen and A. O. Chen, in Proceedings of the 6th International Conference on Near Infrared Spectroscopy, ed. G. D. Batten, P. C. Flinn, L. A Welsh and A. B Blakeney, J. Crainger of Johnzart Pty Ltd, Victoria, Australia, 1995, p. 316.
25 M. Marjoniemi, Appl. Spectrosc., 1992, 46, 1908.
26 T. Fearn, NIR News, 2002, 13, 12.
27 J. Guiteras, J. L. Beltran and R. Ferrer, Anal. Chim. Acta, 1998, 361, 233.
28 H. Martens and T. Naes, Multivariate Calibration, Wiley, Chichester, 1989.
29 J. E. Jackson, J. Qual. Tech., 1980, 12, 201.
30 R. G. Brereton, Chemometrics Data Analysis for the Laboratory and Chemical Plant, Wiley, Chichester, 2003.
31 D. L. Massart, B. G. M. Vandeginste, S. N. Deming, Y. Michotte and L. Kaufman, Chemometrics. A Textbook, Elsevier Science Publishers, Amsterdam, 1988.
32 T. Naes and H. Martens, J. Chemom., 1988, 2, 155.
33 S. Wold, K. Esbensen and P. Geladi, Chemom. Intell. Lab. Syst., 1987, 2, 37.
34 P. Geladi and B. Kowalski, Anal. Chim. Acta, 1986, 185, 1.
35 T. Naes, T. Isaksson, T. Fearn and A. M. Davies, A User Friendly Guide to Multivariate Calibration and Classification, NIR publications, UK, 2002.
36 L. Wang, F. S. C. Lee, X. R. Wang and I.E., Food Chem., 2006, 95, 529.
37 O. Divya and A. K. Mishra, Talanta, 2007, 72, 43.
38 C. N. C. Corgozinho, V. M. D. Pasa and P. J. S. Barbeira, Talanta, 2008, 76, 479.
39 B. M. Wise, N. B. Gallagher, R. Bro and J. M. Shaver, PLS Toolbox 5.8.1, Eigenvector Research, 2010.
40 Matlab 7.0.1, Mathworks, Natick, USA.
41 K. Kramer and S. Ebel, Anal. Chim. Acta, 2000, 420, 155.
42 H. Schulz, H. H. Drews and H. Kruger, J. Essent. Oil Res., 1999, 11, 185.
43 J. J. Workman, Technology and Applications Development for Molecular Spectroscopy and Microanalysis, Thermo Electron Corporation, NIR correlation chart: How to interpret NIR for a variety of applications.
44 F. Westad, A. Schmidt and M. Kemit, J. Near Infrared Spectrosc., 2008, 16, 265.
45 Q. Chen, J. Zhao, H. Zhang and X. Wang, Anal. Chim. Acta, 2006, 572, 77.
Analytical Methods
Qualitative and quantitative analysis in sandalwood oils using near infrared
spectroscopy combined with chemometric techniques
Saji Kuriakose 1, Hubert Joe ⇑
Centre for Molecular and Biophysics, Mar Ivanios College, Thiruvananthapuram, Kerala, India
Article info
Article history:
Received 30 April 2011
Received in revised form 16 May 2011
Accepted 15 April 2012
Available online 21 April 2012
Keywords:
Essential oils
Sandalwood oil
Near infrared spectroscopy
Chemometric techniques
Support vector machine regression
Qualitative
Quantitative differences
Abstract
Sandalwood oil is an essential oil that finds very wide application in the flavor, fragrance and
pharmaceutical industries. The objective of this study is to use near infrared spectroscopy as a
rapid analytical technique for the qualitative and quantitative assessment of purity in sandalwood oils.
The quality and efficacy of sandalwood oils, even when they come from the same species, differ
somewhat according to growing conditions (origin) and poor extraction methods. Classification of
sandal oils based on their NIR spectra is performed by principal component analysis, hierarchical
cluster analysis and self organising map (Kohonen neural network). All these techniques clearly
differentiate the oils according to the area from which the sandalwood has been cut. Support vector
machine regression (SVM R) is used to predict the purity of the oils.
© 2012 Elsevier Ltd. All rights reserved.
1. Introduction
Sandalwood oil is an essential oil obtained by the distillation of the heartwood and roots of the
plant Santalum album (family Santalaceae). S. album is a small hemiparasitic tree of great economic
value, growing in Southern India, Sri Lanka, Australia and Indonesia. Its trunk contains resins and
essential oils, particularly the α- and β-santalols, santalenes and many other minor sesquiterpenoids
(Jones, Ghisalberti, Plummer, & Barbour, 2006). These sesquiterpenoids are responsible for the
unique sandalwood fragrance. Sandalwood oil is used in the food industry as a flavour ingredient
and serves as a fixative for many high-end perfumes. A number of aromatic and phenolic compounds
have also been identified in the oil of S. album (Kim et al., 2005). The quantity of oil produced in
a tree varies considerably according to location (environmental factors) and age of the tree, even
in nearly identical growing conditions (Jones, Plummer, & Barbour, 2007). It should also be noted
that santalol composition can vary depending on the method of oil extraction (Piggott, Ghisalberti,
& Trengove, 1997). There has been a serious decline in the population of Santalum in India due to
complex cultivation requirements and non-stop harvesting (especially from smuggling) associated
with limited regeneration (Fox, 2000; Radomiljac, Ananthapadmanabha, Welbourn, & Rao, 1998).
Sandalwood oil is approved for food and flavour uses by the Council of Europe (CoE, 2000), the
Flavor and Extract Manufacturers Association (FEMA) and the United States Food and Drug
Administration (FDA). The sandalwood oil specifications have been reported in the Food Chemicals
Codex (FCC, 2003). The international standard (ISO 3518, 2002) for sandalwood oil and similar
authorities stipulate a minimum of 90% w/w santalol (as free alcohol) in the oil (α-santalol
comprising approximately 60% and β-santalol comprising approximately 33% of total santalol)
(British Pharmaceutical Codex, 1949; ISO 3518, 2002). Sandalwood oils with santalol levels below
these specifications are of inferior quality due to poor extraction methods, adulteration with
synthetic or semi-synthetic substitutes or some other counterfeit substitution.
Kumar and Maddan (1979) have reported a recommended iodine value of 283–288 to identify
adulteration in pure sandalwood oil. Verghese, Sunny, and Balakrishnan (1990) have suggested,
using gas chromatography (GC), 40–55% of α-santalol and 17–27% of β-santalol in the total santalol
content (w/w). GC analysis of S. album oil according to ISO 3518 (2002) specifies similar
proportions of Z-α-santalol (41–55%) and Z-β-santalol (16–24%). However, these reports do not
discuss the potential variation in S. album oil composition depending on the origin or age of the
tree, nor do they address the evaluation of counterfeits.
The quality and efficacy of sandalwood oil, even from the same species, differ somewhat according
to growing conditions based on geographical origin. This creates the need for a rapid and accurate
analytical method for correct value estimation based on origin and for the prevention of illegal
distribution. However, the existing analytical tools are not sufficient to determine the
geographical origin clearly, as they are time consuming, complex and tedious. Since sandalwood oil
contains more than 100 major components that differ slightly according to growing conditions, viz.
geographical origin, we cannot select several specific components as essential criteria. In view of
the current issues associated with sandalwood oil, we conduct an investigative study to develop an
appropriate tool to assess the quality of sandalwood essential oils.
Food Chemistry 135 (2012) 213–218. http://dx.doi.org/10.1016/j.foodchem.2012.04.073
0308-8146/$ - see front matter © 2012 Elsevier Ltd. All rights reserved.
journal homepage: www.elsevier.com/locate/foodchem
⇑ Corresponding author. Address: Centre for Molecular and Biophysics Research, Department of
Physics, Mar Ivanios College, Thiruvananthapuram 695 015, Kerala, India. Tel.: +91 471 2531053;
fax: +91 471 2530023.
E-mail addresses: [email protected], [email protected] (H. Joe).
1 Present address: St. Thomas H.S.S., Pala, Kerala, India.
Near infrared spectroscopy (NIRS) in combination with sophisticated chemometric algorithms can be
an excellent tool for quality control purposes and selection of high quality materials. NIR pattern
recognition methods have been applied for the discrimination of roasted coffees (Martin, Pablos, &
Gonzalez, 1996) and vegetable oils (Bewig, Clarke, Roberts, & Unklesbay, 1994). These studies were
based on the classification of samples with various chemical constituents. In our previous study,
we developed two reliable, accurate and non-destructive models, principal component regression
(PCR) and partial least square regression (PLSR), to detect and quantify castor oil adulteration in
pure sandalwood oil through near infrared spectral data collected from blended sandal oil samples
(Kuriakose, Thankappan, Joe, & Venkataraman, 2010).
The objective of the present work is to apply NIRS along with chemometric techniques like principal
component analysis (PCA) (Wold, Esbensen, & Geladi, 1987) to ascertain discrimination of sandal
oils. Pattern recognition techniques, namely hierarchical cluster analysis (HCA) (Armenta,
Garrigues, & Guardia, 2007) and self organising map (SOM) (Kohonen, Oja, Simula, Visa, & Kangas,
1996), are also applied to classify sandal oils from the same species (chemotypes) but different
geographical origins. Support vector machine regression (SVM R) (Vapnik, 1995) is used to quantify
the % level of counterfeits in the oils.
2. Materials and experimental methods
2.1. Materials
Five sandalwood oil samples are procured from three different geographical origins (three regions
of two states, Kerala and Karnataka, in India), all of the same species. The samples are classified
into five groups named A–E. Sample A is acquired from Kairali, Arts and Crafts, Kerala Government.
Samples B and C are obtained from two different sandalwood factories in Mysore, Karnataka State.
Samples D and E are collected from two sandalwood industries in Bangalore, Karnataka State. All
samples are stored at 4 °C in aluminium bottles and protected from light until they are analysed.
None of them is subjected to any treatment, as this may change their composition.
2.2. Sample preparation
At least 8 h prior to spectroscopic measurement, the samples are brought to ambient temperature
(20 °C). Each sample is diluted to 10% (v/v) with a suitable solvent (carbon tetrachloride) and
homogenised using an electromagnetic stirrer. Stringent protocols have been applied to avoid any
sample loss or change. A total of 49 samples are prepared from all the classes (10 samples each
from the 4 groups A–D and 9 samples from group E).
2.3. NIR spectra collection
Near infrared spectra of the 49 samples are recorded over the 800–2500 nm spectral region, at 1 nm
data intervals, with a Cary 5000 NIR spectrophotometer (SI No. EL 03127331) with a resolution of
0.01 nm. An average of a number of spectra is obtained for each sample. All the spectra are
recorded in absorbance units. The sample spectra are divided into a training set (n = 39) and a
test set (n = 10), each comprising spectra from every group.
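A minimal sketch of how such a split could be drawn, assuming two spectra per class are held out at random. The paper states only the set sizes (39 and 10); the per-class count, the random seed and the function name `stratified_split` are assumptions made for illustration. Holding out two per class would be consistent with the cluster sizes reported in Section 3.4 (8 for A–D, 7 for E), but that is an inference rather than something the authors state.

```python
import random


def stratified_split(labels, n_test_per_class, seed=0):
    """Return (train_idx, test_idx), holding out n_test_per_class samples per class."""
    rng = random.Random(seed)
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    train, test = [], []
    for lab in sorted(by_class):
        idx = by_class[lab][:]
        rng.shuffle(idx)                      # random choice within each class
        test.extend(idx[:n_test_per_class])
        train.extend(idx[n_test_per_class:])
    return sorted(train), sorted(test)


# 10 samples each for classes A-D and 9 for class E, as described in the text.
labels = [c for c in "ABCD" for _ in range(10)] + ["E"] * 9
train_idx, test_idx = stratified_split(labels, n_test_per_class=2)
print(len(train_idx), len(test_idx))
```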
2.4. Chemometrics and data analysis
PCA, HCA, SOM and SVM R are performed using algorithms from PLS Toolbox 6.0.1 running in the Matlab
environment (Wise, Gallagher, Bro, & Shaver, 2010; Matlab, 2010a, 2010). (In this study, the
unsupervised methods PCA, HCA and SOM are used for model comparison and confirmation of the results
obtained.) A suitable preprocessing technique, namely smoothing (Savitzky–Golay filters) coupled
with mean centring, is used to remove background noise and to increase spectral resolution
(Savitzky & Golay, 1964). Leave-one-out (LOO) cross-validation is used to calibrate the model.
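The two preprocessing steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the PLS Toolbox implementation: it uses the closed-form smoothing weights for a quadratic Savitzky–Golay fit, takes the 15-point window (half-width 7) and order 2 that the paper reports in its results sections, and leaves the first and last seven points of each spectrum unsmoothed. All function names are hypothetical.

```python
def savgol_quadratic_weights(half_width):
    """Closed-form Savitzky-Golay smoothing weights for a quadratic fit."""
    m = half_width
    denom = (2 * m + 3) * (2 * m + 1) * (2 * m - 1)
    return [3.0 * (3 * m * m + 3 * m - 1 - 5 * i * i) / denom
            for i in range(-m, m + 1)]


def savgol_smooth(y, half_width=7):
    """Smooth y with a (2*half_width + 1)-point quadratic filter; edges are left as they are."""
    m = half_width
    w = savgol_quadratic_weights(m)
    out = list(y)
    for k in range(m, len(y) - m):
        out[k] = sum(wi * y[k + i] for wi, i in zip(w, range(-m, m + 1)))
    return out


def mean_centre(spectra):
    """Subtract the per-wavelength (column) mean from every spectrum (row)."""
    n = len(spectra)
    means = [sum(col) / n for col in zip(*spectra)]
    return [[v - mu for v, mu in zip(row, means)] for row in spectra]


# A quadratic signal passes through a second-order filter unchanged (up to rounding).
quad = [0.5 * x * x - 3.0 * x + 2.0 for x in range(40)]
smoothed = savgol_smooth(quad)
centred = mean_centre([[1.0, 2.0], [3.0, 6.0]])
```

Mean centring acts across samples (one mean per wavelength), whereas smoothing acts along each spectrum, so the two steps do not interfere with each other.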
3. Results and discussion
3.1. Near infrared absorbance spectra
The near infrared absorption spectra of 5 classes of sandalwood oils (collected from 3 different
geographical origins) are measured over the spectral range 800–2500 nm at 1 nm spacing and
converted to ASCII files using Varian software (spectra of 49 samples: 10 samples each from
classes A–D and 9 samples from class E).
3.1.1. Spectra investigation
The near infrared region (780–2498 nm) is dominated by overtones and combination bands arising
from the anharmonic nature of molecular vibrations. Absorption in the NIR region arises from the
vibrational motion of molecules. According to former studies performed on essential oils, the
majority of the absorption bands in the near infrared spectra of the analysed oil samples arise
from the overtones of hydrogenic stretching vibrations or combinations involving stretching and
bending modes (Schulz, Drews, & Kruger, 1999). The absorption bands observed at 1135–1215 nm are
due to methylene (CH) stretching (2nd (3ν) overtone and 2ν combination bands). The peaks at
1375–1399 nm are related to methyl (CH) stretching and bending combinations (2nd (3ν) overtone and
2ν combination bands). The peak around 1690–1695 nm is associated with methyl asymmetric
stretching (1st (2ν) overtone of the CH stretch) (Westad, Schmidt, & Kemit, 2008). The intense
peaks centred near 2308 and 2348 nm are related to combinations of CH stretching vibration and
deformation tones (Hourant, Baeten, Morales, Meurens, & Aparicio, 2000).
3.1.2. Wavelength selection
There has been much debate as to the importance of selecting those few wavelengths that contain
significant information for optimal model development, thus reducing the number of wavelengths,
variables and model complexity. Recent research investigates the importance of combining some
(synergic) wavelengths with those containing problem-dependent information (descriptive
wavelengths) to improve model performance.
Considerable differences are observed in the spectra of all samples in the wavelength range
1650–2500 nm. Water (20 °C) has overtones at 1450, 970 and 760 nm, and a combination of OH
stretching and bending occurs around 1940 nm. The presence of water content hinders the model
performance for this application. Taking the above into account, two regions of the full spectrum,
1650–1900 and 2000–2500 nm, are selected and combined. This combined spectral region is exploited
for the model development.
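As an illustration of the selection step, the following sketch picks out the two retained regions from an 800–2500 nm grid at 1 nm spacing. The function name `select_regions` is hypothetical, and the inclusive treatment of the region limits is an assumption.

```python
def select_regions(wavelengths, regions):
    """Indices of the wavelengths that fall inside any (low, high) region, limits inclusive."""
    return [i for i, wl in enumerate(wavelengths)
            if any(lo <= wl <= hi for lo, hi in regions)]


wavelengths = list(range(800, 2501))        # 800-2500 nm at 1 nm spacing
regions = [(1650, 1900), (2000, 2500)]      # the two regions named in the text
keep = select_regions(wavelengths, regions)
print(len(wavelengths), len(keep))
```

With inclusive limits this keeps 752 of the 1701 recorded variables, and it drops the 1901–1999 nm gap around the water combination band near 1940 nm.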
3.2. Principal component analysis (PCA)
Spectral data contain variables that hold redundant information. These variables are collinear
(i.e. highly correlated). Due to the high collinearity, a new co-ordinate system based on variance
is required to represent the original data.
Principal component analysis (PCA) can be thought of as an algorithm that finds principal
components (PCs) that either maximise variance or minimise the sum of squares of the residuals of
the samples (Esbensen, Schonkopf, & Midtgaard, 1994). PCA generates successive PCs that are
orthogonal to each other. The orthogonality of the PCs removes collinearity among the spectral
variables without eliminating spectral information.
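In matrix form, the decomposition described above is usually written as follows. The notation is not taken from the paper and is assumed here: X is the preprocessed data matrix (samples by wavelengths), T holds the scores, P the loadings, E the residuals, and the λ values are the eigenvalues of the covariance matrix of X.

```latex
X = T P^{\mathsf{T}} + E, \qquad
t_a = X p_a, \qquad
p_a^{\mathsf{T}} p_b = \delta_{ab}, \qquad
\text{fraction of variance captured by PC}_a = \frac{\lambda_a}{\sum_{k} \lambda_k}
```

The orthonormality condition on the loadings is what removes the collinearity: the score vectors for different PCs are uncorrelated.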
PCA is done here to graphically determine grouping patterns (based on geographical origin) in an
effort to classify sandalwood oil samples (qualitative difference) according to percentage of
adulterants. In addition, this pattern recognition technique is used to observe the similarities
among different oil samples. PCA is carried out on the selected spectral regions 1650–1900 and
2000–2500 nm at 1 nm intervals, as discussed earlier. Out of the 49 sample spectra recorded, 39
spectra involving all classes are randomly selected for PCA.
3.2.1. Spectral preprocessing
Before performing PCA, all of the sample spectra are pre-processed to reduce noise and enhance the
spectral features. Various pre-processing techniques, namely first derivative, second derivative,
none (without pre-processing), smoothing (Savitzky–Golay filters), autoscale, mean centre,
multiplicative scatter correction and their various combinations, are investigated. The root mean
square error of calibration (RMSEC) and the root mean square error of cross-validation (RMSECV)
are used as tools to compare the pre-processing methods and to select the best. The quality of the
results is compared using RMSEC values. Since smoothing (Savitzky–Golay filter of order 2 with a
15-point window) coupled with autoscale produced the lowest RMSECV/RMSEC values, this method is
chosen as the best. Other pre-processing techniques have not yielded good results for this
application.
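RMSEC and RMSECV differ only in which predictions enter the error sum: RMSEC uses a model fitted to all calibration samples, whereas RMSECV uses, for each sample, a model fitted without it. A minimal sketch, assuming a one-variable least-squares line as a stand-in for the multivariate model and the plain 1/n form of the root mean square error (the software used in the paper may apply a different denominator); the function names and the toy data are illustrative only.

```python
import math


def rmse(actual, predicted):
    """Root mean square error with a plain 1/n denominator."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return slope, my - slope * mx


def rmsec_and_rmsecv(x, y):
    """RMSEC from a fit on all samples; RMSECV from leave-one-out refits."""
    slope, intercept = fit_line(x, y)
    rmsec = rmse(y, [slope * xi + intercept for xi in x])
    loo_pred = []
    for k in range(len(x)):
        s, b = fit_line(x[:k] + x[k + 1:], y[:k] + y[k + 1:])   # refit without sample k
        loo_pred.append(s * x[k] + b)
    return rmsec, rmse(y, loo_pred)


# Hypothetical toy data, not taken from the paper.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.1, 0.9, 2.2, 2.8, 4.1, 5.0]
rmsec, rmsecv = rmsec_and_rmsecv(x, y)
```

For a least-squares fit the leave-one-out error cannot be smaller than the calibration error, which is why RMSECV is the more conservative of the two figures.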
3.2.2. Calibration and cross validation – optimum number of
components
In order to determine the optimum number of PCs, one must examine the percent variance captured
(high is good) or the RMSEC values (low is good). In this case, 2 components are enough to reach a
stable minimum RMSEC value (0.413% v/v). Hence 2 PCs are used to optimise the PCA model
performance and minimise the model errors caused by underfitting and overfitting the data.
3.2.3. Model development using pre-processed data
A PCA model with 2 PCs is built to group the 5 classes of sandalwood oils of different origin using
the pre-processed data. The first 2 PCs account for 82.49% of the data matrix variance: PC1
explains 52.60% and PC2 explains 29.89%. The eigenvalue of the covariance matrix is 3.94 × 10^2
for PC1 and 2.24 × 10^2 for PC2. Fig. 1 shows the scores of the first 2 PCs plotted in a scatter
diagram.
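The reported figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the rounded values quoted in the text; the variable names are illustrative, and the "implied total" is simply each eigenvalue divided by its stated fraction of variance.

```python
eigenvalues = {"PC1": 3.94e2, "PC2": 2.24e2}   # as reported in the text
percent = {"PC1": 52.60, "PC2": 29.89}         # as reported in the text

cumulative = sum(percent.values())
implied_total = {pc: eigenvalues[pc] / (percent[pc] / 100.0) for pc in eigenvalues}
print(round(cumulative, 2), {pc: round(v, 1) for pc, v in implied_total.items()})
```

Both implied totals come out near 749, so the eigenvalues and percentages agree with each other to within rounding, and the two percentages sum to the stated 82.49%.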
From Fig. 1, it is observed that the five classes A–E of sandalwood oils are distributed over four
quadrants. Oils from class E are in the bottom left quadrant (negative PC1); oil classes B and D
are in the bottom right quadrant (positive PC1); oils from class A lie in the top right quadrant
(positive PC2); oil class C lies in the top left quadrant (positive PC2).
We also see that class E is the most distinct group in the data set. The distance between E and
its nearest neighbour is greater than that between any other class and its nearest neighbour on
the score plot. A trend that stretches from class C to class B is also seen. We expect that classes
with similar scores should be similar. For example, classes B and D are close together, implying
that they are similar. Classes that are diametrically opposed are 'negatively correlated', which
suggests that they measure opposite properties. For instance, a high E corresponds to a low B
value and vice versa.
On assessment, the PCA is able to classify the sandalwood oil samples into only four classes
instead of the actual five, indicating that this model is not sufficient to distinguish the five
different regions effectively. Hence unsupervised pattern recognition techniques like HCA or SOM
are suggested.
3.3. Support vector machine regression (SVM R)
The adulterant level is quantified by SVR. The same spectral regions used for the PCA are also
used for the SVR model. 39 samples are selected for calibration and 10 independent samples
involving all classes are chosen for prediction. Savitzky–Golay smoothing (order 2, 15-point
window) coupled with mean centring is used as the pretreatment method for denoising.
The support vector machine (SVM) is a novel machine learning method based on statistical learning
theory. It aims to minimise both the training error (empirical risk) and the model complexity,
thereby providing high generalisation ability (estimation accuracy) (Vapnik, 1995). SVM provides
nonlinear and robust solutions by mapping the input space into a higher dimensional feature space
using kernel functions. SVM has been extended to solve nonlinear regression estimation problems in
the form of support vector regression (SVR), which has been shown to exhibit excellent performance
(Smola, 1996). Detailed descriptions of SVR can be found in Vapnik (1995) and Smola (1996).
It is well known that the selection of the kernel function and the corresponding parameters plays
an important role in obtaining good forecasts. There are many types of kernel functions. In this
study, the commonly used radial basis function (RBF) is adopted. The accuracy of SVR depends on
the selection of the hyperparameters (cost C and epsilon ε) and the kernel parameter (gamma γ).
ε is the precision parameter; a bigger ε leads to fewer support vectors. C is the penalty factor.
It determines the trade-off between the model complexity and the degree to which deviations larger
than ε are tolerated. If C is too small, it may cause underfitting; if it is too large, it may
cause overfitting.
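The roles of γ and ε can be made concrete with the two functions below. They are written in the standard textbook form and are not taken from the paper or from PLS Toolbox; the function names and the numerical inputs are chosen for illustration.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel: K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def epsilon_insensitive_loss(actual, predicted, epsilon):
    """Zero inside the epsilon tube around the target, linear outside it."""
    return max(0.0, abs(actual - predicted) - epsilon)


# Illustrative inputs: two short "spectra" and a few prediction errors.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.001)
near = rbf_kernel([1.0, 2.0], [2.0, 3.0], gamma=0.001)
far = rbf_kernel([1.0, 2.0], [30.0, 40.0], gamma=0.001)
inside_tube = epsilon_insensitive_loss(1.0, 1.005, epsilon=0.01)
outside_tube = epsilon_insensitive_loss(1.0, 1.05, epsilon=0.01)
```

A sample whose error stays inside the tube contributes no loss and does not become a support vector, which is why a larger ε yields fewer support vectors. C then weights the summed loss of the samples outside the tube against the model complexity term.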
Suitable values of the parameters C, ε and γ are obtained after several trials. The root mean
square errors (RMSEC/RMSECV) are compared (a low value is good) to arrive at a suitable choice of
these parameters (in the present study, RMSEC = 0.007293 and RMSECV = 0.007159 are chosen). In
this algorithm C = 31.6228, ε = 0.01 and γ = 0.001. The number of support vectors (SVs) is 3.
Leave-one-out cross-validation is used to generate the model with SVM on the input data set.
Fig. 2 shows the cross-validation optimisation.
The correlation coefficient (R) and root mean square error of prediction (RMSEP) are used to judge
the performance of the SVR model in predicting the percentage of adulteration. A coefficient of
determination R² = 0.9998 (RMSEP = 0.00804) is achieved using the RBF kernel. The plot of actual
against predicted concentrations given in Fig. 3, as well as the high R² value, suggests good
performance by the RBF kernel for this NIR data set.
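The text mentions both a correlation coefficient R and a coefficient of determination R². The two are related but not always identical, as the following sketch shows. It is an illustration with hypothetical numbers, and it does not claim to reproduce how the paper's software computes R²; the function names are assumptions.

```python
import math


def pearson_r(actual, predicted):
    """Pearson correlation between actual and predicted values."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    va = sum((a - ma) ** 2 for a in actual)
    vp = sum((p - mp) ** 2 for p in predicted)
    return cov / math.sqrt(va * vp)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical predictions with a constant bias of +0.5.
actual = [1.0, 2.0, 3.0, 4.0]
biased = [1.5, 2.5, 3.5, 4.5]
r = pearson_r(actual, biased)
r2 = r_squared(actual, biased)
```

Here the correlation is perfect while R² computed from the residuals is lower, because a constant bias leaves the correlation untouched but inflates the residual sum of squares. Reporting RMSEP alongside R², as the authors do, guards against this.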
3.4. Hierarchical cluster analysis (HCA)
HCA is an unsupervised technique for solving classification problems. Its aim is to sort cases
into clusters such that the degree of association is strong between members of the same cluster
and weak between members of different clusters. It is a statistical tool for selecting relatively
homogeneous groups of cases based on measured variables. It starts with each case in a separate
cluster and then combines the clusters sequentially, reducing the number of clusters at each step
until only one cluster is left (Massart, Vandeginste, Buydens, Jony, & Lewi, 1997).
In this study, HCA is carried out on the spectral regions 1650–1900 and 2000–2500 nm, at 1 nm
spacing, of the NIR spectra of sandalwood oils. The algorithm used to classify the oil samples
from their NIR spectra is the K-nearest neighbour linkage method, which calculates Mahalanobis
distance.
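The agglomerative procedure can be sketched as below. Two simplifications are made and should be read as assumptions: plain Euclidean distance is swapped in for the Mahalanobis distance used in the paper, and single (nearest-neighbour) linkage is used as a simple stand-in for the toolbox's linkage method. The function name and the toy points are hypothetical.

```python
import math


def single_linkage(points, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(points))]    # start with one cluster per case
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical 2-D points forming two tight pairs and one outlier.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (50.0, 50.0)]
groups = single_linkage(points, n_clusters=3)
print(groups)
```

Recording the merge distance at every step, instead of stopping at a fixed number of clusters, gives the heights from which a dendrogram is drawn.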
Fig. 4 shows the results from the HCA. Five clusters can be identified. Clusters A–D contain 8 oil
samples each. Cluster E consists of 7 samples.
The dendrogram shows that the five clusters of oil samples are well separated from each other.
Cluster E is unique and is separated from the other four classes. There is no overlap between the
classes of oils. This means that 100% correct classification is possible by HCA.
Fig. 1. Trends in principal components in classification of sandal oils.
Fig. 2. Contour plot of cross validation accuracy for SVM regression.
Fig. 3. Actual vs. predicted concentration by using radial basis kernel of support vector machines.
3.5. Self organising map (SOM)
The self-organising map proposed by Kohonen is also suitable and efficient for performing
unsupervised clustering. A SOM consists of a two-dimensional grid of computational units called
neurons. SOM is similar to PCA in that it performs dimensionality reduction and classification.
The difference between the two approaches is that the SOM performs a nonlinear lower dimensional
mapping while PCA is a linear mapping technique. SOM is a "map" of the training data, dense where
there is a lot of data and thin where the data density is low. The map consists of neurons located
on a regular map grid. The lattice of the grid can be either hexagonal or rectangular. Each neuron
(hexagon) has an associated prototype vector (weight). The algorithm trains the SOM iteratively.
After training, neighbouring neurons have similar prototype vectors. In each training iteration, a
sample vector X is selected from the input data set and the grid node that is nearest to X (also
called the "best matching unit", BMU) is determined. The BMU of a data vector is the unit on the
map whose model vector best resembles the data vector. In practice, the similarity is measured as
the minimum distance (commonly evaluated using the Euclidean metric) between the data vector and
each model vector on the map. In the next step, the weight vectors of the BMU and of its grid
neighbours are moved closer to the input vector X using the Kohonen learning rule. The result of
such reorganization is that similar weight vectors are brought closer to each other while the
dissimilar ones are left apart. Implementing this procedure iteratively forces the randomly
initialized weight vectors to mimic the distribution of input data patterns in the output space.
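One training iteration of the procedure just described can be sketched as follows. The sketch assumes a rectangular grid stored as a dictionary from (row, column) to weight vector, a Gaussian neighbourhood function and a fixed learning rate and radius; none of these choices is specified in the paper, and the function names are hypothetical.

```python
import math


def best_matching_unit(weights, x):
    """Grid coordinates of the node whose weight vector is closest to x (Euclidean)."""
    return min(weights, key=lambda node: math.dist(weights[node], x))


def som_update(weights, x, learning_rate, radius):
    """One Kohonen update: pull the BMU and its grid neighbours towards x."""
    bmu = best_matching_unit(weights, x)
    for node, w in weights.items():
        grid_dist = math.dist(node, bmu)
        if grid_dist <= radius:
            h = math.exp(-(grid_dist ** 2) / (2.0 * radius ** 2))   # Gaussian neighbourhood
            weights[node] = [wi + learning_rate * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu


# A 2 x 2 grid with hypothetical two-dimensional weight vectors.
weights = {(0, 0): [0.0, 0.0], (0, 1): [1.0, 0.0],
           (1, 0): [0.0, 1.0], (1, 1): [1.0, 1.0]}
bmu = som_update(weights, [0.9, 0.1], learning_rate=0.5, radius=1.0)
```

In a full training run the learning rate and radius are normally reduced over the iterations, so that early updates order the map globally and later ones fine-tune it.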
The trained SOM can be inspected visually by a widely used method, described below. The unified
distance matrix (UDM) provides significant information in the form of distances between nodes of
the SOM grid. In this method, a matrix of distances (called "U-matrix") between the d-dimensional
weight vectors of neighbouring nodes of the two-dimensional SOM is computed. The U-matrix
distances can be used to unravel the structure of the data clusters present in the data set under
study. The density of the weight vectors is illustrative of the density of the input data
patterns. Accordingly, the UDM measuring distances between the weight vectors is indicative of the
said density, and a proper representation such as grey level or colour imaging can be designed to
interpret the distances between two neighbouring grid nodes. The optimum size of the
two-dimensional SOM grid is selected by training the SOM with different grid sizes and a
pre-specified number of training iterations. The optimum grid size obtained thereby contains an
array of [10 × 10] nodes. Here, the SOM algorithm is run for 10,000 training iterations. The
results of the SOM-based classification are portrayed in the form of a U-matrix plot in Fig. 5. In
this figure, the actual data points are also plotted as dark coloured hexagons. In the U-matrix
plot, a dark coloured node indicates that its weight vector is at a higher distance from those of
the adjoining light coloured nodes.
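A U-matrix of the kind described can be computed as in the sketch below. It assumes a rectangular grid with four-connected neighbours and takes the mean neighbour distance per node, which is one common variant; the paper's hexagonal lattice and the exact convention of its software are not reproduced. The function name and the toy weights are hypothetical.

```python
import math


def u_matrix(weights, n_rows, n_cols):
    """Mean distance from each node's weight vector to those of its grid neighbours.

    Assumes the grid has at least two nodes, so every node has a neighbour.
    """
    u = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            neighbours = [(r + dr, c + dc)
                          for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                          if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols]
            total = sum(math.dist(weights[(r, c)], weights[n]) for n in neighbours)
            u[r][c] = total / len(neighbours)
    return u


# A 1 x 3 grid: two nodes with identical weights and one far away.
grid = {(0, 0): [0.0], (0, 1): [0.0], (0, 2): [10.0]}
u = u_matrix(grid, n_rows=1, n_cols=3)
print(u)
```

Large values mark nodes that sit on a boundary between clusters, which is what the dark nodes in a U-matrix plot indicate.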
The Labeled matrix is shown in Fig. 6.
Both Figs. 5 and 6 show distinct differentiation of the 5 classes
of sandalwood oils based on their origin.
4. Conclusion
The results of this research confirm that near infrared spectroscopy can be used for the
qualitative as well as quantitative analysis of the composition of sandalwood oils. Sandalwood
oils of the same species produced by various extraction methods, either developed or
underdeveloped, and collected from different regions (origin) can be easily discriminated by the
difference in their NIR spectra. Near infrared spectroscopy assisted by multivariate chemometric
techniques, viz. principal component analysis, hierarchical cluster analysis or self organising
map, can be successfully applied to the classification of sandalwood oils according to their
quality. Quantification of the constituents, especially adulterants if any, in sandalwood oils is
achieved by support vector machine regression with proper combinations of data pretreatments.
In a nutshell, the NIR spectroscopy technique has high potential to determine the qualitative
differences and simultaneously to quantify the composition of essential oils.
Fig. 4. HCA dendrogram of sandal oils using Mahalanobis distances (A–E indicate various classes of oils).
Fig. 5. Unified distance matrix or SOM neighbour distances.
Fig. 6. Labeled matrix or SOM sample hits.
Acknowledgements
The authors are grateful to Mr. Xavier T. S., Research Scholar, Centre for Molecular and
Biophysics Research, Mar Ivanios College, Thiruvananthapuram, Kerala, India for the valuable
suggestions and also to STIC, Cochin University of Science and Technology (CUSAT), Kerala, India
for the technical assistance and help.
References
Armenta, S., Garrigues, S., & Guardia, M. I. (2007). Determination of edible oil parameters by near infrared spectrometry. Analytica Chimica Acta, 596(2), 330–337.
Bewig, K. M., Clarke, A. D., Roberts, C., & Unklesbay, N. (1994). Discriminant analysis of vegetable oils using near infrared spectroscopy. Journal of American Oil Chemists Society, 71, 195–201.
British Pharmaceutical Codex (1949). The Pharmaceutical Society, London, pp. 611–612.
CoE (2000). Santalum album L., Natural sources of flavorings, Council of Europe publishing, Strasbourg Cedex, France, pp. 235–236.
Esbensen, K., Schonkopf, S., & Midtgaard, T. (1994). Multivariate analysis in practice. CAMO AS, Wennbergs Trykkeri AS, Trondheim, Norway.
FCC (2003). Sandalwood oil, East Indian type, Food Chemicals Codex (5th ed.), National Academy Press, Washington DC, pp. 395–397.
Fox, J. E. D. (2000). Sandalwood: The royal tree. Biologist (London), 47(6), 31–34.
Hourant, P., Baeten, V., Morales, M. T., Meurens, M., & Aparicio, R. (2000). Oil and fat classification by selected bands of near infrared spectroscopy. Applied Spectroscopy, 54(8), 1168–1174.
ISO 3518 (2002). Oil of sandalwood (Santalum album L.) (2nd ed.), International Organization for Standardization.
Jones, C. G., Ghisalberti, E. L., Plummer, J. A., & Barbour, E. L. (2006). Quantitative co-occurrence of sesquiterpenes; a tool for elucidating their biosynthesis in Indian sandalwood Santalum album. Phytochemistry, 67(22), 2463–2468.
Jones, C. G., Plummer, J. A., & Barbour, E. L. (2007). Non-destructive sampling of Indian sandalwood, Santalum album L. for oil content and composition. Journal of Essential Oil Research, 19, 157–164.
Kim, T. H., Ito, H., Hayashi, K., Hasegawa, T., Machiguchi, T., & Yoshida, T. (2005). Aromatic constituents from the heartwood of Santalum album L. Chemical Pharmaceutical Bulletin, 53, 641–644.
Kohonen, T., Oja, E., Simula, O., Visa, A., & Kangas, J. (1996). Engineering applications of the self organizing map. Proceedings of the IEEE, 84(10), 1358–1384.
Kumar, S., & Maddan, T. R. (1979). A rapid method for detecting adulteration in essential oils. Research and Industry, 24, 180–182.
Kuriakose, S., Thankappan, X., Joe, H., & Venkataraman, V. (2010). Detection and quantification of adulteration in sandalwood oil through near infrared spectroscopy. Analyst, 135, 2676–2681. http://dx.doi.org/10.1039/C0AN00261E.
Martin, M. J., Pablos, F., & Gonzalez, A. G. (1996). Application of pattern recognition to the discrimination of roasted coffees. Analytica Chimica Acta, 320, 191–197.
Massart, D.L., Vandeginste, L.M.C., Buydens, S. Jony, P.J.D., & Lewi, S.V. (1997). Handbook of Chemometrics and Qualimetrics: Part A. Amsterdam: Elsevier.
Matlab 7.0.1 (2010). Natick, USA: Mathworks Inc.
Piggott, M. J., Ghisalberti, E. L., & Trengove, R. D. (1997). Western Australian sandalwood oil extraction by different techniques and variations of the major components in different sections of a single tree. Flavour and Fragrance Journal, 12, 43–46.
Radomiljac, A.M., Ananthapadmanabha, H.S., Welbourn, R.M., & Rao, K.S. (Eds.) (1998). Sandal and its products, ACIAR Proceedings No. 84, Australian Centre for International Agricultural Research, Canberra.
Savitzky, A., & Golay, M. J. E. (1964). Smoothing and differentiation of data by simplified least squares procedures. Analytical Chemistry, 36, 1627–1639.
Schulz, H., Drews, H. H., & Kruger, H. (1999). Rapid NIRS determination of quality parameters in leaves and isolated essential oils of Mentha species. Journal of Essential Oil Research, 11, 185–190.
Smola, A. J. (1996). Regression estimation with support vector learning machines. Master's thesis, Technische Universität München.
Vapnik, V. N. (1995). The nature of statistical learning theory. New York: Springer-Verlag.
Verghese, J., Sunny, T. P., & Balakrishnan, K. V. (1990). (+)-α-santalol and (−)-β-santalol (Z) concentration, a new quality determinant of East Indian sandalwood oil. Flavour and Fragrance Journal, 5, 223–226.
Westad, F., Schmidt, A., & Kemit, M. (2008). Incorporating chemical band assignment in near infrared spectroscopy regression models. Journal of Near Infrared Spectroscopy, 16, 265–273.
Wise, B.M., Gallagher, N.B., Bro, R., & Shaver, J.M. (2010). PLS Toolbox 5.8.2, Eigenvector Research.
Wold, S., Esbensen, K., & Geladi, P. (1987). Principal component analysis-a tutorial. Chemometrics and Intelligent Laboratory Systems, 2, 37–52.
Feasibility of using near infrared spectroscopy to detect
and quantify an adulterant in high quality sandalwood oil
Saji Kuriakose 1, I. Hubert Joe ⇑
Centre for Molecular and Biophysics Research, Department of Physics, Mar Ivanios College, Thiruvananthapuram 695 015, Kerala, India
Highlights
• Sequential spectra approach is used to observe the wavelength relevance on prediction.
• Synergic and descriptive relations among consecutive wavelength regions are studied.
• Oil authenticity and adulteration can be detected with less than 0.0001% error.
Article info
Article history:
Received 20 March 2013
Received in revised form 13 June 2013
Accepted 19 June 2013
Available online 1 July 2013
Keywords:
Oil authenticity
Economic adulteration
NIR spectroscopy
PLSR
LWR
Abstract
Determination of the authenticity of essential oils has become more significant in recent years,
following some illegal adulteration and contamination scandals. The present investigative study
focuses on the application of near infrared spectroscopy to detect sample authenticity and
quantify economic adulteration of sandalwood oils. Several data pre-treatments are investigated
for calibration and prediction using partial least square regression (PLSR). The quantitative data
analysis is done using a new spectral approach (full spectrum or sequential spectrum). The optimum
number of PLS components is obtained according to the lowest root mean square error of calibration
(RMSEC = 0.00009% v/v). The lowest root mean square error of prediction (RMSEP = 0.00016% v/v) in
the test set and the highest coefficient of determination (R² = 0.99989) are used as the
evaluation tools for the best model. A nonlinear method, locally weighted regression (LWR), is
added to extract nonlinear information and to compare with the linear PLSR model.
© 2013 Elsevier B.V. All rights reserved.
Introduction
Sandalwood oil, a volatile essential oil obtained by steam distillation of the dried wood from the trunk and roots of a small hemi-parasitic tree, is an economically important product. The main constituents of the oil (70–90%) are alcohols, predominantly α-santalol and β-santalol. It also contains resins, santalenes and many other minor sesquiterpenoids [1]. α-Santalol and β-santalol are responsible for the unique sandalwood fragrance. Sandalwood oil serves as a fixative for many perfumes. It is used in cosmetic products and as a flavor component in many food products. This essential oil is found to have antiviral, anti-carcinogenic and bactericidal activity. Hence the oil is used medicinally for common colds, fever, bronchitis, inflammation of the mouth and pharynx, infections of the urinary tract, liver and gall bladder complaints and other maladies [2].
In recent years, almost all of the sandalwood oil traded internationally is called East Indian sandalwood oil. It is distilled from the trunk and roots of Santalum album (family Santalaceae). Santalum spicatum sandalwood and its oil are obtained from a small, wild-growing shrub. Although similar to East Indian sandalwood oil, the top notes of S. spicatum sandalwood oil are characteristically different [3].
Spectrochimica Acta Part A: Molecular and Biomolecular Spectroscopy 115 (2013) 568–573
journal homepage: www.elsevier.com/locate/saa
http://dx.doi.org/10.1016/j.saa.2013.06.076
1386-1425/$ - see front matter © 2013 Elsevier B.V. All rights reserved.
⇑ Corresponding author. Tel.: +91 471 2531053; fax: +91 471 2530023.
E-mail addresses: [email protected] (S. Kuriakose), hubertjoe@gmail.com (I.H. Joe).
1 Present address: St. Thomas H.S.S., Pala, Kerala 686 575, India.
The Indian sandalwood oil (species S. album) contains 71.2–90% total santalol (46.6–59.9% α-santalol and 24.6–30.1% β-santalol), and the sandalwood oil from S. spicatum contains 38.7–46.1% total santalol [4] (27.9–35.3% α-santalol and 10.8% β-santalol). The minimum level of total santalol content stipulated by regulatory agencies is nearly 90%. Sandalwood oils with a total santalol content around this level are considered superior or high quality oils, and those much below this level are considered inferior or low quality oils.
Due to its sensory quality, extensive use and steep rise in price, sandalwood oil is subjected to two types of adulteration [5,6]. The first is the blending of higher grade, high cost sandalwood oil with lower grade sandalwood oil (containing a low percentage of santalol). The second is the mixing of sandalwood oil with synthetic or semi-synthetic substitutes such as Sandalore®. Adulteration with low cost or low quality oils may afford considerable profits from the economic point of view. It is a serious problem for regulatory agencies and oil suppliers and can be a threat to the health of consumers. The issue of oil authenticity is mainly related to improper labeling and to the substitution, in part or in whole, of cheaper products for high cost ones. Low cost oil substitution has recently become the main form of sandalwood oil adulteration (as per details collected from local vendors).
The determination of authenticity for sandalwood oils is traditionally a time-consuming and laborious process, typically using chromatographic methods like HPLC and GC. Near infrared spectroscopy (NIRS) aided by chemometric methods can be an effective analytical technique to verify the authenticity of oils due to its simplicity, rapidity and ease of sample preparation. NIRS has been used in the quantitative measurement of adulteration of virgin olive oils by different vegetable oils or by olive pomace oils [7–10]. The results of the study by Christy et al. [7] revealed that the models could predict the adulterants corn oil, sunflower oil, soya oil, walnut oil and hazelnut oil in olive oil with error limits of ±0.57, ±1.32, ±0.96, ±0.56 and ±0.57% weight/weight, respectively. Although sandalwood oil has been used for centuries in cosmetics, food and flavors, no studies other than ours evaluating its adulteration level using NIRS [11,12] are found in the scientific literature so far.
The aim of this study is to investigate the use of near infrared spectroscopy combined with chemometric data analysis to address the issue of oil authenticity and economic adulteration in sandalwood oil. Partial least square regression (PLSR), a factor based linear multivariate calibration method, is used to resolve the spectral data into ‘loadings’ and ‘scores’ and to build the corresponding calibration model from the new variables. Also, locally weighted regression (LWR) using the local PLS algorithm (a nonlinear method) is used to detect and quantify the adulteration with other low cost oils in high quality, high cost sandalwood oils.
Materials and methods
Materials
Sandalwood oils of two different species (higher grade and lower grade) were procured. Indian sandalwood oil (S. album) was obtained from Mysore Sandalwood Industries, Karnataka state, India. This oil sample was highly concentrated and expensive, and satisfied the conditions stipulated by Indian regulatory authorities. The sandalwood oil of S. spicatum was provided by a local vendor. It was thin, colorless, odorless and less expensive. Both oil samples were stored at 4 °C in aluminum bottles and protected from light until they were analyzed. The oils were analyzed without any prior treatment. The sandalwood oil of S. spicatum was chosen as the adulterant because of its year-round availability and low cost compared to S. album. (This is as per details collected from local vendors at ‘Marayoor’, a place in Idukki district, Kerala, India, where sandal trees are widely cultivated and harvested.)
Sample preparation
Prior to sample preparation and spectroscopic measurement, both oils are brought to ambient temperature (20 °C). Each oil is diluted to 10% (v/v) with a solvent, carbon tetrachloride (see footnote 2), in separate flasks and homogenized using an electromagnetic stirrer. Stringent protocols (normal temperature and pressure, non-exposure to direct light, accurate measurement using very precise instruments, etc.) have been applied to avoid any sample loss or change during preparation. Samples are prepared by mixing a given percentage of the low cost oil (S. spicatum) in solvent with the high cost oil (S. album) in the same solvent. The relative S. spicatum oil fraction (% v/v) in the samples varies from 0% to 100%. The samples prepared represent neat S. spicatum (taken as 100% adulteration), authentic, neat S. album (taken as 0% adulteration) and a range of blended samples containing 1% and 5–95% of S. spicatum oil, the latter in 5% increments by volume. Thus a set of 22 classes of blended oil samples is prepared and used for calibration (class 0: 0% S. spicatum and 100% S. album, class 1: 1% S. spicatum and 99% S. album, class 5: 5% S. spicatum and 95% S. album and so forth until class 22: 100% S. spicatum and 0% S. album). An independent set of 22 classes of sandalwood oil samples collected from vendors in different regions of ‘Marayoor’ is used for prediction. The samples are labeled as ‘training set’ (n = 22) and ‘test set’ (n = 22) separately. All the samples are kept in glass bottles. All measurements are carried out at 20 °C in air-conditioned rooms.
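As an illustrative sketch only (the function and variable names are hypothetical and not part of the original work), the 22 calibration levels described above can be enumerated as follows. It assumes the levels are exactly 0%, 1%, 5–95% in 5% steps and 100% v/v of S. spicatum.

```python
def adulteration_levels():
    """Return the % v/v S. spicatum levels of the 22 calibration classes."""
    levels = [0, 1]                     # neat S. album and the 1% blend
    levels += list(range(5, 100, 5))    # 5, 10, ..., 95
    levels.append(100)                  # neat S. spicatum
    return levels


levels = adulteration_levels()
# Each blend as (% S. spicatum, % S. album), both relative to the total oil.
blends = [(s, 100 - s) for s in levels]
```

The list contains 1 + 1 + 19 + 1 = 22 entries, which matches the number of classes stated in the text.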
Near infrared spectra acquisition
NIR spectra of 44 samples over the 900–2000 nm spectral region at 1 nm intervals are recorded, for a total of 1100 wavelengths, using a JASCO V-570 NIR spectrophotometer (JASCO International Co., Ltd., 4-21, Sennin-cho 2-chome, Hachioji, Tokyo 193-0835, Japan) with a resolution of 0.05 nm. The blending of sandalwood oils is carefully carried out in a 4 mm quartz cuvette, starting with pure sandalwood oil in the cuvette first. Oil samples at 20 °C are scanned and one spectrum per sample (an average of eight scans) is recorded. All the spectra are recorded in absorbance units.
Chemometrics
Since spectral data contain noise and extra information irrelevant to the calibration, a robust model is necessary to extract the information relevant to the prediction of the adulteration percentage (response variable). Several spectral data pre-treatments are explored: first and second derivatives using the Savitzky–Golay algorithm, smoothing (window of 15 points, order 2), mean centre, autoscale, smoothing + mean centre and smoothing + autoscale. The results are compared to assess the best combination of pre-treatment and regression model for resolving the oil authenticity.
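The smoothing + mean centre pre-treatment can be sketched in plain Python as below. This is a minimal illustration, not the PLS Toolbox implementation used in the study: the function names and the toy spectra are hypothetical, the smoothing uses the standard closed-form coefficients of a 15-point quadratic Savitzky–Golay filter, and the 7 points at each end of a spectrum are simply left unsmoothed.

```python
def savgol_15pt_quadratic(y):
    """Smooth one spectrum with a 15-point, order-2 Savitzky-Golay filter.

    Interior points use the closed-form coefficients
    c_i = (167 - 5*i**2) / 1105 for i = -7..7. The first and last
    7 points are left unchanged (a simplification of this sketch).
    """
    half = 7
    coeffs = [(167 - 5 * i * i) / 1105 for i in range(-half, half + 1)]
    out = list(y)
    for k in range(half, len(y) - half):
        window = y[k - half:k + half + 1]
        out[k] = sum(c * v for c, v in zip(coeffs, window))
    return out


def mean_centre(spectra):
    """Subtract the per-wavelength mean (taken over samples) from each spectrum."""
    n = len(spectra)
    means = [sum(col) / n for col in zip(*spectra)]
    return [[v - m for v, m in zip(row, means)] for row in spectra]


# Two made-up "spectra" sampled at 30 points, for illustration only.
raw = [[0.01 * x * x for x in range(30)],
       [0.02 * x * x + 0.5 for x in range(30)]]
pretreated = mean_centre([savgol_15pt_quadratic(s) for s in raw])
```

Because an order-2 Savitzky–Golay filter reproduces any quadratic exactly, the toy spectra pass through the smoothing step unchanged; real spectra with noise would be smoothed. Autoscaling would additionally divide each mean-centred column by its standard deviation.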
PLSR [13] is performed using the algorithm from PLS Toolbox 7.0.3 [14] in the Matlab environment [15] to develop the model. Two spectral approaches are studied to build the calibration models: a full spectrum and a sequential spectrum, in which 100 nm wavelength regions are added sequentially to the previous model to determine the optimal solution. The purpose is to investigate whether combining wavelengths that are not necessarily important by themselves (synergic wavelengths) with those containing problem dependent information (descriptive wavelengths) improves the model performance. The wavelengths are added ‘sequentially’ to the previous model so as to check the improvement in the performance of the new model over the immediately preceding model and also to determine the effective wavelength range of the best, most reliable model.
2 Disclaimer: The authors disclaim any obligation, responsibility or liability which concerns or relates to the handling of the chemical.
The effects of the data pre-treatment methods on the performance of the PLS models are evaluated in terms of the root mean square error of calibration (RMSEC), which indicates how well the selected model fits the calibration data; the root mean square error of prediction (RMSEP), which is the error expected when the model is used in future predictions; and the coefficients of determination for calibration (R2 cal.) and prediction (R2 pred.), which describe the relationship between measured and predicted values.
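For reference, these evaluation quantities can be computed as in the following sketch. It assumes the common definitions RMSE = sqrt(mean squared residual) and R2 = 1 − SS_res/SS_tot; the paper does not spell out its formulas, and some chemometrics packages apply a degrees-of-freedom correction to RMSEC or report R2 as a squared correlation coefficient. The function names and the numbers are illustrative only.

```python
import math


def rmse(measured, predicted):
    """Root mean square error between measured and predicted values.

    Applied to the calibration set this gives RMSEC; applied to the
    independent test set it gives RMSEP.
    """
    n = len(measured)
    return math.sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / n)


def r_squared(measured, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_y) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


# Made-up adulteration levels (% v/v) and model outputs, for illustration.
measured = [0.0, 5.0, 10.0, 15.0]
predicted = [0.1, 4.9, 10.1, 14.9]
error = rmse(measured, predicted)
fit = r_squared(measured, predicted)
```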
Generally, NIR spectral data contain both linear and nonlinear information. The PLSR model extracts only the linear information from the spectral data. The nonlinear method, LWR (locally weighted regression), is added to extract nonlinear information and to compare with the PLSR model.
LWR [15] calculates a single locally weighted regression model using a given number of principal components to predict a dependent variable Y (the adulteration percentage) from a set of independent variables X (the spectral data). The LWR model is useful for performing predictions when the dependent variable y has a nonlinear relationship with the measured independent variables x. This model works by choosing a subset of the calibration data to create a local model for a given new sample. Once the samples are selected, one of three algorithms (global principal component regression, principal component regression or partial least square regression) is used to calculate the local model. In this study, the raw data of the selected samples are used to create a weighted PLS model (local PLS). Such models are more adaptable to highly varying nonlinearity.
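The idea of building a local model from the calibration samples most similar to a new sample can be illustrated with the deliberately simplified sketch below. It is not the local PLS algorithm of the PLS Toolbox: it assumes a single scalar feature per sample (for example, one absorbance value) instead of a full spectrum, selects the k nearest calibration samples, and fits a tricube-weighted straight line to them. The function name, the tricube weighting and the toy data are assumptions made for illustration.

```python
def lwr_predict(x_cal, y_cal, x_new, k=7):
    """Predict y at x_new from the k nearest calibration points.

    A straight line is fitted to the selected neighbours by weighted
    least squares, with tricube weights that decrease with distance.
    """
    nearest = sorted(zip(x_cal, y_cal), key=lambda p: abs(p[0] - x_new))[:k]
    # Slightly enlarge the bandwidth so that every neighbour keeps a positive weight.
    d_max = max(abs(x - x_new) for x, _ in nearest) * 1.001 + 1e-12
    w = [(1 - (abs(x - x_new) / d_max) ** 3) ** 3 for x, _ in nearest]
    sw = sum(w)
    mx = sum(wi * x for wi, (x, _) in zip(w, nearest)) / sw
    my = sum(wi * y for wi, (_, y) in zip(w, nearest)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, (x, _) in zip(w, nearest))
    sxy = sum(wi * (x - mx) * (y - my) for wi, (x, y) in zip(w, nearest))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x_new - mx)


# Hypothetical calibration data: a feature that rises linearly with adulteration.
levels = [0, 1] + list(range(5, 100, 5)) + [100]
feature = [0.20 + 0.004 * c for c in levels]
estimate = lwr_predict(feature, levels, 0.20 + 0.004 * 42)
```

With k = 7, as used for the local models in this study, only the seven calibration samples closest to the new sample influence its prediction, which is what allows the method to follow a curved relationship piece by piece.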
Results and discussion
NIR spectra of sandalwood oils
The absorbance spectra for pure and blended mixtures of S. album and S. spicatum (used as adulterant) over the spectral range 900–2000 nm are shown in Fig. 1 (spectra of 22 sample classes with different relative fractions, 0–100% v/v, of S. spicatum in S. album oil).
Overall, a lower absorbance is visible for samples containing 100% S. album, whereas the higher absorbance values represent samples containing 100% S. spicatum oil. As the percentage of adulteration increases, the sample absorbance also increases at every wavelength.
According to our former studies performed on sandalwood oils, the near infrared spectra (900–2000 nm) of the analyzed oil samples are dominated by overtones and different combinations of C–H stretching and bending vibrations [11,12]. Fig. 2 shows the representative spectra of the pure, neat oil of S. album and of the adulterant oil.
Peaks are observed at 1152 nm, 1205 nm (due to adulterant only), 1365 nm, 1411 nm, 1650 nm, 1693 nm, 1712 nm, 1755 nm (due to adulterant only), 1836 nm and 1891 nm. The 1152 and 1205 nm peaks are related to methylene, methyl and ethenyl C–H stretch (2nd overtone bands); the 1365 and 1411 nm peaks are assigned to the methylene and methyl C–H stretch and bending combination (2nd overtone and 2n combination bands); the 1650, 1693, 1712, 1755 and 1836 nm peaks are attributed to methylene, methyl and ethenyl asymmetric stretching (1st overtone C–H stretch bands); the 1891 nm peak is a combination band from moisture (H2O) and carries little information for the characterization of oils [16,17]. Hence the 1891 nm peak is excluded from the wavelength selection. When any one of these specific peaks alone is targeted for calibration, the performance of the models is worse. This means that the information incorporated into the models is due to the combined effect of the wavelengths that contain relevant information. This necessitates applying a ‘spectral approach’ for selecting the effective wavelength range.
Partial least square regression (PLSR)
The orthogonality property of the latent variables removes the collinearity among spectral variables without eliminating spectral information.
Fig. 1. NIR absorbance spectra of 22 classes of sandalwood oils: relative fraction 0–100% v/v of S. spicatum in S. album oil. Samples are named according to the % of adulteration with S. spicatum oil (0% = 0% S. spicatum and 100% S. album; 1% = 1% S. spicatum and 99% S. album; 100% = 100% S. spicatum and 0% S. album; 5–95% according to 5% increments in the adulteration percentage of S. spicatum). The bottom spectrum represents the 0% class and the topmost spectrum represents the 100% class. The intermediate spectra represent the 1% and 5–95% classes. (1836 nm indicates the biomarker peak.)
Fig. 2. Representative spectra of neat high quality oil of S. album and adulterant oil.
Spectral approach for wavelength selection and data pre-treatment
Two approaches are investigated for each data pre-treatment (none or untreated data, 1st and 2nd derivatives, smoothing (Savitzky–Golay), mean centre, autoscale, smoothing (window of 15 points, order 2) + mean centre and smoothing + autoscale) to observe the relevance of wavelength to model prediction. The ‘full spectrum approach’ uses the full wavelength region, 1850–1100 nm (750 data points). In the ‘sequential spectrum approach’, a 100 nm region is added to the previous consecutive spectral region to determine a new model. Although the ‘sequential spectrum’ method may seem identical to interval PLS (iPLS), it is quite different from the standard iPLS approach [18]. The first model is developed on the 1850–1800 nm wavelength range. The remaining models are developed by adding a 100 nm wavelength range at a time, starting from the longer wavelength side since prominent peaks are seen there. Hence the second model corresponds to 1850–1700 nm, the third to 1850–1600 nm, and so on until the 8th model, which uses the 1850–1100 nm range. The wavelength region (the region of the spectra having the strongest correlation with the property of interest) corresponding to the model, out of the 8 models developed, that produces the lowest errors (RMSEC and RMSEP) is selected. On this basis, the spectral region 1850–1800 nm (containing the biomarker peak at 1836 nm) is chosen as the best wavelength region for this study, as confirmed in the results and discussion. If a 100 nm or 50 nm wavelength region is added starting from the shorter wavelength side, the performance of the models is comparatively low.
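The eight wavelength ranges of the sequential spectrum approach can be enumerated with a short sketch such as the one below. The helper names are hypothetical, and the wavelength axis is assumed to be an integer grid in nm; the sketch only reproduces the range bookkeeping described above, not the PLS modelling itself.

```python
def sequential_ranges(start=1850, first_stop=1800, step=100, end=1100):
    """List the (upper, lower) wavelength limits, in nm, of each sequential model.

    The first model covers start..first_stop; every further model extends
    the lower limit by `step` nm until `end` is reached.
    """
    ranges = [(start, first_stop)]
    lower = first_stop
    while lower - step >= end:
        lower -= step
        ranges.append((start, lower))
    return ranges


def select_region(wavelengths, spectrum, upper, lower):
    """Keep the absorbance values whose wavelength lies in [lower, upper]."""
    return [a for w, a in zip(wavelengths, spectrum) if lower <= w <= upper]


ranges = sequential_ranges()
# ranges[0] is the 1850-1800 nm window; ranges[-1] is the full 1850-1100 nm window.
```

Each range would then be passed through the chosen pre-treatment and a PLS calibration, and the range giving the lowest RMSEC and RMSEP retained.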
Optimum number of latent variables in model development
If the entire set of latent variables is used, there is no clear distinction between the structure part (problem-dependent information) and the noise. Hence, to determine the optimum number of latent variables to be used in the regression model, the variation of RMSEC with the number of latent variables is evaluated. The number of latent variables corresponding to the minimum RMSEC or the maximum percent variance is selected as the optimum number of latent variables.
Quantification of oil authenticity and adulteration
The results based on RMSEC and RMSEP of all the models ((8 models for 8 partitioned spectra × 7 pre-treatments) + 8 models for untreated data = 64 models) of the various data pre-treatments are prepared and compared (not shown).
Table 1 shows the results of the models with the least errors, the optimum number of latent variables and the maximum coefficients of determination for both the full spectrum and the partitioned spectrum of each pre-treatment. The first row of the table indicates the results of the untreated data.
The model with the least errors in terms of RMSEC and RMSEP of each data pre-treatment for the full spectrum and the partitioned spectrum is then compared with the corresponding results of the untreated data (see Table 1). The first and second derivatives do not improve the model performance when the RMSEC and RMSEP values are compared to those obtained from the untreated data. In general, the values of the RMSEC of the models range from 0.00009% v/v (15 pt smoothing + mean centre of a model using the 1850–1800 nm range) to 0.04435% v/v (2nd derivative using the 1850–1600 nm range). Selecting an effective data pre-treatment is problem dependent and depends on the wavelength range used in the model calibration. If the full spectrum 1850–1100 nm is used, the mean centre (RMSEC = 0.00028) performs almost equally with the smoothing + mean centre (RMSEC = 0.00029), and both perform better than the untreated data (RMSEC = 0.00079). Here the number of latent variables required is 3. If the sequential/partitioned spectrum is used, the lowest RMSEC (0.00009) across the spectrum is generated using smoothing (15 pt, order 2) + mean centre at a wavelength range of 1850–1800 nm. Here the optimum number of latent variables is only 2.
The RMSEP of the 15 pt smoothing + mean centre data pre-treatment method shows a similar trend to that observed in the corresponding RMSEC. For the full spectrum (1850–1100 nm), the mean centre method produces a minimum RMSEP of 0.00049. The lowest RMSEP of 0.00016% v/v is observed using smoothing + mean centre for the sequential spectrum in the 1850–1800 nm wavelength range. The first and second derivative methods in this study perform worse, whereas all other pre-treatment methods generate nearly equal and comparatively low RMSEP values in all regions.
Table 1 also shows the ‘best model’ performance in terms of RMSEC, RMSEP and the coefficient of determination (R2 cal. or R2 pred.).
For the full spectral approach at 1850–1100 nm, the mean centre pre-treatment provides the best model performance in terms of RMSEC (0.00028), RMSEP (0.00049) and the number of latent variables (3) in the model. The R2 prediction value is 0.99894. The best model for the sequential spectral approach is generated using smoothing + mean centre in the 1850–1800 nm wavelength range, in terms of RMSEC (0.00009%), RMSEP (0.00016%) and the number of latent variables (2). The R2 prediction is 0.99989. (Downey et al. [9] could predict the adulterant content in extra virgin olive oils from the eastern Mediterranean only with a standard error equal to 0.8% w/w using 1st derivative data between 1100 and 2498 nm. Of the spectroscopic methods studied on oil adulteration by Yang et al. [10], the selected Fourier transform-Raman spectroscopy could give a correlation coefficient (R2 prediction) of 0.997 and a standard error of prediction of 1.72%.) As observed, the first and second derivative methods do not give a better model performance for either approach. This exception aside, the models using the sequential/partitioned spectrum produce lower RMSEC and RMSEP values than those using the full spectrum (see Table 1). This phenomenon suggests that the addition of synergic wavelengths to predictive wavelengths enhances sample authentication and model performance. Even though the untreated, smoothing, mean centre, autoscale and smoothing + autoscale pre-treatment methods generate low RMSEC and RMSEP values on the full spectrum, i.e. the 1850–1100 nm range, the 15 pt, order 2 smoothing + mean centre data pre-treatment on the sequential spectrum 1850–1800 nm range is preferred because it generates the lowest RMSEC using the minimum number of factors (2) and gives the lowest RMSEP (% v/v) with the highest R2 pred. (0.99989) value.
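The selection of the preferred model can be reproduced from the partitioned/sequential spectrum values reported in Table 1, as in the sketch below. The ranking rule used here (lowest RMSEP first, then lowest RMSEC, then fewest latent variables) is an assumption that mirrors the reasoning of the text; the variable names are illustrative.

```python
# (pre-treatment, wavelength range in nm, latent variables,
#  RMSEC % v/v, RMSEP % v/v, R2 pred.), taken from Table 1,
# partitioned/sequential spectrum columns.
partitioned = [
    ("None/untreated",          "1850-1700", 3, 0.00025, 0.00021, 0.99987),
    ("1st derivative",          "1850-1600", 3, 0.04380, 0.04379, 0.26757),
    ("2nd derivative",          "1850-1600", 2, 0.04435, 0.04433, 0.26506),
    ("Smoothing",               "1850-1800", 2, 0.00062, 0.00069, 0.99987),
    ("Mean centre",             "1850-1800", 2, 0.00010, 0.00017, 0.99987),
    ("Autoscale",               "1850-1800", 2, 0.00011, 0.00018, 0.99986),
    ("Smoothing + mean centre", "1850-1800", 2, 0.00009, 0.00016, 0.99989),
    ("Smoothing + autoscale",   "1850-1800", 2, 0.00013, 0.00019, 0.99980),
]

# Rank by RMSEP, then RMSEC, then number of latent variables (assumed rule).
best = min(partitioned, key=lambda row: (row[4], row[3], row[2]))
best_name, best_range = best[0], best[1]
```

Under this rule the smoothing + mean centre model on the 1850–1800 nm range is selected, in agreement with the choice made in the text.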
To ensure and verify that the ‘best model’ chosen (NIR sequential spectrum in the 1850–1800 nm wavelength range using smoothing (window of 15 points, order 2) + mean centre data pre-treatment) is able to describe and predict new data, the residual variance must also be considered. To determine the optimal number of latent variables to be included in the best model, the lowest RMSEC value is considered. Accordingly, for this particular model, the optimum number of latent variables is 2. The cumulative variance for the first two latent variables is 99.99%; LV1 explains 99.72% and LV2 explains 0.27%. Bias = 0; Prediction Bias = −3.4003e−005.
Fig. 3 confirms the accuracy and robust performance of the best
model.
The coefficient of determination is a measure of the strength of the correlation between the measured values and the values predicted by the chosen model. The model presented here has a very good correlation, with R2 pred. equal to 0.99989, showing a good linear fit (the closer the R2 pred. value is to +1, the higher the correlation and hence the accuracy).
Locally weighted regression (LWR)
LWR is performed on the NIR spectral data in the 1850–1800 nm wavelength range using the smoothing (window of 15 points, order 2) + mean centre data pre-treatment. 22 samples are taken as the training set and another 22 independent samples (new raw data) as the test set. The model is built with the local PLS algorithm. The number of principal components used in the model is 2, since it provides the lowest RMSEC value; the number of similar samples selected from the calibration set for the local regression model is 7, because this gives the maximum percent variance and the minimum errors.
When the model is built, the first two principal components account for 99.99% of the total variance; component 1 explains 99.75% and component 2 explains 0.24%. The residual variance RMSEC = 0.00004 (R2 cal. = 0.99999). Bias: −2.31857e−006; Prediction Bias: −4.79679e−005.
The estimated Y values from the LWR model are basically identical to those of the PLS model. The low RMSEP (0.00022) and high R2 (0.99982) indicate that the results of the LWR model are also reliable.
Comparison of models developed
On closer inspection of both models (PLSR and LWR) in terms of RMSEC, RMSEP, the corresponding R2 values and reduced complexity, the ‘best model’ to detect sample authenticity is the PLSR model that uses the sequential/partitioned spectrum approach, i.e. the 1850–1800 nm wavelength range with the biomarker peak at 1836 nm, using smoothing (window of 15 points, order 2) + mean centre as data pre-treatment. In terms of the evaluation tools used, the ‘best model’ developed in the present study to detect and quantify adulteration in oils achieves an accuracy better than that of the most successful models reported in the scientific literature.
Conclusion
Adulteration of high quality, high value essential oils (for example, sandalwood oils) is a common problem affecting the quality and commercial value of the product. The overall results of this study demonstrate the feasibility of using NIR spectroscopy combined with chemometric techniques to detect sample authenticity and economic adulteration of sandalwood oils.
In the present research, a spectral data approach, ‘full or sequential/partitioned spectrum’, is used in generating the robust regression model. The mean centre data pre-processing generates a low error for the full spectrum (1850–1100 nm) approach (RMSEC = 0.00028 and RMSEP = 0.00049) and the highest coefficient of determination for that approach (R2 = 0.99894). Using the sequential/partitioned spectral approach, a data set containing the 1850–1800 nm wavelength region with the smoothing (Savitzky–Golay) + mean centre pre-treatment generates the ‘best model’ to determine sample authenticity and to quantify the adulteration with very much less than 1% error (RMSEC = 0.00009%, RMSEP = 0.00016%, R2 = 0.99989).
The nonlinear method LWR for multivariate y using the local PLS algorithm is employed to detect and quantify the adulteration and to compare with the PLS model. This nonlinear model is also stable, defined by two principal components with low errors and high prediction values.
In brief, the results from this study (Figs. 1–3 and Table 1 can be chosen as a basis for a new exercise) indicate that it is possible to detect sample authenticity and counterfeits by exploring near infrared spectral data in the wavelength range 1850–1800 nm, with the prominent biomarker peak at 1836 nm attributed to methylene, methyl and ethenyl asymmetric stretching (1st overtone C–H stretch bands), for sandalwood oil or any other essential oil samples.
Acknowledgements
The authors wish to thank the efforts of Mr. Joseph Kuriakose, Bloomfield, Gillen, Alice spring, NT 0870, Australia for his contributions to this manuscript. The authors are also grateful to Mr. Sony George, Research scholar, Department of Photonics, Cochin University of Science and Technology (CUSAT), Kochi, Kerala, India for the technical assistance and help. The authors also express their sincere thanks to Manonmaniam Sundaranar University, Thirunelveli, Tamil Nadu, India for having given an opportunity to do the research.

Table 1
Model performance for full and sequential spectrum wavelength data for each data pre-treatment. Best models are marked with an asterisk (*). RMSEC and RMSEP are in % v/v; R2 cal. and R2 pred. are given in parentheses.

Full spectrum
Data pretreatment         | Wavelength range (nm) | LV(a) | RMSEC(b) (R2 cal.(d)) | RMSEP(c) (R2 pred.(e))
None/untreated            | 1850–1100             | 3     | 0.00079 (0.99635)     | 0.00131 (0.99194)
1st Deri.                 | 1850–1100             | 3     | 0.04380 (0.20036)     | 0.04379 (0.25413)
2nd Deri.                 | 1850–1100             | 3     | 0.04435 (0.20736)     | 0.04433 (0.26782)
Smoothing                 | 1850–1100             | 4     | 0.00079 (0.99636)     | 0.00132 (0.99193)
Mean centre *             | 1850–1100             | 3     | 0.00028 (0.99953)     | 0.00049 (0.99894)
Autoscale                 | 1850–1100             | 3     | 0.00032 (0.99939)     | 0.00049 (0.99896)
Smoothing + mean centre   | 1850–1100             | 3     | 0.00029 (0.99952)     | 0.00050 (0.99893)
Smoothing + autoscale     | 1850–1100             | 3     | 0.00032 (0.99939)     | 0.00053 (0.99891)

Partitioned/sequential spectrum
Data pretreatment         | Wavelength range (nm) | LV(a) | RMSEC(b) (R2 cal.(d)) | RMSEP(c) (R2 pred.(e))
None/untreated            | 1850–1700             | 3     | 0.00025 (0.99964)     | 0.00021 (0.99987)
1st Deri.                 | 1850–1600             | 3     | 0.04380 (0.21798)     | 0.04379 (0.26757)
2nd Deri.                 | 1850–1600             | 2     | 0.04435 (0.21672)     | 0.04433 (0.26506)
Smoothing                 | 1850–1800             | 2     | 0.00062 (0.99988)     | 0.00069 (0.99987)
Mean centre               | 1850–1800             | 2     | 0.00010 (0.99993)     | 0.00017 (0.99987)
Autoscale                 | 1850–1800             | 2     | 0.00011 (0.99993)     | 0.00018 (0.99986)
Smoothing + mean centre * | 1850–1800             | 2     | 0.00009 (0.99995)     | 0.00016 (0.99989)
Smoothing + autoscale     | 1850–1800             | 2     | 0.00013 (0.99992)     | 0.00019 (0.99980)

a LV – Latent Variables.
b RMSEC – Root Mean Square Error of Calibration.
c RMSEP – Root Mean Square Error of Prediction.
d R2 cal. – Coefficient of Determination for Calibration.
e R2 pred. – Coefficient of Determination for Prediction.

Fig. 3. Measured versus predicted adulteration levels; PLSR model with 2 factors showing the prediction data (pre-treatment smoothing + mean centre, spectrum 1850–1800 nm range).
References
[1] C.G. Jones, E.L. Ghisalberti, J.A. Plummer, E.L. Barbour, Phytochemistry 67 (2006) 2463–2468.
[2] P.D.R. Herbal, Sandalwood, Santalum album, PDR for Herbal Medicine, third ed., Medical Economics Company, Montvale, NJ, 2004.
[3] Anonymous, Sandalwood, in: C. Dombek (Ed.), The Review of Natural Products, Wolters Kluwer Co., St. Louis, MO, 1993.
[4] B.M. Lawrence, Recent progress in essential oils, Perfum. Flavor. 16 (1991) 49–58.
[5] D.P. Anonis, Sandalwood and sandalwood compounds, Perfum. Flavor. 23 (1998) 19–20.
[6] R.E. Naipawer, in: B.M. Lawrence, B.D. Mookherjee, B.J. Willis (Eds.), Flavors and fragrances: A world Perspective Proceedings of the 10th International Conference of Essential Oils, Flavor fragrance, Elsevier, Amsterdam, 1988, p. 805.
[7] A.A. Christy, S. Kasemsumran, Y.P. Du, Y. Ozaki, Anal. Sci. 20 (2004) 935–940.
[8] I.J. Wesley, R.J. Barnes, A.E.J. Mc Gill, J. Am. Oil Chem. Soc. 72 (1995) 289–292.
[9] G. Downey, P. Mc Intyre, A.N. Davies, J. Agric. Food Chem. 50 (2002) 5520–5525.
[10] H. Yang, J. Irudayaraj, J. Am. Oil Chem. Soc. 78 (2001) 889–895.
[11] S. Kuriakose, T. Xavier, Hubert Joe, V. Venkataraman, Analyst 135 (2010) 2676–2681.
[12] S. Kuriakose, Hubert Joe, Food Chem. 135 (2012) 213–218.
[13] R.G. Brereton, Chemometrics Data Analysis for the Laboratory and Chemical Plant, Wiley, Chichester, 2003.
[14] B.M. Wise, N.B. Gallagher, R. Bro, J.M. Shaver, PLS Toolbox 7.0.3, Eigenvector Research, 2013.
[15] MATLAB 7.10.0.499 (R2010a), MathWorks Inc., Natick, USA, 2010.
[16] B.G. Osborn, T. Fearn, P.H. Hindle, Practical NIR Spectroscopy with Applications in Food and Beverage Analysis, in: Conference, second ed., Longman Scientific and Technical, Harlow, Essex, UK, 1993.
[17] J. Workman Jr., L. Weyer, Practical Guide to Interpretive Near-Infrared Spectroscopy, CRC Press, Boca Raton, 2007.
[18] Norgaard et al., Appl. Spec. 54 (3) (2000) 413–420.