
Computers & Geosciences (2010), doi:10.1016/j.cageo.2010.11.001

journal homepage: www.elsevier.com/locate/cageo

A spatially weighted principal component analysis for multi-element geochemical data for mapping locations of felsic intrusions in the Gejiu mineral district of Yunnan, China

Qiuming Cheng a,c,*, Greame Bonham-Carter b, Wenlei Wang a, Shengyuan Zhang a,d, Wenchang Li e, Xia Qinglin c

a Department of Earth and Space Science and Engineering, Department of Geography, York University, 4700 Keele Street, Toronto, Ontario, Canada M3J 1P3
b Geological Survey of Canada, 601 Booth St., Ottawa, Ontario, Canada K1A 0E8
c State Key Lab of Geological Processes and Mineral Resources, China University of Geosciences, China
d Institute of Resources and Environment, Shijiangzhuang University of Economics, China
e Yunnan Academy of Geological Survey, Kunming 650051, Yunnan, China
* Corresponding author at: Department of Earth and Space Science and Engineering, Department of Geography, York University, 4700 Keele Street, Toronto, Ontario, Canada M3J 1P3. E-mail address: [email protected] (Q. Cheng).

Article info

Article history:
Received 12 December 2009
Received in revised form 7 October 2010
Accepted 2 November 2010

Keywords:
Fuzzy mask
Geochemical data
Spatially weighted principal component analysis
Stream sediment samples
Tin mineralization


Abstract

Principal component analysis (PCA) is frequently used in geosciences for information extraction. In many applications, masking PCA has been used to create subsets of samples or sub-areas to enhance the effect of the main objects of interest. In this paper we suggest how the representativeness of samples or pixels can be quantified using a fuzzy membership function based on fuzzy set theory. In this new method, the relative importance of pixels or samples can be taken into account using a multivariate statistical method such as PCA. A fuzzy masking PCA (FMPCA) is proposed and implemented in GeoDAS GIS on the basis of a spatially weighted PCA (SWPCA). This paper introduces the mathematical treatment of the fuzzy masking PCA and follows with a case study of identifying the locations of intrusive bodies from geochemical data in the Gejiu mineral district in Yunnan, China. Power-law functions based on the inverse distance from mapped felsic intrusions are applied as weighting functions in FMPCA. The results indicate that fuzzy mask PCA increases the signal-to-noise ratio of the component representing igneous intrusions and decreases the influence of sedimentary rocks. The areas delineated as potential areas for new intrusions (including buried intrusions) are valuable guides for Sn mineral prospecting.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Principal component analysis (PCA) has become a standard statistical approach for image processing and geochemical data analysis for two reasons: (1) it reduces the number of correlated image bands or variables, forming a small number of uncorrelated principal components that represent most of the variability carried by the multiple image bands or variables, and (2) it enhances the interpretability of the components as combinations of multiple bands or variables (Cheng et al., 2006). PCA has been frequently applied in processing geochemical and other types of geoscience data (e.g. Grunsky, 1997; Harris et al., 1997; Chandrjith et al., 2001; Garrett and Grunsky, 2001; Cheng et al., 2006). Several varieties of PCA can be found in the literature, including PCA applied to raw data, to pre-processed data (Xu and Cheng, 2001) and to subsets of data with masks (Ma et al., 1990).


The foundation of PCA is the correlation (covariance) matrix, which measures the interrelationships among multiple image bands (variables). The concepts of PCA and its relevant terminology can be found in many references, including the book by Davis (2002). When applying PCA to spatial data in geographic information systems (GIS) and in image processing, a number of improvements and modifications can be applied to the definition of the correlation (covariance) matrix (Cheng, 1999, 2002, 2007; Cheng et al., 2006). For example, Cheng (2002) shows how a high-order correlation coefficient can be defined on the basis of multifractal modeling, and Cheng et al. (2006) propose that this correlation coefficient can be used to construct a correlation matrix with an optimum order property. Other attempts to enhance the principal components include the use of band or variable ratios to enhance the information from particular bands (Frazer and Green, 1987), and the application of masking techniques to restrict the analysis to subgroups of pixels (e.g., Ma et al., 1990; Cheng, 1997, 2000). Specific multispectral bands have also been chosen to contain the feature information of the targets (e.g., Chavez and Kwarteng, 1989; Crosta and Moore, 1989).

In order to optimize feature detection and to avoid using irrelevant pixels in the calculation, masks are often included in PCA. For example, in order to identify alteration related to mineralization, masks can be used to eliminate pixels covered by water, snow and ice, or heavy vegetation (Ma et al., 1990; Cheng, 1997, 2000). Other types of masks include desert masks (Olson et al., 1983), cloud masks (Saunders and Kriebel, 1998), and sunlight masks (Arino et al., 1993). Most of these masks are binary, and PCA is calculated either with or without certain pixels. However, there are practical situations where the inclusion of a pixel cannot be decided on a "yes or no" basis. For example, when vegetation cover in an image is used to identify alteration related to mineralization, the proportion of vegetation cover in a pixel can be used to define whether the pixel is fully or only partially occupied by vegetation. A binary mask for this situation is usually defined with a threshold value of vegetation coverage. The choice of threshold often affects the definition of the mask: if the threshold is set too high, the mask will be too small, and if the threshold is set too low, the mask will be too large. Alternatively, a fuzzy mask can be used to represent the relative importance of pixels on the basis of vegetation coverage. To implement the fuzzy mask in PCA, the mathematical model needs to be altered. Construction of fuzzy masks is also a necessary task in the implementation of fuzzy mask PCA (FMPCA). This paper introduces a new way to implement FMPCA. The method is validated using a case study of mapping the locations of felsic igneous intrusions, an essential factor for Sn–Cu mineralization, using stream sediment geochemical data in the Gejiu mineral district in southern Yunnan, China.

2. Fuzzy mask PCA

Using multivariate statistical analysis methods such as PCA to characterize spatial patterns and extract components often requires selection of samples that are representative of certain populations. For example, if the study is mainly aimed at characterizing certain rock types, pixels should be chosen from the areas where these types of rocks are likely to occur. The pixels in the areas where the rock types are known not to occur should be eliminated from the PCA model in order to reduce the influence of noise from irrelevant pixels, and a binary mask is often used for this purpose. Most commercial GIS and image processing packages have two standard methods for selecting pixels. The first method sets a rectangular window so that processing applies only to pixels within the window. The other method sets a polygonal mask, again restricting processing to pixels within the mask. Both of these methods use a binary mask with two values: 1 and 0. In the current paper, this idea is extended to a more general form by weighting the contributions of the samples (or pixels) with a spatial weighting factor. A brief introduction to the principles of the method is given next.

Let A, B, and C be the images (variables) to be processed, and let their values at location (i, j) be $A_{ij}$, $B_{ij}$, and $C_{ij}$, respectively. A weighting factor is defined as an image (variable) W with values from 0 to 1, i.e. $0 \le W_{ij} \le 1$ (Cheng, 2000). The weighted correlation coefficient between images A and B is defined as (Cheng, 2000)

$$R(A,B) = \frac{\sum W_{ij}\,(A_{ij}-\bar{A})(B_{ij}-\bar{B})}{\sqrt{\sum W_{ij}\,(A_{ij}-\bar{A})^{2}}\;\sqrt{\sum W_{ij}\,(B_{ij}-\bar{B})^{2}}} \qquad (1)$$

where $\bar{A}$ and $\bar{B}$ are the weighted mean values of $A_{ij}$ and $B_{ij}$ ($\bar{A} = \sum W_{ij}A_{ij} / \sum W_{ij}$, $\bar{B} = \sum W_{ij}B_{ij} / \sum W_{ij}$). The effect of applying the weight W to the correlation coefficient can be seen from the following properties:

1. $R(A,B)$ is a symmetrical correlation index,

2. $-1 \le R(A,B) \le 1$, and $R(A,B) = 1$ or $-1$ if and only if A and B have a positive or negative linear relationship,

3. if $W_{ij}$ = constant, $R(A,B)$ reduces to the ordinary correlation coefficient, and

4. if $W_{ij}$ is a binary image with two values 1 and 0, then W is equivalent to the ordinary mask.
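The weighted correlation coefficient of Eq. (1) and properties 3 and 4 can be checked numerically. The following is a minimal NumPy sketch, not the GeoDAS implementation; the function name `weighted_corr` and the synthetic images are assumptions made for illustration only.

```python
import numpy as np

def weighted_corr(a, b, w):
    """Weighted correlation coefficient of Eq. (1) for two images or variables."""
    a, b, w = (np.asarray(x, dtype=float).ravel() for x in (a, b, w))
    a_bar = np.sum(w * a) / np.sum(w)   # weighted mean of A
    b_bar = np.sum(w * b) / np.sum(w)   # weighted mean of B
    da, db = a - a_bar, b - b_bar
    num = np.sum(w * da * db)
    den = np.sqrt(np.sum(w * da**2)) * np.sqrt(np.sum(w * db**2))
    return num / den

# Synthetic 20 x 20 "images" (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 20))
B = 0.5 * A + rng.normal(size=(20, 20))

# Property 3: a constant weight gives the ordinary correlation coefficient.
r_const = weighted_corr(A, B, np.full(A.shape, 0.3))
r_plain = np.corrcoef(A.ravel(), B.ravel())[0, 1]

# Property 4: a 0/1 weight is the same as restricting to the masked pixels.
mask = rng.random(A.shape) > 0.5
r_binary = weighted_corr(A, B, mask.astype(float))
r_subset = np.corrcoef(A[mask], B[mask])[0, 1]
```

In this sketch `r_const` agrees with `r_plain` and `r_binary` agrees with `r_subset` up to floating-point rounding, which is what properties 3 and 4 state.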

The first three properties ensure that the definition in Eq. (1) meets the basic requirements for a correlation coefficient. The last two properties show that Eq. (1) generalizes both the ordinary correlation coefficient and the correlation coefficient with a binary mask applied, so the ordinary mask treatment becomes a special case of the spatial weighting approach. Similarly, the eigenvalues, eigenvectors, and loadings of each image on all components can be calculated from the weighted correlation (covariance) matrix.

Unlike traditional mask PCA, which excludes samples not only from the calculation of loadings and components but also from the calculation of scores, the new method excludes samples only from the calculation of loadings and components, not from the calculation of scores. Spatial weights are applied only to the determination of the eigenvectors. Once the eigenvectors have been determined, the scores of the samples are calculated as the projections of the original variable values onto the eigenvectors, without further use of the spatial weights. For example, in the special case where an ordinary mask (with values 1 or 0) is applied to PCA, the principal components (loadings and components) are calculated from the correlation (covariance) matrix constructed using only the subset of samples with weighting factor 1. The components are therefore determined only by samples with weighting factor 1. From these components one can calculate the projected values (scores) of all samples, including those with weighting factor 0 that were not included in the calculation of the components. In this way a scores map covering the entire area can be constructed.

The new method has been implemented in GeoDAS GIS (Cheng, 2000). Using GeoDAS, one can define a weighting image with values representing the relative importance of pixel locations. The values on the weighting image can be, for example, the distance from ore deposits, the density of ore deposits, the distance from contacts or faults, or the concentration values of trace elements. The scores of images on a principal component could also be taken as a weighting factor. The spatially weighted PCA, or FMPCA, enhances the influence of pixels with large weights (weights close to 1) and reduces the effect of pixels with small weights (weights close to 0). This method has been used to analyze geochemical anomalies for mineral exploration (Bonham-Carter and Cheng, 2001). In the current paper, the method is used in a case study of mapping the locations of igneous rocks related to Sn–Cu mineralization using stream sediment geochemical data in the Gejiu mineral district in southern Yunnan, China.
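The two-stage procedure described above (weights enter the correlation matrix and hence the eigenvectors, while scores are computed for every sample without weights) can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the GeoDAS code: the function name `swpca` is hypothetical, and standardising the variables with the weighted means and standard deviations before projection is an assumption that the text does not specify.

```python
import numpy as np

def swpca(X, w):
    """Spatially weighted PCA sketch.

    X: (n_samples, n_vars) data table; w: (n_samples,) weights in [0, 1].
    Returns eigenvalues, eigenvectors (one per column) and scores of ALL samples.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    mean = (w[:, None] * X).sum(axis=0) / w.sum()      # weighted means
    Xc = X - mean
    cov = (w[:, None] * Xc).T @ Xc / w.sum()           # weighted covariance matrix
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)                      # weighted correlation matrix, Eq. (1)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]                  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Scores: plain projection of every sample, no weights involved
    # (standardisation by the weighted mean and sd is an assumption).
    scores = (Xc / sd) @ eigvecs
    return eigvals, eigvecs, scores

# Hypothetical data: 200 samples, 4 correlated variables, ordinary 0/1 mask.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))
w = (rng.random(200) > 0.6).astype(float)
vals, vecs, scores = swpca(X, w)
```

With a 0/1 mask the eigenvalues equal those of an ordinary PCA on the masked subset, yet `scores` still has one row per sample, including the samples with weight 0.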

3. Defining the fuzzy mask

Technically, defining a fuzzy mask on the basis of pixel location can be done using some basic GIS functions. For example, a fuzzy mask can be defined according to the distance to objects of special interest such as environmental impact sites or mineral deposits. If the study is related to such distances, then a fuzzy mask can be constructed according to the distance from these point objects. The closer the pixels are to these point sites, the closer the weights are to 1, and the farther the pixels are from these sites, the closer the weights are to 0. Certain decay functions can be applied to assign fuzzy weight values to distance. A fuzzy mask can also be defined according to a measure of density, such as deposit density or alteration intensity. Pixels located in an area with a high density of mineral deposits or high intensity of alteration can be given high weights, and those located in areas with a low density of mineral deposits or low alteration intensity can be given low weights.
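A distance-based fuzzy mask of the kind described above can be sketched with a few lines of NumPy. The function name `distance_fuzzy_mask`, the linear decay and the example site coordinates are assumptions chosen for illustration; any decay function that maps distance to a value between 0 and 1 could be substituted.

```python
import numpy as np

def distance_fuzzy_mask(shape, sites, max_range):
    """Weights in [0, 1] that fall linearly from 1 at a site to 0 at max_range (pixels)."""
    rows, cols = np.indices(shape)
    sites = np.asarray(sites, dtype=float)
    # Distance from every pixel to every site, then keep the nearest site.
    d = np.sqrt((rows[..., None] - sites[:, 0]) ** 2
                + (cols[..., None] - sites[:, 1]) ** 2).min(axis=-1)
    return np.clip(1.0 - d / max_range, 0.0, 1.0)

# Two hypothetical point sites on a 50 x 50 grid.
W = distance_fuzzy_mask((50, 50), sites=[(10, 10), (40, 30)], max_range=15.0)
```

The resulting image `W` equals 1 at the two sites and 0 for pixels farther than `max_range` from both, and it can be used directly as the weighting image in Eq. (1).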

Besides defining fuzzy masks with respect to images, fuzzy mask PCA can be applied to point sample data, with sample weights being defined for point features instead of pixels. GeoDAS provides two modes for implementing FMPCA: image and point modes, with either images or tables of attribute data, respectively, being used to run PCA. One advantage of using images as inputs is that the image resolution does not have to be uniform; another is that missing values for some variables, so often a problem in PCA of table data, can first be dealt with by interpolation of each variable to a grid.

4. Mapping the locations of igneous rocks related to Sn–Cu mineralization in Gejiu mineral district, Yunnan, China

The area chosen for this study is the Gejiu mineral district in southern Yunnan, approximately 200 km south of Kunming, the capital of Yunnan Province (Fig. 1). The area is known for its world-class Sn mineral deposits and Sn production. The location of the study area and a simplified geological map are shown in Fig. 1. The geological units in the main study area consist primarily of a sequence of Paleozoic–Mesozoic sedimentary rocks (the Gejiu Formation and other formations) and igneous rocks. Proterozoic low-grade metamorphic sand-shale rocks are mainly distributed in the southern part of the study area. Paleozoic carbonate sedimentary strata are well developed and extensive in the study area. Mesozoic strata are poorly developed, with limited coverage. Tertiary strata are also scattered throughout the study area (Zhuang et al., 1996).

Two main types of igneous rocks are mapped: Paleozoic volcanic rocks and Mesozoic intrusive rocks. The former are mainly basalts, including the Ailaoshan basalts, which are extensively spread over the study area (Zhuang et al., 1996). The Mesozoic intrusions are dominated by granite and granitoid rocks, including biotitic granite, biotitic monzogranite and biotitic plagiogranite. Mafic and ultramafic intrusive rocks are scattered throughout the study area. There are several batholith outcrops in the area. The Gejiu Batholith is a controlling factor for Sn mineralization, and several large Sn mineral deposits are located near it. The Gejiu Batholith is a granitoid complex located in the center of the study area, with an outcrop area of about 450 km². The Gejiu Formation is dominated by limestone with minor dolomites and serves as the main country rock hosting most of the discovered Sn deposits.

Fig. 1. Geology of the study area, simplified from four map sheets at 1:200,000 scale (308 Geological Exploration Team, 1984). Legend: 1 – pink: Proterozoic metamorphic rocks; 2 – yellow: Gejiu Formation of Paleozoic carbonate sedimentary rocks; 3 – grey: other sedimentary rocks; 4 – brown: mafic igneous rocks including basalts; 5 – red: felsic intrusive rocks; 6 – black triangles and green circles: Sn and Cu mineral deposits and mineral occurrences, respectively; 7 – solid lines: fault systems. The star in the inset map of China indicates the location of the study area. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

The area has a long history of tectonic and complex structural activity, resulting in fault and fold systems at various scales. The main faults and folds in the central parts of the study area have N–S and E–W orientations. The main faults and folds in the western and eastern parts of the study area strike NE–SW or NW–SE. These fault systems control the general configuration of the mineralization and the distribution of ore bodies in the area. The main trend of the mineralization is NNE–SSW in the central area, but the ore fields are concentrated along the intersections of NNE–SSW and E–W faults.

The mineralization is associated with sedimentary country rocks (Gejiu Formation), structures and igneous activity. The Sn polymetallic system of mineralization in the area is believed to be related to Mesozoic intrusions, including the granitoid Gejiu Batholith and Bainiuchang Batholith, which intruded into the folded limestone deposited during the Middle Triassic (e.g., the Gejiu Formation). Enrichment of Sn, Cu and other metals occurred in and near the contact zone between the granite and wall rock through metamorphism, contact-metasomatic and filling processes. The types of ores found in the contact zone or in the wall rock surrounding the batholith include skarn ores, interlayer ores in the wall rocks, vein-type ores in the fractures and placer ores on the paleosurface. Although granite intrusion-related mineralization is not the only possible model for the concentration of Sn, the influence of intrusions (e.g., the Gejiu Batholith) on mineralization is indisputable. Delineation and characterization of the contacts of batholiths are therefore important for Sn mineral exploration. The general geology and mineral deposits of the study area have been described previously by Southwest Metallurgy Exploration Company (1984), Yu et al. (1988) and Zhuang et al. (1996).

The Gejiu area has been explored for Sn and Cu over the entire last century, and significant exploration has been performed by local geologists since the 1950s. This exploration led to the discovery of several large mineral deposits, including the Laochang and Kafang deposits in the central part of the study area. The area has become well known throughout the world as a Sn mineral province.

The area chosen for this study is covered by about 7349 evenly distributed stream sediment samples, each representing a 2 km × 2 km (4 km²) area. The samples were collected and analyzed by the Chinese National Geochemical Mapping Project as part of the Regional Geochemistry National Reconnaissance (RGNR) Project, which was initiated in 1979 (Xie et al., 1997). For each sample, the concentrations of 39 geochemical elements and 7 oxides were measured. The data used in this paper are the concentration values of the 7 major oxides. Further details about the sampling and analysis of the stream sediment data can be found in Xie et al. (1997). The trace elements and their associations with Sn mineralization in the area have been studied previously (Cheng, 2007; Cheng and Agterberg, 2009; Cheng et al., 2009a, b, 2010). The main objective of the current study is to delineate felsic intrusions, including unknown intrusions, using the stream sediment geochemical data.

The Chinese literature includes substantial research and exploration efforts on the mineralogy, petrology, geochemistry and isotope geochemistry of the Gejiu area. An excellent study using a geophysical model for mineral exploration was carried out by Xiong and Shi (1994), who summarized the main geophysical properties of various rock types in the Gejiu area and proposed a geophysical-geological model for applying geophysical survey techniques to map igneous intrusions and to delineate areas for mineral exploration. Cheng et al. (2009a, b) used trace element data and geophysical data (gravity and aeromagnetic data) to interpret the subsurface extensions of the Gejiu Batholith. This study demonstrates how to use fuzzy mask PCA to process the stream sediment geochemical data to map felsic intrusions.

To reduce the effect of compositional closure in the geochemical data, the following log-ratio transformation (e.g. Aitchison, 1986) was applied prior to PCA:

$$X^{*}_{ij} = \ln X_{ij} - \ln\left(\frac{1}{7}\sum_{j=1}^{7} X_{ij}\right) \qquad (2)$$

where $X_{ij}$ is the concentration of element oxide j for sample i, and $X^{*}_{ij}$ is the log-ratio transformed value. The correlation coefficients of the transformed elements are shown in Table 1. Ordinary PCA was applied to the log-ratio transformed data. The eigenvalues (component variances) of the seven principal components (PCs) are shown in Fig. 2(A), and the loadings of the elements on each of the first three components are shown in Table 2 and Fig. 2(B)–(D).
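The preprocessing and ordinary PCA steps can be sketched as follows. The sketch follows Eq. (2) as written, dividing by the mean of the seven oxide concentrations of each sample; the centred log-ratio of Aitchison (1986) uses the geometric mean instead, so this should be read as an illustration of the printed formula rather than a definitive implementation. The data are synthetic and the function name `log_ratio` is an assumption.

```python
import numpy as np

def log_ratio(X):
    """Eq. (2): ln X_ij minus ln of the mean over the oxides of sample i."""
    X = np.asarray(X, dtype=float)
    return np.log(X) - np.log(X.mean(axis=1, keepdims=True))

# Synthetic stand-in for the table of 7 oxides (hypothetical values).
rng = np.random.default_rng(2)
raw = rng.lognormal(size=(500, 7))
comp = raw / raw.sum(axis=1, keepdims=True) * 100.0   # rows closed to 100 %

Z = log_ratio(comp)

# Ordinary PCA on the correlation matrix of the transformed data.
corr = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]                     # largest eigenvalue first
eigvals, loadings = eigvals[order], eigvecs[:, order]
explained = eigvals / eigvals.sum()                   # share of total variance
```

Because each row is divided by its own mean, the transformed values do not change when a sample is rescaled, which is the sense in which the transformation reduces the closure effect.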

The first three components account for about 80% of the totalvariance of the data. The general element loadings on the first threecomponents show that they mainly represent different rock types.For example, both loadings and the spatial distribution of thecomponent scores suggest that the first two components (Fig. 2Band C) mainly represent the sedimentary rocks, and metamorphicand volcanic rocks. Since these components are not the maininterests of the study the results are not shown here. The third

Table 1Correlation coefficients of log-ratio transformed element oxides.

Si2O Na2O MgO K2O Fe2O3 CaO Al2O3

Si2O 1 0.00 0.16 0.39 �0.87 0.35 0.86

Na2O 0.00 1 0.43 0.35 0.00 0.30 0.00

MgO 0.16 0.43 1 0.33 0.18 0.54 0.01

K2O 0.39 0.35 0.34 1 0.43 0.04 0.31

Fe2O3 0.87 0.09 0.18 0.42 1 0.26 0.74

CaO 0.35 0.30 0.54 0.04 0.26 1 0.11

Al2O3 0.86 0.09 0.01 0.31 0.74 0.11 1

Loadings on PCA2

Al2O3

CaO

FeO

K2O

MgONa2O

Si2O

Fig. 2. Results obtained by ordinary PCA to seven log-ratio transformed element oxide

logarithmic; and (B)–(D) Loadings on first, second and third components, respectively.

Please cite this article as: Cheng, Q., et al., A spatially weighted princmapping.... Computers and Geosciences (2010), doi:10.1016/j.cageo.

component (Fig. 2D), that accounts for about 10% of the totalvariance, is dominated by Na2O, K2O and Al2O3, which havenegative loadings, and CaO, which has a positive loading. Thiscomponent mainly represents felsic intrusions. The distributions ofnegative scores on the third component are shown in Fig. 3. Theareas with large negative score values represent not only themapped intrusions on the ground but also areas with possibleunknown intrusions, including buried intrusions.

From these results one can see that the third component mainlyrepresenting the igneous intrusions only accounts about 10% of thetotal variance, whereas the first and the second componentsaccount for 43% and 28%, respectively. This confirms that the mainrock types in terms of area proportion of the total study area aresedimentary rock and metamorphic and volcanic rocks. Since themain goal of this study is to characterize and to identify spatialdistribution of igneous intrusions, it is necessary to enhance therelative importance of the third component and reduce theinfluences of the other rock types. Therefore, weighting functionscan be applied to the samples according to their relationship toigneous intrusions. Several masks were defined and accordinglyweighting functions were used as fuzzy masks in spatiallyweighted PCA. First a binary mask gives all samples fall in theoutcropped igneous intrusions weight 1 and rest of the samplesweight 0. The effect of the binary mask is that the samples with zeroweight are not included in the calculation of the covariance matrixand the loadings. Considering the influence of igneous intrusions,especially shallowly buried intrusions, may be beyond the mappedoutcrops, the areas closer to intrusions are more likely to be

Loadings on PCA1

Al2O3

CaO

FeO

K2O

MgO

Na2O

Si2O

Loadings on PCA3

Si2O

Na2O

MgO

K2O

FeO

CaO

Al2O3

5 6

61

7

s. (A) Eigenvalues corresponding to seven principal components. Vertical scale is

Loadings plotted in (D) are reversed for comparison.

Table 2Loadings on principal components.

Component Loading PCA1 PCA2 PCA3

Si2O 0.56 �0.04 0.14

Na2O 0.04 0.53 �0.26

MgO �0.08 0.60 0.07

K2O 0.30 0.37 �0.60

Fe2O3 �0.53 �0.00 �0.08

CaO �0.22 0.46 0.60

Al2O3 �0.50 �0.08 �0.41

ipal component analysis for multi-element geochemical data for2010.11.001

Page 5: A spatially weighted principal component analysis for ... · A spatially weighted principal component analysis for multi-element ... (or pixels) with a spatial weighting factor. A

Fig. 3. Scores of samples on third principal component. Areas with negative high values of scores or yellow to red coloured areas highlight locations of felsic intrusions. Black polygons are mapped intrusions. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article).

Fig. 4. (A) Weighting functions defined with distance from intrusions as variable. Functions are $w = (1 - d/66)^{b}$, where d is distance from intrusions, and b = 0, 1, 8, 16, ∞, respectively. (B) Eigenvalues obtained using SWPCA with weighting functions as shown in (A).

Fig. 5. Loadings on third principal component calculated using SWPCA with various weighting functions defined in Fig. 4(A). [Panel title: "Loadings on PCA3"; oxides shown: Si2O, Na2O, MgO, K2O, Fe2O3, CaO, Al2O3.]

Q. Cheng et al. / Computers & Geosciences

affected by intrusions than areas farther away. Therefore, the distance from intrusions can serve as a factor representing the relative importance of samples for identifying both known and shallowly buried igneous intrusions. A weighting scheme was adopted in which weighting functions are defined on the basis of distance from intrusions. For comparison, we used the ‘no mask’ case and the ‘binary mask’ case as references to evaluate the effectiveness of linear and non-linear distance-based weighting functions.

Please cite this article as: Cheng, Q., et al., A spatially weighted principal component analysis for multi-element geochemical data for mapping.... Computers and Geosciences (2010), doi:10.1016/j.cageo.2010.11.001

Considering that the influence of heat from intrusions usually decays exponentially with distance from the intrusions, here we suggest the use of power-law functions to define weighting factors. As shown in Fig. 4, the weighting functions are expressed as power-law functions of the distance d from the sample (in pixels) to the intrusions: $w = (1 - d/66)^{b}$, where d is the distance from the intrusions with a maximum range of 66 km and b is the exponent determining the decay rate of the power-law function. For example, the following five decay functions were defined with different exponents: (1) a constant function with w = 1 and b = 0; (2) a linear decay function with 0 ≤ w ≤ 1 and b = 1; (3) and (4) non-linear functions decaying from 1 to 0 following power-law relations with b = 8 and 16, respectively; and (5) a binary function where 1 represents being within an intrusion and 0 represents being outside of an intrusion, which corresponds to b = ∞. As the value of b increases, the weight decays more rapidly with distance, so that samples close to intrusions increasingly dominate. These weighting functions are used in FMPCA to calculate the correlation coefficient matrix and then the eigenvalues and component loadings. Therefore, the eigenvalues and the loadings of elements on principal components are influenced by the mask. However, when these components are used to recalculate the projected scores of the samples, the mask is no longer applied. The mask affects only the determination of the combination of components (loadings) and their relative importance (eigenvalues) and does not affect the calculation of scores. This is the main difference between the fuzzy mask PCA introduced in the current paper and the ordinary mask used in traditional image processing, where the mask removes samples from all aspects of data processing.
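The weighting scheme and its role in FMPCA can be sketched in Python with NumPy. This is a minimal illustration and not the GeoDAS implementation: the function names (`distance_weights`, `weighted_correlation_pca`, `unmasked_scores`), the use of a weighted mean and weighted covariance to form the correlation matrix, the standardization of all samples with unweighted statistics before projection, and the synthetic data are all assumptions made here.

```python
import numpy as np

D_MAX_KM = 66.0  # maximum distance range quoted in the text


def distance_weights(d, b, d_max=D_MAX_KM):
    """Return w = (1 - d/d_max)**b for distances d (same unit as d_max)."""
    d = np.clip(np.asarray(d, dtype=float), 0.0, d_max)
    if np.isinf(b):
        # Limit b -> infinity: binary mask, 1 inside intrusions (d = 0), else 0.
        return (d == 0.0).astype(float)
    return (1.0 - d / d_max) ** b


def weighted_correlation_pca(X, w):
    """Eigenvalues and loadings of a weighted correlation matrix (assumed form)."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    mean = np.average(X, axis=0, weights=w)
    Xc = X - mean
    cov = (Xc * w[:, None]).T @ Xc / w.sum()
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)
    eigvals, eigvecs = np.linalg.eigh(corr)  # returned in ascending order
    order = np.argsort(eigvals)[::-1]        # reorder to descending
    return eigvals[order], eigvecs[:, order]


def unmasked_scores(X, loadings):
    """Project every sample on the components; the weights are not used here."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    return Z @ loadings


# Synthetic stand-in data: 200 samples, 7 variables, distances in km.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7))
d = rng.uniform(0.0, D_MAX_KM, size=200)
d[:30] = 0.0  # pretend the first 30 samples fall inside mapped intrusions

for b in (0, 1, 8, 16, np.inf):
    eigvals, loadings = weighted_correlation_pca(X, distance_weights(d, b))
    scores = unmasked_scores(X, loadings)
    print(f"b={b}: third eigenvalue = {eigvals[2]:.3f}, scores shape = {scores.shape}")
```

With b = 0 every weight equals 1 and the sketch reduces to ordinary correlation-matrix PCA. With b = ∞ only samples at zero distance contribute to the loadings, yet all 200 samples still receive scores.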

The results obtained using the binary masks are shown in Figs. 3–6. Fig. 4(A) shows the weighting functions, and Fig. 4(B) displays the eigenvalues of FMPCA with various weighting functions applied to the samples. Fig. 5 shows the loadings of elements on the third component (the felsic intrusions component). Fig. 6 displays the distribution of the scores of samples on the third component (Fig. 5). The variances (eigenvalues) of the third component with and without the masks are significantly different. For example, the second and third eigenvalues calculated without a mask are 2.01 and 0.78, respectively, whereas those obtained with a binary mask are 1.56 and 1.35, respectively (Fig. 4(B)). With the binary mask, the eigenvalue of the third component increased, that of the second component decreased, and that of the first component was slightly reduced. This result indicates that the main influence of the binary mask is an increase in the third component and a decrease in the first two components, especially the second, which confirms that the third component is associated with felsic intrusions. The eigenvalues calculated using linear weighting


Fig. 7. Scores of samples on third principal component calculated using non-linear weighting function (b = 16). Areas with negative high values of scores or yellow to red coloured areas highlight locations of felsic intrusions. Black polygons are mapped intrusions. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article).

Fig. 8. Differences between scores on third principal component calculated using non-linear weighting function (b = 16) and using ordinary PCA without mask. Black polygons are mapped intrusions.

Fig. 6. Scores of samples on third principal component calculated using SWPCA with binary mask. Areas with negative high values of scores or yellow to red coloured areas highlight locations of felsic intrusions. Black polygons are mapped intrusions. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article).


functions are similar to those calculated without a mask, implying that the linear weighting function has an insignificant influence on the results. The loadings of the elements on the third component calculated with non-linear weighting functions show systematic changes, and even the signs of some loadings change. For example, the signs of the loadings of Fe2O3 and CaO differ between the PCA without a mask and the binary mask PCA. The third components calculated with or without fuzzy masks are composed of Na2O, K2O and Al2O3 with positive loadings and CaO, MgO, Fe2O3 and Si2O with negative loadings. The negative loading of Si2O might be due to its multiple sources, such as granites and sedimentary rocks (siltstones, sandstones and fluvial deposits). The composition of the third component clearly indicates the factor of granitic intrusions. The influence of the fuzzy masks can be examined further through the relative changes in the loadings of elements on the third component, as discussed below.
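The eigenvalue changes reported for the binary mask can be restated as shares of total variance. The short sketch below assumes that the PCA was run on a correlation matrix of seven variables, so that the eigenvalues sum to 7; under that assumption the unmasked values correspond to roughly 29% and 11%, consistent with the 28% and about 10% quoted earlier. The helper name `variance_share` is illustrative.

```python
N_VARIABLES = 7  # assumed: correlation-matrix PCA on seven oxides


def variance_share(eigenvalue, n_variables=N_VARIABLES):
    """Fraction of total variance explained by one component."""
    return eigenvalue / n_variables


# Second and third eigenvalues reported in the text.
reported = {"no mask": (2.01, 0.78), "binary mask": (1.56, 1.35)}

for label, (ev2, ev3) in reported.items():
    print(f"{label}: PC2 = {variance_share(ev2):.1%}, PC3 = {variance_share(ev3):.1%}")
```

Under the stated assumption, the binary mask raises the share of the third component from about 11% to about 19%, while the share of the second component falls from about 29% to about 22%.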

The spatial distribution of scores on the third component calculated using a binary mask (Fig. 6) shows that the areas with high scores represent the locations of felsic intrusions. The results obtained using a binary mask highlight most of the known intrusions and some areas in the metamorphic rocks (Figs. 3 and 6). The coverage of the high scores is relatively small compared with that delineated by the scores obtained using ordinary PCA without a mask.

In order to compare the effectiveness of the other types of weighting factors defined and shown in Fig. 4(A), FMPCA was applied to the same log-ratio transformed data. The loadings obtained with and without weighting factors are shown in Fig. 5. When the decay intensity of the weighting function is increased from a constant function (b = 0) to more rapid decay rates (b = 8, 16, ∞), the loadings increase positively for K2O, Na2O and Al2O3, increase negatively for Si2O, MgO and Fe2O3, and decrease for CaO (Fig. 5). These changes demonstrate that the non-linear weighting functions defined in Fig. 4(A) enhance the patterns indicating felsic intrusions and reduce the influence of sedimentary carbonate rocks. The scores on the third component (Fig. 7) calculated using the non-linear weighting functions also confirm that enhanced scores are mainly in the northern part of the study area, where the sedimentary rocks of the Gejiu Formation dominate.

The differences between the scores calculated with and without the non-linear mask (Fig. 8) illustrate that scores are enhanced in the northern areas, whereas in the rest of the study area the scores are negative or depleted. Therefore, SWPCA with a non-linear weighting function can provide more information for the delineation of intrusions, especially in areas with the sedimentary rocks of the


Gejiu Formation. This is significant for mineral exploration in the area since felsic intrusions in the Gejiu Formation are essential for Sn mineralization. Most of the large Sn mineral deposits have been found in the outer contact zones between the intrusions and the Gejiu Formation.
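A difference map such as Fig. 8 can be sketched as the scores from the weighted loadings minus the scores from the ordinary loadings. Because the sign of an eigenvector is arbitrary, the sketch first flips the weighted loading vector if it points opposite to the unweighted one. Whether the authors applied such an alignment is not stated, and the names `align_sign` and `score_difference` are hypothetical.

```python
import numpy as np


def align_sign(loading, reference):
    """Flip `loading` when it points opposite to `reference` (sign is arbitrary)."""
    loading = np.asarray(loading, dtype=float)
    return -loading if np.dot(loading, reference) < 0 else loading


def score_difference(Z, loading_weighted, loading_plain):
    """Scores from weighted loadings minus scores from unweighted loadings.

    Z is the standardized data matrix (samples x variables); both loading
    vectors refer to the same component, e.g. the third.
    """
    Z = np.asarray(Z, dtype=float)
    loading_plain = np.asarray(loading_plain, dtype=float)
    loading_weighted = align_sign(loading_weighted, loading_plain)
    return Z @ loading_weighted - Z @ loading_plain


# Toy check: a sign-flipped copy of the same loading gives zero difference.
Z = np.array([[1.0, -0.5, 0.2], [0.3, 0.8, -1.1]])
v = np.array([0.6, 0.0, -0.8])
print(score_difference(Z, -v, v))
```

Without the alignment step, a mere sign flip between two runs would appear as a large spurious difference across the whole map.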

To interpret the spatial associations between the score distributions and other geological features, a comprehensive map was created with scores as the background and other geological features superimposed (Fig. 9). The top graph in Fig. 9 shows the high score values in yellow to red colors. These areas mainly represent intrusions. Most areas with high scores are surrounded by high fault density, but the faults do not pass through these “intrusions”. Most Sn mineral deposits occur in the fault systems close to intersections with the “intrusions”. When draped over the digital elevation model (DEM) at 90 m resolution, these “intrusions” generally show positive topographical responses (bottom graph in Fig. 9). The 3D graph was created using ArcGIS 3D Analyst, with colors defined according to the score values shown in the top graph and the ground elevation taken from the DEM.

5. Discussion and conclusions

It has been shown that spatially weighted PCA provides a new way of implementing PCA with constraints on samples according to


Fig. 9. Top graph: scores map superimposed with faults (black lines), Sn and Cu mineral deposits (black triangles and light blue dots), and intrusions as polygons. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article). Bottom graph: scores map draped over the digital elevation model (DEM). Black lines are faults and polygons are intrusions. DEM data are at 90 m resolution.


their spatial properties in relation to the objective of the study. The definition and construction of weighting functions are flexible and depend upon the properties of the samples. Fuzzy masks are also affected subjectively by user-defined weighting functions based on opinions or experience. Because the choice of weighting functions is so flexible, it is impossible to evaluate thoroughly which weighting functions will be the most effective. One recommendation is to define weighting functions that have physical meaning and to compare the results. In the case study introduced in this paper, we defined a set of distance-based weighting functions and compared their effectiveness by investigating the changes in the relative importance of components (eigenvalues), the contributions of variables (loadings) and the spatial distribution of scores. These are only some examples of how the results can be evaluated.

Application of a distance-based fuzzy mask in conjunction with PCA in this study generated some interesting results that show a good fit to the known outcropping igneous intrusions and, more importantly, indicate the potential distribution of igneous intrusions, especially within the sedimentary rocks of the Gejiu Formation. Although these results still need field verification by mapping and/or drilling in the case of hidden intrusions, they have already provided guidelines for further mapping. Some of the results have been used by the local geological survey for planning further mapping and mineral prospecting activities.

This study has demonstrated that FMPCA can utilize weighting functions to represent the relative representativeness of samples or pixels. These weighting functions affect the calculation of the correlation coefficient matrix. The variances (eigenvalues) and loadings of components are calculated from the adjusted correlation coefficient matrix. Scores of samples or pixels are calculated based on the principal components. FMPCA was applied to the case study of mapping intrusions in the Gejiu mineral district. Various


weighting functions were constructed based on the inverse distance from mapped intrusions and validated by creating components for mapping locations of felsic intrusions, especially unknown intrusions. The defined non-linear decay functions were effective for enhancing the third component for mapping felsic intrusions. This technique enhanced the variance (eigenvalue) of the component, decreased the loadings of CaO, and increased the loadings for compounds associated with felsic intrusions (Si2O, Na2O, MgO, Fe2O3 and Al2O3). This method is expected to become a common tool for analyzing geochemical data and other types of images. Factors such as inverse values of CaO, gravity field values, inverse scores of other components, and the density of faults can be used as weighting functions.
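How an auxiliary variable would be turned into weights is not specified in the text. One simple possibility, shown here purely as an assumption, is to min-max rescale the variable (or its inverse, as for CaO) to the interval [0, 1]. The function name `weights_from_variable` and the sample values are illustrative only.

```python
import numpy as np


def weights_from_variable(values, inverse=False):
    """Rescale a positive auxiliary variable (or its inverse) to weights in [0, 1]."""
    v = np.asarray(values, dtype=float)
    if inverse:
        v = 1.0 / v  # e.g. low CaO -> high weight
    span = v.max() - v.min()
    if span == 0.0:
        return np.ones_like(v)  # constant variable: equivalent to no mask
    return (v - v.min()) / span


cao = np.array([1.0, 2.0, 4.0])  # made-up concentrations
print(weights_from_variable(cao, inverse=True))
```

Other rescalings, such as rank-based or power-law transforms, would serve equally well; as the discussion stresses, the choice should be guided by physical meaning.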

Application of statistical methods such as multivariate statistical analysis often involves experimental design and sampling. Some experimental designs are controlled designs, so that the samples collected have high quality and can be used equally in statistical analysis. In other cases, however, the design involves natural experiments with uncontrolled factors, and sample collection is constrained so that some of the samples may not be representative. In such cases, one chooses a subset of samples and removes the rest from the subsequent analysis. In general, if samples are representative of the population under study, then they should be included in the statistical analysis. One of the main advantages of standard statistical analysis methods is the objectivity of their results. However, in many real-world applications, sample collection is limited and constrained by factors such as availability and cost. The samples may have variable quality; for instance, samples with poor data quality or outliers have to be filtered from those used in the statistical analysis. This is why, in practical work, exploratory data analysis is often needed to explore the data before they can be used in multivariate analysis. Besides the data quality of samples, the representativeness of samples for the primary objective is also a consideration central to statistical analysis. The process of selecting samples can be objective or subjective depending upon the modeler’s knowledge of the samples, and the results are influenced by which samples are selected. Utilizing the properties of samples in statistical analysis is essential for enhancing the results. The fuzzy mask method introduced in this paper provides an option for users to define weights that represent the relative representativeness of samples in statistical analysis. It is true that weights defined and applied to each sample will alter the calculation of the PCA, so that the results are influenced by how the weights are defined. This might be one of the disadvantages of using fuzzy concepts in statistics: the results obtained by fuzzy mask PCA are indeed influenced by the weighting function. This type of method should be used with caution, and the weighting functions must be defined meaningfully.

Acknowledgements

The authors sincerely thank the two anonymous reviewers for their constructive comments, which have improved the manuscript. The research was financially supported by a Distinguished Young Researcher Grant (40525009) and a Strategic Research Grant (40638041) awarded by the Natural Science Foundation of China, a High-Tech Research and Development Grant (2009AA06Z110, 2008AA121103) from the Ministry of Science and Technology of China, and grants from the Ministry of Education of China (Nos. IRT0755 and 104244).

References

Aitchison, J., 1986. The statistical analysis of compositional data. Chapman and Hall,London, New York, 416 pp.

ipal component analysis for multi-element geochemical data for2010.11.001

Page 8: A spatially weighted principal component analysis for ... · A spatially weighted principal component analysis for multi-element ... (or pixels) with a spatial weighting factor. A

Q. Cheng et al. / Computers & Geosciences ] (]]]]) ]]]–]]]8

Arino, O., Melinotte, J.M., Calabresi, G., 1993. Fire, cloud, land, water: the IoniaAVHRR CD-browser of ESRIN. Earth Observation Quarterly 41, 6–7.

Bonham-Carter, G., Cheng, Q., 2001. Spatially weighted principal componentanalysis. In: proceedings of the IAMG2001 Meeting, Cancun, Mexico, September6–12, 2001, 8 pp. on CD.

Chandrajith, R., Dissanayake, C.B., Tobschall, H.J., 2001. Application of multi-elementrelationships in stream sediments to mineral exploration: a case study ofWalawe Ganga Basin, Sri Lanka. Applied Geochemistry 16 (3), 339–350.

Chavez Jr., P., Kwarteng, A., 1989. Extracting spectral contrast in Landsat ThematicMapper image data using selective principal component analysis. Photogram-metric Engineering and Remote Sensing 65, 339–348.

Cheng, Q., 1997. Fractal/multifractal modeling and spatial analysis. In: Proceedingsof the International Association for Mathematical Geology (IAMG) Conference,22–27 September 1997, Barcelona, Spain (Barcelona: CIMNE), vol. 1, pp. 57–72.

Cheng, Q., 1999. Multifractality and spatial statistics. Computers and Geosciences25, 949–961.

Cheng, Q., 2000. GeoDAS Phase I: User’s Guide & Exercise Manual. Unpublishednotes, York University, 298 pp.

Cheng, Q., 2002. New versions of principal component analysis for imageenhancement and classification. Geoscience and remote sensing symposium,IGARSS ’02. In: Proceedings of the IEEE International, 24–28 June 2002, vol. 6,pp. 3372–3374.

Cheng, Q., 2007. Mapping singularities with stream sediment geochemical data forprediction of undiscovered mineral deposits in Gejiu, Yunnan Prov., China. OreGeology Reviews 32 (1–2), 314–324.

Cheng, Q., Agterberg, F.P., 2009. Singularity analysis of ore-mineral and toxic traceelements in stream sediments. Computers and Geosciences 35, 234–244.

Cheng, Q., Jing, L., Panahi, A., 2006. Principal component analysis with optimumorder sample correlation coefficient for image enhancement. InternationalJournal of Remote Sensing 27 (16), 3387–3401.

Cheng, Q., Xia, Q., Li, W., Zhang, S., Chen, Z., Zuo, R, Wang, W., 2010. Density/areapower-law models for separating multi-scale anomalies of ore and toxicelements in stream sediments in Gejiu mineral district, Yunnan Province, China.Biogeosciences 7, 3019–3025.

Cheng, Q., Zhao, P., Chen, J., Xia, Q., Chen, Z., Zhang, S., Xu, D., Xie, S., Wang, W., 2009a.Application of singularity in mineral deposit prediction in Gejiu district:information extraction. Earth Science 34 (2), 232–242 (In Chinese with Englishabstract).

Cheng, Q., Zhao, P., Zhang, S., Xia, Q., Chen, Z., Chen, J., Xu, D., Wang, W., 2009b.Application of singularity in mineral deposit prediction in Gejiu district:information integration and delineation of target areas. Earth Science 34 (2),243–252 (in Chinese with English abstract).

Please cite this article as: Cheng, Q., et al., A spatially weighted princmapping.... Computers and Geosciences (2010), doi:10.1016/j.cageo.

Crosta, A., Moore, J.Mc.M., 1989. Enhancement of Landsat Thematic Mapper imageryfor residual soil mapping in SW Minais Gerais State, Brazil: a prospecting casehistory in Greenstone belt terrain. In: Proceedings of the Seventh ERIM ThematicConference: Remote Sensing for Exploration Geology, 2–6 October 1989,Calgary, Alberta, Canada, pp. 1173–1187.

Davis, J.C., 2002. Statistics and Data Analysis in Geology, 3rd ed. John Wiley & SonsInc., New York, 550 pp.

Fraser, S.J., Green, A.A., 1987. A software defoliant for geological analysis of bandratios. Journal of Remote Sensing 8, 525–532.

Garrett, R.G., Grunsky, E.C., 2001. Weighted sums—knowledge based empiricalindices for use in exploration geochemistry. Geochemistry: Exploration Envir-onment Analysis 1, 135–141.

Grunsky, E.C., 1997. Strategies and methods for the interpretation of geochemicaldata. In: Current Topics in GIS and Integration of Exploration Datasets, ShortCourse, Exploration’97 Workshop, September 1997, 145 pp.

Harris, J.R. Grunsky, E.C., Wilkinson, L., 1997. Developments in the effective use oflithogeochemistry in regional exploration programs: application of GIS tech-nology. In: Gubins A.G. (Ed.), Proceedings of the Exploration’97, FourthDecennial International Conference on Mineral Exploration, pp. 285–292.

Ma, J.W., Slaney, V.R., Harris, J., Graham, B., Ballantyne, B.B., Harris, D.C., 1990. Use ofLandsat TM data for the mapping of limonitic and altered rocks in the SulphuretsArea, Central British Columbia. In: Proceedings of the 14th Canadian Symposiumon Remote Sensing, Calgary, Alberta, Canada (ISPRS), pp. 419–422.

Olson, J., Watts, J., Allison, L., 1983. Carbon in live vegetation of major worldecosystems. Report ORNL-5862, Oak Ridge National Laboratory, Oak Ridge, TN,397 pp.

Saunders, R.W., Kriebel, K.T., 1998. An improved method for detecting clear sky andcloudy radiances from AVHRR data. International Journal of Remote Sensing 9,123–150.

Southwest Metallurgy Exploration Company, 1984. Geology of tin mineral depositin Gejiu. Metallurgy Industry Press, Beijing, 127 pp.

Xie, X., Mu, X., Ren, T., 1997. Geochemical mapping in China. Journal of GeochemicalExploration 60 (1), 99–113.

Xiong, G., Shi, S., 1994. Physico-geologic model of the Gejiu Tin district and itsapplication. Geological Review 40 (1), 19–27.

Xu, Y., Cheng, Q., 2001. A fractal filtering technique for processing regionalgeochemical maps for mineral exploration. Geochemistry: Exploration, Envir-onment Analysis 1 (2), 147–156.

Yu, C.W., Tang, Y.J., Shi, P.F., 1988. Dynamic System of Gejiu Polymetallic Miner-alization Processes. Chinese University of Geosciences Press, Wuhan, 394 pp.(in Chinese).

Zhuang, Y., Wang, R., Yang, S., Yi, J., 1996. Geology of Gejiu Tin–Copper PolymetallicDeposit. Earthquake Publishing House, Beijing, China, 189 pp.

ipal component analysis for multi-element geochemical data for2010.11.001