A variable precision rough set approach to the remote sensing land use/cover classification




Computers & Geosciences 36 (2010) 1466–1473


0098-3004/$ - see front matter © 2010 Elsevier Ltd. All rights reserved.

doi:10.1016/j.cageo.2009.11.010

* Corresponding author. E-mail address: [email protected] (S. Zhang).

journal homepage: www.elsevier.com/locate/cageo

A variable precision rough set approach to the remote sensing land use/cover classification

Xin Pan a,b,c, Shuqing Zhang a,*, Huaiqing Zhang d, Xiaodong Na a,b, Xiaofeng Li a

a Northeast Institute of Geography and Agricultural Ecology, Chinese Academy of Sciences, Changchun 130012, China
b Graduate University of Chinese Academy of Sciences, Beijing 100039, China
c School of Electrical and Information Technology, Changchun Institute of Technology, Changchun 130012, China
d Institute of Forest Resources Information, Chinese Academy of Forestry, Beijing 100091, China

Article info

Article history:

Received 16 May 2008

Received in revised form 13 October 2009

Accepted 22 November 2009

Keywords:

Remote sensing classification

Knowledge discovery

Overlapping data

Variable precision rough sets

VPRS


Abstract

Nowadays the rough set method is receiving increasing attention in remote sensing classification, although one of its major drawbacks is that it is too sensitive to between-class spectral confusion and within-class spectral variation. In this paper, a novel remote sensing classification approach based on variable precision rough sets (VPRS) is proposed by relaxing the subset operator through the inclusion error β. The remote sensing classification algorithm based on VPRS includes three steps: (1) discretization of spectral and textural information (or other input data), (2) feature selection, and (3) classification rule extraction. The new method proposed here is tested with Landsat-5 TM data. The experiment shows that admitting various inclusion errors β can improve classification performance, including feature selection and generalization ability. The inclusion of β also prevents overfitting to the training data, and higher classification accuracy is obtained. When β = 0 (i.e., the original rough set based classifier), overfitting to the training data occurs, with an overall accuracy of 0.6778 and an unrecognizable percentage of 12%. When β = 0.07, the highest classification performance is reached, with an overall accuracy of 0.8873 and an unrecognizable percentage of 2.6%.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Land cover information has been identified as one of the crucial data components for many aspects of global change studies and environmental applications. The development of remote sensing technology has increasingly facilitated the acquisition of such information (Ouyang and Ma, 2006). Extracting accurate and timely knowledge about land use/cover from remote sensing imagery relies not only upon the data quality and resolution, but also upon the classification techniques used. Therefore, improvement of remote sensing classification accuracy is always a concern. Many data mining technologies, e.g., per-pixel maximum likelihood, fuzzy classification, object-oriented multi-resolution segmentation, artificial neural networks, decision tree-based classification, and rule-based classification, have been used in supervised or unsupervised remote sensing classification (Leung et al., 2007). But spectral uncertainty or vagueness caused by between-class spectral confusion and within-class spectral variation remains a challenge to data mining based remote sensing classification techniques.

Rough set theory (RST), proposed by Pawlak (1982), is an extension of conventional set theory that supports approximations in decision making. It has also been conceived as a mathematical approach to analyze and conceptualize various types of data, especially to deal with vagueness or uncertainty (Pawlak, 1982, 1999, 2004). Rough set theory has been successfully used in diverse fields such as decision support systems, machine learning, and automated knowledge acquisition (Han et al., 1993; Pawlak, 1999; Yasdi, 1996).

Ahlqvist et al. (2000, 2003) and Ahlqvist (2005) have applied the rough set method to spatial classification and uncertainty analysis. In remote sensing classification, rough set theory is an objective way to unravel decision rules from information systems with incomplete and qualitative data, and it provides an effective methodology to optimally select features (Leung et al., 2007). Pal and Mitra (2002) utilized rough sets to enhance an unsupervised method for remote sensing image classification, improving the speed of convergence and avoiding the local minimum problem. Ouyang and Ma (2006) introduced a tolerant rough set neighborhood classifier for land cover classification; their results show that it is significantly better than the minimum distance classifier (MDC). Leung et al. (2007) proposed a rough set remote sensing classification approach based on an interval-valued decision table. Their approach can effectively discover the optimal spectral bands and the optimal rule set for a classification task in remotely sensed data, and is also capable of unraveling the critical spectral band(s) discerning certain classes. Lei et al. (2007) put forward a discrete rough set method to extract the most useful features for classifying remotely sensed paddy fields.



They demonstrated that the overall accuracy of the discrete rough set feature extraction method was better than that of conventional PCA.

The ability of the conventional rough set model in classification and feature selection is based on the lower and upper approximations (Shen and Jensen, 2007). But even a relatively small inclusion error in a similarity class results in rejection of that class from the lower approximation. A small degree of inclusion error can also lead to an excessive increase of the upper approximation (Rolka et al., 2004). These properties can be important, especially in the case of overlapping remote sensing data. A lower approximation that is too small and an upper approximation that is too large can lead to ineffective feature selection and classification-rule discovery, and may also result in overfitting to the training data.

The variable precision rough set model (VPRSM) (Ziarko, 1993, 2001; Kryszkiewicz, 1995) is an expansion of the basic rough set model. It attempts to improve upon rough set theory by relaxing the subset operator, and it was proposed to analyze and identify data patterns that represent statistical trends rather than functional ones. The main idea of variable precision rough sets (VPRS) is to allow objects to be classified with an error smaller than a certain predefined level (Shen and Jensen, 2007).

This paper presents a VPRS approach to remote sensing data classification. The remainder of the paper is organized as follows: preliminaries of original rough set theory and variable precision rough set theory are introduced in Section 2; the discretization, feature selection, and rule extraction algorithms are presented in Section 3; experimental results are given in Section 4; and conclusions follow.

2. Rough sets and variable precision rough sets

In this section, we review some basic concepts, such as information systems, decision tables, original rough sets, and variable precision rough sets.

2.1. Rough sets

According to Pawlak (1982, 1999, 2004), an information system S can be viewed as a table of data consisting of objects (rows in the table) and attributes. It can be defined by a pair S = (U, A), where

(1) U is a nonempty finite set of objects called the universe of discourse,
(2) A is a nonempty finite set of attributes, and
(3) for every a ∈ A, there is a mapping a: U → V_a, where V_a is called the value set of a.

A decision table is an information system of the form S = (U, A ∪ {d}), where d ∉ A is a distinguished attribute called the decision attribute.

With any P ⊆ A ∪ {d} there is an associated indiscernibility relation IND(P) given by

IND(P) = {(x, y) ∈ U × U : ∀a ∈ P, a(x) = a(y)}.   (1)

Two objects are equivalent under this relation if and only if they have the same vectors of attribute values for the attributes in P, i.e., if (x, y) ∈ IND(P), then x and y are indiscernible by the attributes from P. The partition of U determined by IND(P) is denoted by U/P or U/IND(P), which is the set of equivalence classes generated by IND(P):

U/P = ⊗{U/IND({a}) : a ∈ P},   (2)

where A ⊗ B = {X ∩ Y : X ∈ A, Y ∈ B, X ∩ Y ≠ ∅}. The equivalence classes of the indiscernibility relation with respect to P are

[x]_P = {y ∈ U : (x, y) ∈ IND(P)}.   (3)
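As an illustration of Eqs. (1)–(3), the following sketch computes the partition U/P for a hypothetical three-object toy table (the attribute names and values here are illustrative only, not from the paper's data):

```python
from collections import defaultdict

# Hypothetical toy information system: three objects (pixels) with two
# discretized attributes; names and values are illustrative only.
U = {
    "x1": {"band1": 0, "band2": 1},
    "x2": {"band1": 0, "band2": 1},
    "x3": {"band1": 1, "band2": 1},
}

def partition(table, P):
    """U/P: the equivalence classes of IND(P); two objects fall in the
    same class iff their attribute-value vectors agree on every a in P."""
    classes = defaultdict(set)
    for x, row in table.items():
        classes[tuple(row[a] for a in sorted(P))].add(x)
    return [frozenset(c) for c in classes.values()]
```

For example, partition(U, {"band2"}) lumps all three objects into a single equivalence class, while partition(U, {"band1", "band2"}) separates x3 from x1 and x2.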

Based on the indiscernibility relation, we can define lower and upper approximations. Let X ⊆ U; X can be approximated using only the information contained within R:

R-lower approximation:

R̲X = {x : [x]_R ⊆ X}   (4)

R-upper approximation:

R̄X = {x : [x]_R ∩ X ≠ ∅}.   (5)

If R̲X ≠ R̄X then the pair (R̲X, R̄X) is called a rough set. With the lower approximation and the upper approximation we can define the positive, negative, and boundary regions for a set X ⊆ U:

POS_R(X) = R̲X   (6)

NEG_R(X) = U − R̄X   (7)

BND_R(X) = R̄X − R̲X.   (8)

An important notion in rough sets is the dependency between attributes. Attribute set Q depending on R (attribute dependency, denoted γ) is defined by

γ_R(Q) = Card(∪_{X ∈ U/Q} POS_R(X)) / Card(U),   (9)

where Card(·) is the cardinality of a set. Q depends totally on R if γ = 1, partially on R if 0 < γ < 1, and not at all on R if γ = 0.
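Eqs. (4)–(9) can be sketched directly over a partition represented as a list of equivalence classes (a minimal illustration, with a made-up two-block partition; not the paper's implementation):

```python
def lower_approx(blocks, X):
    """Eq. (4): union of the equivalence classes wholly contained in X."""
    return {x for b in blocks for x in b if b <= X}

def upper_approx(blocks, X):
    """Eq. (5): union of the equivalence classes that intersect X."""
    return {x for b in blocks for x in b if b & X}

def gamma(blocks, decision_classes, n):
    """Eq. (9): size of the positive region over all decision classes,
    divided by Card(U) = n."""
    pos = set()
    for X in decision_classes:
        pos |= lower_approx(blocks, X)
    return len(pos) / n
```

With blocks = [{1, 2}, {3}], the set {1, 3} has lower approximation {3} and upper approximation {1, 2, 3}: the class {1, 2} is rejected from the lower approximation because of a single stray element, which is exactly the strictness VPRS relaxes below.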

Based on the rough set, from a positive region and a boundary region we can reveal certain and possible decision rules in a decision table (Pawlak, 2004). The attribute dependency γ gives a useful criterion to evaluate the relation between an attribute and the decision; based on this measurement, we can select the most useful attributes for the classification (Shen and Jensen, 2007). As the original rough set model's lower and upper approximations are too strict, no errors in classification are permitted. Therefore, if the decision table includes errors or the attributes overlap to a certain degree, the original rough set analysis might be ineffective (Masahiro, 2005). Such a case may be evident in remote sensing classification, due to between-class spectral confusion and within-class spectral variation. Some errors in the training stage of remote sensing classification should hence be admitted.

2.2. Variable precision rough sets (VPRS)

In order to overcome the above-mentioned shortcomings of original rough sets, variable precision rough sets (VPRS), which relax the subset operator, are proposed. The main idea of VPRS is to allow objects to be classified with an error smaller than a certain predefined level (Ziarko, 1993).

Let A, B ⊆ U. The inclusion error of A in B is defined by

e(A, B) = 1 − Card(A ∩ B) / Card(A),   (A ≠ ∅)   (10)

where Card(·) is the cardinality of a set. For Eq. (10), e(A, B) = 0 if and only if A ⊆ B.

A degree of inclusion can be achieved by allowing a certain level of inclusion error β. We say that the set A is included in the set B with an inclusion error β:

A ⊆_β B iff e(A, B) ≤ β, 0 ≤ β ≤ 0.5.   (11)

Using the inclusion error β, the β-lower and β-upper approximations of a set X can be defined as

R̲_βX = {x ∈ U : e([x]_R, X) ≤ β}   (12)

R̄_βX = {x ∈ U : e([x]_R, X) ≤ 1 − β}.   (13)

The positive, negative, and boundary regions can be denoted, respectively, as follows:

POS_{R,β}(X) = R̲_βX   (14)


NEG_{R,β}(X) = U − R̄_βX   (15)

BND_{R,β}(X) = R̄_βX − R̲_βX.   (16)

The dependency between attributes R and Q extends to

γ_{R,β}(Q) = Card(∪_{X ∈ U/Q} POS_{R,β}(X)) / Card(U).   (17)

Note that when β = 0, VPRS is equal to the original rough sets. As the β value increases, a larger inclusion error will be admitted. The optimal β value can be determined by trial and error.
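Eqs. (10), (12), (13), and (17) admit an equally short sketch (again over a partition given as a list of classes; the example data are made up for illustration):

```python
def inclusion_error(A, B):
    """Eq. (10): e(A, B) = 1 - Card(A ∩ B)/Card(A), A nonempty."""
    return 1 - len(A & B) / len(A)

def beta_lower(blocks, X, beta):
    """Eq. (12): classes included in X up to inclusion error beta."""
    return {x for b in blocks for x in b if inclusion_error(b, X) <= beta}

def beta_upper(blocks, X, beta):
    """Eq. (13): classes whose inclusion error in X is at most 1 - beta."""
    return {x for b in blocks for x in b if inclusion_error(b, X) <= 1 - beta}

def gamma_beta(blocks, decision_classes, n, beta):
    """Eq. (17): beta-positive region size over Card(U) = n."""
    pos = set()
    for X in decision_classes:
        pos |= beta_lower(blocks, X, beta)
    return len(pos) / n
```

A ten-object equivalence class with nine objects inside X has e = 0.1, so it enters the β-lower approximation for β = 0.1 but not for β = 0, mirroring how VPRS tolerates mildly overlapping training pixels.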

3. Remote sensing classification method based on VPRS

As illustrated in the flow chart in Fig. 1, the remote sensing classification process can be divided into several steps: (1) construct a decision table using training samples (pixels), with their spectral bands and texture features as condition attributes and their class as the decision attribute; (2) establish a discretized decision table using gray-level thresholds obtained by a discretization algorithm; (3) acquire remote sensing classification rules through the feature selection and rule extraction algorithms under the heuristic of VPRS; and (4) classify the remote sensing imagery using the established classification rules.

3.1. Discretization algorithm

A decision table constructed from remote sensing data may contain integer-valued attributes (e.g., spectral bands) or real-valued attributes (e.g., texture information). These attributes usually have a large number of values (called continuous attributes). If a decision table with a large number of attribute values (relative to the number of objects in U) is analyzed, then there is a very low chance that a new object will be properly recognized by matching its attribute value vector with the rows of this table (Nguyen and Skowron, 1995). Therefore, discretization, which is a process that quantizes the numeric data into intervals and assigns each interval a discrete value, is necessary to achieve a higher quality of classification.

Discretization transforms a continuous attribute's values into a finite number of intervals and associates with each interval a numerical, discrete value. Discretization can be broken into two tasks. The first task is to find the number of discrete intervals. The second task is to find the width, or the boundaries, of the intervals, given the range of values of a continuous attribute. Many discretization algorithms have been proposed, for example, equal width, equal frequency, statistical-based, and entropy-based discretization.

[Fig. 1. Flow chart of the classification method: (1) training samples (pixels) with spectral bands and texture features form a decision table; (2) gray-level thresholds yield a discretized decision table; (3) feature selection and rule extraction produce classification rules; (4) the rules drive the classification.]
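Of the generic schemes just listed, equal-width discretization is the simplest; a minimal sketch (illustrative only — here the interval count n is user-chosen, whereas the CAIM algorithm adopted in this paper selects it automatically):

```python
def equal_width_thresholds(values, n):
    """Boundaries d_0 < d_1 < ... < d_n for n equal-width intervals."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n
    return [lo + i * step for i in range(n + 1)]

def discretize(value, thresholds):
    """Map a continuous value to the index of its interval in the scheme
    [d_0, d_1], (d_1, d_2], ..., (d_{n-1}, d_n]."""
    for i in range(1, len(thresholds)):
        if value <= thresholds[i]:
            return i - 1
    return len(thresholds) - 2  # values above d_n clamp to the last interval
```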

In this paper, we use class-attribute interdependence maximization (CAIM), a supervised discretization algorithm (Kurgan and Cios, 2004). The CAIM algorithm automatically selects the number of discrete intervals and, at the same time, finds the width of every interval based on the interdependency between classes and attribute values.

A supervised classification task requires a training data set consisting of M examples, where each example belongs to only one of S classes. Let F denote any of the continuous attributes from the mixed-mode data. There then exists a discretization scheme D on F, which discretizes the continuous domain of attribute F into n discrete intervals bounded by the pairs of numbers

D: {[d_0, d_1], (d_1, d_2], ..., (d_{n−1}, d_n]},

where d_0 is the minimal value and d_n is the maximal value of attribute F. The CAIM criterion, which measures the dependency between the class variable C and the discretization variable D for attribute F, is defined as

CAIM(C, D|F) = (1/n) Σ_{r=1}^{n} Max_r² / M_{+r},   (18)

where q_ir is the total number of continuous values belonging to the ith class that are within the interval (d_{r−1}, d_r], M_{+r} is the total number of continuous values of attribute F that are within the interval (d_{r−1}, d_r], and Max_r is the maximum value among all q_ir values. The algorithm based on Eq. (18) was implemented following the methodology offered by Kurgan and Cios (2004).

3.2. Variable precision attribute reduction

In multi-spectral remote sensing imagery classification, different spectral bands and diverse texture characteristics may bring us more information. But not all of this information is helpful to the supervised classification: huge amounts of irrelevant additional texture information may result in a chaotic state and lead to uncertainty in the classification process (Lei et al., 2007). Experiments show that irrelevant attributes deteriorate the performance of learning algorithms through the curse of dimensionality, and increase training and test time (Kwak and Choi, 2002). Therefore, it is often necessary to maintain a concise form of the information system and select the most useful features (attributes) for classification. In this paper, a variable precision rough set method to search for a minimal reduct is implemented.




Reducts are particular subsets of attributes that provide the same information for classification purposes as the full set of attributes. For a decision table S = (U, A ∪ {d}), a reduct is formally defined as a subset R of the conditional attribute set A such that γ_R(d) = γ_A(d). A given data set may have many attribute reduct sets, and the collection of all reducts is denoted by

R = {X : X ⊆ A, γ_X(d) = γ_A(d)}.   (19)

Several methods for finding reducts have been proposed in rough set research. Most of these methods are extended from the discernibility matrix method (Skowron and Rauszer, 1992) or the QuickReduct algorithm (Chouchoulas and Shen, 2001). In our research, we propose a novel reduct finding algorithm derived from VPRS and the QuickReduct algorithm, called the VPRS-QuickReduct (briefly denoted as VPRS-Q) algorithm. VPRS-Q improves the current QuickReduct algorithm by selecting the attribute with the biggest dependency in each iteration, and by using VPRS's dependency (Eq. (17)) as a criterion, which admits some small errors. The algorithm can be described as follows:

VPRS-Q algorithm

Input: a decision table S = (U, A ∪ {d}) and the inclusion error parameter β.
Output: the selected attribute set R.

Step 1. Initialize the selected attribute set R and the candidate attribute set T: R ← { }, T ← A.
Step 2. Set the target dependency degree d_target to the whole decision table's dependency degree: d_target ← γ_{A,β}({d}).
Step 3. Find the attribute a_max with the biggest dependency degree d_biggest in T:
    d_biggest ← 0
    For each a in T do
        If γ_{R∪{a},β}({d}) > d_biggest then
            d_biggest ← γ_{R∪{a},β}({d}); a_max ← a
        End If
    End For
Step 4. If not (γ_{R,β}({d}) < d_biggest) then
        no attribute in T can increase R's dependency degree: go to Step 7.
    End If
Step 5. Once an a_max that can increase R's dependency degree has been found in T, remove a_max from T and add it to R:
    T ← T − {a_max}
    R ← R ∪ {a_max}
Step 6. If γ_{R,β}({d}) < d_target then
        R's dependency degree is still less than that of the whole decision table: go to Step 3.
    End If
Step 7. Output R.
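The steps above amount to a greedy forward search; a sketch follows, in which `gamma_of(S)` is assumed to return the β-dependency γ_{S,β}({d}) of a candidate attribute set S (Eq. (17)) — how that is computed from the decision table is left out here:

```python
def vprs_quickreduct(attributes, gamma_of):
    """Greedy sketch of VPRS-Q (Steps 1-7). `gamma_of` is a caller-supplied
    beta-dependency function; this is an outline, not the paper's code."""
    R, T = set(), set(attributes)
    d_target = gamma_of(frozenset(attributes))      # Step 2
    while T:
        best_a, best_gamma = None, gamma_of(frozenset(R))
        for a in T:                                 # Step 3: best attribute in T
            g = gamma_of(frozenset(R | {a}))
            if g > best_gamma:
                best_gamma, best_a = g, a
        if best_a is None:                          # Step 4: no improvement
            break
        T.discard(best_a)                           # Step 5
        R.add(best_a)
        if gamma_of(frozenset(R)) >= d_target:      # Step 6: target reached
            break
    return R
```

For instance, with a toy dependency function that only credits two of three attributes, the search stops as soon as those two are selected and never adds the irrelevant third.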

3.3. Variable precision classification rule extraction

Rough sets can be used in different stages of the process of rule induction and data processing (Kryszkiewicz, 1999; Shen and Chouchoulas, 2002). The LEM2 algorithm proposed by Grzymala-Busse (1992) is one of the most widely used algorithms, and many real-world applications use it to extract rules. The rule extraction algorithm presented here, denoted VPRS-RE, improves LEM2 by taking the inclusion errors into account.

Given a decision table S = (U, A ∪ {d}), decision classes can be obtained by U/{d} based on Eq. (2), and each class can produce positive and boundary regions. A positive region itself can produce classification rules directly. For the boundary regions, however, further partitions are needed. Suppose that there are three decision classes U/{d} = {D1, D2, D3} in a decision table and B is a subset of condition attributes. The boundary region of D1 (BND(D1)) with respect to B consists of three disjoint subsets:

BND(D1) = (B̄D1 ∩ B̄D2 − B̄D3) ∪ (B̄D1 ∩ B̄D3 − B̄D2) ∪ (B̄D1 ∩ B̄D2 ∩ B̄D3)

The approximate decision rules will be induced independently from each of these three subsets. All of these positive regions and boundary-region subsets can be denoted by Y. For each K ∈ Y (the decision associated with region K is D), a heuristic strategy is used in our algorithm to extract a minimum set of rules. Given that C = c1 ∧ c2 ∧ ... ∧ cn is the conjunction of n elementary conditions, the objects in the decision table covered by C can be expressed as follows:

[C] is the cover of rule C on the decision table,
[C]+_K = [C] ∩ K is the positive cover of C on K, and
[C]−_K = [C] ∩ (U − K) is the negative cover of C on K.

A decision rule, denoted by r, can be described as follows: IF C THEN D. The procedure of rule extraction is summarized in the following:

VPRS rule extraction algorithm (denoted as VPRS-RE)

Input: K, one of the positive regions or boundary-region subsets in Y (K ∈ Y); β, the allowed inclusion error for VPRS.
Output: R, the set of rules extracted from region K.

Step 1. G ← K; R ← ∅.
Step 2. While G ≠ ∅ do
Step 3. Begin
Step 4.     Initialize the condition set of a rule with an empty set: C ← ∅.
Step 5.     Initialize C_current with all the attribute-value conditions present in G: C_current ← {c : [c] ∩ G ≠ ∅}.
Step 6.     Iteratively add conditions to C until the set [C] is included in the set K within the inclusion error threshold β of Eq. (10):
            While (C = ∅) or (e([C], K) > β) do
            Begin
                Select a ∈ C_current such that card([a] ∩ G) is maximum; if ties occur, select the a with the smallest card([a]); if further ties occur, select the first a from the list.
                Add condition a into C: C ← C ∪ {a}.
                Update G: G ← [a] ∩ G.
                Eliminate the conditions that have already been used: C_current ← {c : [c] ∩ G ≠ ∅}; C_current ← C_current − C.
            End
Step 7.     Delete the redundant conditions:
            For each a ∈ C do
                If e([C − {a}], K) ≤ β then C ← C − {a}.
            End For
Step 8.     Create a rule r based on C, and put the rule into R: R ← R ∪ {r}.
Step 9.     Remove the objects covered by R from G: G ← G − ∪_{r∈R}[r].
Step 10. End
Step 11. Delete the redundant rules:
            For each r ∈ R do
                If K ⊆ ∪_{s∈R−{r}}[s] then R ← R − {r}.
            End For
Step 12. Output R.

Fig. 2. Study area image with composite of bands 4, 3, 2.


This algorithm can extract the minimum set of decision rules that admits some inclusion error and avoids overfitting the training data set. These rules can be used to classify new objects by matching the features of a new object against the conditions of the decision rules. Three cases may occur. Case 1: the new object matches exactly one rule; the classification suggestion for the object is clear. Case 2: the new object matches more than one rule; the suggestion may be ambiguous and the suggestion of the rule with the biggest support should be used:

Support(r) = Card([r]+_K) / Card([r]).

Case 3: the new object does not match any of the rules; in this case, the object is considered to be unrecognized.
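The three matching cases reduce to a few lines; a sketch, with rules represented as (condition, decision, support) triples — an illustrative structure, not the paper's data layout:

```python
def support(pos_cover, cover):
    """Support(r) = Card([r]+_K) / Card([r])."""
    return len(pos_cover) / len(cover)

def classify(pixel, rules):
    """Case 1: one match -> its decision. Case 2: several matches -> the
    decision of the highest-support rule. Case 3: no match -> unrecognized.
    `rules` is a list of (condition predicate, decision, support) triples."""
    matched = [(decision, s) for cond, decision, s in rules if cond(pixel)]
    if not matched:
        return "unrecognized"                      # Case 3
    return max(matched, key=lambda m: m[1])[0]     # Cases 1 and 2
```

Counting how often `classify` returns "unrecognized" on the test pixels gives exactly the unrecognizable percentage reported in Section 4.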

4. Experimental results

4.1. Study area and data sets

Our study area covers the whole of Honghe National Nature Reserve (HNNR), which is located in the Sanjiang Plain, the biggest freshwater wetland area in the northeast region of China (Shuqing et al., 2009). The land use/cover categories in the study area include Marsh Land (ML), Forestland (FL), Meadow (MD), Dry Farmland (DF), and Paddy Field (PF). A detailed description of these land use/cover types is listed in Table 1.

One cloud-free Landsat-5 TM scene acquired on October 30, 2006 (orbit number 114/26, image size 684×844 pixels), with Honghe National Nature Reserve (HNNR) in the middle of the image, was chosen for our experiment. As it was just the growing season in our study area, each land use/cover type is relatively spectrally distinct (Fig. 2).

Six spectral bands were used: blue (Band 1), green (Band 2), red (Band 3), near-infrared (Band 4), and two mid-infrared bands (Bands 5 and 7). The texture information in each band is derived from the gray level co-occurrence matrix (GLCM) mean measurement with 3×3 and 5×5 windows. The thermal band TM6 was excluded because it is less informative for vegetation classification and has a larger pixel size than the other bands. The mean texture measurement in the two window sizes (3×3 and 5×5) is included, as our previous experiment shows that this spatial measurement is relevant to the study area. The image was registered to the Gauss projection (identical to that of the digital district map) using ERDAS software based on 55–66 ground control points collected from topographic maps. The registration procedure achieved a root mean square error (RMSE) of less than 0.5 pixel.

Table 1. Land cover classification scheme.

Class          Description
Forestland     Broad leaf trees (Populus davidiana, Betula platyphylla Suk., and Quercus mongolica) and shrubs (S. rosmarinifolia var. brachypoda, S. myrtilloides, Alnus sibirica, B. fruticosa, and Spiraea salicifolia)
Meadow         Deyeuxia angustifolia, Rosmarinifolia var. brachypoda, etc.; may contain scattered low shrubs
Marsh          Wet sedges (C. lasiocarpa, C. schmidtii, C. meyeriana, and C. pseudocuraica); aquatic macrophytes (G. spiculosa, Phragmites communis, and T. angustifolia)
Dry farmland   Dryland
Paddy field    Paddy land

4.2. Results of remote sensing classification with VPRS

Based on field experience aided by high-resolution aerial photographs, two sets of independent samples were extracted in the experiment. The first set was used for training, with each class comprising 300 independent, randomly selected pixels. The second set was used for testing, likewise with 300 independent, randomly selected pixels per class.

The decision table S = (U, A ∪ {d}) was obtained from the randomly selected training samples. Each pixel in the training set was represented by a row, U = {u_i : i = 1, ..., 1500}. A = {a_i : i = 1, ..., 18} corresponds to the features of the decision table, including spectral bands and texture information, and {d} to the land cover types. By utilizing the CAIM discretization algorithm described in Section 3.1, the gray-level thresholds for each feature in the decision table were computed (Table 2).

With these thresholds, a discretized decision table was derived from the original decision table. The most useful features were selected with our VPRS-Q algorithm, and the classification rules were finally extracted with VPRS-RE. In the experiment, the value of the inclusion error (β) was varied from 0.00 to 0.25 in increments of 0.01. The trend in the number of features selected under different β values is demonstrated in Fig. 3a. It can be seen in the figure that the number of features selected (from the original 18 features) is affected by β.

The features selected with different β values are listed in Table 3. When β = 0.00, all 18 features were selected; in this case VPRS is exactly the same as the original rough set. When β = 0.21 or 0.22, only four features were selected: Band 1 (5×5 texture), Band 4 (5×5 texture), Band 5 (3×3 texture), and Band 5 (5×5 texture). Besides the number of features selected, β also affects the number of classification rules. Fig. 3b shows how the number of classification rules changes with different β values; there is a clear trend that the larger the β value, the fewer the classification rules.
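The rule-extraction step can be sketched in the same style. In this hedged sketch (our naming, not the paper's VPRS-RE pseudocode), a rule is emitted for an attribute-value combination only when its majority class reaches the 1 - β inclusion threshold, and pixels matched by no rule remain unrecognizable:

```python
from collections import Counter, defaultdict

def vprs_extract_rules(rows, labels, attrs, beta):
    # One candidate rule per attribute-value combination; keep it only if
    # the majority class covers at least 1 - beta of the matching pixels.
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[tuple(row[a] for a in attrs)].append(y)
    rules = {}
    for cond, ys in groups.items():
        label, n = Counter(ys).most_common(1)[0]
        if n / len(ys) >= 1 - beta:
            rules[cond] = label
    return rules

def classify(rules, attrs, row):
    # Apply the rules; None marks an unrecognizable pixel.
    return rules.get(tuple(row[a] for a in attrs))
```

At β = 0 a single contaminated equivalence class yields no rule at all, while a modest β admits it, which is why larger β values produce fewer but more general rules.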

X. Pan et al. / Computers & Geosciences 36 (2010) 1466–1473

The remote sensing classification accuracy with different β values was assessed using two indices: overall accuracy and unrecognizable percentage (Fig. 4). In the accuracy assessment, the unrecognizable pixels were not eliminated; they were regarded as misclassified pixels. According to the figure, when β = 0.0 (VPRS reduces to the original rough set), the number of unrecognizable testing pixels was largest, whereas at β = 0.25 only 4 testing pixels were unrecognizable. Clearly, as β increases, the unrecognizable testing pixels become fewer and fewer, which means that β reinforces the generalization ability of the extracted classification rules. However, both too small and too large a β can decrease classification accuracy: if β is too small, overfitting to the training data occurs; if β is too large, detailed information in the training data may be ignored. Hence, an appropriate β value should be obtained by trial and error, with the largest percent correct on the test data set taken as the criterion for selecting it. In this experiment, the highest classification accuracy appeared at β = 0.07.
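Treating unrecognizable pixels as misclassifications makes the two assessment indices straightforward to compute. A minimal sketch, using None to mark an unrecognizable pixel (a convention of ours, not from the paper):

```python
def overall_accuracy(predicted, truth):
    # Unrecognizable pixels (None) are retained and counted as errors.
    hits = sum(p is not None and p == t for p, t in zip(predicted, truth))
    return hits / len(truth)

def unrecognizable_percent(predicted):
    # Share of test pixels matched by no classification rule.
    return 100.0 * sum(p is None for p in predicted) / len(predicted)
```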

Fig. 5a and b show the classified images based on the original rough set and VPRS, respectively. A great number of unrecognizable pixels (the white pixels) are densely distributed in Fig. 5a, whereas Fig. 5b gives a visually better result with far fewer unrecognizable pixels.

Table 2. Gray-level thresholds for each feature in the decision table.

Features  Band/texture        Gray-level thresholds
a1        Band 1              {[0, 81], (81, 104], (104, 174], (174, 197], (197, 255]}
a2        Band 2              {[0, 47], (47, 79], (79, 111], (111, 175], (175, 255]}
a3        Band 3              {[0, 49], (49, 147], (147, 186], (186, 245], (245, 255]}
a4        Band 4              {[0, 4], (4, 83], (83, 162], (162, 233], (233, 255]}
a5        Band 5              {[0, 3], (3, 69], (69, 98], (98, 171], (171, 255]}
a6        Band 7              {[0, 28], (28, 63], (63, 110], (110, 156], (156, 255]}
a7        Band 1 3×3 texture  {[0, 27], (27, 98], (98, 133], (133, 204], (204, 255]}
a8        Band 2 3×3 texture  {[0, 49], (49, 77], (77, 164], (164, 254], (254, 255]}
a9        Band 3 3×3 texture  {[0, 34], (34, 133], (133, 173], (173, 234], (234, 255]}
a10       Band 4 3×3 texture  {[0, 90], (90, 188], (188, 230], (230, 237], (237, 255]}
a11       Band 5 3×3 texture  {[0, 3], (3, 45], (45, 88], (88, 168], (168, 255]}
a12       Band 7 3×3 texture  {[0, 13], (13, 50], (50, 108], (108, 145], (145, 255]}
a13       Band 1 5×5 texture  {[0, 37], (37, 122], (122, 149], (149, 195], (195, 255]}
a14       Band 2 5×5 texture  {[0, 46], (46, 83], (83, 158], (158, 254], (254, 255]}
a15       Band 3 5×5 texture  {[0, 39], (39, 130], (130, 165], (165, 253], (253, 255]}
a16       Band 4 5×5 texture  {[0, 91], (91, 181], (181, 224], (224, 240], (240, 255]}
a17       Band 5 5×5 texture  {[0, 5], (5, 34], (34, 92], (92, 171], (171, 255]}
a18       Band 7 5×5 texture  {[0, 7], (7, 31], (31, 100], (100, 151], (151, 255]}

Fig. 3. Number of features (a) and rules (b) with different β values.

5. Conclusion and discussion

VPRS has been proposed for remote sensing classification for the first time in this paper. The VPRS-based remote sensing classification approach involves three major algorithms: (1) class-attribute interdependence maximization (CAIM), (2) VPRS-QuickReduct (VPRS-Q), and (3) VPRS rule extraction (VPRS-RE).

Table 3. Features selected with different β values (features are listed in the order in which they appeared in VPRS-Q; the meaning of a1–a18 is given in Table 2).

β     Features selected
0.00  a1–a18 (all features selected)
0.01  a4, a7, a17, a18, a12, a9, a10, a16, a3, a5, a1, a2, a14, a15, a6, a13
0.02  a4, a18, a11, a12, a9, a16, a1, a15, a6, a13
0.03  a4, a7, a18, a11, a12, a9, a10, a16, a3, a5, a1, a14, a15, a6, a13
0.04  a4, a8, a7, a18, a11, a12, a9, a10, a16, a3, a5, a1, a2, a14, a15, a6, a13
0.05  a4, a8, a7, a17, a18, a12, a9, a10, a16, a3, a5, a15, a6
0.06  a4, a8, a7, a17, a9, a10, a16, a3, a5, a2, a14, a15, a6, a13
0.07  a4, a8, a7, a17, a18, a11, a12, a9, a10, a16, a3, a5, a2, a15, a6, a13
0.08  a15, a4, a7, a8, a17, a11, a12, a9, a10, a5, a6, a2, a3, a1, a16, a13, a14
0.09  a9, a8, a11, a10, a12, a14, a13, a17, a16
0.10  a9, a3, a4, a1, a8, a11, a12, a14, a13, a6, a18, a16
0.11  a9, a7, a3, a4, a11, a10, a13, a18, a17, a16
0.12  a9, a13, a3, a4, a1, a11, a12, a6, a16, a18, a17
0.13  a9, a13, a7, a3, a4, a1, a11, a12, a6, a16, a18, a17
0.14  a9, a13, a7, a3, a4, a1, a12, a6, a16, a18, a17
0.15  a15, a3, a9, a13, a4, a8, a12, a11, a10, a14, a6, a16, a1
0.16  a7, a9, a3, a13, a5, a4, a1, a8, a10, a12, a11, a14, a6, a16, a18
0.17  a7, a4, a8, a11, a16, a17
0.18  a7, a16, a4, a11, a8, a17
0.19  a7, a16, a4, a11, a14, a8, a17
0.20  a7, a13, a16, a4, a11, a17, a8
0.21  a13, a16, a11, a17
0.22  a13, a16, a11, a17
0.23  a13, a7, a18, a12, a10, a16, a6, a5, a4, a17, a8
0.24  a13, a4, a16, a14, a17, a8
0.25  a13, a12, a4, a10, a16, a6, a5, a11, a8, a17

Fig. 4. Classification results of VPRS with different β: (a) overall accuracy for the testing set; (b) overall accuracy for the training set; (c) unrecognizable pixels for the testing set.

Fig. 5. Classification results: (a) classification by β = 0.0, i.e., the original rough set method, and (b) classification by β = 0.07. Legend: Forestland, Meadow, Marsh, Dry farmland, Paddy field, Unrecognizable.

The experiment shows that introducing the inclusion error β improves the original rough set in feature selection and classification performance for remote sensing data, which contains uncertainty or vagueness caused by between-class spectral confusion and within-class spectral variation. For the original rough set, this spectral confusion may cause the indiscernibility relation (Eq. (1)) to be excluded from the lower approximation, making the lower approximation too small, so that the dependency degree γR(Q), based on Eq. (9), approximates or equals 0. In such a case, it is hard to perform feature selection, as shown by Eq. (19). This explains why the experimental results show that the number of features selected (from the original 18) changes with different β values, with a general trend that the larger β is, the fewer features are selected. Furthermore, the frequency with which each feature is selected across different β values may help identify the most important features. Here, the most frequently selected feature is Band 4 (5×5 texture), the NIR band, which appears for all 26 β values. This is consistent with previous studies showing that the Landsat TM NIR band is the most useful remote sensing band for land use/cover classification in densely vegetated areas, due to its strong reflective spectral character (Brian et al., 2005).
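The collapse of the dependency degree under the classical model, and how the inclusion error β prevents it, can be illustrated with a minimal numeric sketch (the class counts below are invented for illustration, not from the paper's data):

```python
from collections import Counter, defaultdict

def dependency(rows, labels, beta):
    # gamma_R(Q): share of pixels lying in equivalence classes whose
    # majority class reaches the 1 - beta inclusion threshold.
    groups = defaultdict(list)
    for row, y in zip(rows, labels):
        groups[tuple(row)].append(y)
    pos = sum(len(ys) for ys in groups.values()
              if max(Counter(ys).values()) / len(ys) >= 1 - beta)
    return pos / len(rows)

# Hypothetical spectral confusion: 95 marsh pixels (1) and 5 meadow pixels (0)
# share identical discretized feature values.
rows, labels = [(7, 2)] * 100, [1] * 95 + [0] * 5
print(dependency(rows, labels, beta=0.00))  # 0.0 -> classical rough set rejects the class
print(dependency(rows, labels, beta=0.07))  # 1.0 -> VPRS keeps it (0.95 >= 0.93)
```

With β = 0 the single impure equivalence class drops out of the positive region entirely and the dependency degree collapses to 0, whereas β = 0.07 tolerates the 5% contamination, matching the mechanism described above.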

Fig. 4 shows that the classification accuracy for the training data reaches 0.9913 with the original rough set classifier, but the accuracy drops greatly for the test data and more pixels become unrecognizable. This means that, without the inclusion error (β), the classifier overfits the training data and its generalization degrades. Introducing β strengthens the generalization ability: as β increases, the number of classification rules decreases (as demonstrated in Fig. 3b), so the generalization ability of each classification rule increases and the number of unrecognizable pixels decreases. This does not mean, however, that a larger β always yields higher classification performance; there is a trade-off between generalization ability and fitting the training data. Various methods have been proposed to prevent overfitting in other classifiers (e.g., methods for decision tree classifiers), and VPRS plays a similar role here in preventing overfitting and improving generalization ability.

The classification results were improved by the VPRS method, to a certain degree, compared to the original rough set method. The VPRS method might be further improved by addressing three issues: first, discretization, which transforms a continuous attribute's values into a finite number of intervals, may lead to information loss; second, the choice of the parameter β is more or less subjective; third, the method may become trapped in local optima during feature selection. Fuzzy set theory and genetic algorithms will be adopted to overcome these problems in our future work. Meanwhile, the classifier's ability to specify quantity and location (Pontius, 2000) should also be assessed in the accuracy assessment. This will likewise be considered in future work, after carefully reclassifying the unrecognizable pixels using the fuzzy distance between rules and pixels.

Acknowledgements

This research was supported in part by the National Natural Science Foundation of China (No. 40871188) and the Knowledge Innovation Program of the Chinese Academy of Sciences (ZCX2-YW-Q10-1-3). We thank Prof. Patricia Dale of the Griffith School of Environment, Griffith University, Australia, for her careful revision and fruitful comments on the manuscript.

References

Ahlqvist, O., 2005. Using uncertain conceptual spaces to translate between land cover categories. International Journal of Geographical Information Science 19 (7), 831–857.

Ahlqvist, O., Keukelaar, J., Oukbir, K., 2000. Rough classification and accuracy assessment. International Journal of Geographical Information Science 14 (5), 475–496.

Ahlqvist, O., Keukelaar, J., Oukbir, K., 2003. Rough and fuzzy geographical data integration. International Journal of Geographical Information Science 17 (3), 223–234.

Brian, L.B., David, P.L., Jiaguo, Q., 2005. Identifying optimal spectral bands from in situ measurements of Great Lakes coastal wetlands using second-derivative analysis. Remote Sensing of Environment 97 (2), 238–248.

Chouchoulas, A., Shen, Q., 2001. Rough set-aided keyword reduction for text categorization. Applied Artificial Intelligence 15 (9), 843–873.

Grzymala-Busse, J.W., 1992. LERS—a system for learning from examples based on rough sets. In: Slowinski, R. (Ed.), Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory. Kluwer Academic Publishers, Dordrecht, pp. 3–18.

Han, J., Cai, Y., Cercone, N., 1993. Data-driven discovery of quantitative rules in relational databases. IEEE Transactions on Knowledge and Data Engineering 5 (1), 29–40.

Kryszkiewicz, M., 1995. Maintenance of reducts in the variable precision rough sets model. In: Proceedings of the 1995 ACM Computer Science Conference, USA, pp. 355–372.

Kryszkiewicz, M., 1999. Rules in incomplete information systems. Information Sciences 113 (3–4), 271–292.

Kurgan, L.A., Cios, K.J., 2004. CAIM discretization algorithm. IEEE Transactions on Knowledge and Data Engineering 16 (2), 145–153.

Kwak, N., Choi, C.H., 2002. Input feature selection for classification problems. IEEE Transactions on Neural Networks 13 (1), 143–159.

Lei, T.C., Wan, S., Chou, T.Y., 2007. The comparison of PCA and discrete rough set for feature extraction of remote sensing image classification—a case study on rice classification. Computational Geosciences 12 (1), 1–14.

Leung, Y., Fung, T., Mi, J.S., Wu, W.Z., 2007. A rough set approach to the discovery of classification rules in spatial data. International Journal of Geographical Information Science 21 (9), 1033–1058.

Masahiro, I., 2005. Several approaches to attribute reduction in variable precision rough set model. In: Modeling Decisions for Artificial Intelligence. Springer, Berlin, pp. 215–226. doi:10.1007/11526018_22.

Nguyen, H.S., Skowron, A., 1995. Quantization of real value attributes, rough set and Boolean reasoning approach. In: Proceedings of the Second Joint Conference on Information Sciences, Wrightsville Beach, NC, USA, pp. 34–37.

Ouyang, Y., Ma, J.W., 2006. Land cover classification based on tolerant rough set. International Journal of Remote Sensing 24 (14), 3041–3047.

Pal, S.K., Mitra, P., 2002. Multispectral image segmentation using the rough-set-initialized EM algorithm. IEEE Transactions on Geoscience and Remote Sensing 40 (11), 2495–2501.

Pawlak, Z., 1982. Rough sets. International Journal of Computer and Information Sciences 11, 341–356.

Pawlak, Z., 1999. Rough classification. International Journal of Human–Computer Studies 51, 369–383.

Pawlak, Z., 2004. Some issues on rough sets. Transactions on Rough Sets 1, 1–58.

Pontius Jr., R.G., 2000. Quantification error versus location error in comparison of categorical maps. Photogrammetric Engineering and Remote Sensing 66 (8), 1011–1016.

Rolka, A.M., Rolka, L., 2004. Variable precision fuzzy rough sets. Transactions on Rough Sets 1, 144–159.

Shen, Q., Chouchoulas, A., 2002. A rough-fuzzy approach for generating classification rules. Pattern Recognition 35 (11), 2425–2438.

Shen, Q., Jensen, R., 2007. Rough sets, their extensions and applications. International Journal of Automation and Computing 4 (3), 217–228.

Shuqing, Z., Xiaodong, N., Bo, K., Wang, Z.M., Jiang, H.X., Yu, H., Zhao, Z.C., Li, X.F., Liu, C.Y., Dale, P., 2009. Identifying wetland change in China's Sanjiang plain using remote sensing. Wetlands 29 (1), 302–313.

Skowron, A., Rauszer, C., 1992. The discernibility matrices and functions in information systems. In: Intelligent Decision Support—Handbook of Applications and Advances of the Rough Sets Theory. Kluwer Academic Publishers, Boston, pp. 331–362.

Yasdi, Y., 1996. Combining rough set learning and neural learning: method to deal with uncertain and imprecise information. Neural Computing 7 (1), 61–84.

Ziarko, W., 1993. Variable precision rough set model. Journal of Computer and System Sciences 46 (1), 39–59.

Ziarko, W., 2001. Probabilistic decision tables in the variable precision rough set model. Computational Intelligence 17 (3), 593–603.