
INFORMS Journal on Computing, Articles in Advance, pp. 1–14. ISSN 1091-9856 (print), ISSN 1526-5528 (online). http://dx.doi.org/10.1287/ijoc.1120.0546

© 2013 INFORMS

The Supervised Normalized Cut Method for Detecting, Classifying, and Identifying Special Nuclear Materials

Yan T. Yang
Department of Industrial Engineering and Operations Research, University of California, Berkeley, Berkeley, California 94720,

[email protected]

Barak Fishbain
Faculty of Civil and Environmental Engineering, Technion - Israel Institute of Technology, Haifa 32000, Israel,

[email protected]

Dorit S. Hochbaum
Department of Industrial Engineering and Operations Research, University of California, Berkeley, Berkeley, California 94720,

[email protected]

Eric B. Norman, Erik Swanberg
Department of Nuclear Engineering, University of California, Berkeley, Berkeley, California 94720

{[email protected], [email protected]}

The detection of illicit nuclear materials is a major tool in preventing and deterring nuclear terrorism. The detection task is extremely difficult because of physical limitations of nuclear radiation detectors, shielding by intervening cargo materials, and the presence of background noise. We aim at enhancing the capabilities of detectors with algorithmic methods specifically tailored for nuclear data. This paper describes a novel graph-theory-based methodology for this task. This research considers for the first time the utilization of supervised normalized cut (SNC) for data mining and classification of measurements obtained from plastic scintillation detectors, which are of particularly low resolution. Specifically, the situation considered here is when both energy spectra and the time dependence of such data are acquired.

We present here a computational study comparing the supervised normalized cut method with alternative classification methods based on support vector machine (SVM), specialized feature-reducing SVMs (i.e., 1-norm SVM, recursive feature elimination SVM, and Newton linear program SVM), and linear discriminant analysis (LDA). The study evaluates the performance of the suggested method in binary and multiple classification problems of nuclear data. The results demonstrate that the new approach is on par with or superior to SVM (with or without dimension or feature reduction) and LDA with principal components analysis as preprocessing in terms of accuracy, and much better in computational complexity. For binary and multiple classifications, the SNC method is more accurate, more robust, and computationally more efficient by a factor of 2–80 than the SVM-based and LDA methods.

Key words: nuclear material detection; supervised learning; normalized cut; support vector machine; multiclassification

History: Accepted by Karen Aardal, Area Editor for Design and Analysis of Algorithms; received March 2011; revised July 2011, March 2012; accepted June 2012. Published online in Articles in Advance.

1. Introduction
The detection of illicit nuclear materials is of great interest in the efforts to deter and prevent nuclear terrorism. Today's typical approaches to special nuclear material (SNM) detection primarily employ fixed inspection portals, installed at national borders, seaports, and traffic and railway checkpoints within the national interior. Although one can detect the presence of radioactive material using simple gamma-ray counting equipment, such as a Geiger counter, this creates a great deal of false-positive

errors, as some legitimate cargoes such as bananas, fertilizers, cat litter, tiles and ceramics (containing potassium, 40K), smoke detectors (with americium, 241Am), and colored glass (containing natural uranium) may also generate high radioactivity levels. Therefore, it is important to identify not only the presence of a radioactive material, but also its identity. One way of identifying the source is by examining the radiation's spectrum, the number of gamma rays detected at each energy interval, and finding the best match for that spectrum in a set of spectra


Published online ahead of print April 11, 2013


obtained from several known SNMs. With low-resolution detectors this task is challenging, even for human experts. It is therefore important to enhance the capabilities for identifying the nuclear material based on the radiation spectrum.

In comparing a given spectrum with those of a set of known SNMs, the latter are used as a so-called training set. As such, the illicit SNM detection problem can be cast as a machine learning classification problem. The goal is to classify the target material examined by its acquired spectrum, or a set of spectra gathered by various sensors or in different time intervals, so as to generate information about the material and discern, in a relatively quick manner, whether it poses a threat.

In real-life scenarios of shipping cargo screening, training sets are usually used in several screening methods based on passive radiation counting. In these cases, the training set consists of spectra acquired by detectors when the content of the examined cargo is known. Thus, different containers known to contain a specific SNM, as well as containers with benign substances (such as bananas, colored glass, or no radioactive material at all), are placed in front of the detector. For each of these materials a set of spectra is acquired and labeled accordingly. Upon the arrival of a new container with unknown content, a new set of spectra (with unknown labels) is acquired. The purpose of the classification is to group these unknown spectra with the best-matched samples from the training set.

There are two methods of detection: passive and active interrogation. Passive interrogation measures a material's emitted radiation. As such, passive interrogation is limited by the rate and energy of natural radioactivities and their attenuation through shielding. Because of these shortcomings, the active interrogation alternative was proposed for the nuclear material detection task (Bertozzi and Ledoux 2005, Norman et al. 2004), especially for use on cargo at ports of entry. In active interrogation, the target is irradiated by bremsstrahlung X-rays (Bertozzi and Ledoux 2005) or highly penetrating neutrons (Norman et al. 2004) to produce spectra that are characteristic of each SNM. Still, even with active interrogation, the identification of nuclear materials by their acquired spectra is difficult because of physical limitations of nuclear radiation detectors, the presence of background noise, and intervening shielding materials.

Different types of detectors deliver spectra with different merits. High-purity germanium (HPGe) gamma detectors have excellent energy resolution. However, they are expensive and require cryogenic cooling, making field use cumbersome. Sodium iodide (NaI) detectors are less expensive and do not require cooling, but the quality of the delivered spectra is lower.

Plastic scintillators are detectors that require neither cooling nor high maintenance, and as such are more practical for nuclear field detection applications. The trade-off is that these detectors produce low-resolution spectra that are very challenging to analyze. Even for human experts, the differentiation between the spectra produced by plutonium and those produced by uranium is subtle. Data mining and pattern recognition techniques tailored for nuclear data have the potential of enhancing the ability to differentiate between different SNMs and of making up for the hardware shortcomings. Several such methods reported in the literature include artificial neural networks (Kangas et al. 2008), naive Bayesian framework classification (Carpenter et al. 2010), support vector machine (Gentile 2010), and graph-theory-based techniques (Mihailescu et al. 2010). However, all these tools were used on measurements recorded by high-resolution HPGe detectors. Other than the aforementioned references, we find no systematic efforts in the literature to construct a robust automated technique to identify nuclear threats. In addition to problems such as the effect of intervening cargo on signal distortion, one reason for this is the lack of a comprehensive data set of SNM spectral signatures.

Swanberg et al. (2009) have recently acquired spectral data from a plastic scintillator detector by active interrogation. They produced the only data set currently available that presents spectra of SNMs acquired by a low-resolution plastic detector. This data set consists of spectra of plutonium, uranium, latite (a rock material), and blank. The challenging task within the scope of this work is to distinguish between plutonium and uranium. Recent studies (Marrs et al. 2008) demonstrated that a classification of plutonium's versus uranium's spectra can be accomplished when high-resolution HPGe detectors are employed. Here we show that this classification task can be accomplished by employing appropriate data-mining techniques on low-resolution spectra acquired by plastic detectors.

The data sets obtained by Swanberg et al. (2009) are the basis for our computational study, which presents, for the first time, a graph-theory-based method for classifying low-resolution spectra. This provides preliminary evidence that the use of inexpensive and low-maintenance plastic detectors with data-mining techniques for the purpose of detecting illicit nuclear material is practical. Furthermore, our results appear promising enough to encourage the testing of the technique we propose for this task, the supervised normalized cut, in other data mining and classification contexts as well.

1.1. Taxonomy of Machine Learning Problems
Machine learning techniques can be categorized either as supervised (inductive) learning, i.e., inferring a



function from supervised training data (Caruana and Niculescu-Mizil 2006), or as unsupervised learning, where the learner is given only unlabeled examples (Duda et al. 2001).

In supervised learning, one tries to infer a functional relation y = f(x) from a training set (x_1, y_1), ..., (x_m, y_m), where the x's are feature vectors of data points, and the y's are the known labels assigned to the vectors of the training set. In this study, the feature vectors represent the spectra from the dynamic decay of fission products of 239Pu (plutonium) and 235U (uranium), the radioactivities induced in latite, and the radioactivities induced in background materials; the y's are plutonium, uranium, latite, and blank. We seek a classifier f that can accurately predict the labels y_{m+1}, y_{m+2}, ... for future input vectors x_{m+1}, x_{m+2}, .... This paradigm is common to many other problems that appear in different areas, such as computer vision (Carneiro et al. 2007), geostatistics (Dowd and Pardo-Iguzquiza 2005), credit scoring (West 2000), and biometric identification (Zhang et al. 2008).

Unsupervised learning groups data points into different clusters, based on some measure of sameness, without using prior training data. The general procedure is to map all the data points into different clusters such that some criteria are optimized. The supervised classification method proposed here relies on the conversion of an unsupervised technique to a supervised context.

Because the number of illicit radioactive substances is finite and well defined (International Atomic Energy Agency 2007), the respective multiclassification problem is to classify into K classes, where K is known in advance. Here we solve the multiclassification problem by repeated calls to a binary classification subroutine. This is a common practice in many multiclassification techniques (Crammer and Singer 2002, Goldschmidt and Hochbaum 1994, Kotsiantis 2007).

Examples of criteria for binary clustering, also referred to as bipartitioning, are (i) minimum cut (Ford and Fulkerson 1956), (ii) ratio regions (presented by Cox et al. 1996), (iii) the normalized cut (suggested by Shi and Malik 2000), and (iv) a variant of normalized cut, NC′ (studied by Hochbaum 2010b).

We choose to implement the NC′ criterion because it provides better quality solutions for image clustering and pattern recognition applications than other, commonly used techniques (Hochbaum 2010b, Sharon et al. 2006). Furthermore, NC′ was shown to be efficiently solvable (Hochbaum 2010b), and as such it is a good candidate for solving clustering problems in short running times.

Because NC′ does not require any prior information on the data, it is an unsupervised technique. The method described here, called the supervised normalized cut (SNC), utilizes the NC′ criterion in a supervised context. The method incorporates training data and is used for the purpose of detecting, classifying, and identifying spectra acquired by low-resolution radiation detectors.

The SNC method is compared here with two traditional data mining techniques: support vector machine (SVM, in three variants) and linear discriminant analysis (LDA). The results of this study suggest that SNC is preferable to SVM (with or without feature reduction) for the task of nuclear material detection and might be better suited than these techniques for other classification problems.

The paper is organized as follows: Section 2 describes the SNC method. Section 3 describes how the data were generated and explores different ways to present the acquired data. Section 4 presents the classification results, both in terms of accuracy and running times. Section 5 concludes the paper.

2. The Variant of Normalized Cut and Supervised Normalized Cut

2.1. Binary Classification
The binary classification problem is formalized as a graph bipartitioning problem. An undirected (complete) graph G = (V, E) is constructed, where each node v ∈ V corresponds to a data point; in our case, a feature vector (see §3.2) associated with a set of spectra acquired from a material sample. There is an edge in the graph for each pair of nodes i and j, associated with a weight w_ij that corresponds to the similarity between the feature vectors associated with nodes i and j. Higher similarity is associated with higher weights.

The following notation facilitates the presentation of the bipartitioning optimization problem: A bipartition of the graph is called a cut, (S, S̄) = {[i, j] | i ∈ S, j ∈ S̄}, where S̄ = V \ S. We denote the capacity of a cut (S, S̄) as

C(S, S̄) = Σ_{i∈S, j∈S̄, [i,j]∈E} w_ij.   (1a)

The capacity of a set S ⊂ V is denoted by

C(S, S) = Σ_{i,j∈S, [i,j]∈E} w_ij.   (1b)

The goal of NC′ is to find a nonempty set S* ⊂ V so that the capacity of the cut (the similarity between S* and its complement), divided by the capacity of the set S* (the similarity of the feature vectors within the set), is minimized. The set S* is called a source set, and its complement is called a sink set.

The mathematical objective of this goal can be written as (Hochbaum 2010b, Sharon et al. 2006)

NC′(S*) = min_{S⊂V} C(S, S̄)/C(S, S).   (2)



As noted before, optimization problem (2) was shown to be solvable in polynomial time (Hochbaum 2010b) with a minimum cut procedure on an associated graph.

The NC′ solution procedure requires assigning, in advance, a single node to be included in the source set S (or the sink set S̄); see Hochbaum (2010b) for details. This node is referred to as a seed node. Here, we exploit the seed node mechanism to force a priori the training data to be in either the source S or the sink S̄, based on the material from which they were acquired. Specifically, the input consists of three sets: two sets of nodes, A and B, which are associated with feature vectors acquired from two different known materials, M1 and M2, and a third set I corresponding to feature vectors acquired from an unknown material or materials. The goal of the binary classification problem is to associate each feature vector in I with either M1 or M2.

The input to the classification problem is the complete graph G = (V, E) defined on the set of objects V = A ∪ B ∪ I and the similarity weights associated with each pair of nodes (edge) [i, j] ∈ E. Two nodes s and t are added to the graph, with an arc of infinite weight from s to each node in A and from each node in B to t. On this graph we seek a partition that minimizes the NC′ criterion so that s ∈ S, t ∈ S̄. The nodes in I that end up in S are classified as A, i.e., acquired from material M1; and the nodes in I that end up in S̄ are classified as B, thus acquired from M2. This process is illustrated in Figure 1.

The adjustment of NC′ to a supervised context, as described, is a new supervised classification methodology that takes advantage of the solvability of NC′ and broadens the application of NC′ to a wider class of problems.

As previously mentioned, the efficiency of the NC′ algorithm was established in Hochbaum (2010b), where it was shown that NC′ is solvable in the running time of a minimum s,t-cut problem, which is strongly polynomial and combinatorial.

Figure 1   (a) The input with the training sets A (black) and B (light gray) and the unclassified nodes C (gray); (b) the solution: the two sets are separated by a minimum cut.
Note. The set on the left, consisting of the nodes classified as A, forms the set S; the set of B nodes on the right is S̄. The similarity within S and the dissimilarity between the two sets are both high.

In the context of SNC, the only additional step before performing the NC′ solution procedure is to assign the training data to the source or the sink set, which does not affect the time complexity. Thus SNC is efficient, running in polynomial time.
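To make the construction concrete, here is a minimal Python sketch of SNC binary classification. It is our illustration, not the authors' implementation: the paper builds the graph in MATLAB and solves NC′ with Hochbaum's parametric pseudoflow (HPF) code (see §4.3), whereas this sketch answers, for a fixed λ, whether some seed-respecting S has C(S, S̄) − λC(S, S) < 0 via a plain minimum s,t-cut (networkx) and drives λ down to the optimal ratio with a Dinkelbach-style iteration. The similarity weights follow the formula given in §4.3; all names are ours, and both training sets plus at least one unlabeled point are assumed to be nonempty.

```python
import networkx as nx
import numpy as np

def snc_classify(A, B, I, eps=1e-6, tol=1e-9):
    """A, B: lists of training feature vectors for materials M1 and M2.
    I: list of unlabeled feature vectors.
    Returns a label for each vector in I: 0 -> M1, 1 -> M2."""
    vecs = np.asarray(list(A) + list(B) + list(I), dtype=float)
    n, nA, nB = len(vecs), len(A), len(B)
    # Similarity weights w_ij = 1 / (||v_i - v_j||^2 + eps)  (Section 4.3).
    sq = ((vecs[:, None, :] - vecs[None, :, :]) ** 2).sum(-1)
    w = 1.0 / (sq + eps)
    np.fill_diagonal(w, 0.0)
    deg, D = w.sum(axis=1), w.sum()

    def ratio(S):
        m = np.zeros(n, dtype=bool)
        m[list(S)] = True
        return w[m][:, ~m].sum() / (w[m][:, m].sum() / 2.0)

    def best_set(lam):
        # Minimizing C(S,S-bar) - lam*C(S,S) over S with A in S, B in S-bar
        # reduces, via 2*C(S,S) = sum_{i in S} d_i - C(S,S-bar), to this cut.
        G = nx.DiGraph()
        for i in range(n):
            G.add_edge("s", i, capacity=(lam / 2.0) * deg[i])
            for j in range(i + 1, n):
                c = (1.0 + lam / 2.0) * w[i, j]
                G.add_edge(i, j, capacity=c)
                G.add_edge(j, i, capacity=c)
        for i in range(nA):                     # seed arcs force A into S ...
            G.add_edge("s", i, capacity=float("inf"))
        for i in range(nA, nA + nB):            # ... and B into the sink set
            G.add_edge(i, "t", capacity=float("inf"))
        cut_value, (src_side, _) = nx.minimum_cut(G, "s", "t")
        return cut_value - (lam / 2.0) * D, src_side - {"s"}

    S = set(range(n)) - set(range(nA, nA + nB))  # feasible start: V minus B
    while True:                                  # Dinkelbach-style iteration
        val, S_new = best_set(ratio(S))
        if val >= -tol:
            break                                # ratio(S) is (near) optimal
        S = S_new
    return [0 if nA + nB + k in S else 1 for k in range(len(I))]
```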

2.2. Multiclassification
The binary classification with NC′ is used here as a subroutine for solving multiclassification problems involving three or more different classes. Multiclassification is more realistic in the context of nuclear threat detection, as it is necessary to identify, e.g., the contents of cargo, as one of an array of possible materials.

For multiclassification we utilize a scheme generally referred to as one-versus-all decomposition (e.g., Duan and Keerthi 2005, Rifkin and Klautau 2004). For a problem with K different classes, we create K different binary problems. The kth binary problem is to classify the unknown nodes I into two classes: material Mk, or not-Mk, denoted E (for Else). Each node is classified by K binary classifiers. The label of the node is determined to be class k if it was classified as Mk. If the node was classified as a material more than once, all possible materials are reported. If the node is classified as E for all k, then its label is undecided.

In the case where all the nodes in I are acquired from the same container, voting can be used to determine the final grouping of all nodes in I. In this case, the label of a node is determined to be the class k that has the highest score. If there is more than one class with the highest score, the tie is broken arbitrarily or all tied materials are reported as possible classifications. If the highest score is zero, then the label of the node is undecided.
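The decomposition and voting just described can be sketched as follows (again our illustration; binary_classify stands for any binary subroutine with the interface of the SNC sketch in §2.1):

```python
from collections import Counter

def one_vs_all(train_sets, I, binary_classify):
    """train_sets: dict mapping class label k -> list of training vectors.
    binary_classify(A, B, I): returns 0/1 per unknown vector (0 = A side).
    Returns, per unknown vector, the list of classes that claimed it;
    an empty list means the point is undecided."""
    claims = [[] for _ in I]
    for k, Ak in train_sets.items():
        # Class k versus "Else": pool the training data of all other classes.
        Ek = [v for kk, vs in train_sets.items() if kk != k for v in vs]
        for idx, lab in enumerate(binary_classify(Ak, Ek, I)):
            if lab == 0:                 # point classified as material M_k
                claims[idx].append(k)
    return claims

def vote(claims):
    """When all points come from one container: report the class(es) with
    the highest score, or 'undecided' if no class claimed any point."""
    score = Counter(k for c in claims for k in c)
    if not score:
        return ["undecided"]
    best = max(score.values())
    return [k for k, v in score.items() if v == best]
```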

Figure 2 demonstrates multiclassification with four possible materials or classes A, B, C, and D. Three unclassified data points need to be classified. Training data are provided for each class as shown in Figure 2(i). Figures 2(a)–(d) are the binary classifications for each class A–D.



Figure 2   Multiclassification: (i) A, B, C, and D are four distinct materials or classes. (a)–(d) are the decomposed binary classifications for A–D; each classification gives a label to the unknown points. (r) The final multiclassification combines all the labels for a given unknown point.

Figure 2(r) shows the combined results of all subproblems; two of the unknown points are successfully classified to A and B. The middle unclassified node is undecided, because it has E for all its labels.

3. Data and Experimental Setup
The measurement data for the nuclear classification problem were acquired in a controlled environment with plastic detectors. We use here a data set of active interrogation of plutonium and uranium made available by Swanberg et al. (2009). In this experiment, a sample of either blank, 0.19 grams of 235U, 0.568 grams of 239Pu, or 3 grams of latite (an igneous rock material) was placed in a cave and irradiated for 30 seconds with neutrons generated by the 88-inch cyclotron at Lawrence Berkeley Laboratory (Kelly 1962). When irradiated with neutrons, materials may become radioactive or undergo nuclear fission. Activation products and fission products from different materials have different characteristic gamma rays and decay times. These are the characteristics that we use as the pattern distinguishing a specific nuclear material from others.

The target was exposed to the detectors for a total of 25 seconds, and each 2.5-second interval yields a cumulative energy spectrum measurement for that interval. The detector system measured energies in the range from approximately 100 keV to 14 MeV using 1,024 channels.

The data set used in our experiments consists of the measurements reported in Swanberg et al. (2009) and additional measurements that were made available by the same authors. The additional data include 10 measurements for each run. In total, 275 runs were conducted: 20 with blank; 22 with latite; 92 with uranium (235U); and 140 with plutonium (239Pu), resulting in a total of 2,750 acquired spectra.

3.1. The Detector Live Time
During the data acquisition, the operator set the detector to a nominal run time of 2.5 seconds. The actual acquisition time of the detector, however, depends on the particularities of each run. Specifically, when the gamma rays arrive at high frequency, the detector does not process all of them because of hardware limitations. Therefore the length of the actual run time, the so-called live time, is shorter than the nominal run time, or real time. To correct this inherent hardware bias, we adjust for the live times of the detector in each run by rescaling each gamma-ray count by the instrument's live time produced with the data. This means that the gamma-ray counts in each spectrum are scaled (divided) by the live time associated with that spectrum measurement. The results presented here use such scaled data.

3.2. Feature Vectors and Data Analysis
Each target was placed in front of the detector for a run of 25 seconds. The spectrum, the energy histogram of the gamma rays received by the detector, was recorded every 2.5 seconds. Each entry in the histogram corresponds to a different energy band (channel). Hence, 10 spectral measurements taken at consecutive periods of time were produced in every run.

The obtained data can be regarded as a two-dimensional array composed of the gamma-ray counts for each energy channel in each of the 10 given measurements.



Figure 3   Feature vectors produced by column stacking (CS), spectral difference (SD), column stacking and spectral difference (CSnSD), and normalization of the data (N). For CS, SD, and N, there are 10,240 indices in the vectors; for CSnSD, there are 19,456 indices.

For 1,024 energy channels and 10 consecutive measurements, each such sample is an array with 1,024 columns and 10 rows. Each row vector is denoted by s_i, for i = 1, ..., 10, where s_1 corresponds to the first 2.5-second interval and s_i corresponds to the ith 2.5-second interval. To convert this 2D array into a feature vector, four different methods are considered (a code sketch follows the list):

1. Column stacking (CS). The vectors s_1, ..., s_10 are concatenated to a vector of length 10,240. The feature vector produced is (s_1, s_2, ..., s_10), as in Figure 3 (top left).

2. Spectral difference (SD). The purpose of this method is to emphasize the radioactive decay captured in the temporal domain. To do that, we concatenate the first spectrum vector, followed by the difference between the second spectrum vector and the first spectrum vector; in general, the ith entry is the difference between the ith vector and the (i−1)th vector. The feature vector produced is then (s_1, s_2 − s_1, s_3 − s_2, ..., s_10 − s_9), as in Figure 3 (top right).

3. Column stacking and spectral difference (CSnSD). Here we join (concatenate) the feature vectors resulting from CS and SD into a single feature vector. The feature vector produced is (s_1, s_2, ..., s_10, s_2 − s_1, s_3 − s_2, ..., s_10 − s_9), as in Figure 3 (bottom left).

4. Normalization (N). To remove the dependence on absolute counts of the spectra, which grow with the sample quantity and weight, while preserving the general patterns, we normalize the CS feature vector by dividing its entries by its largest entry. Let s_max = max_{i=1,...,10; j=1,...,1024} (s_i)_j; then the N feature vector is (s_1/s_max, s_2/s_max, ..., s_10/s_max), as in Figure 3 (bottom right).
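The four constructions can be summarized in a short sketch (ours; the paper's own code is not published), assuming a sample is a 10 × 1,024 array of live-time-scaled counts:

```python
import numpy as np

def make_feature_vector(sample, method="CS"):
    """sample: array of shape (10, 1024); row i is the spectrum of the
    (i+1)-th 2.5-second interval."""
    cs = sample.reshape(-1)                           # (s1,...,s10), 10,240 long
    diffs = (sample[1:] - sample[:-1]).reshape(-1)    # s2-s1, ..., s10-s9
    if method == "CS":
        return cs
    if method == "SD":                    # length 1,024 + 9,216 = 10,240
        return np.concatenate([sample[0], diffs])
    if method == "CSnSD":                 # length 10,240 + 9,216 = 19,456
        return np.concatenate([cs, diffs])
    if method == "N":                     # scale CS by its largest entry
        return cs / cs.max()
    raise ValueError(f"unknown method {method!r}")
```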

The intuition behind using the previous four methods is that they are simple and are a first attempt to incorporate both the spatial and the temporal dimensions of our data set. Using these feature vectors serves two goals: (i) converting the 2D data to 1D, and (ii) capturing the local temporal changes of spectra. To achieve these two goals, one could apply more sophisticated signal processing methods, such



as the pyramid transform, the discrete wavelet transform, or the discrete cosine transform. A possible advantage of using these signal processing transformations is that they may result in better-posed covariance matrices of the data, which can potentially improve the classification accuracy of principal components analysis (PCA) and LDA. However, these signal transformations require more computation time than the simple methods utilized here, and the trade-off between the added computational complexity and better accuracy should be investigated. Within the scope of this paper we have utilized only the aforementioned feature-vector construction methods, which are simple and computationally efficient.

4. Results
In this section, results concerning different aspects of our method are presented: Section 4.1 establishes standards for measuring the quality of a classification technique, to compare across different methods. Section 4.2 describes the classification methods compared with SNC and addresses practical aspects of classification methods such as feature selection. In §4.3, classification methods, including various versions of SVM, LDA, and SNC, are evaluated on the Swanberg et al. (2009) nuclear data. Section 4.4 gives a more detailed account of the results of SNC. Section 4.5 compares the methods in terms of running times. Multiclassification results are shown in §4.6. Finally, §4.7 presents the influence of the different constructions of feature vectors on the running times of the different algorithms.

As described in §1.1, a classification procedure consists of two stages: a "training" stage, where one tries to infer a functional relation (i.e., y = f(x)) between a training set (x_1, y_1), ..., (x_m, y_m) and its known labels, the y's; and a "testing" phase, in which the labels y_{m+1}, y_{m+2}, ... of unlabeled input vectors x_{m+1}, x_{m+2}, ... are estimated. The outcome of the training phase is a classifier that is used for labeling new data points in the testing phase. For SVM and PCA the testing is very quick. We produce an analogous testing phase for SNC. The output of the training phase of SNC is a bipartition of the training and other data points. When a new data point becomes available, the testing phase assigns that point to the side of the bipartition that increases the objective value of the NC′ criterion (C(S, S̄)/C(S, S), the objective in Equation (2)) the least. This process involves a comparison of a few values, and it is at least as fast as the testing phase for PCA and SVM.

Our study establishes that SNC is faster than SVM and PCA in the training phase; all three methods are almost instantaneous, and thus on par with each other, in the testing phase. Because SNC is also at least as accurate as the other methods, it should be the preferred technique. The speed of the training phase with SNC makes recalibration of the algorithm under changing conditions easy and fast, so recalibration can be done more frequently than with the other techniques. With SNC it is therefore possible to maintain a more accurate and up-to-date classification model.

4.1. Quality of Classification
In the machine learning community, the quality of a classification method is generally measured by simulation, applying the method to a known data set. Here, we use an extended version of the data reported in Swanberg et al. (2009), consisting of 275 data points in the form of feature vectors, each labeled with its underlying material: blank (20 samples), latite (22 samples), 235U (92 samples), or 239Pu (140 samples).

To study the performance of the method we apply random subsampling, which divides the data set into two subsets: training and testing. The training data are used to construct a classifier, and the labels of the testing data are hidden, i.e., we pretend that the labels are unknown. The classifier provides predicted labels that are then compared to the true labels. The accuracy of a classifier is the fraction of correct predictions across all testing data. For statistical significance purposes, this subsampling and classification are repeated 100 times; each time the training and testing sets are resampled. The accuracy of a classification method on a certain data set can then be defined as the mean accuracy over these 100 runs. In addition, the standard deviation and 95% confidence interval of these runs can be calculated; these measure the consistency of a classification method on a certain data set. The lower the standard deviation and confidence interval, the more consistent the method.

Random subsampling can involve different training-testing ratios, e.g., 40%–60%. A 40%–60% ratio means that 40% of the total data are used for training and the other 60% are used for testing. In our experiments we used the following ratios: 50%–50%, 40%–60%, 30%–70%, 20%–80%, and 10%–90%. As the size of the training data decreases, less information is provided to construct the corresponding classifier, and thus the accuracy of the classifier decreases. A more robust classification method is one that is less affected by the decreasing size of the training data.
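The protocol just described amounts to the following sketch (our rendering; the paper does not spell out the exact confidence-interval formula, so the 1.96·σ/√runs half-width below is our assumption):

```python
import numpy as np

def random_subsampling(classifier, X, y, train_frac=0.5, runs=100, seed=0):
    """classifier(X_tr, y_tr, X_te) -> predicted labels for X_te.
    Returns mean accuracy, standard deviation, and a 95% CI half-width
    over `runs` independent train/test splits."""
    rng = np.random.default_rng(seed)
    accs = np.empty(runs)
    for r in range(runs):
        perm = rng.permutation(len(y))
        cut = int(train_frac * len(y))
        tr, te = perm[:cut], perm[cut:]
        preds = np.asarray(classifier(X[tr], y[tr], X[te]))
        accs[r] = np.mean(preds == y[te])
    half_width = 1.96 * accs.std(ddof=1) / np.sqrt(runs)
    return accs.mean(), accs.std(ddof=1), half_width
```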

4.2. Classification Methods

4.2.1. Feature Selection and Data Dimension Reduction. Feature selection approaches try to find a subset of the original variables (also called features or attributes) that can best explain the data. The underlying idea is that the other variables do not contribute to the understanding of the data and introduce noise. Therefore classification done in the reduced space may be more accurate than in the original space.



PCA is a routinely used tool for reducing the dimension of the data space. The underlying paradigm of PCA is that an orthogonal linear transformation is performed on the data to a new coordinate system with the properties that the first coordinate, the so-called first principal component, contains the greatest variance by any projection of the data; the second coordinate contains the second greatest variance, and so on. PCA essentially rotates the data points around their mean and moves as much variance as possible into the first few dimensions. The remaining dimensions contain negligible amounts of variance and are omitted, with a relatively small loss of information. Thus PCA is often used for dimensionality reduction.

To evaluate the principal components, one has to compute the covariance matrix for all acquired spectra. Because the feature vectors used here contain more than 11,200 coefficients each (see §3.2), finding this covariance matrix and its eigenvectors is computationally intractable. When evaluating smaller feature vectors (with 5,000 coefficients), the SNC method is 150 times faster than PCA. In addition, the gain in accuracy that results from applying PCA before SVM is less than 1%. Therefore, for the task at hand, the use of PCA is unlikely to improve the overall quality of the detection while it significantly slows down the detection speed.

4.2.2. Support Vector Machine. SVM is a widely accepted tool for classification in many fields, including machine learning (Fung and Mangasarian 2005), communication and mobile computing (Wang et al. 2010), biology (Ding et al. 2007), economics (Hua et al. 2007), nuclear applications (Carpenter et al. 2010), and X-ray spectrometry (Luo 2006).

For using SVM optimally, one has to choose the type of kernel and its associated parameters. To this end we compare SNC to SVM with polynomial and radial basis function (RBF) kernels. The polynomial kernel's parameter is the degree of the polynomial d, and the RBF kernel uses an additional parameter γ. Another parameter that applies to both kernels is the soft margin penalty C (see Burges 1998, Cristianni and Shawe-Taylor 2000, for details).

The selection of the SVM's parameters follows an exhaustive search. The work of Hastie et al. (2004) provides some guidance on how to search for optimal parameters of SVM: it has been shown that the range of SVM solutions with respect to the parameters does not grow beyond a certain value, and that solutions degenerate quite poorly for very large and very small values of C. We thus tune the parameters of SVM by setting them on a grid {2^-7, 2^-6, 2^-5, ..., 2^6, 2^7}, the searching range used in Mangasarian and Wild (2007, 2008). Note that the SVM parameter tuning times are not included in the computation times that are used to compare the different methods.

The SVM classification is then performed for all possible parameter combinations. The parameter set that produces the best classification results is the one used for the evaluation. This tuning procedure is different from using a separate validation set or cross validation, and gives the best possible accuracy of a particular SVM classifier (Burges 1998, Cristianni and Shawe-Taylor 2000). The implementation package used is LIBSVM (Chang and Lin 2001). It is important to note that although these tuning times are not included in our run-time comparison, the tuning process is time consuming.
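For illustration, the exhaustive search might look as follows. This sketch uses scikit-learn's SVC (itself a LIBSVM wrapper) rather than the authors' LIBSVM setup, assumes X_train, y_train, X_test, y_test hold one subsampling split, and, like the paper's protocol, selects the parameters that maximize accuracy on the evaluation data itself:

```python
from itertools import product
from sklearn.svm import SVC

grid = [2.0 ** k for k in range(-7, 8)]          # {2^-7, ..., 2^7}
best_acc, best_params = 0.0, None
for C, gamma in product(grid, grid):
    clf = SVC(C=C, kernel="rbf", gamma=gamma).fit(X_train, y_train)
    acc = clf.score(X_test, y_test)              # accuracy on the test split
    if acc > best_acc:
        best_acc, best_params = acc, (C, gamma)
```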

SVM-based methods that incorporate the feature selection process as an inherent part, rather than running a generic PCA as a preprocessing stage, can be found in the literature. These SVM procedures produce not only the classifier (as the regular SVM procedure does), but also the subset of features used for constructing this classifier. These include (1) a feature-reducing linear kernel 1-norm SVM (SVM-1) (Zhu et al. 2003); (2) a recursive feature elimination SVM (SVM-RFE) (Guyon et al. 2002); and (3) a feature-reducing Newton method for LPSVM (SVM-NLP) (Fung and Mangasarian 2004). These methods have been shown to improve SVM's prediction power by removing the features of least relevance. SVM-1 is implemented using the MATLAB function svmtrain with the 1-norm option. SVM-RFE is run with modified MATLAB code from Resson et al. (2009) (the original code has extra functionalities that are not used here). LPSVM was downloaded from Fung and Mangasarian (2002). All SVM-related parameters were tuned according to the suggestion in Fung and Mangasarian (2005, §4.2.2). The SVM-RFE method requires an additional parameter, the number of remaining features; we tune this parameter by exhaustive search in the space {2^1, 2^2, 2^3, ..., 2^13}.

4.2.3. Linear Discriminant Analysis. LDA, like PCA, aims at finding a spanning space for the data. PCA constructs the spanning space (i.e., the principal components) so that the first principal component has the largest possible variance (i.e., accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it be orthogonal to (i.e., uncorrelated with) the preceding components. LDA receives as input a data set and a label for each data point in the set. The goal of LDA is to find a spanning space that accounts for as much as possible of the variability between classes (or labels), rather than between all data points as in PCA (Martinez and Kak 2001, McLachlan 2004). The resulting linear combination can also be used as a linear classifier. However, because the dimensionality of our data (the combined number of energy channels) is far larger than the number of data points, Geisinger (2010) suggests, in



cases like this, to perform PCA first before using LDA for classification; we take this approach and combine PCA with LDA.

To compare PCA-LDA, the SVMs, and SNC, we use the four types of feature vectors CS, SD, CSnSD, and N described in §3.2. For PCA-LDA, the reduced version of PCA, mentioned earlier for reasons of computational cost, is used: PCA is performed first by centering the data points and adding the first few important principal components until the total variance accounted for is just above 80%. Then standard LDA is used for training the classifier and classifying the test data. One note here is that the data are not scaled before PCA is applied; this is because scaling the data, in many instances, makes the resulting covariance matrix singular for the training data from our data set, so LDA could not proceed. Therefore, the decision was made to use only centered data when performing PCA before LDA.
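A compact rendering of this pipeline (ours, in Python with scikit-learn; the paper's implementation is in MATLAB). PCA with a fractional n_components keeps the smallest number of leading components whose explained variance first exceeds the given fraction, matching the just-above-80% rule; X_train, y_train, X_test are assumed to hold one subsampling split:

```python
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

# PCA centers the data (no scaling) and keeps components up to just above
# 80% of the total variance; LDA is then trained in the reduced space.
pca_lda = make_pipeline(
    PCA(n_components=0.8, svd_solver="full"),
    LinearDiscriminantAnalysis(),
)
pca_lda.fit(X_train, y_train)
predictions = pca_lda.predict(X_test)
```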

4.3. Binary Classification
The performance of the binary classification method, SNC, is tested here. It is compared to two common classification methods, SVM and LDA, as well as to the three specialized feature-reducing SVMs: 1-norm SVM (SVM-1), recursive feature elimination SVM (SVM-RFE), and Newton method LPSVM (SVM-NLP).

To solve SNC, the graph construction is written in MATLAB. The resulting minimum cut problem is solved with Hochbaum's PseudoFlow algorithm (HPF), the implementation of which was downloaded from Hochbaum (2010a). The similarity between two feature vectors v_i and v_j is quantified by

w_ij = 1/(‖v_i − v_j‖² + ε),  for 0 < ε ≪ 1.

Table 1 displays the accuracy and precision of the supervised normalized cut for varying training-testing ratios with different types of feature vectors. For the 50%–50% ratio, all four types of feature vectors produce similar results. These results for the different feature vectors are statistically identical, as confirmed by an ANOVA test failing to reject the null hypothesis at the 95% significance level for every pair of vectors. However, as the training proportion decreases, the CS feature vector gives the highest accuracy. This behavior also characterizes the standard deviation and the 95% confidence interval, which are also best for CS.

Table 2 details the corresponding results for SVM and the specialized SVMs when using either RBF or polynomial kernels. Each of these kernels takes user-defined parameters, including the soft margin penalty C.

Table 1   SNC Runs for the Feature Vectors {CS, SD, CSnSD, N} with Five Training-Testing Ratios

Ratio     Statistic   CS (%)   SD (%)   CSnSD (%)   N (%)
50%–50%   Mean         99.62    99.61    99.52      99.17
          Std. dev.     0.43     0.43     0.43       2.83
          95% CI        0.08     0.08     0.08       0.55
40%–60%   Mean         99.25    92.02    99.62      95.73
          Std. dev.     0.34    16.80     0.36       6.76
          95% CI        0.07     3.29     0.07       1.33
30%–70%   Mean         99.62    46.48    99.56      92.38
          Std. dev.     0.30     6.89     0.28       8.27
          95% CI        0.06     0.08     0.08       0.55
20%–80%   Mean         99.59    42.02    98.98      80.08
          Std. dev.     0.23     1.22     3.31       6.43
          95% CI        0.05     0.24     0.65       1.26
10%–90%   Mean         98.61    41.14    85.91      50.37
          Std. dev.     4.24     1.37    11.55      16.77
          95% CI        0.83     0.27     2.26       3.29

Notes. Mean is the average prediction accuracy over 100 runs; std. dev. and 95% CI are the standard deviation and the 95% confidence interval of the prediction. A higher mean indicates a more accurate prediction; a lower standard deviation and a lower confidence interval indicate higher consistency.

In addition, the RBF kernel uses a parameter γ. For SVM-RFE, the optimal number of features is also displayed (Feats), and for SVM-NLP, another parameter ν is included. Table 2 lists the best accuracy results for the various SVM methods for the different training-testing ratios, together with the parameters that provide the best accuracy for those ratios.

Similar to the results of SNC, CS gives high accuracy in most cases. Unlike the results for SNC, for the 50%–50% ratio, CSnSD gives better accuracy when run with SVM-RFE. Still, using CS feature vectors gives highly accurate results for both the SNC and the SVM methods. We conclude that CS is the most suitable of the feature vectors to use. Furthermore, for all methods that take the RBF kernel (SVM and SVM-1), RBF consistently presents better results than polynomial kernels.

Among SVM and the specialized SVMs, SVM-1 appears to improve the results of SVM, whereas SVM-RFE improves the results only in some cases and SVM-NLP does not appear to improve on SVM for this set of data.

Table 3 displays the detailed results for PCA-LDA. The results indicate that SD is a more appropriate feature construction for PCA-LDA.



Table 2   SVM, SVM-1, SVM-RFE, and SVM-NLP Runs with Five Training-Testing Ratios

50%–50%: SVM 99.59% (CS, C = 32, RBF, γ = 128); SVM-1 99.54% (CS, C = 32, RBF, γ = 128); SVM-RFE 99.38% (CSnSD, C = 1, Feats = 256); SVM-NLP 98.27% (N, C = 100, ν = 256)
40%–60%: SVM 99.60% (CS, C = 16, RBF, γ = 128); SVM-1 99.68% (CS, C = 16, RBF, γ = 128); SVM-RFE 99.10% (CS, C = 2, Feats = 128); SVM-NLP 94.75% (CS, C = 100, ν = 2^-11)
30%–70%: SVM 99.46% (CS, C = 128, RBF, γ = 128); SVM-1 99.58% (CS, C = 64, RBF, γ = 128); SVM-RFE 98.92% (CS, C = 32, Feats = 128); SVM-NLP 94.56% (CSnSD, C = 100, ν = 1)
20%–80%: SVM 99.43% (CS, C = 64, RBF, γ = 128); SVM-1 99.56% (CS, C = 64, RBF, γ = 128); SVM-RFE 98.88% (CS, C = 16, Feats = 512); SVM-NLP 94.92% (CS, C = 100, ν = 0.031)
10%–90%: SVM 98.13% (CS, C = 128, RBF, γ = 128); SVM-1 98.80% (CS, C = 64, RBF, γ = 128); SVM-RFE 96.74% (CS, C = 0.008, Feats = 1,024); SVM-NLP 93.18% (CS, C = 100, ν = 0.004)

Notes. The best accuracy result for each method and each ratio is listed along with the optimal parameters. Feats is the optimal number of features for SVM-RFE.

This is in contrast to CS being the best and most accurate feature construction for both SNC and SVM. The only case for which SD is not best for PCA-LDA is when the training-testing percentage is 10%–90%. In this instance, N gives better accuracy, but SD still gives the most consistent results.

Figure 4 summarizes the results of Tables 1–3: the highest accuracy is presented for the different training-testing ratios. Examining the graph in Figure 4 shows that SNC is on par with or superior, in both accuracy and robustness,

Table 3   PCA Plus Linear Discriminant Analysis Runs for the Feature Vectors {CS, SD, CSnSD, N} with Five Training-Testing Ratios

Ratio     Statistic   CS (%)   SD (%)   CSnSD (%)   N (%)
50%–50%   Mean         76.91    97.04    77.56      93.00
          Std. dev.     3.39     1.26     4.46       3.72
          95% CI        0.66     0.25     0.87       0.73
40%–60%   Mean         76.21    96.96    76.03      93.42
          Std. dev.     4.02     1.46     4.10       3.86
          95% CI        0.79     0.29     0.80       0.76
30%–70%   Mean         76.83    96.61    75.55      93.28
          Std. dev.     4.33     1.77     5.10       4.25
          95% CI        0.85     0.35     1.00       0.83
20%–80%   Mean         74.68    95.46    74.55      93.43
          Std. dev.     5.45     2.25     5.15       5.66
          95% CI        1.07     0.44     1.01       1.11
10%–90%   Mean         74.13    91.34    74.76      94.00
          Std. dev.     5.90     4.86     5.39       5.05
          95% CI        1.16     0.95     1.06       0.99

Notes. Mean is the average prediction accuracy over 100 runs; std. dev. and 95% CI are the standard deviation and the 95% confidence interval of the prediction. A higher mean indicates a more accurate prediction; a lower standard deviation and a lower confidence interval indicate higher consistency.

to the compared methods, except SVM-1. SVM-1 at the 10%–90% ratio has (slightly) higher accuracy, which comes at the price of a substantial increase in running time (see §4.5). In terms of robustness, which is measured by the decrease in accuracy as the sample size decreases, SNC presents better results than most methods. For example, in the case of the 10%–90% training-testing ratio, SNC has a more than 1% accuracy lead over SVM, a 2% lead over SVM-RFE, and a 4% lead over PCA-LDA.

4.4. SNC Misclassification Analysis
A confusion matrix is a presentation of the data that helps to identify the source of prediction errors. In a confusion matrix, the rows are the true labels of the data points and the columns are the predictions. For example, an entry at position (Pu, U) corresponds to the average number of data points, over 100 runs, that are incorrectly labeled as uranium when their true identity is plutonium. According to this definition, the sum of the diagonal entries is the number of correctly predicted points.
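For completeness, a small helper (ours, with illustrative names) that accumulates such a matrix:

```python
import numpy as np

def confusion_matrix(true_labels, predicted, classes=("Pu", "U")):
    """Rows are true labels, columns are predictions; entry (a, b) counts
    points of class a predicted as class b."""
    idx = {c: k for k, c in enumerate(classes)}
    M = np.zeros((len(classes), len(classes)))
    for t, p in zip(true_labels, predicted):
        M[idx[t], idx[p]] += 1
    return M   # average such matrices over runs to reproduce Table 4's form
```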

Figure 4   The best classification accuracy for SNC, SVM, the specialized SVMs, and PCA-LDA with different training sizes.



Table 4   Confusion Matrices for Different Training Sizes

             50% training     40% training     30% training     20% training     10% training
True label   Pu       U       Pu       U       Pu       U       Pu       U       Pu       U
Pu           69.56    0.44    83.88    0.67    97.38    0.62    111.24   0.76    123.1    2.9
U            0        46      0        55      0        65      0        74      0        83

Table 4 gives the confusion matrices of the results of SNC with CS feature vectors for the different training-testing ratios. Table 4 shows that all samples acquired in the presence of uranium were labeled with 100% accuracy. It is interesting to observe that the only source of error, in all the matrices, is the misclassification of plutonium samples as uranium samples.

4.5. Run Times
We report the run times of SVM and the specialized SVMs, excluding the time required to find the best tuned parameters and including only the run times for training and classification. Figure 5 graphs the running times of SNC, the SVMs, and PCA-LDA. Because the complexity of the SVMs depends on the number of training data points, the smaller the training data, the shorter the SVMs' running times. SVM-RFE and SVM-NLP have the longest running times, about 80 times more than that of SNC. SVM-1, whose accuracy is slightly better than SNC's at the 10%–90% ratio, is four times slower than SNC at that ratio. The running time of SVM-RFE is influenced not only by the ratio, but also by the number of features used. For SNC, the run time of the algorithm is dominated by the graph construction, and therefore appears constant regardless of the size of the training set. PCA-LDA has a running time more than 40 times that of SNC; this is primarily because of the use of PCA, even in its reduced version; the full version of PCA takes even longer.

Figure 5 Computation Times of SNC, SVM, Specialized SVMs, and PCA-LDA for Different Training Sizes
[Figure: running time (seconds, log scale) vs. training percentage (%); the running times are labeled next to the curves.]

Figure 5 clearly shows that SNC is significantly more efficient than the SVMs (by a factor of 2–80) and than PCA-LDA (by a factor of 40) under the same hardware setup (a 1.3 GHz Intel SU7300 Core 2 Duo ULV processor with 1 GB of 1,066 MHz RAM).

4.6. Multiclassification
In this section we evaluate the performance of SNC with respect to SVM and PCA-LDA for solving multiclassification problems. As both SNC and SVM use binary classification as a subroutine for solving the multiclassification problem, we apply the voting mechanism described in §2.2 for both methods; a sketch of such a scheme is given below. Specialized SVMs have less established extensions from binary to multiclass classification and are left for future investigation. For the evaluation process we use three subsets of the Swanberg et al. (2009) data set: (1) blank, plutonium, and uranium; (2) latite, plutonium, and uranium; and (3) blank, latite, plutonium, and uranium. Note that the latter consists of the entire data set.
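The following is a minimal sketch of a pairwise (one-vs-one) majority-vote scheme of the kind referred to above; `binary_classify` and the data layout are illustrative assumptions, not the paper's implementation of §2.2.

```python
import itertools
from collections import Counter

def one_vs_one_predict(train_by_class, test_point, binary_classify):
    """Pairwise voting: run a binary classifier for every pair of classes
    and let each one cast a vote; the majority label wins (ties broken
    arbitrarily). `binary_classify(pos, neg, x)` is an assumed stand-in
    for any binary method (e.g., SNC or SVM) returning the winning label."""
    votes = Counter()
    for a, b in itertools.combinations(train_by_class, 2):
        winner = binary_classify(train_by_class[a], train_by_class[b], test_point)
        votes[winner] += 1
    return votes.most_common(1)[0][0]
```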

The results for SNC in the 50%–50% training-testing case (Table 5) show that SNC-based multiclassification gives highly accurate and consistent predictions for several sets of data. When all four materials are used, the best prediction accuracy is 98.65%, under CSnSD feature vectors. In fact, across all permutations, CSnSD is the best feature vector in terms of both accuracy and consistency. We note that the presence of latite affects the quality of the classification more adversely than blank or background noise, as observed in the difference between the first row and the second row of Table 5. When all materials are present, the prediction improves over that with latite, plutonium, and uranium.

Figure 6 displays the comparison among SNC, SVM, and PCA-LDA for three different classification problems. The results presented in Figure 6, for each training portion, are the best results achieved across all processing methods (CS, SD, CSnSD, and N). As can be seen in Figure 6, for all classification setups and at each training portion, SNC produces better accuracy than SVM and PCA-LDA. Furthermore, in terms of robustness, SNC is superior to SVM and PCA-LDA.

Table 5 Multiclassification for Different Permutations of the Data Set (B: Blank; L: Latite; Pu: Plutonium; and U: Uranium) and the Four Different Kinds of Feature Vectors {CS, SD, CSnSD, N}

                      CS (%)           SD (%)           CSnSD (%)        N (%)
                  Mean  Std. dev.  Mean  Std. dev.  Mean  Std. dev.  Mean  Std. dev.
L, Pu, and U      94.85   3.40     98.34   0.98     98.91   0.78     84.61   0.27
B, Pu, and U     100.00   0.00     99.88   1.21    100.00   0.00     87.37   0.54
B, L, Pu, and U   98.65   1.25     98.32   1.75     98.65   1.12     86.22   0.62

Note. The subsampling ratio is 50%–50%.


Figure 6 The Best Multiclassification Accuracy for SNC, SVM, and PCA-LDA with Different Training Sizes
[Figure: accuracy (%) vs. training percentage (%); curves for SNC, SVM, and PCA-LDA on the {L, U, Pu}, {B, U, Pu}, and {B, L, U, Pu} subsets.]

These results are in agreement with the results of binary classification, which are reported in §4.2.1.

Figure 7 displays the run times of SNC, SVM, and PCA-LDA. It is noted that the SVM method requires extensive tuning of parameters for each data set and each feature-vector representation method (§3.2). The running times of SVM reported in Figure 7 do not include the time it takes to tune the various parameters. Were these times included, the superiority of SNC's efficiency would be even more pronounced.

4.7. Constructions of Feature Vectors and Running Time
We conclude this section with running-time results for the different feature vectors presented in §3.2.

Figure 7 Computation Times of SNC, SVM, and PCA-LDA for Different Training Sizes
[Figure: running time (seconds, log scale) vs. proportion of training (%). The running times of B, U, Pu and L, U, Pu overlap for SNC.]

The major reason for choosing these methods is their simplicity: although more complex derived features, such as those from signal processing, may produce better results, the four methods proposed here are simple and incorporate both the spatial and temporal dimensions of our data set.

Figure 8 displays the running times of the various feature-vector constructions with the different algorithms (SVM, SNC, and PCA-LDA) for the 50%–50% training-testing ratio of binary classification problems.

Figure 8 Computation Times of SNC, SVM, and PCA-LDA for Different Feature-Vector Constructions {CS, SD, CSnSD, N} for the 50%–50% Training-Testing Ratio of Binary Classification Problems
[Figure: running time (seconds, log scale) for each feature-vector construction, with bars for SVM, SNC, and PCA-LDA; individual running times are marked above their respective bars.]


Overall, Figure 8 shows, as concluded in the previous sections, that SNC is the fastest algorithm, SVM is the next fastest, and PCA-LDA is the slowest. We can also observe how sensitively the running times depend on the feature-vector construction: CSnSD requires the most computation of all the constructions, which is consistent with the fact that CSnSD gives the longest feature vectors. The next most expensive construction across all methods is N, followed by CS and SD, which are the fastest for PCA-LDA and SVM, respectively, and are tied for SNC. One should note that the SNC algorithm takes the edge weights as input and thus is not affected by the length of the feature vectors; the computation times of the weights, however, are proportional to the feature-vector lengths. SVM and PCA-LDA have different running times when applied to the different feature vectors. This observation strongly suggests that when considering a replacement method for generating the feature vectors, one should weigh not only the method's running time for creating the vectors, but also the total run time with the new vectors.
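Because SNC consumes pairwise edge weights, the weight-computation step is where feature-vector length enters the running time. Here is a minimal sketch assuming a Gaussian similarity weight, a common choice for normalized-cut methods; the specific weight function used in the paper may differ.

```python
import numpy as np

def edge_weights(features, sigma=1.0):
    """Pairwise Gaussian similarities, w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).
    The cost of this step grows with the feature-vector length, which is why
    longer constructions (e.g., CSnSD) cost more even though SNC itself only
    consumes the resulting weights."""
    X = np.asarray(features, dtype=float)
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))
```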

These observations, combined with the accuracy results from §4.2.1, lead to the conclusion that CS is the most appropriate feature-vector construction method: feature vectors constructed by CS allow the algorithms to produce fast and accurate classification results.

5. Conclusions
We present here a new technique for classification and clustering devised for the purpose of enhancing the identification of nuclear threats with low-resolution detectors. The technique builds on a bipartitioning procedure called normalized cut prime (NC′). The solution method proposed here incorporates training data, and as such it is a supervised classification method, supervised normalized cut (SNC). We test SNC and compare it with SVM, specialized SVMs, and LDA on data of low-resolution nuclear spectra. The results demonstrate that SNC is comparable or superior to the SVM methods in terms of accuracy and far superior in terms of efficiency and robustness, for both the binary and the multiclassification problems on the nuclear data set. The analysis presented in the paper provides a proof of concept that the SNC approach is worth investigating further in this context and with more detailed and advanced data sets. It should also be pursued as a complementary approach for identifying isotopes in spectra obtained even with higher-resolution detectors, complementing the analysis of each energy line and their combinations. Future research will test supervised normalized cut on additional nuclear data sets with a wider variety of SNMs as they become available. It is also planned to extend the application of SNC beyond nuclear detection problems to more general classification and data-mining problems.

Acknowledgments
This research has been supported in part by CBET-0736232 and by the U.S. Department of Homeland Security. The research of the third author was also supported in part by the National Science Foundation award [DMI-0620677].

References
Bertozzi W, Ledoux RJ (2005) Nuclear resonance fluorescence imaging in non-intrusive cargo inspection. Nuclear Instruments and Methods Phys. Res. Sect. B: Beam Interactions with Materials and Atoms 241(1–4):820–825.
Burges CJC (1998) A tutorial on support vector machines for pattern recognition. Data Mining Knowledge Discovery 2(2):121–167.
Carneiro G, Chan AB, Moreno PJ, Vasconcelos N (2007) Supervised learning of semantic classes for image annotation and retrieval. IEEE Trans. Pattern Anal. Machine Intelligence 29:394–410.
Carpenter T, Cheng J, Roberts F, Xie M (2010) Sensor management problems of nuclear detection. Pham H, ed. Safety and Risk Modeling and Its Applications (Springer, London), 299–323.
Caruana R, Niculescu-Mizil A (2006) An empirical comparison of supervised learning algorithms. Cohen W, Moore A, eds. Proc. 23rd Internat. Conf. Machine Learn., ICML '06 (ACM, New York), 161–168.
Chang C-C, Lin C-J (2001) LIBSVM: A library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Cox IJ, Rao SB, Zhong Y (1996) Ratio regions: A technique for image segmentation. Kropatsch WG, ed. Proc. Internat. Conf. Pattern Recognition, Vol. B (IEEE Computer Society Press, Los Alamitos, CA), 557–564.
Crammer K, Singer Y (2002) On the learnability and design of output codes for multiclass problems. Machine Learn. 47(2–3):201–233.
Cristianini N, Shawe-Taylor J (2000) An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods (Cambridge University Press, Cambridge, UK).
Ding YS, Zhang TL, Chou KC (2007) Prediction of protein structure classes with pseudo amino acid composition and fuzzy support vector machine network. Protein Peptide Lett. 14(8):811–815.
Dowd PA, Pardo-Iguzquiza E (2005) Estimating the boundary surface between geologic formations from 3D seismic data using neural networks and geostatistics. Geophysics 70(1):1–11.
Duan K-B, Keerthi SS (2005) Which is the best multiclass SVM method? An empirical study. Oza NC, Polikar R, Kittler J, Roli F, eds. Multiple Classifier Systems, Lecture Notes in Computer Science, Vol. 3541 (Springer, Berlin), 732–760.
Duda RO, Hart PE, Stork DG (2001) Unsupervised learning and clustering. Pattern Classification, Chap. 10, 2nd ed. (Wiley, New York).
Ford LR, Fulkerson DR (1956) Maximal flow through a network. Canadian J. Math. 8(3):399–404.
Fung G, Mangasarian OL (2002) A feature selection Newton method for support vector machine classification. Technical report, University of Wisconsin, Madison.
Fung G, Mangasarian OL (2004) A feature selection Newton method for support vector machine classification. Comput. Optim. Appl. 28(2):185–202.
Fung GM, Mangasarian OL (2005) Multicategory proximal support vector machine classifiers. Machine Learn. 59(1–2):77–97.
Geisinger NP (2010) Classification of digital modulation schemes using linear and nonlinear classifiers. Ph.D. thesis, Naval Postgraduate School, Monterey, CA.
Gentile CA (2010) US Patent 7,711,661 B2, filed May 2, 2007, issued May 4, 2010.


Goldschmidt O, Hochbaum DS (1994) A polynomial algorithm for the k-cut problem for fixed k. Math. Oper. Res. 19(1):24–37.
Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Machine Learn. 46(1–3):389–422.
Hastie T, Rosset S, Tibshirani R, Zhu J (2004) The entire regularization path for the support vector machine. J. Machine Learn. Res. 5(1):1391–1415.
Hochbaum DS (2010a) HPF: Hochbaum's pseudo-flow algorithm implementation. Software available at http://riot.ieor.berkeley.edu/riot/Applications/Pseudoflow/maxflow.html.
Hochbaum DS (2010b) Polynomial time algorithms for ratio regions and a variant of normalized cut. IEEE Trans. Pattern Anal. Machine Intelligence 32(5):889–898.
Hua Z, Wang Y, Xu X, Zhang B, Liang L (2007) Predicting corporate financial distress based on integration of support vector machine and logistic regression. Expert Systems Appl. 33(2):434–440.
International Atomic Energy Agency (2007) Combating illicit trafficking in nuclear and other radioactive material. Technical report/reference manual, IAEA Nuclear Security Series 6, International Atomic Energy Agency, Vienna, Austria.
Kangas LJ, Keller PE, Siciliano ER, Kouzes RT, Ely JH (2008) The use of artificial neural networks in PVT-based radiation portal monitors. Nuclear Instruments Methods Phys. Res. Sect. A: Accelerators, Spectrometers, Detectors Associated Equipment 587(2–3):398–412.
Kelly EL (1962) General description and operating characteristics of the Berkeley 88-inch cyclotron. Nuclear Instruments Methods 18–19(1):33–40.
Kotsiantis S (2007) Supervised learning: A review of classification techniques. Informatica 31(1):249–268.
Luo L (2006) Chemometrics and its applications to X-ray spectrometry. X-ray Spectrometry 35(4):215–225.
Mangasarian OL, Wild EW (2007) Nonlinear knowledge in kernel approximation. IEEE Trans. Neural Networks 18(1):300–306.
Mangasarian OL, Wild EW (2008) Nonlinear knowledge-based classification. IEEE Trans. Neural Networks 19(10):1826–1832.
Marrs RE, Norman EB, Burke JT, Macri RA, Shugart HA, Browne E, Smith AR (2008) Fission-product gamma-ray line pairs sensitive to fissile material and neutron energy. Nuclear Instruments Methods Phys. Res. Sect. A: Accelerators, Spectrometers, Detectors Associated Equipment 592(3):463–471.
Martinez AM, Kak AC (2001) PCA versus LDA. IEEE Trans. Pattern Anal. Machine Intelligence 23(2):228–233.
McLachlan G (2004) Discriminant Analysis and Statistical Pattern Recognition (Wiley Interscience, New York).
Mihailescu L, Fishbain B, Hochbaum DS, Maltz J, Moore CJ, Rohel J, Supic L, Vetter K (2010) Dynamic stand-off 3D gamma-ray imaging. Wehe DK, ed. 12th Sympos. Radiation Measurements Appl. (SORMA) (National Nuclear Security Administration, Washington, DC), 106.
Norman EB, Prussin SG, Larimer R-M, Shugart H, Browne E, Smith AR, McDonald RJ, et al. (2004) Signatures of fissile materials: High-energy γ rays following fission. Nuclear Instruments Methods Phys. Res. Sect. A: Accelerators, Spectrometers, Detectors Associated Equipment 521(2–3):608–610.
Ressom HW, Varghese RS, Goldman R (2009) Computational methods for analysis of MALDI-TOF spectra to discover peptide serum biomarkers. The Protein Protocols Handbook IV:1175–1183.
Rifkin R, Klautau A (2004) In defense of one-vs-all classification. J. Machine Learn. Res. 5(1):101–141.
Sharon E, Galun M, Sharon D, Basri R, Brandt A (2006) Hierarchy and adaptivity in segmenting visual scenes. Nature 442(7104):810–813.
Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Machine Intelligence 22(8):888–905.
Swanberg E, Norman E, Shugart H, Prussin S, Browne E (2009) Using low resolution gamma detectors to detect and differentiate 239Pu and 235U fissions. J. Radioanalytical Nuclear Chemistry 282(3):901–904.
Wang F, Qian Y, Dai Y, Wang Z (2010) A model based on hybrid support vector machine and self-organizing map for anomaly detection. Wang C-X, Fan P, Shen X, He Y, eds. Comm. Mobile Comput., Internat. Conf., Vol. 1 (IEEE Computer Society, Los Alamitos, CA), 97–101.
West D (2000) Neural network credit scoring models. Comput. Oper. Res. 27(11–12):1131–1152.
Zhang J, Zhang XD, Ha S-W (2008) A novel approach using PCA and SVM for face detection. Guo M, Zhao L, Wang L, eds. Fourth Internat. Conf. Nat. Comput. (ICNC '08), Vol. 3 (IEEE Computer Society, Los Alamitos, CA), 29–33.
Zhu J, Rosset S, Hastie T, Tibshirani R (2003) 1-norm support vector machines. Thrun S, Saul L, Schölkopf B, eds. Neural Information Processing Systems (MIT Press, Cambridge, MA), 16–23.
