
SEMANTIC SEGMENTATION OF RAILWAY TRACK IMAGES WITH DEEP CONVOLUTIONAL NEURAL NETWORKS

Xavier Gibert, Vishal M. Patel, and Rama Chellappa

Center for Automation Research, UMIACS, University of Maryland, College Park, MD 20742-3275, USA
gibert,pvishalm,[email protected]

ABSTRACT

The condition of railway tracks needs to be periodically monitored to ensure passenger safety. Cameras mounted on a moving vehicle such as a hi-rail vehicle or geometry inspection car can generate large volumes of high resolution images. Extracting accurate information from those images has been challenging due to the clutter in the railroad environment. In this paper we describe a novel approach to visual track inspection using semantic segmentation with Deep Convolutional Neural Networks. We show that DCNNs trained end-to-end for material classification are more accurate than shallow learning machines with hand-engineered features and are more robust to noise. Our approach results in a material classification accuracy of 93.35% using 10 classes of materials. This allows for the detection of crumbling and chipped tie conditions at detection rates of 86.06% and 92.11%, respectively, at a false positive rate of 10 FP/mile on the 85-mile Northeast Corridor (NEC) 2012-2013 concrete tie dataset.

Index Terms— Semantic Segmentation, Deep Convolutional Neural Networks, Railway Track Inspection, Material Classification.

1. INTRODUCTION

Railway tracks need to be regularly inspected to ensure train safety. Crossties, also known as sleepers, are responsible for supporting the rails and maintaining track geometry within safety ranges. Tracks have historically been built with timber ties, but during the last half century, steel reinforced concrete has been the preferred material for building crossties. Concrete ties have several advantages over wood ties, such as being a more uniform product with better control of tolerances, as well as being well adapted for elastic fasteners, which control longitudinal forces better than conventional ones. Moreover, by being heavier than timber ties, concrete ties promote better track stability [1]. For all these reasons, concrete ties have been widely adopted, especially in high speed corridors.

Although concrete ties have life expectancies of up to 50 years, they may fail prematurely for a variety of reasons, such as alkali-silica reaction (ASR) [2] or delayed ettringite formation (DEF) [3]. Ties may also develop fatigue cracks due to normal traffic or by being impacted by flying debris or track maintenance machinery. Once small cracks develop, repeated cycles of freezing and thawing will eventually lead to bigger defects.

This work was supported by the Federal Railroad Administration under contract DTFR53-13-C-00032. The authors thank Amtrak, ENSCO, Inc. and the Federal Railroad Administration for providing the data used in this paper.

Fig. 1. Definition of basic track elements: left rail, right rail, ballast, fasteners, crosstie, field side, gage side, and track gage (1,435 mm).

In the United States, regulations enforced by the Federal Railroad Administration (FRA)¹ prescribe visual inspection of high speed rail tracks with a frequency of once or twice per week, depending on track speed. These manual inspections are currently being performed by railroad personnel, either by walking on the tracks or by riding a hi-rail vehicle at very low speeds. However, such inspections are subjective and do not produce an auditable visual record. In addition, railroads usually perform automated track inspections with specialized track geometry measurement vehicles at intervals of 30 days or less between inspections. These automated inspections can directly detect gage widening conditions. However, it is preferable to discover track problems before they develop into gage widening conditions. The locations and names of the basic track elements mentioned in this paper are shown in Figure 1.

Recent advances in CMOS imaging technology have resulted in commercial-grade line-scan cameras that are capable of capturing images at resolutions of up to 4,096×1 and line rates of up to 140 kHz. At the same time, high-intensity LED-based illuminators available in the market, whose life expectancies are in the range of 50,000 hours, enable virtually maintenance-free operation over several months. Therefore, technology that enables autonomous visual track inspection from an unattended vehicle (such as a passenger train) may become a reality in the not-too-distant future. In our previous work [4, 5] we addressed the problems of detecting and categorizing cracks and defective fasteners. The work described in this paper complements these earlier ones by addressing the problem of parsing the whole track image and identifying its components, as well as finding indications of crumbling and chipping on ties.

This paper is organized as follows. In Section 2, we review some related works on this topic. Details of our approach are given in Section 3. Experimental results on 85 miles of tie images are presented

¹49 CFR 213 – Track Safety Standards


Fig. 2. Network architecture: a single-channel input is filtered by conv1 (48 filters of size 9×9, stride 2, followed by pooling), conv2 (64 filters of size 5×5, ReLU, pooling), conv3 (256 filters of size 5×5, ReLU, pooling), and conv4 (10 filters of size 1×1).

in Section 4. Section 5 concludes the paper with a brief summary and discussion.

2. PRIOR WORK

During the last two decades, railways have been adopting machine vision technology to automate the inspection of their track. The very first systems, which allowed recording of images of the track for human review, were deployed in the late 1990s [6, 7]. In more recent years, several vision systems have been developed to address different types of inspection needs, such as crack detection [8, 9, 4], defective or missing rail fasteners [10, 11, 12, 5], and missing spikes on tie plates [13]. In [14], Resendiz et al. introduced a method for inspecting railway tracks. The system segmented wood ties from ballast using a combination of Gabor filters and an SVM classifier.

The idea of enforcing translation invariance in neural networks via weight sharing goes back to Fukushima's Neocognitron [15]. Based on this idea, LeCun et al. developed the concept into Deep Convolutional Neural Networks (DCNNs) and used it for digit recognition [16], and later for more general optical character recognition (OCR) [17]. During the last two years, DCNNs have become ubiquitous in achieving state-of-the-art results in image classification [18, 19] and object detection [20]. This resurgence of DCNNs has been facilitated by the availability of efficient GPU implementations. More recently, DCNNs have been used for semantic image segmentation. For example, the work of [21] shows how a DCNN can be converted into a Fully Convolutional Network (FCN) by replacing fully-connected layers with convolutional ones.

3. PROPOSED APPROACH

In this section, we describe the proposed approach to track inspection using material-based semantic segmentation.

3.1. Architecture

Our implementation is a fully convolutional neural network based on BVLC Caffe [22]. We have a total of 4 convolutional layers between the input and the output layer. The network uses rectified linear units (ReLU) as non-linearity activation functions and overlapping max pooling units of size 3×3 with a stride of 2. In our experiments we found that dropout is not necessary. Since no preprocessing is done in the sensor, we first apply global gain normalization on the raw image to reduce the intensity variation across the image. This gain is calculated by smoothing the signal envelope, which is estimated using a median filter and then low-pass filtered with a Gaussian kernel. Although DCNNs are robust to illumination changes, normalizing the image to make the signal dynamic range more uniform improves accuracy and convergence speed. We also

Fig. 3. Material categories. (a) ballast (b) wood (c) rough concrete (d) medium concrete (e) smooth concrete (f) crumbling concrete (g) chipped concrete (h) lubricator (i) rail (j) fastener.

Fig. 4. Filters learned at the first convolutional layer (range normalized for display).

subtract the mean intensity value calculated on the whole training set.
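As a rough illustration of this preprocessing stage, the following sketch divides out a smoothed per-column envelope and subtracts the mean. It is a minimal stand-in, not the paper's implementation: the filter radius is assumed (the paper gives no kernel sizes), and a moving average stands in for the Gaussian kernel.

```python
def box_smooth(values, radius):
    # Simple moving-average low-pass filter (stand-in for the
    # Gaussian kernel mentioned in the text).
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def normalize_gain(image, radius=2):
    """Global gain normalization of a grayscale image (list of rows).

    The per-column signal envelope is estimated with a median across
    rows, low-pass filtered, and divided out; finally the mean
    intensity is subtracted.
    """
    h, w = len(image), len(image[0])
    # Median over each column approximates the illumination envelope.
    envelope = [sorted(image[y][x] for y in range(h))[h // 2] for x in range(w)]
    gain = [max(e, 1e-6) for e in box_smooth(envelope, radius)]
    normalized = [[image[y][x] / gain[x] for x in range(w)] for y in range(h)]
    mean = sum(map(sum, normalized)) / (h * w)
    return [[v - mean for v in row] for row in normalized]
```

The output has zero mean by construction, and slowly varying intensity gradients across the scan line are divided out before the image reaches the network.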

This preprocessed image is the input to our network. The architecture is illustrated in Figure 2. The first layer takes a globally normalized image and filters it with 48 filters of size 9×9. The second convolutional layer takes the (pooled) output of the first layer and filters it with 64 kernels of size 5×5×48. The third layer takes the (rectified, pooled) output of the second layer and filters it with 256 kernels of size 5×5×64. The fourth convolutional layer takes the (rectified, pooled) output of the third layer and filters it with 10 kernels of size 1×1×256.

The output of the network contains 10 score maps at 1/16th of the original resolution. Each value Φi(x, y) in the score map corresponds to the likelihood that pixel location (x, y) contains material of class i. The 10 classes of materials are defined in Figure 3. The network has a total of 493,226 learnable parameters (including weights and biases), of which 0.8% correspond to the first layer, 15.6% to the second layer, 83.1% to the third layer, and the remaining 0.5% to the output layer.
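The parameter count can be checked arithmetically from the layer shapes (note that conv3 consumes the 64-channel output of conv2, so its kernels have depth 64):

```python
def conv_params(num_filters, k_h, k_w, in_channels):
    # weights + one bias per filter
    return num_filters * (k_h * k_w * in_channels) + num_filters

layers = [
    ("conv1", conv_params(48, 9, 9, 1)),     # 3,936
    ("conv2", conv_params(64, 5, 5, 48)),    # 76,864
    ("conv3", conv_params(256, 5, 5, 64)),   # 409,856
    ("conv4", conv_params(10, 1, 1, 256)),   # 2,570
]
total = sum(p for _, p in layers)
print(total)  # 493226, matching the count reported in the text
for name, p in layers:
    print(name, round(100 * p / total, 1), "%")
```

The per-layer percentages (0.8%, 15.6%, 83.1%, 0.5%) also match the breakdown above.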

3.2. Data Annotation

The ground truth data has been annotated using a custom annotation tool that allows assigning a material category to each tie as well as its bounding box. The tool also allows defining polygons enclosing regions containing crumbling, chips or ballast. We used the output of our fastener detection algorithm [5] to extract fastener examples.


Fig. 5. Confusion matrices of material classification on 2.5 million 80×80 image patches with (a) Deep Convolutional Neural Networks, (b) LBP-HF with FLANN, (c) LBP^{u2}_{8,1} with FLANN, and (d) Gabor with FLANN. Rows correspond to the true class and columns to the detected class over the 10 material categories (ballast, wood, rough concrete, medium concrete, smooth concrete, crumbled, chip, lubricator, rail, fastener).

3.3. Training

We train the network using stochastic gradient descent on mini-batches of 64 image patches of size 75×75. We do data augmentation by randomly mirroring the training samples vertically and/or horizontally. The patches are cropped randomly among all regions that contain the texture of interest. To promote robustness against adverse environment conditions, such as rain, grease or mud, we previously identified images containing such difficult cases and automatically resampled the data so that at least 50% of the data is sampled from such difficult images.
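The mirroring augmentation amounts to two independent coin flips per patch. A minimal sketch (the actual Caffe data layer is not described in the paper):

```python
import random

def augment(patch, rng=random):
    """Randomly mirror a patch (a list of rows) vertically and/or
    horizontally, as in the data augmentation described above."""
    if rng.random() < 0.5:            # horizontal mirror: reverse each row
        patch = [row[::-1] for row in patch]
    if rng.random() < 0.5:            # vertical mirror: reverse row order
        patch = patch[::-1]
    return patch
```

Each call returns one of the four mirror variants of the input, each with probability 1/4.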

3.4. Score Calculation

To detect whether an image contains a broken tie, we first calculate the scores at each site as

S_b(x, y) = Φ_b(x, y) − max_{i ∉ B} Φ_i(x, y)    (1)

where b ∈ B is a defect class (crumbling or chip). Then we calculate the score for the whole image as

S_b = (1 / (β − α)) ∫_α^β F⁻¹(t) dt    (2)

where F⁻¹(t) is the t sample quantile calculated from all scores S_b(x, y) in the image. The detector reports an alarm if S_b > τ, where τ is the detection threshold. We used α = 0.9 and β = 1.

4. EXPERIMENTAL RESULTS

We evaluated this approach on the dataset that we introduced in [5]. This dataset consists of 85 miles of continuous trackbed images collected on the US Northeast Corridor (NEC) by ENSCO Rail's Comprehensive Track Inspection Vehicle (CTIV) between 2012 and 2013. The images were collected using 4 line-scan cameras and were automatically stitched together and saved into several files, each containing a 1-mile image. As we did in our previous work, we resampled the images by a factor of 2, for a pixel size of 0.86 mm. For the experiments reported in this section, we included all the ties in this section of track, including 140 wood ties that were excluded from the experiments reported in [5].

4.1. Material Identification

We divided the dataset into 5 splits and used 80% of the images for training and 20% for testing, and we generated a model for each of the 5 possible training sets. For each split of the data, we randomly sampled 50,000 patches of each class. Therefore, each model was trained with 2 million patches. We trained the network using a batch size of 64 for a total of 300,000 iterations with a momentum of 0.9 and a weight decay of 5×10⁻⁵. The learning rate is initially set to 0.01 and it decays by a factor of 0.5 every 30,000 iterations.
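The step schedule above amounts to:

```python
def learning_rate(iteration, base_lr=0.01, gamma=0.5, step=30000):
    # Step decay: halve the base rate every `step` iterations,
    # matching the schedule used for training.
    return base_lr * gamma ** (iteration // step)

print(learning_rate(0))      # 0.01
print(learning_rate(30000))  # 0.005
```

By the end of training (iteration 300,000) the rate has been halved nine times, ending near 2×10⁻⁵.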

In addition to the method described in Section 3, we evaluated the classification performance using the following methods:

• LBP-HF with approximate Nearest Neighbor: The Local Binary Pattern Histogram Fourier descriptor introduced in [23] is invariant to global image rotations while preserving local information. We used the implementation provided by the authors. To perform approximate nearest neighbor search we used FLANN [24] with the 'autotune' parameter set to a target precision of 70%.

• Uniform LBP with approximate Nearest Neighbor: The LBP^{u2}_{8,1} descriptor [25] with FLANN.


Fig. 6. (a) ROC curve for detecting crumbling tie conditions. (b) ROC curve for detecting chipped tie conditions. Detection rate is plotted against false positives per mile, with one curve per severity level (overall, ≥10%, ≥20%, and so on); each curve is generated considering conditions at or above that severity level. Note: false positive rates are estimated assuming an average of 10⁴ images per mile. Confusion between chipped and crumbling defects is not counted as false positives.

• Gabor features with approximate Nearest Neighbor: We filtered each image with a filter bank of 40 filters (4 scales and 8 orientations) designed using the code from [26]. As was proposed in [27], we compute the mean and standard deviation of the output of each filter and we build a feature descriptor as f = [µ00 σ00 µ01 σ01 . . . µ47 σ47]. Then, we perform approximate nearest neighbor search using FLANN with the same parameters.

The material classification results are summarized in Table 1 and the confusion matrices in Figure 5.

Table 1. Material classification results.

Method                      Accuracy
Deep CNN                    93.35%
LBP-HF with FLANN           82.05%
LBP^{u2}_{8,1} with FLANN   82.49%
Gabor with FLANN            75.63%

4.2. Semantic Segmentation

Since we are using a fully convolutional DCNN, we directly transfer the parameters learned using small patches to a network that takes one 4096×320 image as an input, and generates 10 score maps of dimension 252×16 each. The segmentation map is generated by taking the label corresponding to the maximum score. Figure 7 shows several examples of concrete and wood ties, with and without defects, and their corresponding segmentation maps.
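Producing the segmentation map from the score maps is a per-site argmax over classes, which can be sketched as:

```python
def segmentation_map(score_maps):
    """Assign each site the class with the maximum score.

    score_maps: dict mapping class index -> 2-D list of scores Phi_i(x, y).
    Returns a 2-D list of class labels.
    """
    classes = list(score_maps)
    rows = len(score_maps[classes[0]])
    cols = len(score_maps[classes[0]][0])
    return [
        [max(classes, key=lambda i: score_maps[i][y][x]) for x in range(cols)]
        for y in range(rows)
    ]
```

Applied to the 10 score maps of a full image, this yields the 252×16 label map that is overlaid in Figure 7.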

4.3. Crumbling Tie Detection

The first 3 rows in Figure 7 show examples of crumbling ties and their corresponding segmentation maps. Similarly, rows 4 through 6 show examples of chipped ties. To evaluate the accuracy of the crumbling and chipped tie detector described in Section 3.4 we divide each tie into 4 images and evaluate the score (2) on each image independently. Due to the large variation in the area affected by crumbling/chips we assigned a severity level to each ground truth defect, and for each severity level we plot the ROC curve of finding a defect when ignoring lower level defects. The severity levels are defined as the ratio of the inspectable area that is labeled as a defect.

Figure 6 shows the ROC curves for each type of anomaly. Because of the choice of the fixed α = 0.9 in equation (2), the performance is not reliable for defects under 10% severity. For defects that are bigger than the 10% threshold, at a false positive rate of 10 FP/mile the detection rates are 86.06% for crumbling and 92.11% for chips.

Fig. 7. Semantic segmentation results (images displayed at 1/16 of original resolution).

5. CONCLUSIONS AND FUTURE WORK

Using the proposed fully-convolutional deep CNN architecture we have shown that it is possible to accurately localize and inspect the condition of railway components using grayscale images. We believe that the reason our method performs better than traditional texture features is the ability of DCNNs to capture more complex patterns, while reusing patterns learned with increasing levels of abstraction that are shared among all classes. This explains why there is much less overfitting on the anomalous classes (crumbled and chip) despite having a relatively limited amount of training data.

We currently run the network in a feed-forward fashion. In the future, we plan to further explore recursive architectures in order to discover long-range dependencies among image regions, with the purpose of better separating normal regions from anomalous ones.


6. REFERENCES

[1] J. A. Smak, "Evolution of Amtrak's concrete crosstie and fastening system program," in International Concrete Crosstie and Fastening System Symposium, June 2012.

[2] M. H. Shehata and M. D. Thomas, "The effect of fly ash composition on the expansion of concrete due to alkali-silica reaction," Cement and Concrete Research, vol. 30, pp. 1063–1072, 2000.

[3] S. Sahu and N. Thaulow, "Delayed ettringite formation in Swedish concrete railroad ties," Cement and Concrete Research, vol. 34, pp. 1675–1681, 2004.

[4] X. Gibert, V. M. Patel, D. Labate, and R. Chellappa, "Discrete shearlet transform on GPU with applications in anomaly detection and denoising," EURASIP Journal on Advances in Signal Processing, vol. 2014, no. 64, pp. 1–14, May 2014.

[5] X. Gibert, V. M. Patel, and R. Chellappa, "Robust fastener detection for autonomous visual railway track inspection," in IEEE Winter Conference on Applications of Computer Vision (WACV), 2015.

[6] J. J. Cunningham, A. E. Shaw, and M. Trosino, "Automated track inspection vehicle and method," US Patent 6,064,428, May 2000.

[7] M. Trosino, J. J. Cunningham, and A. E. Shaw, "Automated track inspection vehicle and method," US Patent 6,356,299, March 2002.

[8] X. Gibert, A. Berry, C. Diaz, W. Jordan, B. Nejikovsky, and A. Tajaddini, "A machine vision system for automated joint bar inspection from a moving rail vehicle," in ASME/IEEE Joint Rail Conference & Internal Combustion Engine Spring Technical Conference, 2007.

[9] A. Berry, B. Nejikovsky, X. Gibert, and A. Tajaddini, "High speed video inspection of joint bars using advanced image collection and processing techniques," in Proc. of World Congress on Railway Research, 2008.

[10] F. Marino, A. Distante, P. L. Mazzeo, and E. Stella, "A real-time visual inspection system for railway maintenance: automatic hexagonal-headed bolts detection," IEEE Trans. on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 37, no. 3, pp. 418–428, 2007.

[11] P. De Ruvo, A. Distante, E. Stella, and F. Marino, "A GPU-based vision system for real time detection of fastening elements in railway inspection," in IEEE International Conference on Image Processing (ICIP), 2009, pp. 2333–2336.

[12] P. Babenko, Visual Inspection of Railroad Tracks, Ph.D. thesis, University of Central Florida, 2009.

[13] Y. Li, H. Trinh, N. Haas, C. Otto, and S. Pankanti, "Rail component detection, optimization, and assessment for automatic rail track inspection," IEEE Trans. on Intelligent Transportation Systems, vol. 15, no. 2, pp. 760–770, April 2014.

[14] E. Resendiz, J. M. Hart, and N. Ahuja, "Automated visual inspection of railroad tracks," IEEE Trans. on Intelligent Transportation Systems, vol. 14, no. 2, pp. 751–760, June 2013.

[15] K. Fukushima, "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position," Biological Cybernetics, vol. 36, no. 4, pp. 193–202, 1980.

[16] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, "Backpropagation applied to handwritten zip code recognition," Neural Computation, vol. 1, no. 4, pp. 541–551, 1989.

[17] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, November 1998.

[18] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems (NIPS), 2012.

[19] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," arXiv:1409.4842, 2014.

[20] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.

[21] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," arXiv:1411.4038, 2014.

[22] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," arXiv:1408.5093, 2014.

[23] T. Ahonen, J. Matas, C. He, and M. Pietikainen, "Rotation invariant image description with local binary pattern histogram Fourier features," in Image Analysis, pp. 61–70, Springer, 2009.

[24] M. Muja and D. G. Lowe, "Fast approximate nearest neighbors with automatic algorithm configuration," in International Conference on Computer Vision Theory and Applications (VISAPP), 2009, pp. 331–340, INSTICC Press.

[25] T. Ojala, M. Pietikainen, and T. Maenpaa, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 971–987, 2002.

[26] M. Haghighat, S. Zonouz, and M. Abdel-Mottaleb, "Identification using encrypted biometrics," in Computer Analysis of Images and Patterns, Springer, 2013, pp. 440–448.

[27] B. S. Manjunath and W. Y. Ma, "Texture features for browsing and retrieval of image data," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 18, no. 8, pp. 837–842, 1996.