
Image Analysis and Classification in Scanning Tunneling Microscopy

August 10, 2012

Abstract

In this paper we examine the application of image processing techniques to the analysis of STM images. In particular, we apply Spectral Clustering and Chan-Vese to images of peptides layered on a graphite surface, with the goal of detecting and quantitatively describing patterns which could aid in identifying the peptides. We propose a method that incorporates many different image processing techniques which can successfully be used to identify certain meaningful patterns.


Contents

1 Introduction

2 Preprocessing
2.1 Denoising
2.2 G-norm TV
2.3 Structure Tensor
2.4 PCA and Mean Diffused Orientation

3 Segmentation
3.1 Chan-Vese
3.2 Spectral Clustering

4 Fourier transform Methods for Analyzing Segmented Images

5 Our Data, Method and Results
5.1 Results of Texture + Structure Decomposition
5.2 Results of Structure Tensor applied to the Images
5.3 PCA and Mean Diffused Orientation
5.4 Segmentation
5.5 Spectral Clustering Metrics
5.6 Analysis of the Segmented Regions

6 Conclusion

7 Appendix
7.1 Wasserstein Distance
7.2 G-norm TV
7.3 Bregman and Split-Bregman

References


1 Introduction

Scanning Tunneling Microscopy (STM) is a type of scanning probe microscopy which can be used to obtain high-resolution images of chemical compounds. STM works by moving a tip over a sample which has been applied to a graphite or gold surface. A small electric potential between tip and surface results in a tunneling current which is related to the topography and chemical composition of the sample. A feedback loop raises and lowers the sample while keeping the tunneling current constant. This produces an image which reflects the topography of the sample as well as its chemical composition[9].

Due to the very high resolution of STM images (< 1 Å), individual molecules can often be distinguished. This suggests that STM images could be used for classification of chemical compounds. Many properties, such as spacing and orientation of the molecules on the surface, have been extracted from STM images[22].

The STM images which we work with have the drawback that they are not uniform. Rather, they consist of multiple distinct regions on which molecules are spaced and oriented differently, and other regions which are so damaged that they contain no useful information. Our goal is to detect these different regions algorithmically and to determine which ones contain useful information that needs to be processed further. In this paper we propose a method which incorporates a variety of different image processing techniques to successfully achieve such a separation. In the first section of this paper we begin with a discussion of the various image processing techniques which we combine to segment our data. We then describe how we combine each of these methods and demonstrate the successful segmentation of our images.

2 Preprocessing

Before employing any segmentation algorithms, we need to preprocess the STM images. This includes removing noise and separating features of different scales as well as computing the orientations present at each of the different parts of the images. First we will describe the separation of large and small scale features and then we describe the computation of the orientations.

2.1 Denoising

One of the first steps we take in analyzing our images is to denoise them. The images which we work with initially contain a large amount of noise. There were two approaches that we took in denoising our images, TV minimization and nonlocal means[5].

Total Variation (TV) minimization denoises an image by minimizing the following functional.

$$E(u) = \int_\Omega |\nabla u| \, dA + \lambda \|f - u\|_2^2$$

Here $f$ is the original image, $u$ is the denoised image, $\lambda$ is a parameter, and $\Omega$ is the image domain. The motivation for this is that minimizing the total variation, $\int_\Omega |\nabla u| \, dA$, favors a piecewise constant image, while the data fidelity term $\|f - u\|_2^2$ keeps the smoothed image as close to the original as possible[23]. A major problem with this method is that it removes all of the small-scale structure which we are interested in. One approach that we took to solve this problem is to use the more advanced G-norm TV model, explained in the next section.
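To make the TV functional concrete, the following is a minimal sketch that minimizes a smoothed version of $E(u)$ by explicit gradient descent. It is only an illustration under stated assumptions: the function names, the smoothing constant `eps`, the step size and the iteration count are all hypothetical choices, and this is not the split-Bregman solver described in the appendix.

```python
import numpy as np

def grad(u):
    """Forward differences; the difference is zero at the far border."""
    ux = np.diff(u, axis=1, append=u[:, -1:])
    uy = np.diff(u, axis=0, append=u[-1:, :])
    return ux, uy

def div(px, py):
    """Backward differences: the negative adjoint of grad above."""
    return np.diff(px, axis=1, prepend=0) + np.diff(py, axis=0, prepend=0)

def tv_energy(u, f, lam, eps):
    """Smoothed E(u) = sum sqrt(|grad u|^2 + eps^2) + lam * ||f - u||_2^2."""
    ux, uy = grad(u)
    return np.sum(np.sqrt(ux**2 + uy**2 + eps**2)) + lam * np.sum((f - u)**2)

def tv_denoise(f, lam=1.0, eps=0.1, step=0.01, n_iter=200):
    """Explicit gradient descent on the smoothed energy, started at u = f."""
    u = f.copy()
    for _ in range(n_iter):
        ux, uy = grad(u)
        norm = np.sqrt(ux**2 + uy**2 + eps**2)
        # Negative gradient of the energy: curvature term minus fidelity pull.
        descent = div(ux / norm, uy / norm) - 2.0 * lam * (u - f)
        u = u + step * descent
    return u
```

The step size is kept small relative to `eps` because the smoothed TV term becomes stiff as `eps` shrinks; a larger `lam` keeps the result closer to the input image.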

2.2 G-norm TV

The images that we work with contain information on two different scales. On the one hand, large scale topographical features could be used to determine the length and large scale structure of the peptide strands. On the other hand, the small scale texture of the images could be used to determine the width and spacing of the peptide strands, and the angular orientation of this texture can also yield information about the peptide.

The first step in our analysis was the separation of these two scales, which we achieved using the G-norm TV image decomposition model first proposed by Meyer[13]. The model decomposes the image by minimizing the following functional.

$$E(u, v) = \int_\Omega |\nabla u| \, dA + \lambda \|v\|_G + \mu \|f - u - v\|_{L^2}$$

Here $f : \Omega \to \mathbb{R}$ is the original image, $u$ consists of the large scale structure, $v$ consists of the small scale texture and $\|\cdot\|_G$ is dual to the BV norm[13]. We describe the G-norm in more detail in the appendix.

2.3 Structure Tensor

The structure tensor extracts information about the local orientation of images. Given an image $f$, the structure tensor is constructed by first taking the outer product of the gradient of $f$ with itself and then convolving this tensor field with a Gaussian to obtain a 2×2 symmetric tensor at every point of the image.

Mathematically, we define the structure tensor as follows.

$$T = K_\sigma * (\nabla f \, \nabla f^T)$$

where $f$ is the image and $K_\sigma$ is a Gaussian filter of standard deviation $\sigma$.

The spectral decomposition of this tensor is then used to extract orientation information about the image. Specifically, the eigenvector of largest eigenvalue gives the local orientation of the image, and the coherency, a measure of how strong this local orientation is, is given by $\left(\frac{\lambda_1 - \lambda_2}{\lambda_1 + \lambda_2}\right)^2$ with $\lambda_1 \ge \lambda_2 \ge 0$[4].
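The construction above can be sketched in a few lines. This is an illustrative implementation under assumptions, not the code used for the figures: the function names, the truncation of the Gaussian at three standard deviations and the small constant guarding against division by zero are hypothetical choices. The angle returned is that of the eigenvector of largest eigenvalue, computed in closed form for a symmetric 2×2 tensor.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel, truncated at about three standard deviations."""
    radius = int(3 * sigma) + 1
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def smooth(a, sigma):
    """Separable Gaussian blur built from 1-D convolutions."""
    k = gaussian_kernel(sigma)
    a = np.apply_along_axis(np.convolve, 0, a, k, mode="same")
    return np.apply_along_axis(np.convolve, 1, a, k, mode="same")

def structure_tensor(f, sigma=2.0):
    """Return (angle, coherency) of T = K_sigma * (grad f grad f^T) per pixel."""
    fy, fx = np.gradient(f)              # derivatives along rows (y) and columns (x)
    jxx = smooth(fx * fx, sigma)
    jxy = smooth(fx * fy, sigma)
    jyy = smooth(fy * fy, sigma)
    # Eigenvalues of the symmetric tensor [[jxx, jxy], [jxy, jyy]].
    root = np.sqrt((jxx - jyy)**2 + 4.0 * jxy**2)
    lam1 = 0.5 * (jxx + jyy + root)
    lam2 = 0.5 * (jxx + jyy - root)
    # Angle of the eigenvector belonging to lam1 (defined modulo pi).
    angle = 0.5 * np.arctan2(2.0 * jxy, jxx - jyy)
    coherency = ((lam1 - lam2) / (lam1 + lam2 + 1e-12))**2
    return angle, coherency
```

For an image of vertical stripes, the gradient points along the x axis, so the sketch returns an angle near 0 with coherency near 1 away from the border.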

An example of the structure tensor applied to test images is shown in thefollowing diagram.


Figure 1: Test Image and Structure Tensor Output

Here the orientation is plotted using a color scheme, i.e., regions of the same color have the same local orientation. Also, the saturation of the colors is proportional to the coherency.

One issue with the structure tensor is that when applied to our STM images, it results in an overall low coherence. This has to do with the fact that the features which we are trying to detect have very small scales and are not perfectly resolved. To remedy this situation we apply two different techniques for homogenizing our data which are described in the next sections.

2.4 PCA and Mean Diffused Orientation

One of these techniques is Principal Component Analysis (PCA). We take patches around the pixels in our image and associate each pixel with the histogram of the angles which occur in the patch. Given this data, we find the orthogonal directions of largest variance among all of the histograms (the Principal Components) and project each histogram orthogonally onto these directions.

Mathematically, we take a matrix whose columns consist of the histogram data and perform a singular value decomposition on this matrix. The singular vectors belonging to the largest singular values are then the Principal Components. This technique makes our data homogeneous enough for certain segmentation algorithms, such as the level set based Chan-Vese algorithm.
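The histogram and SVD steps can be sketched as follows. The function names, the number of bins and the decision to subtract the mean histogram before the decomposition are assumptions made for this illustration; the text above does not specify them.

```python
import numpy as np

def angle_histogram(patch_angles, n_bins=16):
    """Normalized histogram of orientation angles in one patch, modulo pi."""
    counts, _ = np.histogram(np.mod(patch_angles, np.pi),
                             bins=n_bins, range=(0.0, np.pi))
    return counts / max(counts.sum(), 1)

def pca_project(histograms, n_components=3):
    """histograms: array of shape (n_bins, n_pixels), one histogram per column."""
    # Assumption: center the data so the SVD measures variance about the mean.
    centered = histograms - histograms.mean(axis=1, keepdims=True)
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    components = U[:, :n_components]      # directions of largest variance
    # Orthogonal projection of every histogram onto the leading components.
    return components.T @ centered, S
```

Each pixel is then described by `n_components` coefficients instead of a full histogram, which is the lower-dimensional, more homogeneous data passed on to segmentation.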


Another approach that we take to homogenize our data is to again choose patches around each pixel, but instead of taking a histogram of orientation angles, we average the angles corresponding to pixels with coherency larger than a threshold value. In order to average the angles, we view them as complex numbers, add the complex numbers and take the argument of the result. Mathematically, if $P_q$ is the patch around the pixel $q$, then the new angle associated to $q$ is given by

$$a^*_q = \arg\Big(\sum_{p \in P_q,\ \mathrm{coh}(p) > t} e^{2 i a_p}\Big)$$

where $t$ is the coherency threshold and $a_p$ is the angle returned by the structure tensor at the point $p$. The angles given by the structure tensor are doubled since orientations which differ by an angle of $\pi$ should be identified.
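A direct, unoptimized sketch of this averaging is given below. The function name, the square patch shape, and the default `radius` and threshold `t` are hypothetical; the returned value is the argument of the sum exactly as in the formula, so it represents the doubled angle.

```python
import numpy as np

def mean_diffused_orientation(angle, coherency, radius=3, t=0.5):
    """Compute arg(sum of exp(2i a_p)) over coherent pixels of each patch."""
    # Pixels at or below the coherency threshold contribute nothing to the sum.
    z = np.where(coherency > t, np.exp(2j * angle), 0.0)
    h, w = angle.shape
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = z[max(i - radius, 0):i + radius + 1,
                      max(j - radius, 0):j + radius + 1]
            out[i, j] = np.angle(patch.sum())
    return out
```

Because the angles are doubled before summing, two pixels with orientations $a$ and $a + \pi$ reinforce each other rather than cancel.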

Both of these methods produce data which is homogeneous enough to be processed using Chan-Vese or Spectral Clustering, both of which are described in the next section.

3 Segmentation

Once we have preprocessed our data using the methods described above, our next step is to use segmentation algorithms to partition the images. This partitioning allows us to isolate regions where the molecules have adhered to the surface in a uniform way. Once we have isolated these regions we can obtain useful information using Fourier analysis and other methods which can be used to classify peptides. In this section we describe the segmentation algorithms which we use, namely Chan-Vese and Spectral Clustering.

3.1 Chan-Vese

The Chan-Vese model is a level set model for image segmentation. This means that the image is segmented via a function $\phi$ whose positive values indicate one region of the image and whose negative values indicate the complementary region of the image. The Chan-Vese model is formulated in terms of the minimization of the following functional

$$E(\phi, e_1, e_2) = \int_{\phi < 0} (f - e_1)^2 \, dA + \int_{\phi > 0} (f - e_2)^2 \, dA + \lambda \cdot \mathrm{length}(\phi = 0)$$

where $f$ is the image, $e_1$ fits $f$ to a constant on the exterior of the segmented region, $e_2$ fits $f$ to a constant on the interior of the segmented region, and $\lambda$ is a parameter of the model which determines how important the length of the segmenting curve is. The Chan-Vese model is designed for segmenting piecewise constant images, as is indicated by the roles of $e_1$ and $e_2$.

When implementing the functional described above, we use approximations of the Heaviside and Dirac delta functions to approximate the length of $\phi = 0$ and the regions $\phi > 0$ and $\phi < 0$. The actual functional which we minimize is

$$E(\phi, e_1, e_2) = \int_\Omega (f - e_1)^2 (1 - H_\varepsilon(\phi)) \, dA + \int_\Omega (f - e_2)^2 H_\varepsilon(\phi) \, dA + \lambda \int_\Omega \delta_\varepsilon(\phi) |D\phi| \, dA$$

where $H_\varepsilon$ and $\delta_\varepsilon$ are approximations to the Heaviside and delta functions, respectively[3].

The parameters $e_1$ and $e_2$ fit the data in each region to a constant. As a result, Chan-Vese works best for piecewise constant data. Our preprocessing techniques serve to change textural information into approximately piecewise constant images which can then be segmented by Chan-Vese.
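One way to minimize the regularized functional is explicit gradient descent on $\phi$, alternating with updates of $e_1$ and $e_2$. The sketch below is only an illustration under assumptions: the arctangent form of $H_\varepsilon$, the matching $\delta_\varepsilon$, the function names, and the values of `lam`, `dt` and `n_iter` are hypothetical choices rather than the settings used for our results.

```python
import numpy as np

def heaviside(phi, eps=1.0):
    """Smooth approximation H_eps of the Heaviside function."""
    return 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(phi / eps))

def delta(phi, eps=1.0):
    """Derivative of heaviside: a smooth approximation of the Dirac delta."""
    return (eps / np.pi) / (eps**2 + phi**2)

def curvature(phi, tiny=1e-8):
    """Divergence of the normalized gradient of phi."""
    py, px = np.gradient(phi)
    norm = np.sqrt(px**2 + py**2 + tiny)
    return np.gradient(px / norm, axis=1) + np.gradient(py / norm, axis=0)

def chan_vese(f, phi0, lam=0.01, dt=0.5, n_iter=300):
    """Gradient descent on the regularized two-phase Chan-Vese energy."""
    phi = phi0.astype(float).copy()
    for _ in range(n_iter):
        H = heaviside(phi)
        e1 = np.sum(f * (1.0 - H)) / np.sum(1.0 - H)   # constant fit where phi < 0
        e2 = np.sum(f * H) / np.sum(H)                 # constant fit where phi > 0
        # phi rises where f is closer to e2 and falls where it is closer to e1.
        force = (f - e1)**2 - (f - e2)**2 + lam * curvature(phi)
        phi += dt * delta(phi) * force
    return phi, e1, e2
```

The segmentation is read off as `phi > 0`. Increasing `lam` penalizes the length of the zero level set more strongly and so produces smoother boundaries.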

3.2 Spectral Clustering

Spectral Clustering is a non-local segmentation method. This means that it takes into account similarity information between regions that are possibly far apart in the original image. To make this precise, one generalizes traditional differential operators, the laplacian and gradient, to take into account similarities between pixels which may be far apart.

This is done by constructing a weighted graph $G$, whose vertices consist of each of the pixels in the image and whose weighted edges quantify the amount of similarity between regions surrounding each pixel. We introduce a distance function $d : \Omega \times \Omega \to \mathbb{R}$ which measures the distance between two pixels. This can be taken as the traditional distance between the pixel locations or it can be a measure of how similar the regions surrounding each pixel are, for example. We will later discuss distance functions which have produced good segmentation results for us.

Given a distance function we define the weight between two pixels to be $w(x, y) = e^{-d(x,y)/2\sigma^2}$ where $\sigma$ is a parameter. These weights make up the edges of the weighted graph $G$. Now we can generalize the traditional gradient as follows

$$\nabla_w u(x, y) = w(x, y)(u(x) - u(y))$$

where $\nabla_w u(x, \cdot)$ is the nonlocal gradient of $u$ at the point $x$. The nonlocal laplacian is defined as follows.

$$\Delta_w u(x) = \sum_{y \in \Omega} w(x, y)(u(y) - u(x))$$

Note that the nonlocal laplacian is a linear map from the set of functions on the image domain to itself, and since the image domain consists of a finite number of pixels, this map can be represented by a matrix which we will denote by $L$.

Through a series of calculations, one can show that the smallest eigenvalues and their corresponding eigenvectors of the matrix $L$ can yield approximate solutions to graph cut problems. These cuts cluster the nodes of the graph and thus provide a segmentation of our images[18]. One can also use these generalizations of the traditional differential operators to define functionals whose minimization is similar to TV and produces better results for certain types of images. This is not, however, necessary as a preprocessing step for our data.
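For a small number of nodes, a two-cluster cut can be sketched directly from a matrix of pairwise distances. The sketch assumes the unnormalized graph Laplacian $D - W$, which is the negative of the matrix of the nonlocal laplacian defined above, so that its eigenvalues are non-negative. The function name and the choice to threshold the second eigenvector at zero are assumptions made for this illustration.

```python
import numpy as np

def spectral_bipartition(dist, sigma=1.0):
    """dist: (n, n) symmetric array of pairwise distances d(x, y)."""
    W = np.exp(-dist / (2.0 * sigma**2))     # weights w(x, y)
    np.fill_diagonal(W, 0.0)                 # no self-loops
    L = np.diag(W.sum(axis=1)) - W           # D - W, positive semidefinite
    vals, vecs = np.linalg.eigh(L)           # eigenvalues in ascending order
    fiedler = vecs[:, 1]                     # eigenvector of second-smallest eigenvalue
    return fiedler > 0                       # sign pattern gives the two clusters
```

For a full image the dense matrix is too large to form, so in practice one would need a sparse or subsampled approximation; the sketch only shows the principle on a handful of nodes.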

4 Fourier transform Methods for Analyzing Segmented Images

To study the patterns from Atomic Force Microscope (AFM) images, Prokhorov et al. used the Fourier transform to determine the orientation of the objects in their images[22]. When analyzing our images, the orientation and spacing of the texture features are also important for gaining information that can help classify the peptides. This is our motivation for using the Fourier transform on our data after we have segmented it.

A patch of the STM image which contains a uniform texture is selected. In order to eliminate unwanted boundary effects we window the patch according to the following formula.

$$I_w(x, y) = I(x, y) \sin\left(\frac{\pi}{M - 1} x\right) \sin\left(\frac{\pi}{N - 1} y\right)$$

where $I$ is a patch of size $(M, N)$. The windowing process fades the border of the image, which removes unwanted effects due to the fact that the boundary is not periodic.

The orientation and spacing of the texture can now be determined based upon the Fourier transform of the patches. First, we locate the peaks in the Fourier transform. See Figure 2 for an example.

The orientation can be found by calculating the angle of each of these located frequencies. Next, the Euclidean norm of each frequency location gives the absolute frequency, and the period is the reciprocal of this frequency.

Finally, the distance between the lines is one of these periods (most likely the longest one).
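The windowing and peak-location steps can be sketched as follows for the single strongest peak. The function names are hypothetical, and subtracting the patch mean and discarding the zero-frequency bin are assumptions added so that the constant offset of the patch does not dominate the spectrum. The angle returned is the direction of the wave vector, which is perpendicular to the lines themselves.

```python
import numpy as np

def sine_window(patch):
    """Apply I_w(x, y) = I(x, y) sin(pi x / (M - 1)) sin(pi y / (N - 1))."""
    M, N = patch.shape
    x = np.arange(M)[:, None]
    y = np.arange(N)[None, :]
    return patch * np.sin(np.pi * x / (M - 1)) * np.sin(np.pi * y / (N - 1))

def dominant_frequency(patch):
    """Return (angle, period in pixels) of the strongest Fourier peak."""
    spectrum = np.abs(np.fft.fft2(sine_window(patch - patch.mean())))
    spectrum[0, 0] = 0.0                               # ignore residual DC term
    i, j = np.unravel_index(np.argmax(spectrum), spectrum.shape)
    fx = np.fft.fftfreq(patch.shape[0])[i]             # cycles per pixel, axis 0
    fy = np.fft.fftfreq(patch.shape[1])[j]             # cycles per pixel, axis 1
    angle = np.arctan2(fy, fx) % np.pi                 # wave-vector direction mod pi
    period = 1.0 / np.hypot(fx, fy)                    # reciprocal of |frequency|
    return angle, period
```

A real spectrum is symmetric about the origin, so each line pattern produces two peaks; reducing the angle modulo $\pi$ makes both peaks give the same answer.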

Thus, we can find the orientation and the distance between the lines present in a uniformly oriented patch. If a patch does not have uniform orientation, it is difficult to obtain meaningful angle and period information with the above procedure. This is because the different orientations generate multiple wave-like patterns that interfere with each other. Some methods, such as autocorrelation, are less susceptible to this, but they are still strongly affected by the interfering patterns.

To address this problem, the image is first segmented into different regions, each containing at most one orientation, and the Fourier transform is then performed in these regions to find the period and angle. The angle, however, will instead be based on the structure tensor because it gives a local orientation, which is more accurate.

5 Our Data, Method and Results

The STM images which we work with consist of two channels. One of the channels gives topographical information while the other channel, the polarizability image, gives information about the underlying chemical structure. Specifically, the topographical image consists of a plot of the actual height of the STM tip. The polarizability image is obtained by varying the tip-sample voltage at a very high frequency and measuring how much of this variation passes through to the tunneling current[10]. This gives a measure of how easily the underlying molecules can be polarized, since the varying voltage causes the electron clouds to deform, resulting in the varying tunneling current.

Figure 2: (a) Image with section marked. (b) Section of the image. (c) Fourier transform of the image section. The line artifact at the center is due to the nonperiodic nature of the border around the image section. (d) Windowing of the image section. (e) Fourier transform of the windowed image section. The line artifact is no longer present.

Below we show some of the images we worked with.

Figure 3: Topography and Polarizability Images

The width and height of each of these images is 150 Å. This high resolution allows us to distinguish individual molecules.

The topography image on the left clearly contains information on two different scales. The large scale ridges running across the image from bottom left to top right constitute the large scale structure of the images, while the fine features oriented orthogonal to the ridges constitute the texture. We believe that each small line in the texture separates individual molecules layered on the surface. As a result, one of our goals is to measure the spacing and orientation of the texture patterns, since this would give us information about the spacing and orientation of the molecules on the surface.

Below we show two more of the images we worked with.


Figure 4: Topography and Polarizability Images

In these images we see two different regions. On the middle stripe the texture patterns are oriented differently from the outer edges. Segmenting the image would ideally separate these two regions, allowing us to analyze them independently. This would give us useful texture orientation in both regions which could aid in classification.

Both of these examples demonstrate that separating the different scale features as well as segmenting the resulting images are important steps in obtaining information from these STM images which could ultimately lead to automatic classification. Our method is a process which produced good results for both of these problems when applied to our images.

5.1 Results of Texture + Structure Decomposition

As previously mentioned, the first step in our method for extracting useful classification information from STM images was to separate structure from texture. To do this we implemented the G-norm TV minimization model using split-Bregman[12]. We obtained the following results when applying this method to the first example image above.

Figure 5: Original Image = Structure + Texture + Noise

These results demonstrate that the G-norm TV method performs well in separating the large and small scale features of our images. This solves the first problem mentioned above and provides the first step for the segmentation. We now consider the structure and texture images separately and focus on the segmentation and analysis of the texture images.

5.2 Results of Structure Tensor applied to the Images

Our next step was to apply the structure tensor to each of the texture images. The motivation for this is that the distinct regions in our images differ primarily by the orientation of the texture. Thus the structure tensor should return different angles on each of these different regions, producing an ideally piecewise constant image which could then be segmented by Chan-Vese. The result of this process is shown in the following two images.

Figure 6: Texture and Structure Tensor

Here we clearly see that a large portion of the image has overall very low coherency. This proves to be a problem when attempting to apply Chan-Vese, since it makes the images very inhomogeneous. The next two possible steps in our method address this problem by smoothing the results of the structure tensor.

5.3 PCA and Mean Diffused Orientation

As previously mentioned, the inhomogeneity of the structure tensor images impedes the success of the Chan-Vese and Spectral Clustering algorithms for segmentation. The two methods which we used to address this problem, Mean Diffused Orientation and PCA, are both useful in different situations. In a data set with large regions of uniformly low coherency, these regions are often filled in by Mean Diffused Orientation, as can be seen in the following example.

Figure 7: (a) Structure Tensor Image, (b) PCA, (c) Mean Diffused Orientation

Here we clearly see that the low coherency region in the upper left corner of the image is filled in by Mean Diffused Orientation (c), whereas PCA does a better job of distinguishing this region (b).

However, images with frequent, small, extraneous holes are handled better by Mean Diffused Orientation than by PCA. This is because PCA does not remove these holes, whereas Mean Diffused Orientation does, as is demonstrated by the following example.

Figure 8: PCA, Mean Diffused Orientation

It is apparent that PCA did not remove the holes in the structure tensor image to a great enough degree for the image to be segmented using Chan-Vese. On the other hand, Mean Diffused Orientation did a much better job of homogenizing the structure tensor image. Because both of these methods produce good results for different types of images we use both of them in our analysis of STM images.

5.4 Segmentation

Segmentation lies at the heart of our method for extracting useful information from STM images. This is because once we have segmented the images we can use methods such as the Fourier transform on each of the pieces to extract information about orientation and spacing which could be potentially useful in classifying the imaged peptides. Much like the two methods in the preceding section, the two segmentation methods which we use, Chan-Vese and Spectral Clustering, are useful in different situations.

The Chan-Vese model is more useful when the smoothness of the segmentation curve is important. However, Chan-Vese has problems with data sets that are not piecewise constant. Also, segmentation into more than two regions is more difficult with Chan-Vese than with Spectral Clustering. The results of both Chan-Vese and Spectral Clustering on the images produced by PCA and Mean Diffused Orientation, respectively, are shown below.

Figure 9: Chan-Vese, Spectral Clustering 2 Clusters, Spectral Clustering 3 Clusters

5.5 Spectral Clustering Metrics

Since the results of Spectral Clustering depend critically upon the distance measure which is used to compare patches around the pixels, we give a description of the distance measures which have given us the best results. There are three metrics which have given us good results. The first is a simple comparison of the structure tensor angles. However, the angles returned by the structure tensor must be doubled and taken modulo $2\pi$. We achieve this by doubling the angles and mapping them onto the unit circle. Then we compute the distance by taking the magnitude of the difference of the complex numbers associated to each angle.
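This first metric amounts to a one-line computation. In the sketch below the function names are hypothetical; the second function simply evaluates the same distance for every pair of entries in a vector of angles.

```python
import numpy as np

def orientation_distance(a, b):
    """|exp(2ia) - exp(2ib)|: zero when a and b differ by a multiple of pi."""
    return np.abs(np.exp(2j * a) - np.exp(2j * b))

def pairwise_orientation_distance(angles):
    """Matrix of distances between all pairs of entries of a 1-D angle array."""
    z = np.exp(2j * np.asarray(angles))
    return np.abs(z[:, None] - z[None, :])
```

The distance ranges from 0, for identical orientations, to 2, for orientations that are perpendicular to each other.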

Another metric which we have found useful is computed as follows. We take a patch surrounding each of the pixels and create a histogram of angles given by the structure tensor in each patch. Then we take all of the histograms associated to each of the patches about our pixels and perform PCA on them. We then take the first three principal components and use their coefficients to compute the distance between our pixels.

The third metric which we have successfully applied is somewhat similar to the previous one. We again take a patch surrounding each of the pixels and create a histogram of angles given by the structure tensor in each patch. However, now we compute the Wasserstein or optimal transport distance between the histograms to give us the distance between each of our pixels.

Using other metrics one may be able to improve upon our segmentation results, and this is a possible direction for future research.

5.6 Analysis of the Segmented Regions

Now that we have a decent segmentation of our images, we are able to choose pieces of the segmented regions and analyze them using the Fourier transform methods described in Prokhorov et al. This gives us accurate information about the orientation and spacing of the peptides. Below is an example of our application of the Fourier transform to a window selected through our segmentation.

Figure 10: Window, Fourier transform

From these peaks we can clearly determine the orientation of the patterns in the windowed region and the spacing of the peptides using the techniques described in section 4.

6 Conclusion

In conclusion we have found a successful process for applying known image processing methods to segment and analyze STM images of peptides. Below is a flowchart which summarizes our method.

Figure 11: Summary

7 Appendix

In this section we describe in more detail some of the more mathematically advanced methods that we used, such as the G-norm TV minimization model. In addition we give more detailed descriptions of some of the numerical techniques that we used to solve minimization problems such as TV minimization. We also provide a description of the Wasserstein metric which we successfully applied in combination with Spectral Clustering.

7.1 Wasserstein Distance

The Wasserstein distance uses probability and cumulative distribution functions (PDF and CDF, respectively) to create a metric that quantifies the similarity between two patches of an image.

We begin by dividing the image into n×n patches. For each of these patches, we create a histogram in which each pixel of the patch is binned according to its intensity value. We then create a probability distribution for each of these histograms by normalizing them, or in other words, dividing the number of elements in each of the bins by the overall number of elements in the histogram.

Having done this, we then use the method seen in Ni et al.[20], where the optimal transport cost between the distributions $\mu$ and $\nu$ of two patches,

$$T_p(\mu, \nu) = \int_0^1 |F^{-1}(x) - G^{-1}(x)|^p \, dx \qquad (1)$$

is modified to result in the Wasserstein distance,

$$W(\mu, \nu) = \int_{\mathbb{R}} |F(x) - G(x)| \, dx \qquad (2)$$

where $F$ and $G$ are the cumulative distribution functions of $\mu$ and $\nu$, and $F^{-1}$ and $G^{-1}$ are their inverses.

As can be seen, by letting $p = 1$, we are able to avoid using the inverse CDFs, and after discretizing (2), we are left with a relatively simple equation to implement.
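A discretization of (2) on a common set of bins can be sketched as follows. The function names, the number of bins and the intensity range are assumptions for this illustration, and the patch is assumed to contain at least one value inside the chosen range.

```python
import numpy as np

def patch_distribution(patch, n_bins=32, value_range=(0.0, 1.0)):
    """Normalized intensity histogram of a patch (a discrete PDF)."""
    counts, _ = np.histogram(patch, bins=n_bins, range=value_range)
    return counts / counts.sum()

def wasserstein_1(p, q, bin_width=1.0):
    """Discrete form of W = integral of |F - G|, with F and G the two CDFs."""
    F = np.cumsum(p)
    G = np.cumsum(q)
    return bin_width * np.sum(np.abs(F - G))
```

For two histograms that each place all of their mass in a single bin, the result is the number of bins separating them times the bin width, which matches the intuition of moving mass from one location to the other.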

The following section introduces the G-norm and explains its use in separating structure from texture.

7.2 G-norm TV

The G-norm model introduces a new type of norm which is designed to separate texture from noise. In effect this new norm has the property that functions which are highly oscillatory, and thus should represent texture, have small norm whereas random noise has a high norm. This is then used to define a functional whose minimization gives the desired decomposition into structure, texture, and noise. We begin by introducing the G-norm.

Let $\Omega \subset \mathbb{R}^2$ be our image domain. Note that if a function $f \in L^1(\Omega)$ is differentiable and vanishes on $\partial\Omega$ then integration by parts yields

$$\int_\Omega \langle \phi, \nabla f \rangle \, dA = -\int_\Omega (\nabla \cdot \phi) \cdot f \, dA$$

where $\phi$ is a differentiable vector field $\phi : \mathbb{R}^2 \to \mathbb{R}^2$. Now we define the total variation of a function $f \in L^1(\Omega)$ by

$$V(f) = \sup_{\phi \in C^1(\Omega, \mathbb{R}^2),\ \|\phi\|_{L^\infty} \le 1} \int_\Omega ((\nabla \cdot \phi) \cdot f) \, dA.$$

For differentiable functions $f$ which vanish on $\partial\Omega$, this map $V(f)$ represents $\int_\Omega |\nabla f| \, dA$ where $\nabla f$ is the gradient of $f$. This can be seen from the above integration by parts formula since $|\langle \phi(x), \nabla f(x) \rangle| \le \|\phi\|_{L^\infty} |\nabla f(x)|$ with equality if and only if $\phi$ has constant length and points in the same direction as $\nabla f$ at every point. The given definition of $V$ extends this notion of total variation to all functions $f \in L^1(\Omega)$. The space $BV$, or bounded variation, is defined to be the subspace of $L^1(\Omega)$ consisting of functions $f$ satisfying $V(f) < \infty$. One can verify that this is indeed a Banach space under the norm $\|f\|_{BV} = \|f\|_{L^1} + V(f)$.

Now we consider the dual space to $BV$. Denote this space by $BV^*$. Then there is a subspace $G$ of $BV^*$ consisting of linear functionals $q$ of the form $q(f) = \int_\Omega f \cdot (\nabla \cdot \phi) \, dA$ where $\phi$ is a differentiable vector field; we will say that $\phi$ represents $q$. Then from the definition of $V$ we see that $|q(f)| \le V(f) \|\phi\|_{L^\infty} \le \|f\|_{BV} \cdot \|\phi\|_{L^\infty}$. This shows that $\|q\|_G \le \|\phi\|_{L^\infty}$ for any $\phi$ representing $q$, where $\|\cdot\|_G$ is the norm dual to the BV norm. We have in fact $\|q\|_G = \inf_\phi \|\phi\|_{L^\infty}$ where the infimum is taken over all $\phi$ representing $q$.

As previously mentioned, the space $G$ is important because it is useful for separating texture from noise. The linear functional $q$ can be identified with $g = \nabla \cdot \phi$ where $\phi$ represents $q$. Note that $q$ determines $g$ up to a set of measure 0 and thus $G$ can be identified with a subspace of $L^1(\Omega)$. It turns out that under this identification, functions with small G-norm represent highly oscillatory patterns whose average over all of $\Omega$ is close to 0. This can be motivated by noticing that given a regular, oscillatory pattern $g$ which averages to 0 on all of $\Omega$, i.e., $g$ oscillates regularly between positive and negative values, there will be a large amount of cancellation in the integral $\int_\Omega f g \, dA$ if $f$ is a slowly changing function, i.e., a function of small BV norm. Thus the G-norm TV model proposes to use the space $G$ to separate texture from noise. To do this, the following functional is minimized

$$E(u, v) = \int_\Omega |\nabla u| \, dA + \lambda \|v\|_G + \mu \|f - u - v\|_{L^2}$$

where $f$ is the original image, $u$ is the large scale structure, $v \in G$ is the texture, and $\lambda$ and $\mu$ are parameters of the model, which depend upon the scale of the structure and texture and the amount of noise[13].

The next section describes the numerical scheme used in practice when solving TV, G-norm TV, and nonlocal TV minimization.

7.3 Bregman and Split-Bregman

The method of Bregman iteration arises when minimizing a convex functional $E(u)$ subject to the constraint $G(u) = 0$ where $G$ is a non-negative convex functional which equals 0 somewhere. One method would be to iteratively minimize $E(u) + \lambda_i \|G(u)\|^2$ with $\lambda_i \to \infty$. However, such a method is often impractical since this minimization becomes difficult as $\lambda_i$ gets very large.

An alternative is to use Bregman iteration, where one chooses an initial $u_0$ and $p_0$ and iteratively minimizes

$$D^{p_i}_{E,u_i}(u, u_i) + \lambda G(u). \qquad (3)$$

Here $D^{p_i}_{E,u_i}(u, u_i) = E(u) - E(u_i) - \langle p_i, u - u_i \rangle$ is the Bregman distance between $u$ and $u_i$ evaluated at the element $p_i$ in the subdifferential of $E$ at $u_i$. The Bregman distance is a measure of how well the element $p_i$ in the subdifferential of $E$ at $u_i$ approximates the change in $E$ from $u_i$ to $u$. Thus a small Bregman distance means that $p_i$ approximates the change in $E$ from $u_i$ to $u$ very well.

It remains to determine the evolution of the sequence of elements $p_i$. $p_0$ can be taken to be anything in the subdifferential of $E$ at $u_0$, for example $\nabla E(u_0)$, which represents the first variation of the functional $E$, evaluated at $u_0$. Since $u_{i+1}$ minimizes $D^{p_i}_{E,u_i}(u, u_i) + \lambda G(u)$ we have that 0 is in the subdifferential of $D^{p_i}_{E,u_i}(u, u_i) + \lambda G(u)$ at $u_{i+1}$. Thus $p_{i+1} = p_i - \lambda \nabla G(u_{i+1})$ is in the subdifferential of $E$ at $u_{i+1}$, since $D^{p_i}_{E,u_i}(u, u_i) = E(u) - E(u_i) - \langle p_i, u - u_i \rangle$, $E(u_i)$ is constant with respect to $u$ and $p_i$ is clearly in the subdifferential of $\langle p_i, u - u_i \rangle$ at every $u$. This allows us to let $p_{i+1} = p_i - \lambda \nabla G(u_{i+1})$ in our iterative scheme ($p_i$ should be in the subdifferential of $E$ at $u_i$ to prevent large jumps in the iterates $u_i$, which is necessary to guarantee convergence).

The motivation for this scheme is the following. In each step we are minimizing $G(u)$ with the additional constraint that $p_i$ approximate the change in $E$ as well as possible. But we also iteratively defined $p_i$ to be close (at least in the first iterations) to $-\nabla G(u_i)$. So because the functional derivative of $G$ approximates the change in $E$ in our step well, we ensure that through minimizing $G$ we are also minimizing $E$ as much as possible. It can be proven that under suitable conditions on $E$ and $G$ Bregman iteration converges. In this case, we must have $G(u) = 0$.
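As a toy illustration of the scheme, consider the assumed choices $E(u) = \|u\|_1$ and $G(u) = \frac{1}{2}\|u - f\|_2^2$, which are not the functionals used for our images but admit a closed-form minimizer of (3) by soft thresholding. The function names and the values of `lam` and `n_iter` below are hypothetical.

```python
import numpy as np

def shrink(x, t):
    """Soft thresholding: the minimizer of |u| + (1 / (2 t)) (u - x)^2."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def bregman_l1(f, lam=1.0, n_iter=50):
    """Bregman iteration for E(u) = ||u||_1 and G(u) = 0.5 ||u - f||^2."""
    u = np.zeros_like(f)
    p = np.zeros_like(f)          # p_0 = 0 is in the subdifferential of E at u_0 = 0
    history = []
    for _ in range(n_iter):
        # Closed-form minimizer of E(u) - <p, u> + lam * G(u).
        u = shrink(f + p / lam, 1.0 / lam)
        # p_{i+1} = p_i - lam * grad G(u_{i+1}), with grad G(u) = u - f.
        p = p - lam * (u - f)
        history.append(0.5 * np.sum((u - f)**2))   # value of G(u_i)
    return u, history
```

In this toy problem the values $G(u_i)$ recorded in `history` never increase and reach 0 after finitely many steps, so the iterates converge to the constraint set $G(u) = 0$.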

Lemma 7.1. If Bregman iteration converges, $G$ is continuous and $E$ is Hölder continuous for some $\alpha > 0$, then it must converge to a function $u$ satisfying $G(u) = 0$.

Proof. Note that we can assume without loss of generality that $\lambda = 1$ by replacing $G(u)$ by $\lambda G(u)$ if necessary, as this affects neither the continuity of $G$ nor the conclusion that $G(u) = 0$. Assume that $u_i \to u$. Then since $G$ is continuous we have that $G(u_i) \to G(u)$. I claim that $G(u_{i+1}) \le G(u_i)$ for all $i$. This follows since $u_{i+1} = \arg\min_u D^{p_i}_{E,u_i}(u, u_i) + G(u)$ and $D^{p_i}_{E,u_i}(u, u_i) \ge 0$ since $p_i$ is in the subdifferential of $E$ at $u_i$. Hence if $G(u_{i+1}) > G(u_i)$ we would have that $D^{p_i}_{E,u_i}(u_i, u_i) + G(u_i) = G(u_i) < D^{p_i}_{E,u_i}(u_{i+1}, u_i) + G(u_{i+1})$. This contradicts the minimality of $u_{i+1}$ and hence we have that $G(u_{i+1}) \le G(u_i)$.

Now note that if $p$ is in the subdifferential of $E$ at $u_i$ then $\langle p, u - u_i \rangle \le E(u) - E(u_i)$ by the definition of subdifferential. By replacing $u$ with $2u_i - u$ if necessary we may assume that $\langle p, u - u_i \rangle \ge 0$. This proves that $|\langle p, u - u_i \rangle| \le \max(|E(u) - E(u_i)|, |E(2u_i - u) - E(u_i)|)$. Now since $E$ is Hölder continuous we get $\max(|E(u) - E(u_i)|, |E(2u_i - u) - E(u_i)|) \le C\|u - u_i\|^\alpha$ for some constant $C$. Thus $|\langle p, u - u_i \rangle| \le C\|u - u_i\|^\alpha$. Now $|D^{p}_{E,u_i}(u, u_i)| \le |E(u) - E(u_i)| + |\langle p, u - u_i \rangle| \le 2C\|u - u_i\|^\alpha$.

We also have that since the $u_i$ converge, they must be bounded. Let $B$ be a bound for the $\|u_i\|$.

Now assume that $G(u) = t > 0$. Thus we have that $G(u_i) \ge t$ for all $i$.

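Since $G$ achieves the value 0 there exists a $v$ such that $G(v) = 0$. Replace $B$ by $\max(B, \|v\|)$ if necessary. Now choose $0 < \varepsilon < \left( \frac{t}{4C(2B)^\alpha} \right)^{1-\alpha}$. Note that we have that $2C(2B\varepsilon)^\alpha < \frac{t\varepsilon}{2}$. Since $G(u_i) \to G(u) = t$ we can find an $i$ such that $G(u_i) < t + \frac{t\varepsilon}{2}$. Set $q = (1 - \varepsilon)u_i + \varepsilon v$. Note that since $G$ is convex we have that $G(q) \le (1 - \varepsilon)G(u_i)$ since $G(v) = 0$. Now we have that $D^{p_i}_{E,u_i}(q, u_i) + G(q) \le G(u_i) - \varepsilon G(u_i) + 2C(2B\varepsilon)^\alpha$ since $\|u_i - q\| \le 2B\varepsilon$. Since $G(u_i) \ge G(u) = t$ we have the bound

\[ D^{p_i}_{E,u_i}(q, u_i) + G(q) < t + \frac{t\varepsilon}{2} - t\varepsilon + \frac{t\varepsilon}{2} = t. \]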

Now since $G(u_{i+1}) \ge t$ and the Bregman distance is non-negative we have that

\[ D^{p_i}_{E,u_i}(q, u_i) + G(q) < D^{p_i}_{E,u_i}(u_{i+1}, u_i) + G(u_{i+1}). \]

This contradicts the definition of $u_{i+1}$ as the minimizer of $D^{p_i}_{E,u_i}(u, u_i) + G(u)$. This contradiction is the result of the assumption that $t > 0$, hence we have proven that $t = 0$ as desired.

Note that we can replace the assumption that $E$ is Hölder continuous by the assumption that $E$ is Hölder continuous within the region $\{x : \|x\| \le B\}$. Weakening the assumption to Hölder continuity within a bounded region allows the application of this lemma to a wide class of convex functions, for example all functions of the form $E(f) = \|Af - a\|_1$ or $E(f) = \|Af - a\|_2^2$ where $A$ is linear. Functionals of this type will be examined in the remainder of this section.

In the case where $G$ is of the form $G(f) = \|Bf - b\|_2^2$ with $B$ linear, the Bregman iteration can be simplified.

Thm 7.1. If $G$ is of the form $G(f) = \|Bf - b\|_2^2$, then the Bregman iteration is equivalent to

\[
\begin{aligned}
u_{i+1} &= \arg\min_u E(u) + \lambda\|Bu - b_i\|_2^2, \\
b_{i+1} &= b_i + b - Bu_{i+1}.
\end{aligned}
\]

Proof. We have that $-\nabla G(f) = -2B^T(Bf - b)$, so the $p_i$ evolve according to $p_{i+1} = p_i - 2\lambda B^T(Bu_{i+1} - b)$. Now let $q_0$ be chosen so that $p_0 = 2\lambda B^T q_0$ (this is always possible by choosing a suitable $u_0$, for example choosing $u_0$ to be the minimum of $E$, in which case $p_0 = 0$ is in the subdifferential and we can choose $q_0 = 0$) and let the $q_i$ evolve according to $q_{i+1} = q_i + b - Bu_{i+1}$. I claim that $p_i = 2\lambda B^T q_i$. This can easily be seen by induction, since $p_0 = 2\lambda B^T q_0$ and if $p_i = 2\lambda B^T q_i$, then $2\lambda B^T q_{i+1} = 2\lambda B^T q_i + 2\lambda B^T(b - Bu_{i+1}) = p_i - 2\lambda B^T(Bu_{i+1} - b) = p_{i+1}$.

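Now the Bregman minimization becomes $u_{i+1} = \arg\min_u E(u) + \lambda\|Bu - (b + q_i)\|_2^2$. This can be seen as follows:

\[
\begin{aligned}
D^{p_i}_{E,u_i}(u, u_i) + \lambda G(u) &= E(u) - E(u_i) - \langle p_i, u - u_i \rangle + \lambda\|Bu - b\|_2^2 \\
&= E(u) - E(u_i) - \langle p_i, u - u_i \rangle + \lambda\langle Bu - b, Bu - b \rangle \\
&= E(u) - E(u_i) - \langle 2\lambda B^T q_i, u - u_i \rangle + \lambda\langle Bu, Bu \rangle - 2\lambda\langle Bu, b \rangle + \lambda\langle b, b \rangle \\
&= A + E(u) + \lambda\left( \langle Bu, Bu \rangle - 2\langle B^T q_i, u \rangle - 2\langle Bu, b \rangle + \langle b + q_i, b + q_i \rangle \right) \\
&= A + E(u) + \lambda\left( \langle Bu, Bu \rangle - 2\langle Bu, b + q_i \rangle + \langle b + q_i, b + q_i \rangle \right) \\
&= A + E(u) + \lambda\langle Bu - (b + q_i), Bu - (b + q_i) \rangle \\
&= A + E(u) + \lambda\|Bu - (b + q_i)\|_2^2.
\end{aligned}
\]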

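Here $A = \lambda\langle b, b \rangle - \lambda\langle b + q_i, b + q_i \rangle - E(u_i) + 2\lambda\langle B^T q_i, u_i \rangle$ is independent of $u$. Hence minimizing $E(u) + \lambda\|Bu - (b + q_i)\|_2^2$ is equivalent to the original problem of minimizing $D^{p_i}_{E,u_i}(u, u_i) + \lambda G(u)$. Now let $b_i = b + q_i$ and note that the $b_i$ evolve according to $b_{i+1} = b_i + b - Bu_{i+1}$. Thus the Bregman iteration reduces to the iterative scheme

\[
\begin{aligned}
u_{i+1} &= \arg\min_u E(u) + \lambda\|Bu - b_i\|_2^2, \\
b_{i+1} &= b_i + b - Bu_{i+1},
\end{aligned}
\]

as desired.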
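The equivalence can be checked numerically on a toy problem. The sketch below assumes $E(u) = \frac{1}{2}\|u - c\|^2$ for a hand-picked vector `c`, so that $u_0 = c$ minimizes $E$ and $p_0 = 0$, $q_0 = 0$, $b_0 = b$; the matrix `B`, the data `b` and all variable names are illustrative. It runs the subgradient form and the simplified form side by side and compares the iterates.

```python
import numpy as np

B = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])
c = np.array([0.5, -1.0, 2.0])        # E(u) = 0.5 * ||u - c||^2 (assumed example)
lam, n_iter = 1.0, 60
n = B.shape[1]
M = np.eye(n) + 2.0 * lam * B.T @ B   # system matrix shared by both forms

# Original form: carry the subgradient p_i, starting from p_0 = 0.
p = np.zeros(n)
u_p_iterates = []
for _ in range(n_iter):
    u = np.linalg.solve(M, c + p + 2.0 * lam * B.T @ b)
    p = p - 2.0 * lam * B.T @ (B @ u - b)
    u_p_iterates.append(u)

# Simplified form: carry b_i instead, starting from b_0 = b + q_0 = b.
b_i = b.copy()
u_b_iterates = []
for _ in range(n_iter):
    u = np.linalg.solve(M, c + 2.0 * lam * B.T @ b_i)  # argmin E(u) + lam*||Bu - b_i||^2
    b_i = b_i + b - B @ u                               # b_{i+1} = b_i + b - B u_{i+1}
    u_b_iterates.append(u)

max_gap = max(np.max(np.abs(x - y)) for x, y in zip(u_p_iterates, u_b_iterates))
# Closest point to c on {u : Bu = b}, i.e. the constrained minimizer of E.
u_constrained = c + np.linalg.pinv(B) @ (b - B @ c)
print(max_gap, u_b_iterates[-1], u_constrained)
```

The two sequences agree up to rounding, and for this quadratic $E$ the limit is the point of $\{u : Bu = b\}$ closest to `c`, which is the constrained minimizer.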

Now, since the $u_i$ converge to a function $u$ satisfying $Bu = b$, we can easily prove that this limit minimizes $E$ subject to the constraint.

Thm 7.2. If $G$ is of the form $G(f) = \|Bf - b\|_2^2$ and $E$ is Hölder continuous on every bounded set, then if Bregman iteration converges, it converges to a minimizer of $E$ under the constraint $G(u) = 0$.

Proof. By Lemma 7.1 we have that if Bregman iteration converges to $u$, then $G(u) = 0$. Hence if $u$ is the limit, then $Bu = b$. Now if $u^*$ were a true minimizer then $Bu^* = b = Bu$ and thus $\|Bu - b_i\|_2^2 = \|Bu^* - b_i\|_2^2$. Using the reformulation of Bregman iteration outlined in Thm 7.1, we have that $u$ minimizes $E(u) + \lambda\|Bu - b_i\|_2^2$. Thus $E(u) + \lambda\|Bu^* - b_i\|_2^2 = E(u) + \lambda\|Bu - b_i\|_2^2 \le E(u^*) + \lambda\|Bu^* - b_i\|_2^2$. Thus $E(u) \le E(u^*)$ and $u$ minimizes $E$ subject to the constraint $Bu = b$.

Bregman iteration is particularly useful in solving unconstrained minimization problems of the form $\min_u \|Au\|_1 + \|Bu - b\|^2$, which occur in TV denoising and G-norm TV denoising. To see how Bregman can be applied to solve such a minimization problem we first introduce a new variable $d$ and reformulate the problem as $\min_{u,d} \|d\|_1 + \|Bu - b\|^2$ under the constraint $Au = d$. If we define $E(u, d) = \|d\|_1 + \|Bu - b\|^2$ and $G(u, d) = \|Au - d\|_2^2$, then $E$ and $G$ are convex functionals and the Bregman iteration can be applied to solve this constrained optimization problem, which is clearly equivalent to the original unconstrained optimization problem.

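Now since $G$ has the form required by Thm 7.1 with respect to the joint variable $(u, d)$, when we make the simplification described above we obtain the steps $(u_{k+1}, d_{k+1}) = \arg\min_{u,d} \|d\|_1 + \|Bu - b\|^2 + \lambda\|d - Au - b_k\|_2^2$ and $b_{k+1} = b_k + Au_{k+1} - d_{k+1}$. Here $b_k$ denotes the auxiliary Bregman variable, not to be confused with the data $b$.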

The power in this method comes from the fact that the minimization problem $(u_{k+1}, d_{k+1}) = \arg\min_{u,d} \|d\|_1 + \|Bu - b\|^2 + \lambda\|d - Au - b_k\|_2^2$ can be easily solved by iteratively minimizing with respect to $u$ and $d$ separately. We first minimize with respect to $u$, which gives a convex $L^2$ minimization problem that can be solved using linear algebra methods, such as Gauss-Seidel. The minimization with respect to $d$ is fast because the minimizer of $\min_d \|d\|_1 + \lambda\|d - f\|_2^2$ is known in closed form: $d = \mathrm{shrink}(f, \frac{1}{2\lambda})$, where $\mathrm{shrink}(f, \varepsilon) = \mathrm{sgn}(f) \cdot \max(|f| - \varepsilon, 0)$ is applied componentwise. Here we followed the work of Goldstein and Osher [14].
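A minimal sketch of this alternating scheme for 1D TV denoising is given below. It is an illustration under stated assumptions rather than the implementation used for the results in this paper: $A$ is taken to be a forward-difference matrix `D`, $B$ the identity, a fidelity weight `mu` is added to the data term, the $u$-step uses a direct linear solve in place of Gauss-Seidel, and the function names and parameter values are hypothetical. The threshold $\frac{1}{2\lambda}$ in the $d$-step comes from setting the derivative of $|d| + \lambda(d - f)^2$ to zero.

```python
import numpy as np

def shrink(f, eps):
    """Soft thresholding: sgn(f) * max(|f| - eps, 0), applied componentwise."""
    return np.sign(f) * np.maximum(np.abs(f) - eps, 0.0)

def split_bregman_tv_1d(f, mu=1.0, lam=2.0, n_iter=300):
    """Approximately minimize ||D u||_1 + mu * ||u - f||^2 for a 1D signal f."""
    n = f.size
    D = np.diff(np.eye(n), axis=0)        # forward differences, shape (n-1, n)
    M = mu * np.eye(n) + lam * D.T @ D    # matrix of the quadratic u-subproblem
    u = f.copy()
    d = np.zeros(n - 1)
    bk = np.zeros(n - 1)                  # auxiliary Bregman variable b_k
    for _ in range(n_iter):
        # u-step: minimize mu*||u - f||^2 + lam*||d - D u - bk||^2
        u = np.linalg.solve(M, mu * f + lam * D.T @ (d - bk))
        # d-step: minimize ||d||_1 + lam*||d - (D u + bk)||^2
        d = shrink(D @ u + bk, 1.0 / (2.0 * lam))
        # Bregman update: b_{k+1} = b_k + D u_{k+1} - d_{k+1}
        bk = bk + D @ u - d
    return u

def tv(x):
    """Discrete total variation of a 1D signal."""
    return float(np.sum(np.abs(np.diff(x))))

rng = np.random.default_rng(0)
clean = np.concatenate([np.zeros(50), np.ones(50)])
noisy = clean + 0.1 * rng.standard_normal(clean.size)
denoised = split_bregman_tv_1d(noisy)
print(tv(noisy), tv(denoised))
```

On this synthetic step signal the total variation of the output is much smaller than that of the noisy input while the jump is retained. For images, `D` would be replaced by a discrete gradient and the dense solve by an iterative method such as Gauss-Seidel.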

References

[1] Bertozzi, Andrea L., and Arjuna Flenner. “Diffuse Interface Models on Graphs for Classification of High Dimensional Data.” (2012): 1-29. Print.

[2] Bresson, Xavier. “A Short Guide on a Fast Global Minimization Algorithm for Active Contour Models.” (2009): 1-19. Print.

[3] Brown, Ethan S., Tony F. Chan, and Xavier Bresson. “A Convex Approach for Multi-phase Piecewise Constant Mumford-Shah Image Segmentation.” (n.d.): 1-31. Print.

[4] Brox, T., J. Weickert, B. Burgeth, and P. Mrazek. “Nonlinear Structure Tensors.” Image and Vision Computing 24.1 (2006): 41-55. Print.

[5] Buades, A., B. Coll, and J. Morel. “A Non-Local Algorithm for Image Denoising.” Computer Vision and Pattern Recognition 2 (2005): 20-25. Print.

[6] Chan, Tony, Selim Esedoglu, and Kangyu Ni. “Histogram Based Segmentation Using the Wasserstein Distance.” (n.d.): 1-12. Print.

[7] Chan, T., Y. B. Sandberg, and L.A. Vese. “Active Contours without Edges for Vector-Valued Images.” Journal of Visual Communication and Image Representation 11.2 (2000): 130-41. Print.

[8] Chan, Tony F., and Luminita A. Vese. “Active Contours Without Edges.” IEEE Transactions on Image Processing 10.2 (2001): 266-77. Print.

[9] Chen, C. Julian. “Introduction to Scanning Tunneling Microscopy Second Edition.” (2008): 23-40. Print.

[10] Claridge, Shelley A., Jeffrey J. Schwartz, and Paul S. Weiss. “Electrons, Photons, and Force: Quantitative Single-Molecule Measurements from Physics to Biology.” ACS Nano 5.2 (2011): 693-729. Print.

[11] Cyr, Donna M., Bhawani Venkataraman, George W. Flynn, Andrew Black, and George M. Whitesides. “Functional Group Identification in Scanning Tunneling Microscopy of Molecular Adsorbates.” The Journal of Physical Chemistry 100.32 (1996): 13747-3759. Print.

[12] Gilles, Jerome. “The Bregman CookBook.” (2011): 1-21. Print.

[13] Gilles, Jerome, and Stanley Osher. “Bregman Implementation of Meyer’s G-norm for Cartoon + Textures Decomposition.” (n.d.): 1-5. Print.

[14] Goldstein, Tom, and Stanley Osher. “The Split Bregman Method for L1-Regularized Problems.” SIAM Journal on Imaging Sciences 2.2 (2009): 323. Print.

[15] Kokkinos, I., G. Evangelopoulos, and P. Maragos. “Texture Analysis and Segmentation Using Modulation Features, Generative Models, and Weighted Curve Evolution.” IEEE Transactions on Pattern Analysis and Machine Intelligence 31.1 (2009): 142-57. Print.

[16] Kothe, Ullrich. “Edge and Junction Detection with an Improved Structure Tensor.” (n.d.): n. pag. Print.

[17] Lindenbaum, M., M. Fischer, and A. Bruckstein. “On Gabor’s Contribution to Image Enhancement.” Pattern Recognition 27.1 (1994): 1-8. Print.

[18] Luxburg, Ulrike. “A Tutorial on Spectral Clustering.” Statistics and Computing 17.4 (2007): 395-416. Print.

[19] Moore, Amanda M., Sina Yeganeh, Yuxing Yao, Shelley A. Claridge, James M. Tour, Mark A. Ratner, and Paul S. Weiss. “Polarizabilities of Adsorbed and Assembled Molecules: Measuring the Conductance through Buried Contacts.” ACS Nano 4.12 (2010): 7630-636. Print.

[20] Ni, Kangyu, Xavier Bresson, Tony Chan, and Selim Esedoglu. “Local Histogram Based Segmentation Using the Wasserstein Distance.” International Journal of Computer Vision 84.1 (2009): 97-111. Print.

[21] Perona, P., and J. Malik. “Scale Space and Edge Detection Using Anisotropic Diffusion.” IEEE Trans. Patt. Anal. Mach. Intell. 12 (1990): 629-39. Print.

[22] Prokhorov, V. V., D. V. Klinov, A. A. Chinarev, A. B. Tuzikov, I. V. Gorokhova, and N. V. Bovin. “High-Resolution Atomic Force Microscopy Study of Hexaglycylamide Epitaxial Structures on Graphite.” Langmuir 27 (2011): 5879-890. Print.

[23] Rudin, L., S. Osher, and E. Fatemi. “Nonlinear Total Variation Based Noise Removal Algorithms.” Physica D: Nonlinear Phenomena 60.1-4 (1992): 259-68. Print.

[24] Stabel, A., R. Heinz, J. P. Rabe, G. Wegner, F. C. De Schryver, D. Corens, W. Dehaen, and C. Sueling. “STM Investigation of 2D Crystals of Anthrone Derivatives on Graphite: Analysis of Molecular Structure and Dynamics.” The Journal of Physical Chemistry 99.21 (1995): 8690-697. Print.

[25] Vese, Luminita A., and Stanley J. Osher. “Modeling Textures with Total Variation Minimization and Oscillating Patterns in Image Processing.” Journal of Scientific Computing 19.1-3 (2003): 553-72. Print.

[26] Vese, Luminita A., and Tony F. Chan. “A Multiphase Level Set Framework for Image Segmentation Using the Mumford and Shah Model.” International Journal of Computer Vision 50.3 (2002): 271-93. Print.

[27] Wang, Xiao-Feng, De-Shuang Huang, and Huan Xu. “An Efficient Local Chan–Vese Model for Image Segmentation.” Pattern Recognition 43.3 (2010): 603-18. Print.
