Single-Pixel Imaging via Compressive Sampling

[Building simpler, smaller, and less-expensive digital cameras]

Marco F. Duarte, Mark A. Davenport, Dharmpal Takhar, Jason N. Laska, Ting Sun, Kevin F. Kelly, and Richard G. Baraniuk

Digital Object Identifier 10.1109/MSP.2007.914730

Humans are visual animals, and imaging sensors that extend our reach—cameras—have improved dramatically in recent times thanks to the introduction of CCD and CMOS digital technology. Consumer digital cameras in the megapixel range are now ubiquitous thanks to the happy coincidence that the semiconductor material of choice for large-scale electronics integration (silicon) also happens to readily convert photons at visual wavelengths into electrons. On the contrary, imaging at wavelengths where silicon is blind is considerably more complicated, bulky, and expensive. Thus, for comparable resolution, a US$500 digital camera for the visible becomes a US$50,000 camera for the infrared.

In this article, we present a new approach to building simpler, smaller, and cheaper digital cameras that can operate efficiently across a much broader spectral range than conventional silicon-based cameras. Our approach fuses a new camera architecture based on a digital micromirror device (DMD—see “Spatial Light Modulators”) with the new mathematical theory and algorithms of compressive sampling (CS—see “CS in a Nutshell”).

CS combines sampling and compression into a single nonadaptive linear measurement process [1]–[4]. Rather than measuring pixel samples of the scene under view, we measure inner products between the scene and a set of test functions. Interestingly, random test functions play a key role, making each measurement a random sum of pixel values taken across the entire image. When the scene under view is compressible by an algorithm like JPEG or JPEG2000, the CS theory enables us to stably reconstruct an image of the scene from fewer measurements than the number of reconstructed pixels. In this manner we achieve sub-Nyquist image acquisition.

Our “single-pixel” CS camera architecture is basically an optical computer (comprising a DMD, two lenses, a single photon detector, and an analog-to-digital (A/D) converter) that computes random linear measurements of the scene under view. The image is then recovered or processed from the measurements by a digital computer. The camera design reduces the required size, complexity, and cost of the photon detector array down to a single unit, which enables the use of exotic detectors that would be impossible in a conventional digital camera. The random CS measurements also enable a tradeoff between space and time during image acquisition. Finally, since the camera compresses as it images, it has the capability to efficiently and scalably handle high-dimensional data sets from applications like video and hyperspectral imaging.

This article is organized as follows. After describing the hardware, theory, and algorithms of the single-pixel camera in detail, we analyze its theoretical and practical performance and compare it to more conventional cameras based on pixel arrays and raster scanning. We also explain how the camera is information scalable in that its random measurements can be used to directly perform simple image processing tasks, such as target classification, without first reconstructing the underlying imagery. We conclude with a review of related camera architectures and a discussion of ongoing and future work.

THE SINGLE-PIXEL CAMERA

ARCHITECTURE

The single-pixel camera is an optical computer that sequentially measures the inner products y[m] = 〈x, φm〉 between an N-pixel sampled version x of the incident light-field from the scene under view and a set of two-dimensional (2-D) test functions {φm} [5]. As shown in Figure 1, the light-field is focused by biconvex Lens 1 not onto a CCD or CMOS sampling array but rather onto a DMD consisting of an array of N tiny mirrors (see “Spatial Light Modulators”).

Each mirror corresponds to a particular pixel in x and φm and can be independently oriented either towards Lens 2 (corresponding to a one at that pixel in φm) or away from Lens 2 (corresponding to a zero at that pixel in φm). The reflected light is then collected by biconvex Lens 2 and focused onto a single photon detector (the single pixel) that integrates the product x[n]φm[n] to compute the measurement y[m] = 〈x, φm〉 as its output voltage. This voltage is then digitized by an A/D converter. Values of φm between zero and one can be obtained by dithering the mirrors back and forth during the photodiode integration time. To obtain φm with both positive and negative values (±1, for example), we estimate and subtract the mean light intensity from each measurement, which is easily measured by setting all mirrors to the full-on one position.

To compute CS randomized measurements y = Φx as in (1), we set the mirror orientations φm randomly using a pseudorandom number generator, measure y[m], and then repeat the process M times to obtain the measurement vector y.
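To make the measurement loop concrete, the following Python sketch simulates it; this is an illustrative simulation of the process described above, not the camera's control software, and the scene x, the sizes N and M, and the seed are placeholders.

```python
import numpy as np

# Minimal simulation of the CS acquisition loop (a sketch, not the camera's
# actual control code). N, M, the seed, and x are illustrative placeholders.
rng = np.random.default_rng(0)           # stands in for the pseudorandom generator
N, M = 64 * 64, 600
x = rng.random(N)                        # stand-in for the N-pixel sampled light field

y = np.empty(M)
for m in range(M):
    phi_m = rng.integers(0, 2, size=N)   # random 0/1 mirror orientations
    y[m] = phi_m @ x                     # photodetector output ~ <x, phi_m>

# Subtracting the mean light intensity (measured with all mirrors in the
# full-on position) would convert these 0/1 patterns into ±1 test functions.
```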

[FIG1] Aerial view of the single-pixel CS camera in the lab, showing the light from the object passing through Lens 1 to the DMD+ALP board and through Lens 2 to the photodiode circuit [5].

SPATIAL LIGHT MODULATORS

A spatial light modulator (SLM) modulates the intensity of a light beam according to a control signal. A simple example of a transmissive SLM that either passes or blocks parts of the beam is an overhead transparency. Another example is a liquid crystal display (LCD) projector.

The Texas Instruments (TI) digital micromirror device (DMD) is a reflective SLM that selectively redirects parts of the light beam [31]. The DMD consists of an array of bacterium-sized, electrostatically actuated micromirrors, where each mirror in the array is suspended above an individual static random access memory (SRAM) cell (see Figure 6). Each mirror rotates about a hinge and can be positioned in one of two states (+10° and −10° from horizontal) according to which bit is loaded into the SRAM cell; thus light falling on the DMD can be reflected in two directions depending on the orientation of the mirrors.

The DMD micromirrors in our lab’s TI DMD 1100 developer’s kit (Tyrex Services Group Ltd., http://www.tyrexservices.com) and accessory light modulator package (ALP, ViALUX GmbH, http://www.vialux.de) form a pixel array of size 1024 × 768. This limits the maximum native resolution of our single-pixel camera. However, megapixel DMDs are already available for the display and projector market.


Recall from “CS in a Nutshell” that we can set M = O(K log(N/K)), which is ≪ N when the scene being imaged is compressible by a compression algorithm like JPEG or JPEG2000. Since the DMD array is programmable, we can also employ test functions φm drawn randomly from a fast transform such as a Walsh, Hadamard, or noiselet transform [6], [7].

CS IN A NUTSHELL

CS is based on the recent understanding that a small collection of nonadaptive linear measurements of a compressible signal or image contain enough information for reconstruction and processing [1]–[3]. For a tutorial treatment see [4] or the article by Romberg in this issue.

The traditional approach to digital data acquisition samples an analog signal uniformly at or above the Nyquist rate. In a digital camera, the samples are obtained by a 2-D array of N pixel sensors on a CCD or CMOS imaging chip. We represent these samples using the vector x with elements x[n], n = 1, 2, . . . , N. Since N is often very large, e.g., in the millions for today’s consumer digital cameras, the raw image data x is often compressed in the following multi-step transform coding process.

The first step in transform coding represents the image in terms of the coefficients {αi} of an orthonormal basis expansion x = ∑_{i=1}^{N} αiψi, where {ψi}_{i=1}^{N} are the N × 1 basis vectors. Forming the coefficient vector α and the N × N basis matrix Ψ := [ψ1|ψ2| · · · |ψN] by stacking the vectors {ψi} as columns, we can concisely write the samples as x = Ψα. The aim is to find a basis where the coefficient vector α is sparse (where only K ≪ N coefficients are nonzero) or r-compressible (where the sorted coefficient magnitudes decay under a power law with scaling exponent −r). For example, natural images tend to be compressible in the discrete cosine transform (DCT) and wavelet bases on which the JPEG and JPEG2000 compression standards are based. The second step in transform coding encodes only the values and locations of the K significant coefficients and discards the rest.

This sample-then-compress framework suffers from three inherent inefficiencies: First, we must start with a potentially large number of samples N even if the ultimate desired K is small. Second, the encoder must compute all of the N transform coefficients {αi}, even though it will discard all but K of them. Third, the encoder faces the overhead of encoding the locations of the large coefficients.

As an alternative, CS bypasses the sampling process and directly acquires a condensed representation using M < N linear measurements between x and a collection of test functions {φm}_{m=1}^{M} as in y[m] = 〈x, φm〉. Stacking the measurements y[m] into the M × 1 vector y and the test functions φmᵀ as rows into an M × N matrix Φ, we can write

y = Φx = ΦΨα. (1)

The measurement process is nonadaptive in that Φ does not depend in any way on the signal x.

The transformation from x to y is a dimensionality reduction and so loses information in general. In particular, since M < N, given y there are infinitely many x′ such that Φx′ = y. The magic of CS is that Φ can be designed such that sparse/compressible x can be recovered exactly/approximately from the measurements y.

While the design of Φ is beyond the scope of this review, an intriguing choice that works with high probability is a random matrix. For example, we can draw the elements of Φ as i.i.d. ±1 random variables from a uniform Bernoulli distribution [22]. Then, the measurements y are merely M different randomly signed linear combinations of the elements of x. Other possible choices include i.i.d., zero-mean, 1/N-variance Gaussian entries (white noise) [1]–[3], [22], randomly permuted vectors from standard orthonormal bases, or random subsets of basis vectors [7], such as Fourier, Walsh-Hadamard, or noiselet [6] bases. The latter choices enable more efficient reconstruction through fast algorithmic transform implementations. In practice, we employ a pseudorandom Φ driven by a pseudorandom number generator.
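A minimal sketch of such a ±1 Bernoulli measurement matrix, with illustrative sizes and seed:

```python
import numpy as np

# Random ±1 Bernoulli measurement matrix, as described above.
# M, N, and the seed are illustrative placeholders.
rng = np.random.default_rng(1)
M, N = 600, 4096
Phi = rng.choice([-1.0, 1.0], size=(M, N))   # i.i.d. uniform ±1 entries

x = rng.random(N)                            # stand-in signal
y = Phi @ x                                  # M randomly signed sums of x's entries
```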

To recover the image x from the random measurements y, the traditional favorite method of least squares can be shown to fail with high probability. Instead, it has been shown that using the ℓ1 optimization [1]–[3]

α̂ = arg min ‖α′‖₁ such that ΦΨα′ = y (2)

we can exactly reconstruct K-sparse vectors and closely approximate compressible vectors stably with high probability using just M ≥ O(K log(N/K)) random measurements. This is a convex optimization problem that conveniently reduces to a linear program known as basis pursuit [1]–[3]. There is a range of alternative reconstruction techniques based on greedy, stochastic, and variational algorithms [4].
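For reference, (2) reduces to a linear program by splitting α′ into its positive and negative parts; the sketch below uses SciPy's generic LP solver as a pedagogical stand-in (not the solver behind this article's results), with A playing the role of ΦΨ.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||alpha||_1 subject to A @ alpha = y as a linear program.

    Split alpha = u - v with u, v >= 0, so ||alpha||_1 = sum(u + v).
    A plays the role of Phi @ Psi in (2); this is an illustrative sketch.
    """
    M, N = A.shape
    c = np.ones(2 * N)                       # objective: sum of u and v entries
    A_eq = np.hstack([A, -A])                # enforces A @ (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    u, v = res.x[:N], res.x[N:]
    return u - v                             # the recovered coefficient vector
```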

If the measurements y are corrupted by noise, then the solution to the alternative ℓ1 minimization, which we dub basis pursuit with inequality constraints (BPIC) [3]

α̂ = arg min ‖α′‖₁ such that ‖y − ΦΨα′‖₂ < ε, (3)

satisfies ‖α̂ − α‖₂ < CNε + CKσK(x) with overwhelming probability. CN and CK are the noise and approximation error amplification constants, respectively; ε is an upper bound on the noise magnitude, and σK(x) is the ℓ2 error incurred by approximating α using its largest K terms. This optimization can be solved using standard convex programming algorithms.
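As one example of “standard convex programming,” (3) can be written in a few lines with a modeling tool such as CVXPY; this is an assumed convenience for illustration, not the authors' solver, and A again stands in for ΦΨ.

```python
import cvxpy as cp

def bpic(A, y, eps):
    """Basis pursuit with inequality constraints (3), sketched with CVXPY.

    A stands in for Phi @ Psi; eps bounds the measurement noise magnitude.
    Illustrative only, not the solver used in the article.
    """
    N = A.shape[1]
    alpha = cp.Variable(N)
    problem = cp.Problem(cp.Minimize(cp.norm(alpha, 1)),
                         [cp.norm(y - A @ alpha, 2) <= eps])
    problem.solve()
    return alpha.value
```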

In addition to enabling sub-Nyquist measurement, CS enjoys a number of attractive properties [4]. CS measurements are universal in that the same random matrix Φ works simultaneously for exponentially many sparsifying bases Ψ with high probability; no knowledge is required of the nuances of the data being acquired. Due to the incoherent nature of the measurements, CS is robust in that the measurements have equal priority, unlike the Fourier or wavelet coefficients in a transform coder. Thus, one or more measurements can be lost without corrupting the entire reconstruction. This enables a progressively better reconstruction of the data as more measurements are obtained. Finally, CS is asymmetrical in that it places most of its computational complexity in the recovery system, which often has more substantial computational resources than the measurement system.



The single-pixel design reduces the required size, complexity, and cost of the photon detector array down to a single unit, which enables the use of exotic detectors that would be impossible in a conventional digital camera. Example detectors include a photomultiplier tube or an avalanche photodiode for low-light (photon-limited) imaging (more on this below), a sandwich of several photodiodes sensitive to different light wavelengths for multimodal sensing, a spectrometer for hyperspectral imaging, and so on.

In addition to sensing flexibility, the practical advantages of the single-pixel design include the facts that the quantum efficiency of a photodiode is higher than that of the pixel sensors in a typical CCD or CMOS array and that the fill factor of a DMD can reach 90% whereas that of a CCD/CMOS array is only about 50%. An important advantage to highlight is the fact that each CS measurement receives about N/2 times more photons than an average pixel sensor, which significantly reduces image distortion from dark noise and read-out noise. Theoretical advantages that the design inherits from the CS theory include its universality, robustness, and progressivity.

The single-pixel design falls into the class of multiplex cameras [8]. The baseline standard for multiplexing is classical raster scanning, where the test functions {φm} are a sequence of delta functions δ[n − m] that turn on each mirror in turn. As we will see below, there are substantial advantages to operating in a CS rather than raster scan mode, including fewer total measurements (M for CS rather than N for raster scan) and significantly reduced dark noise.

IMAGE ACQUISITION EXAMPLES

Figure 2(a) and (b) illustrates a target object (a black-and-white printout of an “R”) x and reconstructed image x̂ taken by the single-pixel camera prototype in Figure 1 using N = 256 × 256 and M = N/50 [5]. Figure 2(c) illustrates an N = 256 × 256 color single-pixel photograph of a printout of the Mandrill test image taken under low-light conditions using RGB color filters and a photomultiplier tube with M = N/10. In both cases, the images were reconstructed using total variation minimization, which is closely related to wavelet coefficient ℓ1 minimization [2].

STRUCTURED ILLUMINATION CONFIGURATION

In a reciprocal configuration to that in Figure 1, we can illuminate the scene using a projector displaying a sequence of random patterns {φm} and collect the reflected light using a single lens and photodetector. Such a “structured illumination” setup has advantages in applications where we can control the light source. In particular, there are intriguing possible combinations of single-pixel imaging with techniques such as three-dimensional (3-D) imaging and dual photography [9].

SHUTTERLESS VIDEO IMAGING

We can also acquire video sequences using the single-pixel camera. Recall that a traditional video camera opens a shutter periodically to capture a sequence of images (called video frames) that are then compressed by an algorithm like MPEG that jointly exploits their spatiotemporal redundancy. In contrast, the single-pixel video camera needs no shutter; we merely continuously sequence through randomized test functions φm and then reconstruct a video sequence using an optimization that exploits the video’s spatiotemporal redundancy [10].

If we view a video sequence as a 3-D space/time cube, then the test functions φm lie concentrated along a periodic sequence of 2-D image slices through the cube. A naïve way to reconstruct the video sequence would group the corresponding measurements y[m] into groups where the video is quasi-stationary and then perform a 2-D frame-by-frame reconstruction on each group. This exploits the compressibility of the 3-D video cube in the space but not time direction.

A more powerful alternative exploits the fact that even though each φm is testing a different 2-D image slice, the image slices are often related temporally through smooth object motions in the video. Exploiting this 3-D compressibility in both the space and time directions and inspired by modern 3-D video coding techniques [11], we can, for example, attempt to reconstruct the sparsest video space/time cube in the 3-D wavelet domain.
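To illustrate the sparsifying transform in question, the snippet below computes a 3-D tensor-product Daubechies wavelet decomposition of a space/time cube with the PyWavelets package (an assumed convenience library; the cube is a placeholder, and the wavelet naming choice is noted in the comments).

```python
import numpy as np
import pywt

# Sketch: 3-D tensor-product Daubechies wavelet analysis of a video cube.
# The cube is a random placeholder, not the article's test sequence. Note
# that PyWavelets' "db2" is the 4-tap Daubechies filter; the article's
# "Daubechies-4" may refer to either naming convention.
cube = np.random.rand(64, 64, 64)                    # N = 64 x 64 x 64 cube
coeffs = pywt.wavedecn(cube, wavelet="db2", level=3) # 3-D wavelet coefficients

# A joint 3-D CS reconstruction would seek the cube whose coefficients of
# this form are sparsest while remaining consistent with the measurements.
```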

These two approaches are compared in the simulation study illustrated in Figure 3. We employed simplistic 3-D tensor product Daubechies-4 wavelets in all cases. As we see from the figure, 3-D reconstruction from 2-D random measurements performs almost as well as 3-D reconstruction from 3-D random measurements, which are not directly implementable with the single-pixel camera.

SINGLE-PIXEL CAMERA TRADEOFFS

The single-pixel camera is a flexible architecture to implement a range of different multiplexing methodologies, just one of them being CS. In this section, we analyze the performance of CS and two other candidate multiplexing methodologies and compare them to the performance of a brute-force array of N pixel sensors. Integral to our analysis is the consideration of Poisson photon counting noise at the detector, which is image dependent.

[FIG2] Single-pixel photo album. (a) 256 × 256 conventional image of a black-and-white “R”. (b) Single-pixel camera reconstructed image from M = 1,300 random measurements (50× sub-Nyquist). (c) 256 × 256 pixel color reconstruction of a printout of the Mandrill test image imaged in a low-light setting using a single photomultiplier tube sensor, RGB color filters, and M = 6,500 random measurements.


We conduct two separate analyses to assess the “bang for the buck” of CS. The first is a theoretical analysis that provides general guidance. The second is an experimental study that indicates how the systems typically perform in practice.

SCANNING METHODOLOGIES

The four imaging methodologies we consider are:

■ Pixel array (PA): a separate sensor for each of the N pixels receives light throughout the total capture time T. This is actually not a multiplexing system, but we use it as the gold standard for comparison.
■ Raster scan (RS): a single sensor takes N light measurements sequentially from each of the N pixels over the capture time. This corresponds to test functions {φm} that are delta functions and thus Φ = I. The measurements y thus directly provide the acquired image x̂.
■ Basis scan (BS): a single sensor takes N light measurements sequentially from different combinations of the N pixels as determined by test functions {φm} that are not delta functions but from some more general basis [12]. In our analysis, we assume a Walsh basis modified to take the values 0/1 rather than ±1; thus Φ = W, where W is the 0/1 Walsh matrix. The acquired image is obtained from the measurements y by x̂ = Φ⁻¹y = W⁻¹y.
■ CS: a single sensor takes M ≤ N light measurements sequentially from different combinations of the N pixels as determined by random 0/1 test functions {φm}. Typically, we set M = O(K log(N/K)), which is ≪ N when the image is compressible. In our analysis, we assume that the M rows of the matrix Φ consist of randomly drawn rows from a 0/1 Walsh matrix that are then randomly permuted (we ignore the first row consisting of all ones); see the sketch after this list. The acquired image is obtained from the measurements y via a sparse reconstruction algorithm such as BPIC (see “CS in a Nutshell”).
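The sketch below constructs the 0/1 Walsh matrix used for BS and one reading of the randomized row selection for CS, assuming N is a power of two; the sizes and seed are illustrative, and the Hadamard row ordering stands in for the Walsh ordering.

```python
import numpy as np
from scipy.linalg import hadamard

# Sketch of the BS and CS test-function matrices described above.
# N must be a power of two; N, M, and the seed are illustrative.
N, M = 1024, 128
W = (hadamard(N) + 1) // 2      # 0/1 Walsh-Hadamard matrix: Phi = W for BS

rng = np.random.default_rng(2)
rows = rng.choice(np.arange(1, N), size=M, replace=False)  # skip the all-ones row
Phi_cs = W[rows][:, rng.permutation(N)]  # random rows, then randomly permuted
```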

THEORETICAL ANALYSIS

In this section, we conduct a theoretical performance analysis of the above four scanning methodologies in terms of the required dynamic range of the photodetector, the required bit depth of the A/D converter, and the amount of Poisson photon counting noise. Our results are pessimistic in general; we show in the next section that the average performance in practice can be considerably better. Our results are summarized in Table 1. An alternative analysis of CS imaging for piecewise smooth images in Gaussian noise has been reported in [13].

DYNAMIC RANGE

We first consider the photodetector dynamic range required to match the performance of the baseline PA. If each detector in the PA has a linear dynamic range of 0 to D, then it is easy to see that the single-pixel RS detector need only have that same dynamic range. In contrast, each Walsh basis test function contains N/2 ones and so directs N/2 times more light to the detector. Thus, BS and CS each require a larger linear dynamic range of 0 to ND/2. On the positive side, since BS and CS collect considerably more light per measurement than the PA and RS, they benefit from reduced detector nonidealities like dark currents.

QUANTIZATION ERROR

We now consider the number of A/D bits required within the required dynamic range to match the performance of the baseline PA in terms of worst-case quantization distortion. Define the mean-squared error (MSE) between the true image x and its acquired version x̂ as MSE = (1/N)‖x − x̂‖₂². Assuming that each measurement in the PA and RS is quantized to B bits, the worst-case mean-squared quantization error for the quantized PA and RS images is MSE = √N D 2^{−B−1} [14]. Due to its larger dynamic range, BS requires B + log₂ N bits per measurement to reach the same MSE distortion level. Since the distortion of CS reconstruction is up to CN times larger than the distortion in the measurement vector (see “CS in a Nutshell”), we require up to an additional log₂ CN bits per measurement. One empirical study has found roughly that CN lies between eight and ten for a range of different random measurement configurations [15]. Thus, BS and CS require a higher-resolution A/D converter than PA and RS to acquire an image with the same worst-case quantization distortion.
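As a quick worked example of these bit-depth requirements, with illustrative values of N and B and taking CN at the upper end of the empirical range from [15]:

```python
import math

# Worked example of the bit-depth comparison above; N, B, and C_N are
# illustrative (C_N = 10 is the upper end of the empirical range in [15]).
N, B = 256 * 256, 12
C_N = 10

bits_pa_rs = B                                  # PA and RS: 12 bits/measurement
bits_bs = B + math.log2(N)                      # BS: 12 + 16 = 28 bits
bits_cs = B + math.log2(N) + math.log2(C_N)     # CS: about 31.3 bits
```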

[FIG3] Frame 32 from a reconstructed video sequence using (top row) M = 20,000 and (bottom row) M = 50,000 measurements (simulation from [10]). (a) Original frame of an N = 64 × 64 × 64 video of a disk simultaneously dilating and translating. (b) Frame-by-frame 2-D measurements + frame-by-frame 2-D reconstruction; MSE = 3.63 and 0.82, respectively. (c) Frame-by-frame 2-D measurements + joint 3-D reconstruction; MSE = 0.99 and 0.24, respectively. (d) Joint 3-D measurements + joint 3-D reconstruction; MSE = 0.76 and 0.18, respectively.


PHOTON COUNTING NOISE

In addition to quantization error from the A/D converter, each camera will also be affected by image-dependent Poisson noise due to photon counting [16]. We compare the MSE due to photon counting for each of the scanning methodologies. The details are worked out in the Appendix and are summarized in Table 1. We see that the MSE of BS is about three times that of RS. Moreover, when M < N/(3CN²), the MSE of CS is lower than that of RS. We emphasize that in the CS case, we have only a fairly loose upper bound and that there exist alternative CS reconstruction algorithms with better performance guarantees, such as the Dantzig selector [3].

SUMMARY

From Table 1, we see that the advantages of a single-pixel camera over a PA come at the cost of more stringent demands on the sensor dynamic range and A/D quantization and larger MSE due to photon counting effects. Additionally, the linear dependence of the MSE on the number of image pixels N for BS and RS is a potential deal-breaker for high-resolution imaging. The promising aspect of CS is the logarithmic dependence of its MSE on N through the relationship M = O(K log(N/K)).

EXPERIMENTAL RESULTS

Since CS acquisition/reconstruction methods often perform much better in practice than the above theoretical bounds suggest, in this section we conduct a simple experiment using real data from the CS imaging testbed depicted in Figure 1. Thanks to the programmability of the testbed, we acquired RS, BS, and CS measurements from the same hardware. We fixed the number of A/D converter bits across all methodologies. Figure 4 shows the pixel-wise MSE for the capture of an N = 128 × 128 pixel “R” test image as a function of the total capture time T. Here the MSE combines both quantization and photon counting effects. For CS we took M = N/10 total measurements per capture and used a Daubechies-4 wavelet basis for the sparse reconstruction.

We make two observations. First, the performance gain of BS over RS contradicts the above worst-case theoretical calculations. We speculate that the contribution of the sensor’s dark current, which is not accounted for in our analysis, severely degrades RS’s performance. Second, the performance gain of CS over both RS and BS is clear: images can either be acquired in much less time for the same MSE or with much lower MSE in the same amount of time.

INFORMATION SCALABILITY AND THE SMASHED FILTER

While the CS literature has focused almost exclusively on problems in signal and image reconstruction or approximation, reconstruction is frequently not the ultimate goal. For instance, in many image processing and computer vision applications, data is acquired only for the purpose of making a detection, classification, or recognition decision. Fortunately, the CS framework is information scalable to a much wider range of statistical inference tasks [17]–[19]. Tasks such as detection do not require reconstruction, but only require estimates of the relevant sufficient statistics for the problem at hand. Moreover, in many cases it is possible to directly extract these statistics from a small number of random measurements without ever reconstructing the image.

The matched filter is a key tool in detection and classification problems that involve searching for a target template in a scene. A complicating factor is that often the target is transformed in some parametric way—for example the time or Doppler shift of a radar return signal; the translation and rotation of a face in a face recognition task; or the roll, pitch, yaw, and scale of a vehicle viewed from an aircraft. The matched filter detector operates by forming comparisons between the given test data and all possible transformations of the template to find the match that optimizes some performance metric. The matched filter classifier operates in the same way but chooses the best match from a number of different potential transformed templates.

[TABLE 1] COMPARISON OF THE FOUR CAMERA SCANNING METHODOLOGIES.

                            PIXEL ARRAY   RASTER SCAN   BASIS SCAN       COMPRESSIVE SAMPLING
NUMBER OF MEASUREMENTS      N             N             N                M ≤ N
DYNAMIC RANGE               D             D             ND/2             ND/2
QUANTIZATION (TOTAL BITS)   NB            NB            N(B + log₂ N)    M(B + log₂ N + log₂ CN + 1)
PHOTON COUNTING MSE         P/T           NP/T          (3N − 2)P/T      < 3CN²MP/T

[FIG4] Average MSE for RS, BS, and CS single-pixel images as a function of the total image capture time T (real data).




The naïve approach to matched filtering with CS would first reconstruct the images under consideration and then apply a standard matched filtering technique. In contrast, the smashed filter (for dimensionally reduced matched filter) performs all of its operations directly on the random measurements [18].

The two key elements of the smashed filter are the generalized likelihood ratio test and the concept of an image appearance manifold [20]. If the parametric transformation affecting the template image is well behaved, then the set of transformed templates forms a low-dimensional manifold in the high-dimensional pixel space R^N with the dimension K equal to the number of independent parameters. (For the purposes of the discussion here, a K-dimensional manifold can be interpreted as a K-dimensional hypersurface in R^N.) Thus, the matched filter classifier can be interpreted as classifying a test image according to the closest template manifold in R^N and no reconstruction is necessary.

The smashed filter exploits a recent result that the structure of a smooth K-dimensional manifold in R^N is preserved with high probability under a random projection to the lower dimensional space R^M as long as M = O(K log N) [21]. This is reminiscent of the number of measurements required for successful CS but with K now the manifold dimension. Thus, to classify an N-pixel test image, we can alternatively compare M random measurements of the test image to the M-dimensional projections (using the same Φ) of the candidate image template manifolds. The upshot is that all necessary computations can be made directly in R^M rather than in R^N. As in the conventional matched filter, a by-product of the closest manifold search is an estimate of the template parameters that best match the test image. Previous work in the computer science community (the other “CS”) has also employed the Johnson-Lindenstrauss lemma [22] to reduce the data dimensionality prior to computing features for classification; however, they have not considered the intrinsic manifold structure manifested in many image processing and computer vision settings.
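A minimal sketch of the smashed filter's nearest-manifold search, assuming each class manifold has been sampled at a grid of transformation parameters; all names and shapes are illustrative, not the authors' implementation.

```python
import numpy as np

def smashed_filter(y, Phi, class_templates):
    """Classify measurements y by the closest projected template manifold.

    y:               length-M measurement vector of the unknown test image
    Phi:             M x N measurement matrix (the same one used to acquire y)
    class_templates: dict mapping each class label to a (num_samples x N)
                     array of templates sampled along the transformation
                     parameters. Illustrative sketch only.
    """
    best_label, best_dist = None, np.inf
    for label, templates in class_templates.items():
        projected = templates @ Phi.T                # manifold samples in R^M
        dists = np.linalg.norm(projected - y, axis=1)
        if dists.min() < best_dist:
            best_label, best_dist = label, dists.min()
    return best_label
```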

Figure 5 demonstrates the effectiveness of smashed filtering for the task of classifying an N = 128 × 128 pixel test image under an unknown translation in the vertical and horizontal directions (hence K = 2). The three classes correspond to different translations of a bus, truck, or tank. The test data was generated randomly from one of the three classes. The random measurements were produced using a simulated single-pixel CS camera that takes into account the Poisson photon counting noise associated with a total measurement interval of length T. We average over 10,000 iterations of each experiment. We see that increasing the number of measurements improves performance at first; however, performance then degrades due to the reduced time available to obtain each measurement. Correspondingly, increasing the total capture time improves the performance of the algorithm.

OTHER MULTIPLEXING CAMERA ARCHITECTURES

Two notable existing DMD-driven imaging applications involve confocal microscopy (which relates closely to the raster scan strategy studied above) [23], [24] and micro-optoelectromechanical (MOEM) systems [12], [25], [26]. In a MOEM system, a DMD is positioned between the scene and the sensing array to perform columnwise multiplexing by allowing only the light from the desired pixels to pass through. In [12] and [25] the authors propose sets of N Hadamard patterns, which enables simple demultiplexing, and randomized Hadamard patterns, which yield a uniform signal-to-noise ratio among the measurements.

[FIG6] (a) Schematic of two mirrors from a Texas Instruments digital micromirror device (DMD), showing the mirrors tilted +10° and −10°, the hinge, yoke, landing tips, and CMOS substrate. (b) A portion of an actual DMD array with an ant leg for scale. (Image provided by DLP Products, Texas Instruments.)

[FIG5] Smashed filter image classification performance plotted as a function of the number of random measurements M from a simulated single-pixel CS camera and the total acquisition time T. (Vertical axis: classification rate, %; curves shown for T = t, 10t, 100t, and 1,000t.)


Other compressive cameras developed to date include [27] and [28], which employ optical elements to perform transform coding of multispectral images. These designs obtain sampled outputs that correspond to coded information of interest, such as the wavelength of a given light signal or the transform coefficients in a basis of interest. The elegant hardware designed for these purposes uses optical projections, group testing, and signal inference. Recent work in [29] has compared several single and multiple pixel imaging strategies of varying complexities; their simulation results for Poisson counting noise agree closely with those above.

Finally, in [30] the authors use CS principles and a randomizing lens to boost both the resolution and robustness of a conventional digital camera.

CONCLUSIONS

For certain applications, CS promises to substantially increase the performance and capabilities of data acquisition, processing, and fusion systems while lowering the cost and complexity of deployment. A useful practical feature of the CS approach is that it off-loads processing from data acquisition (which can be complicated and expensive) into data reconstruction or processing (which can be performed on a digital computer, perhaps not even colocated with the sensor).

We have presented an overview of the theory and practice of a simple yet flexible single-pixel architecture for CS based on a DMD spatial light modulator. While there are promising potential applications where current digital cameras have difficulty imaging, there are clear tradeoffs and challenges in the single-pixel design. Our current and planned work involves better understanding and addressing these tradeoffs and challenges. Other potential avenues for research include extending the single-pixel concept to wavelengths where the DMD fails as a modulator, such as THz and X rays.

ACKNOWLEDGMENTS

Thanks to Dave Brady for suggesting the Poisson noise analysis, to Dennis Healy and Courtney Lane for many enlightening discussions, and to Michael Wakin for his many contributions. This work was supported by grants DARPA/ONR N66001-06-1-2011 and N00014-06-1-0610, NSF CCF-0431150, ONR N00014-07-1-0936, AFOSR FA9550-07-1-0301, ARO W911NF-07-1-0502, ARO MURI W311NF-07-1-0185, and the Texas Instruments Leadership University Program. Special thanks to TI for providing the DMD developer’s kit and accessory light modulator package.

AUTHORS

Marco F. Duarte ([email protected]) received the B.S. degree in computer engineering (with distinction) and the M.S. degree in electrical engineering from the University of Wisconsin at Madison in 2002 and 2004, respectively. He is currently a Ph.D. student at Rice University. He spent the summer of 2006 at Ricoh Innovations, Palo Alto, California. His research interests include pattern recognition, sensor networks, compressive sensing, and optical image coding. He received the Presidential Fellowship and the Texas Instruments Distinguished Fellowship in 2004 from Rice University. He is a member of Tau Beta Pi and a Student Member of IEEE and SIAM.

Mark A. Davenport ([email protected]) received the M.S. degree in electrical engineering in 2007 and the B.S. degree in electrical engineering in 2004, both from Rice University. He is currently pursuing a Ph.D. in electrical engineering at Rice University. His research interests include sparse approximation, high-dimensional geometry, and machine learning.

Dharmpal Takhar ([email protected]) is currently a Ph.D. student at Rice University. He received the B.Tech. degree in metallurgical engineering and materials science and the M.Tech. degree in ceramics and composites, both from I.I.T. Bombay. His current interests include compressed sensing-based imaging techniques and atomic level investigation of functionalized carbon nanotubes.

Jason N. Laska ([email protected]) completed the B.S. degree in computer engineering from the University of Illinois–Urbana-Champaign in 2005. He is currently an M.S. student in electrical engineering at Rice University. His research interests include signal processing (especially for audio and images), sparse approximation, and algorithms.

Ting Sun ([email protected]) is currently a Ph.D. student in applied physics at Rice University. Her interests include optical imaging systems and compressive sensing.

Kevin F. Kelly ([email protected]) is an assistant professor of Electrical and Computer Engineering at Rice University. He earned a B.S. in engineering physics from the Colorado School of Mines in 1993 and the M.S. and Ph.D. degrees in applied physics from Rice University in 1996 and 1999, respectively. He was a post-doctoral researcher with Toshio Sakurai in Japan and with Penn State University. His research interests include molecular electronics, carbon nanotubes, scanning tunneling microscopy, compressed sensing, and near-field microscopy.

Richard G. Baraniuk ([email protected]) is the Victor E. Cameron Professor of Electrical and Computer Engineering at Rice University. His research interests lie in new theory, algorithms, and hardware for signal processing and imaging. He is a Fellow of the IEEE and has received National Young Investigator Awards from the National Science Foundation and the Office of Naval Research, the Rosenbaum Fellowship from the Isaac Newton Institute of Cambridge University, and the ECE Young Alumni Achievement Award from the University of Illinois. He has received the George R. Brown Award for Superior Teaching at Rice three times, received the C. Holmes MacDonald National Outstanding Teaching Award from Eta Kappa Nu, and was selected as one of Edutopia Magazine’s Daring Dozen Education Innovators in 2007.

REFERENCES

[1] D.L. Donoho, “Compressed sensing,” IEEE Trans. Inform. Theory, vol. 52, pp. 1289–1306, Sept. 2006.

[2] E.J. Candès and T. Tao, “Near optimal signal recovery from random projections: Universal encoding strategies?,” IEEE Trans. Inform. Theory, vol. 52, pp. 5406–5425, Dec. 2006.

[3] E.J. Candès, “Compressive sampling,” in Proc. Int. Cong. Mathematicians, Madrid, Spain, vol. 3, 2006, pp. 1433–1452.

[4] R.G. Baraniuk, “Compressive sensing,” IEEE Signal Processing Mag., vol. 24, no. 4, pp. 118–120, 124, 2007.

[5] D. Takhar, J.N. Laska, M.B. Wakin, M.F. Duarte, D. Baron, S. Sarvotham, K.F. Kelly, and R.G. Baraniuk, “A new compressive imaging camera architecture using optical-domain compression,” in Proc. Computational Imaging IV, vol. 6065, San Jose, CA, 2006, pp. 43–52.

[6] R. Coifman, F. Geshwind, and Y. Meyer, “Noiselets,” Appl. Comp. Harmon. Anal., vol. 10, no. 1, pp. 27–44, 2001.

[7] E.J. Candès and J. Romberg, “Sparsity and incoherence in compressive sampling,” Inverse Prob., vol. 23, pp. 969–985, June 2007.

[8] D. Brady, “Multiplex sensors and the constant radiance theorem,” Opt. Lett., vol. 27, no. 1, pp. 16–18, 2002.

[9] P. Sen, B. Chen, G. Garg, S. Marschner, M. Horowitz, M. Levoy, and H. Lensch, “Dual photography,” ACM Trans. Graph., vol. 24, no. 3, pp. 745–755, 2005.

[10] M.B. Wakin, J.N. Laska, M.F. Duarte, D. Baron, S. Sarvotham, D. Takhar, K.F. Kelly, and R.G. Baraniuk, “Compressive imaging for video representation and coding,” in Proc. Picture Coding Symp., Beijing, China, 2006.

[11] A. Secker and D. Taubman, “Motion-compensated highly scalable video compression using an adaptive 3D wavelet transform based on lifting,” in Proc. IEEE Int. Conf. Image Processing, vol. 2, Thessaloniki, Greece, 2001, pp. 1029–1032.

[12] R.A. DeVerse, R.R. Coifman, A.C. Coppi, W.G. Fateley, F. Geshwind, R.M. Hammaker, S. Valenti, and F.J. Warner, “Application of spatial light modulators for new modalities in spectrometry and imaging,” in Spectral Imaging: Instrumentat., Applicat. Anal. II, vol. 4959, pp. 12–22, 2003.

[13] J. Haupt and R. Nowak, “Compressive sampling versus conventional imaging,” in Proc. IEEE Int. Conf. Image Processing, Atlanta, GA, 2006, pp. 1269–1272.

[14] R.M. Gray and D.L. Neuhoff, “Quantization,” IEEE Trans. Inform. Theory, vol. 44, pp. 2325–2383, Oct. 1998.

[15] E.J. Candès, J. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Comm. Pure Appl. Math., vol. 59, pp. 1207–1223, Aug. 2006.

[16] R. Constantini and S. Susstrunk, “Virtual sensor design,” in Sens. Camera Syst. Scientif. Ind. Digital Photo. Applicat. V, vol. 5301, pp. 408–419, June 2004.

[17] M.F. Duarte, M.A. Davenport, M.B. Wakin, and R.G. Baraniuk, “Sparse signal detection from incoherent projections,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, Toulouse, France, 2006, pp. III-872–875.

[18] M.A. Davenport, M.F. Duarte, D. Takhar, J.N. Laska, K.K. Kelly, and R.G. Baraniuk, “The smashed filter for compressive classification and target recognition,” in Proc. Computat. Imag. V, San Jose, CA, vol. 6498, pp. 142–153, Jan. 2007.

[19] J. Haupt, R. Castro, R. Nowak, G. Fudge, and A. Yeh, “Compressive sampling for signal classification,” in Proc. Asilomar Conf. Signals, Systems, Computers, Pacific Grove, CA, Oct. 2006, pp. 1430–1434.

[20] M.B. Wakin, D.L. Donoho, H. Choi, and R.G. Baraniuk, “The multiscale structure of non-differentiable image manifolds,” in Proc. Wavelets XI, vol. 5914, San Diego, CA, 2005, pp. 413–429.

[21] R.G. Baraniuk and M.B. Wakin, “Random projections of smooth manifolds,” submitted for publication.

[22] R.G. Baraniuk, M. Davenport, R.A. DeVore, and M.B. Wakin, “A simple proof of the restricted isometry property for random matrices,” submitted for publication.

[23] P.M. Lane, R.P. Elliott, and C.E. MacAulay, “Confocal microendoscopy with chromatic sectioning,” in Spectral Imaging: Instrumentat., Applicat. Anal. II, vol. 4959, pp. 23–26, July 2003.

[24] V. Bansal, S. Patel, and P. Saggau, “High-speed confocal laser scanning microscopy using acousto-optic deflectors and a digital micromirror device,” in Proc. Three-Dimensional Multidimensional Microscopy: Image Acquisition Processing XI, vol. 5324, San Jose, CA, 2004, pp. 47–54.

[25] G.L. Davis, M. Maggioni, F.J. Warner, F.B. Geshwind, A.C. Coppi, R.A. DeVerse, and R.R. Coifman, “Hyper-spectral analysis of normal and malignant microarray tissue sections using a novel micro-optoelectrical-mechanical system,” Modern Pathology, vol. 17, Supplement 1, p. 358A, 2004.

[26] R. Muise and A. Mahalanobis, “Target detection using integrated hyperspectral sensing and processing,” presented at Proc. IMA Workshop Integration of Sensing Processing, Minneapolis, MN, Dec. 2005.

[27] N.P. Pitsianis, D.J. Brady, and X. Sun, “Sensor-layer image compression based on the quantized cosine transform,” in Proc. SPIE Visual Information Processing XIV, vol. 5817, Orlando, FL, 2005, p. 250.

[28] D.J. Brady, M. Feldman, N. Pitsianis, J.P. Guo, A. Portnoy, and M. Fiddy, “Compressive optical MONTAGE photography,” in Proc. SPIE Photonic Devices Algorithms Computing VII, vol. 5907, San Diego, CA, 2005, pp. 44–50.

[29] J. Ke and M. Neifeld, “Optical architectures for compressive imaging,” Appl. Opt., vol. 46, pp. 5293–5303, Aug. 2007.

[30] R. Fergus, A. Torralba, and W.T. Freeman, “Random lens imaging,” MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, Tech. Rep. MIT-CSAIL-TR-2006-058, Sept. 2006.

[31] J. Sampsell, “An overview of the digital micromirror device (DMD) and its application to projection displays,” in Proc. SID Int. Symp. Digest Technical Papers, vol. 24, May 1993, p. 1012.

APPENDIX: POISSON PHOTON COUNTING CALCULATIONS

This appendix derives the average MSE of the four camera schemes studied in the “Single-Pixel Camera Tradeoffs” section under Poisson counting noise (the last row of Table 1). Let x̂ be the estimated version of the ideal image x. Assuming that the pixel estimates are unbiased and independent with variances σ²_{x̂[n]}, we calculate that

E[MSE] = E[(1/N)‖x − x̂‖₂²] = (1/N) ∑_{n=1}^{N} σ²_{x̂[n]}. (4)

We now briefly review the Poisson model of photon detection. Consider a point in the scene under view that emits photons at a rate of P photons/s; then over τ s the number of photons follows a Poisson distribution with mean and variance λ = Pτ. To form an unbiased estimate P̂ of the rate, we collect and count photons with a photodetector over τ s and then normalize the count by dividing by τ. The variance of this estimator is then σ²_P̂ = P/τ. To simplify the analysis we assume that the photon rate of each image pixel x[n], n = 1, 2, . . . , N, is an independent and identically distributed (i.i.d.) Gaussian random variable with mean μ_{x[n]} = P. Let the total image capture time be T s.

In the PA, each pixel x[n] has its own dedicated sensor that counts photons for the entire period T. The number of received photons p[n] is Poisson with mean λ[n] = Tx[n]. The time-normalized measurement x̂[n] = p[n]/T has variance σ²_{x̂[n]} = x[n]/T, and thus as N → ∞ we have

E[MSE] = (1/N) ∑_{n=1}^{N} σ²_{x̂[n]} = (1/N) ∑_{n=1}^{N} x[n]/T ≈ P/T. (5)
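A quick Monte Carlo check of (5), with illustrative values of P, T, and N:

```python
import numpy as np

# Monte Carlo check of (5): the PA photon-counting MSE is about P/T.
# P, T, N, and the seed are illustrative placeholders.
rng = np.random.default_rng(3)
P, T, N = 1000.0, 2.0, 100_000

x = np.full(N, P)                  # idealized image with constant rate P
counts = rng.poisson(T * x)        # photons counted per pixel over T seconds
x_hat = counts / T                 # time-normalized estimates

mse = np.mean((x_hat - x) ** 2)    # empirically close to P / T = 500
```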

The RS is similar, except that one sensor is time shared over each of the N pixels, meaning that the time for each measurement is reduced from T to T/N. Accounting for this, we obtain as N → ∞ that E[MSE] ≈ NP/T.

In BS and CS, the single sensor measures the sum of a group of N/2 pixels for T/N s in BS and T/M s in CS. As N → ∞, the photon rate incident at the sensor is approximately NE[x[n]]/2, and so each measurement follows a Poisson distribution with mean and variance PT/2 for BS and PTN/2M for CS. The time-normalized measurement y[m] thus has variance N²P/2T for BS and NMP/2T for CS. Continuing for BS, we estimate the image as x̂ = W⁻¹y. The variance of the pixel estimates is 2(N + 1)P/T and thus the average MSE equals E[MSE] = (3N − 2)P/T.

For CS, we estimate the image via a nonlinear optimization that requires that the entries of Φ be ±1/√M rather than 0/1. Hence, we correct the measurements y by recentering and scaling, which changes the measurement variances to 3NP/T and results in the measurement average squared error norm E[‖y − ŷ‖₂²] = ∑_{m=1}^{M} σ²_{y[m]} = 3MNP/T. Assuming that we reconstruct the image using a technique such as BPIC (see “CS in a Nutshell”), we obtain E[MSE] ≤ 3CN²MP/T. [SP]