
Optodigital neural network classifier

Alain Bergeron
National Optics Institute
369 Franquet
Sainte-Foy, Quebec, Canada G1P 4N8
E-mail: [email protected]

Henri H. Arsenault, FELLOW SPIE
Universite Laval
COPL
Sainte-Foy, Quebec, Canada G1K 7P4

Michel Doucet
Luc Veilleux
Denis Gingras
National Optics Institute
369 Franquet
Sainte-Foy, Quebec, Canada G1P 4N8

Abstract. A two-layer neural network architecture for carrying out optodigital classification operations is proposed. The optical neural network implementation is suitable for pattern recognition and classification into digital format. The neural network is based on an optical correlator, an optoelectronic threshold, and an optodigital encoder. The module needs only one laser light source, and the light propagation from the input to the output is uninterrupted. The output is the class of the input pattern encoded in a digital format. Experimental results using different images and classes are presented. The classification can be changed arbitrarily by changing the encoding mask. © 1997 Society of Photo-Optical Instrumentation Engineers. [S0091-3286(97)02011-4]

Subject terms: optical neural networks; correlator; optoelectronic threshold; optodigital encoder.

Paper 25027 received Feb. 23, 1997; revised manuscript received June 16, 1997; accepted for publication June 17, 1997.

Opt. Eng. 36(11) 3134-3139 (November 1997) 0091-3286/97/$10.00 © 1997 Society of Photo-Optical Instrumentation Engineers

1 Introduction

The neural network is an emerging tool for pattern recognition, target tracking, and many image-related processing tasks. So far, many optical implementations have been oriented towards associative memories,¹,² where the degraded input image is reconstructed to provide better image quality. Various results published on associative memories in the literature are promising. However, this strategy supposes that the prime user of the system is a human being, because the improved output, an image, still needs to be analyzed by a human operator. Automatic target recognition often requires an entirely autonomous system, i.e., one where the output can be directly used by a control system.

In order to achieve this kind of system, many optical neural networks rely on electronics or software to process information coming out of the optical neural network. This approach was in great part imposed by the lack of reliable nonlinear optical or optoelectronic operations. The result is usually an optical system performing a large amount of computation in a short period of time, but the whole process is slowed down by input-output electronic data transfer.

In this paper, an optodigital neural network classifier is proposed. The system classifies an input object into a binary optical signal intended to be compatible with numerical system formats. The detection of the input object is achieved with a correlator coupled to an optoelectronic thresholding module,³ which allows uninterrupted optical-path propagation. The threshold also permits one to reduce the noise content in the output of the first neural layer. The output signal from the first layer is then directly forwarded to an optodigital encoder,⁴ which permits the optical conversion into a digital code. So all the massive processing operations are performed with optics, and the output is a compact digital sequence providing classification of the input. The system could also be used as an automated tracking system by an appropriate choice of masks and filters.

2 Classifier Architecture

The classifier architecture is composed of three main modules inserted between an input and an output layer. The complete architecture of the optical neural network is shown in Fig. 1. The input is displayed on a liquid-crystal television screen illuminated with a collimated laser beam. The first module, a correlator,⁵ performs the detection of the object. The correlation peak output is processed with the optoelectronic threshold module. This layer performs a nonlinear operation on the correlation performed by the optical correlator. The third module is a Dammann-grating-based optodigital converter. The processed output of the first layer, a delta-like function, is converted into a binary optical code to be compatible with numerical systems.

Fig. 1 Architecture of the optodigital neural classifier. (Diagram labels: first layer with input, correlator, and detection; laser, collimator, polarizer, input image, Fourier lenses, filter, beam splitter, CCD camera, liquid-crystal television, optoelectronic threshold, output of the correlator.)

The input image $e(x,y)$ is injected into the system via a transparent mask, although a liquid-crystal television screen could also be used for real-time operation. The incident collimated beam is spatially modulated by the mask and forwarded to the correlator. When a filter with impulse response $h_1(x,y)$ is used in the Fourier plane, the output is given by the correlation

$$s(x,y) = e(x,y) \star h_1(x,y), \tag{1}$$

where $s(x,y)$ represents the output and the subscript stands for the layer number. The choice of the filter $h_1(x,y)$ is important because it directly sets the input of the nonlinear function module. The two-dimensional correlation plane $s(x,y)$ is then input to the optoelectronic threshold. This module implements the nonlinear function and is used to clean the output from the correlator.

To clean the correlation, the image presented at the input of the module (the correlation plane) is replicated by means of a beamsplitter and forwarded to a CCD camera. The resulting image is then mapped to a liquid-crystal television (LCTV) screen located further along the light propagation path. The two-dimensional spatial topology of the replicated optical signal is preserved, and the correlation plane undergoes a modification corresponding to its intensity value. The overall effect of the optoelectronic thresholding module is to attenuate small light intensities and to leave high-intensity light levels unchanged. The threshold is set by the CCD saturation level. If the intensity of a point incident on the camera is lower than the threshold value, the LCTV attenuates the beam. If the intensity is equal to or higher than the CCD saturation level, the LCTV is fully open and the light beam passes unchanged. The overall threshold value can be set with an attenuator located in the tapping path. The output of the nonlinear processor corresponds to a cleaned delta-like function. Mathematically this is expressed by

$$s(x,y) = \delta(x - x_0 + x_1,\; y - y_0 + y_1), \tag{2}$$

where $x_0$ and $y_0$ are the positions of the object in the input scene, whereas $x_1$ and $y_1$ are the filter locations in the spatial domain.
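The first layer described above (correlation followed by the optoelectronic threshold) can be sketched numerically. The following is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the 6 × 6 test scene, and the graded transmission model (LCTV transmission proportional to the tapped intensity, clipped at the saturation level and floored at a 2% extinction ratio) are hypothetical choices made for the example.

```python
def correlate2d(scene, template):
    """Direct cross-correlation; output[dy][dx] is the overlap of the
    template with the scene when its top-left corner sits at (dy, dx)."""
    sh, sw = len(scene), len(scene[0])
    th, tw = len(template), len(template[0])
    plane = []
    for dy in range(sh - th + 1):
        row = []
        for dx in range(sw - tw + 1):
            acc = 0.0
            for y in range(th):
                for x in range(tw):
                    acc += scene[dy + y][dx + x] * template[y][x]
            row.append(acc)
        plane.append(row)
    return plane


def optoelectronic_threshold(plane, saturation, extinction=0.02):
    """Assumed model: each point is multiplied by an LCTV transmission that
    grows with the tapped intensity, reaches 1 at the CCD saturation level,
    and never drops below the extinction ratio."""
    cleaned = []
    for row in plane:
        cleaned.append(
            [v * max(extinction, min(v / saturation, 1.0)) for v in row]
        )
    return cleaned


def peak_position(plane):
    """Return (row, column) of the largest value in the plane."""
    best = max((v, y, x) for y, row in enumerate(plane) for x, v in enumerate(row))
    return best[1], best[2]


# Hypothetical test data: a plus-shaped object placed at offset (2, 1).
template = [[0, 1, 0],
            [1, 1, 1],
            [0, 1, 0]]
scene = [[0] * 6 for _ in range(6)]
for (y, x) in [(2, 2), (3, 1), (3, 2), (3, 3), (4, 2)]:
    scene[y][x] = 1

raw = correlate2d(scene, template)
cleaned = optoelectronic_threshold(raw, saturation=5.0)
peak = peak_position(cleaned)
```

In this sketch the autocorrelation peak reaches the saturation level and passes unchanged, while the weaker sidelobes are attenuated, which is the qualitative behavior the text attributes to the thresholding module.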

To process the information, one could analyze the output with a computer. However, a new generation of spatial light modulators promises rates of thousands of frames per second, which would create a data-flow bottleneck at the computer and negate the speed advantage of the optical processor. In order to overcome this problem, it is possible to go one step further, with an optodigital encoder as a third module. If the input object is centered in the input scene, the position of the output will be imposed by the filter position. Provided that many filters are spatially multiplexed, the maximum-correlation-peak position will correspond to the position of the memory object that correlates the most with the input object. So, if many objects are encoded in the filter plane, the position of the maximum correlation peak will identify the object at the input. Because each memory object is different, only one maximum correlation peak is obtained. The processed output of the nonlinear function will be a single delta-like function whose position identifies the object.

A Dammann-grating-based optodigital position encoder⁴ converts an input luminous point into a digital code corresponding to the position of the object. It takes advantage of the fact that a correlation peak is narrow and that it can represent a one when it has a high value, and a zero when it has a low value. This is especially true if an optoelectronic threshold module, which binarizes the correlation output, is used.

The second-layer input scene is duplicated by means of a Dammann grating. The multiple images are then projected onto binary encoding masks, each of which encodes one bit of the bit sequence representing the position of the peak. If a point is on the right side of a mask, a one will be coded; on the left side, a zero. If the light transmitted by individual masks is collected on a detector, the thresholded signals provided by the detectors constitute a digital representation of the peak position and are compatible with numerical systems.
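The mask-based encoding described above can be illustrated along one axis. This is a minimal sketch under stated assumptions: the function names are hypothetical, the plane width is assumed to be a power of two, and the masks are assumed to follow a plain binary code with the most significant bit first (right half transmissive for the first mask, consistent with "right side codes a one").

```python
def make_masks(width):
    """One binary mask per bit. masks[b][x] is 1 (transmissive) where bit b
    of the position x is set, most significant bit first.
    Assumes width is a power of two."""
    n_bits = width.bit_length() - 1
    return [
        [(x >> (n_bits - 1 - b)) & 1 for x in range(width)]
        for b in range(n_bits)
    ]


def encode_position(peak_x, masks, threshold=0.5):
    """Each replica of the delta-like peak is multiplied by one mask; a
    single detector integrates the transmitted light and is thresholded."""
    width = len(masks[0])
    plane = [0.0] * width
    plane[peak_x] = 1.0  # idealized, normalized correlation peak
    bits = []
    for mask in masks:
        collected = sum(p * m for p, m in zip(plane, mask))
        bits.append(1 if collected > threshold else 0)
    return bits


masks = make_masks(8)
code_for_5 = encode_position(5, masks)
code_for_0 = encode_position(0, masks)
```

Here the replication performed by the Dammann grating is modeled simply by reusing the same plane for every mask; each list entry of the result plays the role of one thresholded detector signal.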

The output of the system can thus be detected with single detectors (one detector behind each mask subregion) instead of standard CCD cameras. The second layer therefore avoids both the CCD camera and the need to process a whole image at the output. Only a few single detectors are needed, so, instead of waiting for each frame of a CCD camera and processing the information with a computer, the answer can be obtained at very high speed, because single detectors can be driven at a very high rate. To process an M × M pixel image, only 2 log₂ M masks are needed, so a 512 × 512 image requires 18 output detectors. Since single detectors can run much faster than cameras, the data-flow bottleneck is eliminated by the optics, and the overall system is limited only by the SLM speed.
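The 2 log₂ M rule stated above can be evaluated directly. The helper below is a hypothetical illustration (its name and the integer bit-length trick are not from the paper); it assumes one set of log₂ M masks per axis.

```python
def detectors_needed(side):
    """Number of masks (and single detectors) for a side x side image:
    log2(side) bits per axis, two axes. Uses integer arithmetic so that
    no floating-point rounding is involved."""
    bits_per_axis = (side - 1).bit_length()
    return 2 * bits_per_axis


pixels_512 = 512 * 512               # values a camera would have to read out
detectors_512 = detectors_needed(512)
detectors_256 = detectors_needed(256)
```

Evaluating the rule for M = 512 gives 2 × 9 = 18 detectors, and for M = 256 it gives 2 × 8 = 16, to be compared with the 262,144 pixel values of a full 512 × 512 camera frame.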

3 Learning

The learning phase of the digital classifier is simple, because there is no modification of weights and it is performed off line. The filter of the first correlation is composed of spatially multiplexed images $h_{1i}(x,y)$. In order to provide a high energy efficiency and sharp peaks, a phase-only filter is used. The filter is coded in order to give the correlator the following impulse response:

$$h_1(x,y) = \sum_{i=1}^{N} h_{1i}(x - x_{1i},\, y - y_{1i}). \tag{3}$$

The filter is easily obtained by Fourier-transforming $h_1(x,y)$ and keeping only the phase of the transform. The choice of the position of each memory template, $(x_{1i}, y_{1i})$, sets the location of the peak. The classification can thus be performed by changing the memory template position or by changing the numerical encoding sequence. It should also be noted that the encoding sequence can associate the same coding to two different templates. So the classifier allows one to classify similar objects in the same numerically encoded category or to join two different kinds of object into the same class. The system has a reduced translation invariance, since the object cannot translate by more than the corresponding dimension of an encoding section in the encoding plane. Finally, if only one template is recorded in the correlator, the system is transformed into a real-time tracking system provided that an SLM such as a liquid-crystal television screen is used in the input plane.

Fig. 2 Three examples of images used for the classification, panels (a) to (c). The acid symbol (a) was not recorded in the memory.
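As a hedged restatement of the multiplexed filter in the Fourier plane, the shift theorem can be applied to the impulse response of Eq. (3). The symbols $H_1$, $H_{1i}$, $u$, $v$, $\varphi_1$, and $H_{\mathrm{POF}}$ are introduced here for illustration only, and the sign convention of the Fourier transform is an assumption.

```latex
\begin{aligned}
H_1(u,v) &= \mathcal{F}\{h_1(x,y)\}
          = \sum_{i=1}^{N} H_{1i}(u,v)\, e^{-j 2\pi (u x_{1i} + v y_{1i})}, \\
H_1(u,v) &= |H_1(u,v)|\, e^{j \varphi_1(u,v)}, \\
H_{\mathrm{POF}}(u,v) &= e^{j \varphi_1(u,v)} .
\end{aligned}
```

Each template position $(x_{1i}, y_{1i})$ appears only as a linear phase term, which is why moving a template in the memory moves its correlation peak. Whether the filter or its complex conjugate is recorded depends on whether the optical setup realizes a convolution or a correlation with $h_1$; the text does not specify this.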

The number of neurons can be easily obtained from the dimensions of the support used to display the images. For an M × M image, the number of input values is determined by the size of the modulator used to display the image (M²). A 512 × 512 modulator yields 2.62 × 10⁵ input values. Each point of the optoelectronic threshold plane performs a nonlinear operation on the correlation plane. Each correlation point corresponds to a set of multiplications and integrations. So the number of neurons is given by the number of resolvable points in the optoelectronic threshold (M²). The neural classifier would show 2.62 × 10⁵ neurons for a 512 × 512-pixel modulator, each neuron processing 2.62 × 10⁵ input values.

In the second layer, there cannot be more inputs than the number of positions to be coded. This number is dependent on the size of the memory. For example, a 512 × 512-pixel reference memory with reference templates of 64 × 64 pixels gives 64 inputs (N). These inputs are classified according to the binary mask of the second layer. As six sections are required to encode 64 inputs (log₂ N), there will be six weights in the second layer for 64 templates in the memory. Finally, six single detectors are used to detect the overall results. The last nonlinear function can be considered to be applied to the electrical signal provided by these six detectors. So the second layer really uses six neurons.

Fig. 3 Memory template used for the phase-only filter generation in the first layer.
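The neuron counts quoted for the two layers follow from simple arithmetic, sketched below. The function name and dictionary keys are hypothetical; the sketch assumes square modulators and templates that tile the memory without overlap.

```python
def layer_sizes(modulator_side, template_side):
    """Count inputs and neurons for the two layers of the classifier."""
    inputs = modulator_side ** 2                 # M^2 input values
    first_layer_neurons = modulator_side ** 2    # one per threshold-plane point
    templates = (modulator_side // template_side) ** 2   # N memory positions
    second_layer_neurons = (templates - 1).bit_length()  # log2(N) detectors
    return {
        "inputs": inputs,
        "first_layer_neurons": first_layer_neurons,
        "templates": templates,
        "second_layer_neurons": second_layer_neurons,
    }


sizes = layer_sizes(modulator_side=512, template_side=64)
```

For the 512 × 512 memory with 64 × 64 templates this reproduces the figures in the text: 262,144 (about 2.62 × 10⁵) inputs and first-layer neurons, 64 templates, and six second-layer neurons.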

From this example it is to be noted that the input image of 2.62 × 10⁵ pixels is completely processed by the system and the result to be forwarded to a computer takes only 6 bits of space. This shows, beyond the analysis of the number of neurons, the real capabilities of the system. The system is limited in speed only by the SLM refresh rate.

4 Experimental Results

The optodigital architecture of Fig. 1 was built. Both the input image and the correlator filter were recorded on high-resolution photographic film with a laser writer. The input images were binary with 256 × 256 pixels. The numerical sequence was also recorded on the same type of film. For these experiments, a 320 × 200 pixel LCTV operated at a video frame rate (30 frames/s) was used in the optoelectronic threshold. The extinction ratio was limited to around 2% by the LCTV contrast ratio (≈50:1). The Dammann grating used has a 20-μm pitch with a diffraction efficiency of approximately 65%.

Four images were recorded in the first-layer memory (wheat, biological hazard, fire, and skull). Five images were presented at the input. Two experiments were performed. In the first one, the objects in the memory were each assigned to a different encoded class. The acid symbol, not included in the memory, did not cause any response. In the second experiment, the skull and the fire symbol were assigned to the same class.

Figure 2 shows three input images: the acid, biological hazard, and wheat symbols. Figure 3 shows the information recorded in the filter. The correlations obtained are shown in Fig. 4. The correlation of Fig. 4(a) was obtained with the image of Fig. 2(a) as the input. In the same manner, Fig. 4(b) corresponds to Fig. 2(b) and Fig. 4(c) to Fig. 2(c). Because the acid symbol was not included in the memory, the correlation with the acid symbol gave rise to only small cross-correlation values [Fig. 4(a)]. The wheat pattern produces a maximum correlation value at the first position, whereas the biological hazard symbol produces a bright peak at the second position. The skull and the fire symbol produce the same kind of results. Figure 5 shows the output for the skull and fire symbols (the third and fourth positions) after the optoelectronic threshold. In the acid-symbol correlation plane, all the values vanish. The skull and the fire produce clean delta-like functions at the third and fourth positions.

Fig. 4 Correlation of the input images, (a) acid, (b) wheat, (c) biological hazard, with the template of Fig. 3. The maximum correlation peak depends on the input object, and its location depends on its corresponding position in the filter template.

Fig. 5 Correlations produced with the skull (b) and the fire (c) symbols cleaned with the optoelectronic threshold (referred to the third and fourth positions in the filters, see Fig. 2). Because the correlation value of the acid symbol (a) is not high enough, no energy is transmitted to the second layer. The skull and fire symbols can be assimilated to a delta function.

Fig. 6 Binary pattern used for class encoding. The cleaned correlation is imaged on each of the three vertical bands. The two bands on the left encode the class, and one on the right encodes the presence or absence of an object. The white zones are transmissive, and the class can be read on a horizontal band from left to right. (Panel labels: zone for the first replica, zone for the second replica, zone for the third replica.)

The image is then replicated laterally by means of the Dammann grating to reproduce three identical images. The replicated output of the thresholded results is multiplied by the encoding mask of Fig. 6. The results are presented in Fig. 7. The skull peak, located in the third position, is multiplied with a dark mask, producing a zero, and a transparent mask, producing a one. The last transparent mask indicates the presence of an object. So the whole encoded class is 011. The fire symbol, located in the fourth position, is encoded with three transparent masks, producing a class 111. If the encoding mask is changed for the one of Fig. 8, the skull and the fire symbol will be encoded with the same numerical sequence (011). The acid will still be encoded in the class 000. These results are shown in Fig. 9. From those results it should be clear that the system is fully invariant under translation along the horizontal axis, whereas vertically an object can translate by only one-fourth the total image height.

Fig. 7 Classification of the acid symbol (a) in class 000, the skull (b) in class 011, and the fire symbol (c) in class 111.

Fig. 8 Second classification template. The codes for the third and the fourth position (vertical) are the same and will include two different objects in the same class.

Fig. 9 Second classification for the acid (a) 000, the skull (b) 011, and the fire symbol (c) 011. Both skull and fire symbols are now part of the same class.
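The effect of swapping the encoding mask can be sketched as a lookup from peak position to the three transmissions seen by the three replicas. This is a hedged illustration: the names are hypothetical, and only the codes for the third position (011), the fourth position (111 with the first mask, 011 with the second), and the no-peak case (000) come from the text. The codes assigned to the first and second positions are assumptions made to complete the example.

```python
# Transmission (1 = transparent, 0 = dark) seen by each of the three replicas
# for a peak at a given filter position. The last entry is the presence bit.
# Entries for positions 1 and 2 are ASSUMED; the paper does not list them.
MASK_FIG6 = {1: (0, 0, 1), 2: (1, 0, 1), 3: (0, 1, 1), 4: (1, 1, 1)}
MASK_FIG8 = {1: (0, 0, 1), 2: (1, 0, 1), 3: (0, 1, 1), 4: (0, 1, 1)}


def classify(peak_position, mask, peak_intensity=1.0, max_intensity=1.0):
    """Return the 3-bit class read by the detectors.
    peak_position is None when the thresholded correlation plane is empty."""
    if peak_position is None:
        return "000"
    bits = []
    for transmission in mask[peak_position]:
        detected = peak_intensity * transmission
        # Binary decision at 50% of the maximum correlation value.
        bits.append("1" if detected > 0.5 * max_intensity else "0")
    return "".join(bits)


skull_first = classify(3, MASK_FIG6)
fire_first = classify(4, MASK_FIG6)
acid_first = classify(None, MASK_FIG6)
fire_second = classify(4, MASK_FIG8)
```

Changing only the mask dictionary moves the fire symbol from class 111 to class 011 while the first layer is left untouched, which mirrors the reclassification shown in Figs. 7 and 9.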

The overall system performs two Fourier transformations and three image multiplications. The capacity of the system is limited only by the LCTV frame rate. For example, with a commercially available ferroelectric SLM at 1000 frames/s, the classifier performs 4.4 × 10⁹ operations/s and 4.5 × 10¹³ interconnections/s. This system, inspired by neural networks, is inherently robust because cross-correlations are eliminated in the first layer. An operation range of 50% of the maximum value is also provided by the binary encoding scheme. An intensity lower than 50% of the maximum correlation value is set to zero, whereas an intensity above 50% of the maximum correlation value is set to one. The overall path is uninterrupted because the system output directly comes from the laser.

5 Conclusion

An optodigital neural network classifier has been implemented. The experiments performed yielded correct classification both for objects included in the memory and for objects not in the memory. The classifier combines the classical possibilities of the optical correlator, the nonlinear capabilities of neural networks provided by the optoelectronic thresholder, and the digital compatibility provided by the optodigital encoder. Coupled with LCTVs and cameras, it can be made versatile, and it can be modified easily to provide a real-time tracking system. This system could be used in many applications because of its inherent compatibility with digital systems.

Acknowledgment

This research was supported by grants from the Natural Sciences and Engineering Research Council of Canada (NSERC), from the Fonds pour la formation des chercheurs et l'aide à la recherche (FCAR) program of Quebec, and from the JSTF program of the Canadian Ministry of External Affairs.

References

1. N. H. Farhat, D. Psaltis, A. Prata, and E. Paek, "Optical implementation of the Hopfield model," Appl. Opt. 24, 1469-1475 (1985).

2. E. Paek and D. Psaltis, "Optical associative memory using Fourier transform holograms," Opt. Eng. 26, 428-433 (1987).

3. A. Bergeron, H. H. Arsenault, E. Eustache, and D. Gingras, "Optoelectronic thresholding module for winner-take-all operations in optical neural networks," Appl. Opt. 33, 1463-1468 (1994).

4. A. Bergeron, H. H. Arsenault, and D. Gingras, "Dammann-grating-based optodigital position converter," Opt. Lett. 20, 1895-1897 (1995).

5. A. van der Lugt, "Signal detection by complex filtering," IEEE Trans. Inf. Theory IT-10, 139-145 (1964).

Alain Bergeron received his BSc degree in physics engineering at Universite Laval in 1987. He completed his MSc in computer generated holograms in 1988 at the same university. Until 1991 he worked in research and development at the National Optics Institute (NOI) on graded reflectivity mirrors and fiber optic sensors. He then undertook his PhD studies in optical implementation of neural networks in a joint project of NOI, Universite Laval, and the Communication Research Laboratory of Japan. Since 1994, he has been a researcher at NOI and he is currently in charge of the processors and algorithms group in the Canadian Optical Computing Consortium, OPCOM. His current fields of interest include pattern recognition systems, optical computing, neural networks, and vision systems.

Henri H. Arsenault is a professor in the Department of Physics at Laval University in Quebec City, Canada. He is the author of more than 100 publications in optical and digital information processing, pattern recognition, optical computing, and artificial intelligence. He is a fellow of the Optical Society of America and of SPIE, the International Society for Optical Engineering. He has filled a number of functions in optical societies. He is coeditor of the book Optical Processing and Computing, is coauthor of the book An Introduction to Optics in Computers, and has contributed chapters to various books.

Michel Doucet received his BSc degree in physics in 1988 from Universite du Quebec a Chicoutimi, Canada, and his MSc degree in optics in 1991 from Universite Laval, Quebec, Canada. He has been a researcher at the National Optics Institute since 1992, working on the development of optical correlators, 3D laser measurement systems, sensors for plastic sorting, and sensors related to machine vision systems. His research interests include optical information processing, machine vision, pattern recognition, and speckle.

Luc Veilleux received his DEC in physics technology at the CEGEP of La Pocatiere, Quebec, in 1992. Since 1992, he has been a technologist at the National Optics Institute (NOI). He has been working on thin film deposition, electronic control, and guided wave device realization. He is currently working in the Digital and Optical System Sector on projects related to optical correlators, 3D vision, and neural networks.

Denis Gingras received his BSc and MSc degrees in electrical engineering from Laval University in 1980 and 1984, respectively, and his DrS in 1989 from the Ruhr-Universität Bochum, Germany. His work has been on signal and image processing. From 1989 to 1990, he was an STA fellowship award recipient as a guest researcher at the Communication Research Laboratory in Tokyo, Japan. He is currently director of the Digital and Optical Systems Sector at the National Optics Institute, in Quebec City, Canada. His current research interests include signal and image processing, neural networks, and artificial vision. Dr. Gingras is a member of IEEE, INNS, EURASIP, and SPIE.

