
Smart Cameras: A Review

Yu Shi*, Serge Lichman

Interfaces, Machines And Graphical ENvironments (IMAGEN)

National Information and Communications Technology Australia (NICTA)

Australian Technology Park, Bay 15 Locomotive Workshop

Eveleigh, NSW 1430, Australia

*Corresponding Author. Tel.: +61 2 8374 5565; Fax: +61 2 8374 5527. E-mail Addresses: [email protected], [email protected]

Abstract

Smart cameras are cameras that can perform tasks far beyond simply taking photos and recording videos. Thanks to purposely built-in intelligent image processing and pattern recognition algorithms, smart cameras can detect motion, measure objects, read vehicle number plates, and even recognize human behaviors. They are essential components for building active and automated control systems for many applications, and they will play a significant role in our daily life in the near future. This paper aims to provide a first comprehensive review of smart camera technologies and applications. Here, we analyse the reasons behind the recent rapid growth of smart cameras, discuss their different categories and review their system architectures. We also examine their intelligent algorithms, features and applications. Finally, we conclude with a discussion of design issues, challenges and future technological directions.

Keywords: smart cameras, pattern recognition, machine vision, computer vision, video surveillance, embedded systems.

1 Introduction

What is a smart camera? Different researchers and camera manufacturers offer different definitions. There does not seem to be a well-established and agreed-upon definition in either the video surveillance or machine vision industries, probably the two most active and advanced application areas for smart cameras at present. For the purpose of this paper, we define a smart camera as a vision system whose primary function is to produce a high-level understanding of the imaged scene and generate application-specific data to be used in an autonomous and intelligent system. The idea of smart cameras is to convert data to knowledge by processing information where it becomes available, and to transmit only results that are at a higher level of abstraction. A smart camera is ‘smart’ because it performs application-specific information processing (ASIP), the goal of which is usually not to provide better quality images for human viewing but to understand and describe what is happening in the images for the purpose of better decision-making in an automated control system. For example, a motion-triggered surveillance camera captures video of a scene, detects motion in the region of interest, and raises an alarm when the detected motion satisfies certain criteria. In this case, the ASIP is motion detection and alarm generation.
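
To make the idea concrete, here is a minimal sketch of such an ASIP in Python using the OpenCV library (discussed in section 2.2); the region of interest, thresholds and alarm action are illustrative assumptions, not part of any particular product:

    import cv2

    ROI = (100, 100, 300, 200)   # assumed region of interest: x, y, width, height
    ALARM_FRACTION = 0.02        # assumed criterion: fraction of changed ROI pixels

    def motion_detected(prev_frame, frame):
        """Signal level: difference consecutive frames inside the ROI and threshold."""
        x, y, w, h = ROI
        a = cv2.cvtColor(prev_frame[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
        b = cv2.cvtColor(frame[y:y + h, x:x + w], cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(a, b)
        _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
        changed = cv2.countNonZero(mask) / float(w * h)   # one scalar feature
        return changed > ALARM_FRACTION                   # the camera's 1-bit output

    cap = cv2.VideoCapture(0)    # any video source would do
    ok, prev = cap.read()
    while ok:
        ok, frame = cap.read()
        if not ok:
            break
        if motion_detected(prev, frame):
            print("ALARM: motion in region of interest")  # application-specific output
        prev = frame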

The important differences between a smart camera and “normal” cameras, such as consumer digital cameras and camcorders, lie in two aspects. The first is in camera system architecture. A smart camera usually has a special image processing unit containing one or more high-performance microprocessors to run intelligent ASIP algorithms, whose primary objective is not to improve image quality but to extract information and knowledge from images. The image processing hardware in normal cameras is usually simpler and less powerful, with the main aim being to achieve good visual image quality. The other main difference is in the primary camera output. A smart camera outputs either the features extracted from the captured images or a high-level description of the scene, which is fed into an automated control system, while for normal cameras the primary output is the processed version of the captured images for human consumption. For this reason, normal video cameras have large output bandwidth requirements (in direct proportion to the resolution of the image sensor used), while a smart camera can have very low data bandwidth requirements at the output (in the simplest case it can be just one bit, with ‘1’ meaning ‘there is motion’ and ‘0’ meaning ‘there is no motion’, for example). These differences are illustrated in figure 1.
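
To put the bandwidth difference in numbers: an uncompressed 640×480 camera running at 30 frames/s with 24 bits per pixel must ship roughly 640 × 480 × 30 × 24 ≈ 221 Mbit/s of raw video, whereas a smart camera that reports, say, a motion bit or a few object coordinates per frame needs only tens to a few hundred bits per second.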

[Figure 1 block diagrams: (a) a normal camera chains image sensing, image processing, and image/video output generation and communication, producing video for a TV or digital display for human consumption; (b) a smart camera chains image sensing, ASIP, and application-specific data generation and communication, producing metadata for an automated control system for decision making.]

Figure 1: Differences between a normal camera (a) and a smart camera (b).


Smart cameras can exist where a camera is not expected to be. A good example is the ubiquitous optical mouse for the PC. Most optical mice contain a miniature digital video camera inside the mouse casing. They work by shining a bright light onto the surface below, then using the camera to take up to 1 500 pictures a second of that surface. An intelligent image processing circuit inside the mouse performs image enhancement and calculates the mouse motion based on the image difference between successive frames. This difference is then used to displace the mouse cursor on the screen. The optical mouse is a good example of a smart camera in three respects: firstly, it is a stand-alone camera with camera and processing in a single embedded device; secondly, the camera is used not to take pictures or video for human consumption, but to produce a feature vector (the motion vector in the x and y directions) representing the displacement of the object (the mouse in this case); thirdly, it shows that smart cameras are not restricted to a niche market, but can be adopted ubiquitously.
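
As an illustration of this principle (a simplified sketch, not the proprietary algorithm of any actual mouse), the displacement between two successive grayscale frames can be estimated by exhaustive block matching:

    import numpy as np

    def estimate_shift(prev, curr, max_shift=4):
        """Return the (dx, dy) that best aligns curr with prev, by minimizing
        the mean absolute difference over all shifts up to max_shift pixels."""
        h, w = prev.shape
        m = max_shift
        best, best_err = (0, 0), np.inf
        ref = prev[m:h - m, m:w - m].astype(np.int32)     # central reference block
        for dy in range(-m, m + 1):
            for dx in range(-m, m + 1):
                cand = curr[m + dy:h - m + dy, m + dx:w - m + dx].astype(np.int32)
                err = np.abs(ref - cand).mean()           # matching cost for this shift
                if err < best_err:
                    best, best_err = (dx, dy), err
        return best   # the feature vector the mouse reports: cursor displacement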

Strictly speaking, a smart camera is a stand-alone, self-contained device that integrates image sensing, ASIP and communications in one single box. It is designed for a specific type of application (for example, surveillance or industrial machine vision). However, there are other types of vision systems that are often referred to as smart cameras as well, such as PC-based smart cameras. We analyze these different types of smart cameras in section 3. The term ‘smart camera’ in this paper covers both stand-alone smart cameras and other types of smart cameras, as described in section 3.1, unless specified otherwise.

The advent of smart cameras can be traced back to the early 1990s, when PCs became popular and video frame grabbers became available. Early solid-state CCD (Charge-Coupled Device) cameras of the mid-1970s were analog cameras. Later, digital signal processing (DSP) technologies pushed analog CCD cameras into the digital era with enhanced image quality, but the output of most of these cameras was still analog (e.g. NTSC/PAL signals). Frame grabbers allowed CCD cameras with analog output to be connected to computers and their output digitized for versatile processing. This marked the beginning of smart camera systems, with the camera performing image capture and the computer carrying out intelligent processing tasks such as motion detection and shape recognition. The first applications were in the areas of industrial machine vision and surveillance.

The real interest in, and the growth of, smart cameras started in the late 1990s and early 2000s, spurred by factors such as technological advancements in chip manufacturing, embedded system design, and the coming-of-age of CMOS (Complementary Metal Oxide Semiconductor) image sensors. Market demands from surveillance and machine vision also played significant roles. Advanced smart camera systems often integrate the latest technologies in image sensors, optics, imaging systems, embedded systems, computer vision, video analysis, communication and networking.

At the heart of smart cameras are the intelligent ASIP algorithms and the hardware that runs them. Image feature extraction and pattern recognition are probably among the most widely used algorithms in smart cameras. In a way, a smart camera can be thought of as an image feature extractor or a visual pattern recognizer. Research in computer vision, image understanding and pattern recognition has yielded many algorithms and solutions that can be used by smart cameras. However, the performance and robustness of the ASIP algorithms when deployed in cameras operating under real-world conditions are among the most important issues facing the development and commercialization of new smart cameras.

In the remainder of this paper, we analyze the main reasons behind the rapid growth of smart cameras (section 2), review system architectures of different smart cameras (section 3), review state-of-the-art smart camera systems and ASIP algorithms for some applications (section 4), and finally discuss some design issues and conclude with some thoughts about technical challenges and future technological directions (section 5).

2 The Rapid Growth of Smart Cameras

2.1 Coming of Age of CMOS Image Sensors

The advent of CMOS image sensors (CIS) in the late 1990s played an important role in the development of smart camera technology and systems, and has the potential to make smart cameras smaller, cheaper and more pervasive. Compared to CCD, CIS have several advantages that make them excellent candidates for the smart camera front-end. These include smaller size, cheaper manufacturing cost, lower power consumption, the ability to build a camera-on-a-chip, the ability to integrate intelligent processing circuits onto the sensor chip, and significantly simplified camera system design.

Most CISs are manufactured using the same process by which semiconductor chips (CPUs, memories, etc.) are made. This means that many semiconductor manufacturers can make CIS, which drives up competition and reduces cost. CCD sensors, by contrast, are made using a special chip manufacturing process, and there are only a few manufacturers in the world, mostly in Japan. CCD-based camera chip-sets usually include at least three or four chips: a CCD pixel array, a CDS (Correlated Double Sampling) chip, a timing generator, and an ADC (Analog-to-Digital Converter). In the case of CIS, all these functions can be integrated onto one single chip, making it a real camera-on-a-chip with light in and pixels out. This greatly simplifies camera system design and reduces cost. Compared with the CCD chip-set, there are many more sources from which a CIS can be purchased, even a single item at a time, which is very difficult to achieve in the case of CCD. All this makes it much easier for researchers, students and camera manufacturers alike to develop smart cameras of their own.

Probably the most important advantage of CIS over CCD lies in the ability to place the image sensor array and intelligent image processing circuits side by side on the same chip. This makes a single-chip smart camera possible. One example is a vision-based single-chip fingerprint reader with an on-chip CIS, processing circuitry performing pattern matching, and memory storing templates of one or several user fingerprints for real-time comparison and identification [1].

A recent market survey by Gartner Dataquest [2] estimated that there are about 40 suppliers of CIS world-wide, and that the global CIS market would increase from $3.2 billion in 2005 to $5.6 billion by 2008. The survey showed that automotive, medical imaging and surveillance applications are among the emerging markets for CIS products.

2.2 Research in Computer Vision and Pattern Recognition

What makes a camera smart is the intelligent ASIP - the application-specific information processor built into the camera system. Advances in academic and industrial research in real-time image processing and understanding, pattern recognition, machine learning, computer vision and video communication continue to provide a large library of intelligent algorithms for use by smart cameras in different applications. As an example, Intel’s OpenCV (Open Source Computer Vision) Library [3] has been very popular with academic researchers and students working on smart camera projects. Every year, numerous international journals, conferences and workshops give researchers world-wide forums in which to present their innovative work in areas such as computer vision and pattern recognition. Much of the work presented can be seen as embryos of future smart cameras. Recently, the first international conferences and workshops focusing on the design of embedded vision systems have been held.


2.3 Embedded System Technologies

A stand-alone smart camera is essentially an embedded vision system. Compared with PC-based systems, an embedded system is usually subject to many constraints on the design, implementation and production of the device that encapsulates it, such as low power, limited resources, real-time processing and low cost. An embedded vision system is even more challenging to design due to video processing’s insatiable demand for computing power and memory resources. In the last decade, embedded vision systems have made great progress thanks to the increasing affordability of powerful processors and memory chips, the availability of real-time operating systems, low-complexity intelligent algorithms and the coming-of-age of system development software and tools.

Functional integration seems to be a trend in consumer electronics and ICT (Information and Communications Technology). For example, many cellular phones now come with a camera and can play music and receive radio. Some webcams have built-in intelligence such as face tracking. Functional integration can seemingly make a normal camera smart. For example, a camera with an integrated voice/sound detection component can take a picture of the surrounding area when a human voice is detected, or it can take a picture in the direction from which a gun-shot has been detected [4].

2.4 Socio-Economic Drivers

Thanks to Moore’s law, semiconductor chips and computer hardware continue to shrink in size, reduce in cost and gain in performance. This has driven down the prices of cameras, frame grabbers and computers and made smart camera systems, especially PC-based systems, more affordable for research and development on one hand and for the market and end-users on the other. As hardware (cost) constraints are lifted, software developers have more freedom to write "smarter" algorithms.

One of the most significant developments in the surveillance and security industries in the last several years has been the wide use of CCTV (Closed Circuit Television) cameras and their impact on crime, terrorist attacks, and the general public. It is noticeable that after the 9/11 attacks in the US, video surveillance has received more attention not only from the academic community, but also from industry and governments. The terrorist attacks on the London Underground in mid-2005 and the successful use of CCTV by police in identifying the perpetrators have intensified the discussion about a new generation of intelligent video surveillance systems based on smart cameras. In fact, surveillance and security demands are an important driving force behind the ever-increasing scale of academic and industrial research into advanced vision algorithms such as object tracking and identification, and human behavior analysis.

2.5 Market Demands and Analysis

2.5.1 Digital Video Surveillance

The first generation of CCTV cameras (1980s-1990s) consisted mostly of analog cameras with limited functionality and high cost. Digital CCTV cameras and the use of DVRs (Digital Video Recorders) represent the second generation (2G, 1990s-now). Digital CCTV cameras built using CCD and CMOS image sensors provide better video quality, some intelligent functions such as motion detection and electronic PTZ (Pan-Tilt-Zoom), and networking. The 2G CCTV systems have become mass-market products, fuelled by improved affordability and society’s increasing concerns over safety and security. According to estimates made in 2004 by the market research firm Datamonitor [5], digital video surveillance is a high-growth segment within the overall surveillance market, estimated at 55% CAGR (Compound Annual Growth Rate) between 2003 and 2007. In dollar terms, between 2003 and 2007 the market will grow from US$1.3bn to US$7.4bn globally.

However, the 2G CCTV systems are not “smart” enough to help prevent crimes or terror attacks, even though they have proved very useful in post-event identification of crime perpetrators. The 2G CCTV systems are mostly not automated systems and rely strongly on trained security personnel to perform image analysis, object tracking and identification. The increasing number of cameras makes real-time analysis by security personnel difficult. Network bandwidth is another important issue affecting the real-time processing needed for crime prevention. The intelligent video surveillance system (IVSS), also called the third-generation CCTV system, will try to provide solutions to these problems. Smart cameras will be one of the fundamental building blocks of the IVSS, making it possible to build and deploy automated, distributed and intelligent multi-sensory surveillance systems capable of tracking humans and suspicious objects, analyzing human behaviors, and so on. Many market research firms have predicted significant growth in intelligent video systems and smart cameras. For example, the market researcher Frost & Sullivan [6] has forecast that the US$153.7 million video surveillance software market will witness a healthy CAGR of 23.4% from 2004 to 2011 to reach US$670.7 million.


2.5.2 Industrial Machine Vision

Industrial machine vision is probably the birthplace of smart cameras, at least in terms of the systematic use of commercial smart cameras. It is also one of their most active playgrounds. Most machine vision smart cameras are stand-alone cameras. The demand for these cameras has been steadily increasing over the years. The major end-user industries are robotics, semiconductors, electronics, pharmaceuticals, manufacturing, food, plastics and printing. The tasks these smart cameras usually perform include bar-code reading, part inspection, flaw detection, surface inspection, dimensional measurement, assembly verification, print verification, object sorting, OCR (optical character recognition) and maintenance. A recent survey on machine vision products from the Europe-based market research firm IMS Research [7] found that smart cameras are rapidly accounting for a greater share of machine vision market revenue. Demand for smart cameras is primarily driven by the increasing demand for better production efficiency and quality control in industries such as manufacturing and medicine/pharmaceuticals. The survey revealed that while sales of more traditional PC-based products (cameras and frame grabbers) have fallen, sales of smart cameras and compact vision systems have continued to grow. The survey predicts that the machine vision market in Europe will grow at an average rate of 11.6% each year to 2006, with the highest levels of growth, approaching 20%, forecast for the smart sensor and camera product groups, resulting in a more than doubling in dollar value. The same trend has also been forecast by the same company for the Asia-Pacific market [8]. The annual market study by the AIA (Automated Imaging Association) estimated the 2003 North American machine vision smart camera market at about US$57 million, with growth of 15% per year in terms of revenue and 20% per year in terms of units [9].

2.5.3 Other Significant Markets

Other important markets for smart cameras are ITS (Intelligent Transport Systems), automobiles, HCI (Human-Computer Interfaces), medical/healthcare, games, toys, video conferencing and biometrics.

3 Review of Smart Camera System Architectures

In recent years, smart cameras have attracted considerable attention from academic and industrial research and development (R&D) organizations. However, to the best of the authors’ knowledge, a systematic approach to analyzing smart cameras has yet to be agreed upon. In this section we first present one approach to classifying smart camera systems and provide an analysis of their system architectures, followed by a review of some R&D activities on the design of smart cameras as embedded systems.

3.1 Classification of Smart Cameras

Smart cameras can come in different system and physical configurations. Figure 2 shows one proposed classification of the different types of vision systems and smart cameras.

[Figure 2 classification tree: vision systems divide into embedded, PC-based, network-based and hybrid vision systems; embedded vision systems contain stand-alone smart cameras (including single-chip smart cameras) and non-stand-alone smart cameras, while the other branches correspond to PC-based smart cameras, networked smart cameras and other types of smart cameras.]

Figure 2: One proposed classification of vision systems and smart cameras.

As shown in Figure 2, stand-alone smart cameras are a subset of embedded vision systems. Non-stand-alone embedded smart cameras are sometimes called compact vision systems. Compact vision systems are usually composed of general-purpose cameras connected to an external embedded processing unit in a separate box that provides the ASIP and communication/networking functionality. Single-chip smart cameras can be thought of as a special case of smart cameras, because they require special system design considerations and are usually used in carefully targeted applications. Non-stand-alone smart cameras can be thought of as virtual smart cameras, because from the user’s point of view the cameras are smart even though the ASIP that makes them smart may be performed by an external unit, like a hardware accelerator board, a local PC or a networked PC. PC-based smart cameras, consisting of a general-purpose video camera, a frame grabber of some sort and a PC whose CPU performs the ASIP, are a very common and inexpensive platform for researchers, academics and students to conduct research on smart cameras. Sometimes a normal camera is connected to a PCI (Peripheral Component Interconnect) processing board within a PC. In this case, the PCI board may perform most of the ASIP and output generation, while the PC provides a flexible operator interface or additional processing power. This kind of system is a special case of both a compact vision system and a PC-based system. A digital CCTV surveillance system with intelligent features is an example of a network-based smart camera system, and the next generation of distributed intelligent video surveillance systems will be an exciting testing ground for smart cameras, especially stand-alone smart cameras. Hybrid vision systems may give rise to some special types of smart cameras; this category may also include smart camera systems that need some kind of human intervention to help provide high-accuracy data output.

3.2 Analysis of Different Types of Smart Cameras

3.2.1 Common Characteristics

The common basic components of a normal digital video camera (consumer, professional or industrial) include optics, a solid-state image sensor (CCD or CMOS), image processor(s) and supporting hardware, an output generator, and communication ports. The main tasks performed by the image processor(s) are color interpolation, color correction and saturation, gamma correction, image enhancement, and camera control such as white balance and exposure control. The output generator can be an NTSC/PAL encoder providing standard TV-compatible output, a video compression engine providing compressed video streams for communication over a network, or a digital video output generator such as a Firewire encoder. Communication ports such as Ethernet or RS232 provide the basis for networked camera functionality, or for camera configuration and firmware upgrades through a PC, respectively.
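
As a small illustration of one of these processing steps, gamma correction is commonly implemented as a per-pixel lookup table; a minimal NumPy sketch (the gamma value is an assumption):

    import numpy as np

    def gamma_correct(image, gamma=2.2):
        """Map 8-bit pixel values through out = 255 * (in / 255) ** (1 / gamma)."""
        lut = (255.0 * (np.arange(256) / 255.0) ** (1.0 / gamma)).astype(np.uint8)
        return lut[image]   # fancy indexing applies the lookup table to every pixel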

A smart camera typically includes all the above essential components of a normal camera, with the following differences:

• A smart camera has a distinct and powerful signal processing unit to perform image feature extraction and/or pattern analysis based on application-specific requirements; and

• A smart camera has an output generator to produce a coded representation of the image features and/or results from the pattern matching, or in some cases control signals for other devices (e.g. an alarm-triggering signal) or actions (e.g. sending a picture of the number plate of a speeding car to the police).

System architecture design for smart cameras often involves significant system engineering effort. Clear application requirements and specifications are crucial to a successful design. The software architecture, the hardware architecture and, for network-based systems, the network architecture need to be jointly designed to maximize resource usage and efficiency, and to reduce cost and time-to-completion. More detailed design considerations are discussed in section 5.1.

3.2.2 Stand-alone Smart Cameras

A stand-alone smart camera integrates image capture, ASIP and application-specific output generation into a single device casing. A stand-alone smart camera may look very much like a normal industrial camera or a CCTV camera. While the primary function of a normal camera is to provide raw video for monitoring and recording, a smart camera is usually designed to perform specific, repetitive, high-speed and high-accuracy tasks in industries such as machine vision and surveillance. Most industrial machine vision cameras are stand-alone smart cameras. While a normal video camera may cost anywhere between US$50 and US$2 000, a machine vision smart camera can cost between US$1 000 and US$6 000 per unit [10] and beyond, depending on the functionality and level of customization.

Many pattern recognition techniques involve two types of processing tasks: data-intensive tasks such as image enhancement and feature extraction, and math-intensive tasks such as statistical pattern matching. While data-intensive tasks require high-speed hardware to deal with high pixel volumes and high frame rates, math-intensive tasks often require high-performance processors to deal with issues such as pipelining and floating-point arithmetic. For demanding applications, the camera hardware architecture may be based on a heterogeneous, multi-processor platform, with one or more processors capable of parallel processing (e.g. an FPGA - Field Programmable Gate Array) performing the data-intensive tasks, and a DSP and/or a RISC (Reduced Instruction Set Computer) processor performing the math-intensive tasks. A smart camera built for face detection and recognition by Broers et al. [11] is one such example. The system employs an FPGA and a Xetal parallel processor working in SIMD (Single Instruction Multiple Data) mode to perform data-intensive operations such as face detection, while a high-performance DSP, the TriMedia, with a VLIW (Very Long Instruction Word) core, runs high-level programs such as face recognition. The system architecture is shown in Figure 3.

[Figure 3 block diagram: the image sensor and AFE/ADC blocks feed a system data bus that connects a data-intensive processing block (FPGA and Xetal), a math-intensive processing block (TriMedia), camera control, a memory system, and communications/network interfaces.]

Figure 3: A stand-alone smart camera system architecture for face recognition [11].

3.2.3 Single-Chip Smart Cameras

Single-board or single-chip smart cameras are a special kind of stand-alone smart camera. Single-chip smart cameras take advantage of the integration capability of CMOS image sensors by building intelligent ASIP circuits onto the image sensor chip, potentially relieving the host computer of cumbersome pixel-processing tasks and minimizing the data transfer between camera and computer. In some cases, pixel-level ADC and processing can be achieved [12], which can lead to a brand new level of signal and image processing methodologies. Single-chip smart cameras make it possible to design very efficient, very small, low-power and low-cost cameras (when produced in large volumes). As an example, the VISoc single-chip smart camera [13] integrates a 320x256-pixel CMOS image sensor, a RISC processor, a vision co-processor and I/O onto a single chip, fabricated in a 0.35µm process on an area of about 36mm2, with a typical power dissipation of about 1W at 3.3V and 60MHz. Moorhead et al. [14] designed a smart CMOS camera chip that integrates an edge detection mechanism directly into the sensor array. Lee et al. [15] reported the design of a 30 frames/second VGA-format CMOS image sensor with an embedded massively parallel processor, for real-time skin-tone detection.
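
As an illustration of the kind of per-pixel classification such on-chip processors perform, a widely used software heuristic for skin-tone detection thresholds the chrominance channels; the bounds below are a common rule of thumb, not the algorithm of [15]:

    import cv2
    import numpy as np

    def skin_mask(bgr_image):
        """Return a binary mask of likely skin pixels using a YCrCb chrominance box."""
        ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
        lower = np.array([0, 133, 77], dtype=np.uint8)     # Y, Cr, Cb lower bounds
        upper = np.array([255, 173, 127], dtype=np.uint8)  # heuristic skin-tone box
        return cv2.inRange(ycrcb, lower, upper)            # 255 where pixel is in the box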

In some applications a single-chip smart camera can bring distinct advantages. For example, Shigematsu et al. argue that, compared with conventional multi-chip fingerprint readers, a single-chip smart camera based fingerprint reader can be much smaller, allowing much simpler integration into mobile devices such as mobile phones, lower in cost, and more secure [1]. The main disadvantage of the single-chip smart camera lies in the cost of chip design and manufacturing, unless a large volume of units can be produced to justify the initial capital investment. Nevertheless, a single-chip smart camera is a smart sensor that has the potential to make vision systems pervasive, especially when connected to wireless sensor networks.

3.2.4 Embedded System based Smart Cameras

This category of smart cameras most often consists of a camera (usually a general-purpose one) and an external embedded processing unit connected to it. For example, an embedded system based smart camera could be a general-purpose camera connected to a high-performance video processing board, which itself is connected to a PC, either through a PCI slot or through an RS232 port. This kind of configuration is not too different from a PC-based system. Many 2G digital CCTV systems with some intelligent features belong to this category.

The necessity of a dedicated, embedded processing unit in this type of smart camera stems from the fact that the PC, while flexible and versatile, is far from adequate for intensive image and video processing and pattern recognition tasks, particularly when high-resolution, high-frame-rate and low-latency processing is required. Another advantage of this kind of system is that, once proof-of-concept is achieved and end-users are identified, it is easier for the system to be converted into a stand-alone smart camera if required.

Smart cameras used in robotic and automobile applications can also be classified into this category. These cameras may share computing resources, such as a processor and memory, with other embedded devices in the robot or vehicle.

3.2.5 PC and Network based Smart Cameras

PC-based smart camera systems are probably most popular within the academic research environment, as a first step in conducting computer vision and pattern recognition research and in building first prototypes for proof of concept. It is a very simple and inexpensive configuration, as the prices of general-purpose video cameras and PCs continue to fall. Most often, a general-purpose camera is connected to a PC through either a frame grabber or a communication port such as USB, Firewire, CameraLink, or Ethernet. This type of system relies on the PC’s CPU to perform image analysis, feature extraction and pattern recognition tasks. The availability of various vision processing libraries for PC platforms makes this kind of system very popular. PCs also provide a more flexible environment for building user interfaces.

USB cameras, Firewire cameras and network cameras allow digital images to be transferred directly from the camera to a PC or embedded processing hardware, avoiding the signal integrity loss caused by DAC (digital-to-analog conversion) inside many CCTV cameras and by ADC in frame grabbers. For high-resolution cameras, Firewire is starting to become popular and affordable, but CameraLink remains dominant, especially for high-bandwidth and high-performance applications.

The 2G CCTV system is a network-based video surveillance system (NVSS). An NVSS with built-in intelligent surveillance features can be loosely considered a network of virtual smart cameras. An NVSS is composed of four main layers: a CCTV camera (sensor) layer, a network layer, a central computer layer and a trained security personnel layer (Figure 4). As discussed in section 2.5.1, in most currently deployed NVSSes, ASIP tasks such as object tracking and identification and threat detection are typically performed mostly by trained security personnel. However, human monitoring of surveillance video is a very labor-intensive task. It is generally agreed that watching video feeds requires a higher level of visual attention than most everyday tasks. Specifically, vigilance, the ability to hold attention and to react to rarely occurring events, is extremely demanding and prone to error due to lapses in attention. A recent study by the US National Institute of Justice found that, after only 20 minutes of watching and evaluating monitor screens, the attention of most individuals degenerates to well below acceptable levels [16]. The next generation of video surveillance systems – intelligent video surveillance systems (IVSS) – will try to solve these problems by providing automated video surveillance and crime preemption abilities. The IVSS will seek a redistribution of ASIP tasks among the four layers of the NVSS, notably shifting processing load from security personnel to central computers or DVRs (in the short term) and, probably more importantly, to the surveillance cameras themselves – that is, the introduction of (stand-alone) smart cameras to replace passive or dumb CCTV cameras (in the mid to long term). The use of smart cameras would greatly reduce the bandwidth problem caused by the increasing number of cameras present in the system and enhance surveillance system performance, as sending raw pixels over the network is less efficient than sending intermediate analysis results. Smart cameras can also help in decentralizing the overall surveillance system, which can lead to improved fault tolerance and the realization of more surveillance tasks than with traditional cameras [17].


[Figure 4 diagram: cameras 1 to N in the sensor layer connect through the network layer to the central computer (server) layer, which is in turn monitored by the security personnel layer.]

Figure 4: Four layers of a network based video surveillance system (NVSS).

3.3 Research in Smart Cameras as Embedded Systems

Video processing is notoriously hungry for computational horsepower, memory and other resources. Smart cameras as embedded systems have to meet the insatiable demands of video processing on one hand, and the challenging demands of embedded systems, such as real-time operation, robustness and reliability under real-world conditions, on the other. This has made smart cameras a leading-edge application for embedded systems research [18]. Recently there has been a significant increase in research into building smart cameras as embedded systems. The first IEEE workshop on Embedded Computer Vision (ECV’05) was held in June 2005 [19]. The workshop addressed issues such as how to design smart algorithms that efficiently utilize embedded hardware, how to meet real-time constraints in embedded environments, and verification methods for mission-critical embedded vision systems. In particular, the workshop discussed the suitability of FPGAs for embedded vision systems.

Apart from the numerous research groups working on developing smart cameras for video surveillance, there are a number of academic research groups around the world dedicated to building smart cameras as embedded systems. One prominent group is the Embedded Systems Group in Princeton University’s Department of Electrical Engineering [18]. This group has developed an embedded smart camera system that can detect people and analyze their movement in real time. They are also working on a VLSI (Very Large Scale Integration) smart camera. An interesting research activity involving the design of stand-alone smart cameras is the SmartCam project at the University of Technology Eindhoven [20]. This project investigates multi-processor based smart camera system architectures and addresses the critical issue of determining correct camera architectural parameters for a given application domain. Another project bearing the same name is being undertaken by the University of Technology in Graz, Austria [17]. The project aims to develop distributed smart cameras for traffic surveillance applications. They also investigate various issues involved in building smart cameras as embedded systems, such as resource-aware dynamic task allocation to support real-time requirements.

Many industry research groups and companies are involved in smart camera research for machine vision, especially in Germany, Japan and the US. There exist some very informative and useful journals and web portals for the machine vision world, such as IEEE Transactions on Pattern Analysis and Machine Intelligence, Advanced Imaging Magazine [21], Machine Vision Resources [22], and Machine Vision Online [23].

A search of the USPTO (US Patent and Trademark Office) website reveals many patents filed or issued in relation to the concept and embodiment of smart cameras as embedded systems. For example, patent #6 985 780, filed in Aug 2004 under the title “Smart Camera” [24], makes claims about a camera system that includes an image sensor and a processing module at the imaging location that processes the captured images prior to sending the results to a host computer. The processing module can perform tasks such as image feature extraction and filtering, convolution and deconvolution methods, correction of parallax and perspective image error, and image compression.

4 Review of ASIP Algorithms for Smart Cameras and State-of-the-Art Systems

If cameras are extensions of the human eye, smart cameras are pushing the boundaries of possibility to become extensions of the human brain as well. What makes a camera smart is the intelligent, application-specific information processing (ASIP) algorithms built into the software architecture of the camera system. In this section we first explore some common characteristics of intelligent algorithms for smart cameras. We then review several categories of algorithms as applied to machine vision, surveillance and other prominent applications, and some state-of-the-art smart camera systems in use in these application areas.

4.1 Common Characteristics of Algorithms for Smart Cameras

The primary function of a smart camera is to conduct autonomous analysis of the content of an image or video and achieve a high-level understanding of what is happening in the scene. One of the most commonly adopted approaches is image processing-based pattern recognition, a branch of artificial intelligence. Pattern recognition assumes that the image may contain one or more objects and that each object belongs to one of several predetermined types or classes. Given a digitized image containing several objects, the pattern recognition process consists of three main phases, each including several processing tasks:

• Signal level processing – image enhancement, image segmentation;

• Feature level processing – feature extraction, feature measurements and tracking; and

• Object level processing – object classification and estimation.

This is illustrated in figure 5. Also shown in figure 5 is a semantic-level processing component, which is central to the output or action side of smart cameras. The main tasks at this level include the possible joint analysis of inputs from additional cameras, other sensory and database inputs, data fusion, event description, and control signal generation. It should be noted that some tasks at different levels or phases may intersect during processing.

[Figure 5 pipeline: video capture → image enhancement and segmentation (signal level) → feature extraction and tracking (feature level) → object classification and estimation (object level) → person, behavior and event description and control signal generation (semantic level), optionally combined with other camera, sensory and/or database inputs.]

Figure 5: General processing flow of algorithms for pattern recognition and smart cameras.
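
The flow of figure 5 can be made concrete with a toy end-to-end example; the following Python sketch (using OpenCV 4's API; the thresholds and rule-based classifier are illustrative assumptions, not any specific system) runs one frame through the signal, feature, object and semantic levels:

    import cv2

    def smart_camera_pipeline(frame, min_area=500):
        """Toy pattern recognition flow: signal -> feature -> object -> semantic level."""
        # Signal level: enhance (denoise) and segment (Otsu's global threshold).
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        blurred = cv2.GaussianBlur(gray, (5, 5), 0)
        _, binary = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        # Feature level: measure each segmented region.
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        events = []
        for c in contours:
            area = cv2.contourArea(c)
            if area < min_area:           # ignore noise blobs
                continue
            x, y, w, h = cv2.boundingRect(c)
            aspect = w / float(h)
            # Object level: a trivial rule-based classifier over the two features.
            label = "tall object" if aspect < 0.75 else "wide object"
            # Semantic level: emit an application-specific event description.
            events.append({"label": label, "area": area, "bbox": (x, y, w, h)})
        return events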

Image segmentation at the signal level is essential to all subsequent processing tasks; it aims to divide an image into distinct parts, each having a common characteristic. Image segmentation can be based on color, texture, shape and motion. Feature extraction is crucial to pattern recognition. This is where the segmented regions or objects are measured. A measurement is the value of some quantitative property of an object. A feature is a function of one or more measurements, computed so that it quantifies some significant characteristic of the object. This drastically reduced amount of information (compared to the original image) represents all the knowledge upon which subsequent classification decisions must be based. Object classification outputs a decision regarding the class to which each object belongs: each object is recognized as being of one particular type, and the recognition is implemented as a classification process [25].

For simple applications, not all these levels and tasks need to be implemented. For example, the camera in an optical mouse only performs signal- and feature-level processing tasks. On the other hand, for a particular processing task, different applications can have quite different requirements for the camera’s performance, robustness and reliability. For example, the robustness requirements for processing tasks at all levels are much higher for video surveillance monitoring human movement and behaviors than for industrial machine vision cameras performing part inspection or sorting.

Tasks at the signal and feature levels are usually data-intensive and are well suited to hardware-based implementation to meet speed demands. Tasks at the object level can be math-intensive and may need high-performance processor(s) to complete. Stand-alone smart cameras built on a multi-processor architecture would have one processor, such as a DSP or an FPGA, performing tasks at the signal and feature levels, and a high-performance DSP or RISC microprocessor performing statistical object classification.

When designing smart cameras as embedded systems for demanding applications such as surveillance and automobiles, several important and challenging issues need to be addressed, such as the development of low-complexity, low-cost algorithms suitable for hardware implementation, and software and hardware co-design to map algorithmic requirements onto hardware resources. These issues are discussed further in section 5.1.

4.2 Application: Intelligent Video Surveillance Systems (IVSS)

4.2.1 Current Research in Algorithms for IVSS

Video surveillance of dynamic scenes, especially of humans and vehicles, is currently one of the most active research topics in computer vision and pattern recognition. The IEEE and IEE have organized many workshops and conferences on intelligent visual surveillance in the last several years and have published special journal issues focusing solely on visual surveillance or on human motion/behavior analysis. Hu et al. [26] and Valera et al. [27] recently conducted excellent surveys of the various algorithms and techniques under research and development for video surveillance. They also reviewed some high-profile IVSS systems. Some comments in this section are derived from their papers.


For video surveillance, image segmentation most often starts with motion detection, which aims at segmenting regions corresponding to moving objects from the rest of an image. Background modeling is indispensable to motion detection. 3-D models can provide more realistic background descriptions but are more costly; 2-D models currently have more applications due to their simplicity. However, all modeling techniques need to find ways to reduce the effect of unfavorable factors such as illumination variation, moving shadows and so on. Promising techniques for motion segmentation include simple background subtraction, temporal differencing, and more complex optical flow methods. Skin-color based segmentation can be very useful when human subjects are close enough to the camera and lighting is consistent. Once segmentation has provided isolated objects, feature extraction and measurement can be performed on each object. Simple algorithms for feature extraction include image moments, which can provide geometrical features of the objects. For gesture and behavior recognition, promising algorithms for feature extraction include MEF (Most Expressive Features), extracted by Karhunen-Loeve projection, and MDF (Most Discriminative Features), extracted by multivariate discriminant analysis [28]. Since it is sometimes not easy to specify features explicitly, in some applications, when the image size is small enough, the whole image or a transformed image is taken as the feature vector. Examples of algorithms for object classification are shape-based classification and motion-based classification. After motion detection and object classification, video surveillance systems generally track moving objects from one frame to another. Promising algorithms for object tracking can be classified into four categories: region-based, active-contour-based, feature-based, and model-based tracking. Particle filters have recently become a major way of tracking moving objects.
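
As a minimal sketch of the simplest of the motion segmentation techniques above, the fragment below uses OpenCV's built-in MOG2 background subtractor to segment moving regions, and image moments to extract each region's centroid (all parameter values are illustrative assumptions):

    import cv2

    subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)

    def moving_object_centroids(frame, min_area=400):
        """Segment moving regions against a learned background model and return
        the centroid of each region, computed from its image moments."""
        mask = subtractor.apply(frame)                    # foreground/shadow/background mask
        _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)  # drop shadow pixels (127)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        centroids = []
        for c in contours:
            m = cv2.moments(c)                            # geometric features via moments
            if m["m00"] > min_area:                       # m00 is the region's area
                centroids.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
        return centroids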

Human behavior understanding and personal identification are among the most challenging tasks facing IVSS systems for high-end security applications. Behavior understanding involves the analysis and recognition of motion patterns, and the production of high-level descriptions of actions and interactions. Promising approaches and algorithms for behavior understanding include dynamic time warping, finite state machines, HMMs (Hidden Markov Models) and time-delay neural networks. Personal identification is of increasing importance for many security applications. The human face and gait are now regarded as the main biometric features that can be used for personal identification in video surveillance systems. While face recognition research and development has made much progress in recent years, research on gait recognition is still in its infancy.


4.2.2 State-of-the-Art IVSSes

A number of high-profile IVSSes have been reported in recent years. These systems, some deployed in real-world applications, apply the various pattern recognition techniques described in previous sections and provide features such as people tracking, behavior recognition, detection of unattended objects and so on. Examples are the real-time visual surveillance system W4 [29], the Pfinder system developed by Wren et al. [30], the single-person tracking system TI developed by Olsen et al. [31], and a system at CMU (Carnegie Mellon University) [32] that can monitor activities over a large area using multiple cameras connected by a network.

A few IVSSes based on the use of stand-alone smart cameras have also been reported. The V2 system developed by Christensen and Alblas [33] is a surveillance system that avoids the disadvantages of a centralized computer server by moving many of the processing tasks directly to the camera, making the system a group of smart cameras connected across the network. Event detection and storage of event video can be performed autonomously by the camera, so normally it is only necessary to communicate with a central point when significant events occur. The VSAM project described by Collins [34, 35] is a multi-camera surveillance system composed of a network of ‘smart’ sensors that are independent and autonomous vision modules. These vision sensors are capable of detecting and tracking objects, classifying the moving objects into semantic categories such as ‘human’ or ‘vehicle’, and identifying simple human movements such as walking. Desurmont et al. [36] developed a smart network camera system with three smart cameras to perform people tracking and counting in shopping malls. Their system uses web services standards and XML-based metadata to implement inter-camera and camera-to-host coordination. Fleck et al. [37] designed a smart camera that contains an FPGA and a PowerPC processor to perform face and people tracking, using particle filters on HSV (Hue, Saturation, Value) color distributions. The camera outputs the approximated PDF (probability distribution function) of the target state to a host computer.
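
To make the core idea of such color-based particle filter trackers concrete, here is a compact sketch (an assumed minimal implementation, not Fleck et al.'s design): each particle hypothesizes a target position, is weighted by the similarity of its local hue-saturation histogram to a reference histogram, and the particle set is then resampled:

    import cv2
    import numpy as np

    PATCH = 20   # assumed half-size of the patch examined around each particle

    def hs_hist(frame_hsv, x, y):
        """Normalized hue-saturation histogram of the patch centred at (x, y)."""
        h, w = frame_hsv.shape[:2]
        patch = frame_hsv[max(0, y - PATCH):min(h, y + PATCH),
                          max(0, x - PATCH):min(w, x + PATCH)]
        hist = cv2.calcHist([patch], [0, 1], None, [16, 16], [0, 180, 0, 256])
        return cv2.normalize(hist, hist).flatten()

    def particle_filter_step(frame_hsv, particles, ref_hist, noise=8.0):
        """One predict-weight-resample cycle over (x, y) particle positions."""
        h, w = frame_hsv.shape[:2]
        particles = particles + np.random.normal(0, noise, particles.shape)  # predict
        particles[:, 0] = np.clip(particles[:, 0], 0, w - 1)
        particles[:, 1] = np.clip(particles[:, 1], 0, h - 1)
        weights = np.empty(len(particles))
        for i, (x, y) in enumerate(particles):
            d = cv2.compareHist(ref_hist, hs_hist(frame_hsv, int(x), int(y)),
                                cv2.HISTCMP_BHATTACHARYYA)
            weights[i] = np.exp(-20.0 * d * d)            # color-similarity weight
        weights /= weights.sum()
        idx = np.random.choice(len(particles), len(particles), p=weights)    # resample
        return particles[idx], particles[idx].mean(axis=0)  # new set, state estimate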

4.3 Application: Industrial Machine Vision

While advanced algorithms for surveillance smart cameras are mostly still at the research and development stage, due to their high complexity and the high level of robustness required in real-world applications, smart cameras for industrial machine vision have long established their place in the market as mature players. Most machine vision cameras are stand-alone and autonomous smart cameras, where communication with a PC or other central control unit is only needed for camera configuration, firmware upgrading or, in some cases, output data collection. Most algorithms implemented in these cameras follow a processing flow similar to that described in figure 5. One important reason for the relative maturity of machine vision smart cameras, compared with smart cameras for surveillance, is that the application requirements for machine vision cameras are much less demanding than those for surveillance cameras. In other words, many pattern recognition algorithms or techniques have a much better chance of performing with satisfactory robustness and reliability for machine vision than for surveillance applications. This is because machine vision cameras mainly deal with conditions such as:

• indoor use, so good and consistent lighting conditions can be more easily guaranteed;

• minimal problems of occlusion;

• a static and known background, so unusual feature detection is simpler;

• a limited set of object patterns to be recognized; and

• no need for human movement tracking and recognition.

There are many proven software packages on the market that can be customized or directly deployed on programmable machine vision cameras. Most of these packages are for specific industry sectors, but some are general-purpose packages, including a few powerful up-market libraries such as the Halcon library [38]. The Halcon library provides algorithms that include shape-based matching to find objects based on ROI (region of interest) modeling, blob analysis, metrology (both 1D and 3D), edge detection, edge and line extraction, contour processing, template matching, and color processing.
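
As a small illustration of one such building block (a sketch using OpenCV rather than the Halcon API; the pass threshold is an assumption), template matching for a part-presence check can be as simple as:

    import cv2

    def part_present(image_gray, template_gray, threshold=0.8):
        """Check whether a reference part appears in the image, via normalized
        cross-correlation template matching; returns (found, top-left location)."""
        scores = cv2.matchTemplate(image_gray, template_gray, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, max_loc = cv2.minMaxLoc(scores)   # best score and its position
        return max_val >= threshold, max_loc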

Thanks to advancements in embedded system technologies and the improved affordability of processing power, there is a migration of the functionality of what were once only PC-based systems down to the smart camera level. Artificial intelligence is one of these functionalities. Pulnix America’s ZiCAM camera, for example, makes use of a hardware neural network to eliminate the need for programming to execute image-understanding algorithms [39]. It can learn what is required for a machine vision application and, once taught, operates as a stand-alone smart camera. Wintriss Engineering manufactured a smart camera that sports a microprocessor, a DSP and multiple FPGAs with up to 130,000 gates [40]. The company offers both area- and line-scan versions of their smart cameras, with the line-scan version able to perform imaging-related processing on 5 150-pixel lines at 40 MHz. One such camera uses an FPGA to perform image sensor control and pixel correction, and uses the combined computing power in the camera head to run real-time digital filters, lighting correction, streak correction and input/output functions. Ultimately, geometrically and photometrically manifested flaws are discriminated based on connectivity analysis, all performed within the camera.

4.4 Application: Intelligent Transport Systems and Automobiles

4.4.1 ITS Applications

There is growing awareness of and interest in using smart cameras in Intelligent Transport Systems (ITS) and the automobile industry. The IEEE recently organized an international workshop on Machine Vision for Intelligent Vehicles, in June 2005 [41]. Generally speaking, the application and algorithmic requirements for ITS are quite similar to those of IVSS. These requirements can be quite different for automobile applications, however, where high-speed imaging and processing are often needed, imposing a higher level of demand on both hardware and software. Increased robustness is also required for car-mounted cameras to deal with varying weather conditions, speeds, road conditions and car vibrations. CMOS image sensors can overcome problems like large intensity contrasts due to weather conditions or road lights, as well as blooming, an inherent weakness of existing CCD image sensors [42].

There have been a number of successful applications of smart camera systems for ITS reported in the

literature. The VIEWS system at the University of Reading [43] is a 3D model-based vehicle tracking

system. Kumar et al. [44] described a real-time rule-based behavior-recognition system for traffic videos.

Designed for existing camera setups on road networks, the system detects and signals improper behaviors, including potential accident situations, and could thus support better traffic rule enforcement. Beymer et al. [45] presented a smart camera-based monitoring system for measuring traffic parameters. The system captures video from cameras placed on poles or other structures looking down at traffic. Once the video is captured, digitized and processed by the onsite smart camera, it is transmitted in summary form to a transportation management centre for computing multi-site statistics such as travel times. Bramberger et al. [42] described an embedded smart

camera for stationary vehicle detection. They discussed the mapping of high-level algorithms to


embedded system components. Dimitropoulos et al. [46] described a network of smart cameras deployed at an airport to detect and track aircraft; each camera can autonomously detect aircraft traffic in multiple locations within its field of view, and a data fusion module combines observations from multiple cameras to determine the location and size of an aircraft. Other ITS applications for smart cameras include monitoring vehicle behavior in parking lots, vision-based vehicle speed measurement, red-light violation detection at traffic lights, and vehicle number plate recognition. Some authors have expressed the need to integrate smart traffic surveillance systems with existing traffic control systems to develop the next generation of advanced traffic control and management systems [47].

4.4.2 Automobile Applications

Intelligent vehicles will form an integral part of next-generation ITS technology. Smart camera-powered intelligent vehicles will be able to comprehensively monitor the vehicle environment, from the driver's state and attention inside the vehicle to roads and obstacles outside it, so as to assist drivers and avoid accidents in emergencies.

However, building and integrating smart cameras into vehicles is not an easy task: on one hand, the algorithms require considerable computing power to work reliably in real time and under a wide range of

lighting conditions. On the other hand, the cost must be kept low, the package size must be small and the

power consumption must be low [48]. Applications of smart cameras in intelligent vehicles include lane

departure detection, cruise control, parking assistance, blind-spot warning, driver fatigue detection,

occupant classification and identification, obstacle and pedestrian detection, intersection-collision

warning, and overtaking vehicle detection. Below are a few examples.

Stein [49] described a single smart camera-based adaptive cruise control system for intelligent vehicles.

Ruichek [68] described a multilevel- and neural-network-based stereo-matching method for real-time road obstacle detection using linear cameras mounted on vehicles. Xu et al. [50] addressed the problem of pedestrian detection and tracking with night vision

using a single infrared video camera installed on the vehicle. The EyeQ is a single chip smart camera

processor developed by Mobileye [51]. It has been fabricated using 0.18µm CMOS technology, operating

at 120 MHz. It integrates two 32-bit RISC ARM946E CPUs, four Vision Computing Engines, a multi-channel DMA (Direct Memory Access) controller and several peripherals, and is designed for computationally


intensive applications for real-time visual recognition and scene interpretation for use in intelligent

vehicle systems.

4.5 Other Application Areas

Other important applications for smart cameras include HCI, medical imaging, robotics, games and toys. Optical mice, perhaps the simplest smart cameras, are already widely used. Smart cameras performing gesture recognition will play an important role in the development of multimodal user interfaces. Bonato et al. [52] presented an FPGA-based smart

vision system for mobile robots capable of performing real-time human gesture recognition. The RVT

system developed by Leeser et al. [53] and based on FPGA processing allows surgeons to see live retinal

images with vasculature highlighted in real time during surgery.

5 Smart Camera Design Considerations and Future Directions

In this final section we discuss design considerations for smart cameras as embedded systems, identify

several key issues that need to be addressed by the design and research community, and speculate on the

future directions of smart camera research and development.

5.1 Design Considerations

5.1.1 Design and Development Process

Figure 6 shows a typical design and development process for smart cameras as embedded systems (excluding single-chip smart cameras). As shown in Figure 6, the process can be iterative, especially if the initial application specification was not complete from the end user's point of view.

[Figure 6 is a flowchart: Project Definition → Application Requirements Specifications → System Architecture Design → Proof of Concept (Algorithm and Hardware) → Algorithm Conversion → Embedded System Integration and Debugging → Field Test and Evaluation → Requirements Met? If no, the process iterates; if yes, it proceeds to Engineering Prototyping / Manufacturing.]

Figure 6: Design and development process for smart cameras as embedded systems.


The system architecture design stage will decide on software and hardware architectures, based on

performance, deadline and cost criteria. Algorithmic and timing designs suited to the targeted hardware platform also need to be defined. The mapping between algorithm requirements and hardware

resources is an important issue. The proof-of-concept stage may use a PC platform for research and

algorithm development. Usually a COTS (Commercial Off-The-Shelf) general-purpose camera is used at this stage. Hardware components need to be acquired, integrated and tested. However, this is not needed if, during the architecture design stage, a third-party camera development platform or hardware accelerator unit for video processing is identified as an appropriate hardware platform (see Section 5.1.6 for examples of smart camera development platforms). The algorithm conversion stage includes tasks such as converting floating-point to fixed-point arithmetic, considering low-power and low-complexity algorithm variants, and implementation using an HDL (Hardware Description Language). The

Embedded System Integration stage will result in a prototype smart camera using an embedded hardware

platform running embedded versions of algorithms.

5.1.2 System Architecture and Design Methodology

System architecture design naturally depends on application requirements, which can be very simple (e.g. an optical mouse) or very complex (e.g. face recognition). System architecture design has to

consider many factors such as the hardware platform, cost, time to market, flexibility, and so on.

Generally speaking, a heterogeneous, multiple-processor architecture can be ideal for smart camera

development. For example, such an architecture may consist of an FPGA or a DSP as a data processor to

tackle image segmentation and feature extraction, and a high-performance DSP or media processor to

tackle math-intensive tasks such as statistical pattern classification. This kind of system can allow better

exploitation of pipelining and parallel processing, which are essential to achieve high frame rates and low

latency. Some authors have reported work on the impact of hardware system architecture on the level of

implementable pipelining and parallel processing for smart cameras [54, 55]. Some initial work has been

reported on design methodology for embedded vision systems [56, 57].
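To make the benefit of pipelining concrete, the following sketch emulates such a two-stage heterogeneous pipeline in software: a front-end "segmentation" stage feeds a back-end "classification" stage through a queue, so the two stages work on different frames at the same time. The stage names and per-frame delays are invented stand-ins for FPGA and DSP stages, not measurements.

import queue
import threading
import time

frames = queue.Queue()
features = queue.Queue()

def segment():
    # Front-end stage (FPGA-like): per-pixel work on each frame.
    while (frame := frames.get()) is not None:
        time.sleep(0.01)                 # stand-in for segmentation cost
        features.put(f"features({frame})")
    features.put(None)                   # propagate end-of-stream

def classify():
    # Back-end stage (DSP-like): math-intensive pattern classification.
    while (feat := features.get()) is not None:
        time.sleep(0.01)                 # stand-in for classification cost
        print("classified", feat)

threading.Thread(target=segment).start()
back_end = threading.Thread(target=classify)
back_end.start()

for i in range(5):                       # frames stream through; the two
    frames.put(f"frame{i}")              # stages overlap in time
frames.put(None)                         # end-of-stream marker
back_end.join()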

5.1.3 Embedded Processors

There are generally four main families of embedded processors that can be used for smart cameras: microcontrollers, ASICs (Application-Specific Integrated Circuits), DSPs (Digital Signal Processors)

and PLDs (Programmable Logic Devices) such as the FPGA. Microcontrollers are cheap but have limited


processing power and are generally not suited for building demanding smart cameras. ASICs are powerful

and power-efficient processors, but the design cost and risk are high, and they are viable solutions only when production volume is high and the time-to-market schedule permits. DSPs are relatively cheap and powerful in

performing image and video processing, but for demanding applications usually more than one DSP

would be needed. DSP-based solutions can be cost-effective for medium-volume production. Recently a

new class of DSP processors, called media processors, has come into the vision market. Media processors

try to provide a good trade-off between flexibility and cost-effectiveness. They typically have a high-end

DSP core employing SIMD (Single Instruction Multiple Data) and VLIW (Very Long Instruction Word) architectures, married on-chip

with some typical multimedia peripherals such as video ports, networking support, and other fast data

ports [58]. Examples of media processors are Philips' TriMedia, TI's DM64x, and ADI's (Analog Devices,

Inc) Blackfin.

The FPGA has recently emerged as a very good hardware platform candidate for embedded vision

systems such as smart cameras. One of the most important advantages of the FPGA is the ability to

exploit the inherently parallel nature of many vision algorithms. FPGAs used to be mainly employed as

glue logic between processors and peripherals, but the introduction of on-chip hardware multipliers and

dual-port memory has made FPGAs excellent options for DSP applications. The integration of

microprocessors into FPGA chips (such as Xilinx’ Virtex-II Pro and Virtex-4 chips) made them true

system-on-a-chip solutions. These features, together with the continuous improvements in cost and

maturity of design tools, have made FPGAs very competitive against DSPs and media processors for

many types of embedded vision system designs. In fact, an increasing number of publications on smart

cameras as embedded systems have employed FPGAs as the sole processor or as a data-intensive

processor before a DSP or a media processor, in a powerful heterogeneous multi-processor architecture

[59]. Sen et al. [56] have recently proposed a design methodology for effectively and efficiently

implementing computer vision algorithms on FPGA to build smart cameras. A study to compare the

relative performance of running various image processing routines on DSP, PowerPC, Intel Pentium 4

and FPGA was published on Alacron’s web site [60], in which the FPGA solution was found to produce a

distinct advantage. However, a more standardized performance evaluation mechanism to help processor

selection is much needed.


How should one choose between DSPs, media processors, ASICs and FPGAs? Kisacanin [58] proposed

a practical way to help processor selection based on intended production volume, cost and development

flexibility. He argued that ASICs may be suitable for high volumes of over 1 000 000 units, DSPs or media processors for medium volumes between 10 000 and 100 000 units, while for low volumes of under 10 000, FPGAs can be a viable candidate.
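Encoded as a toy decision helper (the treatment of volumes between 100 000 and 1 000 000 units, which the rule of thumb leaves unspecified, is an assumption made here):

def suggest_processor(volume: int) -> str:
    # Rough processor-family suggestion following Kisacanin's volume-based
    # rule of thumb; the band between 100 000 and 1 000 000 units is not
    # specified in [58] and is interpolated here as a judgment call.
    if volume > 1_000_000:
        return "ASIC"                      # high volume amortizes design cost
    if volume >= 10_000:
        return "DSP or media processor"    # medium volume
    return "FPGA"                          # low volume, maximum flexibility

for v in (5_000, 50_000, 2_000_000):
    print(v, "->", suggest_processor(v))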

5.1.4 Algorithm Development and Conversion

Algorithm development for embedded systems is quite different from that for PC-based platforms. It can be considerably more demanding and challenging, especially if FPGA or ASIC processors are

targeted. Usually when designing applications for ASIC or FPGA, one has to understand chip architecture

so that algorithms can be executed efficiently and effectively. Nowadays behavior synthesizers or

algorithmic synthesizers do exist to let designers abstract away from the device architecture and focus on

functionality, but they come at the cost of efficiency in terms of chip area or gate counts and power

consumption. Therefore, it is always important to gain an intimate knowledge of the device architecture

of whichever of the ASIC, FPGA or DSP is targeted. Such knowledge also helps in designing parallel and pipelined processing, which are very important and effective video processing techniques. Converting floating-point arithmetic to fixed-point and eliminating divisions as

much as possible (by using hardware multipliers and look-up tables, for example) are other design

considerations for algorithm conversion.
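As a small worked example of the conversion steps just mentioned, the Python sketch below performs Q15 fixed-point multiplication and replaces a runtime division with a precomputed reciprocal look-up table; the Q15 format and the 255-entry table size are illustrative choices, not recommendations from the literature cited here.

Q = 15                        # Q15 format: 15 fractional bits
ONE = 1 << Q

def to_q15(x: float) -> int:
    return int(round(x * ONE))

def q15_mul(a: int, b: int) -> int:
    # Fixed-point multiply: widen, then shift back down.
    return (a * b) >> Q

# Eliminate division: precompute reciprocals 1/n in Q15 for small n.
RECIP = [0] + [to_q15(1.0 / n) for n in range(1, 256)]

def q15_div_by_small_int(a: int, n: int) -> int:
    return q15_mul(a, RECIP[n])   # a/n becomes a multiply plus a table lookup

x = to_q15(0.75)
print(q15_mul(x, to_q15(0.5)) / ONE)       # ~0.375
print(q15_div_by_small_int(x, 3) / ONE)    # ~0.25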

5.1.5 Other Factors

Memory System - Smart cameras need flexible memory models to meet requirements such as scalable

frame buffers to cope with increasing image sensor resolutions. As the smart camera may integrate

different types of processors, the memory system should support potentially complex processing pipelines and parallelism in order to meet the application's real-time requirements. For single-chip smart cameras, care needs to be taken at the design stage to conserve memory [54].

Communication Protocols - There are currently too many data output protocols for cameras, such as

Firewire, CameraLink, GigE and USB. Firewire is maturing, but CameraLink remains the bandwidth leader and very popular with machine vision users. Unfortunately, the variety of digital interfaces increases confusion in the market and puts pressure on camera vendors to support multiple versions of

cameras with different interfaces.


5.1.6 Smart Camera Development Platforms

There have been a number of commercially available programmable smart camera platforms for

developers to design and prototype smart cameras for applications such as machine vision, biometrics,

HCI and surveillance. Philips has introduced the INCA (INtelligent CAmera) series of programmable

cameras [61], which integrate CMOS image sensors of various resolutions and a highly flexible dual-core processing unit comprising a Xetal processor for computation-intensive signal processing such as

feature extraction, together with a high performance TriMedia DSP core for math-intensive processing

tasks such as pattern recognition. The camera comes with an application development kit allowing for fast

prototyping. One application has been designed for face recognition [62], in which the Xetal is used for

face detection and TriMedia for face recognition. Sony has recently released a smart camera development

system, the XCI-SX1, which integrates an SXGA CCD image sensor (15 frames per second, or 34 fps at 640x480 resolution) and an AMD Geode GX533 400 MHz processor running the MontaVista Linux operating system

[63]. The camera platform is designed to provide OEMs, systems integrators and vision tool

manufacturers a rugged, robust component, combining the imager, intelligence and interface in a single

plug-in module that is simple to set up and easy to integrate. The IQeye3 IP camera from IQinvision Inc,

powered by a 250 MIPS PowerPC CPU, is a platform for smart IP network camera development [64].

Some signal processing tool development companies provide multi-processor development systems that

can serve as excellent development platforms for smart cameras. For example, Hunt Engineering [65]

provides a development platform HERON based on a Xilinx FPGA and a TI (Texas Instruments) DSP.

They also provide expansion capabilities to integrate video capture, IP cores, and additional DSPs and/or FPGAs for

creating scalable smart camera architectures. Lyrtech also provides similar development systems in its

SignalMaster series of products [66]. These systems generally provide flexible communication ports and

drivers.

5.2 Key Issues or Challenges

System Design – The proprietary nature of smart cameras can limit choices of hardware, like imagers,

I/O, lighting, lenses and the communications format. This may leave smart cameras without the expandability and flexibility of PC-based systems. On the other hand, smart cameras do not have as many software applications and

libraries as already exist for PC/frame grabber-based systems. In terms of design methodology, the easy


integration of intellectual property in the design tool and flow can help foster product differentiation.

Other important system-level issues include smart camera operating systems and development tools.

CMOS Image Sensors – Dynamic range is still one of the key aspects where CMOS image sensors lag

behind CCD. Improvement in this area can lead to more low-cost smart cameras using CMOS image

sensors for machine vision and surveillance applications.

Algorithm Development – Many intelligent pattern recognition algorithms work well in laboratory

conditions but fail when deployed in real-world conditions (occlusion, changing lighting, unfavourable weather) and in embedded system environments (scant

resources, low power, low cost). Robustness and low complexity are among key issues facing researchers

developing algorithms for smart cameras in surveillance, ITS and automobile applications.

Performance Evaluation - This is a very significant challenge in smart surveillance systems. Evaluating

the performance of video analysis systems requires significant amounts of annotated data. Typically,

annotation is a very expensive and tedious process. Additionally, there can be significant errors in

annotation. All of these issues make performance evaluation a significant challenge [16].
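As a schematic illustration of why annotation matters, the snippet below scores hypothetical frame-level detections against hand-annotated ground truth using precision and recall; the event labels and frame numbers are invented for illustration.

# Hypothetical frame-level annotations (frame number -> event label).
ground_truth = {10: "person", 42: "vehicle", 90: "person"}
detections = {10: "person", 42: "person", 77: "vehicle"}

# A detection counts as correct only if it matches the annotation exactly.
true_pos = sum(1 for f, e in detections.items() if ground_truth.get(f) == e)
precision = true_pos / len(detections)
recall = true_pos / len(ground_truth)
print(f"precision={precision:.2f} recall={recall:.2f}")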

Standards Development – There is a need for the development of smart camera standards. In fact,

the European Machine Vision Association (EMVA, [67]) has recently launched an initiative (EMVA

1288 Standard) to define a unified method to measure, compute and present specification parameters for

smart cameras and image sensors used for machine vision applications. More needs to be done in this

respect.

Single Chip Smart Cameras – Single-chip smart cameras are an attractive concept, but their manufacturing cost can be high because the feature size for making

digital processors and memory is often different from the one used to make image sensors, which may

require relatively large pixels to efficiently collect light. Therefore, for applications where physical space

and power consumption is not extremely restrictive, it probably still makes sense to design the smart

camera in a multi-chip approach with a separate image sensor chip. Separating the sensor and the

processor also makes sense at the architectural level, given the well-understood and simple interface

between the sensor and the computation engine [54].


5.3 Future Directions

The demand for smart cameras will steadily increase in traditional industries such as surveillance and

industrial machine vision, and may also come from new industry and market segments such as healthcare, entertainment and education. Research interest and economic and social factors will drive continuous technological and product development. Based on the discussions above, we can discern the following future directions for smart camera systems and technologies.

• At the system design level, continuous effort will be made in the development of a research

strategy or design methodology for smart cameras as embedded systems. The same holds for the development of libraries and tools that facilitate algorithm implementation on DSPs and FPGAs. Research on general and 'optimal' architectures for smart cameras and on real-time

operating systems for smart cameras will be undertaken, and the issue of too many digital

interfaces (Firewire, CameraLink, etc) for cameras will be addressed.

• At the ASIP algorithm development level, in order to improve performance and robustness of

existing techniques, research should address issues such as occlusion handling, fusion of 2D and

3D tracking, anomaly detection and behavior prediction, the combination of video surveillance and biometric personal identification, and multi-sensory data fusion [26].

• Multi-modal, multi-sensory augmented video surveillance systems have the potential to provide

improved performance and robustness. Such systems should be adaptable enough to adjust

automatically and cope with changes in the environment like lighting, scene geometry or scene

activity.

• Work on distributed (or networked) IVSS should not be limited to the territory of computer

vision laboratories, but should involve telecommunication companies and network service

providers, and should take into account system engineering issues.

• In the machine vision arena, smart cameras will offer more and more functionality. The trend of

distributing machine vision across the entire production line at points before value is added will

continue. Neural network techniques seem to have become a key paradigm in machine vision, used either to correctly segment an image under a wide variety of operational conditions or


to classify the detected object. Stereo and 3D-vision applications are also increasingly

widespread. Another trend is to utilize machine vision in the non-visible spectrum.

• New product developments will introduce smart camera-based digital imaging systems into

existing consumer and industry products, to increase their value and create new products.

• Standards development. One area which may need standardization is the metadata format that

facilitates integration and communication between different cameras, sensors and modules in a

distributed and augmented video surveillance system. New communication protocols may be

needed for better communication between different smart camera products.

Acknowledgements

The authors would like to thank Dr. Xing Zhang from ST Microelectronics and Dr. Julien Epps of

National ICT Australia for their many valuable comments and corrections of parts of this paper.

References

[1] S. Shigematsu, H. Morimura: A Single-Chip Fingerprint Sensor and Identifier. IEEE Journal of Solid-State

Circuits, Vol. 34, No. 12, December 1999. pp.1852-1859.

[2] M. LaPedus: CMOS Image Sensors Market Consolidates.

http://www.eet.com/news/semi/showArticle.jhtml?articleID=177102846.

[3] Intel Open Source Computer Vision Library. http://www.intel.com/technology/computing/opencv/index.htm.

[4] Chicago Pairing Surveillance Cameras with Gunshot Recognition Systems.

http://www.securityinfowatch.com/online/CCTV--and--Surveillance/Chicago-Pairing-Surveillance-Cameras-with-

Gunshot-Recognition-Systems/4628SIW427.

[5] Marketresearch.com: Global digital video surveillance markets.

http://www.marketresearch.com/product/display.asp?productid=1032291&xs=r.

[6] Frost & Sullivan: Video Surveillance Software Emerges as Key Weapon in Fight Against Terrorism.

http://www.prnewswire.co.uk/cgi/news/release?id=151696.

[7] Smart Products Can See The Future. http://www.imsresearch.com/members/pr.asp?X=103.

[8] Smart Cameras Drive Machine Vision Growth. Advanced Imaging Journal. October 2005. page 8.

[9] Machine Vision Online: JAI PULNiX Forms New “Smart Camera” Business Unit.

http://www.machinevisiononline.org/public/articles/archivedetails.cfm?id=1990.


[10] W. Hardin, Smart Cameras: The Last Step in Machine Vision Evolution?

http://www.machinevisiononline.org/public/articles/archivedetails.cfm?id=389.

[11] H. Broers, R. Kleihorst, M. Reuvers and B. Krose: Face Detection and Recognition On A Smart Camera.

Proceedings of ACIVS 2004, Brussels, Belgium, Aug.31- Sept.3, 2004.

[12] Pixim Digital Pixel System® Technology Backgrounder. http://www.pixim.com/html/tech_about.htm.

[13] L. Albani, P. Chiesa, D. Covi, G. Pedegani, A. Sartori, M. Vatteroni: VISoc : A Smart Camera SoC. Proceedings

of the 28th European Solid-State Circuits Conference, pp.367-370, Firenze, Italy, September 2002.

[14] T.W.J. Moorhead, T.D. Binnie: Smart CMOS Camera For Machine Vision Applications. Image Processing and

Its Applications, Conference Publication No.465. IEE 1999. pp.865-869.

[15] M.S. Lee, R. Kleihorst, A. Abbo, E. Cohen-Solal: Real-time Skin-tone Detection with A Single-chip Digital

Camera. Proc. of 2001 Int’l Conference on Image Processing. Volume 3, 7-10 Oct. 2001. Page(s):306 – 309.

[16] A. Hampapur, L. Brown, J. Connell, S. Pankanti, A. Senior, Y. Tian: Smart Surveillance: Applications,

Technologies and Implications. 4th IEEE Pacific-Rim Conference On Multimedia. 15-18 December 2003, Singapore.

[17] SmartCam - Design and Implementation of an Embedded Smart Camera:

http://www.iti.tu-graz.ac.at/de/research/smartcam/smartcam.html.

[18] W. Wolf, B. Ozer, T. Lu: Smart Cameras As Embedded Systems. IEEE Computer, 35(9):48–53, Sep 2002.

[19] The First IEEE Workshop on Embedded Computer Vision: http://www.scr.siemens.com/ecv05/.

[20] SmartCam: Devices for Embedded Intelligent Cameras. http://www.stw.nl/projecten/E/ees5411.html.

[21] Advanced Imaging. http://www.advancedimagingpro.com/.

[22] Machine Vision Resources. http://www.eeng.dcu.ie/~whelanp/resources/r_references.html.

[23] Machine Vision Online: http://www.machinevisiononline.org/.

[24] USPTO. http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=/netahtml/search-

adv.htm&r=1&p=1&f=G&l=50&d=ptxt&S1=(smart+AND+camera).TTL.&OS=ttl/(smart+and+camera)&RS=TTL/(

smart+AND+camera).

[25] K. R. Castleman: Digital Image Processing. 1st edition, Prentice Hall, New Jersey, 1996.

[26] W. Hu, T. Tan, L. Wang and S. Maybank: A Survey on Visual Surveillance of Object Motion and Behaviors.

IEEE Transactions on Systems, Man and Cybernetics. Vol. 34, No. 3, August 2004. 334-352.

[27] M. Valera and S.A. Velastin: Intelligent distributed surveillance systems: A review. IEE Proc.-Vis. Image Signal

Process. Vol. 152, No. 2, April 2005. pp. 192-204.

[28] Y. Wu, T.S. Huang: Vision-Based Gesture Recognition: A Review. Lecture Notes in Computer Science. Volume

1739, 1999. pp.103-114.


[29] I. Haritaoglu, D. Harwood, and L. S. Davis: Real-time surveillance of people and their activities. IEEE Trans.

Pattern Anal. Machine Intell., vol. 22, pp. 809–830, Aug. 2000.

[30] C. R. Wren, A. Azarbayejani, T. Darrell, and A. P. Pentland: Pfinder: real-time tracking of the human body.

IEEE Trans. Pattern Anal. Machine Intell., vol. 19, pp. 780–785, July 1997.

[31] T. Olson and F. Brill: Moving object detection and event recognition algorithms for smart cameras. In Proc.

DARPA Image Understanding Workshop, 1997, pp. 159–175.

[32] A. J. Lipton, H. Fujiyoshi, and R. S. Patil: Moving target classification and tracking from real-time video. In

Proc. IEEE Workshop Applications of Computer Vision, 1998, pp. 8–14.

[33] M. Christensen, R. Alblas: V2- design issues in distributed video surveillance systems. Denmark, 2000, pp.1–86.

[34] R.T. Collins, A.J. Lipton, H. Fujiyoshi, and T. Kanade: Algorithms for cooperative multisensor surveillance,

Proc. IEEE, 89, (10), 2001, pp. 1456–1475.

[35] R. T. Collins, A. J. Lipton, T. Kanade, H. Fujiyoshi, D. Duggins, Y. Tsin, D. Tolliver, N. Enomoto, O.

Hasegawa, P. Burt, and L.Wixson: A system for video surveillance and monitoring. Carnegie Mellon Univ.,

Pittsburgh, PA, Tech. Rep., CMU-RI-TR-00-12, 2000.

[36] X. Desurmont, B. Lienard, J. Meessen, J.F. Delaigle: Real-Time Optimization For Integrated Network Camera.

Proc. of SPIE - Real Time Imaging 2005, San Jose, CA, January 2005.

[37] S. Fleck, W. Strasser: Adaptive Probabilistic Tracking Embedded in a Smart Camera. Proc. Of the 2005

Computer Society Conference on CVPR.

[38] Halcon. http://www.mvtec.com/halcon/.

[39] Machine Vision Online: JAI PULNiX Forms New “Smart Camera” Business Unit.

http://www.machinevisiononline.org/public/articles/archivedetails.cfm?id=1990.

[40] Are Smart Cameras Smart Enough?

http://www.machinevisiononline.org/public/articles/archivedetails.cfm?id=493.

[41] MVIV’05: http://www.scr.siemens.com/mviv05/.

[42] M. Bramberger, R. P. Pflugfelder, A. Maier, B. Rinner, B. Strobl, H. Schwabach: A Smart Camera For Traffic

Surveillance. Proceedings of the first Workshop on Intelligent Solutions in Embedded Systems (WISES). June 2003.

[43] T. N. Tan, G. D. Sullivan, and K. D. Baker: Model-based localization and recognition of road vehicles. Int. J.

Comput. Vis., vol. 29, no. 1, pp. 22–25, 1998.

[44] P. Kumar, S. Ranganath, H. Weimin, K. Sengupta: Framework for Real-Time Behavior Interpretation From

Traffic Video. IEEE Transaction on ITS, March 2005. Volume 6, No.1. pp. 43-54.

[45] D. Beymer, P. McLauchlan, B. Coifman, and J. Malik: A real-time computer vision system for measuring traffic

parameters. Proc. IEEE Conf. on Computer Vision and Pattern Recognition. pp. 495–502.


[46] K. Dimitropoulos, N. Grammalidis, D. Simitopoulos, N. Pavlidou, M. Strintzis: Aircraft Detection and Tracking

Using Intelligent Cameras. IEEE Int’l Conference on Image Processing. Vol 2, 11-14 Sept. 2005 Page(s):594 – 597.

[47] C. Nwagboso: User focused surveillance systems integration for intelligent transport systems. In Regazzoni,

C.S., Fabri, G., and Vernazza, G. (Eds.): ‘Advanced Video-based Surveillance Systems’ (Kluwer Academic

Publishers, Boston, 1998), Chapter 1.1, pp. 8–12.

[48] G. Stein: A Computer Vision System on a Chip: A case study from the automobile domain. First IEEE

Workshop on Embedded Computer Vision. June 2005.

[49] G. Stein, O. Mano and A. Shashua: Vision-based ACC with a Single Camera: Bounds on the Range and Range

Rate Accuracy, IEEE Intelligent Vehicles Symposium, June 2003, Columbus, OH.

[50] F. Xu, X. Liu and K. Fujimura: Pedestrian Detection and Tracking With Night Vision. IEEE Transaction on ITS,

March 2005, Vol.6 No.1. 63-71.

[51] EyeQ: System-on-a-chip. http://www.mobileye.com/eyeQ.shtml.

[52] V. Bonato, A.K. Sanches, M.M. Fernandes, J.M.P. Cardoso, E.D.V. Simoes, E. Marques: A Real Time Gesture

Recognition System for Mobile Robots. In International Conference on Informatics in Control, Automation, and

Robotics, August 25-28, Setúbal, Portugal, 2004, INSTICC, pp. 207-214.

[53] M. Leeser, S. Miller, H. Yu: Smart Camera Based on Reconfigurable Hardware Enables Diverse Real-time

Applications. Proc. of the 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.

[54] W. Wolf, B. Ozer, T. Lu: VLSI Systems for Embedded Video. Proc. IEEE Computer Society Annual

Symposium on VLSI, 2002.

[55] W.Wolf, T. Lv, B. Ozer: An Architecture Design Study for a High Speed Smart Camera. Proceedings of the 4th

Workshop on Media and Streaming Processors. Istanbul, Turkey, 2002.

[56] M. Sen, I. Corretjer, F. Haim, S. Saha, J. Schlessman, S. S. Bhattacharyya, W. Wolf: Computer Vision on

FPGAs: Design Methodology and Its Application To Gesture Recognition. Proc. of the 2005 IEEE CVPR.

[57] W. Caarls, P. Jonker, H. Corporaal: Benchmarks For SmartCam development. Proceedings of ACIVS 2003

(Advanced Concepts for Intelligent Vision Systems), Ghent, Belgium, September 2-5, 2003

[58] B. Kisacanin: Examples of Low-Level Computer Vision on Media Processors. Proceedings of the 2005 IEEE

Computer Society Conference on Computer Vision and Pattern Recognition.

[59] W.J. MacLean: An Evaluation of the Suitability of FPGAs for Embedded Vision Systems. Proceedings of the

2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[60] The Future of High Performance Machine Vision. http://www.alacron.com/NEWS/FUTURE.HTM.

[61] Philips Industrial Vision Products. http://www.apptech.philips.com/industrialvision/products.htm.


[62] R. Kleihorst, M. Reuvers, B. Krose and H. Broers: A Smart Camera for Face Recognition. Proc. 2004

International Conference on Image Processing (ICIP’04). Pp. 2849-2852.

[63] Sony introduces first in smart camera series. http://news.sel.sony.com/pressrelease/5915.

[64] IQinvision Smart Camera Systems. IQeye300 Series. http://www.iqeye.com/iqeye300.html.

[65] http://www.hunteng.co.uk/.

[66] http://www.lyrtech.com/index.php/

[67] Standard for Measurement and Presentation of Specifications for Machine Vision Sensors and Cameras.

http://emva.org/home/content/blogcategory/135/164/.

[68] Y. Ruichek: Multilevel- and Neural-network-Based Stereo-Matching Method for Real-Time Obstacle Detection

Using Linear Cameras. IEEE Transactions on ITS, March 2005, Vol.6 No.1. 54-62.

About the Authors – YU SHI is a Senior Researcher with National ICT Australia in Sydney, Australia. He received his B.Eng. in 1982 from the National University of Defense Technology in Changsha, Hunan, China. He later

obtained his M.Eng and PhD in signal processing and biomedical engineering in 1988 and 1992 respectively in

Toulouse, France. He also completed post-doctoral research at Oxford Brookes University in England in the late

1990s. His main research interests are in embedded vision systems, FPGA-based design and applications, multimodal

user interfaces and web services.

SERGE LICHMAN is a Senior Research Engineer with National ICT Australia in Sydney, Australia. He received his M.Eng. in Electrical Engineering in 1988 from the Odessa State Polytechnic University in Ukraine. His 12 years of

experience in the area of image and signal processing for commercial software and hardware gave him practical skills

in full product development life cycles, from research to deployment. His work has led to several publications.