
The future of human-computer interaction, or how I learned to stop worrying and love my intelligent room
Michael H. Coen, MIT Artificial Intelligence Lab

Predicting the future is notoriously difficult. Suppose 100 years ago someone suggested that every bedroom in the US would soon have a bell that anyone in the world could ring anytime, day or night. Would you have believed it? Nevertheless, the telephone caught on and has become a technology conspicuous only by its absence.

Now, I find myself in a similar position. Having spent the past four years immersed in the future as part of the MIT AI Lab’s Intelligent Room project, I have gained an inkling of what is to come in the next 50. I have taken to heart the advice of Alan Kay, “the best way to predict the future is to invent it.” Of course, this is not a solo performance, and the cast of fellow prognosticators (researchers and inventors) who work on similarly futuristic environments such as the intelligent room has grown markedly.1 It is interesting to note that this cast is equally divided among industrial and academic research labs. There are, I think, two reasons for this:

• intelligent rooms are an incredibly exciting testbed in which to do research (read the rest of this essay to find out why), and

• there is a lot of money to be made. Intelligent rooms promise to have the ubiquity of, well, rooms and the upgradability of PCs—you do the math.

The starting—and I think quite uncontroversial—premise for my research is that computers are not particularly useful. Of course, scientists and engineers adore them, as do a growing cadre of Web surf-

8 IEEE INTELLIGENT SYSTEMS

Room service, AI-style

Can a room be intelligent? This month’s “Trends and Controversies” presents the thoughts and work of four people who not only believe the answer is yes, but are working towards making this happen.

Michael Coen’s leadoff essay, “The future of human-computer interaction or how I learned to stop worrying and love my intelligent room,” discusses insights gleaned from work at the Intelligent Room project at the MIT AI Lab. An intelligent room should base its actions on an integration of sensing modalities, such as vision and speech, performing tasks (such as speaker tracking) that would be more difficult to perform using any of the modalities in isolation. Interestingly, the lion’s share of the effort did not involve work on the technologies of the individual modalities, but rather understanding how to structure the task so that a range of sensing-modality subsystems can interoperate and be integrated into a seamless whole. Coen also proposes general principles for future builders of intelligent rooms coming out of his own experience, such as “Do not scavenge”—individual components must be designed with an eye to how they will ultimately be used as part of the overall whole, rather than being inherited as a by-product of someone else’s goals.

In contrast to the two intelligent rooms built in the MIT AI Lab’s quarters in an otherwise (comparatively) mundane office building in Cambridge, Massachusetts, Michael Mozer’s essay discusses his experiences making an entire house in Colorado behave intelligently. Whereas much of Coen’s efforts center on dealing with the integration of various sensing modalities, Mozer’s focus is instead on adaptability, building an intelligence into the various sensors (such as thermometers) and effectors (such as a heating system) so that it can adapt to the preferences of the house residents. How the intelligence can infer the preferences of its residents is one of the important questions that Mozer discusses, such as by learning from any time an inhabitant manually adjusts the settings of lights or thermostats. As Mozer points out, although we may believe that our habits are far from predictable, for the vast majority of the time, in ways we do not even notice, we are very predictable—for example, if we are happy with the environment in the house at one point in time, we are usually happy with it five minutes later.

Mozer’s adaptive house began as an existing home, renovated to serve as a testbed for intelligent-room research. But what if you are building a brand new residence from scratch? Moreover, what if you are not constrained by the more typical budgetary limitations of the average American home buyer? Richard Hasha’s essay discusses his experiences as chief systems architect for the Gates Estate, Bill Gates’ well-publicized 66,000-square-foot new home in Medina, Washington. Hasha’s focus is on the development of an underlying architecture that is able to scale up to the demands of building intelligence into a home of this magnitude. To do this, Hasha identifies some of the “plumbing-related work” that must be performed as part of any large-scale effort to build an intelligent home, and describes a distributed-object framework that supports it.

It is not only offices and homes that can be intelligent. James Flanagan’s final essay considers some of the functionalities that we would want intelligent rooms with large numbers of people—such as lecture halls and conference venues—to possess. For example, we would like an intelligent lecture hall to recognize automatically who in the audience is asking a question, point a video camera at that person, and position and use a microphone array to filter out sounds coming from elsewhere in the room. Flanagan breaks up the problem into two pieces—identifying the spatial location of the sound source, and then extracting the desired sound signal emanating from that source out of the collection of sounds being received by the microphones. He describes how recent advances in microphone and digital-signal-processing technologies are enabling the creation of such intelligent rooms.

Twenty years ago, John McCarthy asked whether we could ascribe mental qualities to a thermostat (J. McCarthy, “Ascribing Mental Qualities to Machines,” Philosophical Perspectives in Artificial Intelligence, M. Ringle, ed., Harvester Press, Brighton, Sussex, UK, 1979). Perhaps someday we will be able to ask them ourselves.

—Haym Hirsh

By Haym Hirsh
Rutgers University

[email protected]

TRENDS & CONTROVERSIES


MARCH/APRIL 1999 9

ers, e-mailers, and online socialites. And certainly, computation is a fundamental component of our society’s technical, financial, and industrial infrastructures; machines are outstanding data-processing slaves. But in terms of raising our quality of life, computers have a long road to travel before becoming as essential as the light bulb, indoor plumbing, or pillow.

The reason is obvious. Computers are generally used for things that are computational, such as reading e-mail, and for most of us, the majority of our lives are spent doing noncomputational things, such as taking baths and eating dinner. Most people spend their time in the real world, not in the much-ballyhooed realm of cyberspace, and as a description of the current utility of computation, I propose the following observation: the value of a computer decreases with the square of your distance from its monitor. That being said, computation will not become sociologically essential until computers are connected to the human-level events going on in the real world around them—until they are ingrained in and conversant with our ordinary state of affairs. Borrowing once more from Kay, “the computer revolution hasn’t happened yet.”

In the Intelligent Room project, we are interested in creating spaces in which computation is seamlessly used to enhance ordinary, everyday activities. We want to incorporate computers into the real world by embedding them in regular environments, such as homes and offices, and allow people to interact with them the way they do with other people. The user interfaces of these systems are not menus, mice, and keyboards but instead gesture, speech, affect, context, and movement. Their applications are not word processors and spreadsheets, but smart homes and personal assistants. Instead of making computer interfaces for people, it is of more fundamental value to make people interfaces for computers. The need for doing this is not new; it has simply become more urgent:

The great creators of technics [i.e., technology], among which you are one of the most successful, have put mankind into a perfectly new situation, to which it has as yet not at all adapted itself. —Albert Einstein, in a tribute to Thomas Edison, 21 Oct. 1929.

It is time for technology to start adapting to us. This might sound trite but only because it is so obviously true.

Sounds great, but how?

We have built two intelligent rooms in our laboratory, where our approach has been to give the rooms cameras for eyes and microphones for ears to make accessible the real-world phenomena occurring within them. A multitude of computer-vision and speech-understanding systems then help interpret human-level phenomena, such as what people are saying and where they are standing. By embedding user interfaces this way, the fact that people, for example, tend to point at what they are speaking about is no longer meaningless from a computational viewpoint, and we can (and have) built systems that make use of this information.

Coupled with their natural interfaces is the expectation that these systems are not only highly interactive—they talk back when spoken to—but more importantly, that they are useful during ordinary activities. They enable tasks historically outside the normal range of human-computer interaction by connecting computers to phenomena (such as someone sneezing or walking into a room) that have traditionally been outside the purview of contemporary user interfaces. Thus, in the future, you can imagine that elderly people’s homes would call an ambulance if they saw anyone fall down. Similarly, you can also imagine kitchen cabinets that automatically lock when young children approach them.

The most important factor in making intelligent rooms possible in recent years has been the novel viability of real-time computer vision and speech understanding. AI, and computer science more generally, have experienced something of a renaissance in the past decade, with many research areas blossoming almost entirely due to the sudden and unexpected availability of inexpensive but powerful processors. It is now entirely possible to have literally a dozen computer-vision systems as components of a larger project, the suggestion of which would surely have raised more than a few skeptical eyebrows in the not-too-distant past.

Computer vision and speech understanding are among the preeminent members of a class of research problems known as being AI-hard; namely, they are as difficult to

Intelligent Rooms

Figure A shows two sample scenarios with the Intelligent Room that run today, reflecting both our interests and those of our funders.

At play: (Our interests...)
Me: I walk inside my office.
Room: Turns on lights and says, “Good evening, Michael, you have three messages waiting.”
Me: I sit down on the couch.
Room: Displays messages on one projected display. Shows a video on the other.
Me: I lie down on the couch and lie still for a while.
Room: “Michael, are you still awake?”
Me: “No, wake me up at 8 am.”
Room: Turns off video, dims the lights, closes the drapes. And puts Mozart on softly in the background.

At work: (Our funders’...)
Me: I walk inside my office and say, “Computer, activate the command post.”
Room: Shows interactive world map on one display and Web browser on the other. Closes the blinds and drapes. Says, “The command post is activated.”
Me: Mark region on map with a laser pointer, “Zoom in here.”
Room: Zooms in on the map.
Me: Point at country using laser pointer, “What is this country?”
Room: “You are pointing at Iraq” and displays CIA World Fact Book information about Iraq in Web browser.
Me: “I need information; does Iraq have ballistic missiles?”
Room: Displays Iraq’s current missile capabilities in the Web browser.

Figure A. Intelligent room scenarios: (1) our interests and (2) our funders’ interests.


solve as anything else we don’t yet know how to do. When we started working on our lab’s first Intelligent Room, we expected the vision and natural-language subsystems would require the overwhelming majority of our intellectual effort. What was not initially obvious was the amount of forethought that would be required to integrate the room’s myriad subsystems and from them produce a coherent whole. Building a computational system—one that had literally dozens of hardware and software components—that allowed its subsystems to not only interoperate but leverage off one another eventually emerged as our project’s chief research problem.2 In this, we have not been alone; finding some way of managing similar systems and moving data among their components was the foremost difficulty raised at the Intelligent Environments Symposium.1

In some regards, our solution to this management crisis has been a bit drastic: we created a new programming environment, called Metaglue, to meet the room’s fairly unique computational needs, in which (at last count) the room’s 80 software components are distributed among a dozen workstations.3 We also formulated some general principles for creating intelligent rooms, which we now adhere to with religious fervor.4 These include

• Scenario-based development. We initially spent a great deal of time designing overly complex sensory systems for the room, without much thought regarding how we would eventually use them—not a good idea. In the end, when we started thinking up room demos, that is, scenarios to show off our work, what we wanted to do our sensors didn’t support, and what our sensors supported, we didn’t want to do. In particular, the sensing needs our desired demos required were actually simpler, albeit different, from what the complex computer-vision systems we had created provided. Much time could have been saved had we thought ahead.

• Do not scavenge. There is an enormous temptation when designing an intelligent room to try and incorporate all your friends’ thesis work into it. How can you resist adding the latest state-of-the-art system that does [your favorite idea]? However, there is a good chance systems not intended to work together will refuse to do so smoothly and almost surely will be unable to take advantage of each other’s capabilities. You must consider how components will integrate into the overall system when designing the individual components.

• Systems should leverage off each other. Because nothing is perfect, particularly in AI, throw everything you can at a problem. In the Intelligent Room today, computer-vision systems communicate with speech-recognition systems. A strange mix, you might think, but we have found that as you approach something, such as a projected map, you are more likely to talk about it. Thus, by visually locating a person in the room, we can cue the speech-recognition system with information about what the user is likely to say. In this way, we get higher speech-recognition accuracy. This principle generalizes, and many subsystems in the intelligent room intercommunicate with one another for similar reasons.
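This kind of cross-modal cueing can be sketched in a few lines. The sketch below is not the Intelligent Room’s actual code; the surface coordinates, vocabulary lists, and function names are all hypothetical. It only illustrates how a person-tracking result from a vision system might bias a speech recognizer toward context-relevant vocabulary.

```python
import math

# Hypothetical positions of interactive surfaces in the room (metres).
SURFACES = {
    "map": (1.0, 3.5),
    "browser": (4.0, 0.5),
}

# Hypothetical vocabulary associated with each surface.
CONTEXT_VOCAB = {
    "map": ["zoom", "country", "region", "missiles"],
    "browser": ["scroll", "link", "back"],
}

def likely_vocabulary(person_xy, radius=1.5):
    """Return words to boost in the speech recognizer's language model,
    based on which surface the tracked person is standing near."""
    x, y = person_xy
    boosted = []
    for name, (sx, sy) in SURFACES.items():
        if math.hypot(x - sx, y - sy) <= radius:
            boosted.extend(CONTEXT_VOCAB[name])
    return boosted

# A person tracked by the vision system near the projected map:
print(likely_vocabulary((1.2, 3.0)))   # words from the "map" context
```

A real system would of course feed these boosted words into the recognizer’s language model rather than just returning them, but the coupling between the two modalities is the same.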

Big Brother, post-1984?

It is quite easy to trace almost all work in intelligent environments back to the very influential Digital Desk project at Xerox PARC in the late 80s and early 90s, which was among the first user interfaces to directly observe people manipulating real objects. Using a video camera, it watched people reading real paper documents on the surface of a real desk—highlight something with your finger, and the system proceeds to scan in the delineated text. Surprisingly, however, the very first intelligent environment was proposed in the late 18th century by the well-known British philosopher and would-be prison warden, Jeremy Bentham. Bentham designed the Panopticon, an Orwellian structure in which a hive of rooms (or cells) could be kept under the constant scrutiny of an unseen Observer.5 Denizens of the Panopticon, which Bentham proposed could be either inmates, the insane, or students, would never be precisely sure when they were being observed by the central Observer, namely the warden, doctor, or graduate advisor. Order would be maintained exactly because observed infractions might be harshly punished at some unknown later date.

As pointed out by Bentham, and subsequently elaborated upon by Michel Foucault, the observed confer enormous power on the Observer, and this might seem the most serious objection to widespread introduction of intelligent environments.6 The potential for abuse is frightening and should certainly give any technology enthusiast or futurist pause. However, this need not be a fatal objection, and I think it is premature to address at present. Given that we have no clear idea what types of new sensing technologies will be developed and deployed in the future, worrying about security now is somewhat pointless. Whatever privacy-guaranteeing techniques are developed for today’s technology will almost surely be irrelevant for tomorrow’s. More importantly, fear of misuse is no reason not to push for something that has the potential to so greatly revolutionize our lives.

In the end, it will come as no great surprise if the widespread acceptance of intelligent rooms, homes, cars, and so forth comes as much from clever marketing as from clever security. If someone were to propose filling your home with microphones that anyone in the world could listen to anytime, day or night, you might very likely shudder in horror. Yet, your home is filled with microphones—every telephone has one. But perhaps you object, “Phones can’t be abused like that! Other people can hear me only when I let them!” Aha! It sounds as if you’ve already been socialized to accept telephones; it is quite difficult to now view them as a realistic threat. I think it is quite reasonable to expect that someday your children will be similarly comfortable inside their intelligent homes.

References
1. M. Coen, ed., Proc. 1998 AAAI Spring Symp. Intelligent Environments, AAAI TR SS-98-02, AAAI Press, Menlo Park, Calif., 1998.
2. M. Coen, “Building Brains for Rooms: Designing Distributed Software Agents,” Proc. Ninth Innovative Applications of AI Conf., AAAI Press, 1997, pp. 971–988.
3. B. Phillips, Metaglue: A Programming Language for Multi-Agent Systems, M.Eng. thesis, Massachusetts Inst. of Tech., Cambridge, Mass., 1999.
4. M. Coen, “Design Principles for Intelligent Environments,” Proc. 15th Nat’l Conf. Artificial Intelligence, AAAI Press, 1998, pp. 547–554.
5. J. Bentham, The Panopticon Writings, Verso, London, 1995.
6. M. Foucault, “The Eye of Power,” Power/Knowledge, Pantheon Books, New York, 1981.


An intelligent environment must be adaptive
Michael C. Mozer, University of Colorado

What will the home of the future look like? One popular vision is that household devices—appliances, entertainment centers, phones, thermostats, lights—will be endowed with microprocessors that allow the devices to communicate with one another and with the home’s inhabitants. The dishwasher can ask the water heater whether the water temperature is adequate; inhabitants can telephone home and remotely instruct the VCR to record a favorite show; the TV could select news stories of special interest to the inhabitant; the stereo might lower its volume when the phone rings; and the clothes dryer might make an announcement over an intercom system when it has completed its cycle.
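A minimal sketch of this kind of device intercommunication, assuming nothing about any real home-automation protocol: a toy publish/subscribe bus in which the stereo reacts to a phone-ring event. The class, topic, and handler names here are invented for illustration.

```python
from collections import defaultdict

class HomeBus:
    """A toy message bus: devices subscribe to event topics and react.
    (Illustrative only; real systems would run over power lines or radio.)"""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, **event):
        for handler in self.subscribers[topic]:
            handler(**event)

bus = HomeBus()
stereo_volume = [8]

def lower_stereo_volume(**event):
    # Duck the music while the phone rings.
    stereo_volume[0] = 2

bus.subscribe("phone.ring", lower_stereo_volume)
bus.publish("phone.ring", caller="555-0199")
print(stereo_volume[0])   # 2
```

The point of the sketch is only that the hard part is not the messaging itself; it is deciding, for each household, which reactions the handlers should encode.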

The cost of the hardware infrastructure is not prohibitive if the devices are mass-produced and if communication is conducted over power lines or wireless channels. Even if the cost is too high today, wait a few years and the price will drop precipitously. Adopting a uniform communication protocol is also not an obstacle in principle. The real reason why this vision of home automation seems unlikely is that it requires a significant programming effort, and worse, the programming must be tailored to a particular home and family and must be updated as the family’s lifestyle changes. Tackling the programming task is far beyond the capabilities and interest of typical home inhabitants. People are intimidated by the chore of programming simple devices such as VCRs and setback thermostats, never mind a much broader array of devices with far greater functionality.

Perhaps if you were Bill Gates, you might hire a full-time team of engineers to customize your system and keep it up to date. Some commercially available systems adopt this strategy on a smaller scale: following installation, a technician comes to the home, consults with the inhabitants, and sets up the initial programming. As the inhabitants’ needs change over time, the technician can modify the programming, either remotely or on site.

The adaptive house

In contrast to existing automated homes that can be programmed to perform various functions, our research focuses on developing a home that essentially programs itself by observing the lifestyle and desires of the inhabitants and learning to anticipate their needs.1–3 This house’s intelligence lies in its ability to adapt its operation to accommodate the inhabitants; thus, we call the project the adaptive house.

Traditional automated homes require a user interface, such as a touchscreen that displays the system state and controls, or a speech-recognition front end. However, even a well-designed interface impedes the acceptance of an automated home. In contrast, the adaptive house should be unobtrusive and require no special interactions. Inhabitants operate the adaptive house as they would an ordinary home—using light switches, thermostats, and on/off and volume controls like those to which they are accustomed. Unlike an ordinary home, however, these adjustments are monitored and serve as training signals—indications to the house as to how it should behave.

The adaptive house infers appropriate rules of operation of devices from the training signals and from sensors that provide information about the environmental state. As the house becomes better trained, it begins to anticipate the inhabitants’ needs and sets devices accordingly, gradually freeing inhabitants from manual control of the environment. For example, it could automatically maintain the room temperature at a level appropriate given the particular occupants, activities, manner of dress, and time of year; it could choose one pattern of lighting while dinner is being prepared, and another for a late-night snack; it could turn on the television news during dinner, or play classical music when water is drawn for a bath, based on past selections of the inhabitants. Ideally, the house’s operation is transparent to the inhabitants, other than the fact that they do not have to worry about managing the various devices in the home.
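As a rough illustration of this training-signal idea (not Mozer’s actual algorithm, which is described in the cited papers; the context encoding and numbers below are invented), a house could average the setpoints an inhabitant has chosen in similar past situations:

```python
from collections import defaultdict

class SetpointLearner:
    """Toy sketch of learning from training signals: every manual thermostat
    adjustment is recorded against the current context, and the house later
    replays the average preferred setpoint for that context.
    (The real adaptive house uses far richer sensor state and models.)"""
    def __init__(self, default=65.0):
        self.default = default
        self.history = defaultdict(list)   # context -> observed setpoints

    def record_override(self, context, setpoint):
        self.history[context].append(setpoint)

    def predict(self, context):
        seen = self.history.get(context)
        if not seen:
            return self.default
        return sum(seen) / len(seen)

learner = SetpointLearner()
# The inhabitant twice turned the heat up on occupied evenings:
learner.record_override(("evening", "occupied"), 70.0)
learner.record_override(("evening", "occupied"), 72.0)
print(learner.predict(("evening", "occupied")))   # 71.0
print(learner.predict(("night", "empty")))        # 65.0 (no data yet)
```

Even this crude averaging captures the essential inversion: the inhabitant never programs anything, and every ordinary interaction with the home doubles as a training example.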

We might view the home as a type of intelligent agent that infers the inhabitants’ desires from their actions and behavior. Intelligent software agents abound that attempt to satisfy the information needs of

Michael H. Coen is a doctoral candidate at the MIT Artificial Intelligence Lab, where he has led the Intelligent Room Project for the past two years. His research focuses on intelligent rooms, software agent programming languages, multimodal sensor fusion, and coordination theories for multiagent systems. He has SB and SM degrees in computer science from MIT. In 1998, he chaired the AAAI Spring Symposium on Intelligent Environments, the first such gathering of its kind. Contact him at the MIT AI Lab; 545 Technology Square; Cambridge, MA 02139; [email protected]; www.ai.mit.edu/people/mhcoen.

Michael C. Mozer is an associate professor in the Department of Computer Science and the Institute of Cognitive Science at the University of Colorado. His research interests include computational modeling of human cognition, neural network architectures and learning algorithms for temporal pattern processing, and applications of machine learning to problems in engineering. He has a BA in computer science from Brown and an MA in psychology and PhD in cognitive science and psychology from the University of California at San Diego. Contact him at the Dept. of Computer Science, Eng. Center, Rm. 741, Univ. of Colorado, Boulder, CO 80309-0430; [email protected]; www.cs.colorado.edu/~mozer.

Richard Hasha is the director of engineering for Interactive Home Systems, a division of Corbis Corporation. IHS has overall responsibility for the specification, design, and implementation of all automated control systems for the Gates Estate at Medina, Washington. Richard was the chief systems architect for this project. He has over 27 years of broad-ranging software engineering experience, much of which has been focused on distributed computing. Contact him at IHS (Corbis), 15395 SE 30th Pl., Ste. 300, Bellevue, WA 98007; [email protected].

James L. Flanagan is the vice president for research at Rutgers University.His research interests include collaborative computing and multimodalhuman-computer interaction. He received a BS from Mississippi State Uni-versity and an ScD and an SM from MIT, all in electrical engineering. He isa Life Fellow of the IEEE and a winner of the National Medal of Science,and a member of the National Academy of Sciences and the National Acad-emy of Engineering. Contact him at the CAIP, Rutgers Univ., P.O. Box1390,Piscataway, NJ 08855-1390; [email protected]; www.caip.

Page 5: Room service, AI-style Rutgers Universitypeople.csail.mit.edu/mhcoen/Papers/stopworrying.pdfto stop worrying and love my intelligent room,” discusses insights gleaned from work at

users of the Web. We extend the idea of in-telligent agents to comfort needs of peoplein natural living environments. Intelligentagents seldom perform perfectly because oftheir limited ability to infer users’ intentions.However, we can minimize this problem innatural environments through the use ofsmart sensors and some general domainknowledge, such as an analysis of typicaltasks performed in an environment.

Residential comfort systems

To discuss the adaptive home in concrete terms, let us focus on the control of basic residential comfort systems: air heating, lighting, ventilation, and water heating. The reason for this focus is twofold:

• These devices are prime consumers of energy resources, and thus present an opportunity for energy conservation.

• Although some devices in the home are controlled with relative ease, such as the stereo or TV, the operation of residential comfort systems can be quite complex.

Consider the control problem for air heating. (In Colorado, we worry more about heating in the winter than cooling in the summer.) The thermostat could be set to 70° around the clock. However, this is inefficient because the house doesn't need to be heated while its inhabitants are at work, nor does the setpoint have to be as high at night. We could use a digital thermostat with multiple setback periods to specify when to lower the setpoint. However, different rules are required for weekdays and weekends. Furthermore, the setback thermostat only lets us specify first-order rules of occupancy (expected departure and return times based on weekday versus weekend). For efficiency, the thermostat should really know more subtle patterns of occupancy (expected return time based on day of week, departure time that day, weather conditions, and recent schedule, for example).

It must also consider the time required to heat the house, which depends on outdoor weather conditions. If the house has multiple furnaces or zoned control, room-occupancy patterns must be considered. Furthermore, alternative means of heating should be considered, such as opening blinds to allow for passive solar gains, electric space heaters to heat individual rooms, or fans to mix the air. Finally, utilities sometimes charge for energy based on time of use, making it more efficient to overheat the house during the day than to heat it to the appropriate setpoint immediately before the return of the inhabitants. Thus, regulating the air temperature in the house to simultaneously maintain comfort and energy efficiency is not a trivial challenge.

ACHE

We have constructed a prototype system in an actual residence. The residence was completely renovated in 1992, at which time the infrastructure needed for the adaptive-house project was incorporated into the building, including nearly five miles of low-voltage conductor for collecting sensor data and a power-line communication system for controlling lighting, fans, and electric outlets.

We call the system that runs the home ACHE, an acronym for Adaptive Control of Home Environments. At present, ACHE can control 22 banks of lights (each having 16 intensity levels), six ceiling fans, two electric space heaters, a water heater, and a gas furnace. ACHE has roughly 75 sensors, which include the following for each room in the home: intensity setting of the lights, status of fans, status of the digital thermostat (which is both set by ACHE and can be adjusted by the inhabitant), ambient illumination, room temperature, sound level, status of one or more motion detectors (on or off), and the status of doors and windows (open or closed). In addition, the system receives global information such as the water heater temperature and outflow, outdoor temperature and insolation, energy use of each device, gas and electricity costs, time of day, and day of week. Figure 1 shows a floor plan of the residence, as well as the approximate location of selected sensors and actuators.

Objectives. ACHE has two objectives: anticipation of inhabitants' needs and energy conservation. For the first, lighting, air temperature, and ventilation should be maintained to the inhabitants' comfort; hot water should be available on demand. If inhabitants manually adjust an environmental setpoint, they are indicating that their needs have not been satisfied. For energy conservation, lights should be set to the minimum intensity required and hot water should be kept at the minimum temperature needed to satisfy the demand. Also, only rooms that are likely to be occupied in the near future should be heated; when several options exist to heat a room, the one minimizing expected energy consumption should be selected.

Achieving either of these objectives in isolation is fairly straightforward. If ACHE were concerned only with appeasing the inhabitants, the air temperature could be maintained at a comfortable 70° at all times. If ACHE were concerned only with energy conservation, all devices could be turned off. In what sort of framework can the needs of the inhabitants be balanced against energy conservation? We have adopted an optimal control framework, in which failing to satisfy each objective has an associated cost. A discomfort cost is incurred if ACHE does not anticipate inhabitant preferences. An energy cost is incurred based on the use of gas and electricity. ACHE's goal is to minimize the combined costs of discomfort and energy.

This framework requires that discomfort and energy costs be expressed in a common currency, which we have chosen to be dollars. Energy costs can readily be characterized in dollars, but some creativity is involved in measuring discomfort costs in dollars. Relative discomfort is indicated when the inhabitant manually adjusts a device (such as turning on a light); a misery-to-dollars conversion factor must be used to translate this relative discomfort to a dollar amount. One technique we have used to specify this factor relies on an economic analysis in which we determine the dollar cost in lost productivity that occurs when ACHE ignores the inhabitants' desires. Another technique adjusts the conversion factor over a several-month period based on how much inhabitants are willing to pay for gas and electricity.

Prediction and control. To minimize combined discomfort and energy costs, ACHE must be able to predict inhabitant lifestyle patterns and preferences, and to model the physics of the environment. We illustrate with a simplified scenario: It's 6:00 pm and the home is unoccupied. ACHE must decide whether to run the furnace. On the one hand, if the furnace is turned on for the next half hour, ACHE predicts that the indoor air temperature will rise to a level tolerable by the inhabitants, but an energy cost will be incurred. On the other hand, if the furnace is left off and the inhabitants return home around 6:30, a discomfort cost will be incurred. The decision about the furnace state depends on the expected energy cost on the one hand and the expected discomfort cost on the other (which depends on the probability that the inhabitants will return).
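This decision rule can be sketched as an expected-cost comparison. The function name, probabilities, and dollar figures below are illustrative assumptions, not ACHE's actual cost model:

```python
def furnace_decision(p_return, energy_cost_on, discomfort_cost_cold):
    """Choose the furnace state that minimizes expected cost (in dollars).

    p_return: predicted probability the inhabitants return in the next half hour
    energy_cost_on: dollar cost of running the furnace for that period
    discomfort_cost_cold: dollar-equivalent misery if they return to a cold house
    """
    cost_on = energy_cost_on                    # heating now costs gas regardless
    cost_off = p_return * discomfort_cost_cold  # discomfort only if they return
    return "on" if cost_on < cost_off else "off"

# With a 60% chance of return, $0.40 of gas versus $1.00 of misery:
print(furnace_decision(0.6, 0.40, 1.00))  # → on  (0.40 < 0.60)
```

With a lower return probability, the expected discomfort shrinks and leaving the furnace off becomes the cheaper choice.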

ACHE thus requires predictors that estimate future states of the environment, such as the probability of home occupancy (based on 30 variables, including recent occupancy patterns) and indoor air temperature (based on outdoor air temperature and a thermal model of the house and furnace), as well as inhabitant preferences, such as the temperature required to keep the inhabitant from "punishing" ACHE. The predictors rely largely on neural networks, which are statistical pattern-recognition devices inspired by the workings of the brain. Neural networks can learn from experience—from data collected by the house.

We have conducted simulation studies of the air-heating system,1 using actual occupancy data and outdoor temperature profiles, evaluating various control policies. ACHE robustly outperforms three alternative policies, showing a lower total (discomfort plus energy) cost across a range of values for the relative cost of inhabitant discomfort and the degree of nondeterminism in occupancy patterns.

We have also implemented and tested a lighting controller in the house.2 To give the flavor of its operation, we describe a sample scenario of its behavior. The first time that the inhabitant enters a room (we'll refer to this as a trial), ACHE decides to leave the light off, based on the initialization assumption that the inhabitant has no preference with regard to light settings. If the inhabitant overrides this decision by turning on the light, ACHE immediately learns that leaving the light off will incur a higher cost (the discomfort cost) than turning on the light to some intensity (the energy cost). On the next trial, ACHE decides to turn on the light, but has no reason to believe that one intensity setting will be preferred over another. Consequently, the lowest intensity setting is selected. On any trial in which the inhabitant adjusts the light intensity upward, the decision chosen by ACHE will incur a discomfort cost, and on the following trial, a higher intensity will be selected.

Training thus requires just three or four trials, and explores the space of decisions to find the lowest acceptable intensity. ACHE also attempts to conserve energy by occasionally testing the inhabitant, selecting an intensity setting lower than the setting believed to be optimal. If the inhabitant does not complain, the cost of the decision is updated to reflect this fact, and eventually the lower setting will be evaluated as optimal. As described in this scenario, ACHE relies on reinforcement-learning techniques.4 ACHE includes a neural network that predicts when a room is about to become occupied, so that the lighting can be set prior to room entry. The scenario presented sidesteps a difficult issue: lighting preferences depend on the context (time of day, current activities, and ambient light level, for example), thus requiring ACHE to learn about desired lighting patterns in a context-dependent manner.
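The trial-by-trial behavior described above can be captured in a toy sketch. The cost constants, probe rate, and class structure here are illustrative assumptions, not ACHE's actual reinforcement-learning machinery:

```python
import random

class LightController:
    """Toy sketch of trial-by-trial light selection via estimated decision costs."""
    ENERGY_PER_LEVEL = 0.01   # assumed dollar cost per intensity step per trial
    DISCOMFORT = 1.0          # assumed misery-to-dollars penalty for an override

    def __init__(self):
        # Estimated cost of each setting (0 = off .. 16 = brightest);
        # initially only energy cost, so "off" looks best.
        self.cost = {level: level * self.ENERGY_PER_LEVEL for level in range(17)}
        self.probe_rate = 0.1   # occasionally test a dimmer setting

    def choose(self):
        best = min(self.cost, key=self.cost.get)
        if best > 0 and random.random() < self.probe_rate:
            return best - 1     # probe: try one step dimmer to save energy
        return best

    def observe(self, chosen, override):
        """override is the level the inhabitant adjusted to, or None."""
        if override is not None and override != chosen:
            # The decision also incurred a discomfort cost.
            self.cost[chosen] = chosen * self.ENERGY_PER_LEVEL + self.DISCOMFORT
        else:
            # No complaint: the setting's cost is just its energy cost,
            # so a successfully probed lower setting becomes optimal.
            self.cost[chosen] = chosen * self.ENERGY_PER_LEVEL
```

Tracing it reproduces the scenario: the first trial leaves the light off; after an override, the lowest intensity is tried next; each upward adjustment pushes the next trial one step brighter, until a setting draws no complaint.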

Discussion

Our research program hinges on a careful evaluation phase. In the long term, the primary empirical question we must answer is whether there are sufficiently robust statistical regularities in the inhabitants' behavior that ACHE can benefit from them. On first consideration, most people conclude that their daily schedules are not "regular"; they sometimes come home at 5 pm, sometimes at 6 pm, sometimes not until 8 pm. However, even subtle, higher-order statistical patterns in behavior—such as the fact that if you're not home at 3 am, you're unlikely to be home at 4 am—are useful to ACHE. These are patterns that people are not likely to consider when they discuss the irregularities of their daily lives. These patterns are unquestionably present, and our experiments to date suggest that they can be exploited to serve as the foundation of an intelligent, adaptive environment.

Acknowledgments

We are grateful to Marc Anderson and Robert Dodier, who helped develop the software infrastructure for the adaptive house. This research has been supported by the Sensory Home Automation Research Project (SHARP) of Sensory Inc., as well as a CRCW grant-in-aid from the University of Colorado, McDonnell-Pew award 97-18, and NSF awards IRI-9058450 and IBN-9873492.

References

1. M.C. Mozer, L. Vidmar, and R.H. Dodier, "The Neurothermostat: Adaptive Control of Residential Heating Systems," Advances in Neural Information Processing Systems 9, M.C. Mozer, M.I. Jordan, and T. Petsche, eds., MIT Press, Cambridge, Mass., 1997, pp. 953–959.

2. M.C. Mozer and D. Miller, "Parsing the Stream of Time: The Value of Event-Based Segmentation in a Complex, Real-World Control Problem," Adaptive Processing of Temporal Sequences and Data Structures, C.L. Giles and M. Gori, eds., Springer-Verlag, Berlin, 1998, pp. 370–388.

3. M.C. Mozer, "The Neural Network House: An Environment that Adapts to Its Inhabitants," Proc. AAAI Spring Symp. Intelligent Environments, M. Coen, ed., AAAI Press, Menlo Park, Calif., 1998, pp. 110–114.

4. R.S. Sutton and A.G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, Mass., 1998.

Figure 1. Floor plan of home equipped with the Adaptive Control of Home Environments (ACHE) system.

Needed: A common distributed-object platform
Richard Hasha, Interactive Home Systems

Last year, a colleague and I decided we needed to pull our heads out of a very large home-automation project we'd been buried in and go see what the rest of the intelligent-environments world had been up to. We attended the AAAI Intelligent Environments Symposium at Stanford in the spring of 1998. By the end of the conference, I understood that a great deal of energy is going into building supporting infrastructure just to get to a point where the interesting work can actually begin. I saw all these really smart people spending lots of energy on the same kinds of plumbing-related work just so they could begin focusing on their areas of interest. What a big waste of gray matter!

For the past five and a half years, I've been implementing the control system for a very large, very complex private home. The system is based fundamentally on a distributed-object model. Back when we were trying to figure out just how to structure the project, I spent a lot of time nailing down the core constraints and issues:

• It was clearly a distributed problem—lots of hardware and points of user interaction spread all over the place.

• You could not ask for a better example where an object-oriented design approach would be beneficial. This was clearly a modeling effort.

• The owners would want to continually incorporate interesting new functionality as it became available.

• This was going to be a large system consisting of thousands of object instances scattered across lots of computers with tens of thousands of interobject relationships.

• We could not find a real-world example of such a system having ever been successfully built.

• People were going to literally live with this system day in and day out, so it could not be a fragile Tinkertoy.

• New features should have as little negative impact on existing functionality as possible.

• The system's core aspects would be very hard to change once the house was occupied.

This all pointed toward the need for a general, well-thought-out software platform that would stand the test of time as more and more functionality was piled on it. These facts drove us to spend a high percentage of our development resources on the construction of such a platform.

Given our experience, I very much believe that if such a platform were available to researchers and developers working on intelligent-environment projects, it would greatly amplify their efforts. This essay argues for such a software platform and further discusses some of the issues common to complex distributed applications such as those aimed at intelligent-environment behavior.

The problem domain

Here, I assume that most intelligent-environment R&D work will eventually lead to some form of distributed implementation as the various components of this work move from the programmer's workbench into a complete system. Lots of computers are dedicated to very specific tasks. In addition, the supporting software object model tends to map problem-domain entities, both physical and abstract, very precisely to their real-world counterparts—if they don't, terminal confusion quickly occurs. So, we end up with a complex system of interconnected but independent active objects—a network of objects, if you will. This object network can become quite complex. In fact, you would like to let it become as naturally complex as required by the specific project. Without an appropriate object-network platform (which we call an object-network OS), the right degree of natural object-model complexity is impossible, thus hampering overall progress in intelligent-environment R&D.

A prototypical object-network OS

What kind of functionality would be useful within the intelligent-environment R&D problem space? What key functional aspects should our hypothetical object-network OS provide? Some of the areas of functionality proven to be interesting within our project are support for large numbers of object instances, interobject referencing, interobject communication, monitoring and debugging facilities, and configuration management.

Support for large numbers of objects. To allow detailed and accurate problem-domain modeling, you want to support a distributed set of object instances numbering into the hundreds or, as in our case, the thousands. Both the supporting development tools and the object-network OS itself should provide a very simple means of adding new classes (abstractions) to the problem-domain model as well as allowing for multiple implementations of these abstractions. Furthermore, any number of runtime instances of these implementations should be able to be instantiated and allowed to coexist within the runtime object working set.

Within our project, we maintained a registry of both the abstraction derivation hierarchy and, in another dimension off this classification tree, zero or more implementation descriptions. These implementation descriptions describe not only the actual kind of object class used to make up a corresponding runtime instance but also the class of object used to actually configure instances of these runtime objects. Basically, the programmer thinks of this implementation description as a software part description. Once registered, any number of new runtime instances of an implementation can be configured and instantiated within the runtime model.

Also, object instances should be able to live on their own in an object-network OS. They should not require some other application-domain component to host them, as with many of the commercially available distributed-object platforms.

Interobject referencing. To support the naturally complex interobject connection requirement, an object-network OS must provide a means of cataloging and locating active objects. Beyond this, it must also provide a way to express and maintain interobject reference semantics in some other way than through hard (real) object references. Consider a simple two-object cross-reference situation: which object must come alive first? We obviously have a catch-22 situation.

In a system with hundreds or thousands of interobject references, not all of the modeled objects will be up at the same time—things fail, they're contained in different machines that start up at different times, and so on. And there are other real-world needs that lead us toward an environment that supports and promotes the idea that it's natural for objects to come and go in any order at any time. This is clearly an inherent aspect of an active object-programming model.

For example, there's a class of behavior we on this project have termed local survivability. Objects and object clusters should do the best they can to provide the highest degree of intended functionality, even if some objects are missing from the complete runtime model. There are also very practical reasons for wanting to start and stop object instances arbitrarily in an operational environment. To allow for these situations, an object-network OS needs to support an object directory service and soft object references.

Object directory service. You assume that all active objects within the context of an object-network OS are named uniquely. For our project, we also found it useful to have every active object instance carry a description of the kind of abstraction it implements. We might also want to allow for the attachment of arbitrary classification attributes to object instances, thus allowing orthogonal views of the active object working set. To this end, an object-network OS needs to support an active (instantiated) object directory that allows any object within the system to locate and dynamically bind to other objects (at least by name) within that same system.

Soft object references. The object-network OS must support formation and maintenance of abstract references from one object to another without regard to the actual availability of the referenced object. This feature is key in allowing objects to come, go, and rebind in any order and at any time. This facility should also provide referenced-object availability notifications to any referencing objects.
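A soft reference plus directory along these lines might be sketched as follows. The essay does not specify the platform's API, so the class names, callback names, and semantics here are invented for illustration:

```python
class SoftRef:
    """A symbolic reference that tolerates the target not yet existing."""
    def __init__(self, directory, target_name, on_available=None, on_lost=None):
        self.directory = directory
        self.target_name = target_name
        self.on_available = on_available   # called when the target registers
        self.on_lost = on_lost             # called when the target goes away
        directory.watch(target_name, self)

    def get(self):
        """Return the live object, or None if it is not currently active."""
        return self.directory.lookup(self.target_name)

class Directory:
    """Toy object directory: name -> live instance, with availability notices."""
    def __init__(self):
        self.objects, self.watchers = {}, {}

    def watch(self, name, ref):
        self.watchers.setdefault(name, []).append(ref)

    def register(self, name, obj):
        self.objects[name] = obj
        for ref in self.watchers.get(name, []):
            if ref.on_available:
                ref.on_available(obj)

    def unregister(self, name):
        self.objects.pop(name, None)
        for ref in self.watchers.get(name, []):
            if ref.on_lost:
                ref.on_lost()

    def lookup(self, name):
        return self.objects.get(name)
```

Note that either side can come up first: a `SoftRef` created before its target exists simply binds (and fires its notification) whenever the target registers, dissolving the two-object catch-22.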

Interobject communication

With an object directory service and soft object reference facilities within an object-network OS, we can construct whatever logical object-network topology we require.

Direct-method invocation. We assume that, via the soft reference facilities, we can achieve real object-method invocation for those referenced objects that are active. This direct object-method invocation is the obvious way two objects communicate. In our experience, there are at least two other forms of useful interobject communication that an object-network OS should support. These are packaged in the form of events and properties.

Distributed events. Generally, events are the what, when, where, and why of something worth noting within the context of a problem domain. What specifies the exact type of event that occurred. Where is the unique identification of the object that generated the event. In our implementation, we found that hierarchical classification of event types was very useful.

The basic event model that we found workable was a broadcast-and-subscribe model. An event-client object informs our object-network OS of its desire to receive event notifications by listening for specific types of events. An event's producer simply broadcasts the event out onto the object network, and all interested objects will receive a corresponding notification. This broadcast-and-subscribe event model supports a very simple, loosely coupled, many-to-many class of interobject communication. Furthermore, by hierarchically classifying event types, we achieve the notion of abstract event monitoring.
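A broadcast-and-subscribe bus with hierarchical event types can be sketched in a few lines. The dot-separated naming scheme is an assumption made for illustration, not the project's actual classification:

```python
class EventBus:
    """Broadcast-and-subscribe with hierarchical event types (sketch).

    Event types are dot-separated paths, e.g. "sensor.motion.hallway";
    subscribing to "sensor.motion" also matches "sensor.motion.hallway",
    giving the abstract event monitoring described above.
    """
    def __init__(self):
        self.subscribers = []   # list of (type_prefix, callback)

    def subscribe(self, event_type, callback):
        self.subscribers.append((event_type, callback))

    def broadcast(self, event_type, source, payload=None):
        # "What" is event_type, "where" is source; the producer neither
        # knows nor cares which objects are listening.
        for prefix, callback in self.subscribers:
            if event_type == prefix or event_type.startswith(prefix + "."):
                callback(event_type, source, payload)
```

Because the binding is by symbolic type rather than object reference, producers and consumers stay loosely coupled in a many-to-many topology.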

Network properties. The other broad class of useful interobject communication supports the one-to-many relationship. Within our hypothetical object-network OS, we'll call this support network properties. This class of communication support provides publish-and-subscribe facilities for the property client and the property server. The property-server object publishes a named property to the world. Property-client objects then subscribe to a property by providing the name of the property-server object and the name of the published property. When the value of a property is changed (typically by the publisher of the property), the new property value is distributed to all interested property-client objects. In this model, the publishing object is unaware of any and all client interests, because our object-network OS has the job of actually distributing the activity. (The interobject referencing implied by network-property support is assumed to be built using soft object referencing, so that the attributes of the soft references will be extended to the network properties.)
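The publish-and-subscribe property mechanism can be sketched similarly. Again this is an illustrative sketch under assumed names, not the project's implementation:

```python
class PropertyServer:
    """Publishes named properties; value changes fan out to subscribers."""
    def __init__(self, name):
        self.name = name
        self.values = {}
        self.subscribers = {}   # property name -> list of callbacks

    def publish(self, prop, initial):
        self.values[prop] = initial
        self.subscribers.setdefault(prop, [])

    def subscribe(self, prop, callback):
        # A client names the server and the property; it receives the
        # current value immediately and every change thereafter.
        self.subscribers.setdefault(prop, []).append(callback)
        if prop in self.values:
            callback(self.values[prop])

    def set(self, prop, value):
        # The publisher never enumerates its clients; distributing the
        # change is the platform's job, not the publisher's.
        self.values[prop] = value
        for callback in self.subscribers.get(prop, []):
            callback(value)
```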

The combination of one-to-one (direct-method invocation), many-to-many (distributed events), and one-to-many (network-property distribution) communication facilities is very useful when building complex distributed applications such as those within the intelligent-environment problem space. Notice that, in each case, binding between objects is symbolic, further supporting arbitrarily complex object-interconnection topologies.

Monitoring and debugging facilities

To illustrate the need for integral monitoring and debugging facilities, let me use our project as an example. At last count, our deployed experiment had just over 6,000 active object instances, with an average of five external object references per instance—about 30,000 interconnects—all of which are spread across over 120 computers. In reality, mapping objects to computers is not all that important—in fact, this configuration is viewed as just a big network of 6,000 embedded computational entities. The question is not if a problem will occur but when it will occur. Unless our object-network OS supports a set of monitoring and problem-isolation facilities, we would be hopelessly lost and confused. In our implementation, many mechanisms support this kind of activity. The two most important have proved to be logging and general object-state access.

Logging. I can't argue strongly enough for our object-network OS to support a ubiquitous, low-impact logging facility. It's equally important that object implementations are fully instrumented with log output and that such instrumentation never be removed. Once a system is having a problem, it's too late to think about adding logging to the code. We have had many complicated and hard-to-reproduce problems—what I'll call nth-order feedback classes of problems, such as oscillation. Without the ability to snoop around and watch log activity, problems would be nearly impossible to diagnose.

Using our project's implementation as an example, let me describe what I think are the key aspects of a suitable logging facility. First, it is very useful to classify log-record output hierarchically. Typically, there would be classes and subclasses for maintenance, error, warning, and trace kinds of output. From the programmer's perspective, the logging primitives should be right there as part of their programming model and runtime environment. We keep a store-and-forward log-facility object on each machine, whose job is to very quickly buffer local log activity into a disk-based FIFO queue, then forward that queue's content asynchronously to another facility (typically a central store that can be monitored on the fly). In addition, we can turn different levels of log output on and off at the object-instance level. Typically, our system runs objects with all trace output turned off, but if we're tracking down a problem, we can reach in while an object is instantiated and modify its output log filter. This approach has proven essential in our system. Without such logging facilities, a system even an order of magnitude smaller would be impossible to operate.
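A minimal sketch of such a facility, with hierarchical categories, batched forwarding, and a per-instance filter that can be changed at runtime (category names and record format are assumptions, not the project's actual design):

```python
from collections import deque

class ObjectLogger:
    """Per-instance logger with hierarchical categories and a runtime filter.

    Categories are dot paths ("trace.furnace.zone2"); enabling "trace"
    enables every subcategory. Records buffer locally in a FIFO and are
    forwarded in batches, standing in for the store-and-forward facility.
    """
    def __init__(self, instance_name, forward):
        self.instance_name = instance_name
        self.forward = forward   # callable receiving a batch of records
        self.enabled = {"maintenance", "error", "warning"}   # trace off by default
        self.queue = deque()

    def set_filter(self, enabled):
        """Reach into a live instance and change what it logs."""
        self.enabled = set(enabled)

    def log(self, category, message):
        if any(category == e or category.startswith(e + ".") for e in self.enabled):
            self.queue.append((self.instance_name, category, message))

    def flush(self):
        """Forward buffered records asynchronously in a real system."""
        batch, self.queue = list(self.queue), deque()
        if batch:
            self.forward(batch)
```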

General object-state access. The ability to probe the general state of an object instance while it's instantiated is also very useful. If we think about the general nature of the objects within our object-network OS, we see a high degree of common behavior. For example, any object can publish and subscribe to properties, listen for events, or reference and be referenced by other objects. In our implementation, this list of common traits is even longer. It would be very nice to be able to ask any object in the system to dump its current state. As with logging, this kind of facility is a necessity—for example, to monitor the system and verify that all of the referenced objects are up. This is a key ingredient in building a self-diagnosing system.

Configuration management

Some day, the components that make up the physical infrastructure of a facility (wires, building, room, city, and so forth) destined to behave as an intelligent environment might be completely self-identifying and, therefore, support self-configuration. But this utopian situation does not exist today. So, the question is, how do we deal with the configuration-management needs of an extensively componentized software environment such as the one we're discussing?

From our experience, you need to start with a single, core configuration tool that can dynamically extend its behavior through implementation-aware configuration components. We call these instance configurators. The core configuration tool establishes a hierarchical user-interface environment and provides a common implementation-component framework from which more application-specific instance configurators can be derived. The class-inheritance structure for the instance configurators parallels the corresponding runtime object-implementation classification hierarchy in almost every case. This leads to a great deal of reuse and overall enforcement of model-specific semantics within a general configuration framework.

Conclusions

We have demonstrated the need for an object-network OS that supports modular construction, configuration, and deployment of naturally complex software object models aimed at the intelligent-environment R&D problem space. This, in my view, would enable researchers to focus more on investigating intelligent environments and less on developing the infrastructure to support those investigations.

Through the deployment of appropriately complex and accurately modeled systems, the goal of intelligent environments will be realized.

Autodirective sound capture: towards smarter conference rooms
James L. Flanagan, CAIP Center, Rutgers University

Effective teleconferencing is increasingly sought for collaboration among groups who are geographically separated. Whether for activities as disparate as corporate strategizing, product design, or distance education, a key objective is voice communication that approaches the convenience and naturalness of face-to-face conversation—as though all participants were in the same meeting room. This desideratum implies "hands-free" sound pickup, where talkers may speak at some distance from the microphone system, without the encumbrance of hand-held, body-worn, or tethered equipment. Implied, too, is sound capture of sufficiently high quality—in which deleterious effects of multipath distortion (room reverberation) and interfering acoustic noise have been mitigated.

Enabling technology

We want, then, performance comparable to that of a microphone positioned close to each talker's mouth—even when the talker might be in motion (as in lecturing or explicating projected graphics). Several advances coalesce to support new capabilities for achieving this objective.

Recent years have seen development and large-scale manufacture of electret transducers. This device is a condenser microphone in which the polymer membrane exposed to the sound wave has one surface metalized and the dielectric has been given a permanent electrostatic polarization (a bound charge that provides the bias potential for the condenser element).1 Construction is exquisitely simple and inexpensive, and performance approaches that of the classic air-dielectric condenser microphone (flat amplitude response over a wide frequency range, linear phase, large dynamic amplitude, good sensitivity of the order of –40 dBv/Pascal, or about a millivolt of output for a strong conversational-level signal at 1 meter). In addition, no high-potential external bias is required, and the transducer has relatively low source impedance. In manufacture, a field-effect transistor preamplifier is typically integrated onto the condenser backplate, with the whole unit, in volume, costing less than a dollar.

Concomitantly, sampled-data theory,



MARCH/APRIL 1999 17

digital-signal processing, and microelectronic circuitry have emerged explosively, so that sophisticated single-chip computers can be economically employed in great numbers. These advances, together with new acoustic understanding in characterizing the behavior of microphone arrays in enclosures (where geometries and acoustic properties are known, or prescribed), permit new approaches for sound pickup in large conference groups—approaches that go substantially beyond the shared microphone on the table or the tethered mike passed around from hand to hand.

Two steps are involved in achieving improvement: spatial location of the desired sound source (usually speech), and transduction of a good-quality replica of the desired sound. Both steps should be automatic.

Sound-source location. One approach to locating the position of a sound source in an enclosure (typically a talker) is based on acoustic measurements. Similar to electromagnetic techniques (Loran or GPS, among others), differences in time of flight to receivers of known position provide the relevant data. One implemented system uses two quads of microphones, placed preferably on adjacent orthogonal walls.2 Each quad provides six distinct pairs of sensors for which differences in signal arrival time can be estimated. (That is, each set of four microphones (n), taken two at a time without regard to order (p), gives n!/(p!(n–p)!) = 6 pairs.) Each time difference defines a hyperboloid-like surface on which a radiating source would produce the arrival difference. Intersection of the surfaces defines a unique (overdetermined) point in 3D space.
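The pair counting above is easy to confirm directly (a two-line check; the microphone labels are invented):

```python
from itertools import combinations
from math import comb

quad = ["m1", "m2", "m3", "m4"]       # one quad of microphones
pairs = list(combinations(quad, 2))   # distinct unordered sensor pairs
print(len(pairs), comb(4, 2))         # 6 6: n!/(p!(n-p)!) with n=4, p=2
print(2 * comb(4, 2))                 # 12 TDOAs in all from two quads
```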

The crucial measurement is that of time difference of arrival (TDOA). Cross-correlation of the signals for each sensor pair helps to reveal the time-difference information. Computational advantages accrue from using fast Fourier transforms to accomplish the correlation in the frequency domain, namely by computing the normalized cross-power spectrum for each pair, inverse transforming, and judging the time shift that produces a maximum in the correlation function.3,4 Also for computational efficiency, the source position's x, y, z coordinates are estimated by a nondirected gradient descent (using knowledge of enclosure geometry and sensor positions) to minimize the square error between the set of 12 measured TDOAs and those computed for any

candidate x, y, z position. In a perfect environment (or free space), the error can approach zero, but in a noisy, reverberant enclosure the TDOAs are contaminated. This impacts the practical choice of spacing for sensors in the quad. The spacing needs to be far enough to usefully measure a time difference, but not so far that the signal at each receiver is dominated by chaotic multipath reflections and interfering noise. In many conference rooms, a practically useful spacing is on the order of 20 cm. Figure 2 shows typical source-location accuracies measured by this technique in an industrial conference auditorium. As the measurements show, accuracies on the order of one-third meter are obtained in this practical environment.
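The frequency-domain correlation step described above can be sketched as follows. This is a minimal NumPy illustration of the normalized cross-power-spectrum (phase transform) idea, not the cited DSP implementation; the sample rate and test signals are invented:

```python
import numpy as np

def tdoa_gcc_phat(x, y, fs):
    """Delay of y relative to x (seconds), via the normalized
    cross-power spectrum and an inverse FFT."""
    n = 2 * max(len(x), len(y))                # zero-pad: linear correlation
    X = np.fft.rfft(x, n)
    Y = np.fft.rfft(y, n)
    cross = np.conj(X) * Y
    cross /= np.maximum(np.abs(cross), 1e-12)  # keep phase, drop magnitude
    cc = np.fft.irfft(cross, n)
    cc = np.concatenate((cc[-(n // 2):], cc[:n // 2]))  # center zero lag
    return (np.argmax(cc) - n // 2) / fs

# Toy check: y is x delayed by 25 samples.
fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
y = np.concatenate((np.zeros(25), x))[:1024]
print(tdoa_gcc_phat(x, y, fs) * fs)            # delay in samples, ~25
```

The phase-transform normalization discards spectral magnitude, which sharpens the correlation peak for broadband sources such as speech.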

Because most conference facilities include video equipment, source location by visual means is also an option, and a variety of techniques have been researched on this approach.5,6 A combination of acoustic and visual methods is also an alternative and a current topic of research.
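The position-estimation step can be sketched as well. In the following toy, the room geometry, the source position, and the search routine are all invented for illustration; the nondirected gradient descent described above is approximated by a derivative-free shrinking-step coordinate search over candidate x, y, z positions:

```python
import numpy as np
from itertools import combinations

C = 343.0  # nominal speed of sound, m/s

# Invented geometry: two quads of microphones with 20-cm spacing on
# adjacent orthogonal walls, as in the system described above.
quad1 = np.array([[0.0, 1.0, 1.0], [0.0, 1.2, 1.0],
                  [0.0, 1.0, 1.2], [0.0, 1.2, 1.2]])
quad2 = np.array([[1.0, 0.0, 1.0], [1.2, 0.0, 1.0],
                  [1.0, 0.0, 1.2], [1.2, 0.0, 1.2]])
mics = np.vstack([quad1, quad2])
pairs = [(i, j) for i, j in combinations(range(4), 2)] + \
        [(i + 4, j + 4) for i, j in combinations(range(4), 2)]  # 12 pairs

def range_diffs(src):
    """Path-length differences (TDOA times C, in meters) for each pair."""
    d = np.linalg.norm(mics - src, axis=1)
    return np.array([d[i] - d[j] for i, j in pairs])

def locate(measured, guess, step=0.5, tol=1e-6):
    """Minimize the squared error between measured and modeled TDOAs by
    trying moves along each axis and shrinking the step when stuck."""
    p = np.asarray(guess, dtype=float)
    best = np.sum((range_diffs(p) - measured) ** 2)
    while step > tol:
        improved = False
        for k in range(3):
            for s in (step, -step):
                q = p.copy()
                q[k] += s
                cost = np.sum((range_diffs(q) - measured) ** 2)
                if cost < best:
                    p, best, improved = q, cost, True
        if not improved:
            step /= 2
    return p

true_src = np.array([2.0, 1.5, 1.2])
est = locate(range_diffs(true_src), guess=[1.0, 1.0, 1.0])
print(np.round(est, 3))
```

With noise-free TDOAs the search recovers the source closely; in a real room, contaminated TDOAs leave the residual error discussed above.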

Sound capture. Having determined the location of the desired source, the next step is to capture a good-quality facsimile. One of the simplest means for mitigating reverberation and noise is delay-sum beamforming, where a cigar-shaped beam of sensitivity points at the source. A simple arrangement is the 1D line array, in which delay is supplied to each receiver so as to cohere signals arriving from a prescribed direction.7 For that direction (angle to the line), outputs of the multiple receivers add voltagewise, while for other directions, the uncorrelated components add powerwise, providing an array gain ideally bounded by the number of receivers. This works well if the environment is reasonably benign. But it has selectivity only in two spatial dimensions, and its signal-to-noise ratio (SNR) degrades monotonically as reverberation increases. (This is because the beamformer collects all sources along the bore of the beam—including those images of the source that it "sees" on any reflecting walls intersecting the beam.) What we need is spatial selectivity in range as well, giving full 3D selectivity. The technique of matched-filter processing applied to arrays provides this.
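The delay-sum idea can be sketched numerically. This is a minimal illustration with an invented array geometry and synthetic signals, using integer-sample delays only; it shows the voltagewise/powerwise distinction as a drop in residual noise power by roughly 1/N:

```python
import numpy as np

C = 343.0      # speed of sound, m/s
FS = 16000     # sample rate, Hz (invented for this sketch)

def delay_sum(signals, mic_x, angle_deg, fs=FS):
    """Delay-sum beamformer for a 1D line array: delay each channel so
    wavefronts from the steering angle cohere, then average."""
    delays = mic_x * np.sin(np.radians(angle_deg)) / C
    shifts = np.round(delays * fs).astype(int)
    shifts -= shifts.min()                  # keep all shifts causal
    n = signals.shape[1]
    out = np.zeros(n)
    for chan, s in zip(signals, shifts):
        out[s:] += chan[:n - s] if s else chan
    return out / len(signals)

rng = np.random.default_rng(1)
n_mics, n = 8, 4000
mic_x = 0.05 * np.arange(n_mics)            # 5-cm spaced line array
desired = rng.standard_normal(n)            # same broadside signal on all mics
noise = rng.standard_normal((n_mics, n))    # independent noise per channel
out = delay_sum(desired + noise, mic_x, angle_deg=0.0)

# Coherent (voltagewise) addition preserves the signal while the
# uncorrelated noise adds powerwise: residual noise power is ~1/8.
print(np.var(out - desired))
```

For eight receivers the ideal array gain is 10 log10(8), about 9 dB, matching the factor-of-eight drop in noise power seen here.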

The matched filter requires convolution of each receiver signal by a causal approximation to the time reverse of the impulse response of the propagation path from the desired source to that receiver. Because the multipath characteristics are included, the matched-filter array can turn reverberant energy into useful signal, in effect making array performance (ideally) immune to reverberation and providing spatial selectivity in three dimensions. The computational costs are high, but digital-signal processing continues to enjoy a revolution of economy. In particular, if the impulse response from the desired source (focal) location to the nth sensor of the array is h_nf(t), then the array output O(t) for source signal s(t) is given by the convolution

\[
O(t) = \sum_{n=1}^{N} s(t) * h_{nf}(t) * h_{nf}(-t)
     = s(t) * \sum_{n=1}^{N} \varphi(h_{nf}, h_{nf}),
\]

where \(\varphi(h_{nf}, h_{nf})\) is the autocorrelation of the impulse response. On the other hand, a source at any location other than the focus produces a path impulse response \(h_{nx}(t)\) and an array output

\[
O(t) = s(t) * \sum_{n=1}^{N} \varphi(h_{nf}, h_{nx}),
\]

Figure 2. Accuracy of position estimates for acoustic (speech) sources in an industrial conference room: (a) top view, (b) side view. (Axes in meters; markers indicate actual source positions and their estimates, with the stage and microphone positions shown.)


where the cross correlation of the impulse responses φ(h_nf, h_nx) (or rather the decorrelation of these impulse responses as the source moves away from the focal position) determines the spatial acuity in three dimensions.
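The two array-output expressions can be exercised numerically. In this sketch the impulse responses are synthetic random decays rather than measured room responses, and the sensor count and lengths are invented; it shows the on-focus autocorrelation peaks cohering while off-focus cross correlations do not:

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 8, 64                     # sensors and (toy) impulse-response length

decay = np.exp(-np.arange(L) / 20.0)
h_focus = rng.standard_normal((N, L)) * decay  # paths from the focal point
h_other = rng.standard_normal((N, L)) * decay  # paths from another location

s = rng.standard_normal(512)                   # source signal

def mf_array(sig, h_path, h_match):
    """O(t) = sum_n sig * h_path[n] * h_match[n](-t): each received signal
    is convolved with the time-reversed focal response, then summed."""
    out = 0.0
    for hp, hm in zip(h_path, h_match):
        received = np.convolve(sig, hp)          # signal at sensor n
        out = out + np.convolve(received, hm[::-1])
    return out

on_focus = mf_array(s, h_focus, h_focus)    # autocorrelation peaks add in phase
off_focus = mf_array(s, h_other, h_focus)   # cross correlations do not
print(np.max(np.abs(on_focus)) / np.max(np.abs(off_focus)))
```

The on-focus output is several times stronger, which is the spatial selectivity in range that the matched-filter array provides.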

The Fourier transform of h_nf(t) is H_nf(jω), where ω is angular frequency, and each matched sensor contributes an output filtered by |H_nf(jω)|² for an on-focus source. For severe multipath, this response is typically quite variable with frequency. But where one sensor may have a spectral deficiency (zero), another may contribute fully. If the geometry of the array is chosen prudently and N is somewhat large, O(jω) can easily be rendered sensibly flat over the frequency range of interest.

In experiments, we found randomly distributed sensors located on orthogonal walls to be particularly advantageous.8,9

Figure 3 illustrates computation of the behavior of a 100-sensor array, positioned on one wall of a moderately large room (20 × 16 × 5 m) having little sound absorption (α = 0.1). For this figure, we measured the signal-to-reverberant-noise ratio for a 4-kHz bandwidth speech source in this room. When the source is located at the focal position (14.0, 9.5, 1.35 m), the improvement in SNR approaches 15 dB (which is adequately reflective of the ideal array gain of 10log10 N). To keep signal delay tolerable through the array processing for this simulated enclosure, all impulse responses have been truncated to 105 ms. (Two benefits accrue to the extent that the impulse responses can be truncated. Low-level "precursor" signals are diminished in the array output, and the filter computation is more economical.10)
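The ideal-gain figure quoted above is a one-line check:

```python
import math

N = 100
ideal_gain_db = 10 * math.log10(N)  # ideal array gain for 100 sensors
print(ideal_gain_db)                # 20.0 dB; the measured ~15 dB approaches this
```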

The 3D spatial selectivity of the matched-filter array is remarkably valuable in capturing a desired signal not only in the presence of reverberation, but also under adverse conditions of noise interference and competing signals. Two 2D random arrays in the room previously mentioned were applied to retrieving a designated speech signal in the presence of a simultaneous talker and two random noise sources, as Figure 4 shows. Talker 1 was a man and Talker 2 was a woman, and each speech source was 4 kHz in bandwidth. All source levels were of comparable acoustic power, except Noise 1, which was 5 dB down from the other three. When received by a single microphone on the wall, this simultaneous signal complex is a totally unintelligible jumble. When the matched-filter array is focused on a desired source (Talker 1 in this instance), it can extract an intelligible signal. Figure 5 shows these comparisons along with the original source signal for Talker 1. The improvement in SNR afforded by the array over the single microphone is measured as 17 dB.

Preliminary system

As an initial exploration of automatic source location, coupled with slaved audio and video capture, we have implemented the experimental system of Figure 6. All computation is accomplished on a Pentium PC with one DSP board, and addressable bucket-brigade delay lines on each receiver accomplish the delay-sum beamforming (for a 21-element harmonically nested line array of first-order gradient electrets). The coordinates of the identified source are provided to both the slaved video camera and the beam-steered microphone array. Owing to limitations of arithmetic speed in the existing equipment, location estimates are made only at the rate of two per second. Approximately twice this speed is desirable for tracking fast-moving talkers. Texas Instruments is sponsoring the design and construction of a fully digital high-speed system.

Research issues

Numerous research questions and applications have not been touched on in this brief discussion. A small sampling includes

• determination, storage, and truncation of coefficients for matched filtering, both from measurements in the room (with pseudo-random maximum-length sequences) and from on-the-fly computation (using room geometry and acoustical characteristics);

• techniques for zoom control of the focal volume and control of frequency response under zoom (a research project addressing these points is presently in progress under sponsorship of Intel Corp.);

• signal classification to determine the nature of sources (speech, music, noise);

• stability of filter coefficients to temperature, obstacles, room, and audience changes;

• automatic location of multiple moving talkers;

• sound projection in the receiving room to duplicate (virtual) source position; and

• sound reinforcement within large halls.

Going to large scale. Convention halls and large auditoria pose special problems for sound capture, reinforcement, and projection. Economical DSP and low-cost, high-quality electret microphones bode well for massive use of microphone arrays. Presently under study is one special hardware processor designed to control and operate up to 512 microphone channels.11 Figure 7 illustrates a presently implemented 400-element array built for an earlier Bell Labs application. Sophisticated tracking of multiple talkers with matched-filter processing represents a sizeable technical challenge for arrays of this size.

Software conference manager. As we advance in capabilities for networked collaboration and large-group teleconferencing, opportunities increase for software agents and multimodal user interfaces. Combined with the technologies of speech recognition, speech synthesis, and talker identification, autodirective audio and video sys-



Figure 3. Computed signal-to-noise ratio for a 4-kHz bandwidth speech source matched-filter processed by 100 sensors randomly distributed on one wall of a reverberant room of dimensions 20 × 16 × 5 m. The acoustic absorption coefficient is α = 0.1, and the array is focused at coordinates (14, 9.5, 1.35) m. (SNR in dB plotted over the room's x and y axes in meters.)

Figure 4. Multiple competing sources in the simulated room of Figure 3: Talker 1 (man), Talker 2 (woman), and Noise 1 and Noise 2 (Gaussian sources).



tems can alleviate the burdens on conference moderators.

Consider the scenario: Conferees convene for a meeting with a like group situated cross-country. As each arrives, he or she logs in, speaks an identification phrase, and is imaged by the face finder. Attendees also declare their credentials relevant to the conference's topic by speaking answers to brief questions from the synthetic voice at the enrollment station. As the conference unfolds, the slaved microphone array and the video camera track the active participants. The software manager (using keyword spotting, gisting analysis, talker identification, and the position data fed to the autodirective audio and video system) follows who's talking, for how long, and on what topic. If the "manager" observes that a participant of marginal credentials is talking too frequently on topics off the point, it ceases to point the microphone array and camera at that position!

Acknowledgments

The NSF, DARPA, NIST, Bellcore, IBM, TI, Intel, and the New Jersey Commission on Science and Technology have supported components of the research described here.

References

1. G.M. Sessler and J.E. West, "Self-Biased Condenser Microphone with High Capacitance," J. Acoust. Soc. America, Vol. 34, 1962, pp. 1787–1788.

2. D.V. Rabinkin et al., "A DSP Implementation of Source Location Using Microphone Arrays," J. Acoust. Soc. America, Vol. 99, No. 4, Pt. 2, Apr. 1996, p. 2503.

3. M. Omologo and P. Svaizer, "Acoustic-Event Localization Using a Cross Power Spectrum Phase-Based Technique," Proc. ICASSP, IEEE Press, Piscataway, N.J., 1994, pp. 273–276.

4. D. Rabinkin et al., "Estimation of Wavefront Arrival Delay Using the Cross-Power Spectrum Phase Technique," J. Acoust. Soc. America, Vol. 100, No. 4, Pt. 2, Oct. 1996, p. 2697.

5. Bellcore, "Introducing the AutoAuditorium System," 1998; http://AutoAuditorium.Bellcore.com.

6. J. Wang, Y. Liang, and J. Wilder, "Visual Information Assisted Microphone Array Processing in a High Noise Environment," Proc. SPIE, Vol. 3521, SPIE, Bellingham, Wash., 1998, pp. 198–203.

7. J.L. Flanagan and E.E. Jan, "Sound Capture with Three-Dimensional Selectivity," Acustica, Vol. 83, No. 4, July/Aug. 1997, pp. 644–652.

8. D.V. Rabinkin, Optimum Sensor Placement for Microphone Arrays, PhD thesis, Dept. of Electrical and Computer Engineering, Rutgers Univ., New Brunswick, N.J., 1998.

9. E.E. Jan and J.L. Flanagan, Image Characterization of Acoustic Multipath in Concave Enclosures, Tech. Report TR-162, Rutgers Univ. CAIP Center, July 1993.

10. E.E. Jan, Parallel Processing of Large-Scale Microphone Arrays for Sound Capture with Spatial Volume Selectivity, PhD thesis, Dept. of Electrical and Computer Engineering, Rutgers Univ., 1995.

11. H. Silverman et al., "A Digital Processing System for Source Location and Sound Capture by Large Microphone Arrays," Proc. ICASSP 97, Vol. 1, IEEE Press, 1997, pp. 251–254.

Figure 6. Implemented system for automatic source location and autodirective capture of audio and video. (Components: a host PC that calculates source coordinates from TDOA estimates and controls the camera and beamforming array; a Signalogic Sig32C-8 DSP board inside the host PC that calculates TDOA estimates from sound captured by the primary array; two quad position-sensing arrays; line-array hardware that uses programmable analog delay lines to perform beamforming and provides sound output; a video monitor; a Canon VC-C1 camera; and the beamforming microphone array.)

Figure 5. Retrieval of a desired signal from the simultaneous competing sources of Figure 4. Two matched-filter arrays of 100 randomly distributed sensors focus on Talker 1: (a) original source signal for Talker 1; (b) four simultaneous sources received by a single microphone; (c) Talker 1 retrieved by the matched-filter array from the simultaneous complex. (The abscissa is time in seconds; the ordinate is frequency in Hertz.)

Figure 7. Planar array of 400 electret microphones for use in a large lecture hall.
