Media in performance: Interactive spaces for dance, theater, circus, and museum exhibits


by F. Sparacino, G. Davenport, and A. Pentland

The future of artistic and expressive communication in the varied forms of film, theater, dance, and narrative tends toward a blend of real and imaginary worlds in which moving images, graphics, and text cooperate with humans and among themselves in the transmission of a message. We have developed a "media actors" software architecture used in conjunction with real-time computer-vision-based body tracking and gesture recognition techniques to choreograph digital media together with human performers or museum visitors. We endow media objects with coordinated perceptual intelligence, behaviors, personality, and intentionality. Such media actors are able to engage the public in an encounter with virtual characters that express themselves through one or more of these agents. We show applications to dance, theater, and the circus, which augment the traditional performance stage with images, video, music, and text, and are able to respond to movement and gesture in believable, aesthetical, and expressive manners. We also describe applications to interactive museum exhibit design that exploit the media actors' perceptual abilities while they interact with the public.

Our society's modalities of communication are rapidly changing. Large panel displays and screens are being installed in many public spaces, ranging from open plazas, to shopping malls, to private houses, to theater stages and museums. In parallel, wearable computers are transforming our technological landscape by reshaping the heavy, bulky desktop computer into a lightweight, portable device that is accessible at any time. Such small computers, accompanied by a high-resolution private eye for display and by an input device, have already added an additional dimension to our five senses since they allow us to wear technology just as an element of our everyday clothing. Transmission of information through these media requires new authoring tools that are able to respond reliably not just to mouse point-and-click or drag-and-drop actions, but also to more natural full-body movements, hand gestures, facial expressions, object detection, and location.

Interactive experiences in general benefit from natural interactions, compelling communication, and ease of implementation. We show, according to these principles, how interactive media architectures can be categorized as scripted, responsive, learning, behavioral, or intentional. We have developed a "media actors" authoring technique: We endow media objects—expressive text, photographs, movie clips, audio, and sound clips—with coordinated perceptual intelligence, behaviors, personality, and intentionality. Such media actors are able to engage the public in an encounter with virtual characters that express themselves through one or more of these agents. They are an example of intentional architectures of media modeling for interactive environments.

Since 1995, we have constructed a variety of interactive spaces,1 and each of them has inspired a revision and improvement of our authoring techniques.



In this paper we focus primarily on our performance and interactive museum work. For dance and theater, we have built an interactive stage for a single performer that allows us to coordinate and synchronize the performer's gestures, body movements, and speech with projected images, graphics, expressive text, music, and sound. For the circus, we created a networked environment in which participants from remote locations meet under a virtual circus big top and interact with the performers in a game of transformations. For museums, we have authored an interactive room as well as a personalized exhibit documentary played in the private eye of a wearable computer, using media actors.

In the following section, we first introduce our media modeling approach by doing a critical analysis of existing interactive media authoring techniques. We then present our media modeling taxonomy and explain in more detail the intentional architecture. In the remainder of the paper we describe our work in interactive dance, theater, circus, and museum exhibit design.

Authoring techniques

Recently, in the field of computer graphics, progress has been made in the creation of lifelike characters or autonomous agents. Autonomous agents are software systems with a set of time-dependent goals or motivations that the agents try to satisfy in a complex dynamic environment (such as the real world).2

An agent is autonomous in the sense that it has mechanisms for sensing and interacting in its environment as well as for deciding what actions to take so as to best achieve its goals. In computer graphics, autonomous agents are lifelike characters driven by autonomous goals. They can sense the environment through real or virtual sensors and respond to the user's input or to environmental changes by modifying their behavior in accordance with their goals.3–5
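To make the sense-decide-act cycle concrete, here is a minimal sketch of such an agent in Python. Every name in it (Goal, Agent, the toy observation dictionary) is illustrative and is not drawn from the systems cited above; a real agent would obtain its observations from cameras, microphones, or a virtual world rather than from a dictionary.

```python
# Minimal sketch of an autonomous agent's sense-decide-act cycle.
# All names here (Goal, Agent, the toy "world" dict) are illustrative.

from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Goal:
    name: str
    urgency: Callable[[Dict], float]   # how pressing the goal is, given observations
    action: Callable[[Dict], str]      # what to do about it


class Agent:
    def __init__(self, goals: List[Goal]):
        self.goals = goals

    def sense(self, world: Dict) -> Dict:
        return dict(world)             # in a real system: cameras, microphones, ...

    def decide(self, obs: Dict) -> str:
        best = max(self.goals, key=lambda g: g.urgency(obs))
        return best.action(obs)

    def step(self, world: Dict) -> str:
        return self.decide(self.sense(world))


# A toy run: the agent greets when someone is present, otherwise idles.
agent = Agent([
    Goal("greet", lambda o: 1.0 if o["person_present"] else 0.0, lambda o: "wave"),
    Goal("idle", lambda o: 0.1, lambda o: "wander"),
])
print(agent.step({"person_present": True}))   # -> "wave"
```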

Although this approach, called "behavior-based," has proven to be successful for a variety of computer graphics problems, it has not yet been fully understood or exploited for multimedia presentations, digital storytelling, interactive performance, or new forms of interactive art.

It is not uncommon to read about interactive multimedia or artistic experiences that advertise themselves as behavior-based, whereas in reality they would be better described as simply responsive or reactive. These experiences are systems in which short scripts group together a small number of actions on the content displayed. These microscripts are then triggered by sensors, which directly map predetermined actions of the public to an appropriate response of the system. Nevertheless, the fact that these scripts often apply to a human character portrayed in the experience leads to erroneous descriptions of them as behavior-based. The scripts do actually show segments of human behavior; however, the term is not used correctly to describe the internal architecture of the system.

To date there is some confusion and a discrepancy in the way the computer graphics community and the multimedia and electronic art community think of behavior-based modeling. We would like to clarify some important issues in this respect and delineate a direction of work that describes what multimedia and interactive art can learn from computer graphics and how the new behavior-based modeling techniques can be extended and creatively applied to a variety of artistic domains. In this section we explain how the behavior-based approach differs from the scripted or reasoning approach and describe the origin, purpose, and advantages of behavior-based modeling.

Scripted versus behavior-based approaches, and beyond. In the field of multimedia and electronic art, the term "behavior-based" is often ingenuously used to contrast media modeling with more traditional architectures that separate the content on one end and the routines for orchestrating media for the public at the other end. This split architecture leads to complicated control programs that have to do an accounting of all the available content, where it is located on the display, and what needs to happen when or if or unless. These systems rigidly define the interaction modality with the public as a consequence of their internal architecture. Often, these programs need to carefully list all the combinatorics of all possible interactions and then introduce temporal or content-based constraints for the presentation. Having to plan an interactive art piece according to this methodology can be a daunting task, and the technology in place seems to somehow complicate and slow down the creative process rather than enhance or expand it.

There is certainly an analogy between these systems and the more traditional artificial intelligence (AI) applications, which were based on reasoning and symbol manipulation. Both share the idea of a centralized "brain" that directs the system on the basis of operations on symbols derived from sensory or direct input. When used for authoring interactive media or electronic art, we consider this approach similar to that of having an orchestra director who conducts a number of musicians following a given score. This approach leaves very little room for interactivity, and the programmer of the virtual reality experience needs to create break points in the "score" (plot)—somewhat artificially—for the user to be able to participate. Typical examples of this approach are the many CD-ROM titles that simulate the presentation of an interactive story or game by careful planning of plot bifurcations and multiple-choice menus.

In our view, interactive art and digital media presentations should not be limited to the construction of environments where nothing happens until the participant "clicks on the right spot." In most interactive experiences, visual elements, objects, or characters always appear in the same position on the screen, and their appearance is triggered by the same reactive event-based action. This environment induces the public to adopt an exploratory type of behavior that tries to exhaust the combinatorics of all possible interactions with the system. The main drawback of this type of experience is that it penalizes the transmission of the message: A story, an emotion, is somehow reduced to exploring the tricks of the gadgetry employed to convey the message. The focus, in many cases, turns out to be more on the interface than on the content itself. Navigation is, as a matter of fact, the term that best describes the public's activity, the type of interface, and the kind of experience offered by systems authored according to this approach. These systems are often called scripted.

Behavior-based is a term originating in artificial intelligence and is often synonymous with "autonomous agent research." It describes control architectures, originally intended for robots, that provide fast reactions to a dynamically changing environment. Brooks6 is one of the pioneers and advocates of this new approach. Maes2 has extended the autonomous agent approach to a larger class of problems, including software agents, interface agents, and helpers that provide assistance to a human involved in complex activities, such as selecting information from a large database or exchanging stock. Behavior-based architectures are defined in contrast to function-based architectures as well as to reasoning-based or knowledge-based systems. The latter approach, corresponding to "traditional AI," emphasizes operations on symbolic structures that replicate aspects of human reasoning or expertise. It produces "brains" that apply syntactic rules to symbols representing data from the external world, or given knowledge, and that generate plans. These systems work under a closed-world assumption to eliminate the problem of unexpected events in the world. The closed-world assumption declares that all facts relevant to the functioning of the system are stored in the system, so that any statement that is true about the actual world can be deduced from facts in the system. This assumption is useful for designing logic programming environments, but it is untenable in the real world.

Brooks has highlighted the limitations of this centralized and closed-world approach. He has successfully demonstrated the validity of the behavior-based approach by building a number of mobile robots that execute a variety of tasks by choosing an appropriate action from a hierarchical layering of behavior systems (subsumption architecture). Maes has developed an action-selection approach in which individual behaviors have an associated activation level for run-time arbitration,7 instead of choosing from a predefined selection mechanism as does Brooks. Blumberg5 has adopted an ethological approach to model a virtual dog able to interact with humans as well as with other behavior-based synthetic creatures. Together with Zeltzer3 and Johnson,4 Blumberg has provided an example of how the behavior-based approach can be effective in producing lifelike computer graphics animat creatures (animats = animal + automats) that are able to find their bearings in virtual worlds, and at the same time can perceive commands from a human through the use of real-time computer-vision sensors.

The advantage of behavior-based modeling techniques is that the designer of the experience does not have to think of all the possible sequences and branchings in defining the interaction between the public and the virtual creatures. It suffices to specify the high-order behaviors, their layering structure, and the goals of the creature to produce a realistic and compelling interaction. Blumberg also introduces an abstraction to allow the hard task of coordinating the kinematic motion of the articulated dog to be split from the specification of the high-level behavior system. He describes an improved action-selection mechanism that allows arbitration between commands given by a human and the autonomous drive of the creature determined by its goals and internal motivations.8 Perlin and Goldberg9 have applied a similar approach to animating humans in virtual environments.
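The arbitration idea can be illustrated with a short sketch in the spirit of the activation-based action selection described above. The behaviors, the weights, and the way a human command is folded in are all invented for illustration; they are not the mechanisms of the cited systems.

```python
# Sketch of activation-based action selection: each behavior's activation
# blends internal motivation, external stimulus, and (optionally) a human
# command; the highest-activation behavior is executed.

def activation(motivation, stimulus, command_bias=0.0,
               w_mot=0.5, w_stim=0.4, w_cmd=0.6):
    return w_mot * motivation + w_stim * stimulus + w_cmd * command_bias

behaviors = {
    # behavior: (internal motivation, external stimulus, human command bias)
    "follow_person": (0.3, 0.9, 0.0),
    "sit":           (0.2, 0.1, 1.0),   # the human just said "sit"
    "wander":        (0.6, 0.2, 0.0),
}

scores = {name: activation(*vals) for name, vals in behaviors.items()}
winner = max(scores, key=scores.get)
print(winner, scores[winner])   # "sit" wins because the command is weighted strongly
```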


Although behavior-based computer graphics has proven successful in producing an effective interactive experience between the public and a variety of synthetic animal creatures, it is important to understand and discuss how it can be translated and applied to different application domains. In multimedia, performance, and the electronic arts, the designer of the experience and the public are often involved in more complex forms of interactions and communication that require a revision of the current behavior-based model.

A taxonomy of authoring techniques. The behavior-based approach has proven to be successful when applied to mobile robots and to real-time animation of articulated synthetic creatures. In this context, behavior is given a narrow interpretation derived from behavioral psychology.10 For animats, behavior is a stimulus-response association, and the action-selection mechanism assigning weights to the layered behaviors can be seen as a result of operant conditioning10 on the creature. Behavior-based AI has often been criticized for being "reflex-based," because it controls navigation and task execution through short control loops between perception and action.

In our view, Skinner's reductive notion of behavior is insufficient to model many real-life human interactions as well as simulated interactions through the computer. Multimedia, entertainment, and interactive art applications all deal with an articulated transmission of a message, emotions, and encounters, rather than navigation and task execution. As we model human interaction through computer-based media, we need to be able to interpret a person's gestures, movements, and voice, not simply as commands to virtual creatures but as cues that regulate the dynamics of an encounter, or the elements of a conversation.

Researchers have used a variety of approaches to give lifelike qualities to their characters and to give them the ability to interact and respond to a user's commands. Blumberg and Galyean5 use an ethological model to build behavior-based graphical creatures capable of autonomous action, and who can arbitrate response to external control and autonomy. They introduce the term "directability" to describe this quality. Hayes-Roth11 uses the notion of directed improvisation to achieve a compromise between "directability" and lifelike qualities. Her research aims at building individual characters that can take directions from the user or the environment and act according to those directions in ways that are consistent with their unique emotions, moods, and personalities (improvisation). Magnenat-Thalmann and Thalmann12 have built a variety of examples of virtual humans equipped with virtual visual, tactile, and auditory sensors to interact with other virtual or real (suited or tethered) humans.13 Perlin and Goldberg describe an authoring system for movement and action of graphical characters.9 The system consists of a behavior engine that uses a simple scripting language to control how actors communicate and make decisions, and an animation engine that translates programmed canonical motions into natural noisy movement. Terzopoulos provided a fascinating example of behavior-based graphical fishes endowed with synthetic vision and able to learn complex motor skills.14,15 Tosa has built characters that can understand and respond to human emotion using a combination of speech and gesture recognition.16 Bates and the Oz group have modeled a world inhabited by "woggles" with internal needs and emotions and capable of complex interactions with the user.17 The users' interface, though, is mainly limited to mouse and keyboard input.

Before discussing extensions or alternatives to the behavior-based approach, we need to analyze the type of problems with which we are faced when authoring systems in our field of research. In this subsection we provide a taxonomy of interactive systems based on the human-machine and human-content interaction modality and the system architecture. This taxonomy does not pretend to be exhaustive. It provides, however, a focus in defining a set of basic requirements, features, and architectures of current interactive media applications. We suggest classifying interactive systems as: scripted, responsive, behavioral, learning, and intentional. We consider our field to encompass multimedia communications—the world of digital text, photographs, movie clips, sounds, and audio—electronic art, (interactive) performance, and entertainment in general.

● In scripted systems a central program coordinates the presentation of visual or audio material to an audience. The interaction modality is often restricted to clicking on a static interface in order to trigger new material to be shown. These systems need careful planning of the sequence of interactions with the public and acquire high complexity when drawing content from a large database. This authoring complexity often limits the experience to a shallow depth of content and a rigid interaction modality. Examples of scripted authoring techniques can be found in Sawhney et al.18 and in Agamanolis and Bove.19

● In responsive systems control is distributed over the component modules of the system. As opposed to the previous architectures, these systems are defined by a series of couplings between user input and system responses. The architecture keeps no memory of past interactions, at least explicitly, and is event-driven. Many sensor-based real-time interactive art applications are modeled according to this approach. One-to-one mappings define a geography of responses whose collection shapes the system architecture as well as the public's experience. Although somewhat easier to author, responsive experiences are sometimes repetitive: The same action of the participant always produces the same response by the system. The public still tends to adopt an exploratory strategy when interacting with responsive systems and, after having tried all the interface options provided, is often not attracted back to the piece. Sometimes simple responsive experiences are successful because they provide the participant with a clear understanding of how their input—gestures, posture, motion, voice—determines the response of the system. The prompt timing of the response is a critical factor in being able to engage the public in the experience. Examples of responsive systems are described by Davenport et al.20 and Paradiso.21

● In behavioral systems or environments the response of the system is a function of the sensory input as well as its own internal state. The internal state is essentially a set of weights on the goals and motivations of the behavioral agent. The values of these weights determine the actual behavior of the agent. Behavioral systems provide a one-to-many type of mapping between the public's input and the response of the system. The response to a particular sensor measurement or input is not always the same: It varies according to the context of the interaction that affects the internal state of the agent. Successful behavioral systems are those allowing the public to develop an understanding of the causal relationships between their input and the behavior of the agent. Ideally, the public should be able to narrate the dynamics of the encounter with a synthetic behavioral agent as they would narrate a story about a short interaction with a living entity, human, or animal. This is one of the reasons why behavioral agents are often called lifelike creatures.8,9

● Learning systems have the ability to learn new behaviors or to modify the existing ones by dynamically modifying parameters of the original behaviors. These systems provide a rich set of interaction modalities and dynamics and offer new interesting venues for interactive media architectures.14,22

● Intentional systems are modeled according to a new way of thinking about authoring interactive media, which we present briefly in this section and expand on later. We introduce an additional layer in the one-to-many mapping between sensory input and system response, called the perceptual layer. Sensor data are first interpreted by the system as a "percept" and then mapped to an action selected by the behavior system. Both the interpretation and the behavioral mechanisms are influenced by the personality of the agent. The agent generates expectations of the public's behavior and therefore "feels" frustrated or gratified by its experience with people. The intermediate layer of perceptions provides the agent with an interpretation of the interactor's intentions and can be considered as a primitive "user model" of the system.

The intentional approach allows the system to simulate more closely the dynamics of a human encounter, such as the communication of emotion; a minimal sketch contrasting the responsive and behavioral mappings is given below.
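As a concrete reading of the taxonomy above, the sketch below contrasts a responsive mapping (a stateless input-to-output table) with a behavioral one (a response that also depends on an internal state changed by the interaction itself). The events, the single "interest" variable, and the thresholds are invented for illustration; a fuller media-actor cycle, including the perceptual layer of the intentional approach, is sketched in the intentional systems subsection below.

```python
# Sketch: responsive vs. behavioral mappings (illustrative names only).

# Responsive: a fixed, stateless input -> output table.
RESPONSES = {"wave": "show_image", "step_forward": "play_sound"}

def responsive(event):
    return RESPONSES.get(event, "do_nothing")

# Behavioral: the response also depends on an internal state (here, a single
# "interest" value) that the history of the interaction keeps changing.
class BehavioralAgent:
    def __init__(self):
        self.interest = 0.5          # internal motivation, 0..1

    def react(self, event):
        if event == "wave":
            self.interest = min(1.0, self.interest + 0.2)
        elif event == "step_back":
            self.interest = max(0.0, self.interest - 0.3)
        # one-to-many: the same event can yield different actions over time
        return "approach" if self.interest > 0.7 else "observe"

agent = BehavioralAgent()
print(responsive("wave"), responsive("wave"))      # always the same response
print(agent.react("wave"), agent.react("wave"))    # may differ as interest grows
```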

These architectures are not mutually exclusive. They describe the main concept, structure, and organization of the system. However, a behavioral system can also learn or eventually scale to be simply responsive, according to the context of the interaction with the participant.

Intentional systems. In this subsection we introduce a new media modeling technique for authoring interactive experiences. We describe "media actors": images, video, sound, speech, and text objects able to respond to humans in a believable, esthetical, expressive, and entertaining manner. We call applications built with media actors intentional systems. In constructing our agents, we have shifted our focus of attention away from a Skinner-like "reflex-based" view of behavior, and we have moved toward building a model of perceptual intelligence of the agent.

Media actors are modeled as software agents whose personality affects not only their internal state (feelings) but also their perception of the public's behavior (intentions) and their expectations about future interactions with their human interactor.


Our media architecture is inspired by a theatrical metaphor. In theater, the director works with the actors with the goal of drawing the public into the story. In a compelling performance, the actors convey more than an appropriate set of actions; rather, they create a convincing interpretation of the story. The audience becomes immersed in the performance, since they are able to identify, project, and empathize with the characters on stage. According to our theatrical metaphor, we see media agents as actors, the programmer or artist as the director of the piece, and the public as a coauthor who, through interaction, gives life to the characters represented by media objects.

We believe that interpretation is the key not only to compelling theater but also to successful interactive media. Media actors are endowed with the ability to interpret sensory data generated by the public—room position, gestures, tone of voice, words, head movements—as intentions of the human interactor. These intentions—friendly, unfriendly, curious, playful, etc.—can be seen as a projection of the media actor's personality onto a map of bare sensory data. The media actor's internal state is given by a corresponding feeling—joy, fear, disappointment—which, in turn, generates the expressive behavior of the agent and its expectations about the future development of the encounter. In this personality model, feelings reflect a variety of emotional and physical states that are easily observed by the public, such as happy, tired, sad, angry, etc., whereas expectations—gratification, frustration, or surprise—stimulate the follow-on action.

Media actors are endowed with wireless sensors to allow natural and unencumbered interactions with the public. Real-time computer-vision and auditory processing allow the interpretation of simple and natural body gestures, head movements, pregiven utterances, and tone of voice. In this type of architecture the sensors are not a peripheral part of the system. On the contrary, the available sensor modalities, as well as their coordination, contribute to the model of the perceptual intelligence of the system.

In line with our theatrical metaphor, media actors are like characters in search of an author, as in Pirandello's well-known drama.23 They are media with a variety of expressive behaviors, personalities whose lifelike responses emerge as a result of interacting with the audience.

At every step of its time cycle a media actor does the following (a minimal sketch of this cycle is given after the list):

● It interprets the external data through its sensory system and generates an internal perception filtered by its own personality.

● It updates its internal state on the basis of the internal perception, the previous states, the expectation generated by the participant's intention, and its own personality profile.

● It selects an appropriate action based on a repertoire of expressive actions: show, move, scale, transform, change color, etc.
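The following is a minimal sketch of one such time step, assuming a toy personality ("shy"), a toy intention vocabulary, and an invented rule for turning intentions and expectations into feelings; none of it is the implementation used in our systems.

```python
# Sketch of one media-actor time step: sense -> perceive (filtered by
# personality) -> update internal state and expectation -> express.
# Every name and rule here is a toy stand-in, not the authors' code.

PERSONALITY = {"shy": 0.7}          # predispositions, 0..1

def perceive(sensor_data, personality):
    """Interpret bare sensor data as the interactor's intention."""
    approach_speed = sensor_data["approach_speed"]
    # A shy actor reads a fast approach as unfriendly sooner than a bold one.
    threshold = 1.0 - 0.5 * personality["shy"]
    return "unfriendly" if approach_speed > threshold else "friendly"

def update_state(state, intention, expectation):
    """Feelings shift with the perceived intention and with expectations."""
    if intention == expectation:
        state["feeling"] = "gratified"
    elif intention == "unfriendly":
        state["feeling"] = "fearful"
    else:
        state["feeling"] = "surprised"
    state["expectation"] = intention      # expect more of the same next time
    return state

def select_action(state):
    repertoire = {"gratified": "show", "fearful": "scale_down", "surprised": "change_color"}
    return repertoire[state["feeling"]]

state = {"feeling": "neutral", "expectation": "friendly"}
sensor_data = {"approach_speed": 0.9}                 # from vision/audio sensors
intention = perceive(sensor_data, PERSONALITY)
state = update_state(state, intention, state["expectation"])
print(select_action(state))                           # -> "scale_down"
```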

Our model is sensor-driven—which explains why we call it perceptual—and personality-based, rather than behavior-based. By personality we designate the general patterns of behavior and predispositions that determine how a person will think, feel, and act. We have modeled feelings rather than emotions because we consider emotions to be always in response to some event, whereas feelings can be assimilated to internal states. The internal state of a media actor can then be described with the answer to the question: "How are you feeling today?" or "How are you?"

This type of character modeling for multimedia differs from both scripted and purely behavior-based (animat) approaches. With respect to the classical animat behavior-based approach we introduce:

● A perceptual layer in which the sensorial input is translated into a percept that helps define a "user model" as it contributes to the interpretation of the participant's intention

● A notion of expectation that the media actor needs to have about the participant's next action so as to model the basic reactions to an encounter such as gratification or frustration

● A notion of goal as a desire to communicate, that is, to induce an emotion or to articulate the transmission of a message

● An internal state intended as "feeling" that generates an expressive action

The importance of having an intermediate layer of sensory representation, and predictions, has also been underlined by Crowley.24 However, Crowley's architecture is limited to the construction of reactive visual processes that accomplish visual tasks and does not attempt to orchestrate media to convey a message or to animate a lifelike virtual character.


We now describe the interactive spaces we built using media actors and those that have inspired and led us to this authoring method.

The IVE stage and the real-time computer vision input system

Our stage, called "IVE" (Interactive Virtual Environment), is a room-sized area (15 ft × 17 ft) whose only requirements are good, constant lighting and a nonmoving background. A large projection screen (7 ft × 10 ft) occupies one side of the room and is used as the stage backdrop onto which we orchestrate graphical and image projections. A downward-pointing wide-angle video camera mounted on top of the screen allows the IVE system to track a performer (Figures 1 and 2). By use of real-time computer vision techniques,25–27 we are able to interpret the performer's posture, gestures, identity, and movement. A phased-array microphone is mounted above the display screen for audio pickup and speech processing. A narrow-angle camera housed on a pan-tilt head is also available for fine visual sensing. One or more Silicon Graphics computers are used to monitor the input devices in real time.

The ability to enter the interactive stage just by stepping into the sensing area is very important. The performers do not have to spend time "suiting up," cleaning the apparatus, or untangling wires. IVE was built for the more general purpose of enabling people to participate in immersive interactive experiences and performances without wearing suits, head-mounted displays, gloves, or other gear.1,28 Remote sensing via cameras and microphones allows people to interact naturally and spontaneously with the material shown on the large projection screen. IVE currently supports one active person in the space and many observers on the side. The IVE space is a component of Pentland's Smart Rooms research work, and we often refer to it in the scientific literature as synonymous with Smart Room29 or Perceptive Space.1

Figure 1 The IVE stage, schematic view from above (Silicon Graphics workstations, camera, microphones, video screen, user, plants, bookshelves along back wall)

Figure 2 The IVE stage during rehearsal

Our real-time computer vision program is called Pfinder, i.e., "person finder."25 Pfinder is a system for body tracking and interpretation of movement of a single performer. It uses only a wide-angle camera pointing toward the stage and a standard Silicon Graphics O2 computer (Figures 3 and 4). We have recently extended the tracking technology to support many people or performers at once, using infrared tracking of the performers from above, as we showed in the "City of News" Emerging Technology Exhibit at SIGGRAPH 99.

Performance

Our work augments the expressive range of possibilities for performers and stretches the grammar of the traditional arts rather than suggesting ways and contexts to replace the embodied performer with a virtual one. Hence, we call our research "augmented performance," by analogy with the term "augmented reality," which contrasts with "virtual reality."

In dance, we have conducted research toward musical and graphical augmentation of human movement. We built DanceSpace, a stage in which music and graphics are generated "on the fly" by the dancer's movements. A small set of musical instruments is virtually attached to the dancer's body and generates a melodic soundtrack in tonal accordance with a soft background musical piece. Meanwhile, the performer projects graphics onto a large backscreen using the body as a paint brush. In this context, the role of the computer is that of an assistant choreographer: The system is able to improvise a soundtrack and a visual accompaniment while the performer is creating or rehearsing a piece using his or her body as the interface. This role is a great advantage when the choreographer wishes to create a dance performance based on the pure expression of body movements and not by following a prechosen musical score.

In theater, we have done work in gesture, posture, and speech augmentation. In Improvisational TheaterSpace, we create a situation in which the human actor can be seen interacting with his or her own thoughts in the form of animated expressive text projected on stage. The text is just like another actor—able to understand and synchronize its performance to its human partner's gestures, postures, tone of voice, and words. Expressive text, as well as images, extends the expressive grammar of theater by allowing the director to show more of the character's inner conflicts, contrasting action and thought moments, memories, worries, and desires, in a way analogous to cinema. We followed an interaction model inspired by street theater, the mime's world, and the improvisational theater in general, so as to bypass previously scripted and therefore constraining technological interventions. In this context, the role of the computer, with the use of intentional and expressive media actors, is that of a co-actor who collaborates with the performer in communicating with the public. Media actors can also represent the public's participation in the improvised performance. Many improvisational theater performances require and develop the public's directorial suggestions at specific plot points during the piece. Media actors can be used to incorporate some of these suggestions and become live improvisational characters, sharing the stage with the human actors.

Figure 3 Pfinder tracking the human body: head, torso, legs, hands, and feet labeled: gray, purple, red, green, orange, and yellow

Figure 4 Continued tracking with Pfinder: notice correct hand tracking even when the hands are in front of the body

In the following subsections, we give artistic and technical details on DanceSpace and Improvisational TheaterSpace, the virtual studio dance stage, and the Digital Circus.

DanceSpace. DanceSpace is an interactive stage that takes full advantage of the ability of Pfinder to track a dancer's motion in real time. Different parts of the dancer's body (hands, head, feet, torso) can be mapped to different musical instruments constituting a virtual body-driven keyboard. Moreover, the computer can recognize hand and body gestures, which can trigger rhythmic or melodic changes in the music. A graphical output is also generated from the computer vision estimates.30,31

It is an ideal example of a responsive experience with one-to-one mappings between sensory input—the dancer's hand, feet, head, or center-of-body movements—and system output (music and graphics).

The computer-generated music consists of a richly textured melodic base tune playing in the background for the duration of the performance. As the dancer enters the space, a number of virtual musical instruments are invisibly attached to her body. The dancer then uses her body movements to generate an improvisational theme above the background track. In the current version of DanceSpace, the dancer has a cello in her right hand, vibes on her left hand, and bells and drums attached to her feet. The dancer's head works as the volume knob, decreasing the sound as she moves closer to the ground. The distance from the dancer's hands to the ground is mapped to the pitch of the note played by the musical instruments attached to the hands. Therefore, a higher note will be played when the hands are above the performer's head and a lower note when they are near her waist. The musical instruments of both hands are played in a continuous mode (i.e., to go from a lower to a higher note, the performer has to play all the intermediate notes). The bells and the drums are, on the contrary, one-shot musical instruments triggered by feet movements. More specific gestures of both hands or combinations of hands and feet can generate melodic or rhythmic changes in the ambient melody.

ALGORITHM

Pfinder uses a multiclass statistical model of color and shape to segment a person from a background scene, and then to find and track a person's body parts in a wide range of viewing conditions. It adopts a "Maximum A Posteriori" probability approach to body detection and tracking, using simple 2-D models. It incorporates a priori knowledge about people, primarily to bootstrap itself and to recover from errors.

Pfinder builds the scene model by first observing the scene without anyone in it. When a human enters, a large change is detected in the scene, which cues Pfinder to begin constructing a model of that person, built up over time as a dynamic "multi-blob" structure. The model-building process is driven by the distribution of color on the person's body, with blobs added to account for each differently colored region. Separate blobs are generated for the hands, head, feet, shirt, and pants. The building of a blob model is guided by a 2-D contour shape analysis that recognizes silhouettes in which body parts can be reliably labeled.

The computer vision system is composed of several layers. The lowest layer uses adaptive models to segment the user from the background, enabling the system to track users without the need for chromakey backgrounds or special garments, while identifying color segments within the user's silhouette. This allows the system to track important features (hands) even when they are not discernible from the figure-background segmentation. This added information makes it possible to deduce the general 3-D structure of the user, producing better gesture tracking at the next layer, which uses the information from segmentation and blob classification to identify interesting features: bounding box, head, hands, feet, and centroid. These features can be recognized by their characteristic impact on the silhouette (high edge curvature, occlusion) and (a priori) knowledge about humans (heads are usually on top). The highest layer then uses these features, combined with knowledge of the human body, to detect significant gestures and movements. If Pfinder is given a camera model, it also back-projects the 2-D image information to produce 3-D position estimates on the assumption that a planar user is standing perpendicular to a planar floor. Several clients fetching data from Pfinder can be serviced in parallel, and clients can attach and detach without affecting the vision routines.
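The following toy sketch illustrates the maximum a posteriori classification idea at the level of a single pixel: the pixel is assigned to the background class or to one of the person's color blobs, whichever explains it best. The class statistics and priors are invented numbers, and the real system also models 2-D blob shape, adapts its statistics over time, and of course operates on whole video frames.

```python
# Toy sketch of MAP pixel classification in the spirit of Pfinder:
# each pixel goes to whichever class (background or a body blob) gives
# the highest posterior under a diagonal-covariance Gaussian color model.

import numpy as np

def log_gaussian(x, mean, var):
    """Log-likelihood of x under a diagonal Gaussian."""
    return -0.5 * np.sum((x - mean) ** 2 / var + np.log(2 * np.pi * var))

# Class statistics (mean RGB, variance, prior) -- invented numbers.
classes = {
    "background": {"mean": np.array([60., 60., 70.]),   "var": np.array([80., 80., 80.]),   "prior": 0.7},
    "shirt":      {"mean": np.array([200., 30., 30.]),  "var": np.array([200., 60., 60.]),  "prior": 0.2},
    "hand":       {"mean": np.array([210., 170., 150.]), "var": np.array([150., 150., 150.]), "prior": 0.1},
}

def classify(pixel_rgb):
    scores = {name: log_gaussian(pixel_rgb, c["mean"], c["var"]) + np.log(c["prior"])
              for name, c in classes.items()}
    return max(scores, key=scores.get)

print(classify(np.array([205., 35., 28.])))   # -> "shirt"
print(classify(np.array([62., 58., 72.])))    # -> "background"
```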


The dancer can therefore "tune" the music to her own taste throughout the performance. The music that is generated varies widely among different performers in the interactive space. Nevertheless, all the music shares the same pleasant rhythm established by the underlying, ambient tune and a style that ranges from "pentatonic" to "fusion" or "space" music.
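A condensed sketch of this style of body-to-music mapping is given below. The pentatonic scale, the MIDI-like note and velocity ranges, and the send_midi stub are our own illustrative choices, not the mappings hard-coded in DanceSpace.

```python
# Sketch of DanceSpace-style body-to-music mappings: hand height -> pitch,
# head height -> volume, foot motion -> one-shot percussion triggers.

PENTATONIC = [0, 2, 4, 7, 9]                 # scale degrees within an octave

def hand_pitch(hand_height, base_note=48, octaves=2):
    """Map normalized hand height (0 = floor, 1 = overhead) to a scale note."""
    steps = len(PENTATONIC) * octaves
    i = min(steps - 1, int(hand_height * steps))
    return base_note + 12 * (i // len(PENTATONIC)) + PENTATONIC[i % len(PENTATONIC)]

def head_volume(head_height):
    """Lower head (crouching) -> quieter; standing tall -> louder."""
    return int(127 * max(0.0, min(1.0, head_height)))

def foot_trigger(foot_speed, threshold=0.8):
    """One-shot percussion when a foot moves fast enough (a stomp or kick)."""
    return foot_speed > threshold

def send_midi(instrument, note, velocity):
    print(f"{instrument}: note {note} velocity {velocity}")  # stand-in for a MIDI output

# One tracking frame (normalized units from the vision system):
frame = {"right_hand": 0.85, "left_hand": 0.35, "head": 0.9, "right_foot_speed": 1.1}

vol = head_volume(frame["head"])
send_midi("cello", hand_pitch(frame["right_hand"]), vol)
send_midi("vibes", hand_pitch(frame["left_hand"]), vol)
if foot_trigger(frame["right_foot_speed"]):
    send_midi("drum", 36, vol)
```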

As the dancer moves, her body leaves a multicolored trail across the large wall screen that comprises one side of the performance space. The color of the trail can be selectively mapped to the position of the dancer on stage or to more "expressive" motion cues such as speed. The graphics are generated by drawing two Bezier curves to abstractly represent the dancer's body. The first curve is drawn through coordinates representing the performer's left foot, head, and right foot. The second curve is drawn through coordinates representing her left hand, center of her body, and right hand. Small three-dimensional (3-D) spheres are also drawn to map onto hands, feet, head, and center of the body of the performer. They serve as a reference for the dancer and accentuate the stylized representation of the body on the screen. The multicolored trail represents the dancer's virtual shadow, which follows her around during the performance. The variable memory of the shadow allows the dancer to adjust the number of trails left by the dancer's body. Hence, if the shadow has a long memory of trails (more than 30), the dancer can paint more complex abstract figures on the screen (Figures 5 and 6).
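A small sketch of this drawing scheme follows. The use of quadratic Bezier curves through three control points and the particular trail length are assumptions made for illustration; only the choice of control points (feet, head, hands, body center) comes from the description above.

```python
# Sketch of the DanceSpace visual: two curves through tracked body points,
# plus a bounded "shadow" trail of recent frames.

from collections import deque

def quadratic_bezier(p0, p1, p2, steps=16):
    """Sample a quadratic Bezier defined by three 2-D control points."""
    pts = []
    for i in range(steps + 1):
        t = i / steps
        x = (1 - t) ** 2 * p0[0] + 2 * (1 - t) * t * p1[0] + t ** 2 * p2[0]
        y = (1 - t) ** 2 * p0[1] + 2 * (1 - t) * t * p1[1] + t ** 2 * p2[1]
        pts.append((x, y))
    return pts

TRAIL_MEMORY = 30                        # how many past frames the shadow keeps
trail = deque(maxlen=TRAIL_MEMORY)

def render_frame(body):
    curve_a = quadratic_bezier(body["left_foot"], body["head"], body["right_foot"])
    curve_b = quadratic_bezier(body["left_hand"], body["center"], body["right_hand"])
    trail.append((curve_a, curve_b))     # older entries fall off automatically
    return trail

body = {"left_foot": (0.3, 0.0), "head": (0.5, 1.0), "right_foot": (0.7, 0.0),
        "left_hand": (0.2, 0.6), "center": (0.5, 0.5), "right_hand": (0.8, 0.7)}
print(len(render_frame(body)[0][0]))     # 17 sampled points on the first curve
```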

The choreography of the piece varies according to which of the elements in the interactive space the choreographer decides to privilege. In one case, the dancer might concentrate on generating the desired musical effect; in another case or in another moment of the performance, the dancer may concentrate on the graphics, i.e., painting with the body; finally, the dancer may focus on the dance itself and let DanceSpace generate the accompanying graphics and music autonomously. When concentrating on music more than dance, DanceSpace can be thought of as a "hyperinstrument." Hyperinstruments32 are musical instruments primarily invented for people who are not musically educated but who nevertheless wish to express themselves through music. The computer that drives the instruments holds the basic layer of musical knowledge needed to generate a musical piece.

The philosophy underlying DanceSpace is inspired by Merce Cunningham's approach to dance and choreography.33 Cunningham believed that dance and movement should be designed independently of music, which is subordinate to the dancing and may be composed later for performance, much as a musical score is in film.

In the early 1980s, Ann Marion pioneered work on generating graphics through body motion in dance at the Massachusetts Institute of Technology (MIT) by use of Polhemus sensors to track ballet steps.34

Rolf Gehlhaar35 has built a number of sound spaces where multiple users generate soundscapes through full body motion. Calvert et al.36,37 have explored a variety of musical and graphical systems for interactive dance composition. Researchers at Georgia Institute of Technology38 and its Center for the Arts, together with the Atlanta Ballet, are also involved in a dance and technology project. They use active sensing technology by placing sensors on the dancer's body to track the performer's movement on an interactive stage. DanceSpace differs from the aforementioned examples of interactive dance spaces because the computer-vision tracking system provides real-time information about different body parts of the dancers and not just an estimate of gross motion of the performer. Moreover, as opposed to other environments, it does not require the dancer to wear special clothes or active sensors. Anyone can just walk in the space and generate sound and graphics by body movements or gestures. The principal drawback of DanceSpace is that currently the computer-vision-based sensing technology reliably tracks only one performer at a time.

Figure 5 Graphical output of DanceSpace

DanceSpace was first completed in February 1996. It has been tried by a large number of people and performers during several demonstrations at the MIT Media Laboratory, including a one-day open house with people of all ages. Semi-professional dancers from Boston Conservatory have choreographed short pieces for the interactive stage under the supervision of choreographer Erica Drew (Figure 7). During these performances, the choreographer made an effort to enhance or underline the expressiveness of the human body as opposed to the "coldness" of the musical and graphical output by the computer (her words). The dancers were fascinated by the colored virtual shadow that followed them on stage and soon modified their pieces so as to better exploit the "comet" effect of the computer graphics trails. Non-performers who attended the open house seemed to be more interested in exploring the space to obtain a desired musical effect.

We also used DanceSpace to create an original choreography together with choreographer Claire Mallardi at Radcliffe College, Harvard Extension, in the spring of 1996 (Figure 8). In this case we had prerecorded the virtual shadows generated by one dancer at the Media Lab in the IVE stage, and then projected them—noninteractively—on the dancers' bodies during the performance (Figure 9). Since the dancers were wearing white unitards, their bodies were virtually painted by the projected computer graphics image. As a consequence, they changed the choreography of their piece to have more still-body poses to better exploit this body-painting effect. Another interesting effect occurred on the backdrop of the stage where a parallel performance was taking place: The dancers' black shadows were dancing in real time with the colored shadows generated by the virtual dancer (not on stage). The public reacted positively and encouraged us to further explore this type of mixed-media performance.

Figure 6 Performer using DanceSpace during rehearsal

Figure 7 Performer Jennifer DePalo in DanceSpace

Further improvements to be made to DanceSpace include having a large number of different background tunes and instruments available for the dancer to use within the same performance. We are currently expanding DanceSpace to allow for a greater variety of musical mappings and different graphical representations of the dancers.

The virtual studio dance stage. Virtual studios are new forms of TV video productions that combine real foreground images shot in the studio with 3-D computer-generated background scenes. Current virtual studio systems need the following: cameras shooting the foreground action against a blue background, tracking to provide camera position information, rendering to generate images with correct perspective in the virtual scene, and z-mixing to layer background and foreground correctly with respect to depth information.
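Z-mixing itself reduces to a per-pixel depth comparison, as the short sketch below shows. The arrays, depths, and image size are invented; a production system performs this per video frame, typically in graphics hardware.

```python
# Sketch of z-mixing: at each pixel, keep whichever layer (live foreground
# or rendered background) is closer to the camera (smaller depth = closer).

import numpy as np

def z_mix(fg_rgb, fg_depth, bg_rgb, bg_depth):
    """Select the nearer layer per pixel."""
    take_fg = (fg_depth < bg_depth)[..., None]       # broadcast over color channels
    return np.where(take_fg, fg_rgb, bg_rgb)

h, w = 4, 4
fg_rgb = np.full((h, w, 3), 200, dtype=np.uint8)     # performer's video texture
bg_rgb = np.full((h, w, 3), 40, dtype=np.uint8)      # rendered 3-D set
fg_depth = np.full((h, w), 2.0)
fg_depth[:, :2] = 5.0                                 # performer partly behind a prop
bg_depth = np.full((h, w), 3.0)

print(z_mix(fg_rgb, fg_depth, bg_rgb, bg_depth)[0])   # left pixels show the set, right the performer
```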

Research in virtual studios is closely related to the work of building immersive and interactive virtual environments in which people can interact with virtual objects, with animated 3-D creatures, or among themselves. Ultimately these two venues of research converge toward the creation of low-cost, real-time, virtual studios in which the 3-D set is an interactive virtual environment and the compositing of multiple participants in the same set opens up new possibilities for TV and stage production, game playing, collaborative storytelling, or immersive telepresence.

By using the real-time computer vision techniques previously described (Pfinder), we have built a low-cost virtual studio that does not need a blue screen to operate. Hence, such a setup can be used on stage and allows one to perform special effects that are today only possible in the TV studio. In addition, we use computer-vision-based real-time gesture recognition to enable synchronized interaction among the performers and the objects and creatures in the virtual setting.

The type of dance performance we designed is one in which the entire stage becomes a large-scale IVE space with a very large projection screen for compositing. As the dancers move on stage, their image is composited in real time in a virtual 3-D set on the projection screen behind them, and the music is generated simultaneously. We modeled an enchanted forest to be the virtual set of our dance performance (Figures 10 and 11). Although our first application of virtual sets is one with circus performers39 (shown later in Figures 15 and 16), we conducted an experimental study of a dance performance that virtually takes place in an enchanted forest (Figures 12 to 14). In this type of performance, the public sees the performers dancing inside the virtual 3-D set made by the enchanted forest and also, in conjunction with the dancers' movements, sees virtual butterflies and rabbits following and reacting to the performers, scale transformations, and animated trees dancing along, all of which contribute to create an Alice-in-Wonderland or Disney's "Fantasia" effect on stage.

Figure 8 Performer Diana Aubourg in DanceSpace


Figure 9 Performance at Radcliffe College with choreographer Claire Mallardi and dancers Malysa Monroe and Naomi Housman


Although real-time interpretation of the performers' actions is essential for interactivity, authoring a timely, appropriate, and coordinated response by the variety of virtual objects on the set can be a complex task. In order to facilitate authoring and to ensure a timely interaction, we have endowed the virtual objects and creatures that inhabit the 3-D studio set with autonomous responses and behaviors. By using a mixture of purely responsive and behavior-based AI techniques, we are able to distribute the authoring complexity from a central program to the various objects or characters in the set. Objects or creatures in the set can respond appropriately according to the context of interaction without our having to script in advance all the possible combinatorics of time and action sequences on the virtual set.

Our virtual studio dance stage is based on specific features of Pfinder that we now describe. The process of detection and tracking of the human participant is guided by a two-dimensional (2-D) contour shape analysis that recognizes a body silhouette inside which single body parts can be reliably labeled. This silhouette is used to cut out the foreground video image of the person composited in the 3-D set. This process, which achieves the compositing without blue screen, is done in two steps. First, a support graphical silhouette is constructed inside the 3-D environment, and successively the corresponding video image is texture-mapped onto it. Z-mixing is provided by giving Pfinder a camera model of the "home studio." The system back-projects the 2-D image transformation to produce 3-D position estimates using the assumption that a planar user is standing perpendicular to a planar floor. Back-projection ensures correct perspective as well as correct depth layering. Given that our virtual studio uses a fixed camera, we do not address the camera tracking issue.
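A sketch of the back-projection step under this planar-floor assumption follows: the viewing ray through the image point where the feet touch the ground is intersected with the floor plane. The camera intrinsics and height below are invented values, not calibration data from our stage.

```python
# Sketch of back-projection under the planar-floor assumption: the image
# point of the performer's feet is cast as a ray and intersected with the
# floor plane. Camera convention: x right, y down, z forward.

import numpy as np

# Camera intrinsics (invented): focal lengths and principal point, in pixels.
K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])
K_inv = np.linalg.inv(K)

CAMERA_HEIGHT = 1.7     # metres from the camera down to the floor plane

def feet_to_floor_point(u, v):
    """Back-project the image point of the feet onto the floor plane.

    The floor is the plane y = CAMERA_HEIGHT in the camera frame,
    i.e. CAMERA_HEIGHT below the lens.
    """
    ray = K_inv @ np.array([u, v, 1.0])        # viewing ray through the pixel
    s = CAMERA_HEIGHT / ray[1]                 # scale at which the ray hits the floor
    return s * ray                             # 3-D point (x, y, z) in the camera frame

print(feet_to_floor_point(400., 500.))         # roughly 5 m in front, 0.5 m to the right
```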

Networked Digital Circus. Although many interactive experiences and video games tend to focus on exploring new worlds and killing monsters, we have developed a virtual studio application based on the theme of transformations. Participants are engaged in a game of transformations that involves the appearance or disappearance of objects, scaling the image of other participants very large or small, triggering events such as firing a cannon woman from a virtual cannon, or interacting with butterfly creatures that inhabit the set. All of these actions are interpreted and guided by the real-time gesture recognition and compositing features of our computer vision system as well as the responsive and behavioral authoring of the virtual set. We believe it is important to develop, together with the new technology, new genres of collaborative experiences that offer room for exchange, communication, and maybe transformation. Digital Circus moves some steps toward a more educational and artistic genre of networked game playing.39

We see the circus as a communication process with its own language. In the circus, information is transmitted mainly in a nonverbal, iconic, and symbolic way. A circus performer can give his or her act in any country, without the need to know much about the language or the culture of that particular country. In this sense he or she is a transcultural performer. Moreover, although the circus is usually connected to childhood, the circus performer usually

Figure 10 The enchanted forest world view

Figure 11 Surroundings of the cabin in the enchanted forest


addresses (and enchants) everyone, without any distinction of age.

The Digital Circus is composed of a basic architectural setting: a big top, a circus cannon, a chair, and a gramophone, together with a series of magic objects that appear or disappear according to the needs of the participant. This digital environment is shown on a large screen that comprises one side of our home virtual studio. The audience experiences the circus by seeing its video image projected onto the virtual environment. All the participants' images are shown in the virtual environment.

We start our game with our basic transformational circus grammar by placing a chair in the real space. This chair projects symmetrically onto a multicolored virtual chair in the Digital Circus. When a person from the audience sits on the real chair, he or

Figure 12 Study for a composited dance performance in real time for the virtual studio dance stage

Figure 13 Study for a composited solo performance in the enchanted forest

Figure 14 Performers dancing around the magic rock in the virtual enchanted forest

IBM SYSTEMS JOURNAL, VOL 39, NOS 3&4, 2000 SPARACINO, DAVENPORT, AND PENTLAND 493

Page 16: Media in performance: Interactive spaces for dance, theater ...alumni.media.mit.edu/~flavia/Papers/ibm_sparacino.pdfFor museums, we have authored an in-teractive room as well as a

she will see his or her video avatar sitting on the virtual chair. The system then recognizes the sitting gesture and triggers "magic events." A gramophone suddenly appears and starts to play music. If the person stands up again, the gramophone disappears, and the music stops.
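A minimal sketch of this kind of one-to-one responsive coupling is shown below, assuming a stream of recognized gestures from the vision system; the Gesture and Gramophone names are illustrative, not our actual circus code.

    // Sketch: a recognized gesture directly triggers the response of the
    // object it is coupled to, with no central script.
    enum class Gesture { SitDown, StandUp, RaiseHand, Fly };

    class Gramophone {
    public:
        void appear()    { visible = true;  /* add node to the set, start music */ }
        void disappear() { visible = false; /* remove node, stop music */ }
    private:
        bool visible = false;
    };

    void onGesture(Gesture g, Gramophone& gramophone) {
        switch (g) {
            case Gesture::SitDown: gramophone.appear();    break;
            case Gesture::StandUp: gramophone.disappear(); break;
            default: break;   // other gestures are handled by other objects
        }
    }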

This intrinsic transformational nature of the circus makes the circus story similar to an "open-eyes dream." In particular, the inversion of cause-effect relations, the "functional sliding" of objects, as well as the iconic-symbolic representation of most elements, recalls the "logic of dreams." To convey this point, we have programmed a transformation rule into the system that pops up an umbrella when a member of the audience raises a hand, as though he or she were actually holding an umbrella. Our system also recognizes a flying gesture that virtually lifts the audience up into the clouds, operating a digital plane. This gesture operates literally as a 3-D scene cut that radically transforms the environment and drags the audience into another dream.

THE LANGUAGE OF THE CIRCUS

The language of the circus is essentially transformational. Objects have a meaning according to their role in the circus act. An elephant might use a phone, play a musical instrument, or eat a meal at the table, like a human. A clown produces sequences of inconsistent behavior. Similarly, the basic rules of balance are challenged or denied. When a circus object appears, it brings a whole set of possible meanings. This set depends on the "encyclopedia" of the audience, whereas its actual meaning is chosen depending on its role in the circus act. An example of this "functional sliding" of the meaning of circus objects is given by the chair. It is a defense tool for a lion tamer; it can be a "character" or a musical instrument for a clown; it is a co-performer for a tightrope walker who places it on a rope and then sits on it. When a chair is presented just as a chair, it is usually for horses and elephants to sit on, and it functions as a humanizing factor for the animals.

The circus is a transgressive territory: transgression of the laws of ordinary physics by the acrobats (or transgression of the laws of physics for ordinary people), transgression of politeness by the clowns, transgression of perception by the magicians, transgression of roles by the animals. Normally, lions are not mild, donkeys do not do arithmetic, ordinary people do not walk on tightropes, elephants do not eat at the table, objects or people do not disappear or appear or transform in an instant, people do not throw cakes at each other's faces if they disagree. In this respect, the cartoon is a genre close to the circus. There is a specific cartoon physics, a cartoon sense of humor, and a cartoon way of describing a character. Cartoon characters are close to humans but not quite human. They have magic powers, follow laws of physics that violate regular physics, and usually interact in a way that would be judged violent or impolite in our regular way of interacting with other people. This process of transformation, inversion, parody, and caricature of the mainstream culture and its objectives and values produces an ambivalent attitude toward the circus. A strong interest is often blended with a scornful smile. For this reason, the circus is usually most appreciated by those individuals who have not been completely integrated within a culture, but instead are marginal, such as children, artists, or poets. These are among the reasons that make the circus an interesting playground for our research: its structural transformational nature and its elected audience. In this respect we believe it can appeal to an audience at least as large as that of the current consumers of video games, if not larger. Its appeal would also be based on the fact that, along with Bouissac,40 we see the circus not as a collection of acts but as a set of transformation rules that can generate a potentially infinite series of acts.

Once we have established that the circus communicates through its own gestural language, we still need to ask what the narrative structure of a circus act is and what its relationship is with the audience in the process of co-construction of meaning. Most circus acts evolve through successive steps that bear close resemblance to the development of the folktale.40 They are, with some variance: identification of the hero (performer), qualification trial, principal trial, glorification trial (usually preceded by drum playing), and final recognition from the public. We have so far established a grammar of interaction among participants and between participants and the virtual set. Our current work is aimed at endowing the Digital Circus with a more robust story architecture.


Two people in the circus can "play the clown" with each other (Figure 15). The first one who touches a "magic box" and then lifts or lowers his or her arms causes the other participant to grow very tall or very short. When short, the participant needs to hide so as not to be kicked by the other clown, otherwise he or she is transformed into a ball. A virtual butterfly can "save" the small clown and bring him or her back to original size. If one participant is holding the umbrella and the other touches the magic box, the umbrella lifts the person into the air. Touching the flying butterfly will bring the participant back to the ground. Each participant has his or her own butterfly to help when needed. In order to gain more magic powers or to confuse the other player, a participant can create one or more clones of himself or herself on the set. A clone is an exact replica of the composited image of a participant and moves along with the "main" image of the person, but at a different position in the 3-D set.

As of now, the main character of our circus is a cannon woman (Figure 16). She is fired out of the circus cannon simply by pushing a button situated on the cannon. She allows us to reach a wider audience than the one physically gathered around the home virtual studio. She is fired from the cannon in the Digital Circus and lands on the World Wide Web (WWW) on an applet that can be viewed by other networked participants. Through characters like the cannon woman, who are distributed across the different virtual networked home studios, we hope to gather an active audience of collaborating and engaged participants.

In order to create a compelling experience, it is important not only to design a visually rich 3-D set, but also to give participants the ability to modify and interact with the virtual objects or synthetic creatures inhabiting the set. Authoring an interactive game for multiple participants can be a very complex task, unless an appropriate methodology of modeling interaction is used. Scripted systems are those in which a central program rigidly associates people's actions or inputs with responses of the system by keeping a careful accounting of timing constraints and sequencing. Such systems are in general to be avoided because authoring complexity grows very fast if a variety of interactive features are implemented. In addition, they do not allow for spontaneous and unexpected actions or interactions to happen and are therefore inadequate for many real-life situations such as "live shows" or interactive games.

We have bypassed the complexity of scripted systems by building a responsive application that is defined by a series of couplings between the participant's input and the response of the system. Responsive applications are easier to author because they do not have a central program that takes into account all the sequenced inputs, actions, and outputs. They define instead a geography of one-to-one mappings for which the participant's actions trigger specific system responses. Although easier to author, these systems do not take into account complex story

Figure 15 Two remotely connected participants in the circus with their respective butterflies

Figure 16 The circus virtual studio with participant pointing toward the cannon woman


structures and elaborate timing or sequencing information.

All the objects present in the circus are embedded with their own set of responses—the cannon, the umbrella, the chair, the gramophone, the cannon woman, and even the set itself—and participate autonomously in the game of transformations, appearances, and disappearances regulated by the participants' actions. We have instead authored the virtual butterfly character using a behavioral approach. In particular, the butterfly senses the size of each person's image and location on the set.
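The following sketch suggests one plausible shape for such a behavior-based creature; the class, threshold, and helper names are illustrative assumptions rather than our implementation.

    // Sketch: the butterfly senses each participant's composited image and
    // selects its own response, rather than being driven by a central script.
    struct Participant { double x, y; double imageHeight; };   // from the compositing system

    class Butterfly {
    public:
        void update(const Participant& p) {
            if (p.imageHeight < rescueThreshold) {
                flyToward(p.x, p.y);    // behavior: approach and "save" the small clown
            } else {
                hoverNear(p.x, p.y);    // default behavior: follow at a loose distance
            }
        }
    private:
        double rescueThreshold = 60.0;  // pixels; illustrative value
        void flyToward(double x, double y) { /* steer toward (x, y) */ }
        void hoverNear(double x, double y) { /* keep a loose orbit around (x, y) */ }
    };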

Improvisational TheaterSpace. In April 1996 we created the Improvisational TheaterSpace.30 Improvisational TheaterSpace provides us with an ideal playground for an IVE-controlled stage in which embodied human actors and media actors generate an emergent story through interaction among themselves and the public. An emergent story is one that is not strictly tied to a script. It is the analog of a "jam session" in music. Like musicians who play, each with their unique musical personality, competency, and experience, and create a musical experience for which there is no score, a group of media actors and human actors perform a dynamically evolving story. Media actors are used to augment the play by expressing the actor's inner thoughts, memory, or personal imagery, or by playing other segments of the script. Human actors use full body gestures, tone of voice, and simple phrases to interact with media actors. Among the wide variety of theater styles and plays, we have chosen to stage improvisational theater. This genre is entertaining and engaging and allows the audience to drive part of the play. An experimental performance using the IVE setup was given in late February 1997 on the occasion of the Sixth Biennial Symposium on Arts and Technology41 (Figure 17). We also did a public rehearsal at the MIT Media Lab on the occasion of a Digital Life Open House on March 11, 1997 (Figure 18). Both featured improvisational actress Kristin Hall.

We conceived a theatrical situation in which a human actor could be seen interacting with his or her own thoughts appearing in the form of animated expressive text projected onto a large screen on stage. We modeled the text to be just like another actor, able to understand and synchronize its performance to its human partner's movements, words, tone of voice, and gesture. In a series of very short plays, we showed an actress in the process of interrogating herself in order to make an important decision. A media actor in the form of projected expressive text plays her "alter ego" and leads her to a final decision. The text actor has sensing abilities: It can follow the user around on stage, it can sense a set of basic gestures, and it understands simple words and sentences. Its expressive abilities include showing basic moods through typographic behaviors, such as being happy, sad, angry, or excited.

The stage has a large projection screen along one side and a small color camera aimed at the entire acting space. Using the video image from the camera, the screen is transformed into a mirror that also contains a superimposed expressive text actor. A camera and a microphone are used by the typographic actor as sensors for the real world and to interpret the performer's actions. The audience gathers around the space. The video image of the actor and typographic actor can eventually be broadcast to a larger audience. Any member of the audience can step onto the stage and take the place of the human performer to interact with the typographic actor at any time during the performance. Other scenarios can be envisioned in which performers and public play improvisational theater games with the digital actors.42

The idea of conveying emotions through typographic attributes of the text has been largely explored in graphic design.43–45 The idea of introducing text as a character on stage was inspired by readings of Bakhtin46 and Vygotsky.47 Both authors have underlined the role of a dialogical consciousness. For Bakhtin our utterances are "inhabited" by the voices of others: We construct our utterance in the anticipation of the other's active responsive understanding. Vygotsky viewed egocentric and inner speech as being dialogic. We have staged theater work in which the text takes the active role of representing dialogically the character's inner thoughts while he or she is constructing an action or making a decision. Also, the text can represent the different layers of the anticipated response of the audience and, in special cases, the audience itself. By taking this approach, we investigate further along the direction traced by Brecht who, at crucial points of the performance, asked the audience to comment about a character's decisions and motivations. The text actor can also be thought of as a modern equivalent of the chorus in ancient Greek tragedy.48 Just as in the ancient Greek theater, the chorus, or text actor, engages in a dialog with the "hero," represents the public's collective soul or thoughts, and comments,


interprets, and highlights the actor's choices in the plot.49

Early examples of introducing text as part of a theatrical performance can be found in the Futurist theater. In 1914, Giacomo Balla wrote a Futurist performance piece called "Printing Press" (Macchina Tipografica).50 In this theater piece, each of the 12 performers became part of a printing press machine by repeating a particular sound. Their movement would also reproduce the movement of the machine. The performance was to take place in front of a drop and wings that spelled out the word "TIPOGRAFIA" (typography) in large black letters.

The software architecture that models the typographic actor has two main layers. One layer groups the basic skills of the actor, that is, the simple unitary actions it can perform. The layer above groups high-level behaviors that make coordinated use of the low-level skills to accomplish a goal. This type of architecture, which separates low- and high-level skills for animating synthetic creatures, was first introduced by Zeltzer3 and Blumberg.5
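A compact sketch of this two-layer separation is given below; the skill and behavior names follow the sidebar "The typographic actor," but the class layout and parameters are illustrative assumptions rather than our code.

    // Sketch: low-level skills are unitary actions; high-level behaviors
    // coordinate several skills toward a goal.
    #include <string>

    class TypographicActor {
    public:
        // Low-level skills (unitary actions).
        void setString(const std::string& s)                  { text = s; }
        void setColor(float r, float g, float b)              { /* change text color */ }
        void fadeToColor(float r, float g, float b, double s) { /* timed color fade */ }
        void gotoPosition(double x, double y, double s)       { /* timed move */ }

        // High-level behaviors: coordinated use of the skills above.
        void followActor(double headX, double headY) {
            gotoPosition(headX, headY - 40.0, 0.3);   // hover near the performer's head
        }
        void attractAttention() {
            fadeToColor(1.0f, 0.0f, 0.0f, 0.5f);      // flash toward red
            gotoPosition(400.0, 300.0, 0.5);          // move to screen center
        }
    private:
        std::string text;
    };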

The typographic actor also has a level of energy that can be high, low, or medium. Text draws energy from the user's speed of movement and loudness of speech. It spends energy while performing basic skills. Emotional states affect the manner of the

Figure 17 Actress Kristin Hall during a performance at the Sixth Biennial Symposium on Arts and Technology

Figure 18 Improvisational performer Kristin Hall in the IVE stage, at the MIT Media Lab, during the 1997 Digital Life Open House


presentation by picking an appropriate time function for execution (which function), and the energy level of the typographic actor determines the speed of execution of the basic skills (how fast).

Examples of use of the text actor are: a word or short sentence rapidly crossing the projection screen at the height of the performer's head to show a thought crossing the performer's mind, a word jumping from one hand to the other of the performer to show a thought wanting to penetrate the performer's consciousness, text following the performer around on stage to show an obsessive thought, rapidly flashing words above the performer's head to show a variety of contrasting simultaneous thoughts, and animated text playing the alter ego of the performer, eventually driven by the public.

Ongoing progress is directed toward more accurate IVE-based gesture recognition, exploring the challenges and advantages of multimodal interaction, and rehearsing a variety of multibranching improvisational plays according to the suggestions of the audience. Inspired by Forsythe's choreographies,51 we are also exploring modalities of use of the text actor in dance.

Through this project we learned that media actors are a promising approach to innovative theatrical performances for three main reasons:

1. Media actors (versus script-based theater) are a flexible tool, both for improvisational (or street) theater in general and for classical scripted theater that the director and the actors need to interpret and, therefore, modify.

2. The system is tolerant of human error and actually encourages actors to enrich or change the performance according to the reaction of the audience.

3. The system can scale from a performance space to an entertainment space. Behavior-based theater can allow public participation either during or after the performance without requiring the participants to learn the entire script in advance.

This approach allows the use of flexible media choreography and contrasts with scripted or rule-based approaches. The main drawback of scripted media is that the director and the actor have to rigidly follow a script for the system to be able to work. For instance, it is not uncommon in theater for both the actors and the director to change the script either during rehearsals or even right before or during the final performance.52 In our view, a rule-based, scripted system lacks the responsiveness that creative

THE TYPOGRAPHIC ACTOR

Following are high-level behaviors for the typographic actor: (1) say typical phrase, (2) attract attention, (3) show off, (4) entertain, (5) daydream, (6) tell a story or explain, (7) suggest what to do next, (8) follow actor (or any of his or her parts: head, hands, feet, center of body).

Behaviors coordinate the execution of basic skills such as: (1) set string, (2) read text from HTML (Web page), (3) read text from file, (4) set color, (5) set font.

Other basic skills occur through time and can happen in different time scales or forms according to how the action is executed, based on what emotion the typographic actor wishes to convey. These are: (6) fade to color, (7) glow, (8) scale, (9) jump, (10) goto, (11) rock.

Also, according to which behavior is acting, the text can read in different ways: (12) read word by word, (13) fade word by word, (14) fade letter by letter.

The actions executed by the different skills can be combined as long as they do not use the same graphical resources; i.e., for the color degree of freedom, "glow" cannot happen at the same time as "fade to color."

Time functions that will affect the way the basic skills are executed are: (1) linear, (2) quadratic, (3) quadratic decay, (4) inverse quadratic, (5) inverse biquadratic, (6) biquadratic, (7) sinusoidal, (8) quasi-sigmoid, (9) oscillating up, (10) exponential, (11) logarithmic, (12) hyperbola modulated by a sinusoidal.

The emotional state of the text will affect how time-dependent skills will be executed according to these time functions. Examples of emotional states of the text are: (1) happy, (2) sad, (3) angry, (4) scared, (5) disgusted, (6) interested, (7) loving, (8) surprised, (9) no particular emotional state.

For example, if the text is "surprised," the active time function will be "oscillating up." In the case of "happy," the biquadratic, exponential, and sinusoidal time functions will be chosen during execution of time-dependent skills.
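As an illustration, the fragment below maps an emotional state to a time function and an energy level to an execution speed; only the "surprised" and "happy" pairings come from the example above, and the function shapes and numeric values are assumptions made for the sketch.

    // Sketch: emotion selects "which function"; energy selects "how fast."
    #include <cmath>

    enum class Emotion { Happy, Sad, Angry, Surprised, Neutral };
    enum class Energy  { Low, Medium, High };

    // Normalized time functions: map t in [0,1] to execution progress in [0,1].
    double linearFn(double t)        { return t; }
    double biquadraticFn(double t)   { return t * t * t * t; }
    double oscillatingUpFn(double t) { return t + 0.1 * std::sin(10.0 * t) * (1.0 - t); }

    using TimeFn = double (*)(double);

    TimeFn pickTimeFunction(Emotion e) {
        switch (e) {
            case Emotion::Surprised: return oscillatingUpFn;   // as in the example above
            case Emotion::Happy:     return biquadraticFn;     // one of the "happy" choices
            default:                 return linearFn;
        }
    }

    double executionSpeed(Energy level) {
        switch (level) {
            case Energy::High:   return 2.0;
            case Energy::Medium: return 1.0;
            case Energy::Low:    return 0.5;
        }
        return 1.0;
    }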


artists demand. A scripted system cannot easily compensate for human errors or be responsive when some nonplanned "magic" between the actors happens on stage. It tends to force human interpreters to rigidly follow a predefined track and therefore impoverishes the quality of the performance.

Interactive museum exhibit design

Applications of technology to museums have so far mainly focused on making extensive and attractive Web sites with catalogs of exhibits. Occasionally these Web sites also present introductory or complementary information with respect to what is shown inside the physical space of the museum. However, unless the public is interested in retrieving specific information about an artist or artwork, they will end up spending time scrolling across photographs and text in static pages and likely will not be involved in an engaging or entertaining experience. Presenting large bodies of information in the form of an electronic catalog usually does not stimulate learning or curiosity.

Museums have recently developed a strong interest in technology since they are more than ever before in the orbit of leisure industries. They are faced with the challenge of designing appealing exhibitions, handling large volumes of visitors, and conserving precious artwork. They look at technology as a possible partner to help achieve a balance between leisure and learning, as well as to help them be more effective in conveying story and meaning.

One of the main challenges that museum exhibit designers are faced with is to give life to the objects on display by telling their story within the context determined by the other objects in the exhibit. Traditional storytelling aids for museums have been panels and labels with text placed along the visitors' path. Yet the majority of visitors express uneasiness with written information.53 Usually time spent reading labels interrupts the pace of the experience and requires a shift of attention from observing and contemplating to reading and understanding.53

Another challenge for museums is that of selecting the right subset of representative objects among the many belonging to the collections available. Usually, a large portion of interesting and relevant material never sees the light because of the physical limitations of the available display surfaces.

Some science museums have been successfully entertaining their public, mainly facilitated by the nature of the objects they show. They engage the visitor by transforming him or her from a passive viewer into a participant by use of interactive devices. They achieve their intent by installing button-activated demonstrations and touch-sensitive display panels, among other things, to provide supplementary information when requested. They make use of proximity sensors to increase light levels on an object or to activate a process when a visitor is close by.

Other museums—especially those that have large collections of artwork such as paintings, sculptures, and manufactured objects—use audiovisual material to give viewers some background and a coherent narrative of the works they are about to see or that they have just seen. In some cases, they provide audio tours through headphones along the exhibit. In others, they dedicate sections of the exhibit to the projection of short audiovisual documentaries about the displayed material. Often, these movies showing artwork together with a description of their creation and other historical material about the author and his or her times are even more compelling than the exhibit itself. The reason is that the documentary has a narration, and the visuals are well orchestrated and come with music and dialogs. The viewer is then offered a more unified and coherent narration than what would be available in the fragmented experience of the visit. A visit to a museum demands, as a matter of fact, a certain amount of effort, knowledge, concentration, and guidance for the public to leave with a consistent and connected view of the material presented.

On the basis of the above observations, we have identified two areas of technological intervention that contribute to engage the public and enrich its experience during a museum visit. They are: Information Overlay in Smart Rooms29 (adding technology to the museum space) and Spatialized Interactive Narrative with Smart Clothes54 (adding technology to the visitor). In this section we describe our museum work in the smart room/IVE space environment. The subsequent section will outline our museum work involving wearable computers (smart clothes).

Louis Kahn interactive: Unbuilt Ruins. By using a variety of wireless sensors (primarily cameras) and by placing audiovisual devices (projectors and speakers) in the museum area, we can use technology to virtually enlarge and augment the exhibit surface. It is then possible to select and show more objects from the ones available in the collection—in the form of images that can be virtually layered on one another


and selected by meaningful interaction with the public.

In January 1999, we created a museum experience to satisfy both the needs of the curator, who needs to be able to feature a great quantity of material in a limited physical space, and those of the viewer, who benefits from a coherent narration of the spatially fragmented objects on display. Prompted by architect Kent Larson and designer Ron MacNeil, we designed an exhibit space, called Unbuilt Ruins, to show a variety of architectural designs by the influential 20th-century American architect, Louis Kahn. The exhibit interactively features radiosity-based, hyperrealistic computer graphics renderings of eight unbuilt masterworks by Louis Kahn. It is housed in a large room, approximately 50 ft × 50 ft, and in the center contains a square table above which are mounted a camera and a projector pointing downwards. The table is surrounded by four large back-projection screens, parallel to the four sides and situated a few feet away to provide an ideal viewing surface. The table surface contains the projection of the image of the schematic floor plan of one of the eight projects and 3-D printed physical models of each of the projects, on 3 in × 3 in bases, located two to a side, at the perimeter of the table (Figure 19). Each model contains a digital tag that identifies it.

The schematic plan projected on the table contains graphics symbols indicating camera positions for the selected projects (hot spots). A standard color camera aimed at the table, linked to a computer running a real-time computer vision system, tracks a color-tagged cylindrical object (active cursor). A view is selected when the viewer places the active cursor

Figure 19 Schematic diagram of the Unbuilt Ruins exhibit

(Diagram elements: 54 in × 54 in table with computer vision interface; four pairs of projectors; rear-projection screens; two 20-inch monitors showing archival material; text panel; 15-minute video; entrance.)


over a camera position symbol. When a view is selected, the side screens show a rendering of what a visitor would see if he or she were standing inside the physical construction at the location determined by the position of the symbol on the schematic plan and looking in the direction indicated by the position symbol (Figures 20, 21, and 22). If no view is selected within one minute, the system automatically scrolls through all views of the selected project until a new view or architectural plan is chosen.
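The selection logic can be sketched as follows, with illustrative names and thresholds (only the one-minute idle fallback is taken from the behavior described above):

    // Sketch: select a view when the tracked active cursor sits over a hot
    // spot; otherwise, after one idle minute, cycle through the views.
    #include <cmath>
    #include <vector>

    struct HotSpot { double x, y; int viewId; };   // viewId indexes the project's views
    struct Cursor  { double x, y; bool visible; }; // tracked color-tagged object

    int selectView(const Cursor& cursor, const std::vector<HotSpot>& hotSpots,
                   double& idleSeconds, double dt, int currentView)
    {
        const double radius = 15.0;                // selection radius, table pixels (assumed)
        if (cursor.visible) {
            for (const HotSpot& h : hotSpots) {
                if (std::hypot(cursor.x - h.x, cursor.y - h.y) < radius) {
                    idleSeconds = 0.0;             // cursor placed over a camera symbol
                    return h.viewId;               // show this view on the side screens
                }
            }
        }
        idleSeconds += dt;
        if (!hotSpots.empty() && idleSeconds > 60.0) {
            idleSeconds = 0.0;                     // one minute without a selection:
            return (currentView + 1) % static_cast<int>(hotSpots.size());  // auto-scroll
        }
        return currentView;
    }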

With the eight unbuilt projects presented in the exhibit, Kahn developed and tested ideas quite different from the modern architecture of his time: a configuration of space as discrete volumes, complex ambient light and deep shadow, a celebration of mass, the use of materials with both modernist and archaic qualities, monumental openings uncompromised by frames, and a concept of "ruins wrapped around buildings." By looking in depth at projects left unbuilt, the exhibit attempted to shed light on how Kahn came to develop the architecture he did.

Since these eight projects were incomplete and schematic, the design of each had to be resolved by extrapolating from archival material at the Kahn collection of the Architectural Archives at the University of Pennsylvania. A series of three-dimensional digital stage sets was created, and photographs from Kahn's Salk Institute, Exeter Library, and Yale Center for British Art were used to develop high-resolution texture map montages for each unique surface plane. Essential to the accurate interpretation of these projects was the physically accurate treatment of light. Images were rendered with the Lightscape** visualization system, a radiosity-based renderer, to capture the complex inter-reflections of ambient light in space.

The exhibit was open for one month at the Compton Gallery at MIT in February 1999, and later in the year for another month at the University of Pennsylvania's gallery exhibit space. It received enthusiastic feedback from the visiting public, architects, curators, and classrooms.

Unbuilt Ruins was also used as a model for a Museum of Modern Art (MoMA) exhibit called "The Un-private House" curated by Terence Riley and shown at MoMA in the summer of 1999. We built an initial prototype in which we adapted the original setup we had at the MIT Compton Gallery to display architectural plans of houses projected on a table. A custom tag system, developed at MIT, was used to select which architectural plan would be explored by the visitor at one time. The tags were embedded inside postcards of the displayed houses, and when placed on the table, they would be detected, and the corresponding floor plan would be projected. The computer vision system tracked three objects in the form of small dolls. When visitors placed an object over a hot spot on the plan, they triggered the system to display photographs showing what would be seen from that point in the house, as well as text that would function as a guide for the exhibit. The three dolls represented the points of view of the curator, the architect of the displayed house, and the owner. Each of these dolls placed on a hot spot told the story of the exhibit from a slightly different viewpoint. The communication between the tag reader, the computer vision software, and the graphical display software was networked so that multiple communicating tables with displays would be able to exchange information, or be aware of the behavior or exploration of the other visitors to this exhibit (Figures 23 and 24).

Figure 20 Projected views from one hot spot from the architectural plan of the Meeting House of the Salk Institute, La Jolla, California, 1959–1965

(The four panels show the front, rear, left, and right views.)


Figure 21 Images of the Unbuilt Ruins exhibit taken at the Compton Gallery at MIT

Figure 22 Visitors placing the active cursor on a hot spot on the map and discussing the displayed views

Figure 23 Views of one of the chosen houses for the prototype of “The Un-private House” interactive exhibit at MoMA


Wearables for museums and performance

A short time and technological leap separates today's thin laptops from Walkman**-sized computers that can be easily carried around by individuals at all times. Such small computers, accompanied by a high-resolution private eye for display and an input device, are already reshaping our technological landscape as they allow us to wear technology just as an element of our everyday clothing. Yet the consequences of offering an uninterrupted connection to the Internet, a TV or computer monitor embedded in eyeglasses, as well as a variety of input, sensing, and recording devices constantly active and attached to our body, are likely to have a profound impact on the way we perceive and relate to the world. Together with, and in addition to, our five senses we will soon have access to an additional dimension that will make us see, remember, access information, monitor our health and time schedule, and communicate in ways in which we had not thought possible before. Starner,55 Pentland,54 and Mann56 describe early work in wearable computing and anticipate some future trends.

In this section we describe how "wearables" can be used in enriching the museum visit and enhancing the expressive capabilities of theater performers. We start by illustrating Wearable City, the precursor of all our wearable work. We then describe Wearable Cinema, which generates an interactive audiovisual narration driven by the physical path of the wearer in a museum space. Wearable Performance illustrates the wearable version of Improvisational Theater Space. All this work benefits from the media actors authoring technique described earlier.

Wearable City. Wearable City is the mobile version of a 3-D WWW browser we created, called "City of News." City of News57 fetches and displays URLs (uniform resource locators) so as to form skyscrapers and alleys of text and images that users can visit as if they were exploring an urban landscape of information (Figure 25). The system is given a map of a known city at the start. When browsing, the text and images from Web pages are remapped onto the facades of the virtual buildings growing from their footprint on the city map. The city is organized in districts, which provide territorial regrouping of urban activities. Similar to some major contemporary cities, there is a financial district, an entertainment district, and a shopping district. One could think of these districts as urban quarters associated with the different conceptual areas of one of the many currently available search engines on the Internet.

To date we have grown cities of news from maps of Boston, New York, Stuttgart in Germany, the SIGGRAPH 99 floor map, and a few imaginary locations. The 3-D browser operates in three stages. At the start, the system analyzes the geometry of the given city map and saves it in memory. Then, when browsing, a parser fetches all the data from the target URL, using a TCP/IP (Transmission Control Protocol/Internet Protocol) socket, and extracts all the available page layout information. The parser recognizes a large set of HTML (HyperText Markup Language) tags and builds a meta-representation of the page that contains the text and image data in one field and its formatting information in the associated field. This meta-representation is passed on to the graphical engine. The engine maps the text and images into graphics and textures, and reformats this information according to the real estate available

Figure 24 Floor plan of chosen house

(Plan legend: 1 entry; 2 foyer/study; 3 living/dining area; 4 kitchen/pantry; 5 laundry; 6 garage.)


from the assigned location on the map. The location is chosen automatically by an order defined on the map or manually by the user who makes a meaningful association with the given architecture. The software is written with the Open Inventor** graphics toolkit and C++ and runs under the Linux** operating system. A wireless network connection ensures a constant link to the WWW within its range.
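The sketch below suggests what the parser's meta-representation might look like; the structure names and fields are assumptions for illustration, not the City of News source.

    // Sketch: each piece of page content is paired with its formatting, and
    // the graphics engine maps the pairs onto a building facade.
    #include <string>
    #include <vector>

    enum class ContentKind { Text, Image };

    struct Formatting {
        std::string tag;     // originating HTML tag, e.g., "H1", "P", "IMG"
        int  fontSize = 12;  // derived from the tag and page layout
        bool bold = false;
    };

    struct ContentBlock {    // one field for the data ...
        ContentKind kind;
        std::string data;    // text run or image URL
        Formatting  format;  // ... and the associated formatting field
    };

    struct PageMeta {        // meta-representation handed to the graphical engine
        std::string url;
        std::vector<ContentBlock> blocks;
    };

    void buildFacade(const PageMeta& page /*, footprint, available real estate */) {
        for (const ContentBlock& b : page.blocks) {
            // map text to 3-D type and images to textures, reformatted to fit
            // the real estate of the building grown from the map footprint
            (void)b;
        }
    }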

We showed an early version of Wearable City at the SIGGRAPH 99 Millennium Motel demonstration floor58 (Figure 26). A large number of people tried the system during the one-week exhibit and expressed considerable interest and curiosity.

The wearable is a jacket that has an embedded CPU, a sensing system for location identification, high-resolution color head-mounted display glasses, and a touch-sensitive threaded keypad as input (Figure 27). The jacket is a commercial denim long-sleeved jacket, decorated with an illuminated pattern on the back as embellishment.

In order to run high-end graphics on the wearable in 3-D, we have chosen to use an off-the-shelf CPU rather than a custom-made one. We selected a Toshiba Pentium** II, 366-MHz ultra-thin laptop and detached the liquid crystal display (LCD) screen and all unnecessary parts to minimize its weight. We

Figure 25 City of News, a 3-D Web browser that dynamically builds an urban landscape of information

Figure 26 Browsing using a "memory map"–the wearable plays movie clips interactively according to the wearer's path at the Millennium Motel of SIGGRAPH 99


designed a custom lining to attach the CPU to the jacket, internally, on the back. We also added two side pockets to the lining to carry the batteries for the display and other devices when necessary.

The input system is a compact keyboard (Figure 28), sewn on one sleeve and made with touch-sensitive conductive thread. The keyboard is designed for general use with wearables, and contains as keys the numbers 0–9 and symbols representing VCR (videocassette recorder) controls, which can all be easily remapped for use in a variety of applications.

The display consists of commercial lightweight SVGA (super video graphics array) color head-mounted glasses (Figure 29), sold as non-see-through for immersive viewing. We modified the display for augmented reality applications by manually removing all the electronics from one eye (Figure 30). When wearing the display, after a few seconds of adaptation, the user's brain assembles the world's image seen by one eye with the display's image seen by the other eye into a fused augmented-reality view.

The location system is made up of a network of tiny infrared devices that transmit a location identification code to the receiver worn by the user and attached to the jacket. The transmitters are approximately coin-sized and are powered by small lithium batteries that last for about a week. They are built around a PIC** microcontroller, and their signal can be detected as far away as about 14 feet within a cone range of approximately 30 degrees.
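A minimal sketch of how the receiver's codes could be mapped to places is given below; the readBeaconId call is an assumed driver interface, not our actual firmware.

    // Sketch: translate the most recently received infrared beacon code into
    // a named location, keeping the last known place when out of range.
    #include <map>
    #include <optional>
    #include <string>

    std::optional<int> readBeaconId();   // assumed driver call: latest IR code seen, if any

    std::string currentLocation(const std::map<int, std::string>& beaconMap,
                                const std::string& lastKnown)
    {
        if (auto id = readBeaconId()) {
            auto it = beaconMap.find(*id);
            if (it != beaconMap.end())
                return it->second;       // e.g., "entrance", "financial district"
        }
        return lastKnown;                // no beacon in range: keep the last location
    }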

Wearable Cinema. We envision and work toward making the museum visit indistinguishable from watching a movie or even a theater play. This movie is slightly different from the ones we are accustomed to: It unfolds a story for us as we wander around the museum space but is as engaging and immersive as traditional movies. Interactive movies have been a widely explored research topic in multimedia and entertainment over the last decade. Davenport did early work that influenced this research field.59,60 Yet many problems, such as the type of input device, the choice of breakpoints for interaction, and the fragmented and therefore nonimmersive experience that results from interaction, are still unsolved. In some cases, a multithreaded plot appears unjustified and unsatisfactory, especially for fiction, because people expect to see a "meaningful conclusion"—the moral of the story—and a well-edited visual narrative, both of which are hard to do interactively.

The museum context provides a great platform for experimentation with interactive documentaries, since the interactive input is "naturally" and seamlessly provided by the visitor's path inside its aisles. Thus, it does not require pauses, breakpoints, or loops in the content presentation. It is therefore logical to fuse together the audiovisual documentary that illustrates and extends an exhibit with the visitor's path inside that exhibit, using a wearable computer. We create a new type of experience making the visit indistinguishable from seeing a movie from inside the movie set. This immersive experience cannot be achieved by a simple associative coupling between inputs and outputs. It requires a "perceptive layer" that constantly monitors the input and creates an output having a model of the user and its goals and an "understanding" of the content itself, the logical

Figure 27 Wearable computer: jacket and display

Figure 28 Touch-sensitive keypad


connection among the parts, and their emotional impact on the user. We previously described perceptive media modeling in a June 1999 paper.61

Our current work transforms our research lab into a museum space. We have gathered a variety of historical footage and authored an interactive presentation for a wearable computer using a Wearable City 3-D graphics presentation to situate the user in the space. The audiovisual presentation of the footage and its description are authored using Macromedia's Flash** authoring environment. A perceptive media modeling of the content unfolds the wearable cinema as the visitor walks around the space, and the camera attached to the wearable recognizes its presence in specific locations or recognizes relevant objects.

Oliver et al.62 developed a wearable computer with a visual input as a visual memory aid for a variety of tasks, including medical, training, or education. This system allows small chunks of video to be recorded and associates them with triggering objects. When the objects are seen again at a later moment, the video is played back. Wearable Cinema differs from this application in many ways. Its scope is to create a cinematic experience and to situate it within the containing architecture. Its focus is to orchestrate content and to carefully construct an immersive experience guided by the perception of the senses of the wearable, using perceptive media modeling. As opposed to the cited application, Wearable Cinema is not a simulation running on a desktop computer connected to a head-mounted display. It actually runs on a wearable especially designed for it, and the computer vision runs in real time on the wearable CPU.

The main distinctive characteristic of this wearable setup is that it uses real-time computer vision as input for easier and faster location finding. This characteristic has the additional advantage that no time needs to be spent distributing the infrared location emitters around the space and replacing the batteries when necessary. A quick training on the locations or objects to be recognized is the only setup needed for the computer vision system.

The wearable is made by sandwiching two CPUs together (Figure 31). One is dedicated to processing the input and the other to producing the output shown on the wearable display. We use a thin ungeared Pentium II Toshiba laptop, stripped down to its motherboard, connected to a super-thin Sony VAIO** Pentium II on a local network. The Toshiba runs the Linux operating system and is used for real-time computer-vision sensing. The VAIO runs the Microsoft Windows** 98 operating system and uses a combination of Flash animations and Open Inventor 3-D graphics to generate the cinematic experience.

These two very thin and lightweight computers are hosted inside a stylized backpack. The wearable is connected to a small wide-angle camera worn on the user's shoulder and to a high-resolution SVGA display.

Figure 29 SVGA color head-mounted display screen, with a view of the SIGGRAPH 99 Wearable City demo

Figure 30 Modified Sony SVGA color Glasstron™ display


The computer vision system uses a combination of color histograms and shape analysis to identify objects or locations. Segmentation of regions to be analyzed is achieved by Gaussian modeling of the target color, as in Wren et al.25 Shape analysis is based on contour extraction and calculation of centralized moments on the contour points. The connection between the input and output software is done using the remote procedure call (RPC) protocol. The vision program runs as a server and communicates the result of its processing to a Java** client through RPC. On the other CPU, a Java server broadcasts this information to the Flash program via JavaScript. The Open Inventor graphics connects to the vision program on the other CPU directly with RPC.
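The sketch below shows one way centralized moments can be computed over contour points to form a small shape signature for matching against trained locations or objects; it is illustrative and not our vision code.

    // Sketch: low-order central moments of an extracted contour, normalized
    // so that the resulting signature is translation-invariant and less
    // sensitive to scale.
    #include <cmath>
    #include <vector>

    struct Pt { double x, y; };

    double centralMoment(const std::vector<Pt>& contour, int p, int q) {
        double mx = 0.0, my = 0.0;
        for (const Pt& pt : contour) { mx += pt.x; my += pt.y; }
        mx /= contour.size();
        my /= contour.size();

        double m = 0.0;
        for (const Pt& pt : contour)
            m += std::pow(pt.x - mx, p) * std::pow(pt.y - my, q);
        return m / contour.size();
    }

    std::vector<double> shapeSignature(const std::vector<Pt>& contour) {
        double scale = centralMoment(contour, 2, 0) + centralMoment(contour, 0, 2);
        return { centralMoment(contour, 2, 0) / scale,
                 centralMoment(contour, 0, 2) / scale,
                 centralMoment(contour, 1, 1) / scale,
                 centralMoment(contour, 3, 0) / scale,
                 centralMoment(contour, 0, 3) / scale };
    }

A signature of this kind can then be compared (e.g., by Euclidean distance) against signatures recorded during the quick training step mentioned above.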

Wearable performance. Wearable computers will drive a migration of the computational engine from the house or the laboratory onto the users themselves. An analog to this transformation can be found in the transition from drama played in the theater building to street theater.

Street and outdoor performance has a long historical tradition. Its recent form is motivated by the need to bring performance art to the people rather than people to the theater. We have found it therefore natural to try to merge the world of the street performers with the one of the wearable computer and to explore synergies between them. Based on the observation that many street performers are actually skilled craftsmen of their own props, and that some have good technological skills or are at least attracted by the potential offered by technology,63 we have investigated how some street performers could benefit from the use of an affordable wearable computer.

We believe that wearable computing can contribute to street performance in three ways: (1) It can reduce the amount of "stuff" that the performer needs to carry around by creating "virtual props" or virtual no-weight musical instruments. (2) It can augment and enrich the performance by adding digital actors to collaborate with the performer in the piece. (3) It can allow new types of street performances that were not possible before the introduction and spread of wearable computers.

In our street performance application, we are interested in exploring how point of view transforms our perception of reality. The wearable acts as a "semantic lens" and offers to members of the audience a new, transformed interpretation of the story told by the performer. This lens provides a view-dependent reality augmentation, such that different people observing the performer from various angles are led to a different interpretation of the story told by the actor.

We distribute wearable computers among the public (or assume that in a not-so-distant future everyone will carry his or her own wearable), whereas the performer is technology-free. The wearable computer is made up of a lightweight CPU, an augmented reality private eye display, and a camera aiming at the performer (Figure 31). The private eye shows video and graphics superimposed on the user's real-surround view (Figures 29 and 30), just as in Improvisational Theater Space (Figure 18), except that now there is no more need for a stage and a projection screen. By presenting the viewer with the computer-generated images and graphics on one eye and by leaving the view of the other eye unobstructed, we allow the human brain to fuse the real world and the graphical augmentation into a unified augmented reality. A very small and thin camera, attached to the private eye of the wearable, analyzes the performer's body movements using the same processing described earlier in the paper. The public sees an expressive text-and-images augmentation of the actor, leading to a different interpretation of the play or improvisational theater, based on the wearer's position relative to the performer. Registration of the two images is provided by the computer vision program running on board.

Figure 31 Wearable Cinema setup, showing the backpack, two CPUs, camera, and lightweight color SVGA Glasstron display



In Sparacino et al.64 we also describe a variety of scenarios using wearables in performance. These include the "one man orchestra," which is a wearable version of DanceSpace, the Augmented Mime, and the Networked News Teller.

By customizing a networked, multimedia computer that can be worn as clothing or is built into the performer's or the audience's clothes, we offer the street performer new and powerful tools for expression. We hope that the creation of a community of wearable augmented performers and public with a common set of experiences and needs will also serve as a push toward future improvements of this new technology. Media actors and body-tracking technologies can scale and enrich this new venue of research and technological endeavor.

Conclusions

The future of artistic and expressive communication is tied to our ability to perceive the space and people around us with sensors capable of understanding natural full body movements, hand gestures, and facial expressions, and to perform object detection and location identification. In parallel, to construct interactive experiences that benefit from natural interactions, compelling communication, and ease of implementation, we need authoring techniques that can support more than a simple mapping between inputs and outputs. Media modeling requires a "perceptive layer" to constantly monitor the input and create an output having a model of the user and its goals, as well as an "understanding" of the content itself, the logical connection among the parts, and their emotional impact on the user. We have developed a "media actors" authoring technique: We endow media objects—expressive text, photographs, movie clips, audio, and sound clips—with coordinated perceptual intelligence, behaviors, personality, and intentionality. Such media actors are able to engage the public in an encounter with virtual characters that express themselves through one or more of these agents. They are an example of intentional architectures of media modeling for interactive environments. Using this type of sensing and authoring technique, we have built applications for dance, theater, and the circus, which augment the traditional performance stage with images, video, graphics, music, and text and are able to respond to movement and gesture in believable, aesthetical, and expressive manners. We also discussed work in interactive museum exhibit design and described applications that use either a Smart Space (IVE space) or Smart Clothes (wearable computers) technological framework.

Acknowledgments

Christopher Wren, Ali Azarbayejani, Trevor Darrell, and Thad Starner have worked on Pfinder at different stages of development. The author would like to thank Kristin Hall for her wonderful Improvisational Theater interpretations and for her many hours of rehearsal, discussion, useful suggestions, and encouragement to our work. Many thanks also to choreographers Claire Mallardi and Erica Drew and to dancers Malysa Monroe, Jennifer DePalo, Chung-fu Chang, Erna Greene, Naomi Housman, and Diana Aubourg for their collaboration, interest, and encouragement. MIT undergraduates Teresa Hernandez and Jeffrey Bender did the 3-D modeling work of the enchanted forest using Alias|Wavefront's Maya**. We thank Suguru Ishizaki for his suggestions of the use of temporal typography in performance. Akira Kotani provided his SGI MIDI (Musical Instrument Digital Interface) library, and Chloe Chao contributed to the computer graphics of DanceSpace. Edith Ackerman suggested readings on Bakhtin. Dimitri Negroponte helped direct and film the first rehearsals of Improvisational TheaterSpace. Tom Minka provided gracious support, useful comments, and suggestions.

**Trademark or registered trademark of Silicon Graphics, Inc., Autodesk Inc., Sony Electronics, Inc., Linus Torvalds, Intel Corp., Microchip Technology Inc., Macromedia, Inc., Microsoft Corporation, or Sun Microsystems, Inc.

Cited references

1. C. R. Wren, F. Sparacino, A. J. Azarbayejani, T. J. Darrell, T. E. Starner, A. Kotani, C. M. Chao, M. Hlavac, K. B. Russell, and A. P. Pentland, "Perceptive Spaces for Performance and Entertainment: Untethered Interaction Using Computer Vision and Audition," Applied Artificial Intelligence (AAI) Journal 11, No. 4, 267–284 (June 1996).

2. P. Maes, "Modeling Adaptive Autonomous Agents," Artificial Life 1, 135–162 (1994).

3. D. Zeltzer, "Task Level Graphical Simulation: Abstraction, Representation and Control," Making Them Move, N. Badler, B. Barsky, and D. Zeltzer, Editors, Morgan Kaufmann Publishers, San Francisco (1991).

4. M. Johnson, A Testbed for Three Dimensional Semi-Autonomous Animated Characters, Ph.D. thesis, MIT, Cambridge, MA (1994).

5. B. Blumberg and T. Galyean, "Multi-Level Direction of Autonomous Creatures for Real-Time Virtual Environments," Computer Graphics, SIGGRAPH 95 Proceedings 30, No. 3, 47–54 (1995).

6. R. A. Brooks, "Intelligence Without Reason," IJCAI-91, Vol. 1 (1991), pp. 569–595.

7. P. Maes, “Situated Agents Can Have Goals,” Designing Autonomous Agents: Theory and Practice from Biology to Engineering and Back, P. Maes, Editor, MIT Press/Bradford Books, Cambridge, MA (1990).

8. B. Blumberg, “Action-Selection in Hamsterdam: Lessons from Ethology,” Third International Conference on the Simulation of Adaptive Behavior, Brighton, England, MIT Press, Cambridge, MA (1994), pp. 108–117.

9. K. Perlin and A. Goldberg, “Improv: A System for Scripting Interactive Actors in Virtual Worlds,” Computer Graphics, SIGGRAPH 96 Proceedings, ACM (1996), pp. 205–216.

10. The Selection of Behavior: The Operant Behaviorism of B. F. Skinner: Comments and Consequences, A. C. Catania and S. Harnad, Editors, Cambridge University Press, Cambridge, UK (1988).

11. B. Hayes-Roth and R. van Gent, “Improvisational Puppets, Actors, and Avatars,” Proceedings of the Computer Game Developers’ Conference, Santa Clara, CA (1996).

12. N. Magnenat-Thalmann and D. Thalmann, “The Artificial Life of Synthetic Actors,” invited paper, IEICE Transactions, Japan J76-D-II, No. 8, 1506–1514 (August 1993).

13. L. Emering, R. Boulic, R. Balcisoy, and D. Thalmann, “Multi-Level Modeling and Recognition of Human Actions Involving Full Body Motion,” Proceedings of the International Conference on Autonomous Agents 97, ACM Press (1997), pp. 504–505.

14. D. Terzopoulos, X. Tu, and R. Grzeszczuk, “Artificial Fishes: Autonomous Locomotion, Perception, Behavior, and Learning in a Simulated Physical World,” Artificial Life 1, No. 4, 327–351 (December 1994).

15. D. Terzopoulos, “Artificial Life for Computer Graphics,” Communications of the ACM 42, No. 8, 32–42 (August 1999).

16. N. Tosa and R. Nakatsu, “Life-like Communication Agent—Emotion Sensing Character ‘MIC’ and Feeling Session Character ‘MUSE’,” Proceedings of the International Conference on Multimedia Computing and Systems (1996), pp. 12–19.

17. J. Bates et al., “An Architecture for Action, Emotion, and Social Behavior,” Proceedings of the Fourth European Workshop on Modeling Autonomous Agents in a Multi-Agent World (1992).

18. N. Sawhney, D. Balcom, and I. Smith, “Authoring and Navigating Video in Space and Time: An Approach Towards Hypervideo,” IEEE Multimedia 4, No. 4, 30–39 (October–December 1997).

19. S. Agamanolis and M. Bove, “Multilevel Scripting for Responsive Multimedia,” IEEE Multimedia 4, No. 4, 40–50 (October–December 1997).

20. G. Davenport et al., “Encounters in Dreamworld: A Work in Progress,” CaiiA, University of Wales College, Wales (July 1997).

21. J. A. Paradiso, “The Brain Opera Technology: New Instruments and Gestural Sensors for Musical Interaction and Performance,” Journal of New Music Research 28, No. 2, 130–149 (1999).

22. T. Jebara and A. Pentland, “Action Reaction Learning: Automatic Visual Analysis and Synthesis of Interactive Behaviour,” Computer Vision Systems, First International Conference, ICVS ’99 Proceedings, Las Palmas, Canary Islands (January 1999), pp. 273–292.

23. L. Pirandello, “Six Characters in Search of an Author,” Naked Masks, E. Bentley, Editor, E. Storer, Translator, E. P. Dutton & Co., New York (1952).

24. J. L. Crowley and J. M. Bedrune, “Integration and Control of Reactive Visual Processes,” 1994 European Conference on Computer Vision (ECCV-’94), Stockholm (May 1994).

25. C. Wren, A. Azarbayejani, T. Darrell, and A. Pentland, “Pfinder: Real-Time Tracking of the Human Body,” IEEE Transactions on Pattern Analysis and Machine Intelligence 19, No. 7, 780–785 (July 1997).

26. T. Darrell, B. Moghaddam, and A. P. Pentland, “Active Face Tracking and Pose Estimation in an Interactive Room,” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 96 (CVPR 96) (1996), pp. 67–72.

27. N. Oliver, A. Pentland, and F. Berard, “LAFTER: A Real-Time Lips and Face Tracker with Facial Expression Recognition,” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 97 (CVPR 97), San Juan, Puerto Rico (June 1997), pp. 123–129.

28. T. Darrell, P. Maes, B. Blumberg, and A. Pentland, “A Novel Environment for Situated Vision and Behavior,” Proceedings of IEEE Workshop for Visual Behaviors, IEEE Computer Society Press, Los Alamitos, CA (1994).

29. A. Pentland, “Smart Rooms,” Scientific American 274, No. 4, 68–76 (April 1996).

30. F. Sparacino, Choreographing Media for Interactive Virtual Environments, master’s thesis, MIT Media Laboratory, Cambridge, MA (October 1996), submitted for approval in January 1996.

31. J. A. Paradiso and F. Sparacino, “Optical Tracking for Music and Dance Performance,” Fourth Conference on Optical 3-D Measurement Techniques, Zurich, Switzerland (September 29–October 2, 1997).

32. T. Machover, HyperInstruments: A Composer’s Approach to the Evolution of Intelligent Musical Instruments, CyberArts, William Freeman, San Francisco (1992), pp. 67–76.

33. J. Klosty, Merce Cunningham: Dancing in Space and Time, Saturday Review Press, New York (1975).

34. A. Marion, Images of Human Motion: Changing Representation of Human Identity, M.S. thesis, MIT, Cambridge, MA (1982).

35. R. Gehlhaar, “SOUND = SPACE: An Interactive Musical Environment,” Contemporary Music Review 6, No. 1, 59–72 (1991).

36. T. Calvert, C. Lee, G. Ridsdale, S. Hewitt, and V. Tso, “The Interactive Composition of Scores for Dance,” Dance Notation Journal 42, No. 2, 35–40 (1986).

37. T. Calvert and S. Mah, “Choreographers as Animators: Systems to Support Composition for Dance,” Interactive Computer Animation, N. Magnenat-Thalmann and D. Thalmann, Editors, Prentice-Hall, Inc., Upper Saddle River, NJ (1996), pp. 100–126.

38. K. Sukel, G. Brostow, and I. Essa, Towards an Interactive Computer-Based Dance Tutor, Concept Paper, Georgia Institute of Technology, Atlanta (1998).

39. F. Sparacino, K. Hall, C. Wren, G. Davenport, and A. Pentland, “Digital Circus: A Computer-Vision-Based Interactive Virtual Studio,” IMAGINA, Monte Carlo, Monaco (January 18–20, 1999).

40. P. Bouissac, Circus and Culture: A Semiotic Approach, Indiana University Press, Bloomington, IN (1976).

41. F. Sparacino, K. Hall, C. Wren, G. Davenport, and A. Pentland, “Improvisational Theater Space,” The Sixth Biennial Symposium on Arts and Technology, Connecticut College, New London, CT (February 27–March 2, 1997).

42. V. Spolin, Improvisation for the Theater, Northwestern University Press, Evanston, IL (1979).

43. Y. Y. Wong, Temporal Typography, master’s thesis, MIT Media Laboratory, Cambridge, MA (1995).

44. E. Spiekermann and E. M. Ginger, Stop Stealing Sheep and Find Out How Type Works, Adobe Press, Mountain View, CA (1993).

45. R. Massin, Letter and Image, Van Nostrand Reinhold, New York (1970).

46. M. M. Bakhtin, The Dialogical Imagination, M. Holquist, Editor, University of Texas Press, Austin, TX (1981).

47. L. S. Vygotsky, “Thinking and Speech,” The Collected Works of L. S. Vygotsky, Vol. 1: Problems of General Psychology, R. W. Reiber and A. S. Carton, Editors, N. Minick, Translator, Plenum Publishers, New York (1987).

48. T. B. L. Webster, The Greek Chorus, Methuen, London (1970).

49. A. Terzakis, “The Chorus in the Greek Tragedy,” Preuves, nn. 215–216 (February–March 1969), pp. 8–15.

50. M. Kirby, Futurist Performance, E. P. Dutton & Co., New York (1971).

51. H. Gilpin, “William Forsythe: Where Balance Is Lost and the Unfinished Begins,” Parkett 45, Parkett Publishers, New York (1995).

52. P. Brook, The Open Door, Theatre Communication Group, New York (1995).

53. L. Klein, Exhibits: Planning and Design, Madison Square Press, New York (1986), pp. 70–71.

54. A. Pentland, “Smart Clothes,” Scientific American 274, No. 4, 73 (April 1996).

55. T. Starner, S. Mann, B. Rhodes, J. Levine, J. Healey, D. Kirsch, R. W. Picard, and A. Pentland, “Augmented Reality Through Wearable Computing,” Presence, special issue on augmented reality (1997).

56. S. Mann, “Wearable Computing: A First Step Toward Personal Imaging,” Computer 30, No. 2, 25–31 (February 1997).

57. F. Sparacino et al., “City of News,” Ars Electronica Festival, Linz, Austria (September 8–13, 1997).

58. F. Sparacino, G. Davenport, and A. Pentland, “Wearable Cinema/Wearable City: Bridging Physical and Virtual Spaces Through Wearable Computing,” IMAGINA 2000, Monte Carlo, Monaco (January 31–February 3, 2000).

59. G. Davenport, T. Aguierre Smith, and N. Pincever, “Cinematic Primitives for Multimedia,” IEEE Computer Graphics and Animation, special issue on multimedia (July 1991), pp. 67–74.

60. G. Davenport, R. Evans, and M. Halliday, “Orchestrating Digital Micromovies,” Leonardo 26, No. 4, 283–288 (August 1993).

61. F. Sparacino, G. Davenport, and A. Pentland, “Media Actors: Characters in Search of an Author,” IEEE Multimedia Systems ’99, International Conference on Multimedia Computing and Systems (IEEE ICMCS’99), Centro Affari, Firenze, Italy (June 7–11, 1999).

62. N. Oliver, T. Jebara, B. Schiele, and A. Pentland, http://vismod.www.media.mit.edu/~nuria/dypers/dypers.html.

63. B. Mason, Street Theatre and Other Outdoor Performance, Routledge Publishers, New York (1992).

64. F. Sparacino, C. Wren, G. Davenport, and A. Pentland, “Augmented Performance in Dance and Theater,” International Dance and Technology 99 (IDAT99), Arizona State University, Tempe, AZ (February 25–28, 1999).

Accepted for publication May 15, 2000.

Flavia Sparacino MIT Media Laboratory, 20 Ames Street, Cambridge, Massachusetts 02139-4307 (electronic mail: [email protected]). Ms. Sparacino is a Ph.D. candidate at the MIT Media Laboratory. She is designing perceptual intelligence for interactive media, with applications to augmented performance, information architecture, and museum displays. Perceptual intelligence emerges from the integration of multiple sensor modalities and determines the expressive abilities of digital text, photographs, movie clips, music, and audio. Her work was featured at Ars Electronica 97 and 98, ISEA 97, SIGGRAPH 96 and 99, ICMCS 99, and ICHIM 99, as well as many other international venues. She holds undergraduate degrees in electrical engineering and robotics, and master's degrees in cognitive sciences and media arts and sciences. Ms. Sparacino received a number of scholarships and awards for her academic work and career, including those from the European Community, the Italian Center for National Research, the French Center for National Research, and Fulbright, among others. She spent some time in film school, theater school, and a small-town circus and has done travel photography in many countries around the world.

Glorianna Davenport MIT Media Laboratory, 20 Ames Street, Cambridge, Massachusetts 02139-4307 (electronic mail: [email protected]). Ms. Davenport is the director of the Interactive Cinema Group at the MIT Media Laboratory. Trained as a documentary filmmaker, she has achieved international recognition for her work in the new media forms. Her research explores fundamental issues related to the collaborative co-construction of digital media experiences, where the task of narration is split among authors, consumers, and computer mediators. Ms. Davenport's recent work focuses on the creation of customizable, personalizable storyteller systems that dynamically serve and adapt to a widely dispersed society of audience. She has taught, lectured, and published internationally on the subjects of interactive multimedia and story construction. More information can be found at http://www.media.mit.edu/~gid.

Alex (Sandy) Pentland MIT Media Laboratory, 20 Ames Street, Cambridge, Massachusetts 02139-4307 (electronic mail: [email protected]). Dr. Pentland is the academic head of the MIT Media Laboratory and cofounder and director of the Center for Future Health and the LINCOS Foundation. His research focus includes both smart rooms (e.g., face, expression, and intention recognition; word learning and acoustic scene analysis) and smart clothes (e.g., augmenting human intelligence and perception by building sensors, displays, and computers into glasses, belts, shoes, etc.). These research areas are described in the April 1996 and November 1998 issues of Scientific American. He is one of the 50 most-cited researchers in the field of computer science, and Newsweek has recently named him one of the 100 Americans most likely to shape the 21st century. Dr. Pentland has won awards from several academic societies, including the AAAI, IEEE, and Ars Electronica. See http://www.media.mit.edu/~pentland for additional details.
