Information and viewpoint dependence in face recognition



Cognition 62 (1997) 201–222


Harold Hill a,*, Philippe G. Schyns b, Shigeru Akamatsu c

a Department of Psychology, University of Stirling, Stirling FK9 4LA, UK
b Department of Psychology, University of Glasgow, Glasgow G12 8QB, UK
c ATR Human Information Processing Laboratories, 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-02, Japan

Received 16 May 1996, final version 20 June 1996

Abstract

How we recognize faces despite rotations in depth is of great interest to psychologists, computer scientists and neurophysiologists because of the accuracy of human performance despite the intrinsic difficulty of the task. Three experiments are reported here which used three-dimensional facial surface representations to investigate the effects of rotations in depth on a face recognition task. Experiment 1, using "shape only" representations, showed that all the views used (full-face, three-quarter and profile) were equally well recognized when all had been learned. Performance was better when the same views were presented in an animated sequence, rather than at random, suggesting that structure-from-motion provides useful information for recognition. When stimuli were presented inverted, performance was worse and there were differences in the recognizability of views, demonstrating that the familiarity of upright faces affects generalization across views. Experiments 2 and 3 investigated generalization from single views and found performance to be dependent on learned view. In both experiments, generalization from learned full-face fell off with increasing angle of rotation. With shape only stimuli, three-quarter views generalized well to each other, even when inverted, but for profiles generalization was equally bad to all unlearned views. This difference may be explained by the particular relationship of the profile to the axis of symmetry. In Experiment 3, addition of information about superficial properties including color and texture facilitated performance, but patterns of generalization remained substantially the same, emphasizing the importance of underlying shape information. However, generalization from the three-quarter view became viewpoint invariant and there was some evidence for better generalization between profiles. The results are interpreted as showing that three-dimensional shape information is fundamental for recognition across rotations in depth, although superficial information may also be used to reduce viewpoint dependence. © 1997 Elsevier Science B.V. All rights reserved.

* Corresponding author. E-mail: [email protected].



1. Introduction

How we are able to recognize faces despite considerable variations in viewing conditions is one of the great problems for psychologists, computer scientists and neurophysiologists. To illustrate, consider the top five pictures of Fig. 1. As can be seen, different rotations in depth of a face produce very different images. Successful recognition requires that these images all be identified as being different pictures of the same face, even though they are more different from each other than are pictures of different people seen from the same viewpoint (see Fig. 1, the lower pictures).

Fig. 1. Examples of the stimuli used in Experiments 1 and 2, in the sequence for a single trial. The top row shows the five views used in both experiments: left profile, left three-quarter, full-face, right three-quarter and right profile. In Experiment 1, all views were presented in the learning stage, either in an animated sequence or randomly. In Experiment 2, subjects were shown a single view to learn. The testing stage following each learning phase was the same for both experiments: a target and a distractor face, in the same view as each other, were presented sequentially with order randomized. Subjects were asked to respond "Yes" if they thought the view was of the learned face and "No" otherwise. Testing was followed by the learning phase of a new trial involving a new face.



Effects of rotations in depth have received considerable attention in the object recognition literature, because it is believed they reveal the nature of the underlying representations. Two questions are of particular importance: (1) are certain views of an object best or "canonical", and (2) how is shape consistency maintained across variations in viewpoint? These issues are also important for face recognition because rotation in depth is one of the most common sources of variation in the viewing conditions of a face.

Evidence from object recognition studies on viewpoint dependence will be reviewed in the context of theories of recognition, and then the particular case of face recognition will be considered.

1.1. Viewpoint-dependent object recognition

For non-face objects, there is evidence that certain views are canonical. Subjects label such views as "better" and are quicker and more accurate to name objects shown in these views (Palmer et al., 1981). Also, increasing the angle of rotation relative to the canonical perspective monotonically increases reaction times, an effect often interpreted as evidence for mental rotation (Palmer, 1983; Shepard and Cooper, 1982; Tarr and Pinker, 1989). There is also evidence for viewpoint-dependent recognition, in that performance is a function of misorientation relative to the learned view(s), with a variety of conditions and types of stimuli (e.g. Edelman and Bülthoff, 1992; Farah et al., 1994; Rock and Di Vita, 1987; Tarr and Pinker, 1989, 1990). For example, Bülthoff and Edelman (1992) trained subjects to recognize different views of unfamiliar "paper-clip-like" objects. In a later testing stage, it was shown that performance was best for the learned views, but also that different new views elicited different patterns of generalization. Specifically, generalization performance was better for new views interpolating between the trained views than for new views extrapolating beyond them, and worst for new views orthogonal to the trained views.

One theoretical interpretation of viewpoint dependence is that objects are stored in memory as collections of discrete views (Tarr and Pinker, 1989; Bülthoff and Edelman, 1992). Different mechanisms have been proposed that allow new object views to be compared to stored views: for example, alignment (Ullman, 1989), linear combinations of views (Ullman and Basri, 1991), or interpolation between stored views (e.g. Poggio and Edelman, 1990). There seems to be neurophysiological support for representations in terms of discrete views, with the formation of view-centered neurons in monkeys trained to recognize objects across changes in viewpoint (Logothetis et al., 1994).

View-based theories contrast with structural, or model-based, accounts which propose that objects are represented as sets of parts and their relations (e.g. Biederman, 1987; Marr and Nishihara, 1978). A strong interpretation of structural theories implies viewpoint independence, at least over a limited range of viewpoints, because the component parts of objects are defined in terms of "non-accidental" properties. Non-accidental properties allow the recovery of 3D shapes from their projections on 2D views, as long as the same parts remain visible over the range of considered views (Lowe, 1987). Biederman and Gerhardstein (1993) summarize three conditions for viewpoint invariance in part-based theories of object recognition: (1) the objects must be decomposable into their parts, (2) different objects must be distinguished by different parts, and (3) the same part-based description must be recoverable from the different viewpoints.

An alternative account to view- and part-based theories is agnostic about the specific format of object representations; instead it emphasizes the information available from a particular image (shape, part, color, texture, or other derived cues) that is diagnostic for a particular task or categorization of the object (e.g., in the case of faces, according to identity, sex, or "race"). Viewpoint dependence would then be determined by the availability of diagnostic information in different views, rather than by how that information is represented. In this paper, we use viewpoint dependence to test for such task- and viewpoint-specific information, rather than to test between accounts of the specific representation used for this information [but see Biederman and Gerhardstein (1995) and Tarr and Bülthoff (1995) for a debate on the representational interpretation].

1.2. Face recognition

While much face recognition research has concentrated on the highly familiar "front-on", full-face view, differences between individual views and effects of changing view have been shown. For example, the three-quarter view, a 45° rotation in depth around the vertical axis from the full-face, is often used to portray faces in photographs, paintings and caricatures, and may be canonical (Krouse, 1981; Logie et al., 1987; Bruce et al., 1987). A possible explanation would be that this view contains highly diagnostic information, revealing the features most useful for recovering the identity of a face, or giving the best representation of its three-dimensional shape. The three-quarter view is geometrically very stable (i.e. non-accidental) in that small amounts of misorientation do not qualitatively affect the visible features. Thus, a face identification task could require featural information which is best realized by the geometry of a three-quarter view.

Alternatively, the three-quarter preference could result from the way our brain is "hardwired" to represent faces. Neurophysiological evidence suggests that while most face-selective cells are tuned to full-face and profile views, relatively broad tuning (60°) would mean that the three-quarter view would activate both sets of neurons (Perrett et al., 1985). However, evidence from Bruce et al. (1987) suggests that the advantages associated with the three-quarter view may be limited to the task of matching unfamiliar faces, and would not reflect more fundamental properties of the representation of faces.

In contrast to a three-quarter view preference, the profile view appears especially bad for many tasks, including face identification (Bruce et al., 1987; Hill and Bruce, 1996). The profile view may be poor for identification because important information, such as the configuration of internal features, is not visible (Diamond and Carey, 1986). Moreover, the profile and the full-face views are both geometrically unstable (i.e. accidental) in that small rotations in depth will qualitatively change the appearance of a face: for example, small rotations from the full-face will disrupt its symmetry and result in occlusion by the nose, while for the profile, parts of the other half of the face will become visible.

The poor recognition performance for profile and full-face views contrasts somewhat with the neurophysiological findings that many cells seem to be tuned to these views, at least in the macaque superior temporal sulcus (Perrett et al., 1991). However, cells tuned to face views are isolated while the macaque accomplishes a particular task, which could be quite different from identity judgments. For example, Perrett et al. (1992) argue that the full-face and profile views could be involved in the control of social interactions, and therefore convey the information more diagnostic for these tasks.

As well as differences in the recognizability of individual views, there is evidence that rotation in depth between views reduces recognition for faces as well as for other objects (Bruce, 1982; Hill and Bruce, 1996; Schyns and Bülthoff, 1994; Troje and Bülthoff, 1995). However, effects of rotation in depth on information for recognition have received little attention in their own right, although the similarity of faces and the need to make within-class discriminations for face identification are likely to result in marked effects of changing view.

1.3. Faces vs. objects

Although both face and object recognition appear to be viewpoint dependent in some respects, there is some evidence that faces may be "special". Three main lines of evidence have been cited in this context (Ellis and Young, 1986; see also Bruce and Humphreys, 1994). First, there is neuropsychological evidence for a double dissociation between face and object recognition deficits. Although most prosopagnosics have problems with other classes of objects as well as faces, it has been claimed that some patients have problems with faces but not objects (e.g. deRenzi, 1986), while others have problems with objects but not faces (e.g. McCarthy and Warrington, 1986). Even if there are pure cases of prosopagnosia, a question which remains controversial, this does not necessarily imply that faces are processed in a unique manner. An alternative possibility would be that faces and objects rely to differing degrees on the same information. For example, face processing may be largely holistic while detailed, part-based processing is more important for objects (Farah, 1991). Damage to one or other of the systems processing this common information could then result in the observed patterns of dissociation without the need to postulate distinct systems for faces and objects.

A second line of evidence cited that faces are special is the disproportionately large detrimental effect of figural inversion, a 180° rotation in the picture plane, on face (compared to object) recognition (Yin, 1969). However, demonstration of a similar effect for another class of objects highly familiar to the subjects and distinguished by configural information (i.e. breeds of dog for dog breeders) has been used to argue explicitly that faces are not special (Diamond and Carey, 1986).

The final line of evidence that faces are special is the early ontogeny of face processing skills. This has been taken to imply that certain mechanisms specific to face processing are "hard-wired" (Goren et al., 1975; Morton and Johnson, 1991).

Whether or not faces are special, or even unique, they differ from other common object classes in a number of ways. First, faces are highly familiar stimuli, especially compared to the deliberately novel stimuli required by some experimental designs (e.g. Bülthoff and Edelman, 1992; Schyns and Murphy, 1994). This "natural" expertise in faces may induce different patterns of viewpoint dependence than those observed for less familiar objects. There is evidence that familiarity with an object class affects recognition, even when the individual exemplars used were unknown prior to the experiment. For example, the "other race effect" (Brigham, 1986), where subjects recognize novel faces of their own race better than those of another race, demonstrates how expertise with a class is not immediately transferable to another class of similar objects.

Secondly, faces and objects also differ in that faces are comparatively more similar in shape and they share the same parts. In this respect, faces would not satisfy the second criterion for viewpoint invariance outlined in Biederman and Gerhardstein (1993), that of being distinguished by different parts. In fact, the images of two faces seen from the same view are more similar on many objective measures than the images of a single face seen from different views (Adini et al., 1994). Thus, face recognition may be importantly different from object recognition because of the overall geometric similarity and familiarity of the stimuli.

Thirdly, face recognition experiments typically involve what would be within-category discriminations for object recognition (e.g. distinguishing between brands of cars, types of chairs, and so forth). This, together with high similarity, imposes particular task demands, e.g. the need for finer discriminations, which are likely to affect aspects of processing, including viewpoint dependence.

In summary, effects of viewpoint on recognition performance are unlikely to be absolute, but dependent on expertise, object information and the recognition/categorization task. Together, the information available from the object and the nature of the task may influence the diagnosticity of specific cues, and familiarity may affect the ability to use these cues. To illustrate, consider the task of recognizing a face as a face, a between-class categorization. In these circumstances, it is quite likely that most views, even the unfamiliar upside-down full-face view, could provide sufficient information on which to base a decision. However, imagine another task in which you must decide whether the particular face is X. In this case, it is more important that you get a "good" view of the face, to ensure that you gather sufficient information to identify the input as the particular face. Finally, imagine that you must recognize X for the first time at the airport from a single view of his/her face, for example from a photograph. This latter task is particularly difficult because X's real face will almost certainly appear under different viewing conditions (including viewpoint) from those in the photograph and thus may look quite different. Under these constraints, which picture would be best?


In the experiments reported here, we investigated how viewpoint affects face recognition. All experiments used the same basic recognition task, in order to keep task demands constant. Each experiment varied the face information made available to subjects for recognition. The first two experiments were carried out using representations based on shape alone (see Fig. 1), as 3D shape is the primary determinant of how the projected image varies with viewpoint. Limiting consideration to the familiar full-face, three-quarter and profile views, we first tested whether any view was canonical, or inherently easier to recognize. We also tested the role of familiarity by repeating the experiment with inverted faces. Next, we tested recognition after learning a single face view, to test whether any view was better for learning, and to see whether the pattern of generalization was viewpoint dependent. This should provide clues as to the nature of the information available for recognition and generalization. Again, a part of this experiment was repeated with inverted faces. In the third experiment, we investigated the effect of adding information about superficial properties, including skin color and texture, to the shape-based representations. Generalization from a single learned view was again tested, and we expected superficial information to provide a variety of cues useful for generalization across viewpoint.

2. Experimental

2.1. Experiment 1

The first experiment tested whether any view of a face is especially well recognized (or canonical) using shaded surfaces of 3D faces (see Fig. 1). The views tested were the familiar full-face, three-quarter and profile. As discussed in the introduction, the three-quarter view is often chosen for artistic representation, and may lead to better recognition of unfamiliar faces. One explanation for a three-quarter view advantage is that this view provides a particularly good representation of three-dimensional shape. The stimuli, which contained only shape information, were well suited to test this hypothesis.

The experiment also tested the role of structure-from-motion cues in recognition. In each experimental trial, subjects were exposed to a sequence of all five views of the target face and were then immediately tested on their recognition of one of these views (see Fig. 1). For one group, the learning sequence presented the views in order, producing the impression of the face rotating in depth. For the other group, the order of views was random and produced no coherent overall impression. Although both groups saw the same views for the same amount of time during learning, we expected that the structure-from-motion cues available when the views were presented in order would provide information about three-dimensional shape useful for subsequent recognition of individual views at testing.

As all the subjects learned all the views, any differences in performance should be a function of the view that they were tested on. During the testing phase of each trial, subjects were presented with the views of a target face and a distractor face and had to decide which face they had learned (see Fig. 1). Given the normalized conditions of training, any canonical view should be recognized faster and more accurately when shown at test, and any decrements associated with particular views should also be apparent. By testing performance as a function of test view, the experiment also provided baseline data for the subsequent experiments examining the effect of learned view.

2.1.1. Methods

2.1.1.1. Subjects
Subjects were 24 Japanese ATR employees with normal or corrected vision who volunteered their time to participate in the experiment. Equal numbers of subjects were assigned to each condition. All subjects were unfamiliar with the faces used, allowing subjects' familiarity with the stimuli to be controlled.

2.1.1.2. Stimuli
Experiment 1 and Experiment 2 used the same set of stimuli, derived from 3D models of 30 Japanese faces (15 males and 15 females). These were used to produce five views of each face in the form of images with 256 gray-levels. Twenty of the faces served as targets and ten as distractors.

The face models were produced from Cyberware™ laser-scanned 3D co-ordinates of real faces (Watanabe and Suenaga, 1991). Each head was reconstructed by approximating the face data with a bicubic B-spline surface. Laser head scans have a lot of noise around the hair, so we trimmed them, leaving the face area alone. This removed cues to hairstyle which may be important in profile and three-quarter views. The five views produced of each face were left profile (−90°), left three-quarter (−45°), full-face (0°), right three-quarter (45°) and right profile (90°). Faces were illuminated by a directional light source located 30° above the line of sight with a relative intensity of 1, and an ambient light source from directly above with a relative intensity of 0.3. Surfaces were assigned a mid-gray matte reflectance and raytraced on a mid-gray background. A large viewing distance was used to approximate orthographic projection. The resulting 300 × 300 pixel images were presented on a Silicon Graphics™ workstation using software written for the purpose.

2.1.1.3. Design
A 5 (Test View) × 2 (Sequence) mixed design was used. The within-subjects factor Test View had five levels (left profile, left three-quarter, full-face, right three-quarter, or right profile). The between-subjects factor Sequence had two levels (animated or random). The main dependent variable was dL, a combination of hits and correct rejections as described below. Response times were also recorded. Comparisons were planned to test for differences between individual levels of test view. The order of trials was randomized for each subject, as was the pairing of distractors with targets.

For all experiments in this paper, Hits and Correct Rejections were combined to give a measure of sensitivity, dL, an equivalent of d′ based on logistic distributions (Snodgrass and Corwin, 1988):

dL = ln{[Hits × (1 − False Alarms)] / [(1 − Hits) × False Alarms]}

where

Hits = (number of Hits + 0.5) / (number of targets + 1)

and

False Alarms = (number of False Alarms + 0.5) / (number of distractors + 1)

In these experiments, perfect performance on a testing view corresponded to 4 Hits out of 4 and no False Alarms out of 4, and would give a dL of 4.39. Chance performance gives a dL of 0.
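As a check on these figures, here is a small sketch (ours, not part of the original paper) that computes dL with the correction above; with 4 hits out of 4 targets and 0 false alarms out of 4 distractors it returns 4.39, and equal hit and false-alarm counts return 0.

```python
from math import log

def d_l(hits, n_targets, false_alarms, n_distractors):
    """Logistic-distribution sensitivity measure dL (Snodgrass and
    Corwin, 1988), with the +0.5/+1 correction that keeps the hit
    and false-alarm rates away from 0 and 1."""
    h = (hits + 0.5) / (n_targets + 1)
    fa = (false_alarms + 0.5) / (n_distractors + 1)
    return log((h * (1 - fa)) / ((1 - h) * fa))

print(round(d_l(4, 4, 0, 4), 2))  # perfect performance: 4.39
print(round(d_l(2, 4, 2, 4), 2))  # chance-level performance: 0.0
```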

2.1.1.4. Procedure
For each subject, there were twenty trials, each consisting of a learning stage followed by a testing stage (see Fig. 1). A different target face was used for each trial, allowing complete control of subjects' exposure to that stimulus and ensuring that no learning occurred during testing. The five views of the target face were each shown twice for 100 ms, making a total presentation time of 1 s.

The order of presentation of the learning views depended on the group. For the animated group, the views were shown in order so that the target face appeared to rotate through the five views, once clockwise and once counter-clockwise. There were 10 possible ordered sequences, which were chosen at random. In the random group, the same views were presented for the same amount of time but in a completely random order. This produced a disjointed sequence with no impression of rotation.

For each trial, a testing stage immediately followed the learning stage. This consisted of presenting the target face and a distractor face one after the other in a random order. Both faces were shown in the same single view, one of the views learned. For each image, subjects had to decide whether or not the face was the learned face. Responses were made using one of two keyboard keys. Trials differed as to which view was shown at test. For all subjects, each of the five possible views was used four times, giving the total of 20 trials.
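To make the trial structure concrete, the sketch below (an illustration with our own names, not the original presentation software) builds one subject's 20-trial schedule: the animated group gets an ordered clockwise-then-counter-clockwise sweep of the five views (one of the ten possible ordered sequences), the random group gets the same ten 100-ms presentations shuffled, and each of the five test views is assigned to four trials in random order.

```python
import random

VIEWS = ["left profile", "left three-quarter", "full-face",
         "right three-quarter", "right profile"]

def learning_sequence(animated, rng):
    """Ten 100-ms presentations (each view twice, 1 s in total):
    an ordered sweep for the animated group, a shuffle otherwise."""
    if animated:
        return VIEWS + VIEWS[::-1]      # one of the ordered sweeps
    seq = VIEWS * 2
    rng.shuffle(seq)                    # disjointed, no rotation
    return seq

def trial_schedule(animated, rng):
    """Each of the five test views occurs on four of the 20 trials,
    in random order, with a new target face on every trial."""
    test_views = VIEWS * 4
    rng.shuffle(test_views)
    return [{"trial": i + 1,
             "learning": learning_sequence(animated, rng),
             "test_view": view}
            for i, view in enumerate(test_views)]

rng = random.Random(0)
schedule = trial_schedule(animated=True, rng=rng)
print(schedule[0]["test_view"], len(schedule))  # e.g. 'full-face' 20
```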

2.1.2. Results and discussion
The results of Experiment 1 are summarized in Fig. 2. Although identical views were presented in both conditions, subjects viewing the animated sequence performed significantly better (dL = 2.99) than those viewing the random condition (dL = 2.25). A 5 (Test View) × 2 (Sequence) analysis of variance showed a main effect of Sequence, F(1, 22) = 7.5, p < 0.05. It appears that structure-from-motion cues facilitated perception of the shape of the face, which in turn facilitated subsequent recognition of the individual views. This effect cannot be attributed to the individual views shown, as these were the same in both conditions. Instead, it suggests better integration of information in the animated condition.
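For readers who want to reproduce this style of analysis on their own data, the sketch below runs a mixed ANOVA of the same 5 (Test View) × 2 (Sequence) form using the pingouin package; the long-format layout, column names and simulated scores are our own placeholders, not the authors' records.

```python
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(0)
views = ["left profile", "left three-quarter", "full-face",
         "right three-quarter", "right profile"]

# Simulated long-format data: 24 subjects, one dL score per test view,
# with Sequence (animated vs. random) as the between-subjects factor.
rows = [{"subject": s,
         "sequence": "animated" if s < 12 else "random",
         "test_view": v,
         "dL": rng.normal(3.0 if s < 12 else 2.25, 0.8)}
        for s in range(24) for v in views]
df = pd.DataFrame(rows)

# 5 (Test View, within) x 2 (Sequence, between) mixed ANOVA.
anova = pg.mixed_anova(data=df, dv="dL", within="test_view",
                       subject="subject", between="sequence")
print(anova[["Source", "F", "p-unc"]])
```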


Fig. 2. The results of Experiment 1, recognition of previously learned views for "shape only" stimuli. Performance on each test view is ordered as a function of presentation sequence. dL is a logistic measure of sensitivity combining hits and false alarms, with chance corresponding to a value of 0 and perfect performance to 4.4 in these experiments. Error bars show standard errors.

This experiment was also designed to look for differences between views, but there was no significant effect of Test View, F(4, 88) = 1, n.s., or any interaction, F(4, 88) = 0.7, n.s. Planned t-tests between all possible pairs of test views also showed no differences between any two views (all p > 0.1). Although there was no significant interaction between Test View and Sequence, differences appear larger for some views than for others (see Fig. 2). For the full-face view, where the difference was largest, a post hoc pairwise independent sample t-test was not significant, t(22) = 2.4, p > 0.01 (0.05/5 possible comparisons).

Response Times (RTs) for hits were also analyzed. The effect of Sequence approached significance, F(1, 21) = 3.8, p = 0.06 (1246 ms for animated and 1900 ms for random), but there was no effect of test view or any interaction (p > 0.1). The procedure was designed to avoid repetition of trials, and response times were therefore necessarily based on relatively few RTs per cell (the maximum of four being reduced by errors) and may not constitute meaningful averages; RTs are not reported for subsequent experiments.

The absence of an effect of Test View may be because the views used were all highly familiar and each could have provided sufficient information for recognition. In order to test this, the experiment was repeated using inverted (upside-down) views of the same faces. Face inversion is well known to adversely affect recognition (e.g. Yin, 1969), although it does not change the information in the image. A comparison of the upright and inverted experiments revealed a clear effect of orientation, F(1, 48) = 25.1, p < 0.05, with upright faces being more accurately matched (dL = 2.62) than inverted faces (dL = 1.27). This experiment using inverted faces also revealed a significant interaction between test view and sequence, F(4, 100) = 4.8, p < 0.05 (see Table 1): differences between test views were different for animated and random presentations.


Table 1
Mean dL for the repeat of Experiment 1 with all stimuli presented inverted. Standard errors are shown in brackets.

Test view
Sequence    Left profile    Left three-quarter    Full-face    Right three-quarter    Right profile
Animated    2.1 (0.4)       1.2 (0.5)             0.4 (0.4)    1.9 (0.4)              1.7 (0.5)
Random      0.5 (0.3)       1.0 (0.4)             1.9 (0.4)    1.8 (0.6)              0.1 (0.4)

For inverted faces, animated presentation appeared to favour views rotated with respect to the plane of the face, perhaps because it gave a better impression of 3D shape, while random presentation favoured the full-face. No main effect of test view independent of Sequence was observed, F(4, 100) = 1.6, n.s., suggesting that the views used do not inherently differ in recognizability. Instead, recognition of inverted faces showed that familiarity with upright faces improves performance, and also that the way views are presented affects subjects' ability to recover useful information from them. Views were found to be best learned when presented upright in an animated sequence, even though the same pictures containing the same information were always shown.

2.2. Experiment 2

It is possible that although all the views tested appear equally recognizable, they may not be equally good for learning. In Experiment 2, we sought to test generalization from single learned views, including how the pattern of generalization changes as a function of the view learned. In a design similar to Experiment 1, we presented only one view during learning. As there were no significant differences between test views in Experiment 1, any differences found here are likely to be a function of the view learned.

Generalization from a single view is difficult due to the very different image of the novel view. For some completely novel objects, generalization from a single view falls off with increasing angle of rotation (e.g. Edelman and Bülthoff, 1992). However, faces share particular properties (such as bilateral symmetry) which may change this basic pattern of generalization (Vetter et al., 1994). Relatedly, familiarity with faces may have led subjects to use class-based knowledge which can facilitate generalization even for novel faces. For example, class-based transformations may "predict" the effect of a rotation in depth on the projected image.

We expected patterns of generalization to depend trivially on the learned view, if only because picture recognition would be sufficient when the same view is presented at test. For example, generalization may simply depend on the angle of rotation from the learned view, with performance dropping off as rotation increases. However, supplementary viewpoint-dependent performance could illuminate the nature of viewpoint-dependent face recognition processes. If properties such as symmetry are important, we would expect some novel views to be better recognized than others. For example, left and right three-quarter and profile views might be expected to generalize well to each other because of symmetry. Thus, we sought to test for differences in generalization from single views to provide clues as to the information used.

2.2.1. Methods

2.2.1.1. Subjects
Thirty-six Japanese ATR employees who did not take part in Experiment 1 volunteered for this experiment.

2.2.1.2. Stimuli
Stimuli were the same as in Experiment 1: five views of each of 30 laser-scanned faces, divided into 20 targets and 10 distractors.

2.2.1.3. Procedure
Procedural details were the same as for Experiment 1, except for the learning phase. In Experiment 2, subjects were allocated at random to see one of the five possible static views in the learning stage. This view was presented for the same time of 1 s as the whole sequences shown in Experiment 1.

2.2.1.4. Design
The design was a 3 (Learning View) × 5 (Test View) mixed design. Learning View was a between-subjects factor with levels full-face, three-quarter or profile. Subjects were equally divided between these three groups. For three-quarter and profile views, half the subjects learned the right view and half the left. It was planned to test for differences between left and right views. The levels of test view were same-side profile, same-side three-quarter, full-face, other-side three-quarter, and other-side profile. For the full-face group, same-side views correspond to left views and other-side views to right. It was also planned to test whether generalization was better to the symmetric views, other profile or other three-quarter, than to other unlearned views for the learned profile and three-quarter groups, respectively.

2.2.2. Results and discussion
The results of Experiment 2 are plotted in Fig. 3. Overall, performance was lower in Experiment 2 (dL = 2.14) than in the animated upright condition of Experiment 1 (dL = 2.99). A linear contrast of left against right learned three-quarter and profile views, i.e. (left profile + left three-quarter)/2 vs. (right profile + right three-quarter)/2, showed no difference between learning left or right views, F(1, 31) = 0.17, n.s. (mean dL were 2.19 and 2.01, respectively). There were also no differences between left profile and right profile, t(5) = 0.27, n.s. (mean dL 1.89 and 1.64, respectively), or between left three-quarter and right three-quarter, t(5) = 0.10, n.s. (mean dL 2.48 and 2.38, respectively), and so these were collapsed as planned.


Fig. 3. The results of Experiment 2, generalization from a single view for "shape only" stimuli. Learning views (full-face, three-quarter and profile) are shown collapsed across left and right views. The conditions of test view were same-side profile, same-side three-quarter, full-face, other-side three-quarter, and other-side profile. For full-face, "same-side" views correspond to left views and "other-side" to right. Error bars show standard errors.


A 3 (Learning View) × 5 (Testing View) analysis of the collapsed data revealed no main effect of learned view (mean dL: full-face = 2.22, three-quarter = 2.43, and profile = 1.77), F(2, 33) = 1.3, n.s., contrary to the idea that one view was inherently best for learning. A significant Training View × Testing View interaction was found, F(8, 132) = 3.4, p < 0.05, confirming that the pattern of generalization to test views depended on learned view. For learned full-face there was an inverted U-shaped pattern of generalization, with performance falling off with increasing angle of rotation from the learned view (see also Schyns and Bülthoff, 1994). For the three-quarter view, the most noticeable feature is a peak for the opposite three-quarter, while for the profile view there was no such peak, generalization being equally poor to all unlearned views. Analysis of simple main effects of Test View showed effects for learned full-face, F(4, 132) = 3.9, p < 0.05, and three-quarter, F(4, 132) = 4.1, p < 0.05, but not for profile, F(4, 132) = 1.7, n.s., although the learned view appears best recognized. Planned linear contrasts showed that when three-quarter was learned, there was a significant advantage for other three-quarter over other unlearned views, F(1, 132) = 5.6, p < 0.05, but no such effect for other profile when profile was learned, F(1, 132) < 1, n.s.

Generalization from the three-quarter view was also tested in a repeat of this condition with inverted stimuli, to test a possible explanation of the reported symmetry effect in terms of familiarity with the upright three-quarter view. Mean dL and standard errors are shown in Table 2. Although performance was lower (dL = 1.58) than when the stimuli were presented upright (dL = 2.43), F(1, 24) = 6.1, p < 0.05, there was a clear effect of test view for the inverted stimuli, F(4, 52) = 16.0, p < 0.05.


Table 2
Mean dL for the repeat of Experiment 2 testing generalization from the inverted three-quarter view. Standard errors are shown in brackets.

Test view
Learned          Same profile    Same three-quarter    Full-face    Other three-quarter    Other profile
Three-quarter    0.75 (0.4)      3.13 (0.4)            0.94 (0.5)   2.27 (0.4)             0.80 (0.4)

The effect of test view was very similar to that for upright faces and did not interact with orientation, p > 0.1, n.s. Again, a planned comparison showed that other three-quarter was better recognized than the other unseen views, F(1, 52) = 8.7, p < 0.05. These data with inverted faces suggest that the pattern of generalization for three-quarter is not a function of familiarity with this view, but a reflection of the geometry of the face.

In summary of Experiment 2, although no training view produced significantly better learning than the other views, different patterns of viewpoint dependence were found for different learned views. In all conditions, performance dropped off sharply from the learned view. In the three-quarter condition, however, a generalization peak to the opposite three-quarter indicated that this view was recognized better than the other unseen views. This symmetric peak was not observed for the learned profile, although the opposite view also approximates a mirror image. We leave interpretation of this difference to the general discussion, after reporting how the addition of information about superficial properties to shape data affected generalization from single views.

2.3. Experiment 3

Experiment 3 tested the generalizability of our previous findings using more naturalistic stimuli. These included information about surface properties, for example color and texture, in addition to shape-based information (see Fig. 4). This information was encoded as 24-bit red/green/blue (RGB) data that was texture mapped onto the underlying 3D shape.
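As an illustration of what texture mapping adds (a sketch of the general technique, not the ALIAS™ pipeline the authors used), the snippet below attaches an RGB color to each 3D vertex by sampling a texture image at per-vertex UV coordinates; all arrays are placeholders.

```python
import numpy as np

def sample_texture(texture, uv):
    """Nearest-neighbor lookup of per-vertex colors.
    texture: (H, W, 3) uint8 RGB image; uv: (N, 2) coords in [0, 1]."""
    h, w, _ = texture.shape
    cols = np.clip((uv[:, 0] * (w - 1)).round().astype(int), 0, w - 1)
    rows = np.clip((uv[:, 1] * (h - 1)).round().astype(int), 0, h - 1)
    return texture[rows, cols]

# Placeholders: a 256 x 256 RGB texture and UVs for 1000 mesh vertices.
texture = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
uv = np.random.rand(1000, 2)
vertex_colors = sample_texture(texture, uv)  # (1000, 3) RGB per vertex
```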

Fig. 4. Example of the texture mapped stimuli used in Experiment 3. The face is shown in left profile, left three-quarter, full-face, right three-quarter and right profile. In the experiment, stimuli were presented in color.


Shape-based information is central to most theories of object recognition (e.g. Marr and Nishihara, 1978; Biederman, 1987; Ullman, 1989; Poggio and Edelman, 1990). For example, object recognition appears as good in tasks using simple line drawings as in tasks using full color photographs (Biederman and Ju, 1988). However, when the items to be distinguished are perceptually similar, and the categorization tasks require greater differentiation within stimuli, information about superficial properties has been found to facilitate performance (Price and Humphreys, 1989; Wurm et al., 1993). Given that faces constitute a perceptually highly homogeneous category and that their recognition requires fine within-class discriminations to be made, superficial cues would be expected to facilitate performance. Unlike other objects, faces are better recognized from photographs than line drawings (Davies et al., 1978). Also, superficial information of the type used here has been shown to affect categorization tasks such as sex and race judgements (Hill et al., 1995).

We expected that the addition of superficial information to the shapes of faces would facilitate generalization across viewpoints for the following reasons. Some of this information (e.g. skin color) may be viewpoint invariant and available from all the views. Such information would be expected to raise performance in all views, without affecting patterns of generalization. Other superficial information may depend on viewpoint, such as information about pigmented features visible only in some views. If critical, such information might affect patterns of generalization. Thus, Experiment 3 again tested generalization from single views but with additional superficial information available for the task.

2.3.1. Methods

2.3.1.1. Subjects
Fifty-four Japanese ATR employees volunteered their time to participate in the experiment. Subjects had not taken part in Experiments 1 or 2.

2.3.1.2. Stimuli
The face data were the same 30 laser-scanned faces, but with 24-bit RGB video data, recorded at the same time and from the same face, texture mapped onto the three-dimensional models using the ALIAS™ solid modelling software (see Fig. 4).

2.3.1.3. Design and procedure
These were identical to Experiment 2, except that presentation time was reduced to 500 ms for learning views after pilot work showed that a presentation time of 1 s resulted in ceiling performance.

2.3.2. Results and discussion
The results of Experiment 3 are summarized in Fig. 5. Overall, despite the reduced presentation time, performance was better in this experiment (dL = 2.44) than in the "shape only" experiment (dL = 2.14). This, together with the ceiling performance in pilot work, suggests that superficial information facilitates performance on this task, generalization from single views. However, the necessitated reduction in presentation time led to similar levels of performance and makes strong conclusions difficult to draw.


Fig. 5. Results of Experiment 3, generalization from a single view for texture mapped stimuli.


Initial analyses again showed no differences between equivalent left and right learning views: combined left profile and left three-quarter vs. combined right profile and right three-quarter, F(1, 49) = 0.28, n.s.; left profile vs. right profile, F(1, 49) = 0.14, n.s.; or left three-quarter vs. right three-quarter, F(1, 49) = 1.2, n.s. (mean dL were left profile = 2.67, left three-quarter = 2.36, right three-quarter = 2.82 and right profile = 2.52), and so these were collapsed as planned.

A 3 (Learning View) × 5 (Testing View) analysis gave a Learned View × Test View interaction, F(8, 204) = 7.2, p < 0.05, confirming that the pattern of generalization was different for different learned views. There was no overall effect of learning view, F(2, 51) = 1.7, n.s., but there were simple main effects of test view for learned profile, F(4, 204) = 9.7, p < 0.05, and full-face, F(4, 204) = 7.6, p < 0.05, but not three-quarter, F(4, 204) = 1.4, n.s. For full-face and profile, effects of test view were found mainly because these views did not generalize well to each other, suggesting the use of some superficial information invariant only over a limited angle of rotation. Information about pigmented or textured features, like the eyes and eyebrows, which appear very different in full-face and profile views, is a possible candidate. For learned full-face there was an unexpected asymmetry but, in general, performance fell off with increasing angle of rotation. For learned profile, repeated views were especially well recognized and there appears to be some advantage for the opposite profile. There was some evidence that the other profile was recognized better than full-face, t(17) = 2.7, p < 0.05, and other three-quarter, t(17) = 1.9, p < 0.05, although this would not be significant if correction were made for the number of possible post hoc pairwise comparisons between test views (ten). Reflectionally invariant superficial information would account for this. Generalization from three-quarter did not depend on test view. This is consistent with this view combining properties of the full-face and the profile view, resulting in an overlap in the information available for recognition. That there was no advantage for the other three-quarter view here reflects the improvement in performance for the full-face and profile views rather than any fall-off for this view. However, performance was not higher overall for the learned three-quarter view.

In order to compare Experiments 2 and 3, both were combined into a single analysis with Experiment as a between-subjects factor. This analysis gave a Test View × Learned View interaction, F(8, 336) = 8.7, p < 0.05, as for the individual analyses. However, there was no effect of Experiment or any interactions involving this factor (all p > 0.1). The absence of a main effect of Experiment can be attributed to the reduction in presentation time: performance was at an equivalent level despite this. The lack of higher-order interactions emphasizes the similarity of the Learned View × Test View interaction between experiments. This may be because the shape information common to both experiments is the primary determinant of performance. The three-dimensional shape will determine what superficial information is available, as well as what actual cues to shape are available in the image.

In summary, Experiment 3 showed that the addition of superficial information left the pattern of generalization from single views substantially the same, depending on an interaction between learned and tested view. However, there were some differences. The level of performance was at least as good despite the reduction in the presentation time allowed for learning. Also, generalization from the three-quarter view was viewpoint invariant, and there was some evidence for better generalization between profile views. The possible nature of the information added will be considered in the general discussion, together with possible mechanisms for shape-based generalization.

3. General discussion

The results of the three experiments presented in this paper can be summarized as follows. Experiment 1 showed that when all the views were learned, all were equally well recognized, at least when presented upright. However, presenting exactly the same views in an animated sequence led to better performance than random presentation. Experiments 2 and 3 showed that generalization from a single view was dependent on the view learned. The similarity of the patterns of generalization between Experiments 2 and 3 suggests that the shape information common to both determined the effect of viewpoint on both the shape and superficial cues. Addition of information about superficial properties did support good generalization across viewpoint despite reduced presentation time.


The lack of evidence for a canonical view is not inconsistent with previous evidence which suggests that any three-quarter advantage is of limited generality (Bruce et al., 1987). Inverted face stimuli suggested that the absence of a preferred view is in part a result of the familiarity of subjects with upright viewpoints. Although the faces used were novel, knowledge of upright faces facilitates the processing of all the views used, reducing any differences between them. In fact, when the views were presented inverted they were not equally well recognized, the pattern of differences being determined by the sequence in which they were learned.

The effect of presentation in both upright and inverted conditions shows that performance is not a simple function of the information in particular views, which is a function of the image and remains the same under all the different conditions of presentation. Animated presentation provided additional cues to shape (structure-from-motion cues) which may contribute to a fully 3D representation, or a partial 2.5D sketch constructed from view-dependent shape information (Marr, 1982).

In Experiments 2 and 3, performance was found to be determined by the information in the view learned. In both experiments, generalization was trivially viewpoint dependent in that the learned view was subsequently best recognized. With shape only stimuli, generalization from full-face fell off with increasing angle of rotation (a phenomenon clearly consistent with the viewpoint dependence observed for other objects, e.g. Tarr and Pinker, 1989; Bülthoff and Edelman, 1992); for the profile, generalization was equally bad to all unlearned views, but the three-quarter showed a peak for the opposite three-quarter. This pattern for the three-quarter view was replicated with inverted stimuli, suggesting it is not dependent on familiarity, although the overall drop in performance suggests that face-specific knowledge does facilitate single view generalization. While three-quarter views share reflectionally invariant properties (which would be maintained under inversion and might inform recognition), so do the profiles, which did not generalize well to each other, at least for shape only stimuli.

There appear to be two possible explanations for the difference between three-quarter and profile views in this respect. The first is in terms of Poggio and Vetter's (1992) hypothesis concerning "virtual views". These can be generated from a single view of an object by transformations in the image plane for objects with bilateral (or greater) symmetry, like faces. Such views have been found to be well recognized, just as the other three-quarters were here (Vetter et al., 1994). One condition for the generation of virtual views is that pairs of symmetric feature points be identifiable in the image, as it is to these that the transformations are applied. While this is perfectly possible for the three-quarter view, it is not possible for the profile because the symmetric feature points will occlude each other (such a restriction would apply to other bilaterally symmetric objects). Thus, generalization using symmetry from image data could only be possible with the three-quarter (such generalization would also be possible from the profile, but only if knowledge about faces as a class was added to an algorithm which could then infer the location of the occluded points from the image data).
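A minimal sketch of the geometry behind this argument, our illustration rather than Poggio and Vetter's algorithm: for a bilaterally symmetric object under orthographic projection, the image of the view at −θ can be obtained from the view at +θ by mirroring the image and swapping each feature point with its symmetric partner. The toy example below demonstrates this, and shows why the profile (θ = 90°) is degenerate: each point projects onto the same location as its mirror partner, so the symmetric pairs the transformation needs cannot be separately identified.

```python
import numpy as np

def project(points, theta_deg):
    """Orthographic image of 3D points rotated by theta about the
    vertical (y) axis: keep (x, y) after rotation."""
    t = np.radians(theta_deg)
    x, y, z = points.T
    return np.column_stack((x * np.cos(t) + z * np.sin(t), y))

# Toy bilaterally symmetric "face": each point (x, y, z) is paired
# with its mirror (-x, y, z).
half = np.array([[1.0, 0.5, 0.2], [0.8, -0.3, 0.6], [0.3, 0.9, 0.1]])
points = np.vstack((half, half * [-1, 1, 1]))   # pairs: i and i + 3

view_45 = project(points, 45)                   # right three-quarter

# Virtual left three-quarter: mirror the image (negate x) and swap
# each point with its symmetric partner.
swap = np.r_[3:6, 0:3]
virtual_minus_45 = (view_45 * [-1, 1])[swap]
assert np.allclose(virtual_minus_45, project(points, -45))

# Profile (90 deg): symmetric partners project to the same location,
# so the pairs needed for the transformation occlude each other.
view_90 = project(points, 90)
print(np.allclose(view_90[:3], view_90[swap][:3]))  # True
```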


An alternative explanation is that the profile view is an "accidental" view in a way that the three-quarter is not (Lowe, 1987). Moderate rotations away from the three-quarter view will not result in qualitative changes in the 2D three-quarter projections of the corresponding 3D face. Thus, this view may preserve 2D properties for recovering 3D information, such as colinearity, curvilinearity, parallelism, symmetry and cotermination. However, the particular orientations of profile and full-face relative to the plane of symmetry are such that some of these image properties will be accidental properties of the viewpoint: for example, the lining up of symmetrical points in the profile. Non-accidental properties may therefore provide reflectionally invariant cues supporting generalization from three-quarter to other three-quarter, but not to the accidental viewpoints, full-face and profile.

The addition of information about superficial properties in Experiment 3 clearly affected generalization from single views. Performance was slightly better overall despite reduced presentation time, showing that superficial information provides useful cues for recognition across changes in viewpoint. Some of these cues, for example skin color, may be viewpoint invariant and could inform generalization across any change in view where such information was visible. However, generalization between full-face and profile views was poor, suggesting that some superficial cues are viewpoint dependent. Information about pigmented features, like the eyes and eyebrows, which appears very different in these two views, would be a possible candidate. Also, the similarity in the patterns of generalization between Experiments 2 and 3 suggests that the underlying 3D shape information remains critical, although adding superficial information did affect patterns of generalization. For the learned three-quarter, performance was viewpoint independent when superficial cues were available. Superficial properties may support generalization across limited rotations in depth when, for example, the same pigment-defined cues would be visible. Also, there was some evidence for enhanced generalization from profile to other profile, suggesting that some useful superficial cues are reflectionally invariant.

In summary, while all the views used appear to provide sufficient information for recognition, at least when upright, generalization from individual views is dependent on learned viewpoint. Across all experiments, generalization from a learned full-face fell off with increasing angle of rotation, while the three-quarter view showed a peak in generalization performance for the opposite symmetric view. This effect appears less strong for the profile view, probably because of the particular relationship of this view to the axis of symmetry. Availability of information about superficial properties appears to reduce viewpoint dependence, although underlying three-dimensional shape information remains the primary determinant of patterns of generalization. Structure-from-motion as well as static two-dimensional cues appears to facilitate the perception of this three-dimensional shape information. In conclusion, and to answer the question posed at the beginning of the paper regarding recognizing someone from a picture, a colored three-quarter view of a person's face may give the best chance of recognizing that person from a novel viewpoint.


Acknowledgments

This research describes a full-fledged version of the pilot experiments presented in Schyns and Bülthoff (1994), MIT AI Memo #1432 and MIT CBCL Paper #81, which was also presented at the XVI Meeting of the Cognitive Science Society, Atlanta. Some of the work presented here was also reported at ECVP 1995. The experiments were done at ATR Human Information Processing Laboratories, when Harold Hill was a research associate and Philippe Schyns an invited scientist. The authors would like to thank Vicki Bruce, Tomaso Poggio, Mike Tarr and three anonymous reviewers for their insightful comments and discussions.

References

Adini, Y., Moses, Y. and Ullman, S., 1994. Face recognition: the problem of compensating for changes in illumination direction. Technical Report CS93-21. Rehovot: Weizmann Institute of Science.
Biederman, I., 1987. Recognition-by-components: A theory of human image understanding. Psychological Review, 94, 115–147.
Biederman, I. and Gerhardstein, P.C., 1993. Recognizing depth rotated objects: Evidence and conditions for three-dimensional viewpoint invariance. Journal of Experimental Psychology: Human Perception and Performance, 18, 121–133.
Biederman, I. and Gerhardstein, P.C., 1995. Viewpoint dependent mechanisms in visual object recognition: Reply to Tarr and Bülthoff (1995). Journal of Experimental Psychology: Human Perception and Performance, 21, 1506–1514.
Biederman, I. and Ju, G., 1988. Surface vs. edge-based determinants of visual recognition. Cognitive Psychology, 20, 38–64.
Brigham, J.C., 1986. The influence of race on face recognition. In: H.D. Ellis, M.A. Jeeves, F. Newcombe and A.W. Young (Editors), Aspects of Face Processing. Martinus Nijhoff, Dordrecht.
Bruce, V., 1982. Changing faces: Visual and non visual coding processes in face recognition. British Journal of Psychology, 73, 105–116.
Bruce, V. and Humphreys, G., 1994. Recognising objects and faces. Visual Cognition, 1(2/3), 141–180.
Bruce, V., Valentine, T. and Baddeley, A., 1987. The basis of the 3/4 view advantage in face recognition. Applied Cognitive Psychology, 1, 109–120.
Bülthoff, H.H. and Edelman, S., 1992. Psychophysical support for a two-dimensional view theory of object recognition. Proceedings of the National Academy of Sciences, USA, 89, 60–64.
Davies, G.M., Ellis, H.D. and Shepherd, J.W., 1978. Face recognition accuracy as a function of mode of representation. Journal of Applied Psychology, 63, 180–187.
de Renzi, E., 1986. Current issues in prosopagnosia. In: H.D. Ellis, M.A. Jeeves, F. Newcombe and A.W. Young (Editors), Aspects of Face Processing. Martinus Nijhoff, Dordrecht.
Diamond, R. and Carey, S., 1986. Why faces are not special: An effect of expertise. Journal of Experimental Psychology: General, 115, 107–117.
Edelman, S. and Bülthoff, H.H., 1992. Orientation dependence in the recognition of familiar and novel views of three-dimensional objects. Vision Research, 32, 2385–2400.
Ellis, H. and Young, A., 1986. Specificity. In: H.D. Ellis, M.A. Jeeves, F. Newcombe and A.W. Young (Editors), Aspects of Face Processing. Martinus Nijhoff, Dordrecht.
Farah, M.J., 1991. Patterns of co-occurrence among the associative agnosias: Implications for visual object representation. Cognitive Neuropsychology, 8, 1–19.
Farah, M.J., Rochlin, R. and Klein, K.L., 1994. Orientation invariance and geometric primitives in shape recognition. Cognitive Science, 18, 325–344.
Goren, C., Sarty, M. and Wu, R., 1975. Visual following and pattern discrimination of face-like stimuli by new-born infants. Paediatrics, 56, 544–549.
Hill, H., Bruce, V. and Akamatsu, S., 1995. Perceiving the sex and race of faces: the role of shape and colour. Proceedings of the Royal Society of London, B 261, 367–373.
Hill, H. and Bruce, V., 1996. Effects of lighting on the perception of facial surfaces. Journal of Experimental Psychology: Human Perception and Performance, 22, 986–1004.
Krouse, F.L., 1981. Effects of pose, pose change, and delay on face recognition performance. Journal of Applied Psychology, 66, 651–654.
Logie, R.H., Baddeley, A.D. and Woodhead, M.M., 1987. Face recognition, pose, and ecological validity. Applied Cognitive Psychology, 1, 53–69.
Logothetis, N.K., Pauls, J., Bülthoff, H.H. and Poggio, T., 1994. Viewpoint-dependent object recognition by monkeys. Current Biology, 4, 401–414.
Lowe, D., 1987. Three-dimensional object recognition from single two-dimensional images. Artificial Intelligence, 31, 355–395.
Marr, D., 1982. Vision. Freeman, San Francisco, CA.
Marr, D. and Nishihara, H., 1978. Representation and recognition of the spatial organization of three-dimensional shapes. Proceedings of the Royal Society of London, B 200, 269–294.
Morton, J. and Johnson, M., 1991. The perception of facial structure in infancy. In: G.R. Lockhead and J.R. Pomerantz (Editors), The Perception of Structure. American Psychological Association, Washington.
McCarthy, R.A. and Warrington, E.K., 1986. Visual associative agnosia: A clinico-anatomical study of a single case. Journal of Neurology, Neurosurgery and Psychiatry, 49, 1233–1240.
Palmer, S., Rosch, E. and Chase, P., 1981. Canonical perspective and the perception of objects. In: J. Long and A. Baddeley (Editors), Attention and Performance IX. Lawrence Erlbaum, Hillsdale, NJ.
Palmer, S., 1983. The psychology of perceptual organization: A transformational approach. In: J. Beck, B. Hope and A. Rosenfeld (Editors), Human and Machine Vision. Academic Press, New York.
Perrett, D., Hietenan, J., Oram, M. and Benson, P., 1992. Organization and function of cells responsive to faces in the temporal cortex. Philosophical Transactions of the Royal Society of London, B 335, 23–30.
Perrett, D.I., Oram, M.W., Harries, M.H., Bevan, R., Benson, P.J. and Thomas, S., 1991. Viewer-centred and object-centred coding of heads in the macaque temporal cortex. Experimental Brain Research, 86, 159–173.
Perrett, D.I., Smith, P.A.J., Potter, D.D., Mistlin, A.J., Head, A.S., Milner, A.D. and Jeeves, M.A., 1985. Visual cells in the temporal cortex sensitive to face view and gaze direction. Proceedings of the Royal Society of London, B 233, 293–317.
Poggio, T. and Edelman, S., 1990. A network that learns to recognize three-dimensional objects. Nature, 343, 263–266.
Poggio, T. and Vetter, T., 1992. Recognition and structure from one 2D model-view: Observations on prototypes, object classes, and symmetries. AI MEMO No. 1347, Artificial Intelligence Laboratory, MIT, Cambridge, MA.
Price, C. and Humphreys, G., 1989. The effects of surface detail on object categorization and naming. Quarterly Journal of Experimental Psychology, 41A, 797–828.
Rock, I. and Di Vita, J., 1987. A case of viewer-centered object representation. Cognitive Psychology, 19, 280–293.
Schyns, P.G. and Bülthoff, H.H., 1994. Viewpoint dependence and face recognition. Proceedings of the XVI Meeting of the Cognitive Science Society, 789–793.
Schyns, P.G. and Murphy, G.L., 1994. The ontogeny of part representation in object concepts. The Psychology of Learning and Motivation, 31, 305–349.
Shepard, R.N. and Cooper, L.A., 1982. Mental Images and their Transformations. MIT Press, Cambridge, MA.
Snodgrass, J.G. and Corwin, J., 1988. Pragmatics of measuring recognition memory: Applications to dementia and amnesia. Journal of Experimental Psychology: General, 117(1), 34–50.
Tarr, M.J. and Bülthoff, H.H., 1995. Is human object recognition better described by geon-structural-descriptions or by multiple-views? Comment on Biederman and Gerhardstein (1993). Journal of Experimental Psychology: Human Perception and Performance, 21, 1494–1505.
Tarr, M.J. and Pinker, S., 1989. Mental rotation and orientation-dependence in shape recognition. Cognitive Psychology, 21, 233–282.
Tarr, M.J. and Pinker, S., 1990. When does human object recognition use a viewer-centered reference frame? Psychological Science, 1, 253–256.
Troje, N.F. and Bülthoff, H.H., 1995. Face recognition under varying pose: The role of texture and shape. Technical Report No. 17, Max Planck Institut für biologische Kybernetik, Arbeitsgruppe Bülthoff.
Ullman, S., 1989. Aligning pictorial descriptions: An approach to object recognition. Cognition, 32, 193–254.
Ullman, S. and Basri, R., 1991. Recognition by linear combinations of models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13, 992–1005.
Vetter, T., Poggio, T. and Bülthoff, H.H., 1994. The importance of symmetry and virtual views in three-dimensional object recognition. Current Biology, 4, 18–23.
Watanabe, Y. and Suenaga, Y., 1991. Synchronized acquisition of three-dimensional range and color data and its applications. In: M. Patrikalakis (Editor), Scientific Visualization of Physical Phenomena. Springer-Verlag, Tokyo.
Wurm, L., Legge, G.E., Isenberg, L.M. and Luebker, A., 1993. Color improves object identification in low and normal vision. Journal of Experimental Psychology: Human Perception and Performance, 19, 899–911.
Yin, R., 1969. Looking at upside-down faces. Journal of Experimental Psychology, 81, 141–145.