EEG Analysis of Implicit Human Visual Perception

Maryam Mustafa, TU Braunschweig

[email protected]

Lea Lindemann, TU Braunschweig

[email protected]

Marcus Magnor, TU Braunschweig

[email protected]

ABSTRACT

Image Based Rendering (IBR) allows interactive scene exploration from images alone. However, despite considerable development in the area, one of the main obstacles to better quality and more realistic visualizations is the occurrence of visually disagreeable artifacts. In this paper we present a methodology to map out the perception of IBR-typical artifacts. This work presents an alternative to traditional image and video quality evaluation methods by using an EEG device to determine the implicit visual processes in the human brain. Our work demonstrates the distinct differences in the perception of different types of visual artifacts and the implications of these differences.

Author Keywords

Perception; artifacts; EEG; visual processing; HVS

ACM Classification Keywords

H.5.m. Information Interfaces and Presentation (e.g. HCI): Miscellaneous.

INTRODUCTION

The advantage of perceptually based graphics has been apparent for a long time, and while a considerable amount of work has been done in measuring conscious, cognitive processing [13, 2, 1], much less has been done in Computer Graphics to take advantage of covert (brain) measurements and visual processing. In this paper we present a new approach that uses an electroencephalograph (EEG) to interface with the human brain and measure the output of implicit (covert) visual processing in the brain in response to artifacts in image sequences. The way videos are perceived by people is becoming increasingly important in visual media. Image and video based rendering techniques allow for the creation of complicated 3D scenes and videos from a few images. The ubiquitous use of 3D cinema, affordable display technology, and the merging of real-world scenes with computer graphics allow for the creation and pervasive use of realistically rendered images and videos for movies such as Avatar. Similarly, applications like Google Street View use a sparse set of images to create complex visualizations, and Microsoft Photosynth uses image-based rendering to transition between images

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. CHI'12, May 5–10, 2012, Austin, Texas, USA.

Copyright 2012 ACM 978-1-4503-1015-4/12/05...$10.00.

Figure 1. Left: ghosting. Right: dashed: area for static blurring and popping; solid: moving frame for blurring and popping on person.

[12]. However, one of the main areas that still requires a lot of research is the assessment and perception of the rendered output. Rendering systems can now closely approximate a real scene, but physical or statistical accuracy does not necessarily amount to visual accuracy. During rendering, visually objectionable artifacts can arise which limit the application of most rendering algorithms. The most common artifacts that occur in rendered images are ghosting, blurring and popping [13]. In this paper we perform an experiment to analyze the perception of these artifacts. Traditionally, the quality of a video has been judged either by user studies or by quality assessment algorithms. Apart from requiring a large number of participants, user studies can at best only measure the explicit output of the visual cognitive process [5]. Quality ratings obtained by user studies are always filtered by some decision process which, in turn, may be influenced by the task and/or rating scale the participants are given [9]. The subjective judgment of a viewer is often biased by external factors such as mood, expectation or past experience. While current work in computer vision and graphics focuses on the explicit (overt) output of the human visual system (HVS), we propose to use the implicit (covert) processing of the HVS [4] to determine the perception and quality of a video. There is extensive literature dealing with implicit and explicit processing in the human brain [10]. Koch and Tsuchiya [4] discuss the evidence showing that visual processing can occur without conscious perception or attention and that conscious awareness of a stimulus is preceded by complex visual decision-making processes. We propose to use this covert processing in the human brain to assess the quality of a video with artifacts.
In our study we also specifically chose to look at motion in rendered sequences, as motion plays an important role in perception owing to effects such as speed and direction of motion, visual tracking of moving objects, and motion saliency. Given the complexity of the HVS in perceiving motion, traditional systems based on it have not been able to effectively model temporal aspects of human vision [9].

There are several advantages to developing methodologies that merge covert human visual processing with traditional computer graphics and vision techniques. First, the analysis of visual information processing that occurs in the absence of conscious attention will allow boosting of traditional masking and rendering algorithms and introduce the robustness and flexibility associated with human vision. Second, such an analysis of the covert visual processes reveals aspects of how these artifacts are viewed by the HVS that have not yet been accurately modeled by computer vision algorithms. Given that it is more important for rendered images and videos to be perceptually accurate as opposed to physically accurate, rendering times can be shortened by eliminating computations that calculate image features that are not apparent to the human eye [8]. Computer graphics algorithms can take advantage of the limitations of the human eye to compute just enough detail in an image or video to convince the observer without a perceived deterioration in quality. Specifically, our main contribution is a methodology for the analysis of artifacts in videos using an EEG to interface with the human brain and mine the covert visual processing.

In this paper we present the design and results of our experiments with video stimuli containing five different types of artifacts. We present an analysis and comparison of the covert visual decision making about the video quality obtained from the EEG with the conscious video quality judgment obtained via direct user feedback, which serves as initial proof of concept for our ideas.

RELATED WORK

There has been recent interest in studying visual processing for image rendering and analysis techniques [2]. However, very little work has been done on using implicit visual decision-making processes for video assessment. Shenoy et al. [10] present the idea of Human-Aided Computing, which uses an EEG to label images implicitly. They use brain processes to show that users can implicitly categorize pictures based on content. Their work, however, required users to memorize the images and to be attentive to the content viewed. Our work looks to analyze the implicit visual processing behind viewing videos with motion rather than static images. Similarly, Vangorp et al. [13] conduct psychophysical experiments to understand the perception of artifacts in synthetic rendering. They looked at the user feedback from rendered sequences that moved over facades of buildings. This work focused on the output of conscious visual cognitive processes.

Recently, Lindemann et al. [6, 7] have been using an EEG to assess the quality of compressed images and video artifacts. They reported that when participants were shown different images of decreasing quality, their EEG results showed corresponding detected changes in image quality. Their work showed that the brain response varied with the image compression value.

EXPERIMENT

This experiment was designed to measure the covert (implicit) visual processing associated with three basic types of artifacts that typically occur in image based rendering. 8 (3 male, 5 female) healthy participants with an average age of 25 and with normal or corrected-to-normal vision took part in the experiment. All participants had average experience with digital footage and no involvement in professional image/video rendering or editing. The basic stimulus for the experiment was a 5.6 second video (resolution: 1440×1024, 30 fps) of a person walking along a park trail from left to right. The occurrence of the artifact was delayed by 4 frames (≈132 ms) to avoid locking the participants' attention to a fixed time. Five different kinds of artifacts were incorporated into the scene. These artifacts included both temporal and spatial aspects. The following 6 test cases were shown (Figure 1):

1. Popping on Person: a small rectangular area (marked with a solid line in Figure 1) containing the walking person freezes for one frame.

2. Popping: a static rectangular area of the image (marked with a dashed line in Figure 1) freezes for one frame.

3. Blurring on Person: a small rectangular area containing the walking person is blurred with a Gaussian kernel with a size of 15 pixels in 10 successive frames. The blurring area moves along with the motion of the person.

4. Blurring: a static rectangular area (Figure 1) in the center of the scene is blurred with a Gaussian kernel with a size of 15 pixels in 10 successive frames.

5. Ghosting on Person: a partly transparent silhouette of the person stays behind for 10 frames, fading to invisibility in the last 5 frames (left part of Figure 1).

6. Ground truth: No artifacts.
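As an illustration, the blurring and popping manipulations above can be sketched in a few lines of NumPy/SciPy. This is a minimal sketch, not the authors' implementation: the function names, region coordinates, and the sigma chosen to stand in for the 15-pixel kernel are our own assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def blur_region(frame, top, left, height, width, sigma=2.5):
    """Return a copy of `frame` with a Gaussian blur applied to one
    rectangular region, leaving the rest of the image untouched.
    (sigma ~ 2.5 is an assumed stand-in for the paper's 15-pixel kernel.)"""
    out = frame.copy()
    region = frame[top:top + height, left:left + width]
    out[top:top + height, left:left + width] = gaussian_filter(region, sigma=sigma)
    return out

def freeze_frame(frames, t, top, left, height, width):
    """Simulate 'popping': the region in frame t+1 is replaced by the
    same region from frame t, i.e. the region freezes for one frame."""
    frames = [f.copy() for f in frames]
    frames[t + 1][top:top + height, left:left + width] = \
        frames[t][top:top + height, left:left + width]
    return frames
```

For the "on Person" variants, the same operations would be applied to a rectangle whose coordinates track the walking person from frame to frame.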

One trial consisted of a ready screen followed by the video with artifacts, which was immediately followed by the quality assessment screen. Participants were instructed to follow the moving person with their gaze and rate the quality of every test case on an integer 1 (worst) to 5 (best) mean opinion score (MOS) scale [3]. The participants were not informed about the presence of artifacts in the videos.

They were instructed orally and received a training session in which every one of the 6 videos was shown 3 times. This prepared them for the procedure and showed the whole range of available video qualities. During the main experiment all videos were shown 30 times, resulting in 180 trials per participant. The videos were played in a block-wise randomized order and the same video was not shown twice in a row.
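A block-wise randomized order with no immediate repeats can be generated as follows. This is a hypothetical helper: the paper does not specify its randomization code, only the constraints it satisfied.

```python
import random

def block_randomized_order(conditions, n_blocks, seed=None):
    """Block-wise randomization: each block is one shuffled copy of all
    conditions, and the same condition never appears twice in a row
    (a block is re-shuffled if its first item repeats the previous trial)."""
    rng = random.Random(seed)
    order = []
    for _ in range(n_blocks):
        block = list(conditions)
        rng.shuffle(block)
        while order and block[0] == order[-1]:
            rng.shuffle(block)
        order.extend(block)
    return order
```

With the 6 test cases and 30 blocks this yields the 180 trials per participant described above, each condition appearing exactly 30 times.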

An EEG was recorded with 32 electrodes attached according to the international 10-20 system. Additionally, a 4-channel EOG and the mastoids were recorded; the EOG served as a reference for removing data with accidental eye movements. The recorded data were referenced to the mastoids and filtered with a high-pass filter with a cutoff frequency of 0.1 Hz to remove DC offset and drifts. Trials of 1.2 seconds length, time-locked to the artifact occurrence, were extracted from the continuous data. All trials with blinks, severe eye movements or too many alpha waves were manually removed.
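The filtering and epoching steps above can be sketched as follows. This is a minimal sketch assuming NumPy/SciPy; the filter order and function names are our assumptions, as the paper does not state which analysis software was used.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess_and_epoch(eeg, fs, event_samples, epoch_s=1.2, hp_hz=0.1):
    """High-pass filter the continuous recording to remove DC offset and
    slow drifts, then cut fixed-length epochs time-locked to each
    artifact onset.

    eeg           : (n_channels, n_samples) continuous data
    fs            : sampling rate in Hz
    event_samples : sample indices of artifact onsets
    """
    # Zero-phase Butterworth high-pass, cutoff 0.1 Hz as in the paper
    # (order 2 is an assumption).
    b, a = butter(2, hp_hz / (fs / 2), btype="highpass")
    filtered = filtfilt(b, a, eeg, axis=1)

    n = int(round(epoch_s * fs))
    epochs = [filtered[:, s:s + n] for s in event_samples
              if s + n <= filtered.shape[1]]
    return np.stack(epochs)  # (n_trials, n_channels, n_samples)
```

Rejection of trials contaminated by blinks or alpha activity would then operate on the returned epoch array before averaging.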

Figure 2. The ERPs for artifacts Popping and Popping on Person, Blurring, Blurring on Person, and no artifact for comparison.

Figure 3. The ERPs for artifacts Ghosting and no artifact for comparison.

The eye movements from watching the video were the same for all trials and all artifacts over all participants. All videos were exactly the same, including the reference video (ground truth), and induced identical eye movements. Due to the identical nature of the stimulus the results can be compared.

RESULTS

Figures 2 and 3 show the different ERPs, averaged over all participants, all trials, and electrodes PO4, PO3 and Oz, compared with the no-artifact ERP; time 0 corresponds to the appearance of the artifact. Firstly, all artifacts were detected by the brain, albeit with varying strength. The artifact which evoked the greatest response was popping on the moving person, which has a latency of 264 ms and reaches a maximum amplitude of 5.758 µV. This is followed closely by blurring on the moving person. Static popping and blurring evoke a smaller response. Popping is a more obviously perceived artifact and evokes a quicker response. Ghosting, however, requires the brain to process the perceived distortion before a response occurs. This latency due to processing of the perceived stimuli is also seen with blurring, which is also a less obvious artifact. Table 1 shows the detailed latency and maximum potential responses for all artifacts. From both the ERP figures and Table 1, the second result that becomes clear is the difference in the perception of artifacts related to motion and those independent of motion. Both popping and blurring linked with the motion of the person produce a much larger response potential than popping and blurring not linked with the motion of the person.
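Peak latencies and amplitudes of this kind can be read off the trial-averaged ERP roughly as follows. This is a hedged sketch; the paper does not describe its exact peak-detection procedure.

```python
import numpy as np

def erp_peak(epochs, fs, t0_sample=0):
    """Grand-average ERP and its peak for one electrode (or an average
    over electrodes such as PO3, PO4 and Oz).

    epochs    : (n_trials, n_samples) array of single-trial data
    fs        : sampling rate in Hz
    t0_sample : sample index of artifact onset (time 0)
    Returns (peak_latency_ms, peak_amplitude)."""
    erp = epochs.mean(axis=0)  # average over trials
    peak = int(np.argmax(np.abs(erp[t0_sample:]))) + t0_sample
    latency_ms = (peak - t0_sample) / fs * 1000.0
    return latency_ms, float(erp[peak])
```

Applied per condition, this yields value pairs like the 264 ms / 5.758 µV reported for popping on the moving person.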

Figure 4. Participants' average responses to the videos.

Figure 4 shows an average of the participant responses for all trials for each video. As can be seen, participants rated the quality of the video with the ghosting artifact as the best after ground truth. This contradicts the ERPs: although the latency of both blurring and ghosting was the same, the ghosting artifact evoked a response with a maximum potential of 3.082 µV as opposed to 2.681 µV for blurring. The most obvious difference between the explicit quality rating and the implicit brain reaction can be seen with the Popping on Person and Popping artifacts. Participants rated the two with almost the same rating (1-1.5), whereas the ERPs show a marked difference in the response of the brain to either artifact. The same difference is also observed in the two blurring artifacts. The participants rated them both equally, whereas the ERPs show a marked difference between the two (5.58 µV and 2.68 µV) with the same latency.

We also ran a standard two-tailed t-test to confirm the validity of the curves shown in Figures 2 and 3. For all artifact cases the two-tailed t-test was run for the stimulus time period against the same ground-truth time period. In all cases the null hypothesis was rejected. The p-values were: popping p = 4.5 × 10⁻⁹, popping on person p = 1.1 × 10⁻³⁶, ghosting p = 5.89 × 10⁻⁵, blurring p = 0.0031 and blurring on person p = 3.14 × 10⁻¹³. Given the rejection of the null hypothesis in all cases and the fact that the probabilities are all less than 0.05 (5%), there is sufficient evidence for the statistical significance of the results. We also ran the two-tailed t-test between popping and popping on person, and between blurring and blurring on person. Both of these tests also rejected the null hypothesis, with p-values of p = 6.3 × 10⁻¹² and p = 5.03 × 10⁻⁵. Given these values we can safely assume a statistically significant difference between the responses over all artifacts.

Artifact            Peak Latency (ms)   Peak Amplitude (µV)
Popping on pers.    264                 5.758
Popping             200-250             3.928
Blurring on pers.   400                 5.58
Blurring            400                 2.681
Ghosting            400                 3.082

Table 1. Peak latency and amplitude for all artifacts.
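The per-condition comparison against ground truth can be sketched with a standard two-tailed independent-samples t-test, here via SciPy on hypothetical per-trial values; the paper does not specify which t-test variant or software was used.

```python
import numpy as np
from scipy.stats import ttest_ind

def artifact_vs_ground_truth(artifact_trials, ground_truth_trials, alpha=0.05):
    """Two-tailed independent-samples t-test between per-trial ERP
    values for an artifact condition and the ground-truth condition.
    Returns (p_value, reject_null)."""
    t_stat, p = ttest_ind(artifact_trials, ground_truth_trials)
    return p, p < alpha
```

A rejected null hypothesis at alpha = 0.05, as for all five artifact conditions above, indicates that the artifact ERP differs significantly from the ground-truth baseline.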

To determine the deviation of individual participants from the average, we also ran a two-tailed t-test of one randomly selected participant against the averaged results of all the other participants. For all stimuli the null hypothesis was accepted; the p-values were popping p = 0.14, popping on person p = 0.854, blurring p = 0.85, blurring on person p = 0.97 and ghosting p = 0.74. Given these p-values and the acceptance of the null hypothesis, it is clear that any one participant's response is close to the average. There are no statistically significant differences.

The analysis of the ERPs from this experiment also indicates a potential emotional reaction to the artifacts. Previous data from EEG studies of emotion have provided evidence of lateralization of emotion in the frontal cortex [11]. This theory predicts right-hemisphere dominance for negative emotions. The results from our experiment show an increased output in the right frontal cortex for test cases with more severe artifacts, where the maximum output was for popping on the moving person. This can theoretically be explained by the negative emotions elicited by bad video quality.

CONCLUSION

We showed that the covert (implicit) and overt (explicit) outputs of human visual processing do differ, and in some cases the difference is striking. We also showed that the brain responds very differently not only to different types of artifacts but also to artifacts specifically linked with motion. Artifacts linked with motion evoke a much larger response in the brain. We also showed that it is possible to categorize artifacts based on how they are perceived. This provides information on the perception of videos which has previously not been modeled. This also creates the possibility of shortening rendering times by eliminating computations that calculate image features which do not evoke a strong reaction in the brain, as opposed to those which do. The brain's response to artifacts is also essential for the modeling of masking algorithms for rendered image sequences. While the current experimental setup provides new and relevant information, it has some limitations. The main issue we see is the absence of eye movement information. Eye tracking would allow us to incorporate information regarding the exact viewing pattern of the participants during stimulus presentation. A more complete picture of participants' eye gaze patterns during stimulus presentation is essential for advances in realistic image and video synthesis. Also, using sensors to capture physiological data would provide more concrete information regarding the participants' emotional state during trials.

ACKNOWLEDGMENTS

This work was funded in part by ERC grant #256941 'Reality CG'.

REFERENCES

1. Anderson, E., Potter, K., Matzen, L., Shepherd, J., Preston, G., and Silva, C. A user study of visualization effectiveness using EEG and cognitive load. Computer Graphics Forum 30 (2011), 791–800.

2. Ferwerda, J. A., Pattanaik, S. N., Shirley, P., and Greenberg, D. P. A model of visual adaptation for realistic image synthesis. In Proc. ACM SIGGRAPH (1996), 249–258.

3. International Telecommunication Union. Mean opinion score (MOS) terminology. ITU-T Recommendation P.800.1 (2006).

4. Koch, C., and Tsuchiya, N. Attention and consciousness: two distinct brain processes. Trends in Cognitive Sciences 11, 1 (2007), 16–22.

5. Kosara, R., Healey, C. G., Interrante, V., Laidlaw, D. H., and Ware, C. Thoughts on user studies: Why, how, and when. Computer Graphics and Applications 23, 4 (2003), 20–25.

6. Lindemann, L., and Magnor, M. Assessing the quality of compressed images using EEG. In Proc. IEEE International Conference on Image Processing (ICIP) (2011), 3170–3173.

7. Lindemann, L., S., W., and Magnor, M. Evaluation of video artifact perception using event-related potentials. In ACM Applied Perception in Computer Graphics and Visualization (APGV) (2011), 1–5.

8. McNamara, A., Mania, K., Banks, M., and Healey, C. Perceptually-motivated graphics, visualization and 3D displays. In Proc. ACM SIGGRAPH (2010), 1–159.

9. Ponomarenko, N., Lukin, V., Zelensky, A., Egiazarian, K., Astola, J., Carli, M., and Battisti, F. A database for evaluation of full-reference visual quality assessment metrics. Advances of Modern Radioelectronics 10, 10 (2009), 30–45.

10. Shenoy, P., and Tan, D. Human-aided computing: utilizing implicit human processing to classify images. In Proc. ACM CHI (2008), 845–854.

11. Silberman, E., and Weingartner, H. Hemispheric lateralisation of functions related to emotion. Brain and Cognition 5 (1986), 322–353.

12. Snavely, N., Seitz, S. M., and Szeliski, R. Photo tourism: exploring photo collections in 3D. ACM Trans. Graph. 25 (2006), 835–846.

13. Vangorp, P., Chaurasia, G., Laffont, P.-Y., Fleming, R. W., and Drettakis, G. Perception of visual artifacts in image-based rendering of facades. Computer Graphics Forum 30 (2011), 1241–1250.