Tracked Captioning: Improving Engagement for Deaf Audiences During Live Discussions (Bercan, Karina; Kushalnagar, Raja S.)


  • Figure 1: RTTD/Tracked Captioning system

    ● Microsoft Kinect 2
    ● Projector
    ● Windows laptop/computer
    ● C-Print with captionist

    Tracked Captioning: Improving Engagement for Deaf Audiences During Live Discussions
    Bercan, Karina; Kushalnagar, Raja S.
    REU Accessible Multimodal Interfaces Program Site, Rochester Institute of Technology

    Figure 2: Tracked Captioning Demonstration

    Deaf audiences often rely on captioning or on interpreters to translate spoken English into American Sign Language during presentations and lectures. In order to follow a discussion, d/Deaf audience members shift their attention from the captions or the interpreter to each speaker and to presentation slides, resulting in eye fatigue, distraction, and decreased engagement. To minimize the distances between the audience’s visual points of interest while accommodating multiple speakers, Tracked Captioning recognizes a person’s step forward as a request to speak and displays captions above them.

    This study builds on research by Kushalnagar et al. [3] to implement Tracked Captioning for settings with multiple speakers and to enhance the experience of d/Deaf audience members. This research in tracked captioning technology will expand access to panels and presentations and increase engagement for all audience members.

    Abstract

    DEVELOPMENT/SETUP

    Tracked Captioning uses the same equipment as RTTD, as shown in Figure 1. It has two modes: a panel setting (Figure 2d) and a presentation setting (Figure 2c), though this poster focuses on the evaluation of the presentation mode. In a setting where the speakers are standing or moving around on stage while giving a planned lecture, the program detects which speaker is closest to the Kinect in depth (the Z-coordinate).
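    As a rough sketch, the speaker-selection logic of the two modes might look like the following. The data structures and the step-forward threshold are hypothetical; the real system reads body positions from the Kinect 2 SDK, which is not shown here.

```python
# Hypothetical sketch of active-speaker selection; not the system's actual code.
# Each tracked body is reduced to a head position in camera space, where the
# Z-coordinate is the distance from the Kinect sensor in meters.

def select_active_speaker(bodies):
    """Presentation mode: the tracked speaker closest to the Kinect
    (smallest Z-coordinate) is treated as the active speaker."""
    tracked = [b for b in bodies if b["tracked"]]
    if not tracked:
        return None
    return min(tracked, key=lambda b: b["head_z"])

def stepped_forward(z_history, threshold=0.3):
    """Panel mode: a drop in Z larger than the threshold (meters) between
    consecutive frames is read as a step toward the sensor, i.e., a
    request to speak. The 0.3 m threshold is an illustrative guess."""
    return len(z_history) >= 2 and z_history[-2] - z_history[-1] > threshold
```

    For example, given two tracked bodies at 2.5 m and 1.8 m, `select_active_speaker` returns the body at 1.8 m; a frame-to-frame drop from 2.5 m to 2.0 m registers as a step forward under the assumed threshold.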

    EVALUATION PROCEDURE

    1. Participants watch a version of “A Simple Way to Break a Bad Habit,” a TED Talk by Judson Brewer, adapted for two speakers in a presentation style

    ○ 1st half uses traditional captioning
    ○ 2nd half uses Tracked Captioning

    2. Participants watch a version of “How Sleepwalking Works” from the podcast Stuff You Should Know, adapted for two speakers in a panel style

    ○ 1st half uses Tracked Captioning
    ○ 2nd half uses traditional captioning

    3. Participants take an evaluation survey

    Methods

    Contact Karina Bercan; [email protected]

    This work has been generously supported by an NSF REU Site Grant (#1460894).

    Funding

    1. M. W. G. Dye, D. E. Baril, and D. Bavelier. (2007). Which aspects of visual attention are changed by deafness? The case of the Attentional Network Test. Neuropsychologia, 45(8), 1801-1811.

    2. M. W. G. Dye, P. C. Hauser, and D. Bavelier. (2008). Visual attention in deaf children and adults. In M. Marschark and P. C. Hauser (Eds.), Deaf cognition: Foundations and outcomes, 250-263. Oxford University Press.

    3. R. S. Kushalnagar, G. W. Behm, A. W. Kelstone, and S. Ali. (2015). Tracked Speech-To-Text Display: Enhancing Accessibility and Readability of Real-Time Speech-To-Text. In Proceedings of the 17th International ACM SIGACCESS Conference on Computers & Accessibility (pp. 223-230). New York, NY, USA: ACM.

    4. R. S. Kushalnagar, P. Kushalnagar, and J. B. Pelz. (2012). Deaf and Hearing Students' Eye Gaze Collaboration. In K. Miesenberger, A. Karshmer, and P. Penaz (Eds.), Computers Helping People with Special Needs: 13th International Conference, ICCHP 2012, Linz, Austria, July 11-13, 2012, Proceedings, Part I (pp. 92-99). Berlin, Heidelberg: Springer Berlin Heidelberg.

    References

    Tracked Captioning is a viable alternative to traditional captioning. The empirical data suggest that Tracked Captioning slightly improves d/Deaf audience members’ experiences with live, multi-speaker presentations relative to traditional captioning. It helps them more easily identify the speaker, understand the discussion, and follow it, and it facilitates their engagement and involvement in the discussion. Overall, Tracked Captioning bolsters the experience of d/Deaf audiences without impeding the experience of hearing audiences, supporting it as a universal technology rather than simply an access technology.

    However, the sample in this study may be too small and too limited to support confident conclusions. The results may reflect each participant’s personal preferences more than the superiority of either captioning technique. Moreover, the ambiguous wording of the survey questions, combined with the lack of comprehension questions to evaluate understanding, makes it difficult to identify the exact strengths and weaknesses of Tracked Captioning. Participant feedback did make one weakness clear: Tracked Captioning switched between speakers too slowly.

    Conclusions

    During discussions led by hearing presenters, d/Deaf audience members participate and follow along with the help of interpreters, who translate between spoken English and American Sign Language (ASL), or with the help of captions. A common method of live captioning is C-Print, in which a trained captionist transcribes speech on a laptop; the person using the captioning service reads the transcription on their own computer to follow along. In contrast to hearing audiences, who listen to speakers while watching for body-language cues and reading presentation slides, d/Deaf audiences must multitask during lectures regardless of which accommodation is used.

    Juggling various information streams and shifting focus repeatedly causes d/Deaf viewers to miss out on content, to get tired, and to get distracted. Additionally, they can feel left out of the conversation and as though they do not grasp the material. This disconnect between hearing presenters and d/Deaf audience members hinders the community, causing misunderstanding and miscommunication, a gap that can isolate d/Deaf people from their professional and educational communities.

    Background

    Tracked Captioning is based on the Real-Time Text Display (RTTD) developed by Kushalnagar et al. for classroom use [3]. RTTD is a caption display method that tracks a single speaker moving across a classroom and projects captions above them. The system is designed to be portable, easy to set up, and low-cost, using a Microsoft Kinect 2 to track the position of the speaker. A C-Print captionist transcribes the speech, a projector displays it as text, and a computer or laptop provides the computing power, all shown in Figure 1.
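    As an illustration of how captions could be projected above a tracked speaker, here is a minimal sketch that assumes a linear calibration between the Kinect’s camera space and the projector’s pixel coordinates. The function name, parameters, and the field-of-view constant are all assumptions for illustration; the actual RTTD calibration procedure is not described in this poster.

```python
def caption_x(head_x, head_z, fov_half_width=1.0, screen_width_px=1920):
    """Map a tracked head position in camera space (meters) to a horizontal
    pixel coordinate on the projected display, so captions can be drawn
    above the speaker. fov_half_width approximates the tangent of the
    sensor's horizontal half-angle; assumes a simple linear calibration."""
    half = fov_half_width * head_z       # visible half-width grows with distance
    normalized = (head_x + half) / (2 * half)
    normalized = min(max(normalized, 0.0), 1.0)  # clamp to the screen edges
    return int(normalized * screen_width_px)
```

    Under these assumptions, a speaker centered in front of the sensor maps to the middle of a 1920-pixel-wide display, and positions outside the calibrated field of view are clamped to the screen edges.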

    Kushalnagar et al. showed RTTD to be an effective captioning method in the classroom setting, improving students’ ability to follow a lecture and to understand its content relative to traditional captioning. However, the system is designed for a single speaker. In this study, we expand the capabilities of RTTD to accommodate more than one speaker and evaluate the effectiveness of these enhancements for two-person discussions and presentations.

    Real-Time Text Display

    (a) Experiment room set-up with the Tracked Captioning system in the center

    (b) Traditional captioning; static, without tracking

    (c) Tracked Captioning in Presentation Mode

    (d) Tracked Captioning in Panel Mode

    Results

    [Results chart: survey responses from d/Deaf participants, hearing participants, and both groups combined, comparing Tracked Captioning and traditional captioning on six measures: easier to identify the speaker, easier to follow the discussion, easier to focus on the discussion, easier to understand the discussion, more involving or engaging, and preferred for future discussions.]