Tracked Captioning: Improving Engagement for Deaf Audiences During Live Discussions (Bercan, Karina; Kushalnagar, Raja S.)


  • Figure 1: RTTD/Tracked Captioning system

    ● Microsoft Kinect 2
    ● Projector
    ● Windows laptop/computer
    ● C-Print with captionist

    Tracked Captioning: Improving Engagement for Deaf Audiences During Live Discussions
    Bercan, Karina; Kushalnagar, Raja S.
    REU Accessible Multimodal Interfaces Program Site, Rochester Institute of Technology

    Figure 2: Tracked Captioning Demonstration

    Deaf audiences often rely on captioning or on interpreters to translate spoken English into American Sign Language during presentations and lectures. In order to follow a discussion, d/Deaf audience members shift their attention from the captions or the interpreter to each speaker and to presentation slides, resulting in eye fatigue, distraction, and decreased engagement. To minimize the distances between the audience’s visual points of interest while accommodating multiple speakers, Tracked Captioning recognizes a person’s step forward as a request to speak and displays captions above them.

    This study builds on research by Kushalnagar et al. [3] to implement Tracked Captioning for settings with multiple speakers and to enhance the experience of d/Deaf audience members. This research in tracked captioning technology will expand access to panels and presentations and increase engagement for all audience members.

    Abstract

    DEVELOPMENT/SETUP

    Tracked Captioning uses the same equipment as RTTD, as shown in Figure 1. It has two modes: a panel setting (Figure 2d) and a presentation setting (Figure 2c), though this poster focuses on the evaluation of the presentation mode. In a setting where the speakers are standing or moving around on stage while giving a planned lecture, the program detects which speaker is closest to the Kinect in depth (the Z-coordinate).
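    As a rough sketch, the speaker-selection logic of the two modes might look like the following. The data structures and the step-forward threshold are hypothetical; the real system reads body positions from the Kinect 2 SDK, which is not shown here.

```python
# Hypothetical sketch of active-speaker selection; not the system's actual code.
# Each tracked body is reduced to a head position in camera space, where the
# Z-coordinate is the distance from the Kinect sensor in meters.

def select_active_speaker(bodies):
    """Presentation mode: the tracked speaker closest to the Kinect
    (smallest Z-coordinate) is treated as the active speaker."""
    tracked = [b for b in bodies if b["tracked"]]
    if not tracked:
        return None
    return min(tracked, key=lambda b: b["head_z"])

def stepped_forward(z_history, threshold=0.3):
    """Panel mode: a drop in Z larger than the threshold (meters) between
    consecutive frames is read as a step toward the sensor, i.e., a
    request to speak. The 0.3 m threshold is an illustrative guess."""
    return len(z_history) >= 2 and z_history[-2] - z_history[-1] > threshold
```

    For example, given two tracked bodies at 2.5 m and 1.8 m, `select_active_speaker` returns the body at 1.8 m; a frame-to-frame drop from 2.5 m to 2.0 m registers as a step forward under the assumed threshold.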

    EVALUATION PROCEDURE

    1. Participants watch a version of “A Simple Way to Break a Bad Habit,” a TED Talk by Judson Brewer, adapted for two speakers in a presentation style

    ○ 1st half uses traditional captioning
    ○ 2nd half uses Tracked Captioning

    2. Participants watch a version of “How Sleepwalking Works” from the podcast Stuff You Should Know, adapted for two speakers in a panel style

    ○ 1st half uses Tracked Captioning
    ○ 2nd half uses traditional captioning

    3. Participants take an evaluation survey

    Methods

    Contact Karina Bercan; [email protected]

    This work has been generously supported by an NSF REU Site Grant (#1460894).

    Funding

    1. M. W. G. Dye, D. E. Baril, and D. Bavelier. (2007). Which aspects of visual attention are changed by deafness? The case of the Attentional Network Test. Neuropsychologia, 45(8), 1801-1811.

    2. M. W. G. Dye, P. C. Hauser, and D. Bavelier. (2008). Visual attention in deaf children and adults. In M. Marschark and P. C. Hauser (Eds.), Deaf cognition: Foundations and outcomes, 250-263. Oxford University Press.

    3. R. S. Kushalnagar, G. W. Behm, A. W. Kelstone, and S. Ali. (2015). Tracked Speech-To-Text Display: Enhancing Accessibility and Readability of Real-Time Speech-To-Text. In Proceedings of the 17th International ACM SIGACCESS Conference on Computers & Accessibility (pp. 223-230). New York, NY, USA: ACM.

    4. R. S. Kushalnagar, P. Kushalnagar, and J. B. Pelz. (2012). Deaf and Hearing Students' Eye Gaze Collaboration. In K. Miesenberger, A. Karshmer, and P. Penaz (Eds.), Computers Helping People with Special Needs: 13th International Conference, ICCHP 2012, Linz, Austria, July 11-13, 2012, Proceedings, Part I (pp. 92-99). Berlin, Heidelberg: Springer Berlin Heidelberg.

    References

    Tracked Captioning is a viable alternative to traditional captioning. The empirical data suggest that Tracked Captioning slightly improves d/Deaf audience members’ experiences with live, multi-speaker presentations relative to traditional captioning. It helps them more easily identify the speaker, understand the discussion, and follow it, and it facilitates their engagement and involvement in the discussion. Overall, Tracked Captioning bolsters the experience of d/Deaf audiences without impeding the experience of hearing audiences, supporting it as a universal technology rather than simply an access technology.

    However, the sample in this study may be too small and too limited to support confident conclusions. The results may reflect each participant’s personal preferences more than the superiority of either captioning technique. Moreover, the ambiguous wording of the survey questions, combined with the lack of comprehension questions to evaluate understanding, makes it difficult to identify the exact strengths and weaknesses of Tracked Captioning. Participant feedback did make one weakness clear: Tracked Captioning switched between speakers too slowly.

    Conclusions

    During discussions led by hearing presenters, d/Deaf audience members participate and follow along with the help of interpreters, who translate between spoken English and American Sign Language (ASL), or with the help of captions. A common method of live captioning is C-Print, in which a trained captionist transcribes speech on a laptop; the person using the captioning service reads the transcription on their own computer to follow along. In contrast to hearing audiences, who listen to speakers while watching for body-language cues and reading presentation slides, d/Deaf audiences must multitask during lectures regardless of which accommodation is used.

    Juggling various information streams and shifting focus repeatedly causes d/Deaf viewers to miss out on content, to get tired, and to get distracted. Additionally, they can feel left out of the conversation and as though they do not grasp the material. This disconnect between hearing presenters and d/Deaf audience members hinders the community, causing misunderstanding and miscommunication, a gap that can isolate d/Deaf people from their professional and educational communities.

    Background

    Tracked Captioning is based on the Real-Time Text Display (RTTD) developed by Kushalnagar et al. for classroom use [3]. RTTD is a caption display method that tracks a single speaker moving across a classroom and projects captions above them. The system is designed to be portable, easy to set up, and low-cost, using a Microsoft Kinect 2 to track the position of the speaker. A C-Print captionist transcribes the speech, a projector displays it as text, and a computer or laptop provides the computing power, all shown in Figure 1.
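    As an illustration of how captions could be projected above a tracked speaker, here is a minimal sketch that assumes a linear calibration between the Kinect’s camera space and the projector’s pixel coordinates. The function name, parameters, and the field-of-view constant are all assumptions for illustration; the actual RTTD calibration procedure is not described in this poster.

```python
def caption_x(head_x, head_z, fov_half_width=1.0, screen_width_px=1920):
    """Map a tracked head position in camera space (meters) to a horizontal
    pixel coordinate on the projected display, so captions can be drawn
    above the speaker. fov_half_width approximates the tangent of the
    sensor's horizontal half-angle; assumes a simple linear calibration."""
    half = fov_half_width * head_z       # visible half-width grows with distance
    normalized = (head_x + half) / (2 * half)
    normalized = min(max(normalized, 0.0), 1.0)  # clamp to the screen edges
    return int(normalized * screen_width_px)
```

    Under these assumptions, a speaker centered in front of the sensor maps to the middle of a 1920-pixel-wide display, and positions outside the calibrated field of view are clamped to the screen edges.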

    Kushalnagar et al. showed RTTD to be an effective captioning method in the classroom setting, improving students’ ability to follow a lecture and to understand its content relative to traditional captioning. However, the system is designed for a single speaker. In this study, we expand the capabilities of RTTD to accommodate more than one speaker and evaluate the effectiveness of these enhancements for two-person discussions and presentations.

    Real-Time Text Display

    (a) Experiment room set-up with the Tracked Captioning system in the center

    (b) Traditional captioning; static, without tracking

    (c) Tracked Captioning in Presentation Mode

    (d) Tracked Captioning in Panel Mode

    Results

    [Results chart: survey responses from d/Deaf participants, hearing participants, and both groups combined, comparing Tracked Captioning and traditional captioning on six measures: easier to identify the speaker, easier to follow the discussion, easier to focus on the discussion, easier to understand the discussion, more involving or engaging, and preferred for future discussions.]