
Architectures of assistive software applications

for Windows-based computers

Gareth Evans*, Paul Blenkhorn

Department of Computation, University of Manchester Institute of Science and Technology,

P.O. Box 88, Manchester M60 1QD, UK

Received 2 July 2002; received in revised form 19 November 2002; accepted 11 December 2002

Abstract

This paper considers the architecture of some of the most common types of assistive software,

namely, screen readers, full-screen magnifiers, on-screen keyboards, predictors and simulated

Braille keyboards. The paper provides an overview of the operation of these applications and then

identifies the technologies that may be used to implement assistive applications on a Windows

platform. The basic architecture of each of the applications is then presented and is followed by

examples of how technologies and design approaches that are used in one type of application can be

used to enhance the capabilities of another type of application.

© 2003 Elsevier Science Ltd. All rights reserved.

Keywords: Assistive; Magnifiers; Windows; Braille; Screen reader; On-screen keyboard

1. Introduction

There are few papers that discuss the architecture of assistive software applications. We

believe that this is because most applications are developed by commercial companies. We

are in the relatively fortunate position of having developed commercial examples of all of

the applications described in this paper over a number of years. One motivation for writing

this paper is to expose the architectures and the design choices available to the developers

of assistive application software. In this paper we will briefly consider the functionality of

the most common assistive software applications; review the technologies that are

available to the Windows developer; present the architecture of the applications; and

indicate how some of the design approaches used in one application can inform the design

of others.

1084-8045/03/$ - see front matter © 2003 Elsevier Science Ltd. All rights reserved.

doi:10.1016/S1084-8045(03)00002-X

Journal of Network and Computer Applications 26 (2003) 213–228

www.elsevier.com/locate/yjnca

* Corresponding author. Tel.: +44-161-200-3368; fax: +44-161-200-3324.

E-mail address: [email protected] (G. Evans).

In this paper we will focus on screen readers, full-screen magnifiers, on-screen

keyboards, predictors and simulated Braille keyboards, because we believe that these offer

the greatest challenges to the designer and because they are relatively high value

components. We will attempt to show that this seemingly diverse set of applications shares

certain software components and design approaches. A second motivation for writing this

paper is, therefore, to explicitly indicate the commonality where it exists and also to show

that a design approach that is typically used in one application may be used in another to

enhance its operation. For example, some approaches that are commonly used in the

design of on-screen keyboards can be used to enhance the features of a screen reader.

Some of this ‘cross fertilisation’ is speculation on the part of the authors, but some will be

illustrated by practical examples of existing systems developed by the authors.

2. Restrictions on the paper

The paper will not consider speech recognition and optical character recognition

(OCR) as assistive technologies. Whilst these technologies are widely used by

disabled people, they have been developed to support a wider group of users. We

acknowledge that some speech recognition and OCR work is specifically targeted at

disabled users, especially those with non-standard speech (Davis et al., 1990) but we

do not consider this issue further.

The paper does not consider augmentative and alternative communication (AAC) and

electronic communication systems (ECS) beyond the prediction and on-screen keyboard

components that may be used to implement these systems, because the paper focuses on

assistive technology systems that provide access to standard applications rather than

systems that are applications in their own right. Braille translation systems are also

important examples of assistive systems that are not discussed here.

The paper focuses exclusively on the architecture of Windows-based systems. We do

this because we believe that the majority of assistive applications are designed for

Windows-based systems and because this is where our expertise lies. However, the issues

we raise in this paper will have some relevance for developers working with other

operating systems.

The paper focuses on the most common assistive software applications. We do not

consider relatively simple applications such as the Windows Sound Sentry (which

translates audible Windows warnings into visual indications for deaf people) and

applications that simulate input from the keyboard or the mouse including serial keys,

sticky keys and mouse keys. In addition, we exclude less common applications such as

disambiguation tools (Minneman, 1986).

3. Overview of the assistive applications

We firstly need to consider the operation of some of the more common assistive

applications. For reasons of space, the descriptions are brief, some generalisations will be

made and some of the more subtle aspects of the applications’ operations will be excluded.


3.1. Screen readers

A screen reader expresses the current interactive segment of Windows in a form that is

suitable for blind people. Generally, screen readers produce speech output through a

speech synthesizer or Braille output by driving a Braille line. In this paper we will focus

exclusively on speech output. This makes no significant difference to the description of

the screen reader’s operation or its architecture.

The principal goal of a screen reader’s designer is to make standard applications (such

as commercial word processors, web browsers, etc.) appear as if they were talking

applications designed especially for blind users.

One point, which will be significant later, is that the screen reader has to inform the user

of changes in an application’s or the operating system’s state. This means that, for

example, when a pop-up window appears, the user needs to be informed of its contents.

We refer to the ability of the screen reader to detect the changes in application and/or

operating system state and its ability to inform the user as ‘context sensitivity’. In addition,

we should introduce the concept of ‘focus’. A Windows control (such as a button, menu

item or client window) can have the current ‘focus’. This control will accept input until the

user changes focus. A blind user will normally change focus using the keyboard. The

screen reader gives information about the component that has the current focus. If this

component has ‘subcomponents’ (such as a form that has a number of buttons) the screen

reader may inform the user about the ‘subcomponents’.

In this paper we will consider the LookOUT screen reader developed by the authors. This

is a fully featured commercial screen reader that has many features in common with other

commercial screen readers. Further details concerning its operation and architecture are

given in (Blenkhorn and Evans, 2001a,b).

3.2. Magnifiers

In this paper we will focus exclusively on full-screen magnifiers. As we note in (Baude

et al., 2001) there are other styles of magnification, but a discussion of full-screen

magnification is sufficient to address the full range of architectural issues.

A magnifier presents an enlarged portion of the standard visual output on the

computer’s monitor. The visually impaired person (VIP) can control which portion of the

display he/she can see at any given time by ‘scrolling around’ the display using the cursor

keys and/or the mouse. When ‘scrolling around’, the updates to the screen should be flicker

free, giving the illusion of a smoothly changing image. A typical magnifier will allow the

screen to be magnified between 2 and 32 times. Like the screen reader, a magnifier should

be context sensitive. So, for example, if a pop-up window appears, the magnifier should

change its display so that the new window is at the centre of the display.

A magnifier may, in addition to changing the size of text and images, change their

representation. When characters are magnified they can appear ‘blocky’; smoothing

algorithms can be employed to produce clearer text—for more details see (Baude et al.,

2001). The colours of text and images can also be changed by the user to provide a clearer

display.


3.3. On-screen keyboards

An on-screen keyboard (OSK) provides a means by which a person who cannot use a

conventional keyboard can provide text or other input. OSKs are typically used by people

with physical disabilities as an alternative to the keyboard. An OSK presents a set of

choices that the user can select from, typically in the form of a matrix of cells (when

simulating a QWERTY keyboard, for example, each cell corresponds to a key). An OSK

can generally be configured to allow ‘free selection’ or to provide ‘scanned input’. When

using the former, the user can choose any of the cells by selecting it with a pointing

device, such as a mouse. A typical example is a person with physical disabilities who may

use a head-operated mouse. One traditional method of providing scanned input is for the

OSK to highlight each of the rows in the matrix of cells in turn. The user selects (by using a

switch) the row that contains the cell that he/she desires. The OSK then highlights each of

the cells in the row in turn and the user can then select the appropriate cell. In practice,

there are many different scanning modes (see (Hawes and Blenkhorn, 1999)) that can be

selected according to the user’s preference (including those that use two or more

switches); this paper describes only one mode.
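
The row/column mode just described reduces to a simple nested loop. The following C++ sketch illustrates it; highlightRow, highlightCell and waitForSwitch are hypothetical helpers (waitForSwitch is assumed to highlight for a dwell period and report whether the switch was hit during it), not part of any real OSK’s API.

#include <cstddef>
#include <vector>

// Hypothetical UI/input helpers, assumed to exist for this sketch.
void highlightRow(std::size_t row);
void highlightCell(std::size_t row, std::size_t col);
bool waitForSwitch(int dwellMs);   // true if the switch was hit during the dwell

struct Cell { int keyCode; };      // what to send when the cell is chosen

// Single-switch row/column scanning over a matrix of cells.
Cell ScanSelect(const std::vector<std::vector<Cell>>& grid, int dwellMs) {
    for (;;) {                                       // keep cycling the rows
        for (std::size_t r = 0; r < grid.size(); ++r) {
            highlightRow(r);
            if (!waitForSwitch(dwellMs)) continue;   // row not chosen yet
            for (;;) {                               // row chosen: cycle its cells
                for (std::size_t c = 0; c < grid[r].size(); ++c) {
                    highlightCell(r, c);
                    if (waitForSwitch(dwellMs)) return grid[r][c];
                }
            }
        }
    }
}

Real OSKs add refinements that are omitted here, such as limiting the number of passes and offering an escape back from cell scanning to row scanning.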

An OSK must be visible to the user at the same time as the application that it is

controlling. From a Windows perspective this is quite interesting (especially where ‘free

selection’ is considered) in that the application has the focus,1 but the OSK must be visible

and capable of capturing input from the user.

3.4. Predictors

Predictors are applications that predict, on the basis of a partially typed word, what

the full word will be. Typically, after the user has typed one or more letters, the

predictor will give the user a list of possible words that complete the word (so-called

‘word completion’). When a word is completed, the predictor may give suggestions

for the next word (which would be ‘word prediction’ in the true sense). The user can

then continue to type or select one of the complete words offered by the predictor.

Simple prediction is often done on the basis of the most commonly used words in a

given language but can also be updated by considering the frequency and recency of

words typed by a particular user. More sophisticated predictors will also take into

account syntactical or semantic contexts.
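
As a minimal illustration of frequency-based word completion, the C++ sketch below ranks the words that begin with the typed prefix by a per-user frequency count. It ignores recency and syntactic context, which the more sophisticated predictors mentioned above would add.

#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Return up to maxResults completions of 'prefix', most frequent first.
// 'freq' maps each known word to how often this user has typed it.
std::vector<std::string> Complete(
        const std::unordered_map<std::string, int>& freq,
        const std::string& prefix, std::size_t maxResults) {
    std::vector<std::pair<int, std::string>> hits;
    for (const auto& entry : freq)
        if (entry.first.compare(0, prefix.size(), prefix) == 0)  // prefix match
            hits.emplace_back(entry.second, entry.first);
    std::sort(hits.begin(), hits.end(),
              [](const std::pair<int, std::string>& a,
                 const std::pair<int, std::string>& b) { return a.first > b.first; });
    std::vector<std::string> out;
    for (std::size_t i = 0; i < hits.size() && i < maxResults; ++i)
        out.push_back(hits[i].second);
    return out;
}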

Predictors can be used with OSKs so that typing can be speeded up. In this case

some of the OSK’s cells are used to contain the predictions and as such they operate

as described in the previous section—for examples, see the displays.2 Predictors can

also be used by people who have print disabilities in order to assist in word selection.

When predictors are not integrated with OSKs, the predictor must interact with

another application, most generally a word processor. A predictor can be installed as a

component of a word processor and as the user types the predicted word completions

can appear close to where the user is typing. Other predictors provide their own

1 As noted earlier, when an application has the focus, keyboard input is directed to that window.

2 Hands Off, http://www.sensorysoftware.com/sensory.htm, current 10 Dec 2001.


interfaces and copy the text into another application only when the user has finished

typing each word or phrase.

3.5. Braille keyboards

Some blind users prefer to use Braille keyboards rather than QWERTY keyboards.

External Braille keyboards can be connected to the computer’s serial, parallel or USB port.

Braille keyboards may be special purpose keyboards, Braille note takers or adapted Braille

typewriters (Blenkhorn et al., 2001). The last of these is particularly attractive in that it can

produce a paper and electronic copy at the same time. From the operating system’s

perspective these devices connect as keyboards with their own drivers. However, at some

point, the Braille characters must be converted to standard ASCII or Unicode characters so

that they can be passed to an application (such as a word processor). This conversion is not

always straightforward, particularly when contracted Braille is being typed. The

conversion may happen within the keyboard itself (this approach is common for Braille

note takers) but can also take place inside the computer system. Indeed, the keyboard may

send key up and key down events to the computer system rather than Braille characters.

The relationship between Braille and text and the algorithms for conversion can be found

in (Blenkhorn, 1995, 1997).

In another approach a standard QWERTY keyboard can be used for Braille input. Here

eight keys on the QWERTY keyboard are used to simulate the eight Braille keys3 (the six

‘dot’ keys, space and new line). For example, the keys SDF JKL may be used for dots one

to six, respectively. We refer to such a system as a ‘simulated Braille keyboard’.
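
Following the example mapping above (S D F J K L for dots one to six), a simulated Braille keyboard must latch the keys of a chord and emit a cell when the last key is released. A minimal C++ sketch of just the mapping and chord logic:

// Bit for the dot simulated by a given key, using the SDF JKL mapping
// described above (dot n becomes bit n-1); 0 for any other key.
unsigned DotForKey(char key) {
    switch (key) {
        case 'S': return 1u << 0;  // dot 1
        case 'D': return 1u << 1;  // dot 2
        case 'F': return 1u << 2;  // dot 3
        case 'J': return 1u << 3;  // dot 4
        case 'K': return 1u << 4;  // dot 5
        case 'L': return 1u << 5;  // dot 6
        default:  return 0;
    }
}

static unsigned g_latched = 0;  // dots seen during the current chord
static unsigned g_down = 0;     // dots whose keys are still held

// Called from the keyboard filter on every event for the six dot keys.
// Returns the completed 6-bit cell, or 0 while the chord is still open.
unsigned OnBrailleKey(char key, bool keyDown) {
    unsigned dot = DotForKey(key);
    if (dot == 0) return 0;               // not one of the simulated keys
    if (keyDown) {
        g_latched |= dot;                 // idempotent under key auto-repeat
        g_down |= dot;
    } else {
        g_down &= ~dot;
        if (g_down == 0) {                // last key up: chord is complete
            unsigned cell = g_latched;
            g_latched = 0;
            return cell;
        }
    }
    return 0;
}

The returned cell would then be translated to text (trivially for uncontracted Braille, via a translation algorithm for contracted Braille) and injected into the application, as described in Section 5.5.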

4. Technologies

In this section we consider the technologies that are available to the developers of

assistive applications when developing for Microsoft Windows. Space precludes a detailed

examination of the technologies and some generalisations are made.

4.1. Methods to simulate input

Many of the applications described above are required to simulate keyboard actions (for

example the OSK, the simulated Braille keyboard, etc.) or mouse actions (for example the

OSK, mouse keys, etc.). This can be achieved by making a call on the Windows

Application Programming Interface (API). Keyboard and mouse events can be simulated

using the SendInput API call. This takes parameters that indicate, amongst other

information, the source of the event (mouse, keyboard or other hardware) and information

pertinent to the event (for example, the identifier of the key or the mouse location).

SendInput will route the input to the operating system’s keyboard or mouse event queue as

appropriate.
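
As a minimal sketch of the call, the following C++ fragment simulates pressing and releasing the letter ‘A’. Mouse events follow the same structure, with INPUT_MOUSE and mouse flags instead.

#include <windows.h>

// Simulate a press and release of the 'A' key via SendInput.
void TypeA() {
    INPUT in[2] = {};
    in[0].type = INPUT_KEYBOARD;
    in[0].ki.wVk = 'A';                    // virtual-key code
    in[1].type = INPUT_KEYBOARD;
    in[1].ki.wVk = 'A';
    in[1].ki.dwFlags = KEYEVENTF_KEYUP;    // the matching key-up event
    SendInput(2, in, sizeof(INPUT));       // queued as if typed by the user
}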

3 This description assumes 6-dot Braille. Applications that support 8-dot Braille are also available.


4.2. Hooking—getting information about input and output

There are a number of circumstances when an assistive application needs to intercept

input before it reaches an application and output before it reaches the screen. For example,

both screen readers and magnifiers need to intercept keyboard input to obtain the

commands that control the assistive application and prevent these from being routed to the

application. Screen readers and magnifiers also need to intercept the calls made by

applications and the operating system on the operating system’s graphics system to be able

to determine the information that is being written to the screen.

The interception of information in this way is achieved by using a hook. Considering

input, ‘a hook is a mechanism by which a function can intercept events (messages, mouse

actions, keystrokes) before they reach an application’ (Marsh, 1993). Data that is obtained

through a hook can be filtered. This means that some data is extracted and used by the

assistive application and other data is passed through to the standard application. In a

screen reader some keyboard commands are used to control the operation of the screen

reader and these are extracted and processed by the screen reader. Other keyboard input is

intended to control the application that the user is using at the time (for example a word

processor) and is passed back into the keyboard buffer without modification.
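
A sketch of such a filtering hook is shown below, using a low-level keyboard hook. The choice of F12 as the screen reader’s command key is an arbitrary assumption for illustration; returning a non-zero value swallows the keystroke, while CallNextHookEx passes it on unchanged.

#include <windows.h>

static HHOOK g_hook;

// Low-level keyboard hook procedure: filter out our command key and
// pass everything else through unmodified.
LRESULT CALLBACK KeyboardProc(int code, WPARAM wParam, LPARAM lParam) {
    if (code == HC_ACTION) {
        const KBDLLHOOKSTRUCT* key = (const KBDLLHOOKSTRUCT*)lParam;
        if (key->vkCode == VK_F12) {                   // assumed command key
            if (wParam == WM_KEYDOWN) {
                // Run the assistive command here, e.g. "speak current line".
            }
            return 1;                                  // swallow the keystroke
        }
    }
    return CallNextHookEx(g_hook, code, wParam, lParam);  // pass through
}

// The installing thread must run a message loop for the hook to be called.
void InstallHook(HINSTANCE instance) {
    g_hook = SetWindowsHookEx(WH_KEYBOARD_LL, KeyboardProc, instance, 0);
}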

4.3. Microsoft active accessibility (MSAA)

MSAA ‘is a suite of technologies for the software developer that lets applications and

objects communicate and automate more efficiently. MSAA allows applications to

effectively cooperate with accessibility aids, providing information about what they are

doing and the contents of the screen’.4

MSAA provides a means by which applications and the operating system can generate

events and expose their contents to other applications through a standard interface. It is the

application’s responsibility to provide this interface—an MSAA server. An

assistive application can take advantage of this interface to determine what information

an application is presenting to the user (this is very useful for screen readers). Not all

applications provide an MSAA server. However, the assistive application can generally

get some useful information provided that the application uses standard interface

components such as pull down menus and controls. Because these are supported by the

operating system and because Windows provides an MSAA server, this information can be

extracted.
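
As a small example of the client side of MSAA, the sketch below asks the accessibility object behind a window for the name of its client object, which is essentially what a screen reader would speak for that control. It assumes COM is already initialised and the program links against oleacc.lib.

#include <windows.h>
#include <oleacc.h>    // MSAA client interfaces; link with oleacc.lib

// Retrieve the MSAA 'name' of a window's client object (e.g. a button label).
// Returns a BSTR the caller must free with SysFreeString, or NULL on failure.
BSTR GetAccessibleName(HWND hwnd) {
    IAccessible* acc = NULL;
    BSTR name = NULL;
    if (SUCCEEDED(AccessibleObjectFromWindow(hwnd, OBJID_CLIENT,
                                             IID_IAccessible, (void**)&acc))) {
        VARIANT self;
        self.vt = VT_I4;
        self.lVal = CHILDID_SELF;          // ask about the object itself
        acc->get_accName(self, &name);     // what a screen reader would speak
        acc->Release();
    }
    return name;
}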

4.4. Object linking and embedding (OLE)

OLE is an automation technology developed by Microsoft that allows applications to

work together. An application that has an OLE interface exposes its Object Model. The

Object Model can be used by an assistive application to extract data from an application

and also to control the behaviour of the application. This is very useful for applications that

4 http://www.microsoft.com/Windows/platform/Innovate/Marketbulls/msaakybull.asp, current 7 Dec 1999.


do not fully support MSAA. For instance, the area in which a user types in Microsoft Word

is not exposed through MSAA, but Word has an OLE interface through which an assistive

application can obtain the data. It also provides a means through which alternative

interfaces to standard applications can be constructed—see later.

4.5. Off-screen model (OSM)

Screen readers need to be able to determine the information that is displayed on all

parts of the screen. MSAA and OLE can satisfy many of the demands for information that

a screen reader may make.5 However, MSAA and OLE cannot satisfy the requirement that

a screen reader will interact with almost all applications under all circumstances. Thus, a

screen reader needs to maintain its own version of the information that is displayed on the

screen. This is achieved by the screen reader hooking the calls that applications and the

operating system make on the graphics display system. From these calls the screen reader

can build an OSM. This is a data structure that has an entry for every pixel on the screen and

which holds information about what is displayed by that pixel. The OSM

presents an interface that allows the screen reader to determine the attributes of a pixel. For

example, a screen reader can determine whether a pixel is part of a character or not. If it is,

the screen reader can obtain information about a character’s value, its font and its

dimensions. The OSM will also be capable of informing the screen reader that significant

events have taken place (in the same way as MSAA can be used to inform an assistive

application of changes). For example, the OSM can send an event to the screen reader

whenever the ‘blinking caret’ in an application moves.
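
There is no standard API for an OSM, so the interface below is purely illustrative: a hypothetical C++ rendering of the queries and events just described.

#include <windows.h>
#include <string>

// Hypothetical shape of an off-screen model's query interface.
struct GlyphInfo {
    wchar_t value;        // the character drawn at this position
    std::wstring font;    // its typeface
    SIZE extent;          // its rendered dimensions in pixels
};

class OffScreenModel {
public:
    // True if the pixel at (x, y) was drawn as part of a character.
    virtual bool IsText(int x, int y) const = 0;
    // Details of the glyph covering (x, y); only valid when IsText() holds.
    virtual GlyphInfo GlyphAt(int x, int y) const = 0;
    // Event raised to the screen reader, e.g. when the blinking caret moves.
    virtual void OnCaretMoved(POINT newPosition) = 0;
    virtual ~OffScreenModel() {}
};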

4.6. Topmost window

In Windows, the window that has the focus is generally on top (i.e. visible to the user

and obscuring other windows). In some cases we want windows to be visible to the user,

but not necessarily to have the focus. This is true for OSKs. In Windows, a window can be

set to be a Topmost window. Such a window will appear in front of all the windows that

are not set to be Topmost windows. This technique allows an OSK to be placed in front of

part of the window of an application, at the same time ensuring that the application, rather

than the OSK, has the focus.
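
In Win32 terms this takes a single SetWindowPos call; pairing it with the WS_EX_NOACTIVATE extended style also stops the window taking the focus when its cells are clicked. A minimal sketch:

#include <windows.h>

// Make an OSK window sit above normal windows without ever taking focus.
void MakeTopmostWithoutFocus(HWND osk) {
    LONG exStyle = GetWindowLong(osk, GWL_EXSTYLE);
    SetWindowLong(osk, GWL_EXSTYLE, exStyle | WS_EX_NOACTIVATE);
    SetWindowPos(osk, HWND_TOPMOST, 0, 0, 0, 0,
                 SWP_NOMOVE | SWP_NOSIZE | SWP_NOACTIVATE);  // keep position/size
}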

5. Architectures

In this section we present the software architecture of the assistive applications

described earlier with a particular focus on their use of the technology identified in the

previous section. In presenting a single architecture for each we are obviously making

some generalisations.

5 Traditionally screen readers have not used MSAA and OLE and we believe that a number of current screen

readers do not use this approach. They use an off-screen model exclusively for the purpose of obtaining on-screen

information.


5.1. Screen reader

The architecture of the LookOUT screen reader (Blenkhorn and Evans, 2001b) is

shown in Fig. 1. The user controls LookOUT through the keyboard. Of course the

applications and the operating system also require keyboard input, so LookOUT uses a

keyboard hook and filters out the keystroke combinations that form the commands. The

other keystrokes are passed to the operating system, which places them in the Windows

keyboard buffer.

LookOUT uses two major sources of information, MSAA and its internal OSM.6

LookOUT always uses MSAA as its first source of information. MSAA is also used as a

source of events that indicate, for example, that a pop-up window has appeared. MSAA is

also queried by LookOUT so that LookOUT can determine the type of information. For

example, MSAA can be used to determine the type of element on a form, i.e. whether it is a

checkbox, radio button, etc. MSAA events are also handled by LookOUT in order to

provide context sensitivity.

As noted earlier, not all applications support MSAA and indeed a small set of the

standard Windows controls are not supported by MSAA. When information cannot be

obtained from MSAA, LookOUT uses the OSM. The OSM obtains its information by

hooking the calls made to the Windows Graphics Display Interface (GDI), which is used by

Windows and applications to construct visual output. By interpreting these calls the OSM

can establish the type of information held at every pixel on the screen.

Fig. 1. The architecture of the LookOUT Screen Reader.

6 LookOUT uses an OSM developed jointly by Microsoft, ONCE, Baum Products GmbH and euroBraille SA. This OSM can be obtained as part of an open source project. Further information can be obtained from http://www.stonesoupsoftware.org

In some circumstances it is difficult for the screen reader to determine information from

the OSM. The classic example is determining the selected cell in Microsoft Excel. This is

indicated visually using quite subtle means (providing a highlighted box around the cell

and setting the row and column names to bold). The screen reader must be able to read

these rather subtle visual cues and doing so can be quite computationally intensive.

However, if an application provides an OLE interface and the screen reader is aware of

which application it is dealing with, the information can easily be obtained by making a

call on the object model. Continuing with the Excel example, it is relatively simple to use

Excel’s OLE interface to find which cell is currently highlighted. This technique is

computationally efficient and reliable. The use of OLE is not confined to extracting

information from applications. Because OLE provides a means to control an application,

the screen reader can present an alternative interface for blind users that may be simpler

for the user. This issue is discussed later.
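
To make the Excel example concrete, the sketch below uses late-bound OLE automation (IDispatch) to ask a running Excel instance for the address of its active cell. It assumes COM is initialised and error handling is trimmed; ActiveCell and Address are properties of Excel’s published object model.

#include <windows.h>

// Late-bound property get: obj.name with no arguments.
static HRESULT GetProperty(IDispatch* obj, LPOLESTR name, VARIANT* result) {
    DISPID id;
    HRESULT hr = obj->GetIDsOfNames(IID_NULL, &name, 1, LOCALE_USER_DEFAULT, &id);
    if (FAILED(hr)) return hr;
    DISPPARAMS noArgs = { NULL, NULL, 0, 0 };
    return obj->Invoke(id, IID_NULL, LOCALE_USER_DEFAULT,
                       DISPATCH_PROPERTYGET, &noArgs, result, NULL, NULL);
}

// Ask the running Excel instance which cell is selected (e.g. "$B$3").
// On success, *address holds a VT_BSTR the caller must VariantClear.
HRESULT GetActiveCellAddress(VARIANT* address) {
    CLSID clsid;
    IUnknown* unknown = NULL;
    IDispatch* excel = NULL;
    HRESULT hr = CLSIDFromProgID(L"Excel.Application", &clsid);
    if (SUCCEEDED(hr)) hr = GetActiveObject(clsid, NULL, &unknown);
    if (FAILED(hr)) return hr;                       // no running instance
    hr = unknown->QueryInterface(IID_IDispatch, (void**)&excel);
    unknown->Release();
    if (FAILED(hr)) return hr;

    VARIANT cell;
    VariantInit(&cell);
    hr = GetProperty(excel, (LPOLESTR)L"ActiveCell", &cell);   // a Range object
    if (SUCCEEDED(hr) && cell.vt == VT_DISPATCH)
        hr = GetProperty(cell.pdispVal, (LPOLESTR)L"Address", address);
    VariantClear(&cell);
    excel->Release();
    return hr;
}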

5.2. Magnifier

The architecture of a magnifier is presented in Fig. 2. Like the screen reader, the

magnifier has to filter the keyboard to determine whether keystrokes should be directed to

another application or be used to control the magnifier itself. The magnifier also has to

hook the mouse. In general, mouse movement with no buttons pressed should be

interpreted directly by the magnifier itself. However, mouse button selections should be

routed to the application.

A magnifier has to carry out two major tasks. Its first task is to capture the graphics

output so that it can enlarge it and the second task is to cause the enlarged image to be

presented on the screen. There are a number of methods available to the designer of a

magnifier to achieve both tasks.

Broadly speaking there are two major techniques for capturing the graphics output.7

The first is to hook the Windows graphics system at some point and to interpret the calls to

create a bitmap equivalent to the output. Fig. 2 implies this technique. The second method

is to allow the Windows graphics system to create its own bitmap and to subsequently

work on this. To make this approach work, either the bitmap created by the operating

system has to be redirected from the display card’s video memory to some other set of

memory locations or some technique needs to be used whereby the video image taken

from the display card’s video memory is obscured.

We will discuss two techniques for displaying the enlarged image. One is to make use

of the Windows graphics system. This is achieved by making calls on the graphics system

so that the enlarged image is displayed. Fig. 2 implies this. The other approach is to take

advantage of DirectX (specifically DirectDraw) to present the enlarged image. DirectDraw

is an API developed by Microsoft that allows a programmer to efficiently create and

manipulate graphical images. One interesting feature of DirectDraw is that it supports

overlays. This feature can be used by a magnifier to overlay the enlarged image ‘on top’ of

7 A more detailed description is given in (Blenkhorn and Evans, 2001a).


the standard video display. This approach is appropriate when allowing the graphics

system to create its image in the display card’s video memory before enlarging the image

(see above).

The magnifier uses MSAA so that it can respond to events, i.e. to provide the context

sensitive behaviour described earlier.
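
The two capture techniques above are too involved to show briefly, but the display side can be illustrated with a deliberately naive C++ fragment that copies a region of the desktop around the focus point into the magnifier’s window, enlarged with StretchBlt. A real magnifier avoids this approach, since it flickers and would capture the magnifier’s own window; it is shown only to make the enlarge-and-redisplay step concrete.

#include <windows.h>

// Naive display step: paint a zoomed view of the desktop region around
// (focusX, focusY) into magWindow's client area.
void PaintMagnified(HWND magWindow, int focusX, int focusY, int zoom) {
    RECT rc;
    GetClientRect(magWindow, &rc);                 // client area = rc.right x rc.bottom
    HDC screen = GetDC(NULL);                      // DC for the whole desktop
    HDC target = GetDC(magWindow);
    SetStretchBltMode(target, COLORONCOLOR);
    int srcW = rc.right / zoom, srcH = rc.bottom / zoom;
    StretchBlt(target, 0, 0, rc.right, rc.bottom,  // enlarged destination
               screen, focusX - srcW / 2, focusY - srcH / 2,
               srcW, srcH, SRCCOPY);               // small source region
    ReleaseDC(magWindow, target);
    ReleaseDC(NULL, screen);
}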

5.3. On-screen keyboards

The architecture of an OSK is shown in Fig. 3.

The OSK has two sources of input, a mouse, when the user uses ‘free selection’, and

one or more switches, when the user is using ‘scanned input’.

The system produces keystrokes that need to be routed to the application. This is

achieved by using the SendInput Windows API call. The OSK needs to be set to be the

Topmost window; this is important when the user is using a mouse.

5.4. Predictor

The architecture of a predictor is shown in Fig. 4. This assumes that the predictor is a

standalone application that is not integrated with an OSK. The predictor uses a keyboard

hook and passes an appropriate set of keystrokes to Windows through the SendInput API

call. The predictor also has to present the set of predicted words to the user.

Fig. 2. The architecture of a magnifier.

Fig. 3. The architecture of an On-Screen Keyboard.

Fig. 4. The architecture of a predictor.

5.5. Simulated Braille keyboard

The architecture of a simulated Braille keyboard is shown in Fig. 5. From this it can be

seen that the keyboard’s architecture is identical to that of the predictor. The processing

requirements are somewhat different. The keyboard filter discards input from all keys that

do not correspond to the eight keys that simulate the Braille keyboard. The keystrokes are

translated into standard text and passed to the application using the Windows API’s SendInput

function.

Serial Braille keyboards act as serial input devices and appropriate software must be

written to intercept and process the data that is input.

Fig. 5. The architecture of a simulated Braille keyboard.

6. Extending assistive architectures

In this section we examine the ways in which the basic architectures given earlier

can be extended by using techniques that are used in other applications. This section

is divided into two subsections. The first presents extensions to screen reader

architectures that have been implemented and evaluated by the authors. In the second,

we propose extensions to assistive applications; however, these extensions have not

been implemented or evaluated.

6.1. Implemented screen reader extensions

6.1.1. Alternative interfaces using OLE

Certain standard applications can be very difficult to use, even with a well-

designed screen reader. This may be because the information is presented by the

application in a visually complex way. One example is accessing web browsers such


as Microsoft Internet Explorer. Certain web sites are very complex, especially if they

use multiple frames and have a large number of links. One approach to addressing

this problem is to develop a browser that is specifically designed for blind people,

such as the Brookes Talk browser (Zajicek et al., 1998). A user of such a system is

interacting with an application that has been specifically designed for blind users. As

noted earlier, one of the goals of a screen reader is to give the impression to the user

that he/she is interacting with an application that has been specifically designed for

blind people, but which is based on a standard application. There are some advantages

to using a standard application in terms of maintenance and upward compatibility.

The question is, therefore, is there a way in which alternative interfaces can be

provided for a screen reader that is interacting with a standard application? This can

be achieved by using OLE. As noted earlier OLE allows an application to both obtain

information and to control another application. Because Internet Explorer has an OLE

interface it is possible to develop an application that presents an alternative interface

to the user through a screen reader. This approach has been used by Baum GmbH in

their Web Wizard.8

Our approach (which is similar to Baum’s) is to develop a standalone application that

provides an alternative interface to a standard application, presenting information in a way that is

optimised for screen reading. The application is standalone, rather than integrated with the

screen reader, so that it can be used with any screen reader. Our initial work has focused on

providing alternative interfaces for Internet Explorer and Microsoft Outlook. The Outlook

interface is particularly interesting because not only does it present, in a ‘screen reader

friendly’ way, information that is difficult to read in Outlook with a screen reader, but it also

provides additional functions that are not available in Outlook itself. Suppose that a user

of Outlook wishes to find his/her next free two-hour appointment on a Friday afternoon.

For a sighted user, it is relatively easy to scan through the days and locate the free time by

sight. The task is quite time consuming for a blind user, because he/she has to step through

considerable amounts of information. One solution is to add an extra function (a ‘next free

slot’ function) that will find a time slot subject to certain constraints (such as length of time,

day of the week, etc.).

6.1.2. Alternative interfaces using SendInput

Not all applications support OLE, but alternative interfaces can be developed by

simulating keyboard and mouse input in a similar way to OSKs, predictors and simulated

Braille keyboards by using the Windows API SendInput call. We give two examples.

In the first we provide an alternative interface to Windows Media Player (WMP).

Specifically we give user control over the Stop, Play and Pause operations through the

numeric keyboard keys 4, 5 and 6, respectively. This is achieved by hooking and filtering

the relevant numeric keyboard keys. The screen reader9 determines the co-ordinates of the top

left-hand corner of the WMP’s window using an API call. The screen reader knows the

location of the Stop, Play and Pause buttons relative to the WMP’s top left-hand corner

8 http://www.baum.de/English/webwizard.htm, current 10 Dec 2001.

9 Strictly speaking it is not the screen reader, but an application-specific script executed by the screen reader that carries out the operation. See (Blenkhorn and Evans, 2001b) for further details.


and, using an API call, it moves the mouse pointer to the relevant location and clicks the

mouse button.
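
A C++ sketch of that sequence is shown below; the offsets are placeholders rather than WMP’s real button positions.

#include <windows.h>

// Click the point at (dx, dy) relative to appWindow's top left-hand corner.
void ClickAtOffset(HWND appWindow, int dx, int dy) {
    RECT r;
    GetWindowRect(appWindow, &r);            // window's top left on screen
    SetCursorPos(r.left + dx, r.top + dy);   // move the mouse pointer there
    INPUT click[2] = {};
    click[0].type = INPUT_MOUSE;
    click[0].mi.dwFlags = MOUSEEVENTF_LEFTDOWN;
    click[1].type = INPUT_MOUSE;
    click[1].mi.dwFlags = MOUSEEVENTF_LEFTUP;
    SendInput(2, click, sizeof(INPUT));      // press and release the button
}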

In the second example access is given to a multimedia encyclopaedia’s text. The

encyclopaedia uses non-standard controls and information cannot be obtained from

MSAA. The OSM can be used, but the nature of the encyclopaedia’s interface can make this

unreliable. Text is obtained by moving the mouse to the encyclopaedia’s text box using the

techniques outlined above. Key combinations are then sent that select the text of the entry

and copy it to the Windows clipboard. The screen reader then reads the text from the

clipboard.

6.2. Proposed extensions

In this section we propose extensions to assistive architectures. These have not been

implemented or evaluated.

6.2.1. Extensions to OSKs

OSKs can be made context-sensitive using the techniques used by screen readers and

magnifiers. In this approach, the OSK uses MSAA or an off-screen model to determine the

current focus and context and presents the user with appropriate options on the keyboard.

When scanning is used, these options can be presented in the positions that are quickest to

access. For example, when the current context is a menu the user can be presented with cells

to control navigation through the menu (for example, up, down, right, return and escape). If

the current context is a control, options can be given that are appropriate to that control. For

example, if the context is a spin control, the options Up and Down should be given. We

speculate that this approach may make OSKs faster to use, particularly when scanning.

6.2.2. Extensions to predictors

Context sensitivity can also be used in predictors when the predictor is integrated with a

word processor. Suppose that the user is editing an existing document. If he/she types a

letter at the end of the document, prediction should work as normal. However, suppose the

caret is in the middle of a word, for example between the ‘l’ and the ‘o’ in the word ‘helo’.

If the user now types an ‘l’ (to make the word ‘hello’), it is not appropriate for the predictor

to give options of a word starting with ‘l’. Knowing the position of the caret can prevent

this. After the user has typed the ‘l’, suppose he/she moves the caret one position to the

right (after the ‘o’), if he/she now types a space, the predictor should now try to predict the

next word.

In short, knowledge of the context can prevent the predictor presenting inappropriate

options when the user is editing a document.

6.2.3. Extensions to magnifiers

Magnifiers generally present the current focus as the centre of the display. So, for

example, when a user is writing a document in a word processor, the display is centred on

the caret. If the user is typing a new document (rather than editing), half the display will be


blank. To maximise the amount of contextual information presented to the user, the caret

should be set toward the bottom right of the display. This will present more information on

the current line and the previous lines. If the user is interacting with a control, such as a

checkbox, it may be more appropriate for the focus to be displayed toward the left of the

display, showing the status of the checkbox and maximising the amount of text that can be

seen. Again, this maximises the amount of contextual information. Context can be

obtained from MSAA, OLE or an OSM.

Magnifiers generally enlarge the graphical image presented by the application. As

noted earlier, enlarged text can appear ‘blocky’ and most magnifiers provide smoothing

algorithms that can be selected by the user to reduce the degree of ‘blockiness’. Smoothing

works reasonably well, but it does have a significant computational overhead. When

dealing with text an alternative approach is to determine what the text is (by using MSAA,

OLE or an OSM) and presenting the text in an enlarged font. This is equivalent to changing

the size of the fonts in a word processor and thus the text is smooth. This reduces the

overhead associated with smoothing and may lead to a more readable output. However, the

characters may not necessarily occupy precisely the same relative positions as they did in

the original image. This may cause problems in some applications, but, where the

primary goal is to present readable text information, it may be an appropriate approach

and one that merits further investigation.

6.2.4. Simulated Braille keyboards

As with any keyboard, prediction could be used (even where contracted Braille is used).

The predicted options would have to be presented to the user in speech or Braille form.

Assuming that the user is a reasonably capable Braille typist, it will almost certainly be

quicker for the typist to continue typing than to select an option, and thus prediction is

unlikely to be worthwhile here.

7. Concluding remarks

It is clear that the major assistive software applications for Windows machines make

use of similar technology and that there is considerable overlap in their architectures. It is

also clear that knowledge of the context is very important in screen readers. We believe

that other assistive applications can make greater use of contextual information to provide

more intelligent interfaces to their users. We will be developing tools that incorporate

some of these features.

Acknowledgements

We would like to acknowledge the contribution of Jeff Witt whose knowledge of

assistive architectures for Windows has been of great help over a number of years.


References

Baude A, Blenkhorn P, Evans G. The architecture of a Windows 9x full screen magnifier using DDI

hooking. In: Marincek C, Buhler C, Knops H, Andrich R, editors. Assistive Technology—Added

Value to the Quality of Life (AAATE 2001). Amsterdam: IOS Press; 2001. p. 113–8.

Blenkhorn P. A system for converting Braille into print. IEEE Trans Rehabil Engng 1995;3(2):

215–21.

Blenkhorn P. A system for converting print into Braille. IEEE Trans Rehabil Engng 1997;5(2):

121–9.

Blenkhorn P, Evans G. Considerations for user interaction with talking screen readers. Proc CSUN

2001, 2001a. http://www.csun.edu/cod/conf2001/proceedings/0131blenkhorn.html.

Blenkhorn P, Evans G. The architecture of a Windows screen reader. In: Marincek C, Buhler C,

Knops H, Andrich R, editors. Assistive Technology—Added Value to the Quality of Life

(AAATE 2001), 2001. Amsterdam: IOS Press; 2001b. p. 119–23.

Blenkhorn P, Pettitt S, Evans D. Multi-lingual input to a personal computer using a modified Perkins

Braille writer. Br J Vis Impair 2001;19(1):17–19.

Davis E, Scott PD, Spangler RA. Voice activated control for the physically handicapped and speech

impaired. Speech Tech ‘90 1990;50–2.

Hawes P, Blenkhorn P. Speeding up your switch input. Proc CSUN 1999.

Marsh K. Win32 hooks. Microsoft Developer Network; 1993 July 29.

Minneman SL. Keyboard optimization techniques to improve output rate of disabled individuals.

Proc 9th RESNA 1986.

Zajicek M, Powell C, Reeves C, Griffiths J. Web browsing for the visually impaired. Proc 15th IFIP

World Congress 1998;161–9.

Gareth Evans holds a BSc (Hons) in Electrical and Electronic Engineering from

the University of Manchester and a PhD in Computation from UMIST.

He joined UMIST in 1987 and is currently a Senior Lecturer in the Department

of Computation. His research interests include alternative interfaces to

computers for people with disabilities, speech synthesis and assistive devices

for people with a range of disabilities.

Paul Blenkhorn holds a BSc (Hons) in Mathematics from the University of

Manchester.

He has been an active developer of systems for people with disabilities for the

past 20 years and worked at the Open University and at the Research Center for

the Visually Handicapped at the University of Birmingham. He was co-founder

and research director of Dolphin Systems. He joined UMIST in 1991 and is

currently the Professor of Assistive Technology in the Department of

Computation. He has broad research interests in the area of technology and

people with disabilities.
