Architectures of assistive software applications
for Windows-based computers
Gareth Evans*, Paul Blenkhorn
Department of Computation, University of Manchester Institute of Science and Technology,
P.O. Box 88, Manchester M60 1QD, UK
Received 2 July 2002; received in revised form 19 November 2002; accepted 11 December 2002
Abstract
This paper considers the architecture of some of the most common types of assistive software,
namely, screen readers, full-screen magnifiers, on-screen keyboards, predictors and simulated
Braille keyboards. The paper provides an overview of the operation of these applications and then
identifies the technologies that may be used to implement assistive applications on a Windows
platform. The basic architecture of each of the applications is then presented and is followed by
examples of how technologies and design approaches that are used in one type of application can be
used to enhance the capabilities of another type of application.
© 2003 Elsevier Science Ltd. All rights reserved.
Keywords: Assistive; Magnifiers; Windows; Braille; Screen reader; On-screen keyboard
1. Introduction
There are few papers that discuss the architecture of assistive software applications. We
believe that this is because most applications are developed in commercial companies. We
are in the relatively fortunate position of having developed commercial examples of all of
the applications described in this paper over a number of years. One motivation for writing
this paper is to expose the architectures and the design choices available to the developers
of assistive application software. In this paper we will briefly consider the functionality of
the most common assistive software applications; review the technologies that are
available to the Windows developer; present the architecture of the applications; and
indicate how some of the design approaches used in one application can inform the design
of others.
1084-8045/03/$ - see front matter © 2003 Elsevier Science Ltd. All rights reserved.
doi:10.1016/S1084-8045(03)00002-X
Journal of Network and
Computer Applications 26 (2003) 213–228
www.elsevier.com/locate/yjnca
* Corresponding author. Tel.: +44-161-200-3368; fax: +44-161-200-3324.
E-mail address: [email protected] (G. Evans).
In this paper we will focus on screen readers, full-screen magnifiers, on-screen
keyboards, predictors and simulated Braille keyboards, because we believe that these offer
the greatest challenges to the designer and because they are relatively high value
components. We will attempt to show that this seemingly diverse set of applications shares
certain software components and design approaches. A second motivation for writing this
paper is, therefore, to explicitly indicate the commonality where it exists and also to show
that a design approach that is typically used in one application may be used in another to
enhance its operation. For example, some approaches that are commonly used in the
design of on-screen keyboards can be used to enhance the features of a screen reader.
Some of this ‘cross fertilisation’ is speculation on behalf of the authors, but some will be
illustrated by practical examples of existing systems developed by the authors.
2. Restrictions on the paper
The paper will not consider speech recognition and optical character recognition
(OCR) as assistive technologies. Whilst these technologies are widely used by
disabled people, they have been developed to support a wider group of users. We
acknowledge that some speech recognition and OCR work is specifically targeted at
disabled users, especially those with non-standard speech (Davis et al., 1990) but we
do not consider this issue further.
The paper does not consider augmentative and alternative communication (AAC) and
electronic communication systems (ECS) beyond the prediction and on-screen keyboard
components that may be used to implement these systems, because the paper focuses on
assistive technology systems that provide access to standard applications rather than
systems that are applications in their own right. Braille translation systems are also
important examples of assistive systems that are not discussed here.
The paper focuses exclusively on the architecture of Windows-based systems. We do
this because we believe that the majority of assistive applications are designed for
Windows-based systems and because this is where our expertise lies. However, the issues
we raise in this paper will have some relevance for developers working with other
operating systems.
The paper focuses on the most common assistive software applications. We do not
consider relatively simple applications such as the Windows Sound Sentry (which
translates audible Windows warnings into visual indications for deaf people) and
applications that simulate input from the keyboard or the mouse including serial keys,
sticky keys and mouse keys. In addition, we exclude less common applications such as
disambiguation tools (Minneman, 1986).
3. Overview of the assistive applications
We firstly need to consider the operation of some of the more common assistive
applications. For reasons of space, the descriptions are brief, some generalisations will be
made and some of the more subtle aspects of the applications’ operations will be excluded.
G. Evans, P. Blenkhorn / Journal of Network and Computer Applications 26 (2003) 213–228214
3.1. Screen readers
A screen reader expresses the current interactive segment of Windows in a form that is
suitable for blind people. Generally, screen readers produce speech output through a
speech synthesizer or Braille output by driving a Braille line. In this paper we will focus
exclusively on speech output. This restriction makes little difference to the description of
the screen reader’s operation or its architecture.
The principal goal of a screen reader’s designer is to make standard applications (such
as commercial word processors, web browsers, etc.) appear as if they were talking
applications designed especially for blind users.
One point, which will be significant later, is that the screen reader has to inform the user
of changes in an application’s or the operating system’s state. This means that, for
example, when a pop-up window appears, the user needs to be informed of its contents.
We refer to the ability of the screen reader to detect the changes in application and/or
operating system state and its ability to inform the user as ‘context sensitivity’. In addition,
we should introduce the concept of ‘focus’. A Windows control (such as a button, menu
item or client window) can have the current ‘focus’. This control will accept input until the
user changes focus. A blind user will normally change focus using the keyboard. The
screen reader gives information about the component that has the current focus. If this
component has ‘subcomponents’ (such as a form that has a number of buttons) the screen
reader may inform the user about the ‘subcomponents’.
In this paper we will consider the LookOUT screen reader developed by the authors. This
is a fully featured commercial screen reader that has many features in common with other
commercial screen readers. Further details concerning its operation and architecture are
given in (Blenkhorn and Evans, 2001a,b).
3.2. Magnifiers
In this paper we will focus exclusively on full-screen magnifiers. As we note in (Baude
et al., 2000) there are other styles of magnification but a discussion of full screen
magnification is sufficient to address the full range of architectural issues.
A magnifier presents an enlarged portion of the standard visual output on the
computer’s monitor. The visually impaired person (VIP) can control which portion of the
display he/she can see at any given time by ‘scrolling around’ the display using the cursor
keys and/or the mouse. When ‘scrolling around’, the updates to the screen should be flicker
free, giving the illusion of a smoothly changing image. A typical magnifier will allow the
screen to be magnified between 2 and 32 times. Like the screen reader, a magnifier should
be context sensitive. So, for example, if a pop-up window appears, the magnifier should
change its display so that the new window is at the centre of the display.
A magnifier may, in addition to changing the size of text and images, change their
representation. When characters are magnified they can appear ‘blocky’; smoothing
algorithms can be employed to produce clearer text—for more details see (Baude et al.,
2000). The colours of text and images can also be changed by the user to provide a clearer
display.
3.3. On-screen keyboards
An on-screen keyboard (OSK) provides a means by which a person who cannot use a
conventional keyboard can provide text or other input. OSKs are typically used by people
with physical disabilities as an alternative to the keyboard. An OSK presents a set of
choices that the user can select from, typically in the form of a matrix of cells (when
simulating a QWERTY keyboard, for example, each cell corresponds to a key). An OSK
can generally be configured to allow ‘free selection’ or to provide ‘scanned input’. When
using the former, the user can select any of the cells by selecting the cell with a pointing
device, such as a mouse. A typical example is a person with physical disabilities who may
use a head-operated mouse. One traditional method of providing scanned input is for the
OSK to highlight each of the rows in the matrix of cells in turn. The user selects (by using a
switch) the row that contains the cell that he/she desires. The OSK then highlights each of
the cells in the row in turn and the user can then select the appropriate cell. In practice,
there are many different scanning modes (see (Hawes and Blenkhorn, 1999)) that can be
selected according to the user’s preference (including those that use two or more
switches); this paper describes only one mode.
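The one-switch row/column mode described above can be sketched as a small state machine. This is a minimal illustration by the present editors, not code from the systems described in the paper: a timer calls `advance()` to move the highlight, and the user's switch calls `select()`, first to fix a row and then to choose a cell within it.

```python
class RowColumnScanner:
    """Row/column scanning over a matrix of cells (one-switch mode).

    advance() is called on each timer tick; select() is called when the
    user presses the switch.  The first selection fixes a row, the second
    returns the chosen cell's value.
    """

    def __init__(self, grid):
        self.grid = grid           # list of rows, each a list of cell labels
        self.row = 0               # currently highlighted row
        self.col = 0               # currently highlighted cell within the row
        self.scanning_rows = True  # True: stepping rows; False: stepping cells

    def advance(self):
        # Timer tick: move the highlight to the next row or cell.
        if self.scanning_rows:
            self.row = (self.row + 1) % len(self.grid)
        else:
            self.col = (self.col + 1) % len(self.grid[self.row])

    def select(self):
        # Switch press: fix the row, or return the highlighted cell.
        if self.scanning_rows:
            self.scanning_rows = False
            self.col = 0
            return None
        cell = self.grid[self.row][self.col]
        self.scanning_rows = True  # restart row scanning for the next cell
        self.row = 0
        return cell
```

Other scanning modes (two-switch, inverse scanning) would vary the conditions under which `advance()` and `select()` are invoked, while the highlight state remains the same.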
An OSK must be visible to the user at the same time as the application that it is
controlling. From a Windows perspective this is quite interesting (especially where ‘free
selection’ is considered) in that the application has the focus,1 but the OSK must be visible
and capable of capturing input from the user.
3.4. Predictors
Predictors are applications that predict, on the basis of a partially typed word, what
the full word will be. Typically, after the user has typed one or more letters, the
predictor will give the user a list of possible words that complete the word (so called
‘word completion’). When a word is completed, the predictor may give suggestions
for the next word (which would be ‘word prediction’ in the true sense). The user can
then continue to type or select one of the complete words offered by the predictor.
Simple prediction is often done on the basis of the most commonly used words in a
given language but can also be updated by considering the frequency and recency of
words typed by a particular user. More sophisticated predictors will also take into
account syntactical or semantic contexts.
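The simple frequency-plus-recency ranking described above can be sketched as follows. This is an illustrative sketch by the present editors, not the design of any predictor mentioned in the paper; the class and method names are invented for the example.

```python
from collections import Counter

class Predictor:
    """Word completion ranked by frequency, with recency as a tie-break."""

    def __init__(self, lexicon):
        self.freq = Counter(lexicon)  # seed counts from a word list
        self.last_used = {}           # word -> sequence number of last use
        self.clock = 0

    def learn(self, word):
        # Update frequency and recency as the user types words.
        self.clock += 1
        self.freq[word] += 1
        self.last_used[word] = self.clock

    def complete(self, prefix, n=5):
        # Return up to n candidate completions for a partially typed word,
        # most frequent first, most recently used first among ties.
        candidates = [w for w in self.freq if w.startswith(prefix)]
        candidates.sort(key=lambda w: (-self.freq[w],
                                       -self.last_used.get(w, 0), w))
        return candidates[:n]
```

A more sophisticated predictor would replace the sort key with a language model that weighs syntactic or semantic context, but the interface to the OSK or word processor stays the same.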
Predictors can be used with OSKs so that typing can be speeded up. In this case
some of the OSK’s cells are used to contain the predictions and as such they operate
as described in the previous section—for examples, see the displays.2 Predictors can
also be used by people who have print disabilities in order to assist in word selection.
When predictors are not integrated with OSKs, the predictor must interact with
another application, most generally a word processor. A predictor can be installed as a
component of a word processor and as the user types the predicted word completions
can appear close to where the user is typing. Other predictors provide their own
1 As noted earlier, when an application has the focus, keyboard input is directed to that window.
2 Hands Off, http://www.sensorysoftware.com/sensory.htm, current 10 Dec 2001.
interfaces and copy the text into another application only when the user has finished
typing each word or phrase.
3.5. Braille keyboards
Some blind users prefer to use Braille keyboards rather than QWERTY keyboards.
External Braille keyboards can be connected to the computer’s serial, parallel or USB port.
Braille keyboards may be special purpose keyboards, Braille note takers or adapted Braille
typewriters (Blenkhorn et al., 2001). The last of these is particularly attractive in that it can
produce a paper and electronic copy at the same time. From the operating system’s
perspective these devices connect as keyboards with their own drivers. However, at some
point, the Braille characters must be converted to standard ASCII or Unicode characters so
that they can be passed to an application (such as a word processor). This conversion is not
always straightforward, particularly when contracted Braille is being typed. The
conversion may happen within the keyboard itself (an approach common for Braille
note takers) but can also take place inside the computer system. Indeed the keyboard may
send key up and key down events to the computer system rather than Braille characters.
The relationship between Braille and text and the algorithms for conversion can be found
in (Blenkhorn, 1995, 1997).
In another approach a standard QWERTY keyboard can be used for Braille input. Here
eight keys on the QWERTY keyboard are used to simulate the eight Braille keys3 (the six
‘dot’ keys, space and new line). For example, the keys SDF JKL may be used for dots one
to six, respectively. We refer to such a system as a ‘simulated Braille keyboard’.
4. Technologies
In this section we consider the technologies that are available to the developers of
assistive applications when developing for Microsoft Windows. Space precludes a detailed
examination of the technologies and some generalisations are made.
4.1. Methods to simulate input
Many of the applications described above are required to simulate keyboard actions (for
example the OSK, the simulated Braille keyboard, etc.) or mouse actions (for example the
OSK, mouse keys, etc.). This can be achieved by making a call on the Windows
Application Programming Interface (API). Keyboard and mouse events can be simulated
using the SendInput API call. This takes parameters that indicate, amongst other
information, the source of the event (mouse, keyboard or other hardware) and information
pertinent to the event (for example, the identifier of the key or the mouse location).
SendInput will route the input to the operating system’s keyboard or mouse event queue as
appropriate.
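As a hedged sketch of how an assistive application might drive this API from a high-level language, the fragment below describes the key-down/key-up event pair for a key and, on Windows, injects it. The virtual-key table is a small invented subset; a full implementation would fill `INPUT` structures and call `SendInput` itself rather than the older `keybd_event` shortcut used here for brevity.

```python
import ctypes
import sys

# Virtual-key codes for a few keys (illustrative subset of the Windows VK_* set).
VK_CODES = {"a": 0x41, "b": 0x42, "enter": 0x0D}

KEYEVENTF_KEYUP = 0x0002

def key_events(key):
    """Describe the key-down/key-up pair that simulating one keystroke requires."""
    vk = VK_CODES[key]
    return [(vk, 0), (vk, KEYEVENTF_KEYUP)]  # (virtual key, flags)

def send_key(key):
    # On Windows, route the events through the user32 input API; elsewhere,
    # just return the event descriptions (useful for testing the logic).
    events = key_events(key)
    if sys.platform == "win32":
        for vk, flags in events:
            # keybd_event is the older, simpler cousin of SendInput.
            ctypes.windll.user32.keybd_event(vk, 0, flags, 0)
    return events
```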
3 This description assumes 6-dot Braille. Applications that support 8-dot Braille are also available.
4.2. Hooking—getting information about input and output
There are a number of circumstances when an assistive application needs to intercept
input before it reaches an application and output before it reaches the screen. For example,
both screen readers and magnifiers need to intercept keyboard input to obtain the
commands that control the assistive application and prevent these from being routed to the
application. Screen readers and magnifiers also need to intercept the calls made by
applications and the operating system on the operating system’s graphics system to be able
to determine the information that is being written to the screen.
The interception of information in this way is achieved by using a hook. Considering
input ‘a hook is a mechanism by which a function can intercept events (messages, mouse
actions, keystrokes) before they reach an application’ (Marsh, 1993). Data that is obtained
through a hook can be filtered. This means that some data is extracted and used by the
assistive application and other data is passed through to the standard application. In a
screen reader some keyboard commands are used to control the operation of the screen
reader and these are extracted and processed by the screen reader. Other keyboard input is
intended to control the application that the user is using at the time (for example a word
processor) and is passed back into the keyboard buffer without modification.
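The filtering decision at the heart of such a hook can be modelled in isolation from the hook machinery itself. In the sketch below (by the present editors; the command table is hypothetical), returning `True` corresponds to the hook procedure swallowing the message, and `False` to letting it continue to the focused application.

```python
# Hot keys reserved by a (hypothetical) screen reader.  Any other keystroke
# passes through unchanged to the focused application, just as a Windows
# keyboard hook lets unfiltered messages continue down the hook chain.
SCREEN_READER_COMMANDS = {"ctrl+shift+r": "read current line",
                          "ctrl+shift+t": "read window title"}

def filter_keystroke(combo):
    """Model of the hook's filter: (consumed?, command or pass-through key)."""
    if combo in SCREEN_READER_COMMANDS:
        return True, SCREEN_READER_COMMANDS[combo]  # swallowed by the hook
    return False, combo                             # forwarded to the app
```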
4.3. Microsoft active accessibility (MSAA)
MSAA ‘is a suite of technologies for the software developer that lets applications and
objects communicate and automate more efficiently. MSAA allows applications to
effectively cooperate with accessibility aids, providing information about what they are
doing and the contents of the screen’.4
MSAA provides a means by which applications and the operating system can generate
events and expose their contents to other applications through a standard interface. It is the
application’s responsibility to provide this interface—an MSAA server. An
assistive application can take advantage of this interface to determine what information
an application is presenting to the user (this is very useful for screen readers). Not all
applications provide an MSAA server. However, the assistive application can generally
get some useful information provided that the application uses standard interface
components such as pull down menus and controls. Because these are supported by the
operating system and because Windows provides an MSAA server, this information can be
extracted.
4.4. Object linking and embedding (OLE)
OLE is an automation technology developed by Microsoft that allows applications to
work together. An application that has an OLE interface exposes its Object Model. The
Object Model can be used by an assistive application to extract data from an application
and also to control the behaviour of the application. This is very useful for applications that
4 http://www.microsoft.com/Windows/platform/Innovate/Marketbulls/msaakybull.asp, current 7 Dec 1999.
do not fully support MSAA. For instance, the area in which a user types in Microsoft Word
is not exposed through MSAA, but Word has an OLE interface through which an assistive
application can obtain the data. It also provides a means through which alternative
interfaces to standard applications can be constructed—see later.
4.5. Off-screen model (OSM)
Screen Readers need to be able to determine the information that is displayed on all
parts of the screen. MSAA and OLE can satisfy many of the demands for information that
a screen reader may make.5 However, MSAA and OLE cannot satisfy the requirement that
a screen reader will interact with almost all applications under all circumstances. Thus, a
screen reader needs to maintain its own version of the information that is displayed on the
screen. This is achieved by the screen reader hooking the calls that applications and the
operating system make on the graphics display system. From these calls the screen reader
can build an OSM. This is a data structure that has an entry for every pixel on the screen
and which holds information about what is displayed at that pixel. The OSM
presents an interface that allows the screen reader to determine the attributes of a pixel. For
example, a screen reader can determine whether a pixel is part of a character or not. If it is,
the screen reader can obtain information about a character’s value, its font and its
dimensions. The OSM will also be capable of informing the screen reader that significant
events have taken place (in the same way as MSAA can be used to inform an assistive
application of changes). For example, the OSM can send an event to the screen reader
whenever the ‘blinking caret’ in an application moves.
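A toy version of such a per-pixel model is sketched below. This is a simplification by the present editors: a real OSM is populated by hooked GDI calls and must handle variable-width fonts, clipping, and overdrawing, whereas here text-output calls are recorded directly and characters are assumed to be of fixed width.

```python
class OffScreenModel:
    """Toy off-screen model: maps each screen pixel to the character
    (value, font, bounding box) drawn there.  Populated by recording
    text-drawing calls rather than by hooking GDI."""

    def __init__(self):
        self.pixels = {}  # (x, y) -> info dict for the character at that pixel

    def record_text(self, text, x, y, font, char_w, char_h):
        # Record a text-output call: claim the pixels each character covers.
        for i, ch in enumerate(text):
            left = x + i * char_w
            info = {"char": ch, "font": font,
                    "rect": (left, y, left + char_w, y + char_h)}
            for px in range(left, left + char_w):
                for py in range(y, y + char_h):
                    self.pixels[(px, py)] = info

    def char_at(self, x, y):
        # Query interface: what character, if any, is displayed at this pixel?
        info = self.pixels.get((x, y))
        return info["char"] if info else None
```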
4.6. Topmost window
In Windows, the window that has the focus is generally on top (i.e. visible to the user
and obscuring other windows). In some cases we want windows to be visible to the user,
but not necessarily to have the focus. This is true for OSKs. In Windows, a window can be
set to be a Topmost window. Such a window will appear in front of all the windows that
are not set to be Topmost windows. This technique allows an OSK to be placed in front of
part of the window of an application, at the same time ensuring that the application, rather
than the OSK, has the focus.
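A minimal sketch of how an OSK might request this behaviour is given below (by the present editors). The constants are the documented `SetWindowPos` values; `SWP_NOACTIVATE` is what keeps the focus with the application rather than the OSK.

```python
import ctypes
import sys

# SetWindowPos constants (documented values from the Windows API).
HWND_TOPMOST   = -1
SWP_NOSIZE     = 0x0001
SWP_NOMOVE     = 0x0002
SWP_NOACTIVATE = 0x0010

def make_topmost(hwnd):
    """Keep a window above all non-topmost windows without giving it the focus."""
    flags = SWP_NOSIZE | SWP_NOMOVE | SWP_NOACTIVATE
    if sys.platform == "win32":
        ctypes.windll.user32.SetWindowPos(hwnd, HWND_TOPMOST, 0, 0, 0, 0, flags)
    return flags  # returned so the flag combination can be inspected
```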
5. Architectures
In this section we present the software architecture of the assistive applications
described earlier with a particular focus on their use of the technology identified in the
previous section. In presenting a single architecture for each we are obviously making
some generalisations.
5 Traditionally screen readers have not used MSAA and OLE and we believe that a number of current screen
readers do not use this approach. They use an off-screen model exclusively for the purpose of obtaining on-screen
information.
5.1. Screen reader
The architecture of the LookOUT screen reader (Blenkhorn and Evans, 2001b) is
shown in Fig. 1. The user controls LookOUT through the keyboard. Of course the
applications and the operating system also require keyboard input, so LookOUT uses a
keyboard hook and filters out the keystroke combinations that form the commands. The
other keystrokes are passed to the operating system, which places them in the Windows
keyboard buffer.
LookOUT uses two major sources of information, MSAA and its internal OSM.6
LookOUT always uses MSAA as its first source of information. MSAA is also used as a
source of events that indicate, for example, that a pop-up window has appeared. MSAA is
also queried by LookOUT so that LookOUT can determine the type of information. For
example, MSAA can be used to determine the type of element on a form, i.e. whether it is a
checkbox, radio button, etc. MSAA events are also handled by LookOUT in order to
provide context sensitivity.
As noted earlier, not all applications support MSAA and indeed a small set of the
standard Windows controls are not supported by MSAA. When information cannot be
obtained from MSAA, LookOUT uses the OSM. The OSM obtains its information by
hooking the calls made to the Windows Graphics Display Interface (GDI), which is used by
Windows and applications to construct visual output. By interpreting these calls the OSM
can establish the type of information held at every pixel on the screen.
Fig. 1. The architecture of the LookOUT Screen Reader.
6 LookOUT uses an OSM developed jointly by Microsoft, ONCE, Baum Products GmbH and euroBraille SA.
This OSM can be obtained as part of an open source project. Further information can be obtained from
http://www.stonesoupsoftware.org
In some circumstances it is difficult for the screen reader to determine information from
the OSM. The classic example is determining the selected cell in Microsoft Excel. This is
indicated visually using quite subtle means (providing a highlighted box around the cell
and setting the row and column names to bold). The screen reader must be able to read
these rather subtle visual cues and doing so can be quite computationally intensive.
However, if an application provides an OLE interface and the screen reader is aware of
which application it is dealing with, the information can easily be obtained by making a
call on the object model. Continuing with the Excel example, it is relatively simple to use
Excel’s OLE interface to find which cell is currently highlighted. This technique is
computationally efficient and reliable. The use of OLE is not confined to extracting
information from applications. Because OLE provides a means to control an application,
the screen reader can present an alternative interface for blind users that may be simpler
for the user. This issue is discussed later.
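The Excel case can be sketched as follows. `ActiveCell` and its `Row`, `Column` and `Value` properties are part of Excel's documented object model; the surrounding helper and phrasing are illustrative choices by the present editors, not LookOUT's actual implementation.

```python
def a1_name(row, col):
    """Convert 1-based (row, column) to Excel A1 notation, e.g. (3, 28) -> 'AB3'."""
    letters = ""
    while col > 0:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters + str(row)

def announce_selected_cell(excel_app):
    """Given an Excel application object obtained through its automation (OLE)
    interface, return a phrase a screen reader could speak."""
    cell = excel_app.ActiveCell  # one property read replaces subtle OSM analysis
    return "%s: %s" % (a1_name(cell.Row, cell.Column), cell.Value)
```

The contrast with the OSM route is the point: one property read on the object model replaces a computationally intensive search for highlight boxes and bold row/column names.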
5.2. Magnifier
The architecture of a magnifier is presented in Fig. 2. Like the screen reader, the
magnifier has to filter the keyboard to determine whether keystrokes should be directed to
another application or be used to control the magnifier itself. The magnifier also has to
hook the mouse. In general, mouse movement with no buttons selected should be
interpreted directly by the magnifier itself. However, mouse button selections should be
routed to the application.
A magnifier has to carry out two major tasks. Its first task is to capture the graphics
output so that it can enlarge it and the second task is to cause the enlarged image to be
presented on the screen. There are a number of methods available to the designer of a
magnifier to achieve both tasks.
Broadly speaking there are two major techniques for capturing the graphics output.7
The first is to hook the Windows graphics system at some point and to interpret the calls to
create a bitmap equivalent to the output. Fig. 2 implies this technique. The second method
is to allow the Windows graphics system to create its own bitmap and to subsequently
work on this. To make this approach work either the bitmap created by the operating
system has to be redirected from the display card’s video memory to some other set of
memory locations or some technique needs to be used whereby the video image taken
from the display card’s video memory is obscured.
We will discuss two techniques for displaying the enlarged image. One is to make use
of the Windows graphics system. This is achieved by making calls on the graphics system
so that the enlarged image is displayed. Fig. 2 implies this. The other approach is to take
advantage of Direct X (specifically Direct Draw) to present the enlarged image. Direct
Draw is an API developed by Microsoft that allows a programmer to efficiently create and
manipulate graphical images. One interesting feature of Direct Draw is that it supports
overlays. This feature can be used by a magnifier to overlay the enlarged image ‘on top’ of
7 A more detailed description is given in (Blenkhorn and Evans, 2001a).
the standard video display. This approach is appropriate when allowing the graphics
system to create its image in the display card’s video memory before enlarging the image
(see above).
The magnifier uses MSAA so that it can respond to events, i.e. to provide the context
sensitive behaviour described earlier.
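The 'scrolling around' and context-sensitive recentring described above both reduce to one computation: choosing which rectangle of the full-size screen to capture and enlarge. The sketch below (by the present editors; integer zoom assumed) centres the source rectangle on a point of interest such as the caret, the mouse, or a newly appeared window, clamped to the screen edges.

```python
def source_rect(focus_x, focus_y, screen_w, screen_h, zoom):
    """Portion of the full-size screen that a full-screen magnifier should
    capture and enlarge, centred on the point of interest and clamped so the
    view never runs off the edge of the screen."""
    view_w = screen_w // zoom  # the enlarged view shows 1/zoom of the
    view_h = screen_h // zoom  # screen in each dimension
    left = min(max(focus_x - view_w // 2, 0), screen_w - view_w)
    top  = min(max(focus_y - view_h // 2, 0), screen_h - view_h)
    return left, top, view_w, view_h
```

The returned rectangle would then be stretched over the whole display, for instance with GDI's `StretchBlt` or a DirectDraw overlay, using whichever display technique the magnifier has adopted.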
5.3. On-screen keyboards
The architecture of an OSK is shown in Fig. 3.
The OSK has two sources of input, a mouse, when the user uses ‘free selection’, and
one or more switches, when the user is using ‘scanned input’.
The system produces keystrokes that need to be routed to the application. This is
achieved by using the SendInput Windows API call. The OSK needs to be set to be the
Topmost window; this is important when the user is using a mouse.
5.4. Predictor
The architecture of a predictor is shown in Fig. 4. This assumes that the predictor is a
standalone application that is not integrated with an OSK. The predictor uses a keyboard
hook and passes an appropriate set of keystrokes to Windows through the SendInput API
call. The predictor also has to present the set of predicted words to the user.
5.5. Simulated Braille keyboard
The architecture of a simulated Braille keyboard is shown in Fig. 5. From this it can be
seen that the keyboard’s architecture is identical to that of the predictor.
Fig. 2. The architecture of a magnifier.
Fig. 3. The architecture of an On-Screen Keyboard.
Fig. 4. The architecture of a predictor.
The processing
requirements are somewhat different. The keyboard filter discards input from all keys that
do not correspond to the eight keys that simulate the Braille keyboard. The keystrokes are
translated into standard text and passed to the application using the Win API’s SendInput
function.
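Using the key assignment from Section 3.5 (S D F J K L for dots one to six), the filter-and-translate step can be sketched as below. The Braille table fragment covers only the letters a-e of uncontracted Braille; chord detection (deciding when a set of simultaneously held keys is complete) and contracted Braille are deliberately omitted. This is an illustration by the present editors, not the algorithm of (Blenkhorn, 1995, 1997).

```python
# Key-to-dot assignment described in Section 3.5: S D F J K L give dots 1-6.
KEY_TO_DOT = {"s": 1, "d": 2, "f": 3, "j": 4, "k": 5, "l": 6}

# A tiny fragment of the uncontracted Braille table (dot patterns -> letters).
DOTS_TO_CHAR = {frozenset({1}): "a",
                frozenset({1, 2}): "b",
                frozenset({1, 4}): "c",
                frozenset({1, 4, 5}): "d",
                frozenset({1, 5}): "e"}

def translate_chord(keys_down):
    """Translate one simultaneous chord of QWERTY keys into a character.
    Keys outside the six dot keys are discarded, as by the keyboard filter."""
    dots = {KEY_TO_DOT[k] for k in keys_down if k in KEY_TO_DOT}
    return DOTS_TO_CHAR.get(frozenset(dots))
```

The resulting character would then be injected into the focused application with SendInput, exactly as in the predictor architecture.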
Serial Braille keyboards act as serial input devices and appropriate software must be
written to intercept and process the data that is input.
6. Extending assistive architectures
In this section we examine the ways in which the basic architectures given earlier
can be extended by using techniques that are used in other applications. This section
is divided into two subsections. The first presents extensions to screen reader
architectures that have been implemented and evaluated by the authors. In the second,
we propose extensions to assistive applications; however, these extensions have not
been implemented or evaluated.
6.1. Implemented screen reader extensions
6.1.1. Alternative interfaces using OLE
Certain standard applications can be very difficult to use, even with a well-
designed screen reader. This may be because the information is presented by the
application in a visually complex way.
Fig. 5. The architecture of a simulated Braille keyboard.
One example is accessing web browsers such
as Microsoft Internet Explorer. Certain web sites are very complex, especially if they
use multiple frames and have a large number of links. One approach to addressing
this problem is to develop a browser that is specifically designed for blind people,
such as the Brookes Talk browser (Zajicek et al., 1998). A user of such a system is
interacting with an application that has been specifically designed for blind users. As
noted earlier, one of the goals of a screen reader is to give the impression to the user
that he/she is interacting with an application that has been specifically designed for
blind people, but which is based on a standard application. There are some advantages
to using a standard application in terms of maintenance and upward compatibility.
The question is, therefore, is there a way in which alternative interfaces can be
provided for a screen reader that is interacting with a standard application? This can
be achieved by using OLE. As noted earlier OLE allows an application to both obtain
information and to control another application. Because Internet Explorer has an OLE
interface it is possible to develop an application that presents an alternative interface
to the user through a screen reader. This approach has been used by Baum GmbH in
their Web Wizard.8
Our approach (which is similar to Baum’s) is to develop a standalone application that
provides an alternative interface to an application that provides information in a way that is
optimised for screen reading. The application is standalone, rather than integrated with the
screen reader, so that it can be used with any screen reader. Our initial work has focused on
providing alternative interfaces for Internet Explorer and Microsoft Outlook. The Outlook
interface is particularly interesting because not only does it present information that is
difficult to read in Outlook using a screen reader in a ‘screen reader friendly’ way, but also
it provides additional functions that are not available in Outlook itself. Suppose that a user
of Outlook wishes to find his/her next free two-hour appointment on a Friday afternoon.
For a sighted user, it is relatively easy to scan through the days and locate the free time by
sight. The task is quite time consuming for a blind user, because he/she has to step through
considerable amounts of information. One solution is to add an extra function (a ‘next free
slot’ function) that will find a time slot subject to certain constraints (such as length of
day of the week, etc.).
6.1.2. Alternative interfaces using SendInput
Not all applications support OLE, but alternative interfaces can be developed by
simulating keyboard and mouse input in a similar way to OSKs, predictors and simulated
Braille keyboards by using the Win32 API SendInput call. We give two examples.
In the first we provide an alternative interface to Windows Media Player (WMP).
Specifically we give user control over the Stop, Play and Pause operations through the
numeric keypad keys 4, 5 and 6, respectively. This is achieved by hooking and filtering
the relevant numeric keypad keys. The screen reader9 determines the co-ordinates of the top
left-hand corner of the WMP’s window using an API call. The screen reader knows the
location of the Stop, Play and Pause buttons relative to the WMP’s top left-hand corner
8 http://www.baum.de/English/webwizard.htm, current 10 Dec 2001.
9 Strictly speaking it is not the screen reader, but an application-specific script executed by the screen reader
that carries out the operation. See Blenkhorn and Evans (2001b) for further details.
G. Evans, P. Blenkhorn / Journal of Network and Computer Applications 26 (2003) 213–228 225
and, using an API call, moves the mouse pointer to the relevant location and clicks the
mouse button.
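The arithmetic involved is simply a translation from window-relative to screen co-ordinates. A minimal Python sketch follows; the button offsets are hypothetical, and on Windows the window origin would come from an API call such as GetWindowRect and the click itself would be issued with SendInput:

```python
# Hypothetical button offsets relative to WMP's top left-hand corner.
BUTTON_OFFSETS = {"stop": (20, 440), "play": (60, 440), "pause": (100, 440)}

def button_screen_pos(window_origin, button):
    """Translate a button's window-relative offset into screen
    co-ordinates, given the window's top left-hand corner."""
    wx, wy = window_origin
    dx, dy = BUTTON_OFFSETS[button]
    return (wx + dx, wy + dy)

# If WMP's window starts at (300, 150), the Play button is at:
print(button_screen_pos((300, 150), "play"))  # (360, 590)
```

Because the offsets are fixed for a given version of WMP, this approach is simple but breaks if the application’s layout changes.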
In the second example, access is given to a multimedia encyclopaedia’s text. The
encyclopaedia uses non-standard controls and information cannot be obtained from
MSAA. The OSM can be used, but the nature of the encyclopaedia’s interface can make this
unreliable. Text is obtained by moving the mouse to the encyclopaedia’s text box using the
techniques outlined above. Key combinations are then sent that select the text of the entry
and copy it to the Windows clipboard. The screen reader then reads the text from the
clipboard.
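The key sequence amounts to ‘select all’ followed by ‘copy’. The toy Python model below shows the effect of the sequence on a minimal editor state; the `simulate` function and the dictionary representing the editor are purely illustrative, since on Windows the events would be sent with SendInput and the result read from the real clipboard:

```python
# Hypothetical key-event encoding; on Windows each pair would be
# issued as key-down/key-up events through SendInput.
SELECT_AND_COPY = [("ctrl", "a"), ("ctrl", "c")]

def simulate(events, editor):
    """Apply the key combinations to a toy editor model so the
    effect of the sequence can be seen without a real GUI."""
    clipboard = None
    for mod, key in events:
        if (mod, key) == ("ctrl", "a"):
            editor["selection"] = editor["text"]
        elif (mod, key) == ("ctrl", "c"):
            clipboard = editor["selection"]
    return clipboard

entry = {"text": "Aardvark: a nocturnal burrowing mammal", "selection": ""}
print(simulate(SELECT_AND_COPY, entry))
# -> Aardvark: a nocturnal burrowing mammal
```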
6.2. Proposed extensions
In this section we propose extensions to assistive architectures. These have not been
implemented or evaluated.
6.2.1. Extensions to OSKs
OSKs can be made context-sensitive using the techniques used by screen readers and
magnifiers. In this approach, the OSK uses MSAA or an off-screen model to determine the
current focus and context and presents the user with appropriate options on the keyboard.
When scanning is used, these options can be presented in the positions that are quickest to
access. For example, when the current context is a menu the user can be presented with cells
to control navigation through the menu (for example, up, down, right, return and escape). If
the current context is a control, options can be given that are appropriate to that control. For
example, if the context is a spin control, the options Up and Down should be given. We
speculate that this approach may make OSKs faster to use, particularly when scanning.
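This amounts to selecting a reduced cell layout from the focused control’s role, with the cells ordered so that the most likely options are reached first when scanning. A sketch in Python (the role names and layouts are illustrative, not MSAA’s actual role constants):

```python
# Hypothetical mapping from the focused control's role (as might be
# reported by MSAA) to the cells an OSK presents, ordered so that
# the most likely options are reached first when scanning.
CONTEXT_LAYOUTS = {
    "menu": ["down", "up", "right", "return", "escape"],
    "spin_control": ["up", "down", "tab", "escape"],
}
FULL_KEYBOARD = ["a", "b", "c"]  # stands in for the full layout

def layout_for(context):
    """Pick a reduced, context-specific layout when one exists,
    otherwise fall back to the full keyboard."""
    return CONTEXT_LAYOUTS.get(context, FULL_KEYBOARD)

print(layout_for("menu"))  # ['down', 'up', 'right', 'return', 'escape']
```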
6.2.2. Extensions to predictors
Context sensitivity can also be used in predictors when the predictor is integrated with a
word processor. Suppose that the user is editing an existing document. If he/she types a
letter at the end of the document, prediction should work as normal. However, suppose the
caret is in the middle of a word, for example between the ‘l’ and the ‘o’ in the word ‘helo’.
If the user now types an ‘l’ (to make the word ‘hello’), it is not appropriate for the predictor
to give options of a word starting with ‘l’. Knowing the position of the caret can prevent
this. After the user has typed the ‘l’, suppose he/she moves the caret one position to the
right (after the ‘o’). If he/she now types a space, the predictor should try to predict the
next word.
In short, knowledge of the context can prevent the predictor presenting inappropriate
options when the user is editing a document.
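The decision reduces to inspecting the characters around the caret: suppress prediction when the caret sits inside a word, otherwise predict from the word fragment to the caret’s left. A minimal Python sketch of this rule (our own formulation, not a description of any particular predictor):

```python
def prediction_prefix(text, caret):
    """Return the prefix to predict from, or None when prediction is
    inappropriate because the caret is inside a word (i.e. the next
    character is a letter)."""
    if caret < len(text) and text[caret].isalpha():
        return None  # mid-word edit, e.g. fixing 'helo' -> 'hello'
    # walk back to the start of the word fragment before the caret
    start = caret
    while start > 0 and text[start - 1].isalpha():
        start -= 1
    return text[start:caret]

print(prediction_prefix("helo", 3))     # None: caret between 'l' and 'o'
print(prediction_prefix("say hel", 7))  # 'hel': predict words starting 'hel'
print(prediction_prefix("hello ", 6))   # '': predict the next word
```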
6.2.3. Extensions to magnifiers
Magnifiers generally present the current focus as the centre of the display. So, for
example, when a user is writing a document in a word processor, the display is centred on
the caret. If the user is typing a new document (rather than editing), half the display will be
blank. To maximise the amount of contextual information presented to the user, the caret
should be set toward the bottom right of the display. This will present more information on
the current line and the previous lines. If the user is interacting with a control, such as a
checkbox, it may be more appropriate for the focus to be displayed toward the left of the
display, showing the status of the checkbox and maximising the amount of text that can be
seen. Again, this maximises the amount of contextual information. Context can be
obtained from MSAA, OLE or an OSM.
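Placing the focus at an arbitrary point in the magnified view is a small co-ordinate calculation: the magnifier shows the source region whose top-left corner puts the focus at the chosen fractional position. A Python sketch of the arithmetic (an illustration under our own naming, not any magnifier’s actual API):

```python
def viewport_origin(focus, region_size, anchor):
    """Compute the top-left corner of the source region a magnifier
    should show so that `focus` lands at the fractional position
    `anchor` within the view. anchor=(0.5, 0.5) centres the focus;
    (0.8, 0.8) pushes it toward the bottom right."""
    fx, fy = focus
    rw, rh = region_size  # size of the source region shown
    ax, ay = anchor
    return (fx - rw * ax, fy - rh * ay)

# Centring the caret vs. anchoring it toward the bottom right
print(viewport_origin((400, 300), (200, 150), (0.5, 0.5)))  # (300.0, 225.0)
print(viewport_origin((400, 300), (200, 150), (0.8, 0.8)))  # (240.0, 180.0)
```

With the bottom-right anchor, more of the text above and to the left of the caret falls inside the view, which is the extra context the discussion above argues for.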
Magnifiers generally enlarge the graphical image presented by the application. As
noted earlier, enlarged text can appear ‘blocky’ and most magnifiers provide smoothing
algorithms that can be selected by the user to reduce the degree of ‘blockiness’. Smoothing
works reasonably well, but it does have a significant computational overhead. When
dealing with text, an alternative approach is to determine what the text is (by using MSAA,
OLE or an OSM) and to present it in an enlarged font. This is equivalent to changing
the size of the fonts in a word processor and thus the text is smooth. This reduces the
overhead associated with smoothing and may lead to a more readable output. However, the
characters may not necessarily occupy precisely the same relative positions as they did in
the original image. This may cause problems in some applications, but, where the
primary goal is to present readable text information, it may be an appropriate approach
and one that merits further investigation.
6.2.4. Simulated Braille keyboards
As with any keyboard, prediction could be used (even where contracted Braille is used).
The predicted options would have to be presented to the user in speech or Braille form.
Assuming that the user is a reasonably capable Braille typist, it will almost certainly be
quicker to continue typing than to pause and select from the predicted options, and thus
prediction is unlikely to be worthwhile here.
7. Concluding remarks
It is clear that the major assistive software applications for Windows machines make
use of similar technology and that there is considerable overlap in their architectures. It is
also clear that knowledge of context is very important in screen readers. We believe
that other assistive applications can make greater use of contextual information to provide
more intelligent interfaces to their users. We will be developing tools that incorporate
some of these features.
Acknowledgements
We would like to acknowledge the contribution of Jeff Witt whose knowledge of
assistive architectures for Windows has been of great help over a number of years.
References
Baude A, Blenkhorn P, Evans G. The architecture of a Windows 9x full screen magnifier using DDI
hooking. In: Marincek C, Buhler C, Knops H, Andrich R, editors. Assistive Technology—Added
Value to the Quality of Life (AAATE 2001). Amsterdam: IOS Press; 2001. p. 113–8.
Blenkhorn P. A system for converting Braille into print. IEEE Trans Rehabil Engng 1995;3(2):
215–21.
Blenkhorn P. A system for converting print into Braille. IEEE Trans Rehabil Engng 1997;5(2):
121–9.
Blenkhorn P, Evans G. Considerations for user interaction with talking screen readers. Proc CSUN;
2001a. http://www.csun.edu/cod/conf2001/proceedings/0131blenkhorn.html.
Blenkhorn P, Evans G. The architecture of a Windows screen reader. In: Marincek C, Buhler C,
Knops H, Andrich R, editors. Assistive Technology—Added Value to the Quality of Life
(AAATE 2001). Amsterdam: IOS Press; 2001b. p. 119–23.
Blenkhorn P, Pettitt S, Evans D. Multi-lingual input to a personal computer using a modified Perkins
Braille writer. Br J Vis Impair 2001;19(1):17–19.
Davis E, Scott PD, Spangler RA. Voice activated control for the physically handicapped and speech
impaired. Proc Speech Tech ’90; 1990. p. 50–2.
Hawes P, Blenkhorn P. Speeding up your switch input. Proc CSUN; 1999.
Marsh K. Win32 hooks. Microsoft Developer Network; 1993 July 29.
Minneman SL. Keyboard optimization techniques to improve output rate of disabled individuals.
Proc 9th RESNA; 1986.
Zajicek M, Powell C, Reeves C, Griffiths J. Web browsing for the visually impaired. Proc 15th IFIP
World Congress; 1998. p. 161–9.
Gareth Evans holds a BSc (Hons) in Electrical and Electronic Engineering from
the University of Manchester and a PhD in Computation from UMIST.
He joined UMIST in 1987 and is currently a Senior Lecturer in the Department
of Computation. His research interests include alternative interfaces to
computers for people with disabilities, speech synthesis and assistive devices
for people with a range of disabilities.
Paul Blenkhorn holds a BSc (Hons) in Mathematics from the University of
Manchester.
He has been an active developer of systems for people with disabilities for the
past 20 years and worked at the Open University and at the Research Center for
the Visually Handicapped at the University of Birmingham. He was co-founder
and research director of Dolphin Systems. He joined UMIST in 1991 and is
currently the Professor of Assistive Technology in the Department of
Computation. He has broad research interests in the area of technology and
people with disabilities.