© Copyright by Gang Feng, 2001
SHARE: A STOCHASTIC, HIERARCHICAL ARCHITECTURE FOR READING EYE-MOVEMENT
BY
GANG FENG
B. Ed., Beijing Normal University, 1990
M.A., University of Illinois, 1998
M.S., University of Illinois, 1999
THESIS
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Psychology
in the Graduate College of the University of Illinois at Urbana-Champaign, 2001
Urbana, Illinois
SHARE: A STOCHASTIC, HIERARCHICAL ARCHITECTURE FOR READING EYE-MOVEMENT
Gang Feng, Ph. D.
Department of Psychology
University of Illinois at Urbana-Champaign, 2001
Kevin F. Miller, Advisor
Advances in methods for capturing patterns of eye movements in reading have not yet
been matched by corresponding methods for turning those data into a comprehensive quantitative
model that is able to account for patterns of reading eye movements.
The primary objective of the current research is to identify a set of mathematical tools
that are able to describe reading eye movements, which are complex time-series data that covary
with linguistic, perceptual, and other variables. A survey of existing quantitative models of
reading eye movements shows that many of the models are unable to account for the distributions of
empirical eye-movement data. Nonetheless, the variety of modeling approaches also points to
promising solutions to the problem.
Based on analysis of modeling constraints, and inspired by previous efforts, a stochastic,
hierarchical architecture for reading eye-movement, SHARE, is proposed. An advanced Markov
model helps to capture the temporal dependency between reading eye movements, and the
hierarchical structure concisely represents the logical relationships between covariate factors,
eye-movement decisions, and observable eye-movement behaviors. Model parameter estimation
is based on Bayesian theory, which provides a natural way to incorporate prior knowledge and to
conduct probabilistic reasoning.
A simple model based on the SHARE architecture has been developed. Although it only
takes into account a limited number of covariates and only models dependency between adjacent
eye movements, it is nevertheless able to capture much of the dynamics of reading eye
movements. A simulation study shows that, despite its simple structure, the model is able to
reproduce the distributions of fixation durations and saccade lengths, as well as composite
eye-movement variables. Because each reader is modeled individually, analyses of model parameters
for readers of varying age and reading proficiency also shed light on the development of reading
skills.
The SHARE architecture is shown to be flexible enough to characterize both beginning
and fluent reading, which is particularly attractive for the study of reading development. Its
ability to capture eye-movement patterns also opens a wide range of possibilities for real-world
applications of the eye-movement technology.
ABSTRACT
Advances in methods for capturing patterns of eye movements in reading have not yet
been matched by corresponding methods for turning those data into a comprehensive quantitative
model that is able to account for patterns of reading eye movements.
The primary objective of the current research is to identify a set of mathematical tools
that are able to describe reading eye movements, which are complex time-series data that covary
with linguistic, perceptual, and other variables. A survey of existing quantitative models of
reading eye movements shows that many of the existing models are unable to account for the
distributions of empirical eye-movement data. Nonetheless, the variety of modeling approaches also
points to promising solutions to the problem.
Based on analysis of modeling constraints, and inspired by previous efforts, a stochastic,
hierarchical architecture for reading eye-movement, SHARE, is proposed. An advanced Markov
model helps to capture the temporal dependency between reading eye movements, and the
hierarchical structure concisely represents the logical relationships between covariate factors,
eye-movement decisions, and observable eye-movement behaviors. Model parameter estimation
is based on Bayesian theory, which provides a natural way to incorporate prior knowledge and to
conduct probabilistic reasoning.
A simple model based on the SHARE architecture has been developed. Although it only
takes into account a limited number of covariates and only models dependency between adjacent
eye movements, it is nevertheless able to capture much of the dynamics of reading eye
movements. A simulation study shows that, despite its simple structure, the model is able to
reproduce the distributions of fixation durations and saccade lengths and to predict eye-movement
variables with reasonable accuracy. Because each reader is modeled individually, analyses of
model parameters for readers of varying age and reading proficiency also shed light on the
development of reading skills.
A distinctive strength of the SHARE architecture is that it makes minimal assumptions
about psychological mechanisms but concentrates on mathematical descriptions of eye-
movement patterns. To the extent that it separates objective descriptions from hypothetical
mechanisms, it presents a way to implement and test a variety of theories of reading eye
movement in a common platform. The SHARE architecture is shown to be flexible enough to
characterize both beginning and fluent reading, which is particularly attractive for the study of
reading development. Its ability to capture eye-movement patterns also opens a wide range of
possibilities for real-world applications of the eye-movement technology.
DEDICATION
To My Family
ACKNOWLEDGEMENTS
I would like to recognize those people who have helped me meet this part of the Ph. D.
requirement. I thank the members of my dissertation review committee, Richard C. Anderson,
Cynthia Fisher, George W. McConkie, Kevin F. Miller, and Douglas Simpson for the distinct
expertise that each person brought to the project.
I am greatly indebted to my academic and dissertation advisor, Kevin Miller, who has
given me generous support, intellectually, financially, and emotionally, for the past seven years. I
cannot think of any other lab where I could have enjoyed the total freedom to pursue my intellectual
interests, the thoughtful and timely guidance, and the extraordinary research facilities that Kevin
offered me. His influence on me, both professionally and personally, will be felt in the years to
come.
George McConkie showed me the way to eye-movement research. But more importantly,
he provided me with an example of an extraordinary scholar, an enthusiastic advisor, and simply
a good person. He has never refused a single request for help, no matter how big or small it was.
I cherish every opportunity to work with him, and am grateful for all the help he gave me over
the years.
The other members of my committee also made major contributions to my understanding
of reading, language, and statistical issues in modeling, as well as helping me to clarify my
thinking. Cynthia Fisher introduced me to many new concepts in linguistics and language
acquisition, and has read and given thoughtful comments on many papers over the years. Doug
Simpson made many incisive and constructive suggestions about the statistical aspects of this
project; his patience and encouragement had a major impact on this project. Richard Anderson
has been consistently supportive throughout my career at UIUC, and even made the supreme
sacrifice of returning from his summer home in Wisconsin to a hot and humid Champaign-
Urbana for my final orals meeting. Of course, none of my committee members can be held
responsible for the errors that remain in this project.
The greatest support throughout my graduate program came from my family. My
parents, Sunqi Feng and Mei Chen, are always confident in me and forever encouraging. No
words can express my thanks to my wife, Xiuhong Cao, and daughter, Jessie. During the dull
stretches of thesis writing, those joyful albeit brief after-dinner family moments were the only
source of energy that recharged me after many hours of daily work and carried me through
the long journey.
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1. INTRODUCTION
Describing a Single Reading Eye Movement
Composite Eye-movement Variables: Measuring Local Dynamics
Eye Movements as Stochastic Processes
From Measurement to Modeling
CHAPTER 2. A SURVEY OF QUANTITATIVE MODELS
“Direct Control” Model and the READER Simulation
“Attentional Shift” Theory and Reilly’s Connectionist Model
“E-Z Reader” Models
“Strategy-tactics” Theory and the Reilly and O’Regan Simulations
Mr. Chips: The Ideal Observer
Stochastic Models by Stark and Suppes
Normal Eye Movements: McConkie and colleagues' mathematical modeling
CHAPTER 3. DESIGN PRINCIPLES
Theory-driven vs. Data-driven Modeling
Deterministic vs. Probabilistic Modeling
The WHEN and WHERE Decisions
Linguistic vs. Low-level Variables
Time-series vs. Independent Data
Discrete vs. Continuous Control
Group vs. Individual Models
Descriptive vs. Predictive Applications
Choosing the Mathematical Tools
CHAPTER 4. SHARE: STRUCTURE, DYNAMICS, AND MODEL FITTING
Modeling Environment
Modeling Data
Structure of the SHARE Model
Temporal Dynamics
Model Fitting and Parameter Learning
Model Adequacy and Comparison
CHAPTER 5. SIMULATION RESULTS
Simulation Method
Distributions of Fixation Durations
Distributions of Saccade Length
SHARE in Conventional Eye-movement Measures
Summary
CHAPTER 6. DEVELOPMENTAL CHANGES OF READING EYE MOVEMENTS
Previous Research on the Development of Reading Eye Movements
Developmental Analyses Using SHARE
Development of Reading Eye-movement Control
Effects of Input Variables on Eye-movement Control
Discussion
CHAPTER 7. DISCUSSION
What is SHARE?
What SHARE is Not
Composite Variables Revisited: Implications to Psycholinguistic Research
Applications in Reading Education
TABLES
FIGURES
APPENDIX A. PROBLEMS IN THE E-Z READER MODEL
The Goodness-of-fit Index
Correlations, Multicollinearity, and Parsimonious Modeling
APPENDIX B. FITTING MIXTURE MODELS TO EMPIRICAL FIXATION DURATION DISTRIBUTIONS
Introduction
Method
Results
Discussion
REFERENCES
CURRICULUM VITAE
LIST OF TABLES
Table 1. Developmental Characteristics of Reading Eye Movements
Table 2. Log Likelihood of Bayesian and MLE for Fixation Duration Fitting
LIST OF FIGURES
Figure 1. Architecture of Reilly’s Connectionist Model of Eye-movement Control
Figure 2. Illustration of Parafoveal Preview Effects in E-Z Reader 5
Figure 3. Order-of-processing diagram for E-Z Reader 5
Figure 4. Illustration of components of the Mr. Chips model
Figures 5A and 5B. Landing Position of Fixations During Reading
Figure 6. Frequency of skipping four- and eight-letter words
Figure 7. Mean Landing Positions of Regressive Saccades as a Function of Launch Site
Figure 8. Fitting Fixation Duration Distribution with a Two-stage Mixture Model
Figure 9. Distributions of Fixation Durations in Yang and McConkie (in press)
Figure 10. Graphical representation of the SHARE model
Figures 11-1 through 76. Simulating Fixation Duration and Saccade Length Distributions
Figure 12. Simulated and Empirical First Fixation Duration by Word Frequency
Figure 13. Simulated and Empirical Single Fixation Duration by Word Frequency
Figure 14. Simulated and Empirical Gaze Duration by Word Frequency
Figure 15. Simulated and Empirical Skipping Probability by Word Frequency
Figure 16. Simulated and Empirical Probability of Making Single Fixation by Word Frequency
Figure 17. Simulated and Empirical Probability of Making Two Fixations by Word
Figure 18. Developmental Changes in Saccade Targeting Probabilities
Figure 19. Developmental Changes in Fixation Duration Control: Probabilities of Making Short, Medium, and Long Fixations
Figure 20. Developmental Changes in Fixation Duration Control: Modes of Short, Medium, and Long Fixation Durations
Figure 21. Developmental Changes in Fixation Duration Control: Variance of Short, Medium, and Long Fixation Durations
Figure 22. What Affects Saccade Targeting: Effects of Word Frequency, Length of the Next Word, Fixation Landing Position, and the Previous Saccade Move
Figure 23. What Affects Fixation Duration Control: Effects of Word Frequency, Length of the Next Word, Fixation Landing Position, and the Previous Saccade Move
Figure 24. BNT Mixture of Gaussian Model Diagram
Figure 25-1 through 15. Fitting 3rd-grade, 5th-grade, and Adult Fixation Duration with n-Component Lognormal Mixture Models
We are all working toward daylight in the
matter, and many of the discrepancies of facts
and theories are more apparent than real.
(E. B. Huey, 1908, p. 102)
CHAPTER 1. INTRODUCTION
The fact that the eyes travel through a line of text with a series of stops and jumps was
first documented over a century ago (Javal, 1878, cited in Huey, 1908). From the very beginning,
eye movements held great promise for revealing the mental processes involved in silent reading:
These movements [of the eyes during reading] are not only subject to the influence of the
direction of thought as words and phrases are read and assimilated, but they are also
directly concerned in the sensory processes of perception. ... This two-fold relation of
these movements with the control activities on the one hand, and on the other hand as the
necessary accessory to a peripheral organ of sensation gives them an intermediary
position between sensation and recognition and between thought and motor expressions
which is of particular interest for the cues or indices which study of them may give of
some of the workings of the mind. (Dearborn, 1906, quoted in Gray, 1922, pp. 173-174)
However, the road from eye movements to the understanding of mental processes has not
been an easy one. What we can learn from reading eye movements depends on our ability to
quantitatively describe them. More than 80 years after Dearborn, O'Regan (1990) outlined the
basic logic for inferring the workings of the mind from eye movements:
The first step in making use of eye movements as a clue to cognitive and perceptual
processes is to proceed backwards: manipulate processing in a known way, and try to
understand the accompanying changes in eye movements. Later, when it is known how
eye movements react to processing changes, one can use eye movements to understand
the cognitive and perceptual processing that occurs in particular cases. (O'Regan, 1990,
p. 400)
In other words, the ability to describe eye-movement patterns, particularly how they
change in response to other factors, precedes and limits our ability to understand the
psychological processes of interest.
The central concern of this research is how to quantitatively describe reading eye
movements. The first two chapters briefly summarize previous approaches and the problems
associated with them. Subsequent chapters develop a stochastic, hierarchical architecture for
reading eye movements (SHARE), implement a simple model using this architecture, and present
model-fitting and simulation results.
Describing a Single Reading Eye Movement
Reading eye movements are generally described as an alternating sequence of fixations
(stops) and saccades (jumps). At this level of abstraction, the eye is assumed to be stationary
during a fixation and to make a fast, ballistic movement during a saccade. Oculomotor details
below this level of abstraction are rarely discussed. Two measures of eye movements – fixation
duration and saccade length – are most widely used in the reading literature (Inhoff & Radach,
1998).
The use of these two measures, however, is not without controversy. First, the
boundary between saccade and fixation is blurred. The transition between a saccade and a
fixation is gradual, and microsaccades, tremors, and drifts occur during a fixation (Carpenter,
1988; Inhoff & Radach, 1998). Thus, in practice the numerical values of fixation duration and
saccade length depend on many factors, such as the temporal and spatial resolution of the
eye-tracking device and the algorithm that detects fixations and saccades (McConkie, 1981).
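To illustrate how detection choices alter the numbers, the following is a minimal sketch of a velocity-threshold (I-VT style) fixation detector applied to a synthetic one-dimensional gaze trace. It is not the algorithm of McConkie (1981) or the one used in this study; the trace, the 4-msec sampling interval, the thresholds (in character spaces per msec), and the function name are hypothetical. Only the threshold changes between the two runs, yet the measured fixation durations change with it.

```python
from typing import List, Tuple


def detect_fixations(x: List[float], sample_ms: float,
                     vel_threshold: float) -> List[Tuple[int, int, float]]:
    """Velocity-threshold fixation detection on horizontal gaze positions.

    x: gaze positions (hypothetically in character spaces), one per sample.
    sample_ms: interval between samples in msec.
    vel_threshold: speed (positions per msec) below which the eye is
        treated as stationary.
    Returns (first_sample, last_sample, duration_ms) for each fixation.
    """
    fixations = []
    start = None  # index of the first sample of the current fixation
    for i in range(1, len(x)):
        speed = abs(x[i] - x[i - 1]) / sample_ms
        if speed < vel_threshold:
            if start is None:
                start = i - 1  # both ends of a slow interval belong to the fixation
        elif start is not None:
            # A fast interval ends the fixation at the previous sample.
            fixations.append((start, i - 1, (i - 1 - start) * sample_ms))
            start = None
    if start is not None:  # the trace ends during a fixation
        fixations.append((start, len(x) - 1, (len(x) - 1 - start) * sample_ms))
    return fixations


# Synthetic trace: rest at 0, a five-sample saccade, rest at 6 character spaces.
trace = [0.0] * 50 + [0.3, 1.5, 3.0, 4.5, 5.7] + [6.0] * 60

for threshold in (0.05, 0.10):
    durations = [d for _, _, d in detect_fixations(trace, 4.0, threshold)]
    print(threshold, durations)
```

With these values the stricter threshold yields fixations of 196 and 236 msec and the looser one 200 and 240 msec, because the slow edges of the saccade are assigned to the neighboring fixations.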
Second, even at the above level of abstraction, additional measures may be needed. For
example, Irwin (1998) showed that linguistic processing does not stop during a saccade, so
saccade time arguably should be included when measuring processing time.
Finally, eye-movement measures, particularly fixation duration, are often
subject to censoring. It is common practice to discard fixations shorter than, for example, 100
msec or longer than some upper threshold. The theoretical motivation for censoring seems to be the
belief that these fixations are not produced by cognitive processes and are thus uninteresting or
unrepresentative (see Inhoff & Radach, 1998, for a discussion). Because extreme scores can
greatly affect means and standard deviations, censoring also makes these statistics more
representative of the bulk of the data and tends to “improve” the significance of statistical
analyses. This is a particular concern for models that fit group data such as averages
rather than individual observations.
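The effect of censoring on summary statistics can be seen with a few lines of arithmetic. The sketch below uses a made-up list of eleven fixation durations (not data from this study) and hypothetical cutoffs of 100 and 800 msec; dropping the three extreme values lowers the mean somewhat and the standard deviation far more sharply.

```python
from statistics import mean, stdev
from typing import List


def censor(durations: List[float], low: float = 100, high: float = 800) -> List[float]:
    """Keep only fixations whose duration lies within [low, high] msec."""
    return [d for d in durations if low <= d <= high]


# Hypothetical fixation durations in msec, for illustration only.
durations = [45, 80, 180, 200, 210, 220, 230, 250, 260, 300, 900]
kept = censor(durations)

print(f"all:      n={len(durations)}, mean={mean(durations):.1f}, sd={stdev(durations):.1f}")
print(f"censored: n={len(kept)}, mean={mean(kept):.1f}, sd={stdev(kept):.1f}")
```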
In the current study, we focus on the two traditional eye movement measures – fixation
duration and saccade length. No censoring of data is used in the current study, and individual
fixations and saccades are used as the unit of analysis.
Composite Eye-movement Variables: Measuring Local Dynamics
Beyond measuring individual eye movements, reading researchers face the challenge of
quantifying a series of eye movements. Psycholinguists are particularly interested in how eye-
movement patterns change in response to experimental manipulations. This requires a way to
summarize the dynamics of processing over multiple eye movements.
This turns out to be a difficult undertaking. Reading eye movements are intrinsically
dynamic. They occur in order, and the characteristics of one fixation depend in part on those of
the previous ones (e.g., Henderson & Ferreira, 1993; McConkie et al.,
1989). Eye movements also respond in real time to the content under the current fixation, or even
in the periphery (see Rayner, 1998, for a review). Finally, reading eye movements are extremely
variable. In fact, Huey (1908) commented “…the variation [of fixation duration] is so very great
that any average is misleading, and the pauses may really be of almost any length” (p. 33).
The use of composite eye-movement variables is an attempt to summarize eye-movement
dynamics over a short period of time. A composite variable, such as gaze duration or skipping
rate¹, is essentially a sample statistic computed from a set of eye movements that satisfy certain
criteria (for example, all fixations that landed on a particular word or word group). This
effectively turns an eye-movement pattern into a single number, which then can be used in
statistical analyses. For example, in a hypothetical psycholinguistic study, a researcher interested
in how word frequency affects reading processes might manipulate the frequencies of some
designated words in the reading materials, and calculate readers’ gaze duration on the
experimental words. These data are then fed to an ANOVA to determine whether readers’ eye
movements were affected by the frequency manipulation.
1 Gaze duration is typically defined as the sum of the duration of all fixations on a word (or a predefined region)
provided that the eye has not left the word (region). Skipping rate is the probability of a word (or region) not being
fixated. A finer distinction may be made between cases where the word was later regressed to and those where the
word was never fixated during the entire reading.
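The footnote's definitions can be made concrete with a short sketch. It assumes each fixation is recorded as a (word index, duration in msec) pair in temporal order; gaze duration sums the first run of consecutive fixations on a word, and a word counts as skipped in the first pass if the eye fixates a word to its right before fixating it. The data format, the function names, and this first-pass treatment of skipping are illustrative assumptions; published definitions vary in such details.

```python
from typing import List, Optional, Tuple

Fixation = Tuple[int, int]  # (index of fixated word, duration in msec)


def first_pass_skipped(fixations: List[Fixation], word: int) -> bool:
    """True if the eye moved beyond `word` (or never reached it) before fixating it."""
    for w, _ in fixations:
        if w == word:
            return False
        if w > word:
            return True
    return True


def gaze_duration(fixations: List[Fixation], word: int) -> Optional[int]:
    """Sum of consecutive first-pass fixations on `word`; None if skipped in first pass."""
    if first_pass_skipped(fixations, word):
        return None
    total, entered = 0, False
    for w, dur in fixations:
        if w == word:
            total += dur
            entered = True
        elif entered:
            break  # the eye has left the word, so first-pass gaze ends
    return total


def skipping_rate(fixations: List[Fixation], n_words: int) -> float:
    """Proportion of words 0..n_words-1 skipped during the first pass."""
    return sum(first_pass_skipped(fixations, w) for w in range(n_words)) / n_words


# Hypothetical record: word 1 is refixated; word 2 is skipped and later regressed to.
fixations = [(0, 200), (1, 180), (1, 120), (3, 250), (2, 210), (4, 230)]
print([gaze_duration(fixations, w) for w in range(5)])
print(skipping_rate(fixations, 5))
```

Note that word 2 receives a fixation (the regression) yet has no gaze duration, which is exactly the kind of contingency between composite variables discussed below.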
This familiar scenario illustrates several problems with the use of composite eye-
movement variables. First, no single statistic can completely summarize the eye-movement
dynamics on a word. Therefore, multiple composite variables have to be computed with the hope
that collectively they will give a full description of the eye movement pattern. In a recent review,
Inhoff and Radach (1998) enumerated at least seven time-related composite variables: single
fixation duration, the duration of the first and second of two target fixations, first fixation
duration, gaze duration, mean fixation duration, total time, and total repair time. Each of them is
a different way of selecting from and summing over the set of fixations on a word (due to space
limitations their definitions are not listed here). New measures have been introduced to select
and sum fixations over time in order to capture additional eye movement patterns (e.g.,
Liversedge, Paterson, & Pickering, 1998). In addition to reading time measures, a variety of
variables have been used to describe saccade patterns, including the probability of skipping,
refixating, or regressing to and from a word (or a region), and the lengths of saccades going into
and out of a region, among others.
Having too many options may not be advantageous. In practice, it is impossible to search
through all of the composite variables for an effect. Researchers have to rely on rules of thumb to
select a small set of “reasonable” variables, and hope they will capture the desired effects.
Second, the correlations between these measures make it difficult to interpret findings.
One rationale for the multitude of composite variables is that each is sensitive to different aspects
of reading (e.g., Liversedge et al., 1998) or taps into different processing stages (Murray, 2000;
Rayner, 1998). In reality, however, very few of these variables are independent of each other,
and some pairs are often highly correlated. This is not surprising given that the various
time-related measures are just different and overlapping ways of selecting from the same pool of
fixations and saccades. As a result, it is difficult to establish a direct link between a composite
variable and a reading process. Similarly, because of the composite nature of the variables², one
cannot be certain that an effect found in one variable is not driven by another. When an effect in
gaze duration is found, for example, it is impossible to conclude whether the difference is caused
by prolonged individual fixation duration or elevated refixation probability, both of which are
part of the definition of gaze duration. The complex relations between these variables create
obstacles in attributing and interpreting empirical discoveries (see Inhoff & Radach, 1998).
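The attribution problem shows up even in a toy example. In the two hypothetical trials below, the target word (index 1) receives the same 400-msec gaze duration, once from a single long fixation and once from a refixation pair of 200 msec each. Gaze duration alone cannot tell the two apart; only its components, the number of first-pass fixations and their mean duration, can. The trial format and the helper name are assumptions for illustration, and the sketch ignores the first-pass skipping case.

```python
from typing import List, Tuple


def first_pass_run(fixations: List[Tuple[int, int]], word: int) -> List[int]:
    """Durations of the first run of consecutive fixations on `word`."""
    run: List[int] = []
    for w, dur in fixations:
        if w == word:
            run.append(dur)
        elif run:
            break  # the eye has left the word
    return run


# Hypothetical trials as (word index, duration in msec) pairs.
trial_a = [(0, 220), (1, 400), (2, 230)]            # one long fixation on word 1
trial_b = [(0, 220), (1, 200), (1, 200), (2, 230)]  # a refixation on word 1

for name, trial in (("A", trial_a), ("B", trial_b)):
    run = first_pass_run(trial, 1)
    gaze = sum(run)
    print(f"trial {name}: gaze={gaze}, fixations={len(run)}, mean={gaze / len(run):.0f}")
```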
Moreover, the composite variables give the appearance of measures of independent
events, which may mislead researchers. It is easy to forget that the fixation duration is not only
determined by the characteristics of the currently fixated word, but also affected by those of the
neighboring words, for instance, through parafoveal previewing (e.g., Henderson & Ferreira,
1993). The probabilities of refixating and skipping a word are also strongly related to the
location of the previous fixation. Such information is lost when the composite variables are
calculated and entered in statistical procedures such as ANOVA, which are designed for testing
independent samples. When these temporal correlates are excluded from data analysis, one runs
the risk of overestimating the effects of factors related to the foveal words and overlooking
potentially important temporal effects.
2 Strictly speaking, the same problem also applies to measures such as first fixation duration and single fixation
duration. Although they do not involve summation over multiple eye movements, they are contingent on the word
being fixated (i.e., first fixation duration is defined as missing for skipped words), and thus are not statistically
independent of other variables. They have to be interpreted in relation to, e.g., skipping rate.
Finally, one’s choice of composite variables is often tied to a favored theory of eye
movement control. For instance, researchers who believe that lexical processes drive reading eye
movements tend to focus exclusively on measures related to fixation duration, whereas
proponents of oculomotor or perceptually oriented theories pay more attention to saccade
patterns. Some researchers believe that measurement and theory should be tightly bound. For
example, Rayner (1995) complained that many psycholinguistic researchers “probably don't have
a model of eye-movement control in mind. In fact, they probably feel that it's not necessary to
specify a model dealing with where the eye lands. All they care about is that gaze durations are
variable as a function of various linguistic variables” (p. 12). Philosophically, it may be
impossible to separate measurement from theory. But this does not mean that one has to
subscribe to a particular theory in order to describe eye movements.
Underlying the problem of composite variables is a mismatch between the dynamic,
stochastic nature of reading eye movements and the mathematical tools chosen to represent
eye-movement patterns. As illustrated above, a small set of simple statistics cannot sufficiently
summarize a series of eye movements, and simply adding more composite variables only
exacerbates the problem, causing confusion at both the conceptual and empirical levels.
Eye Movements as Stochastic Processes
The solution is to describe reading eye movements as stochastic processes rather than
independent events. Reading eye movements may be conceptualized as a series of events
(fixations), each of which may be measured by two continuous variables – fixation duration and
saccade length (of the saccade that follows the fixation). Eye movements are stochastic because
the values of fixation duration and saccade length at fixation t are probabilistically determined by
those of the previous fixation at t-1, or even of earlier fixations (t-2, t-3, and so on).
To further simplify the problem, we can code saccades in terms of a finite number of
moves, corresponding to the number of words each saccade covers (e.g., +2 for moving forward
2 words, -1 for moving backward 1 word, etc.). We assume that at the time of planning a
saccade, each move has a certain probability of being chosen. Reading saccades are now described
as discrete events (different moves) happening at discrete times (when saccades are made, which
is assumed to be instantaneous under the current level of abstraction).
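As a minimal sketch of this coding scheme, assuming each fixation has already been assigned to the index of the word it falls on (the function name `saccade_moves` is hypothetical), a sequence of fixated word positions can be turned into a sequence of moves by differencing adjacent positions:

```python
def saccade_moves(fixated_words):
    """Convert fixated word indices (one per fixation, in temporal order)
    into signed word moves, one per saccade.

    +1 = move to the next word, +2 = skip one word,
     0 = refixate the same word, -1 = regress one word.
    """
    return [nxt - cur for cur, nxt in zip(fixated_words, fixated_words[1:])]


# Hypothetical fixation record: word 1, word 2, word 2 again (refixation),
# word 4 (word 3 skipped), word 3 (regression), word 5.
fixations = [1, 2, 2, 4, 3, 5]
print(saccade_moves(fixations))  # [1, 0, 2, -1, 2]
```

Fixation durations are discarded at this level of abstraction, as in the text; they would need a separate, continuous model.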
Such a stochastic system may be well modeled by a classical Markov model. In a simple
first-order Markov model, a system x is assumed to have a finite number of states, $X_i$, $i = 1, \ldots, k$ (in
our case, there are k possible moves). The system may change from one state to another at a
designated time (making different kinds of saccades), and the probability of being in state $X_i$ at
time t depends only on the previous state and not on any earlier state history. Mathematically, the
probability of making move $X_i$ at fixation t is

$$P(x_t = X_i \mid x_{t-1}, x_{t-2}, x_{t-3}, \ldots, x_1) = P(x_t = X_i \mid x_{t-1})$$

In other words, given the last state, the current state is independent of all earlier states.
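Under this assumption, the transition probabilities can be estimated from an observed move sequence by counting. The sketch below uses the standard maximum-likelihood estimate (row-normalized transition counts); it is an illustration only, not the Bayesian estimation adopted later for SHARE, and `transition_matrix` is a hypothetical name.

```python
from collections import Counter, defaultdict


def transition_matrix(moves):
    """Maximum-likelihood estimate of P(x_t = j | x_{t-1} = i) from one
    sequence of moves: count each (previous, current) pair, then divide by
    the number of transitions leaving the previous move."""
    counts = defaultdict(Counter)
    for prev, cur in zip(moves, moves[1:]):
        counts[prev][cur] += 1
    return {
        prev: {cur: n / sum(row.values()) for cur, n in row.items()}
        for prev, row in counts.items()
    }


# Hypothetical move sequence (+1 = next word, +2 = skip, -1 = regression).
moves = [1, 1, 2, 1, -1, 1, 1, 2, 1]
P = transition_matrix(moves)
print(P[1])  # {1: 0.4, 2: 0.4, -1: 0.2}
```

A move that occurs only as the final element of a sequence has no outgoing transitions and therefore no row; with realistic amounts of data, every common move has one.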
The above model is referred to as the first-order Markov model because the conditional
dependency extends one step back. In a zero-order Markov model, also known as the random-
walk model, the current state is completely independent of any previous history. The system
effectively describes a sample of independent events. It is also possible to derive higher-order
Markov models, in which the current state depends on the previous n states, but computational
cost becomes prohibitive as n increases. The first-order Markov model often offers a good
approximation of short-term temporal relations in data.
How would the Markov model help in conceptualizing and describing reading eye
movements (in this case, saccades)? Assuming that a first-order Markov model as described
above is applicable, all the dynamics of eye movements are summarized in the model’s transition
probability matrix. Any saccade-related composite variable, such as skipping rate or average
saccade length (in words), can be mathematically derived from the matrix. In fact, they can be
computed from the marginal probabilities of the transition matrix. With the transition matrix one
may answer much more detailed questions about saccade programming, such as “if the current
saccade is a refixation versus a regression, is it more likely to skip a word in the next saccade?”
Markov models have been used to summarize eye movement patterns in picture viewing and
scene perception (e.g., Stark & Ellis, 1981).
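A minimal sketch of such derivations, assuming a hypothetical four-move transition matrix whose entries are all positive (so the chain has a unique long-run, or stationary, distribution). The long-run marginal probability of each move is found by repeatedly applying the matrix. Composite measures such as the proportion of skipping saccades and the mean saccade length in words follow from these marginals, while conditional questions are read directly off the rows. All numbers are invented for illustration.

```python
MOVES = [-1, 0, 1, 2]  # regression, refixation, next word, skip one word

# Hypothetical transition matrix: P[i][j] = P(next move = MOVES[j] | current move = MOVES[i]).
P = [
    [0.10, 0.10, 0.70, 0.10],  # after a regression
    [0.05, 0.05, 0.70, 0.20],  # after a refixation
    [0.10, 0.15, 0.55, 0.20],  # after a move to the next word
    [0.15, 0.15, 0.60, 0.10],  # after a skip
]


def stationary(P, iters=1000):
    """Long-run marginal distribution of moves, found by repeatedly
    propagating a uniform starting distribution through P."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


pi = stationary(P)
# Composite measures derived from the long-run marginals.
skip_share = sum(p for m, p in zip(MOVES, pi) if m >= 2)  # saccades that skip a word
mean_length = sum(m * p for m, p in zip(MOVES, pi))       # mean saccade length in words

# Conditional question: is a skip more likely after a refixation or after a regression?
skip = MOVES.index(2)
after_refixation = P[MOVES.index(0)][skip]
after_regression = P[MOVES.index(-1)][skip]
print(round(skip_share, 3), round(mean_length, 3), after_refixation > after_regression)
```

Note that `skip_share` is the proportion of saccades that skip a word, which is related to, but not the same as, the per-word skipping rate reported in most studies.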
Describing reading eye movements, however, is a different matter. There are at least two
major obstacles to using a simple Markov model for reading. First, the Markov model described
above can only deal with discrete events, but fixation duration and saccade length are continuous
measures. Fixation duration is highly informative in reading research, perhaps more so than in
picture perception studies. Although one may be able to code saccade length as discrete values
(number of words), fixation duration cannot be treated in the same way. How to model such
continuous data remains an open problem.
In addition, the classical Markov framework assumes a constant transition matrix, i.e., the
transition probabilities remain unchanged. This is unrealistic for reading, because it excludes the
possibility of linguistic or other factors affecting eye-movement programming. One possible
extension of the model is to allow relevant factors to change the transition probabilities. In other
words, the transition probabilities are probabilistically dependent on the values of linguistic and
oculomotor variables. The current research is an exploration in this direction.
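One way to let covariates change the transition probabilities, sketched here purely as an illustration, is a multinomial logistic (softmax) model: each previous move has baseline log-odds for the next move, and each covariate shifts those log-odds linearly. The covariate name `log_freq_next` and all coefficient values are hypothetical, not estimates from any study, and this is not necessarily the form used in SHARE.

```python
import math

MOVES = [-1, 0, 1, 2]  # regression, refixation, next word, skip one word


def softmax(logits):
    """Convert unnormalized log-odds into probabilities that sum to 1."""
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def transition_row(prev_move, covariates, intercepts, weights):
    """P(next move | previous move, covariates) under a multinomial logit.

    intercepts[prev_move]: baseline logit for each next move in MOVES.
    weights[name]: per-next-move coefficient for covariate `name`.
    """
    logits = list(intercepts[prev_move])
    for name, value in covariates.items():
        for j, w in enumerate(weights[name]):
            logits[j] += w * value
    return softmax(logits)


# Hypothetical parameters: equal baselines, and a higher (standardized) log
# frequency of the upcoming word makes skipping more likely.
intercepts = {m: [0.0, 0.0, 0.0, 0.0] for m in MOVES}
weights = {"log_freq_next": [0.0, 0.0, -0.5, 0.8]}

rare = transition_row(1, {"log_freq_next": -1.0}, intercepts, weights)
common = transition_row(1, {"log_freq_next": 1.0}, intercepts, weights)
print([round(p, 3) for p in rare])
print([round(p, 3) for p in common])
```

With every covariate at zero the rows reduce to the baseline probabilities, so the classical constant-matrix Markov model is the special case in which all weights are zero.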
From Measurement to Modeling
This chapter started by identifying the problem of describing eye-movement patterns as a
critical link for reading eye-movement research. It then pointed out problems associated with the
use of composite variables and argued that reading eye movements should be treated as
time-series data. This analysis, however, does not offer a simple solution for describing eye
movements; instead, it calls for more sophisticated mathematical models.
The conclusion may come as a surprise, but it sheds light on the nature of the problem.
Describing eye-movement patterns is not a measurement problem. It is squarely in the domain of
mathematical modeling because it deals with numbers – measures of eye movements, not eye
movements themselves. Reading eye movements are complex, so their description requires more
than basic mathematical tools.
In searching for the right tools to model reading eye movements, it is critical to
understand their mathematical properties. There have been a number of quantitative models of
reading eye movements. Although they come from different perspectives, each captures some
constraints and regularities of reading eye movements. They provide a natural starting point for
the current exploration.
CHAPTER 2. A SURVEY OF QUANTITATIVE MODELS
Much of the history of reading eye-movement research can be characterized by debates
over eye-movement control mechanisms (see Rayner, 1998, for a brief review of different
theories). Until recently, reading eye-movement theories were largely verbal descriptions of
hypothetical mechanisms with some supportive evidence. Testing these theories was difficult, if
not impossible, because they were often too vague and flexible to be disconfirmed by empirical
evidence. The past decade has seen a spurt of quantitative models that specify theories in the
language of mathematics or computer algorithms.
The current chapter reviews previous attempts at quantitative modeling of reading eye
movements, with emphasis on their modeling approaches, including mathematical models,
assumptions about eye movements and model fitting. The goal is to discover facts about reading
eye movements, successful modeling approaches, and reasons for failure. The findings of this
survey suggest principles for the design of the current model.
The survey is not intended to be a review of eye movement theories, although a brief
introduction to the theoretical background of a model is given when necessary. Comments after
each review are only relevant to the current research and are not meant to be comprehensive
discussions.
“Direct Control” Model and the READER Simulation
Just and Carpenter (1980) proposed two assumptions that linked eye movements and
cognitive processes. The immediacy assumption states that a reader tries to interpret each content
word of a text as it is encountered, making guesses if uncertain. The eye-mind assumption
asserts that the eye remains fixated on a word as long as the word is being processed. Together,
these two assumptions formed the basis for a reading model in which eye movements, measured
by gaze duration, are controlled entirely by cognitive processes. They supported the two
assumptions with regression analyses of reading eye movements, which showed that gaze
duration could be predicted from linguistic variables.
READER was a computer implementation of their theory of reading and eye movement
control (Thibadeau, Just, & Carpenter, 1982). It was designed to be “a natural language
understanding system that reads the text word by word, and whose processing time on each word
corresponds to the human gaze duration on that word” (Thibadeau et al., 1982, p.158). With
respect to eye-movement control, the only eye-movement variable it attempted to model was
gaze duration, which, according to Just and Carpenter, was equal to the mental processing time
on words.
Model structure. READER was implemented as a LISP program. To give the flavor of
the system, a partial representation of the word "are" in "Flywheels are …" would be in the
following form:
… (WORD2: HAS FEATURE1) (FEATURE1: IS 'A') (WORD2: HAS FEATURE2) (FEATURE2: IS 'R') … (WORD2: IS 'ARE') (WORD2: HAS SUBJECT2) (SUBJECT2: IS WORD1)
…
As a complete comprehension system, READER included a variety of components,
ranging from a lexicon to a schema-based knowledge representation. Reading started with
encoding letters one by one, until the word was found in the lexicon. The ultimate goal was to
produce a summary of the passage it “read.” At any moment lexical, syntactic, semantic, and
discourse-level analyses were being carried out concurrently and interactively.
READER’s gaze duration was computed as a linear transformation of the machine cycles
the model spent on processing a word. Just and Carpenter (1980) were explicit about when the
eyes should move: “When the perceptual and semantic stages have done all of the requisite
processing on a particular word, the eye is directed to land in a new place where it continues to
rest until the requisite processing is done” (p. 336). The “requisite processing” could be any
(combination) of the reading processes, for example, lexical access or text integration. What is
considered “required” depends on the goal of reading.
READER assumed a word-by-word reading strategy, targeting the next word in line after
finishing processing the current word. The model, however, did allow word skipping when the
comprehension processes were able to “predict” the next word – when the lexical activation of
the next word was elevated beyond a threshold by other reading processes. The skipped words
turned out to be short function words such as “of” and no content word was ever skipped in the
model.
Parameter estimation. The empirical data for modeling were gaze duration results
obtained from a study in which undergraduate students were asked to read some short scientific
passages, including the “flywheel” passage that READER read. Gaze durations on each word
were first averaged across participants, and then entered as the dependent variable in multiple
regression analyses in order to determine the contributions of various textual factors, such as
word length and syntactic role.
Although primarily a symbolic processing system, READER had quite a few activation
weights, memory decay rates, and thresholds in the system that required parameterization. The
authors did not mention how values were assigned, nor did they perform any systematic
optimization of the parameters.
Model fitting. READER’s “reading” performance was evaluated in several ways.
READER did a fair job as a comprehension system because it was able to “recall” a reasonable
amount of information after reading the passage. Thibadeau, Just, and Carpenter (1982) also
compared the effects of various linguistic factors on human and model performance, and
concluded that the effects were qualitatively, and sometimes quantitatively, similar. However,
they did not perform formal statistical tests to support their conclusions. In fact, Carpenter (1984)
argued against overall statistical goodness-of-fit tests and preferred examining mismatches
between the model and data. The only quantitative index of model fit was the correlation
coefficient between human and READER’s gaze duration over the 140 words, which was
approximately r=0.80.
Comments. READER might be a successful model of reading comprehension, but it is
quite limited as an eye-movement control model. The most obvious problem is that it accounted
for only gaze duration and left no explanations for any other eye-movement phenomena. Equally
problematic is the fact that the READER simulation was based on a single 140-word passage.
The model was never extended to “read” other stories, and there was no evidence that it could be
easily generalized to other reading materials.
Methodologically, Kliegl, Olson, and Davidson (1982) pointed out that, because the
independent variables (linguistic factors) in the regression analyses were correlated with one
another, the regression coefficients might not accurately reflect the effect of each factor. The
validity of the model is consequently undermined, because READER was tuned to reproduce the
effects estimated by those regression coefficients.
“Attentional Shift” Theory and Reilly’s Connectionist Model
In contrast to Just and Carpenter's ambitious project, Morrison's (1984) model was
designed to explain basic eye-movement patterns with minimal assumptions. Morrison suggested
that eye movements were driven by word recognition. It was assumed that during a fixation,
attention would focus on the foveally fixated word until it was recognized. At this moment a
signal was sent to the oculomotor system to start programming a saccade to the next word, while
in the meantime attention shifted to work on the next word based on peripheral visual
information. If the peripheral word was recognized quickly, before the oculomotor system
finished programming the saccade, the saccade command was cancelled and the oculomotor
system was instructed to program a new saccade to the word after it. Even if the peripheral word
was not completely recognized by the end of the current fixation, the partial processing would
still improve word recognition in the next fixation.
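The core timing rule can be sketched as a race between parafoveal recognition of word n+1 and programming of the saccade to it. In this toy version both durations are drawn from exponential distributions purely for convenience (an assumption; the distributions are not specified here), and the partial-preview benefit is ignored. Under that assumption the skipping probability has a closed form, mean_program / (mean_parafoveal + mean_program), which the simulation should approximate. The function names are hypothetical.

```python
import random


def next_target(n, t_parafoveal, t_program):
    """Toy Morrison (1984) rule for a fixation on word n.

    t_parafoveal: ms after the attention shift needed to recognize word n+1.
    t_program:    ms needed to finish programming the saccade to word n+1.
    If word n+1 is recognized first, that saccade is cancelled and a new
    one is programmed to word n+2, so word n+1 is skipped.
    """
    return n + 2 if t_parafoveal < t_program else n + 1


def skip_probability(mean_parafoveal, mean_program, trials=10_000, seed=1):
    """Monte Carlo estimate of P(skip), assuming exponential durations."""
    rng = random.Random(seed)
    skips = 0
    for _ in range(trials):
        t_para = rng.expovariate(1 / mean_parafoveal)
        t_prog = rng.expovariate(1 / mean_program)
        skips += next_target(0, t_para, t_prog) == 2
    return skips / trials


# With these hypothetical means the exact answer is 150 / (100 + 150) = 0.6.
print(skip_probability(mean_parafoveal=100, mean_program=150))
```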
Various modifications to the model have since been proposed (Henderson & Ferreira,
1993; Kennison & Clifton, 1995; Rayner & Pollatsek, 1989; Reilly, 1993). The most recent
versions of the Morrison model are the E-Z Reader models (Reichle, Pollatsek, Fisher, & Rayner,
1998; Reichle, Rayner, & Pollatsek, 1999), discussed in the next section.
Reilly (1993) aimed to build a common platform, based on a connectionist framework,
for testing different reading eye-movement control models. He chose a connectionist modeling
approach because of its “ability to model the blending and merging of constraints in lexical
encoding and in the production of saccadic shifts” (p. 210). The Morrison model, termed the
“Attentional Shift Model (ASM),” is the only model implemented in the paper.
Model architecture. Reilly’s connectionist model was composed of three main
components: (a) a visual input module, (b) a lexical module, and (c) a saccade programming
module (see Figure 1).
The visual input module mimicked some interesting details of the human retina. It
consisted of a matrix of 26x20 units, representing a horizontal visual field of 20 English letters.
When the model “fixated” on a word, letters within the visual field would activate the
corresponding units. The farther away a letter was from the center of the fovea3, the lower its
overall activation level. In addition, the model implemented two blurring mechanisms – spatial
blurring and category blurring – to simulate decreased acuity for eccentric letters. Reilly's model
provides a fairly intuitive and physiologically plausible account for visual input during reading.
Visual attention was modeled as an inverted “spotlight” on the visual field, which
functioned as a filter that severely suppressed the activation of unattended regions4. Attention
could be shifted by moving the “spotlight,” which in turn would modify the visual input and
trigger saccade programming.
The lexical module was a fully connected feed-forward network, which took input from
the visual input module. The network represented 222 word types in the training corpus. During
simulations, a word was considered “identified” if the output activation level became stable.
3 The center of the fovea was the 8th letter position from the left, not the geometric center of the visual field. This
simulated the asymmetric perceptual span (McConkie & Rayner, 1975).
4 Reilly (1993) was unclear about the size of the spotlight, but suggested that it has to be small enough to provide a
relatively noise-free target for saccade programming. He was also vague on how the movement of the spotlight was
guided. Presumably it always jumped to the center of the next word in the periphery.
The saccadic control module was a feed-forward network that also took input from the
visual module, and activation levels for each letter position were averaged to simulate low-level
visual information. The two output units represented saccade directions (left and right); their
activation values corresponded to the distance of the saccade, which was used to update visual
input after each saccade was carried out.
Following Morrison (1984) and Henderson and Ferreira (1993), the saccadic control
module was activated either when there was an attention shift or when the fixation “timed out.”5
An attention shift was only triggered when the current word was identified. This lexical access
time, in turn, depended on the frequency of the word in the training corpus. Thus, the decision of
when to move the eyes was primarily lexically based but was affected by the eccentricity of the
word relative to the fovea.
Model training and testing. The connectionist model had approximately 65,000
modifiable weights, and the values of these parameters were set through back-propagation
training. The lexical and saccadic modules were trained independently.
The lexical module was trained using a corpus of three short stories consisting of 222
word types and 863 word tokens. During training, the lexical module learned to identify words at
random “retinal” positions (i.e., the word and the attention “spotlight” were randomly placed).
The training stopped when the lexical network was able to identify 98.7% of the fixated words.
The saccade control module was trained to move to the location of the attention
“spotlight.” Special care was taken in Reilly (1993) so that the proportions of progressions,
regressions, and refixations in the training samples closely matched those found in normal adult
reading. The saccade module was trained to reach an 80% accuracy level so as to mimic the
less-than-perfect performance of the human saccadic mechanism.
5 Henderson and Ferreira (1993) suggested that if lexical access was not completed by a certain
deadline during a fixation, the fixation would be terminated automatically.
Reilly (1993) presented some example output from the simulation study, demonstrating
that the model was able to reproduce a range of empirical eye-movement phenomena, including
skipping, refixations, the word frequency effect, and the cost of eccentric viewing. Reilly
(1993) acknowledged that the model was preliminary and needed fine-tuning to ensure a
quantitative fit to empirical processing time and saccade length measures, particularly their
distributional properties. Accordingly, no formal goodness-of-fit testing was performed.
Comments. Reilly’s (1993) neural network implementation of the Morrison (1984) model
is unique among the models reviewed here. The model’s connectionist framework and
less-than-perfect training criteria imply that eye-movement control is probabilistic. In addition,
consecutive eye movements are not independent, because parafoveal processing changes
the activations in the lexical module and thus facilitates or hinders word recognition during the next
fixation. In short, Reilly’s model strongly suggests a stochastic control mechanism for reading
eye movements.
“E-Z Reader” Models
“E-Z Reader” (Rayner, Reichle, & Pollatsek, 1998; Reichle et al., 1998; Reichle et al.,
1999), a series of six computer simulation models, is the latest incarnation of Morrison’s theory.
One of the problems with the original Morrison model is that it implied that the time available to
process a parafoveal word, which equaled the time needed to execute the current saccade, was
independent of the characteristics of the word under the current fixation. Experimental evidence
suggests that parafoveal processing benefits diminish when the word under fixation is difficult to
process (Henderson & Ferreira, 1993).
To solve this problem, Reichle et al. (1998) proposed that the signal to shift attention and
the signal to program a saccade should be decoupled. Saccade programming was moved to an
earlier point, allowing variable time for parafoveal preview of the next word(s). This is arguably
the most significant change from Morrison's original model. Other improvements included
incorporating contextual predictability to capture effects of higher processes, adding a default
refixation strategy in the oculomotor system, implementing penalties for processing non-
centrally fixated words, and the incorporation of landing position effects (see McConkie, Kerr,
Reddix, & Zola, 1988). The E-Z Reader model is probably the most ambitious modeling
endeavor among all models, therefore it deserves more detailed scrutiny.
One of the most impressive features of the E-Z Reader modeling effort is the way in
which the models have evolved over time. E-Z Reader models were initially built on simplistic
assumptions, and became progressively more complex as more assumptions were added to make
them more psychologically plausible. The “E-Z Reader 1” model included the basic structure of
the models, but did not utilize contextual predictability information and did not have the ability
to simulate within-word refixations. Contextual predictability was incorporated into the “E-Z
Reader 2” model. “E-Z Reader 3” added a mechanism for intra-word refixations. Penalties for
eccentric viewing positions were implemented in “E-Z Reader 4” and “E-Z Reader 5.” “E-Z Reader 6”⁶
(Reichle et al., 1999) is a recent attempt to improve Model 5 by adding the capability to model
the effect of within-word landing positions (McConkie et al., 1988). Our discussion focuses on
the E-Z Reader 5 and 6 models, which the authors considered their state-of-the-art models.
Model architecture of E-Z Reader 5. E-Z Reader 5 was composed of a lexical module and
an oculomotor module. In order to decouple the signal for attention shift from that for saccade
programming, lexical access was divided into two sequential processes. The first was the
familiarity check (fc), which corresponded to “a rapid feeling of familiarity” or “matching on the
basis of global similarity” (Reichle et al., 1998) to all entries in the mental lexicon. It was
followed by a process called completion of lexical access (lc), which actually finished word
identification. The signal to start programming the next saccade was triggered at the end of the fc
stage, before the fixated word was completely identified. Attention shift, on the other hand, was
triggered only after the lc stage, when lexical processing was finished.
The oculomotor module also included two sequential processes – (a) an early, labile stage
(m) of saccade programming that could be cancelled by subsequent saccadic programming, and
(b) a later, nonlabile stage (M) in which saccades could no longer be cancelled. The original
Morrison model did not have a mechanism for refixations. To explain refixations, Reichle et al.
(1998) hypothesized a default refixation mechanism that was essentially the same as that of
Reilly and O’Regan (1998, 1998): the oculomotor system was assumed to plan a refixation at the
beginning of each fixation, and this plan was subject to cancellation by a progressive saccade
triggered by lexical processing.
6 Reichle, Rayner, and Pollatsek (1999) declined to call it “E-Z Reader 6” because they considered it an
incremental improvement over E-Z Reader 5 rather than a qualitatively different model. However, the name
“E-Z Reader 6” appeared in their data tables. It is referred to as “E-Z Reader 6” in this paper, because the
addition of landing-position modeling significantly changed the basic architecture of E-Z Reader 5.
As in all Morrison-family models, reading phenomena in the E-Z Reader model result
from variation in the relative timing of component processes that take different amounts of time
to complete. With respect to the lexical processes, the model assumed that the
processing times for both fc and lc were linear functions of the logarithm of word frequency,
albeit with different slopes, which allowed more parafoveal processing time for high-frequency
words (see Figure 2). The fc and lc processing times were also functions of contextual
predictability and of the eccentricity of words relative to the point of fixation. To avoid determinism,
random variation was explicitly introduced. The lexical processing times were assumed to follow
Gamma distributions, with standard deviations equal to one third of their means.
For the oculomotor system, the times to complete the labile and nonlabile programming
processes were assumed to follow Gamma distributions with means of 150 msec and 50 msec,
respectively, and standard deviations of 1/3 of their respective means7. The oculomotor
processing times were independent of lexical processes.
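The constraint that each standard deviation equals one third of its mean fixes the Gamma shape at (mean/SD)^2 = 9 and the scale at mean/9. Below is a minimal sketch of sampling the two oculomotor stage durations under that parameterization, using Python's `random.gammavariate(alpha, beta)`, where `beta` is the scale; the helper names are hypothetical.

```python
import random
import statistics


def gamma_params(mean, cv=1 / 3):
    """Shape and scale of a Gamma distribution with the given mean and
    standard deviation cv * mean (cv = 1/3 in E-Z Reader)."""
    shape = 1 / cv**2     # (mean / sd) ** 2, i.e. 9 when sd = mean / 3
    scale = mean / shape  # so that shape * scale == mean
    return shape, scale


def sample_duration(mean, rng):
    shape, scale = gamma_params(mean)
    return rng.gammavariate(shape, scale)  # Python's (alpha, beta) = (shape, scale)


rng = random.Random(0)
labile = [sample_duration(150, rng) for _ in range(20_000)]    # labile stage (m)
nonlabile = [sample_duration(50, rng) for _ in range(20_000)]  # nonlabile stage (M)
print(round(statistics.mean(labile)), round(statistics.stdev(labile)))
```

Because the shape parameter is the same for every process, all Gamma distributions in the model share one skewness; only their scale differs.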
7 The Gamma distributions were chosen because their shapes resembled those of the empirical distributions. All
Gamma distributions in the E-Z Reader series had standard deviations equal to 1/3 of their means; the ratio was
chosen by the authors for convenience.
The E-Z Reader model was able to generate fairly complex eye-movement behaviors.
The computer simulations were implemented as stochastic finite state machines, as illustrated in
Figure 3. Each of the square boxes represents a possible state of the whole system, which is a
combination of the states of the lexical and the oculomotor modules. There were 14 states in E-Z
Reader 5. The model moved from one state to another if one of the processes terminated and a
new process started. The arrows on the diagram mark legal transitions from one state to another.
For example, in State 1 the lexical system was performing the familiarity check on word n (f(n)) while
the oculomotor system was planning a refixation on word n (r(n)). If the labile programming stage of
the refixation (r(n)) ended and gave way to nonlabile programming (R(n)) before the familiarity check
finished, the system would move from State 1 (f(n) r(n)) to State 2 (f(n) R(n)).
It should be emphasized that although the lexical processes may appear to “drive”
reading eye movements in the model, every decision was in fact a result of an interaction, or
more precisely competition, between the lexical and oculomotor processing times. This is clearly
illustrated in Figure 3.
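The competition can be sketched as one step of the state machine: every currently running process has some time left, the process with the least time remaining finishes first and determines the next state, and the others continue with their remaining time reduced. The labels follow the notation of the State 1 example above; the durations and the function name `next_event` are hypothetical.

```python
def next_event(remaining):
    """One step of a race between concurrently running processes.

    remaining: dict mapping a process label to its time left (ms).
    Returns the label of the process that finishes first, the time that
    elapsed, and the time left for every other process.
    """
    winner = min(remaining, key=remaining.get)
    elapsed = remaining[winner]
    others = {label: t - elapsed for label, t in remaining.items() if label != winner}
    return winner, elapsed, others


# State 1: familiarity check on word n, labile refixation programming on word n.
winner, elapsed, others = next_event({"f(n)": 80.0, "r(n)": 60.0})
print(winner, elapsed, others)  # r(n) 60.0 {'f(n)': 20.0}
```

Here r(n) finishes first, so R(n) starts and the system moves to State 2 with 20 ms of the familiarity check still to go. Carrying remaining times forward, rather than resampling them, matters because Gamma-distributed durations are not memoryless.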
Improvement of E-Z Reader 6. The primary motivation of the E-Z Reader 6 model
(Reichle et al., 1999) was to extend the E-Z Reader 5 model to account for landing position
effects (McConkie et al., 1988). McConkie et al. found that saccades tend to overshoot targets
closer than approximately 7 letter spaces and undershoot those farther than 7 letter spaces. The
magnitude of this systematic error was roughly 0.5 letter per letter of planned saccade length. The
landing positions were also subject to random error, which followed a Normal distribution; the
longer the saccade, the greater the variance of that distribution.
These effects were implemented in E-Z Reader 6 with a pair of linear regression
formulas. For a given planned saccade length (PSL, the distance between the current fixation
position and the center of the intended word; same as launch site in McConkie et al., 1988), the
actual saccade length was

$$\text{Saccade length} = PSL + (\Psi_b - PSL)\cdot\Psi_m + E,$$

where $\Psi_b = 7$ and $\Psi_m = 0.4$ were fixed parameters derived from McConkie et al.’s (1988) study,
and $E$ was a normally distributed random error with a mean of zero and standard deviation given
by8:

$$\sigma = \beta_b + \beta_m \cdot PSL,$$

where $\beta_b$ and $\beta_m$ were free parameters to be estimated.
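The two formulas can be sketched directly. The fixed values Ψb = 7 and Ψm = 0.4 come from the text, whereas the βb and βm values in the example are arbitrary placeholders, not Reichle et al.'s estimates; the function name is hypothetical.

```python
import random

PSI_B = 7.0  # letters; fixed, from McConkie et al. (1988)
PSI_M = 0.4  # fixed, from McConkie et al. (1988)


def saccade_length(psl, beta_b, beta_m, rng=None):
    """Executed saccade length (letters) for a planned saccade length psl.

    Systematic error pulls the saccade toward PSI_B letters; random error is
    Normal(0, sigma) with sigma = beta_b + beta_m * psl. If rng is None, the
    random error is omitted and only the systematic part is returned.
    """
    systematic = psl + (PSI_B - psl) * PSI_M
    if rng is None:
        return systematic
    sigma = beta_b + beta_m * psl
    return systematic + rng.gauss(0.0, sigma)


print(saccade_length(3, 0.5, 0.1))   # overshoot: longer than the planned 3 letters
print(saccade_length(10, 0.5, 0.1))  # undershoot: shorter than the planned 10 letters
print(saccade_length(10, 0.5, 0.1, random.Random(0)))  # one noisy draw
```

A planned 7-letter saccade is executed without systematic error, which is the crossover point between overshoot and undershoot described above.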
Parameter estimation and model fitting. E-Z Reader 5 was modeled on a corpus of adult
reading data (Schilling, Rayner, & Chumbley, 1998). Words in the corpus were classified into
five categories based on their word frequency. Six eye-movement variables were calculated for
each of the categories: (1) mean gaze duration, (2) mean first fixation duration, (3) mean single
fixation duration, (4) the mean probability that the word was skipped, (5) the mean probability of
making a single fixation, and (6) the mean probability of making two fixations. Model
parameters were estimated based on these 30 means.
An E-Z Reader model was essentially a Monte Carlo simulation. It took texts, coded in
terms of word frequency and contextual predictability, and traveled through the state transition
diagram (Figure 3) by random sampling from the Gamma distributions. The simulations were
run 1,000 times and the above six eye-movement measures were calculated from the simulated
“eye-movement” data.
8 McConkie et al. (1988) estimated that the standard deviation was a cubic function of PSL (see discussions on
Reilly & O’Regan’s model in the next section). Reichle et al. (1999) apparently simplified it to a linear function.
Model fitting was done using a “grid search” procedure, which involved repeated Monte
Carlo simulations with different parameter values that covered the whole (or a reasonable part9
of the) parameter space. The parameter values that maximized the overall fit between the model
and empirical data were reported.
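A generic grid search of this kind can be sketched as follows. In E-Z Reader the objective would run the Monte Carlo simulation and compare simulated with observed means; here a stand-in quadratic misfit with a known minimum is used, and the parameter names `a` and `b` are hypothetical. The sketch minimizes a misfit, which is equivalent to maximizing fit.

```python
import itertools


def grid_search(objective, grid):
    """Evaluate objective(params) at every combination of candidate values
    and return the combination with the smallest misfit.

    grid: dict mapping a parameter name to a list of candidate values.
    """
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Stand-in misfit with its minimum at a = 2, b = 0.5.
def misfit(p):
    return (p["a"] - 2) ** 2 + (p["b"] - 0.5) ** 2


best, score = grid_search(misfit, {"a": [1, 2, 3], "b": [0.25, 0.5, 0.75]})
print(best, score)  # {'a': 2, 'b': 0.5} 0.0
```

The number of evaluations is the product of the grid sizes (here 3 × 3 = 9), and in a simulation model each evaluation is a full batch of Monte Carlo runs, so the chosen range largely determines both the cost and whether the optimum is inside the grid at all.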
E-Z Reader is clearly the most ambitious and systematic attempt to date to model control
of eye movements in reading. At the same time, its parameter estimation and model fitting
suffered from two serious shortcomings. These problems are briefly summarized here; further
discussion can be found in Appendix A.
First, the formula for the goodness-of-fit measure, as described in Reichle et
al. (1998), contains two errors. Reichle et al. mistakenly squared one of the elements in the
formula, which, instead of normalizing the differences, scaled them by a factor of as much as 100.
In addition, they used standard deviations where standard errors (of the means) should have been
employed, which introduced another unintended scaling by a factor of about 50. The resulting
RMS values, measuring how much variation was left after model fitting, were reported as
statistically nonsignificant, but should have been highly significant.
This computational mistake can help to explain another puzzle in the evolution of the E-Z
Reader models: the goodness-of-fit measure, RMS, did not improve much, and sometimes even
worsened, when new structures and free parameters were introduced. Reichle et al. ignored this
warning sign and based their model selection on theoretical arguments rather than on fit with
data.
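The standard-deviation error described above can be illustrated with a generic normalized root-mean-square misfit, in which each model-data difference is divided by the standard error of the observed mean (SD/√n). This is a sketch of the general principle, not a reproduction of Reichle et al.'s formula (see Appendix A); the function name and all numbers are hypothetical. Dividing by the SD instead of the SE shrinks every normalized difference, and hence the RMS, by a factor of √n.

```python
import math


def normalized_rms(observed, predicted, sds, n):
    """RMS of (observed - predicted) / SE over all means, with SE = SD / sqrt(n)
    and n the number of observations behind each mean."""
    z = [(o - p) / (s / math.sqrt(n)) for o, p, s in zip(observed, predicted, sds)]
    return math.sqrt(sum(v * v for v in z) / len(z))


observed = [250.0, 230.0, 215.0]   # hypothetical mean durations (ms)
predicted = [245.0, 236.0, 212.0]  # hypothetical model means (ms)
sds = [90.0, 85.0, 80.0]           # hypothetical SDs of the observations
n = 400                            # hypothetical observations per mean

with_se = normalized_rms(observed, predicted, sds, n)
with_sd = normalized_rms(observed, predicted, sds, 1)  # n = 1 makes SE equal SD
print(round(with_se / with_sd, 6))  # 20.0, i.e. sqrt(400)
```

A misfit that looks negligible when scaled by SDs can therefore be large relative to the precision of the means being fit.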
9 Reichle et al. (1998, 1999) were vague on how they chose the range of parameter space.
Another problem with the modeling effort was severe multicollinearity in the measures
being fit. I analyzed the basic data for the E-Z Reader modeling, which consisted of 30 means of
eye-movement variables. As shown in Appendix A, all six eye-movement measures were so
highly correlated in the empirical dataset that, in a principal component analysis, a single
factor explained 94.6% of the variance, and three factors accounted for 99.999% of the total variance.
In effect, the free parameters in E-Z Reader 1 through 6 were estimated on only 5 points (one per
word-frequency category). In addition, the first component was a linear function of
(log-transformed) word frequency. Thus, the only “correct” model based on this dataset of 30 means
would be “any eye-movement measure is a linear function of log-transformed word frequency.” Given
that this linearity had been built in since E-Z Reader 1, it is not surprising that the later models did
not improve model fits.
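The kind of check described above can be sketched in a few lines: standardize each measure, form the correlation matrix, and compare its largest eigenvalue (found here by power iteration) with its trace. The data below are synthetic, constructed so that every measure is an exact linear function of one variable; they are not the E-Z Reader data, and the function names are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlations between variables (each column is a list of values)."""
    def standardize(x):
        m = sum(x) / len(x)
        sd = math.sqrt(sum((v - m) ** 2 for v in x) / (len(x) - 1))
        return [(v - m) / sd for v in x]

    zs = [standardize(c) for c in columns]
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in zs] for zi in zs]


def first_component_share(R, iters=500):
    """Share of total variance on the first principal component of R:
    largest eigenvalue (by power iteration) divided by the trace."""
    k = len(R)
    v = [1.0 + 0.1 * i for i in range(k)]  # arbitrary non-uniform start vector
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v'Rv with unit-length v gives the eigenvalue.
    lam = sum(v[i] * R[i][j] * v[j] for i in range(k) for j in range(k))
    return lam / sum(R[i][i] for i in range(k))


# Synthetic example: five frequency classes, six measures that all change
# linearly with log frequency (some increasing, some decreasing).
log_freq = [0.0, 1.0, 2.0, 3.0, 4.0]
measures = [[a + b * x for x in log_freq]
            for a, b in [(300, -20), (250, -12), (260, -15), (0.1, 0.05), (0.6, 0.04), (0.3, -0.05)]]
print(round(first_component_share(correlation_matrix(measures)), 4))  # 1.0
```

With real data the share falls below 1; a value of 94.6% means the six measures carry little more information than a single variable, so adding free parameters has almost nothing new to fit.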
Comments. At the conceptual level, the E-Z Reader model represents a substantial
improvement over the original Morrison (1984) model. In particular, two new mechanisms
proposed by Reichle et al. (1998) – the decoupling of attention-shift and saccade signals and the
default refixation strategy – enabled the model to simulate more phenomena than the original
Morrison model. On the other hand, there is as yet little empirical evidence to support the two
new assumptions. Their psychological plausibility remains to be seen.
As a quantitative simulation endeavor, E-Z Reader has major limitations. Besides the
mathematical errors, fitting the model on a small set of means proved to be very problematic.
Even without the multicollinearity problem in the data, and even if the modeling had been carried
out correctly, there would still be no guarantee that the model really described reading eye
movements. In fact, it would almost certainly not capture the distributional characteristics of
fixation duration and saccade length, given the arbitrary use of gamma distributions.
“Strategy-tactics” Theory and the Reilly and O’Regan Simulations
O'Regan (1990) suggested that the oculomotor guidance system works according to the
following two heuristics:
1. Between-word strategy. Readers fixate on a word until the completion of lexical access
or some other significant stage of recognition. Then they pick a target word from the right
periphery and attempt to move to the generally optimal viewing position (word center) of that word.
In other words, triggering of the between-word saccades is under the control of ongoing
psycholinguistic processing, but word targeting is simply an oculomotor process.
2. Within-word tactics. If the landing position is too far from the generally optimal
position, the system immediately makes a saccade to the other side of the word, and then returns
to the between-word strategy. These tactics are purely oculomotor phenomena: the associated fixation
durations and saccade lengths are independent of psycholinguistic factors.
Most models assume a word-by-word reading strategy, but word targeting in the
Strategy-tactics model is flexible. O’Regan (1990) presented analyses based on a “careful, word-
by-word” reading strategy, but also explored alternative scanning routines. An important
challenge for the strategy-tactics theory is to find the word-targeting strategy used in normal
reading.
The Reilly and O’Regan (1998) simulation study was an attempt to answer this question.
The study was based on McConkie et al.’s (1988) finding that the distributions of landing sites
on a word tend to follow a normal distribution. Reilly and O’Regan (1998), however, noticed
that there were systematic mismatches between the observed distributions and the predicted
normal curves. They argued that the mismatches resulted when the over/undershooting fixations
ended up landing on neighboring words. They further predicted that different word-aiming
strategies (e.g. “jump to each successive word,” or “skip high frequency words”) would result in
different patterns of over/undershooting, and therefore different patterns of deviation from the
normal curves. By simulating different word targeting strategies and comparing the simulated
landing position distributions to empirical data, Reilly and O’Regan (1998) hoped to identify the
most likely word aiming strategy in reading.
Reilly and O’Regan (1998; 1998) hypothesized at least six potential word-targeting
strategies, which fell into two categories – oculomotor strategies and linguistic strategies. The
oculomotor strategies do not require any lexical processing in selecting the next word. They
included (1) Random Control10, (2) Word by Word (WBW), (3) Target long word (TLW), and
(4) Skip short words (SSW). The linguistic strategies included (5) Skip high-frequency word
(SHFW) and (6) Attention shift (AS). The first five strategies are self-explanatory based on their
names. The AS strategy was the Rayner and Pollatsek (1989) version of the Morrison (1984)
model without the Henderson and Ferreira (1993) deadline hypothesis.
Model architecture. All word-targeting strategies were simulated within the same basic
framework and differed only in the strategy used for selection of the next target word. Like E-Z
Reader, Reilly and O'Regan's model was implemented as a finite-state simulation program.
There were three main modules in the model – a lexical system, an oculomotor system for
generating refixations, and a saccade triggering system. Before going into details of the modules,
let us first get a flavor of how the simulation worked.
10 The Random Control strategy was not modeled because it was rejected outright as impossible.
At the onset of a fixation on a word, the lexical and the oculomotor systems worked in
parallel. The latter would start to prepare a refixation by default. When the lexical process was
completed, it would program a progressive saccade, the target of which was determined by the
word-targeting strategy being modeled. When the refixation generation process finished, it
would program a refixation. Eye-movement commands such as "move forward" or "stay" were
taken by the saccade-triggering module, which handled the oculomotor details of saccade
programming. Each programmed saccade took a random time to be triggered. Thus, during each
fixation there was a competition between "move forward" and "stay," and the result depended
probabilistically on the processing times of the three modules.
The above illustrates two interesting features of Reilly and O'Regan's (1998)
simulation. First, although the goal was to simulate landing position distributions, processing
times played the most significant role during the simulations. Thus, the Reilly and O'Regan
simulations qualify as comprehensive eye-movement models. Second, the default-refixation
mechanism clearly reminds us of the E-Z Reader model. In fact, despite the heated debates
between the strategy-tactics and Morrison’s theories, they were remarkably similar when
implemented as quantitative models, as will be seen in the following discussion of model details.
In the Reilly-O’Regan model, the average lexical identification time was a linear function
of the logarithm of word frequency. It was also a function of the length of the currently fixated
word and landing position eccentricity. Individual lexical access times followed a normal
distribution, whose standard deviation was 1/10 of its mean (chosen for convenience).
Refixations have a special importance in the Strategy-tactics theory11. The probability of
refixation was a function of word length and eccentricity of landing position (McConkie et al.,
1989). The time to prepare a refixation was a linear function of eccentricity (off-center fixations
resulted in shorter refixation latencies) but was independent of word frequency. It was assumed
to be normally distributed with a standard deviation of, again, 1/10 of its mean.
The time between programming and actually triggering a saccade – the oculomotor delay
– was assumed to be a random variable12 with a mean of 150 msec and a standard deviation of 50
msec, and was not affected by lexical or any other processes.
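The race described above can be sketched as a small Monte Carlo simulation. This sketch illustrates the mechanism under stated assumptions and is not Reilly and O'Regan's program: the coefficients of the two linear functions are hypothetical, the word-length and eccentricity terms of lexical time are omitted, only a single forward-versus-refixate race per fixation is modeled, and the normal oculomotor delay is truncated at zero.

```python
import math
import random

def lexical_time(freq, rng):
    """Lexical access time (msec): mean linear in log10 frequency (hypothetical
    coefficients; length and eccentricity terms omitted), SD = mean / 10."""
    mean = 300.0 - 25.0 * math.log10(freq)
    return rng.gauss(mean, mean / 10)

def refixation_time(eccentricity, rng):
    """Refixation preparation time (msec): mean shrinks linearly with landing
    eccentricity in letters (hypothetical coefficients), SD = mean / 10."""
    mean = 300.0 - 20.0 * eccentricity
    return rng.gauss(mean, mean / 10)

def oculomotor_delay(rng):
    """Programming-to-execution delay: mean 150, SD 50 msec. The normal form is
    an assumption; negative draws are truncated at zero."""
    return max(0.0, rng.gauss(150.0, 50.0))

def simulate_fixation(freq, eccentricity, rng):
    """Race between 'move forward' and 'refixate'; returns (winner, duration)."""
    t_forward = lexical_time(freq, rng) + oculomotor_delay(rng)
    t_refix = refixation_time(eccentricity, rng) + oculomotor_delay(rng)
    if t_forward <= t_refix:
        return "forward", t_forward
    return "refixate", t_refix

def refixation_rate(freq, eccentricity, n=5000, seed=1):
    """Proportion of simulated fixations in which the refixation wins."""
    rng = random.Random(seed)
    wins = sum(simulate_fixation(freq, eccentricity, rng)[0] == "refixate"
               for _ in range(n))
    return wins / n

print(refixation_rate(freq=1, eccentricity=0.0))       # rare word
print(refixation_rate(freq=10_000, eccentricity=0.0))  # frequent word
```

Because the refixation clock does not depend on word frequency in this setup, slower lexical access (a rare word) lets the refixation command win more of the races.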
The landing position of a saccade was a normally distributed random variable whose
mean and standard deviation were determined according to the original McConkie et al. (1988)
formulas:
$$m = 3.3 + 0.49\,d,$$
$$sd = 1.318 + 0.000518\,d^{3},$$
where d is the distance (in letters) between the launch site and center of the intended word, which
was effectively the PSL in the E-Z Reader 6 model.
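The two landing-position formulas translate directly into a sampler. The sketch evaluates them as written, reading the last term as the cube of d. The coordinate frame in which m is expressed is not restated in this section, so the function returns m in whatever frame McConkie et al. (1988) used, and it assumes sd stays positive over the range of d being simulated.

```python
import random

def landing_params(d):
    """Mean and SD of the landing position for launch distance d (letters),
    evaluating the two quoted formulas; the last term is read as d cubed."""
    m = 3.3 + 0.49 * d
    sd = 1.318 + 0.000518 * d ** 3
    return m, sd

def sample_landing(d, rng):
    """Draw one landing position from the corresponding normal distribution."""
    m, sd = landing_params(d)
    return rng.gauss(m, sd)

rng = random.Random(0)
print(landing_params(5.0))
print([round(sample_landing(5.0, rng), 2) for _ in range(3)])
```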
11 Interestingly, Reilly and O'Regan (1998) did not specify where refixations are targeted. It is possible that, like
inter-word saccades, they all aim at the center of words. However, O'Regan (1990) maintained that refixations tend
to land on the opposite side of the launching site. There is no basis in Reilly and O'Regan to judge how this was
implemented in their simulations.
12 Reilly and O'Regan (1998) did not state the distributional form of the oculomotor delay. I assume it is a normally
distributed random variable, just like all other random variables in the model.
Parameter estimation. Most parameters of the model were fixed. They were assigned
either on the basis of previous findings or with convenient values. There were, however, a few
free parameters, all of which were part of the word-targeting strategies. For example, in the
Target Longest Word (TLW) strategy one had to determine the size of the visual field from
which the "long" word would be picked. When there were one or more free parameters, Reilly
and O'Regan (1998) picked some reasonable and convenient values and ran the simulation
multiple times. There was little systematic parameter estimation.
Modeling results and model testing. Simulation materials were taken from the same text
as in McConkie et al. (1988); only word length and frequency information were used. For each
strategy, 20 trials were run with different random seeds. For each simulation, analyses similar to
McConkie et al. were conducted. Simulated landing site distributions were subtracted from the
hypothetical normal distributions for individual words. The authors looked at the patterns of
discrepancies for each word-targeting strategy and searched for ones that were close to the
empirical pattern.
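The comparison step can be sketched as follows: bin simulated landing positions by letter slot, compute the probability mass a normal curve assigns to each slot, and subtract. In this sketch the reference normal is simply fitted to the sample's own mean and SD, which is an assumption; the hypothetical normal distributions in the original analyses may have been derived differently (for example, from the McConkie et al. formulas).

```python
from collections import Counter
from statistics import NormalDist, fmean, pstdev

def deviations_from_normal(positions, slots):
    """Observed minus normal-predicted proportion of landings per letter slot.

    positions: integer letter positions of simulated landings on one word.
    slots: letter positions to evaluate (e.g. 0 .. word_length + 1).
    The reference normal is fitted to the sample's own mean and SD.
    """
    fitted = NormalDist(fmean(positions), pstdev(positions))
    counts = Counter(positions)
    n = len(positions)
    deviations = {}
    for k in slots:
        expected = fitted.cdf(k + 0.5) - fitted.cdf(k - 0.5)  # mass within slot k
        deviations[k] = counts[k] / n - expected
    return deviations

positions = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # hypothetical simulated landings
for slot, dev in deviations_from_normal(positions, range(7)).items():
    print(slot, round(dev, 3))
```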
Simulation results were reported mostly qualitatively. Reilly and O'Regan (1998) did not
perform any statistical test to compare the fit of models based on different strategies because the
strategies had different numbers of parameters and might not be readily comparable. The only
quantitative measures of the models' goodness of fit to the empirical patterns were correlation
coefficients13, along with statistical tests of whether each was significantly different from zero.
Reilly and O'Regan relied heavily on the magnitude of the correlation coefficients to choose the
most likely word-targeting strategy.
13 The "concordance measure (rc)" in Reilly and O’Regan (1998) was a correlation coefficient. When there were free
parameters and there was a "grid-search," the rc values of all simulation trials were reported in a table.
Findings of the simulations were complicated and will not be reported here in detail. The
Word-by-Word strategy was shown to fit the data poorly. As for Morrison’s Attention Shift
model, Reilly and O'Regan concluded that there was not enough time to identify words in the
parafovea with the attentional shift mechanism14, and that the details of the AS model might need
some revision15. Reilly and O'Regan (1998) favored the “Target the Longest Word” strategy.
They concluded, “The results, therefore, suggest that the eye-movement guidance system does
not generally use linguistic information, but exploits word-length information in the right
parafovea to target the next saccade” (p. 316).
14 Reilly and O'Regan rejected an alternative explanation that the time estimates for word identification were too
long. They argued that the lexical processing time estimates were based on those of Rayner & Pollatsek (1989, p.
176), which had been shown to be quite reliable and were supported by other sources. Without direct evidence, this
argument does not seem strong. In fact, even if individual parameters of lexical processing time were accurately
estimated, the overall time could still be an overestimate. See later discussion on the use of regression coefficients
when independent variables are correlated.
15 Reilly and O'Regan suggested adding contextual predictability to reduce lexical identification time, which,
interestingly, was exactly one of the new features in Reichle et al.'s (1998) E-Z Reader models.
Comments. These conclusions, however, are doubtful because of several
methodological and conceptual problems. The first concern is whether Reilly and O'Regan's
findings were robust. The effects they tried to model (deviations of fixation position distributions
from normal distributions) were very small. Comparing models based on these statistics thus
becomes very tricky. With an arbitrary simulation sample size of n=20, the statistical power of
these tests is very questionable. In addition, the normal distribution hypothesis was a convenient
modeling choice16 in McConkie et al. (1988). If the actual landing position distribution were
a slightly positively skewed distribution (e.g., a lognormal distribution), a word-targeting strategy
other than TLW might well be required to produce a pattern that matches the empirical data.
The second problem is the use of a correlation coefficient rc as the goodness-of-fit index.
Given that Reilly and O'Regan were modeling a fairly small effect, all deviations would be close
to zero and thus correlation coefficients would be expected to be low and variable. Choosing a
model on the basis of absolute values of correlation coefficients, as Reilly and O'Regan did, is
risky. There is no guarantee that a model with r= 0.34 is statistically better than one with r= 0.30.
A better goodness-of-fit indicator is needed to evaluate Reilly and O'Regan's conclusions.
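One standard way to ask whether r = 0.34 is reliably larger than r = 0.30 is Fisher's r-to-z test. The sketch below treats the two correlations as coming from independent samples of a hypothetical size n = 50 each. Correlations computed against the same empirical pattern are not independent, so this is only a rough illustration of the point, not a re-analysis of Reilly and O'Regan's data.

```python
import math

def compare_correlations(r1, n1, r2, n2):
    """Two-sided test of H0: rho1 == rho2 via Fisher's r-to-z transformation,
    assuming the two correlations come from independent samples."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p

z, p = compare_correlations(0.34, 50, 0.30, 50)  # n = 50 is hypothetical
print(round(z, 3), round(p, 3))
```

With these sample sizes the difference is far from significant (p is roughly 0.8).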
In addition, many modeling decisions were quite arbitrary. The assumption that
processing times are normally distributed implies that fixation durations, the sum of the
component times, would also be normally distributed. This contradicts the well-known fact that
fixation durations, like reaction times, follow a positively skewed distribution that systematically
differs from normal (McConkie, Kerr, & Dyre, 1994). Similarly, most of the parameters in the
model were fixed to convenient values rather than being systematically estimated from data. A
different set of values may yield a different conclusion.
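The arithmetic behind this objection is that a sum of independent normal variables is itself normal and therefore has zero skewness, whereas fixation-duration distributions are right-skewed. The sketch below illustrates the contrast by simulation; the component means and SDs are hypothetical apart from the 150 msec (SD 50) oculomotor delay, and the lognormal serves only as an example of a positively skewed shape, not as a claim about the true fixation-duration distribution.

```python
import math
import random
import statistics

def skewness(xs):
    """Sample skewness (third standardized moment, population form)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

rng = random.Random(0)
n = 20_000

# Fixation duration as a sum of normally distributed component times.
# Means/SDs are hypothetical except the 150 +/- 50 msec oculomotor delay.
normal_sum = [rng.gauss(200, 20) + rng.gauss(100, 10) + rng.gauss(150, 50)
              for _ in range(n)]

# A positively skewed alternative of similar scale (lognormal, sigma = 0.35).
lognormal = [rng.lognormvariate(math.log(230), 0.35) for _ in range(n)]

print("sum of normals:", round(skewness(normal_sum), 3))
print("lognormal:     ", round(skewness(lognormal), 3))
```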
16 Reilly and O'Regan dismissed the choice of any distribution other than the normal as "unparsimonious."
At the conceptual level, it is unclear why readers would necessarily follow a single word-targeting
strategy. It is conceivable that the eye may be attracted by a host of different features,
such as word length, orthographic structure (Liversedge & Underwood, 1998), or the likelihood
of being identified parafoveally (Brysbaert & Vitu, 1998). There may also be individual
differences in word-targeting strategies. If so, Reilly and O’Regan’s attempt to
identify a single strategy is doomed to fail. A more fruitful approach seems to be to describe directly
how readers actually target words in reading, instead of presupposing any fixed strategy.
Mr. Chips: The Ideal Observer
The ideal observer models take a different modeling approach from the previous ones.
“An ideal observer is an algorithm that yields the best possible performance in a task that has a
well-specified goal…” (Legge, Klitz, & Tjan, 1997, p. 525). In other words, an ideal observer
model begins by specifying a goal and task constraints and tries to find an optimal solution. Its
objective is not to describe human data but to compare human performance to that of the optimal
algorithm. “The ideal observer provides an index of task-relevant information by showing the
performance level that can be achieved when all of the information is used optimally.
Comparison of human performance to ideal performance can establish whether human
performance is limited by the information available in the stimulus or by information-processing
limitations within the human” (p. 525).
Mr. Chips (Legge et al., 1997), a computer simulation program, attempted to identify the
optimal strategy for saccade programming that minimizes uncertainty in word recognition. In the
simple world Mr. Chips lived in, reading had one goal – to identify each and every word – and
two constraints – the limited visual acuity of the retina and inaccurate control of eye movements.
Mr. Chips attempted to “read” a word list with the minimum number of saccades and identify
each word in order. This was achieved by carefully calculating the best landing position of the
next saccade so as to minimize uncertainty in word identification. Its calculation was based on its
lexical knowledge, the (partial) information from its "retina," and characteristics of the
oculomotor system. Note that Legge et al. did not try to simulate the temporal dimension of
reading17.
Model architecture. As shown in Figure 4, Mr. Chips had three main modules – the
retina, the lexicon, and the oculomotor system.
Mr. Chips' retina consisted of three regions: (a) high-resolution vision in which letters
can be identified, (b) low-resolution vision (relative scotomas) in which spaces can be
distinguished from letters but letters cannot be identified, and (c) blind spots (absolute scotomas)
where there is no vision.
Mr. Chips had a lexicon composed of the 542 most common words in written English,
along with their relative frequencies. The reading materials (word lists) were randomly sampled
from Mr. Chips' lexicon.
At the core of Mr. Chips was the algorithm for calculating and minimizing uncertainty
about the current word. This was done in two steps. Based on the partial visual information from
the retina (some identified letters and word length), Mr. Chips extracted from the lexicon a list of
candidate words. If the list had more than one word (i.e., the word could not be uniquely
identified), Mr. Chips would compute an entropy value, an index of the amount of uncertainty,
based on the frequencies of the candidate words, for every possible landing position of the next
saccade (most likely refixations) and select the movement that was most likely to identify the
word. This is the "entropy-minimization principle" underlying the ideal-observer model.
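The entropy-minimization step can be illustrated with a toy lexicon. This sketch is a simplification, not Legge et al.'s implementation: word length is assumed known, a fixation is assumed to identify every letter within a hypothetical `half_width` of the fixated position (no scotomas, no saccade error), and the best landing position is taken to be the one with the lowest frequency-weighted expected entropy.

```python
import math

def entropy(cands, lexicon):
    """Shannon entropy (bits) over candidate words, weighted by frequency."""
    total = sum(lexicon[w] for w in cands)
    return -sum((lexicon[w] / total) * math.log2(lexicon[w] / total) for w in cands)

def consistent(cands, known):
    """Candidates that agree with every known letter ({position: letter})."""
    return [w for w in cands if all(w[i] == ch for i, ch in known.items())]

def expected_entropy(cands, lexicon, known, fixation, half_width):
    """Average entropy left after fixating `fixation`, if letters within
    `half_width` positions of it become identified (averaged over which
    candidate is actually the word, weighted by frequency)."""
    total = sum(lexicon[w] for w in cands)
    result = 0.0
    for true_word in cands:
        revealed = dict(known)
        for i, ch in enumerate(true_word):
            if abs(i - fixation) <= half_width:
                revealed[i] = ch
        remaining = consistent(cands, revealed)
        result += (lexicon[true_word] / total) * entropy(remaining, lexicon)
    return result

def best_fixation(cands, lexicon, known, half_width=1):
    """Letter position (0-based) minimizing expected entropy; ties go to the leftmost."""
    return min(range(len(cands[0])),
               key=lambda pos: expected_entropy(cands, lexicon, known, pos, half_width))

lexicon = {"cat": 50, "car": 30, "cot": 20, "dog": 40}  # hypothetical frequencies
known = {0: "c"}  # word length (3) and first letter already seen
cands = consistent([w for w in lexicon if len(w) == 3], known)
print(cands, best_fixation(cands, lexicon, known, half_width=0))  # → ['cat', 'car', 'cot'] 2
```

Here fixating the last letter is preferred: it would single out car, which is more probable than cot, the word the middle letter would single out.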
17 Legge, Klitz and Tjan (1997) did include a section discussing the "reading speed" of Mr. Chips, but this speed was
estimated from its saccade length by assuming an average 250 msec fixation duration.
As with humans, Mr. Chips' saccade execution could be imperfect. In one version of the
model, its saccade length followed a normal distribution. Mr. Chips had to incorporate this
statistical information into saccade programming.
Parameter estimation. Because it is an ideal-observer model, Mr. Chips’ parameters were
manipulated by the modeler rather than estimated from human data. For example, Legge et al. (1997)
explored the effects of smaller vocabulary size and abnormal retinas on reading saccade
programming.
Modeling results. The virtue of an ideal-observer model is not how well it approximates
behavioral data, but how it can help to understand human behavior. Several human eye-movement
phenomena, such as refixations, regressions, word skipping, etc., emerged from
following the simple entropy-minimization algorithm. Mr. Chips also showed an “optimal
viewing position” – it tended to land on the third letter position of a word.
Interestingly, Legge et al. (1997) showed that the “eye-movement behaviors” of Mr.
Chips could be characterized with a few simple heuristics, despite the complex internal
mechanisms of the model. For example, Legge et al. (1997) demonstrated that almost identical
performance could be obtained when only word length information was used. This is consistent
with the finding in the reading literature that eye-movement guidance is primarily based on word
boundary information (McConkie & Rayner, 1975; Rayner, 1986). Legge and colleagues also
showed that Mr. Chips’ eye-movement strategies, such as the optimal viewing position effect,
could be summarized by a set of simple if-then heuristics. Together these findings suggest that an
eye-movement control system may achieve optimal reading performance without actually doing
expensive entropy calculations or using high-level information.
Comments. The Mr. Chips model sheds light on some important issues in modeling eye
movements. It demonstrated that eye movements could be described at a behavioral level
separate from the underlying mechanisms. Another important insight is that simple discrete
algorithms (“targeting word centers”) could achieve near optimal performance compared to the
costly “continuous” control (“minimizing entropy”). These became important design principles
for my research.
Stochastic Models by Stark and Suppes
Two scholars, notably not mainstream reading researchers, have tried to describe reading
eye movements with stochastic models (Stark, 1994; Suppes, 1990, 1994). Both of them chose to
use Markov models (see the first chapter for a brief introduction) to capture the dynamics of eye
movements.
Scanpath theory of reading. Based on his research on scanpaths (Hacisalihzade, Stark, &
Allen, 1992; Stark, 1994; Stark & Ellis, 1981; Zangemeister, Sherman, & Stark, 1995), Stark
(1994) proposed that the sequence of reading fixations could be modeled as a Markov process, or
a “scanpath.” Stark proceeded by treating each word in a text as a possible state and describing
reading as going through a series of states. The probability of jumping from one state (word) to
another constituted a Markov transition matrix, and the transition matrix could fully describe the
stochastic properties of reading fixation sequences. Furthermore, Stark introduced string-editing
distance (Wagner & Fischer, 1974) as a measure of the similarity between two fixation
sequences, a measure that could be useful in reading research.
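Both ingredients of Stark's formulation are easy to compute. The sketch below estimates first-order transition probabilities from a sequence of fixated word indices and computes the string-edit (Levenshtein) distance between two such sequences with the Wagner and Fischer dynamic program. Unit edit costs and the toy sequences are assumptions for illustration, not Stark's settings.

```python
from collections import defaultdict

def transition_matrix(fixations):
    """Estimate P(next word | current word) from a sequence of fixated word indices."""
    counts = defaultdict(lambda: defaultdict(int))
    for cur, nxt in zip(fixations, fixations[1:]):
        counts[cur][nxt] += 1
    probs = {}
    for cur, row in counts.items():
        total = sum(row.values())
        probs[cur] = {nxt: c / total for nxt, c in row.items()}
    return probs

def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions (unit costs)
    turning sequence a into sequence b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = cur
    return prev[-1]

seq_a = [1, 2, 3, 4, 5]       # hypothetical word indices, strictly left to right
seq_b = [1, 2, 2, 4, 3, 5]    # with a refixation, a skip and a regression
print(transition_matrix(seq_b))
print(edit_distance(seq_a, seq_b))
```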
Comments on the scanpath model. Stark’s scanpath model has been largely overlooked in
the reading research community. One of the reasons is that the way Stark formulated the Markov
transition matrix originated from picture perception studies and might not be suitable for reading
research. By setting each word as a state, Stark implied that the eye might jump from a word to
any other word in reading. While this is possible, such wild saccades are very rare in reading:
compared to picture viewing, reading is a much more constrained task, in which the eyes almost
always move to adjacent words. It is more intuitive to consider a more
localized Markov process, in which the possible moves of the eye are limited to nearby words.
Suppes’ Stochastic model. Suppes' (1990, 1994) reading eye-movement control model
provides a relatively comprehensive treatment of eye movements – modeling both fixation
duration and saccade programming – and thus is discussed in more detail.
The stochastic model was derived from Suppes’ earlier models of eye movements in
doing multi-digit arithmetic (Suppes et al., 1983). The reading counterpart consisted of two
increasingly complex models – the minimal-control model and the text-dependent probabilistic
control (TDPC) model. In the minimal-control model, Suppes attempted to simulate fixation
duration as a pure random variable that was not affected by on-going reading processes. In
contrast, saccade direction and size were under complete cognitive control18. The minimal-control
model did not cover many empirical findings; therefore, a revised model, the TDPC
model, was derived to “take into account the local variables that have the largest effects on eye
movements” (p. 472). Because the revised model does not change the fundamental architecture
of the “minimal control” model, the following discussion is primarily based on the initial model.
18 Suppes (1990) was inconsistent about this. Despite the fact that (a) the axioms unequivocally showed that
saccade targeting was determined by the underlying cognitive processing, and (b) he clearly stated that “direction
and size of saccade are under cognitive control in this minimal model” (p. 466), Suppes maintained the following:
“It was assumed that most of the process is an automatic low-level process, little disturbed by cognitive and
linguistic aspects of reading. The two basic assumptions of the minimal control model were (a) durations of
fixations are not affected by the content of the reading text, and (b) the length of saccades is not influenced by text
context but only by the physical layout of the page” (p. 465).
Model architecture. Suppes’ models were defined in terms of axioms, or fundamental
hypotheses about the principles of eye-movement control. A system of axioms was then
translated into mathematical functions, for instance, a probability density function of fixation
duration. Some of the axioms would undoubtedly surprise mainstream reading researchers. For
example,
AXIOM F1. The execution time of each eye-control instruction is independent of past processing and the present stimulus context.
… AXIOM D1. If processing is complete in a given region of regard, then move to the next word of text.
… AXIOM D5. A saccade is independent of past motion and earlier stimuli.
With respect to fixation duration, the axioms implied that it should be a mixture of (a) an
exponential random variable and (b) a convolution of two identical exponential distributions.
For saccade programming, Suppes proposed a Markov model that was more intuitive
than Stark’s scanpath formulation. He categorized saccade moves into five states: move forward,
regress, refixate, skip the next word, and others. According to the axioms in the minimal-control
model, saccade programming was a zero-order Markov process, also known as a “random walk.”
At any point in time, the probabilities of making the five moves were constants,
independent of previous states19.
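Both halves of the minimal-control model can be written down in a few lines. In the sketch below, the fixation-duration density is f(t) = p·λ1·e^(−λ1·t) + (1 − p)·λ2²·t·e^(−λ2·t), a mixture of an exponential and a shape-2 gamma (the convolution of two identical exponentials), and saccadic moves are drawn independently with constant probabilities. Allowing separate rates λ1 and λ2, and every numerical value used, are assumptions for illustration rather than Suppes' estimates.

```python
import math
import random

MOVES = ["forward", "regress", "refixate", "skip", "other"]

def duration_density(t, p, rate1, rate2):
    """Mixture density: weight p on an exponential(rate1), weight 1 - p on the
    convolution of two exponential(rate2) variables (a gamma with shape 2)."""
    if t < 0:
        return 0.0
    return (p * rate1 * math.exp(-rate1 * t)
            + (1 - p) * rate2 ** 2 * t * math.exp(-rate2 * t))

def sample_duration(p, rate1, rate2, rng):
    """Draw one fixation duration from the mixture above."""
    if rng.random() < p:
        return rng.expovariate(rate1)
    return rng.expovariate(rate2) + rng.expovariate(rate2)

def sample_moves(n, probs, rng):
    """Zero-order Markov moves: each draw ignores all previous moves."""
    return rng.choices(MOVES, weights=probs, k=n)

# Hypothetical parameters (rates in 1/msec; move probabilities sum to 1).
p, rate1, rate2 = 0.3, 1 / 100, 1 / 120
probs = [0.70, 0.10, 0.10, 0.08, 0.02]
rng = random.Random(0)

# Trapezoid-rule check that the density integrates to about 1 over 0..5000 msec.
h, upper = 1.0, 5000
area = h * (sum(duration_density(i * h, p, rate1, rate2) for i in range(upper + 1))
            - 0.5 * (duration_density(0, p, rate1, rate2)
                     + duration_density(upper * h, p, rate1, rate2)))

durations = [sample_duration(p, rate1, rate2, rng) for _ in range(20_000)]
mean_duration = sum(durations) / len(durations)  # theory: p/rate1 + 2(1-p)/rate2
moves = sample_moves(1000, probs, rng)

print(round(area, 4), round(mean_duration, 1), moves[:10])
```

With these values the theoretical mean duration is 0.3 × 100 + 0.7 × 240 = 198 msec.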
The revised TDPC model added only one change to the fixation duration axioms – the
execution time of each eye-control instruction decreases monotonically along the line of text
(Heller, 1982). Factors that have been central to other models, such as word frequency or
syntactic effects, were dismissed as having “only relatively small effects” (Suppes, 1990, p. 473).
More changes were made to the axioms for saccade control, incorporating the effects of the
optimal viewing position, word length, and syntactic difficulty. However, these patches were
added in such a haphazard fashion that it became impossible to evaluate the mathematical
properties of the model.
Parameter estimation, model testing, and model comparison. The distribution of fixation
durations was a fully parameterized mathematical model, which had been fitted to eye-
movement data from Suppes’ arithmetic experiments. Models with the best fitting parameters
showed a “reasonably good” fit, but Suppes acknowledged that they would have been rejected by
a formal goodness-of-fit test. He did not report the fitting of any reading data. There are reasons
to believe that the fit would not be better than that of the arithmetic data20.
19 Suppes (1990) was not consistent on the nature of the Markov process. While he clearly intended to promote a
random-walk model (p. 467), a few axioms referred to an undefined concept of “processing.” Depending on the
outcome of the processing, different saccadic moves might be taken. This violated the basic assumptions of a
random-walk model.
20 Suppes (1990) acknowledged that reading fixation durations were typically less variable than those in doing
arithmetic; therefore, an exponential-based model may not work well. Furthermore, the mixture distribution Suppes
proposed typically shows two modes, but reading fixation duration distributions are usually unimodal.
Suppes did not develop the saccade control system in any depth beyond the five axioms.
This part of the model was not explicitly expressed in mathematical form. No quantitative test
of the models was given in Suppes (1990; 1994). The choice of the TDPC model over the
minimal control model was based solely on theoretical analyses.
Comments. Although his attempt was extremely limited, Suppes (1990; 1994) outlined the
possibility of using Markov models to describe reading eye movements, both fixation duration
and saccades. An obvious problem with the Markov models of both Stark (1994) and Suppes
is that they were not flexible enough to take into account other factors, such as word
frequency. A Markov model with a hierarchical structure will be explored in the current research.
In addition, Suppes’ model is one of the first attempts to explicitly model the distribution
of fixation durations. Although it failed (McConkie & Dyre, 2000), it called much-needed
attention to the importance of modeling not only the means but also their distributions.
Normal Eye Movements: McConkie and colleagues' mathematical modeling
The goal of McConkie and colleagues' research is best summarized by the title of
McConkie, Kerr, and Dyre (1994) – “What are ‘normal’ eye movements during reading: toward
a mathematical description.” Some of their representative studies include the modeling of
landing position distributions (McConkie et al., 1988; Radach & McConkie, 1998), refixation
frequencies (McConkie et al., 1989; Radach & McConkie, 1998), skipping rates (Kerr, 1992;
McConkie et al., 1994), regressions (Vitu & McConkie, 2000; Vitu, McConkie, & Zola, 1998),
and distributions of fixation durations (McConkie & Dyre, 2000; McConkie et al., 1994).
Summarizing this line of research turns out to be difficult, because models for individual
components are still evolving and pieces of the model have not been completely put together.
Nevertheless, the central theme of this line of research is to mathematically describe regularities
and constraints that are inherent in eye-movement data. Many of its findings have become the
foundations of other modeling efforts (e.g., Reichle et al., 1998; Reilly & O'Regan, 1998).
McConkie and colleagues decomposed the problem of reading eye-movement control
into two separate decisions: (a) where to move the eyes and (b) when to move them. With respect
to the WHERE decision, a further distinction has been made between where the eyes are
intended to go and where they actually land. Therefore there are three main components in
McConkie and colleagues’ eye-movement control model: saccade target selection, saccade
execution, and fixation duration control.
Saccade execution. McConkie et al. (1988) found that the distribution of landing positions
relative to a word was a bell-shaped curve centered near the center of the word (see Figure 5A
and 5B). The shape of the curve could be approximated with a normal distribution, whose mean
and variance were functions of the launch site (planned saccade length, PSL, in Reichle et al.,
1998) and word length, among other factors. McConkie et al. (1988) proposed that saccades
were targeted at word centers but missed the targets because of two sources of error in the visuo-
motor system. A saccadic range error was responsible for the systematic overshooting of near
targets and undershooting of far away targets. A random placement error caused the random
spread in landing positions. Together the landing position distribution could be summarized with
a linear regression function, as discussed in the E-Z Reader model and the Reilly and O’Regan
model.
McConkie, Kerr, and Dyre (1994) concluded that landing position was not under the
control of higher-level processes. McConkie et al. (1994) reported that the landing position
distributions on pseudo-words or nonsense letter strings, embedded in continuous text, were
essentially the same as those for normal words. This was further confirmed by Radach and
McConkie (1998), who found that landing position distribution was affected by word length
and word position in a line, but not by the duration of the previous fixation or the
“informativeness” of the initial trigram of the next word. These findings suggested that saccade
execution should be modeled independently from cognitive processes.
Saccade target selection. An essential assumption in McConkie and colleagues’
framework is that eye movements are targeted at the center of words when they are planned.
Which words are selected to be the targets, then, becomes the key question. Three types of eye
movements are particularly interesting – refixations, word skipping, and regressions.
1. Refixations. McConkie et al. (1989) examined the frequency of refixating a word
immediately following the first fixation on it. Based on a large corpus of reading eye
movements, they found that the frequency of refixation is a U-shaped function of the initial
landing position on the word: the probability of making a refixation is higher if the eye lands
near either end of a word than at the word center. McConkie et al. concluded that the initial
landing position is the primary determinant of refixations. In addition, Radach and McConkie
(1998) analyzed landing positions as a function of launch site for both forward and regressive
saccades and concluded that there is no evidence for different mechanisms, a result that questioned
the basic hypothesis of the strategy-tactics theory (O'Regan, 1990).
2. Skipping. McConkie, Kerr, and Dyre (1994; see also Kerr, 1992) found that the
frequency of skipping the next word could be expressed in a three-parameter function21:
$$p(\mathrm{skip}) = Max - \frac{Max - Min}{1 + e^{A \times (LaunchSite - B)}}$$
where Max is the maximum of the curve and equals 1, Min is the minimum value reached by the
function, A controls how rapidly the function rises, and B is the inflection point of the curve. The
parameter values depended on word length, as shown in Figure 6.
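To see how the parameters shape the curve, the skipping function can be evaluated directly. The sketch below assumes the logistic form p(skip) = Max − (Max − Min) / (1 + e^{A × (LaunchSite − B)}), with A > 0 so that, under this coding, the curve rises from Min toward Max as LaunchSite increases. The function name and the parameter values in the usage note are hypothetical, not the estimates plotted in Figure 6.

```python
import math


def skip_probability(launch_site, A, B, min_p, max_p=1.0):
    """Logistic skipping curve (assumed form of the three-parameter function).

    launch_site: launch site, coded so that larger values lie closer to the
                 word (an assumption about the coding).
    A:     steepness of the rise (A > 0).
    B:     inflection point, where p is halfway between Min and Max.
    min_p: lower asymptote (Min). max_p is fixed at 1, leaving three free
           parameters (A, B, Min).
    """
    return max_p - (max_p - min_p) / (1.0 + math.exp(A * (launch_site - B)))
```

For example, with hypothetical values A = 1, B = −4, and Min = 0.1, `skip_probability(-4, 1.0, -4, 0.1)` returns (up to rounding) 0.55, the midpoint between Min and Max; the curve approaches 0.1 for very distant launch sites and 1 for very close ones. Fitting separate (A, B, Min) triples for each word length is one way to reproduce the family of curves described above.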
McConkie, Kerr, and Dyre (1994) hypothesized a word-skipping mechanism based on
the concept of a visual clarity threshold that must be met for a word to be skipped. The above
equation could then be interpreted as the proportion of words exceeding the threshold at a given
distance (measured as launch site). Brysbaert and Vitu (1998) proposed a similar theory based
on the “Extended Optimal Viewing Position (EOVP)” effect (Brysbaert, Vitu, & Schroyens,
1996), in which the eye guidance system constantly estimates the probability of recognizing a
peripheral word within a typical fixation duration. The system probabilistically skips words
that are highly likely to be recognized by the end of the current fixation. Brysbaert and Vitu
(1998) obtained a good fit to empirical skipping rate data with a one-parameter model.
Determining whether or not to skip a word is only part of saccade programming. To
complete the picture, one needs to know how the saccade targeting system selects among many
potential targets. Neither McConkie et al. (1994) nor Brysbaert and Vitu (1998)
addressed this issue.
21 McConkie, Kerr, and Dyre (1994) presented the equation in an equivalent but slightly confusing form:
$$p(\mathrm{skip}) = Min + (1 - Min) \times \frac{1}{1 + e^{A \times (B - LaunchSite)}}$$
3. Regressions. The phenomenon of regressions has been less well understood, in part
because of the long-held belief that regressions result from comprehension breakdown and thus
should be excluded from analysis (e.g., Reichle et al., 1998). Most recently, McConkie and
colleagues (Radach & McConkie, 1998; Vitu et al., 1998) have made some intriguing discoveries
about regressions. Vitu et al. found that both low-level factors (e.g., the length of the previous
saccade) and linguistic factors (e.g., word frequency of skipped words) affected the likelihood
of regressing after a word is skipped. Their results indicated that the phenomenon is complex and
is unlikely to have a single cause.
Radach and McConkie (1998) examined whether regressions are generated by a
different mechanism from the one that produces other kinds of saccades. Their
analyses of launch site effects showed that there was little systematic range error in interword
regressions (see Figure 7). Regressive refixations, on the other hand, showed the same range
errors and random errors as forward saccades. These results indicated that the control of
interword regressions is functionally different from that of forward saccades and
refixations.
Fixation duration. Early attempts to model the distribution of fixation durations have
been incomplete and unsuccessful (Harris, Hainline, Abramov, Lemerise, et al., 1988; Suppes,
1990), in part because their model choices were mainly based on theoretical speculation22. In
contrast, McConkie, Kerr, and Dyre (1994) and McConkie and Dyre (2000) emphasized the
inherent constraints in the data.
22 Suppes’ (1990) fixation duration model was derived from axioms that had no empirical support (at least
in reading research). Harris et al. (1988) presumed that saccade latency involved two (independent) consecutive
processes. This is logically possible, but there has been no experimental evidence to support it.
McConkie, Kerr, and Dyre (1994) studied the hazard function23 of the first fixation
duration distribution, and found it could be approximated by three piecewise linear functions – a
slow-rising early piece, a fast-rising period, and a flat, constant tail. Their subsequent modeling
effort capitalized on this characteristic form of a hazard function.
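An empirical hazard function of this kind can be estimated from a sample of durations with a simple life-table calculation: for each time bin, divide the number of fixations that end in the bin by the number still in progress at the start of the bin, an estimate of h(t) = f(t) / (1 − F(t)). The sketch below is illustrative; the function name, bin width, and time range are arbitrary choices, not those of the original study.

```python
def empirical_hazard(durations, bin_width=25, max_time=600):
    """Life-table (binned) estimate of the hazard function.

    For each bin [start, start + bin_width), the hazard is the number of
    fixations ending in the bin divided by the number still "at risk"
    (duration >= start), divided by the bin width to give a rate per msec.
    Returns a list of (bin_start, hazard) pairs, stopping when no
    fixations remain at risk.
    """
    hazard = []
    for start in range(0, max_time, bin_width):
        at_risk = sum(1 for d in durations if d >= start)
        if at_risk == 0:
            break
        ended = sum(1 for d in durations if start <= d < start + bin_width)
        hazard.append((start, ended / at_risk / bin_width))
    return hazard
```

Plotting the returned pairs against time is one way to look for the slow-rising, fast-rising, and flat pieces described above. With wide bins the estimate is a per-bin conditional probability divided by the bin width, so it slightly understates the instantaneous hazard; narrower bins reduce this bias at the cost of noisier estimates.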
Like Harris et al. (1988), McConkie, Kerr, and Dyre (1994) hypothesized a two-step
process: ordering a saccade and executing a saccade. They further assumed that once a saccade
was ordered, there was a random waiting time before the saccade was executed. The random
waiting time was assumed to follow an exponential distribution24. The time to order a saccade
was modeled by a mixture of two Weibull components with linear, rising hazard functions (for
discussion of the Weibull distribution, see Johnson, Kotz, & Balakrishnan, 1994). There was no
theoretical reason to choose the Weibull distributions except that they characterized the empirical
hazard functions. Putting the two steps together, the distribution of fixation durations (the sum of
ordering and executing times) was the convolution25 of the two components. This “two-stage
mixture” model fitted the empirical distribution very well, as seen in Figure 8, although no
goodness-of-fit statistics were reported.
23 A hazard function, loosely speaking, characterizes the instantaneous probability of an event happening given that
it has not yet happened. Formally, it can be defined in terms of the probability density function, f(t):
$$h(t) = \frac{f(t)}{1 - \int_0^t f(u)\,du}$$
Luce (1986) demonstrates that, compared to the cumulative distribution function or the probability density function,
the hazard function is more readily interpretable and more sensitive in differentiating distributions.
24 Interestingly, in Harris et al.’s (1988) model, the exponential component, the “β-period,” corresponded to the
wait-time for ordering the next saccade, not the executing time. McConkie et al.’s (1994; McConkie & Dyre, 2000)
interpretation is problematic because a mechanism with an exponential wait-time would be too unreliable to carry
out saccadic movements, one of the most frequent movements in humans. In the reaction time literature, there has been
similar confusion, and the consensus now is that the exponential component corresponds to cognitive or signal
processing rather than to execution (see Luce, 1986).
Following this initial success, McConkie and Dyre (2000) explored two additional
models – a “two-state transition” model and a “two-stage race” model. Although the three
models, including the 1994 “two-stage mixture” model, have different assumptions about the
underlying mechanisms that determine fixation duration, they were designed to closely mimic
the piecewise linear hazard function of the empirical data. Consequently, they fit empirical data
equally well. There was no evidence that one mechanism was more plausible than another.
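The logic of the 1994 “two-stage mixture” model can be illustrated by simulation: draw an ordering time from a two-component Weibull mixture, draw an exponential waiting time, and add them, so that the resulting durations follow the convolution of the two distributions. The sketch below assumes Weibull components with shape 2, whose hazard 2t/λ² rises linearly; the function name, mixture weight, scales, and mean waiting time are hypothetical placeholders, not McConkie et al.'s estimates.

```python
import random


def sample_fixation_duration(rng, weight=0.7, scales=(180.0, 300.0),
                             shape=2.0, mean_wait=30.0):
    """Draw one fixation duration (msec) from a two-stage mixture model.

    Stage 1 (ordering): a Weibull time from one of two components, the first
    chosen with probability `weight`. With shape = 2 each component has a
    linearly rising hazard.
    Stage 2 (executing): an exponential waiting time with mean `mean_wait`.
    The duration is the sum, so its density is the convolution of the two.
    All default values are hypothetical.
    """
    scale = scales[0] if rng.random() < weight else scales[1]
    order_time = rng.weibullvariate(scale, shape)   # arguments: (scale, shape)
    wait_time = rng.expovariate(1.0 / mean_wait)    # expovariate takes a rate
    return order_time + wait_time


rng = random.Random(42)
durations = [sample_fixation_duration(rng) for _ in range(10000)]
```

The simulated `durations` can be compared with data through their histogram or their empirical hazard function. Because the stage-1 distributions were chosen to reproduce the observed hazard shape, a good fit by itself says little about whether the ordering/executing mechanism is correct, which is the point made above about the three competing models.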
Comments. While there has not been a unified model, this line of research has
contributed much quantitative knowledge to our understanding of reading eye movements. The
power of the data-driven modeling approach is evident in the fact that two competing models, the E-Z
Reader 6 model (Reichle et al., 1999) and Reilly and O’Regan’s (1998) model, both
implemented McConkie et al.’s (1988) formulas.
With respect to saccade programming, McConkie et al.’s (1988) proposal of a two-level
saccade control model has been widely accepted. In this hierarchical model, cognitive effects are
confined to the level of selecting target words, and have only discrete control – selecting
which word but not where in the word to land the eyes. The continuous nature of saccade length
is a result of random and systematic errors, and saccade execution is conditionally independent
of higher-level processes. This conceptualization greatly simplified the interpretation of saccade
control in reading. The SHARE architecture is an extension of this probabilistic, hierarchical
structure.
25 The density of the sum of two independent random variables is the convolution of the two densities. Mathematically,
for nonnegative variables such as durations,
$$h_{f+g}(t) = \int_0^t f(x)\, g(t - x)\, dx$$
McConkie et al.’s (1994; McConkie & Dyre, 2000) modeling of fixation duration
distributions is also inspiring. The reason for their unprecedented success is not a superior
theory or mechanism, but their data-driven modeling approach – the choice of using piecewise
linear models to estimate empirical hazard functions. This suggests that one might go a step
further and question the only major a priori mechanistic hypothesis in their models, the
assumption of saccade ordering and executing steps.
CHAPTER 3. DESIGN PRINCIPLES
The previous chapter surveyed several earlier attempts to quantitatively account
for reading eye movements. Their successes and failures illustrate some important issues that any
quantitative model of reading eye movements has to address. A modeler has to
make conscious decisions about these issues, and the choices will constrain his or her modeling
approach.
Eight such issues are presented below as dichotomies, although the choices are often
neither mutually exclusive nor limited to two. They represent the decision process through which
the current model has been shaped, and provide a framework for presenting the rationale for the
basic modeling choices made in the research to follow.
Theory-driven vs. Data-driven Modeling
Rayner (1995; see Chapter 1) raised an important issue – do we need a theory of eye
movements in order to measure and describe them? The question may be pursued in two senses:
whether we should try to describe eye movements without subscribing to a particular theory, and
whether we are able to do so.
My response to the first question is that we should try to develop a theory-neutral
descriptive framework for eye-movements, to the extent we can. Current theories of reading eye-
movement control – e.g., the strategy-tactics theory (O'Regan, 1990; O'Regan & Jacobs, 1992)
and theories based on Morrison (1984; e.g., Rayner & Pollatsek, 1989) – are collections of
hypotheses about the underlying mechanisms and processing. While these hypotheses are
inspired by empirical findings, there is no evidence that any particular theory is indisputable.
The field of reading eye movement research has not reached a stage where theories are well
established and few facts are left to be found. On the contrary, as some of the most recent studies
suggest (e.g., McConkie & Dyre, 2000; Shillcock, Ellison, & Monaghan, 2000), we are just
starting to discover some of the basic constraints and regularities of eye movements. At this
point, our observations should not be limited and biased by existing theories and models.
The extent to which we can describe reading eye movements without subscribing to a
particular theory is an empirical question. The SHARE architecture is an attempt to model eye
movements with a minimal number of assumptions about the underlying mechanisms and
processes. The current research approaches the problem by analyzing the logical constraints for
the modeling task, carefully selecting the mathematical model, and employing powerful
algorithms to estimate model parameters. The goal of the model is to capture the “essence” of
eye movement patterns so that it can reproduce eye movements with the same pattern, or predict
the next fixation, among other things.
What can we gain from an “atheoretical26” model, assuming it does achieve its goal? First
of all, such a data-driven modeling approach is a natural extension of several lines of successful
research looking for structure in eye movement data. By using a more powerful
mathematical model (see discussion in Chapter 1), we should learn more about the inherent
regularities in the data. Secondly, although the model does not hypothesize about
mechanisms, it tests whether a mathematical structure is adequate to describe some aspect of eye
movements, which in turn constrains potential mechanisms. Last but certainly not least, the
ability to faithfully describe eye-movement patterns will enable many applications of the
eye-movement methodology that were previously unavailable.
26 The term is used in contrast with a model based on a particular existing theory, especially a theory that heavily
emphasizes hypothetical mechanisms. There is no such thing as atheoretical modeling. Every mathematical
operation imposes, explicitly or implicitly, structure and assumptions on the subject matter, and these assumptions
are part of the theory. Consider, for example, why the model “1+1=2” fails to model the volume of a cup of sugar
mixed with a cup of water, or what a better-fitting model “1+1=1” (more correctly f(1,1)=1) reveals about the
underlying mechanism of the above mixing process. The assumptions of the current model will be discussed in the
rest of this chapter and the next chapter.
In short, a data-driven modeling approach is a valuable way to contribute to our
understanding of reading eye movements, and at the current state of knowledge it is a
much-needed complement to the development of theories of eye-movement mechanisms. The rest
of the chapter discusses some of the important decisions in choosing the modeling structures and
tools.
Deterministic vs. Probabilistic Modeling
There is enormous variation in reading eye movements. One may try to account for every
bit of the variation in a model, or assume at least part of the variation is due to random
fluctuation. The models surveyed in the last chapter vary along this dimension. The READER
model (Thibadeau, 1983; Thibadeau et al., 1982) exemplifies the deterministic approach, where
variation in gaze duration was precisely determined by the intricate comprehension processes. At
the other extreme, Suppes (1990; 1994) hypothesized that fixation duration was a pure random
variable independent of any other factors.
Most models took the middle ground, but the sources of random variance were
introduced very differently. The noise in Reilly’s (1993) connectionist model was built into the
neural network architecture and training. Both the E-Z Reader and the strategy-tactics models
introduced arbitrary (and different) random variance to lexical and oculomotor processes. This is
particularly notable for the E-Z Reader model, because Morrison’s original model was
presented as a deterministic machine. Neither model was verified to have probabilistic
characteristics similar to those of the empirical data27. Instead, the distributional properties of
their random components, such as means and standard deviations, were taken directly from
McConkie and colleagues’ estimates (McConkie & Dyre, 2000; McConkie et al., 1988;
McConkie et al., 1989).
The most illuminating example on the issue of deterministic versus probabilistic
modeling is Mr. Chips. The basic model was purely deterministic: every move was carefully
calculated to minimize lexical uncertainty. However, the outcome of the complex deterministic
process could be modeled with surprisingly simple probabilistic heuristics. This illustrates the
strength of probabilistic modeling, even when the underlying mechanism is complex and
deterministic. The current research employs a probabilistic framework.
The WHEN and WHERE Decisions
The WHEN and WHERE decisions refer to the mechanisms that determine fixation
duration and saccade length, respectively. Not all models reviewed above considered both
dimensions. Of those that did, the READER model (Thibadeau et al., 1982) assumed that a single
mechanism, reading comprehension, determined both, whereas in Suppes (1990) the two
decisions were completely independent. In both the E-Z Reader and strategy-tactics models the two
decisions were made through interactions between the lexical and the oculomotor systems.
27 Reichle et al. (1998) showed figures of distributions of simulated and empirical fixation duration measures and
claimed that they were similar without any quantitative support. The fits were far from satisfactory compared to
McConkie and Dyre’s (2000) work. The simulated distributions would almost certainly be rejected as appropriate
models if any statistical analysis were performed.
There is strong neurophysiological evidence that there exist two separate pathways, one
carrying spatially coded information and the other conveying the triggering signal of saccades
(e.g., van Gisbergen, Gielen, Cox, Brujins, & Schaars, 1981). Behavioral data also support the
separation of the two pathways (Kingstone & Klein, 1993; Walker, Kentridge, & Findlay, 1995).
These findings motivated Findlay and Walker (1999) to model the two pathways as a loosely coupled
parallel system, in which cognitive factors may affect both pathways but via different
mechanisms.
Whether the WHERE and WHEN pathways are closely or loosely coupled systems has to
be determined empirically. As a general architecture, the two pathways should be represented
separately, while still allowing interdependencies between the two systems. On the other hand, a
modular model, in which subsystems are only loosely connected, seems to be more desirable for
model fitting and interpretation. Therefore, in the SHARE model the two pathways are
implemented as separate subsystems that can be statistically dependent on each other. But the
first model built on the basis of SHARE will assume they are conditionally independent
subsystems. Whether or not they should be modeled as stochastically dependent processes is a
question to be answered by the fit of the model to empirical data.
Linguistic vs. Low-level Variables
There is no doubt that eye-movement decisions depend on what is on the
page. But whether eye movements are driven by high-level linguistic variables (e.g., word
frequency and contextual predictability) or by low-level visual factors (e.g., word length and
landing position) is under theoretical debate. This is clearly reflected in the various quantitative
models, each of which proposed its own idiosyncratic set (including the empty set in the case of
Suppes’ model) of variables that determine fixation duration and saccade targeting.
The strategy for the SHARE architecture is to give all variables equal opportunities, and
let data determine which variable is relevant to which eye-movement outcome. As a first step,
the current implementation includes two relatively uncontroversial variables, namely the
frequency of the currently fixated word and the length of the next word (see Rayner, 1998),
which represent linguistic and low-level information, respectively. The model is not limited to
these two variables, however. It is designed to make it easy to incorporate other variables
without changing the fundamental structure of the model.
Time-series vs. Independent Data
Eye movements occur in sequence; therefore, they naturally constitute time-series data. Most
eye-movement research tries to summarize eye movements using statistical models designed for
independent samples, for example, by using composite variables and analysis of variance.
However, unless one can prove eye movements are time-independent, they should be modeled as
time-series data. In other words, the burden of proof is on those who treat eye movements as
independent samples.
There have been attempts to study the temporal relations of eye movements. Several
studies calculated autocorrelations among eye movements and found them to be negligible
(Andriessen & De Voogd, 1973; Hogaboam, 1983; Rayner & McConkie, 1976). However, a zero
correlation coefficient only rules out a linear relationship; it does not guarantee statistical
independence. There is empirical evidence that eye movements are not independent samples.
For example, regressions are more likely to occur after long forward saccades (Andriessen & De
Voogd, 1973). McConkie et al. (1988; 1989) found that various aspects of an eye movement (e.g.,
probability of word skipping) depend on the characteristics of the previous eye movement (e.g.,
landing position and launch site).
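A toy example with made-up numbers (not eye-movement data) shows why a near-zero correlation is weak evidence of independence. Below, y is completely determined by x, yet Pearson's r is exactly zero, because correlation measures only linear association. The helper `pearson_r` is a hypothetical name, written out so the example is self-contained.

```python
def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (ss_x * ss_y) ** 0.5


xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]      # y is a deterministic function of x
print(pearson_r(xs, ys))      # 0.0
```

The same situation could arise in a fixation sequence if, for instance, the next eye movement depended on the magnitude but not the sign of a deviation in the previous one; a lag-1 autocorrelation would then miss a real dependency.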
The survey of quantitative models leads to a similar conclusion. Although Suppes’ (1990)
minimal-control model assumed that both fixation duration and saccade moves were
independent, identically distributed random variables, all other models treated fixation duration
and saccade length as time-dependent.
In conclusion, there is no strong a priori reason to believe eye movements can be
modeled as independent samples. Therefore, reading eye movements should be modeled as time-
series data. On the other hand, most temporal connections proposed in the literature are relatively
short term – in most cases between adjacent eye movements. This suggests a relatively simple
stochastic model may be sufficient to capture these relations.
Discrete vs. Continuous Control
Eye-movement data – fixation duration and saccade length – are continuous, but that does
not necessarily preclude the possibility that they were “intended” to be discrete. For example,
Radach and McConkie (1998) argued that saccade programming is discrete. They suggested that
saccades are targeted at word centers, and the spread of landing position is a result of errors in
the oculomotor system (McConkie et al., 1988; O'Regan, 1990; Radach & McConkie, 1998; see
also Rayner, 1998). The discrete-control model is in contrast with continuous-control theories
(e.g., Liversedge & Underwood, 1998), in which eye movements are directly aimed at particular
locations in words.
Theoretical debates aside, the discrete-control conceptualization offers some advantages
from a modeling point of view. For example, it insulates the effects of cognitive factors from
saccade execution details, so that the subsystems can be modeled separately. A discrete
stochastic system is also easier to model than a continuous one, and is often more interpretable.
One concern with the discrete-control approach is that the underlying mechanism may be
truly continuous. The Mr. Chips model sheds some light on this issue. The Mr. Chips model was
a strict continuous-control model, in which saccade length is meticulously calculated to
maximize information. However, the saccadic “behaviors” could be well modeled as outcomes
of a probabilistic, discrete control system in which eye movements were directed to the optimal
viewing position of each word. Therefore, to the degree that descriptions of eye movements can
be separated from the possible underlying mechanisms, a discrete-control model provides at least
a good approximation of the eye movement outcome. Because of its relative simplicity and the
likelihood that continuous data can be modeled via discrete underlying processes, it makes sense
to begin with a discrete model of eye-movement control.
While a discrete-control theory for saccade programming (McConkie et al., 1988) has
been widely accepted, fixation duration has almost always been assumed to
be under continuous control. Our survey shows that the most popular, unchallenged assumption
is that fixation duration (e.g., first fixation duration or gaze duration) is a linear function of the
logarithm of word frequency (Just & Carpenter, 1980; Reichle et al., 1998; Reilly & O'Regan,
1998; see also Rayner, 1998, for a review). In some quantitative models (Reichle et al., 1998;
Reichle et al., 1999; Reilly, 1993; Reilly & O'Regan, 1998), it is also a continuous function of
landing position (eccentricity), word length, and duration of the previous fixation.
In fact, there is empirical evidence hinting at a discrete control system in the WHEN
pathway. Distributional analyses of fixation duration have shown that linguistic factors such as
word frequency (McConkie, Reddix, & Zola, 1992) or semantics (Feng, Miller, Zhang, & Shu,
2001) tend to have strong effects on some fixations and little effect on others. These findings
contradict traditional continuous-control models based on linear regressions (Reichle et al., 1998;
Reilly & O'Regan, 1998; Thibadeau et al., 1982), which assume linguistic factors affect all
fixations by changing the means of fixation durations.
The clearest demonstration of the existence of different kinds of reading fixations is Yang
and McConkie (in press), in which they experimentally manipulated the information readers
could perceive at any given fixation using the eye-movement contingent display change
technique (McConkie & Rayner, 1973). The manipulations to the text ranged from extreme (such
as blanking the whole page or replacing a line of text with X’s) to modest (replacing text with
non-words or filling all spaces with a symbol). Yang and McConkie found three categories of
fixations (see Figure 9). The first group included short fixations (shorter than approximately 125
msec), which occurred even when all visual information was removed. The second group peaked
at approximately 175 to 200 msec. These fixations did not require linguistic information but the
content being fixated needed to be “text-like.” For instance, the positions of the peaks of these
fixations were largely unaffected when a line of text was replaced with X’s but the spaces were
preserved, whereas the distributions were severely suppressed when the spaces were removed.
Lastly, there was a group of long fixations that peaked at roughly 350 msec and extended well
beyond 700 msec in some cases.
Corroborating evidence for the existence of three distinct types of fixations also came
from oculomotor research. In simple saccadic reaction time experiments, Gezeck, Fischer, and
Timmer (1997) found three distinct categories of fixations – “express” (90-120 msec), “fast
regular” (135-170 msec), and “slow regular” (200-220 msec). Interestingly, the three peaks are at
the same position for naive and trained subjects but the weights differ, with more express
saccades for trained subjects. The positions of the peaks differed from those in Yang &
McConkie (in press), not surprisingly given the task differences, but both strongly suggest the
existence of different categories of fixations, each having distinct parameters and possibly
responding to different information.
To determine whether fixation duration in normal reading can be modeled with a discrete
model, I fitted a mixture-of-lognormal model to fixation duration from a large dataset (details of
the study are presented in Appendix B). The hypotheses are similar to the discrete-control
framework for saccade programming. The mixture-of-lognormal model assumes a two-level
fixation duration control system. At the “control” level, there are n discrete categories of
fixations, each having different parameters (e.g., intended duration). For each fixation, the
control system chooses the appropriate kind of fixation and sends the command to the “output”
level. At the output level, the command is carried out but with random error added, which is
assumed to follow lognormal distributions (the justifications are discussed in Appendix B). Thus,
over the long run, the distribution of all fixation durations follows a mixture of lognormal
distributions.
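The two-level process can be written down in a few lines: a sampler that first picks a fixation category and then adds lognormal output error, and the corresponding mixture density. The function names, and the component weights, medians, and log-scale standard deviations below, are hypothetical values loosely shaped like the three categories described above; they are not the estimates reported in Appendix B or Chapter 5.

```python
import math
import random

# Hypothetical components: (weight, median duration in msec, log-scale SD).
COMPONENTS = [(0.15, 100.0, 0.25), (0.55, 190.0, 0.25), (0.30, 330.0, 0.35)]


def mixture_density(t, components=COMPONENTS):
    """Density of a mixture of lognormals at duration t (msec, t > 0)."""
    total = 0.0
    for w, median, sigma in components:
        z = (math.log(t) - math.log(median)) / sigma
        total += w * math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2 * math.pi))
    return total


def sample_fixation(rng, components=COMPONENTS):
    """Control level: choose a fixation category by its weight.
    Output level: add multiplicative (lognormal) error around its median."""
    u, cumulative = rng.random(), 0.0
    for w, median, sigma in components:
        cumulative += w
        if u < cumulative:
            return rng.lognormvariate(math.log(median), sigma)
    _, median, sigma = components[-1]   # guard against rounding in the weights
    return rng.lognormvariate(math.log(median), sigma)
```

Over many draws, a histogram of `sample_fixation` output approaches `mixture_density`. Fitting the model amounts to the reverse problem, recovering the weights and component parameters from observed durations, typically with an EM or Bayesian procedure.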
To summarize the findings, the distributions of fixation durations can be very well fitted
with a 3-component mixture-of-lognormal model. This model not only fits group data from
children and adults, but also fits individual distributions (these results are presented in detail in
Chapter 5). Most importantly, the parameters of the three classes of fixations are largely
consistent with the estimates from Yang & McConkie (in press). This suggests that the good
fit achieved by the 3-component lognormal-mixture model is not coincidental. Based on
McConkie et al. (1988) and the above fixation duration modeling study, both WHERE and
WHEN pathways are modeled by a hierarchical probabilistic model, where eye-movement
commands are discrete at the control level and random errors come into play at the output level.
Group vs. Individual Models
Individual differences28 in reading eye movements are enormous, and they were probably
the very reason why the eye movement method attracted early researchers (Buswell, 1937; Huey,
1908). The value of the eye-movement methodology, especially in reading education, largely
depends on our ability to describe and understand these individual differences.
Nonetheless, practically all models of eye movement control are designed to eliminate
individual differences so as to model an “average skilled reader.” An understandable argument is
that after the general mechanism is discovered, individual differences may be accounted for by
simply adjusting some model parameters. Although this is not an unreasonable modeling
approach, there is no sign that many of the existing models can be easily modified to
accommodate individual differences. For example, in most of the models in the survey, the rules
(e.g., the axioms in Suppes, 1990), mechanisms (e.g., familiarity check versus lexical completion
in Reichle et al., 1998), and constraints (e.g., minimizing lexical uncertainty in Legge et al.,
1997) are hard-coded. It is unlikely that the same rules, mechanisms, or constraints will apply to
each individual under every circumstance.
28 The term “individual differences” is used loosely here to represent both inter-personal differences and
intra-personal differences under different situations, e.g., reading for different purposes.
As a descriptive model, the current model is designed to be flexible – it can be used to
describe group as well as individual eye-movement data. It imposes as few hard-coded
constraints as possible so that it can be maximally flexible in accounting for variance in eye
movements. In the meantime, its hierarchical framework helps to structure individual
differences, captured in model parameters, in a meaningful way.
Descriptive vs. Predictive Applications
The original motivation for developing the descriptive model was to use it in a predictive
application – detecting processing difficulties during reading. The idea was that if we could
faithfully describe the different eye-movement patterns during normal reading versus reading
difficulties, we would be able to predict whether the reader was experiencing processing
difficulty based on a sample of his/her reading eye movements. Furthermore, if the diagnosis can
be done accurately and quickly enough, it may be possible to provide real-time assistance to
readers who experience difficulties in reading.
There are several major obstacles in achieving this goal. Firstly, the eye-movement model
has to be flexible enough to capture both normal and troubled reading. Most previous theories or
models were unable to do this (e.g., E-Z Reader models excluded regressions). The current
model is designed to be able to accommodate a wide range of eye-movement patterns.
Secondly, prediction or diagnosis requires the model to be individualized; a set of
predefined criteria will not fit all readers. This is especially critical because the application is
intended for children, whose reading proficiency and eye movements vary substantially. In a
real-world computer-assisted reading instruction setting, the system needs to quickly adapt to a
particular reader, preferably within a few practice trials. Learning a model from sparse and
incomplete data is computationally challenging because parameter estimates become unstable
and possibly biased. One of the most promising solutions to this problem is to incorporate prior
domain knowledge to guide parameter estimation (Heckerman, 1998). For example, if the reader
is a third-grade student, what we know about third-grade readers’ eye movements should
be used to help estimate the parameters for this particular reader.
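As an illustration of how prior knowledge can stabilize estimates from sparse data, the sketch below applies the textbook normal-normal conjugate update to a single parameter, a reader's mean log fixation duration. It is a simplified stand-in, not the estimation procedure used for SHARE: it assumes log durations are independent and normally distributed with a known within-reader standard deviation. The function name and the prior values (e.g., a grade-level mean) are hypothetical.

```python
import math


def posterior_mean_log_duration(durations, prior_mean, prior_sd, within_sd):
    """Normal-normal conjugate update for a reader's mean log duration.

    prior_mean, prior_sd: group-level (e.g., grade-level) knowledge on the
        log-msec scale (hypothetical values).
    within_sd: assumed known SD of log durations within one reader.
    Returns (posterior mean, posterior SD).
    """
    logs = [math.log(d) for d in durations]
    n = len(logs)
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / within_sd ** 2
    sample_mean = sum(logs) / n if n else 0.0
    post_precision = prior_precision + data_precision
    # Precision-weighted average of the prior mean and the reader's sample mean.
    post_mean = (prior_precision * prior_mean
                 + data_precision * sample_mean) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)
```

With no data the estimate equals the prior mean; as fixations accumulate, the data precision n/σ² grows and the estimate moves toward the reader's own sample mean. For example, with a prior mean of 5.3, prior and within-reader SDs of 0.4, and a single fixation of e^5.5 msec (about 245 msec), the posterior mean is 5.4, halfway between prior and data.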
Finally, a computer assisted reading system needs to support probabilistic decision-
making. Given a set of parameters and observed eye movements, it needs to probabilistically
decide whether or not the reader was in trouble. Previous models do not have a mechanism to
perform this task. The current model is designed to support such probabilistic classifications.
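At its core, the probabilistic decision described above is Bayes' rule over two hypotheses. The sketch below is illustrative only. It assumes that two fitted models, one for troubled and one for normal reading, can each return a log-likelihood for the same sample of eye movements. The function names, the 0.2 base rate, and the 0.5 threshold are hypothetical.

```python
import math

def posterior_troubled(loglik_troubled, loglik_normal, prior_troubled=0.2):
    """Posterior probability that the reader is in the 'troubled' state.

    loglik_troubled, loglik_normal: log-likelihood of the observed eye-movement
    sample under a model of troubled vs. normal reading (hypothetical inputs).
    prior_troubled: assumed base rate of troubled reading (illustrative value).
    """
    log_t = math.log(prior_troubled) + loglik_troubled
    log_n = math.log(1.0 - prior_troubled) + loglik_normal
    # Normalize in log space so long fixation sequences do not underflow.
    m = max(log_t, log_n)
    et, en = math.exp(log_t - m), math.exp(log_n - m)
    return et / (et + en)

def decide(loglik_troubled, loglik_normal, prior_troubled=0.2, threshold=0.5):
    """Flag the reader as troubled when the posterior exceeds the threshold."""
    return posterior_troubled(loglik_troubled, loglik_normal, prior_troubled) > threshold
```

For example, a 3-unit log-likelihood advantage for the troubled model raises a 0.2 prior to a posterior of about 0.83. When the two likelihoods are equal, the posterior simply returns the prior.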
Choosing the Mathematical Tools
This chapter identified the goals and task constraints of the current model. The model
attempts to summarize reading eye-movement patterns mathematically while remaining as neutral
as possible about eye-movement control mechanisms. The eight design principles listed
above have outlined its basic structure – a hierarchical, stochastic model that fully supports
individualization and probabilistic decision-making.
What mathematical tools will serve these needs?
Markov models (see Chapter 1) are a natural choice for modeling stochastic
processes (e.g., Bengio, 1999). Suppes (1990; 1994) used a zero-order (i.e., independent) Markov
model for fixation duration and saccade targeting. Stark’s scanpath model employed a first-order
Markov transition matrix to describe reading fixation sequences. However, as discussed
previously, classical Markov models have at least two limitations: they are only suitable for
modeling discrete events, and they do not allow the hierarchical structure necessary for
modeling reading eye movements.
In light of these problems, I chose to use the Hidden Markov decision tree (HMDT)
model (Jordan, Ghahramani, & Saul, 1997; Jordan, Ghahramani, Jaakkola, & Saul, 1998). The
HMDT is a marriage between a Hidden Markov model (HMM; Rabiner, 1989) and a
Hierarchical Mixture of Experts (HME) model (Jordan & Jacobs, 1994).
The HMM is a class of Markovian models known for its successful applications in
automatic speech recognition. It is a two-layered (representing two random variables)
probabilistic model that unfolds over time. The “state” variable is assumed to be unobservable
and follows the classical Markov process (thus the term Hidden Markov); the “output” variable
is observable and is conditionally independent of everything except for the concurrent value of
the state variable. Temporal dynamics are captured by the discrete, unobservable state variable,
whose value is probabilistically revealed by the observed output variable. For example, words
are composed of phonemes; phonemes are discrete, abstract categories that are not directly
observable, but they are probabilistically related to the observable acoustic waveforms. One way
to do speech recognition is to model this relationship with an HMM, where phonemes
correspond to the different states of the state variable, the output variable represents various
acoustic features of the speech, and words are characterized by different state-transition
probabilities, i.e., phoneme sequences. The goal of the HMM is often to probabilistically
determine the most likely value or value sequence of the (unobservable) state variable from a
given sequence of observations, i.e., to “recognize” phonemes or words from the waveforms. In order to
do this, the HMM has to be “trained” with training data to optimize model parameters – the
recognition accuracy clearly depends on the model’s ability to capture the statistical regularities
in data.
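The forward algorithm shows how an HMM scores an observation sequence. It computes log P(observations | model), the quantity that training tries to increase. The minimal sketch below assumes discrete output symbols for brevity, whereas speech models typically use continuous acoustic features. The function name is hypothetical.

```python
import math

def hmm_log_likelihood(pi, A, B, obs):
    """Scaled forward algorithm for a discrete-output HMM.

    pi[i]   : probability that the first hidden state is i
    A[i][j] : P(state_t = j | state_{t-1} = i)
    B[i][k] : P(output = k | state = i)
    obs     : observed output symbols (ints)
    Returns log P(obs | model).
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the output.
            alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                     for j in range(n)]
        c = sum(alpha)               # c = P(o_t | o_1 .. o_{t-1})
        log_lik += math.log(c)
        alpha = [a / c for a in alpha]  # rescale to avoid underflow
    return log_lik
```

The rescaling step keeps the numbers in range for long sequences. Each scaling factor is a one-step predictive probability, so the factors' logs sum to the sequence log-likelihood.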
The HME is a probabilistic decision tree model for classifying independent samples.
Statistically it is closely related to the multinomial logit model, a special form of generalized
linear models (GLIM; McCullagh & Nelder, 1983). In its simplest form (e.g., see the HME
example in Murphy, 2001), the HME may be reduced to a piece-wise linear regression model.
However, its power lies in the hierarchical structure, where multiple layers of “gating”
networks softly divide the input among a set of “experts.” As the input goes down the hierarchy
of gates, the data space is recursively divided, until the final categorization is reached at the
experts. Thus, HME can outperform pure linear models and other models in complex
classification and regression tasks (Jordan & Jacobs, 1994).
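The recursive gating idea fits in a few lines of code. This is a hypothetical illustration, not the BNT implementation. It assumes a scalar input, softmax gates with linear scores, and linear experts. With one gate and two experts it behaves like the piece-wise linear regression mentioned above, and deeper trees nest the same step.

```python
import math

def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    e = [math.exp(v - m) for v in scores]
    s = sum(e)
    return [v / s for v in e]

def hme_predict(x, node):
    """Prediction of a (hypothetical) hierarchical mixture of experts.

    node is either ("expert", a, b), which predicts a + b * x, or
    ("gate", [(w0, w1), ...], [child, ...]), which mixes its children with
    softmax weights computed from the linear scores w0 + w1 * x.
    """
    if node[0] == "expert":
        _, a, b = node
        return a + b * x
    _, gate_w, children = node
    g = softmax([w0 + w1 * x for (w0, w1) in gate_w])
    return sum(gk * hme_predict(x, child) for gk, child in zip(g, children))

# A sharp gate at x = 0 blends the experts y = -x and y = x into roughly |x|.
TREE = ("gate", [(0.0, -50.0), (0.0, 50.0)],
        [("expert", 0.0, -1.0), ("expert", 0.0, 1.0)])
```

Lowering the gate slope (50 here) makes the hand-off between experts more gradual. This soft division of the input space is what separates HME from a hard decision tree.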
The HMDT architecture integrates the best features from both HMM and HME models. It
may be viewed as an HMM with multiple “state” layers instead of one, which makes it possible
to model more complex control mechanisms. Alternatively, it can also be seen as an HME with
temporal structures, which allows it to model not only independent data but also time-series data.
The current model uses a three-layer HMDT model, which is also known as the Input-
Output Hidden Markov model (IOHMM; Bengio, 1999; Bengio & Frasconi, 1996). In the
IOHMM terminology, word frequency, word length, and landing position are “input” variables,
and fixation duration and saccade length are “output” variables. Between the input and output
layers is the eye-movement control layer, represented as the “state” variables in the IOHMM.
Looking at the static structure, the generation of eye movement commands (word targeting and
fixation categories) at the control layer is probabilistically affected by the linguistic and visual
input variables, and the actual eye movements, the output variables, are probabilistically
controlled by the eye movement commands. In the temporal dimension, an eye-movement
decision is probabilistically based on not only the current input variables but also on the previous
eye movement.
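To make the input-output dependency concrete, the sketch below runs the forward pass of a toy IOHMM. At each step, the input selects which transition matrix applies, and each hidden state emits a Gaussian output such as a log fixation duration. The single discrete input, the names, and the layout of the parameters are simplifying assumptions. This is not the structure of the full SHARE model.

```python
import math

def normal_pdf(x, mean, var):
    """Density of Normal(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def iohmm_log_likelihood(inputs, outputs, pi, trans, means, variances):
    """Forward pass of a toy input-output HMM (cf. Bengio & Frasconi, 1996).

    inputs         : discrete input u_t per step (e.g., a word-frequency class)
    outputs        : continuous output y_t per step (e.g., log fixation duration)
    pi[u][i]       : P(state_1 = i | u_1)
    trans[u][i][j] : P(state_t = j | state_{t-1} = i, u_t)
    means[i], variances[i] : Gaussian output parameters for hidden state i
    """
    n = len(means)
    log_lik, alpha = 0.0, None
    for t, (u, y) in enumerate(zip(inputs, outputs)):
        if t == 0:
            prior = pi[u]
        else:
            # The current input decides which transition matrix is used.
            prior = [sum(alpha[i] * trans[u][i][j] for i in range(n)) for j in range(n)]
        alpha = [prior[j] * normal_pdf(y, means[j], variances[j]) for j in range(n)]
        c = sum(alpha)
        log_lik += math.log(c)
        alpha = [a / c for a in alpha]
    return log_lik
```

If every input symbol maps to the same transition matrix, the model reduces to an ordinary HMM. The input-specific matrices are what let linguistic and visual variables modulate the control states.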
It should be noted, however, that the SHARE architecture is not limited to one layer of
eye movement control. For example, in order to model the fact that eye movement patterns are
different when a reader experiences problems in reading, one may implement a four-layer
HMDT model, in which a “cognitive state” node with two states – troubled and normal reading –
is linked to the “control” layer described above. Such an implementation allows modeling of
long-term changes in eye-movement patterns, in addition to the effects between adjacent eye
movements. The modular, hierarchical structure of SHARE minimizes the effects of model
extension on existing structures.
In addition to the HMDT structure, another important element of the SHARE architecture
is the use of Bayesian methods for estimating model parameters and conducting statistical
inferences. Unlike other commonly used methods such as maximum likelihood methods,
Bayesian methods provide a natural way to combine prior knowledge and observed data during
estimation (see Bernardo & Smith, 1994, for Bayesian theory in general, and Bengio, 1999, and
Jordan et al., 1998, for an introduction to the use of Bayesian methods in stochastic modeling).
At least two aspects of the Bayesian method are attractive for the current application.
First, because the model will be fitted at the individual reader level, there may not be enough
data to reliably estimate all parameters using traditional methods. By using prior knowledge
(e.g., the distribution of parameters for third-grade readers), the Bayesian method is able to
stabilize estimations and deal with missing data naturally. The other advantage of the Bayesian
method is that it provides a way to adapt a generic model to an individual. One may start with a
model with parameter values based on the grade level, but as eye movements are collected, the
model parameters may be updated using the Bayesian method and the model gradually and
quickly becomes individualized. Few other methods provide flexibility like this.
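The standard Dirichlet-categorical update illustrates how a grade-level model adapts to an individual. The posterior mean blends the prior probabilities with the reader's observed counts. The `prior_strength` argument sets how many observations the group prior is "worth." It is a hypothetical tuning choice that the text does not specify.

```python
def dirichlet_posterior_mean(prior_probs, prior_strength, counts):
    """Posterior mean of a categorical distribution under a Dirichlet prior.

    prior_probs    : group-level (e.g., grade-level) probabilities, summing to 1
    prior_strength : hypothetical equivalent sample size of the prior
    counts         : this reader's observed count for each category
    """
    alpha = [prior_strength * p for p in prior_probs]   # Dirichlet parameters
    total = sum(alpha) + sum(counts)
    return [(a + c) / total for a, c in zip(alpha, counts)]
```

Take a prior of [0.1, 0.2, 0.7] with a strength of 20. Before any data arrive, the estimate equals the prior. After counts of [10, 10, 30], it moves to [12, 14, 44] / 70 ≈ [0.171, 0.200, 0.629]. As more eye movements accumulate, the reader's own proportions dominate.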
To summarize, the objectives of the current research require a probabilistic description
of reading eye movements, and the stochastic model based on IOHMM provides the
mathematical tool for modeling. The architectural and computational details of the current model
are discussed in the next chapter.
CHAPTER 4. SHARE: STRUCTURE, DYNAMICS, AND MODEL FITTING
SHARE, a stochastic, hierarchical architecture for reading eye-movement, is designed to
mathematically describe reading eye movements. The rationales for choosing the IOHMM
framework have been laid out in the previous chapter. The current chapter focuses on the
specifications and the workings of the model.
Modeling Environment
The model was implemented using MATLAB, with the Bayes Net Toolbox (BNT;
Murphy, 2001). BNT is an open source MATLAB package that supports graphical modeling
(Jordan et al., 1998) and Bayesian inference (Bernardo & Smith, 1994; Heckerman, 1998), which
are two crucial elements of the SHARE model. The source code for the SHARE model is
available on request.
Modeling Data
The eye-movement data used for model fitting came from Miller & Feng (in prep.), in
which English- and Chinese-speaking children (third- and fifth-graders) and adults
(undergraduate students) were asked to read ordinary short stories on a computer screen. The
current study focused only on the English data. There were 20 third-grade students, 26 fifth-
grade students, and 30 adults, each reading 16, 18, and 27 pages of text, respectively. The stories
were selected to be at the children’s age levels (third- and fifth-grade levels, respectively); adult
readers read the children’s stories for comparison.
Eye movements were recorded using the EyeLink system, a video-based system with
sampling rate of 250 Hz and spatial resolution of 0.005°. Typical calibration-recalibration
accuracy is approximately 0.5° to 1°. The default saccade detection algorithm in the system was
used. Eye-movement recording was binocular, but data from only the left eye were analyzed.
Reading materials were presented on a 17-inch monitor in the standard VGA mode (640 x 480
pixels), 60-70 cm away from the reader. English materials were displayed in Espy Sans font, a
font optimized for screen display. Each letter subtended an average of 0.31 visual degrees or 7.9
screen pixels.
The whole dataset consisted of more than 140,000 fixations. Eye movement variables
such as gaze location, fixation duration, and saccade length were recorded, along with relevant
information such as word frequency (Francis & Kucera, 1982), word length (in letters and
pixels), and landing position within words (in pixels).
Structure of the SHARE Model
A graphical representation of the SHARE model is shown in Figure 10. Each node in the
graph represents a random variable. Nodes with rectangular boxes are discrete variables; nodes
with oval boxes are continuous variables. Clear nodes represent observed variables; the
shaded box (FDC) represents a hidden variable. An arrow from one node to another shows
that the latter variable is dependent on the former; the lack of an arrow between two nodes shows
that the two nodes are conditionally independent. The circular arrows beside the ST and FDC
nodes signify temporal dependency, i.e., the value of a node at time t depends on that at time t-1.
There were eight nodes in the SHARE model, forming three layers.
The top three nodes form the input layer. Three variables represented linguistic, low-level
visual, and oculomotor input information to the eye-movement control layer.
1. FREQn is the word frequency (Francis & Kucera, 1982) of the currently fixated word.
Numerous studies have shown that word frequency affects fixation durations and saccade
programming (see Rayner, 1998, for a review). For computational simplicity29, frequency was
divided into three categories – less than 100 occurrences per million (L), between 100 and 1000
per million (M), and more than 1000 per million (H). The three categories had roughly equal
sizes. Although the cut-off point for the low frequency category – 100 per million occurrences –
was higher than that typically used for adult psycholinguistic studies (around 40 per million), it is
more appropriate for third- and fifth-graders.
2. WLENn+1 is the word length of the word following the one currently fixated30. The
length of the word in the right periphery has been shown to affect skipping rates (Kerr, 1992) and
landing position (McConkie et al., 1988), among other eye-movement parameters. As with word
frequency, word length was classified into three levels – less than 4 letters long (S), between 4
and 8 letters (M), and longer than 8 letters (L). By token or by type, there were more short words
than long words in the reading materials.
3. ECCENn is the eccentricity of the current fixation relative to the fixated word.
McConkie et al. (1988) and O’Regan (1990) have shown that refixation rate is a function of
landing position. Fixations that land at or near word centers are less likely to result in refixations
than are eccentric fixations. In the current model, ECCENn was a binary variable: eccentric (E)
fixations were those that landed on the beginning or end quarter of a word; those that landed on
the central two quarters were central (C) fixations. This served as a simplified measure for
landing position effects.
29 In general, discrete variables are less computationally demanding in Bayesian network modeling. Although in the
current study the cut-off points are more or less arbitrary and probably not optimal, the discrete variables should
show qualitatively similar effects to those of the continuous ones. As a first step, it was more important to implement
a simple but working model than to perfect all details. In the future, continuous input variables may be used to avoid
these arbitrary decisions.
30 In case the current word is the last word of a line, WLENn+1 is the length of the first word in the next line.
Although psychologically return-sweep planning may differ from that of normal saccades, no special
mechanism is implemented in the current model for simplicity.
The middle layer is the eye-movement control layer, which includes the saccade targeting
(ST), fixation duration class (FDC), and planned saccade length (PSL) nodes. The control layer
receives information from the input variables and probabilistically determines the target of the
next saccade and the category of the current fixation duration. These two eye-movement
commands are passed to the output layer to generate actual eye movements.
4. STt is the saccade-targeting node. In the current model, it was assumed to be directly
observable from data.31 It was modeled as a discrete variable with seven values, or “states,”
representing seven different kinds of saccadic moves – (a) regress two or more words32, (b)
regress one word, (c) refixate the current word, (d) move forward one word, (e) move two words
forward, (f) move three words ahead, (g) move forward four or more words. Each state was
associated with a probability, which was in turn conditioned on the values of the input variables
and the previous value of ST (STt-1). In other words, the probability of each movement might go
up or down depending on the current linguistic, visual, and oculomotor information, as well as
the last saccadic move. The ST node achieved this by keeping track of all combinations of the
input variables and STt-1. Internally, it had a table of 3 (FREQ) x 3 (WLEN) x 2 (ECCEN) x 7 (STt-1) x 7
(STt) = 882 probabilities; because the seven probabilities for each of the 126 combinations of FREQ,
WLEN, ECCEN, and STt-1 must sum to one, 756 (126 x 6) of them were free parameters. How these
parameters were adjusted during model fitting will be discussed in the next section.
Modeling saccade targeting as a discrete, word-based process is consistent with
McConkie et al. (1988; McConkie et al., 1994) and many other theories (e.g., O'Regan, 1990;
Rayner & Pollatsek, 1989; Stark, 1994; Suppes, 1990; but see Legge et al., 1997, and Shillcock
et al., 2000). Unlike models that assume a default word-by-word reading strategy (e.g., Morrison,
1984; O'Regan, 1990; Reichle et al., 1998), the current model assumes that each word within the
window of the ST node has a certain probability of being fixated, and the actual decision is made
probabilistically. It also differs from the two previous Markov models (Stark, 1994; Suppes,
1990). In Suppes’ model WHERE and WHEN decisions were made independently of previous eye
movements; the current model extends it to represent dependencies between consecutive eye
movements. One problem with Stark’s model is that by making every word a potential target at
any moment, the model has a necessarily large transition matrix that contains mostly near-zero
probabilities, making probability estimation very difficult. In contrast, the current model uses a
local representation – only words near the current fixation are considered, which allows more
accurate estimation.
31 It is a standard assumption that the word the eye lands on is the intended word. According to this assumption, the
value of ST is directly observed. However, the assumption ignores the possibility that the eye missed the intended
target because of oculomotor errors (McConkie et al., 1988). In the current model, I chose to ignore these cases
during model fitting because it greatly simplified computation. These cases were dealt with in simulations.
32 Because only around 1-1.5% of saccades were regressions longer than 2 words, these were combined with 2-word
regressions. For the same reason, forward saccades longer than 4 words (about 1% for children, 2% for adults)
were combined with 4-word forward saccades.
5. FDCt represents the fixation duration category of the current fixation. As shown in
Appendix B, fixation duration could be modeled as a mixture of three lognormal distributions.
The FDC node controlled the mixing proportions of the three components. It was modeled as a
discrete random variable with three states – short (S), medium (M), and long (L) fixation. The
FDC was a hidden node because its state was not directly observable. Its value was
probabilistically inferred (estimated) from the observed fixation duration. Like the ST node, the
probability of making a short, medium, or long fixation was conditioned on the input variables
and the previous fixation duration category (FDCt-1). Internally, it kept a table of 3 (FREQ) x 3
(WLEN) x 2 (ECCEN) x 3 (FDCt-1) x 3 (FDCt) = 162 probabilities; because the three probabilities
for each of the 54 combinations of FREQ, WLEN, ECCEN, and FDCt-1 must sum to one, 108 (54 x 2)
of them were free parameters.
6. PSLt is the planned saccade length, which is the distance (in pixels) from the current
fixation location to the center of the intended word. It was modeled as a continuous random
variable. It was an observed node during model fitting, because it was calculated from empirical
eye-movement data. Therefore, the arrow between STt and PSLt should be ignored during model
fitting. During simulations, it was computed based on the current fixation position and the
coordinates of the intended word, which was determined by the value of the ST node. The arrow
with a dotted line between STt and PSLt signifies this dependency during simulation.
At the bottom of the figure is the output layer of the model, which includes SACCt and
DURt nodes. They take commands from the eye-movement control nodes and “execute” eye
movements. Both of the variables were continuous, corresponding directly to what would be
measured by an eye-tracker.
7. SACCt is the saccade to be carried out at the end of the current fixation t. It is
measured in pixels in the current model. A positive number corresponds to a saccade to the right
of the current fixation position. Normally this means a forward saccade, but under rare
conditions it would also be a regressive saccade going from the beginning of a line to the end of
the previous line. Conversely, a negative number typically means a regression, except for return
sweeps, in which the eye goes from the end of a line to the beginning of the next.
Following McConkie et al. (1988)33, SACCt was assumed to follow a normal distribution,
whose parameters – mean and variance – were determined by the STt and PSLt nodes. More
specifically,
mean(SACCt)= ai + bi * PSLt , and
var(SACCt)= si ,
where i (i= 1..7) corresponds to the current state of the ST node, PSLt is the currently intended
length of saccade, and ai , bi, and si are constants estimated during model fitting. In other words,
the SACCt node kept a different set of parameters (ai , bi, and si ) for each type of saccade move.
Note that the current parameterization was a simplified version of McConkie et al.’s results34. In
the current model, no assumption was made about the variance for each type of saccade move (whose
target determined the planned saccade length); it was left for the model to learn from data.
33 Using the notations in E-Z Reader model (Reichle et al., 1998; see Chapter 2), McConkie et al.’s formula for
landing position may be reformulated in terms of mean saccade length:
$$\text{Mean saccade length} = PSL + (\Psi_b - PSL)\cdot\Psi_m = \Psi_b\cdot\Psi_m + (1 - \Psi_m)\cdot PSL = a + b\cdot PSL$$
34 Some factors, for example word length, were not taken into account. In addition, McConkie et al. estimated that
the variance of the landing position distribution was a cubic function of launch sites (PSLt). This cubic function is
not implemented in the current model because the scatter plot in their paper (Figure 4) showed that the cubic trend
was not strong.
8. DURt represents the logarithm of the duration of the current fixation. DURt followed a
normal distribution, with a different mean and variance for each state of the FDCt node. More
specifically,
mean(DURt)= ai, and
var(DURt)= si,
where i (i= 1..3) corresponds to the current state of the FDC node, and ai and si are constants
estimated during model fitting.
Over the long run the output of the DURt node would be a mixture of three normal
distributions because of the three different sets of parameters. The exponential of the DUR variable,
consequently, would follow a mixture-of-lognormals distribution, which has been shown to be a
good model of the distribution of fixation durations (Appendix B). During model fitting the
empirical fixation duration was first log-transformed. In the simulation the reverse
transformation (exponential) was applied to the output values of the DUR node.
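The sketch below shows how the output layer could generate eye movements during simulation. It draws SACCt from its linear-Gaussian distribution, and it produces a fixation duration by exponentiating a draw of DURt. The parameter values, state labels, and function names are hypothetical. Python's `random.gauss` takes a standard deviation, so the variance si is square-rooted.

```python
import math
import random

# Hypothetical parameters: ST state -> (a_i, b_i, s_i); FDC state -> (a_i, s_i).
SACC_PARAMS = {"refix": (2.0, 0.9, 4.0), "fwd1": (1.0, 0.95, 9.0)}
DUR_PARAMS = {"S": (5.2, 0.04), "M": (5.5, 0.04), "L": (6.0, 0.06)}

def sample_saccade(st_state, psl, rng, params=SACC_PARAMS):
    """Draw SACC_t ~ Normal(a_i + b_i * PSL_t, s_i), in pixels."""
    a, b, s = params[st_state]
    return rng.gauss(a + b * psl, math.sqrt(s))   # gauss() expects a std. deviation

def sample_duration_ms(fdc_state, rng, params=DUR_PARAMS):
    """Draw DUR_t ~ Normal(a_i, s_i) on the log scale, then exponentiate."""
    a, s = params[fdc_state]
    return math.exp(rng.gauss(a, math.sqrt(s)))
```

Suppose the FDC state were drawn from its mixing proportions before each call to `sample_duration_ms`. The simulated durations would then follow the three-component lognormal mixture described above.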
In addition to the nodes, the arrows in the figure were equally important to the structure
of the model. They represented the direction of causality in the model (Heckerman, 1998; Pearl,
2000). In particular, the current model assumed that both WHEN and WHERE decisions were
affected by the three input variables – FREQ, WLEN, and ECCEN. The strength of these factors
was to be estimated from empirical data.
From the control layer to output layer, the current model assumed that the WHERE and
WHEN pathways are (conditionally) independent. There was no arrow between ST and FDC
nodes, ST and DUR nodes, or FDC and SACC nodes. The model also excluded any cross-
pathway connections from fixation t to fixation t+1. These independence assumptions were
made to simplify model conception and computation. However, this did not imply that SACC
and DUR nodes are independent. On the contrary, statistically and conceptually, saccade length
and fixation duration in the current model were correlated because they both shared the same
“parents” – the input nodes. If a close examination of the model shows that the empirical relation
between saccade length and fixation duration cannot be captured by the current model structure,
some of the independence assumptions may be relaxed.
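Writing the arrows as parent sets makes the independence assumptions explicit. The dictionary below is a hypothetical encoding of the eight nodes and their parents as described in the text, with a `_prev` suffix marking values at fixation t-1. The small ordering helper is also hypothetical. It shows why, in simulation, ST must be resolved before PSL and SACC.

```python
# Parent sets of the eight SHARE nodes as described in the text (Figure 10).
# A "_prev" suffix marks the value of a node at fixation t-1.
PARENTS_FIT = {
    "FREQ": [], "WLEN": [], "ECCEN": [],
    "PSL": [],                       # observed (computed from data) during fitting
    "ST": ["FREQ", "WLEN", "ECCEN", "ST_prev"],
    "FDC": ["FREQ", "WLEN", "ECCEN", "FDC_prev"],
    "SACC": ["ST", "PSL"],
    "DUR": ["FDC"],
}
# In simulation, PSL is derived from the word chosen by ST (the dotted arrow).
PARENTS_SIM = dict(PARENTS_FIT, PSL=["ST"])

def topological_order(parents):
    """Order nodes within one time slice so that parents precede children."""
    order, done = [], set()

    def visit(node):
        if node in done:
            return
        for p in parents[node]:
            if not p.endswith("_prev"):   # values from t-1 are already known
                visit(p)
        done.add(node)
        order.append(node)

    for node in parents:
        visit(node)
    return order
```

ST and FDC share the parents FREQ, WLEN, and ECCEN, and no arrow connects the two pathways. This is why SACC and DUR can be correlated without being directly linked.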
Temporal Dynamics
SHARE modeled three kinds of variation in reading eye movements – (a) the inherent
randomness of perceptual, cognitive, and oculomotor processes, (b) the variation of the current
linguistic and other input, and (c) the time-dependency of the eye-movement process. The first
two were captured by the hierarchical, probabilistic model structure. The time-series nature of
eye movements was modeled with the temporal links (the two self-pointing arrows beside ST
and FDC nodes) at the eye-movement control level.
Like other arrows, the self-pointing arrows indicated that the state of the random
variables (ST or FDC) at time35 t was dependent on that of t-1. Conditioned on the input nodes,
the ST and FDC nodes followed a first-order Markov model. The model used this short-term
temporal dependency to approximate possibly complex time-series effects in eye-movement
programming. Given that most temporal effects reported (e.g., the spill-over effect, optimal
viewing position effects) are in fact confined to consecutive eye movements, it was expected that
the first-order Markovian process should capture most of the temporal dynamics in reading eye
movements.
35 Eye movements were treated as discrete time events.
Model Fitting and Parameter Learning
Three features distinguished the fitting of the SHARE model from the modeling efforts
reviewed previously. First, the model was completely individualized, which means that every
parameter was adjusted so that the model best captured the reading eye movements of a
particular reader. Wide ranges of individual differences in reading eye movements have been
well documented for over a century. One of the goals of this research was to find a way to fully
describe these differences. I did not attempt to construct age-group-average models because
without an understanding of the differences between individuals, a group average model would
be impossible to interpret.
In addition, the model parameters were not estimated from a set of statistics computed
from eye movements, as all previous models have done. Instead, the present model was fitted to
the raw data. In other words, every fixation and saccade a reader made was used to adjust, or
“train,” model parameters. The goal of the model fitting process was to maximize the overall
goodness of fit of the model. The goodness-of-fit index used here was the log-likelihood of the
model, which is the logarithm of the probability of the data being produced by the model.
Finally, the Bayesian method was employed to achieve the above two goals. A critical
challenge for fully individualized modeling is that there may not be enough data to reliably
estimate all parameters. For example, the overall probability of making a 5-word forward
saccade was often less than 0.05 for third-grade readers. If a child made 2,000 fixations, there
would be fewer than 100 in this category. Further divide these 100 fixations by the number of
combinations of FREQ, WLEN, ECCEN, and STt-1 nodes, which is 126, and some of the cells
were bound to be empty. Thus, estimating parameters of these cells would have been impossible.
Conceptually, a sensible way to deal with this situation is to estimate them with group
averages – when data from many readers are pooled together, hopefully these parameters become
estimable. The Bayesian method is uniquely suited to implement this intuition. With the
Bayesian method, we first impose a prior probability distribution, centered at the group average,
over the parameter we want to estimate. The prior probability distribution represents our belief or
knowledge about the value of the parameter. When there is no observed evidence regarding this
parameter, the posterior probability distribution is simply the prior distribution, and our best
guess in this case is the group mean. In addition to these trivial cases, the true power of the
Bayesian method is its ability to estimate posterior probability distribution when there are limited
observed data, in which case the combination of prior knowledge and empirical data narrows
down the posterior distribution, resulting in accurate parameter estimation (see Bernardo &
Smith, 1994, and Smyth, Heckerman, & Jordan, 1996, for the Bayesian methods). Therefore, in
the current model priors were used in estimating all parameters.
The fitting of an individual SHARE model involved two major steps – (a) specifying the
prior distributions for each parameter and (b) looping through eye movements of a reader and
adjusting the parameters according to the Bayes rule.
Specifying prior distributions. Because the input variables FREQ, WLEN, ECCEN, and
PSL (during model fitting) were observed, they were not estimated and therefore did not need
priors.
The prior distributions for parameters of the ST node were assumed to follow Dirichlet
distributions (the most common prior distribution for discrete variables; see Bernardo & Smith,
1994, and Murphy, 2001). The parameters of the Dirichlet distributions were determined in the
following way. First, the overall probabilities of the seven saccadic moves (see previous
discussion on the ST node) were calculated over the whole age-group dataset36. This set of
probabilities was replicated 126 times to fill all combinations of FREQ, WLEN, ECCEN, and
STt-1, and these 882 probabilities were set as the parameters of the 126 Dirichlet distributions.
The above steps defined our a priori knowledge about the individual reader – we assumed that
the reader was an average reader of his/her age group, and that none of the input factors had any
effects on his/her saccade programming.
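The ST prior construction can be sketched as follows. The code computes the overall move probabilities from pooled age-group data and copies them into every parent configuration. The category and state labels are hypothetical shorthand for the levels and moves (a)-(g) described earlier.

```python
from collections import Counter
from itertools import product

# Hypothetical labels for the input categories and the seven ST moves (a)-(g).
FREQ = ["L", "M", "H"]
WLEN = ["S", "M", "L"]
ECCEN = ["E", "C"]
ST_STATES = ["reg2+", "reg1", "refix", "fwd1", "fwd2", "fwd3", "fwd4+"]

def overall_move_probs(moves):
    """Relative frequency of each ST move in the pooled age-group data."""
    counts = Counter(moves)
    n = len(moves)
    return [counts[s] / n for s in ST_STATES]

def st_prior_table(overall_probs):
    """Copy the same 7 probabilities into every (FREQ, WLEN, ECCEN, ST_{t-1}) row,
    i.e., an average reader on whom the inputs have no effect."""
    return {cfg: list(overall_probs)
            for cfg in product(FREQ, WLEN, ECCEN, ST_STATES)}
```

The result has 126 rows of 7 entries, or 882 numbers in all, which then parameterize the 126 Dirichlet priors. Any rescaling by an equivalent sample size would be an extra assumption, since the text does not state one.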
The prior distributions for the FDC node were also Dirichlet distributions, but their
parameters were estimated differently from that of the ST node because FDC was unobservable.
The first step was to estimate the overall probabilities of making short, medium, or long
fixations. This was done by fitting the reader’s log-transformed fixation durations to a simple Gaussian-mixture
model (McLachlan & Peel, 2000), as in Appendix B37. Once the personalized overall
probabilities were estimated, they were copied 54 times to fill all combinations of FREQ,
WLEN, ECCEN, and FDCt-1, and these 162 probabilities were set as the parameters of the 54
Dirichlet distributions. This was equivalent to the assumption that neither the input variables nor
the previous state of the FDC node had any effect on the current state of FDC.
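The first step can be sketched as a plain maximum-likelihood EM fit of a three-component Gaussian mixture to log durations. Ordinary EM stands in here for the Bayesian fit described in footnote 37. Starting the means at quantiles with small random jitter is also an assumption. The loop stops when the log-likelihood improvement falls below a threshold. The returned weights play the role of the overall short, medium, and long probabilities.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of Normal(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def e_step(xs, weights, means, variances):
    """Responsibilities r[n][j] = P(component j | x_n) and the log-likelihood."""
    resp, ll = [], 0.0
    for x in xs:
        p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        s = sum(p)
        ll += math.log(s)
        resp.append([pj / s for pj in p])
    return resp, ll

def fit_gmm_1d(xs, k=3, tol=1e-6, max_iter=500, seed=0, min_var=1e-4):
    """Maximum-likelihood EM for a k-component 1-D Gaussian mixture.

    Returns (weights, means, variances, log_likelihood).
    """
    rng = random.Random(seed)
    n = len(xs)
    srt = sorted(xs)
    mu = sum(xs) / n
    var0 = sum((x - mu) ** 2 for x in xs) / n
    # Start means at evenly spaced quantiles, jittered differently per seed.
    means = [srt[int((j + 0.5) / k * n)] + rng.gauss(0.0, 0.1 * math.sqrt(var0))
             for j in range(k)]
    variances = [var0] * k
    weights = [1.0 / k] * k
    resp, ll = e_step(xs, weights, means, variances)
    for _ in range(max_iter):
        # M-step: mixing proportions, means, and variances from responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(min_var, sum(r[j] * (x - means[j]) ** 2
                                            for r, x in zip(resp, xs)) / nj)
        resp, new_ll = e_step(xs, weights, means, variances)
        converged = new_ll - ll < tol   # stop when the improvement is below tol
        ll = new_ll
        if converged:
            break
    return weights, means, variances, ll

def best_of_runs(xs, runs=3, **kwargs):
    """Keep the highest-likelihood fit over several seeds (local-maximum guard)."""
    fits = [fit_gmm_1d(xs, seed=s, **kwargs) for s in range(runs)]
    return max(fits, key=lambda fit: fit[3])
```

The fitted means and variances from such a fit correspond to what the text later clamps as the DUR parameters. The weights would seed the FDC Dirichlet priors.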
36 These simple probabilities would be all the information necessary for a zero-order Markovian minimal-control
model (Suppes, 1990, 1994).
37 Note that the fitting of the Gaussian-mixture model itself involved Bayesian modeling, where its prior was set to
a Dirichlet distribution with group averages as parameters.
There were three parameters for the SACC node – the intercept (ai), the slope (bi), and
the variance (si), all of which were conditioned on the state of STt. The SACC node itself
followed a normal distribution whose mean was determined by ai, bi, and PSLt. The priors were
assumed to follow normal-gamma distributions (the most common prior distribution for normally
distributed random variables; see Bernardo & Smith, 1994). The initial values of ai, bi, and si
were estimated using regression analyses of all eye-movement data from the appropriate age
group. For example, to obtain estimates of the intercept, slope, and variance for refixations, all
refixations in the age group were entered into the regression model
SACC = a + b * PSL,
and the estimated parameters were used as the parameters for the prior distribution for
refixations.
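The per-state regression can be sketched with ordinary least squares. The code groups pooled (ST state, PSL, SACC) records by state and fits SACC = a + b * PSL within each group. The record layout and function names are hypothetical. Using the unbiased residual variance (dividing by n - 2) as si is an assumption, because the text does not say which variance estimate was used.

```python
def ols(xs, ys):
    """Least squares for y = a + b * x; returns (a, b, residual variance)."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points for a residual variance")
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    s = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    return a, b, s

def sacc_prior_params(records):
    """records: iterable of (st_state, psl, sacc) pooled over an age group.
    Returns {st_state: (a, b, s)}, fitted separately for each saccade move."""
    by_state = {}
    for state, psl, sacc in records:
        xs, ys = by_state.setdefault(state, ([], []))
        xs.append(psl)
        ys.append(sacc)
    return {state: ols(xs, ys) for state, (xs, ys) in by_state.items()}
```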
Finally, the DUR node also followed a normal distribution, but its parameters were
assumed to be “clamped,” meaning that they were fixed and were not adjusted during model
fitting. The reason to clamp the parameters was to be consistent with the 3-component
lognormal-mixture model (Appendix B). If the mean and variance were allowed to change under
different combinations of the input variables, the resulting distribution of the DUR node would
be a mixture of many normal distributions rather than a 3-component normal mixture. The values
of the parameters (means and variances) were estimated as a by-product of estimating the prior
distribution for the FDC node. In fitting the (personalized) Gaussian-mixture model, the mixture
rate was used as the prior for FDC, and the estimated mean and variance for each component
normal distribution were set as fixed parameters for the DUR node. Therefore, although the DUR
parameters did not change in model fitting, they were still fully individualized.
Bayesian parameter estimation. Once the priors were set, the model was ready to be
trained with empirical eye-movement data. An exact inference version of the Boyen-Koller
inference algorithm for dynamic belief networks (Boyen & Koller, 1998a; Boyen & Koller,
1998b; see Murphy, 2001, for implementation details) was used. The technical details of the
algorithm will not be discussed here. Conceptually it looked for the maximum posterior
probability solution given the prior distribution and data (Cowell, 1998a; Cowell, 1998b;
Heckerman, 1998). The iterative algorithm stopped when the improvement of the goodness of fit
index – the log-likelihood – was under a threshold. The chance of stopping at a local maximum
instead of global optimum solution was minimized by both the use of reasonable prior
distributions and using multiple (3) runs with different random seeds in estimating the Gaussian-
mixture model.
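The fitting of the personalized Gaussian mixture can be sketched with the standard expectation-maximization (EM) algorithm applied to log fixation durations. The sketch stops when the log-likelihood improves by less than a threshold and keeps the best of three runs with different random seeds, mirroring the description above; it does not implement the Boyen-Koller algorithm used for the full network. The function names, the quantile-based starting means jittered by each seed, the tolerance, and the variance floor are illustrative assumptions.

```python
import math
import random


def _normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def _em_run(xs, k, rng, tol, max_iter):
    n = len(xs)
    grand_mean = sum(xs) / n
    grand_var = sum((x - grand_mean) ** 2 for x in xs) / n
    ordered = sorted(xs)
    # Start the means at evenly spaced quantiles, jittered by this run's random seed.
    mus = [ordered[(2 * j + 1) * n // (2 * k)] + rng.gauss(0.0, 0.05 * math.sqrt(grand_var))
           for j in range(k)]
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    prev_ll = ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibilities and the log-likelihood of the current parameters.
        resp, ll = [], 0.0
        for x in xs:
            dens = [w * _normal_pdf(x, m, v) for w, m, v in zip(weights, mus, variances)]
            total = max(sum(dens), 1e-300)
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: re-estimate mixing proportions, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-4)
        if ll - prev_ll < tol:  # stop when the log-likelihood barely improves
            break
        prev_ll = ll
    return ll, weights, mus, variances


def fit_duration_mixture(log_durations, k=3, n_runs=3, tol=1e-6, max_iter=500):
    """Best of n_runs EM fits; components returned in order short, medium, long."""
    best = None
    for seed in range(n_runs):
        run = _em_run(log_durations, k, random.Random(seed), tol, max_iter)
        if best is None or run[0] > best[0]:
            best = run
    ll, weights, mus, variances = best
    order = sorted(range(k), key=lambda j: mus[j])
    return (ll, [weights[j] for j in order], [mus[j] for j in order],
            [variances[j] for j in order])
```

Applied to the logarithms of one reader's fixation durations, the returned proportions would play the role of the FDC prior, and the component means and variances that of the clamped DUR parameters.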
Model Adequacy and Comparison
From the perspective of an empirical researcher, it is natural to ask whether a model is adequate. However, the question is difficult to answer in an absolute sense. Statistically, it is more sensible to compare the relative goodness of fit of different models. The ultimate answer to the question depends on one’s goals.
The adequacy of the SHARE model is addressed in two ways. First, compared with various reduced versions of the model, the complete trained model showed significant gains in likelihood ratio tests. The improvements were examined separately for the WHEN and WHERE pathways, because the two pathways were conditionally independent and the overall log-likelihood was the sum of their log-likelihoods. Likelihood ratio tests were performed for each individual, and the following findings held for every individual reader.
For the WHEN pathway, the simple Gaussian-mixture model showed a statistically significant gain in goodness of fit when its parameters were individualized. When the Gaussian-mixture model was further compared with the full SHARE model (WHEN pathway only), which took into account the input variables and temporal dynamics, the latter showed a statistically significant gain. Similarly, the complete WHERE pathway was shown to be statistically superior to a model that assumed no individual differences, no effects of the input variables, and no temporal connections. Together, these results suggest that the more complex structure in SHARE is necessary to account for reading eye-movement data, and that it performs better than the simpler models of eye movements tested here.
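A likelihood ratio test of this kind compares the maximized log-likelihoods of a reduced model and the fuller model that contains it; twice the difference is referred to a chi-square distribution whose degrees of freedom equal the difference in the number of free parameters. The sketch below is a standard-library illustration under that textbook assumption. The function names are hypothetical, and the chi-square tail probability is computed from the regularized incomplete gamma function using the usual series and continued-fraction expansions.

```python
import math


def _lower_series(a, x):
    """Regularized lower incomplete gamma P(a, x) via its power series."""
    term = total = 1.0 / a
    ap = a
    for _ in range(10_000):
        ap += 1.0
        term *= x / ap
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _upper_cf(a, x):
    """Regularized upper incomplete gamma Q(a, x) via Lentz's continued fraction."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, y = df / 2.0, x / 2.0
    if y < a + 1.0:
        return 1.0 - _lower_series(a, y)
    return _upper_cf(a, y)


def likelihood_ratio_test(ll_reduced, ll_full, df):
    """Return (G, p) with G = 2 * (ll_full - ll_reduced)."""
    g = 2.0 * (ll_full - ll_reduced)
    return g, chi2_sf(g, df)
```

With purely hypothetical numbers, a reduced pathway with log-likelihood -1050 against a full pathway with -1040 and four extra parameters gives G = 20 and p of about 0.0005.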
Because the emphasis of the present research is to establish the SHARE architecture, a comprehensive analysis of the model is beyond the scope of the current report. Future studies will address some important issues, such as the relative contribution of different input variables to the two pathways and whether some interaction between the two pathways would improve model fit. The next chapter will focus on simulation studies of the SHARE model and compare the eye-movement behaviors of the model with those of real readers.
CHAPTER 5. SIMULATION RESULTS
The Markovian structure of the SHARE model makes it well suited to simulation. Given a text coded in terms of word frequency, word length, and the x-coordinates of the beginning and end of each word, the model “read” through the text according to its parameters and stopped reading when it reached or passed the last word of the text.
In the simulation study, each individualized model read through the same texts that the
corresponding human reader had read. Eye-movement characteristics of the reader and the model
were compared.
Simulation Method
Materials. Preparing reading materials for the model was straightforward. Each word in the texts used in Miller and Feng (in prep.) was coded with four variables: FREQ, WLEN, $x_1$, and $x_2$. The latter two marked the horizontal positions of the beginning and end of the word in screen coordinates38. FREQ and WLEN were defined in the last chapter (see Figure 10).
Procedures. The model parameters of the particular reader were loaded. For each trial, the model was assumed to start with a fixation on the first word. The remaining variables were initialized as follows39: $\mathrm{ST}_{t=0}$ was set to “forward 1 word,” $\mathrm{FDC}_{t=0}$ was set to a medium fixation, and $\mathrm{ECCEN}_{t=1}$ was set to “central fixation.”
With these initial values and the values of the input variables $\mathrm{FREQ}_{N=1}$ and $\mathrm{WLEN}_{N=2}$, SHARE was able to find the appropriate distributions of $\mathrm{ST}_{t=1}$ and $\mathrm{FDC}_{t=1}$. For example, $P(\mathrm{ST}_{t=1}=x)$ was the conditional probability
$$P(\mathrm{ST}_{t=1}=x \mid \mathrm{ST}_{t-1}=\mathrm{ST}_{t=0},\ \mathrm{FREQ}=\mathrm{FREQ}_{N=1},\ \mathrm{WLEN}_{N+1}=\mathrm{WLEN}_{N=2},\ \mathrm{ECCEN}=\mathrm{ECCEN}_{t=1}).$$
All combinations of these probabilities were estimated and stored internally in a parameter table in the ST node. Therefore, finding $P(\mathrm{ST}_{t=1}=x)$ was simply a table lookup, with the values of the input variables and the previous ST as indices. The procedure for finding $P(\mathrm{FDC}_{t=1}=x)$ was similar.
38 In the Miller and Feng (in prep.) study there were multiple lines of text per screen. However, the y-coordinates are not interesting in reading except for distinguishing lines. They were trivial to model and were not included here.
The next step was to generate eye-movement commands. The value of the ST node was randomly generated from the discrete distribution $P(\mathrm{ST}_{t=1}=x)$, where $x$ was one of the seven possible saccadic moves. The resulting random sample indicated the target word for the next saccade. Similarly, the value of the FDC node, the category of the current fixation duration, was randomly generated. An additional step in the WHERE pathway was to calculate the value of the PSL node. The planned saccade length was the displacement between the current position of the fixation and the center of the targeted word, as indicated by the current value of ST. The calculation of PSL from ST was completely deterministic.
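The lookup-and-sample step for the ST node, together with the deterministic PSL calculation, can be sketched as follows. The conditional probability table is represented as a dictionary keyed by the parent values; the key order, the category labels, the integer coding of the open-ended categories -2* and 4* as -2 and 4, the `(x1, x2)` word format, and the handling of targets before the first word are assumptions made for illustration. The FDC node could be sampled with the same `sample_category` helper over its three categories.

```python
import random

# The seven ST categories; -2 stands for "two or more words back" (-2*) and 4 for
# "four or more words forward" (4*). This integer coding is an assumption.
ST_VALUES = (-2, -1, 0, 1, 2, 3, 4)


def lookup(cpt, prev_st, freq, wlen_next, eccen):
    """Return P(ST_t = x | parents) as a list aligned with ST_VALUES (a table lookup)."""
    return cpt[(prev_st, freq, wlen_next, eccen)]


def sample_category(values, probs, rng):
    """Draw one value from a discrete distribution."""
    return rng.choices(values, weights=probs, k=1)[0]


def planned_saccade_length(words, current_word, current_x, st):
    """PSL: displacement from the current fixation to the center of the target word.

    words is a list of (x1, x2) extents. Returns None when the target lies past
    the last word, which ends the simulated trial.
    """
    target = current_word + st
    if target >= len(words):
        return None
    target = max(target, 0)  # assumption: targets before the first word use the first word
    x1, x2 = words[target]
    return (x1 + x2) / 2 - current_x
```

For example, `sample_category(ST_VALUES, lookup(cpt, 1, "low", "long", "central"), random.Random(1))` would draw the next saccade target from a hypothetical table entry.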
Next, the eye-movement commands were passed down to the WHEN and WHERE pathways for execution. For the WHEN pathway, the conditional mean and variance of the DUR node, given the current FDC value, were retrieved from the table of parameters stored in the DUR node. A random sample was then drawn from the normal distribution specified by this conditional mean and variance. The exponential of this random sample was the duration of the current fixation. The processes in the WHERE pathway were similar. The SACC node was also assumed to be a normally distributed variable, whose mean and variance were determined by the current values of the ST and PSL nodes:
$$\mathrm{mean}(\mathrm{SACC}_t) = a_i + b_i \cdot \mathrm{PSL}_t, \quad \text{and} \quad \mathrm{var}(\mathrm{SACC}_t) = s_i,$$
where $i$ is the current value of ST, and $a_i$, $b_i$, and $s_i$ are parameters associated with $i$ that were estimated during model training. The actual saccade length was a random sample from the normal distribution specified above.
39 Here, $t$ indexes the current fixation, and $N$ indexes the current word.
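The execution step in both pathways reduces to drawing from normal distributions whose parameters are looked up by the current command. In this sketch, the function names and parameter containers are hypothetical: DUR is drawn on the log scale and exponentiated to give a duration in msec, and SACC is drawn with mean $a_i + b_i \cdot \mathrm{PSL}_t$ and variance $s_i$.

```python
import math
import random


def sample_fixation_duration(dur_params, fdc, rng):
    """WHEN pathway: draw DUR ~ N(mu, var) for the FDC category and exponentiate it."""
    mu, var = dur_params[fdc]
    return math.exp(rng.gauss(mu, math.sqrt(var)))


def sample_saccade_length(sacc_params, st, psl, rng):
    """WHERE pathway: draw SACC ~ N(a_i + b_i * PSL, s_i) for the current ST value i."""
    a, b, s = sacc_params[st]
    return rng.gauss(a + b * psl, math.sqrt(s))
```

Because the DUR parameters are clamped, `dur_params` would simply hold the three component means and variances produced by the individual Gaussian-mixture fit.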
At this point, the first fixation on the page had ended and the first saccade had been made, so several values had to be updated. The fixation index became $t = 2$, and $N = N + \mathrm{ST}_{t=1}$ (i.e., the current word was set to the targeted word; see below for exceptions). $\mathrm{ECCEN}_{t=2}$ was computed as specified in the last chapter. The FREQ and $\mathrm{WLEN}_{N+1}$ values were also updated. With all input nodes updated and the past values of the ST and FDC nodes available, the model was ready to repeat the above process and generate the next fixation duration and saccade. The process iterated until the word targeted by the ST node was beyond the last word in the text.
A complication arose when the difference between PSL and SACC was so large that the next fixation landed on a word other than the targeted word. In this case, the model simply treated the fixated word as if it were the targeted word and calculated ECCEN, FREQ, etc. based on the word actually fixated. Other treatments were possible but were not explored here. If, after a regression, the “eye” was sent to a target before the first word, it was simply redirected to the first word.
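One way to implement this landing rule is to map the sampled landing position back to a word. The sketch assumes a word list of `(line, x1, x2)` tuples, looks only at words on the same line as the target word, and falls back to the nearest word center when the eye lands between words; these details, and the function name, are illustrative assumptions, since the text does not specify them. A target before the first word is redirected to the first word, as described above.

```python
def resolve_landing(words, target, landing_x):
    """Return the index of the word actually fixated after a saccade.

    words: list of (line, x1, x2) tuples in reading order (hypothetical format).
    target: index of the word the saccade was aimed at; may be negative after
        a regression past the start of the text.
    landing_x: horizontal landing position implied by the sampled SACC value.
    """
    if target < 0:
        return 0  # redirect regressions before the first word to the first word
    line = words[target][0]
    candidates = [i for i, (ln, _, _) in enumerate(words) if ln == line]
    # Prefer a word on the target's line whose extent contains the landing point.
    for i in candidates:
        _, x1, x2 = words[i]
        if x1 <= landing_x <= x2:
            return i
    # Otherwise (e.g., landing in a space), take the word with the nearest center.
    return min(candidates, key=lambda i: abs((words[i][1] + words[i][2]) / 2 - landing_x))
```

The returned index would then become the new current word $N$, and ECCEN, FREQ, and the other inputs would be recomputed from it.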
Ten simulation trials, each with a different random seed, were run for each of the 76 individualized models (both children and adults). The “eye movements” and the corresponding word information were recorded for further analyses.
Distributions of Fixation Durations
The upper left panels of Figures 11-1 through 11-76 (one for each participant) show the frequency distributions of empirical and simulated fixation durations. Note that the simulated frequencies were divided by the number of simulation trials (10) to put them on the same scale as the empirical frequencies. In general, the simulated data appeared to follow the empirical distributions closely and were responsive to individual differences.
A formal statistical test of the hypothesis that two distributions are identical is the Kolmogorov-Smirnov (K-S) test (Birnbaum, 1952; Conover, 1999; Hall & Wellner, 1980). The K-S test involves calculating a critical value, $w_{1-\alpha}$, which is a function of the significance level $\alpha$. If at any point the cumulative distribution function of another distribution is more than $w_{1-\alpha}$ away from that of the sample distribution, we reject, at significance level $\alpha$, the hypothesis that the other distribution is the population distribution of the sample. For large $n$, Hollander & Wolfe (1999) give the following approximation:
$$w_{1-\alpha} \approx \sqrt{\frac{-\ln(\alpha/2)}{2n}}.$$
For $\alpha = 0.05$ and $n = 1000$ (most readers have between 1,000 and 2,000 fixations), $w_{1-\alpha}$ is approximately 0.043.
The K-S test can be carried out visually. The lower left panels of Figures 11-1 through 11-76 show the cumulative distribution functions of empirical and simulated fixation durations. A vertical bar at the top-left corner of each figure shows the magnitude of $w_{1-\alpha}$ for that particular reader. If the vertical difference between the two cumulative distribution functions exceeds the length of the bar, SHARE is not a statistically adequate model of fixation duration. In fact, none of the 76 individual simulations differed significantly from the empirical data.
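The visual test above can also be computed directly. The sketch evaluates the approximate critical value $w_{1-\alpha}$ and the largest vertical gap between two empirical cumulative distribution functions. Treating the tenfold-larger simulated sample as a stand-in for the model's distribution, and using the number of empirical fixations as $n$, follows the framing in the text; the function names are hypothetical.

```python
import bisect
import math


def ks_critical_value(alpha, n):
    """Large-sample approximation w_{1-alpha} = sqrt(-ln(alpha / 2) / (2 n))."""
    return math.sqrt(-math.log(alpha / 2) / (2 * n))


def _ecdf(sorted_xs, x):
    """Proportion of observations less than or equal to x."""
    return bisect.bisect_right(sorted_xs, x) / len(sorted_xs)


def ks_statistic(empirical, simulated):
    """Largest vertical distance between the two empirical CDFs."""
    a, b = sorted(empirical), sorted(simulated)
    # Both step functions jump only at observed values, so checking those suffices.
    return max(abs(_ecdf(a, x) - _ecdf(b, x)) for x in a + b)
```

A reader's simulation would be flagged when `ks_statistic(empirical, simulated) > ks_critical_value(0.05, len(empirical))`; for n = 1000 the critical value is about 0.043.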
Distributions of Saccade Length
The empirical and simulated distributions of saccade length were compared for each individual model with the same procedures as for fixation duration. Frequency distributions of saccade length are shown in the upper right panels of Figures 11-1 through 11-76. The simulated frequency distributions appear to fit the empirical data fairly well. The model was able to generate return sweeps as well as progressive and regressive saccades in approximately correct proportions. Cumulative distribution functions are shown in the lower right panels of Figures 11-1 through 11-76. The small vertical bar at the center-top of each figure represents the magnitude of $w_{1-\alpha}$ for the reader. The simulated distributions (the smoother curves) also appear to follow the empirical distributions closely. However, the K-S tests showed that for each of the 76 readers the simulated distribution was statistically significantly different from the empirical one.
Three systematic discrepancies were apparent in the frequency distributions and cumulative distributions. First, the model sometimes failed to show the dual-peak structure near zero seen in some of the empirical data. The saddle around zero indicated that readers were unlikely to make very small saccades. This is consistent with O’Regan’s (1990) finding that a refixation tends to land on the opposite side of the word from the previous landing position. Interestingly, however, not every reader showed this fine structure (adult readers were more likely to show the saddle), and the model was able to reproduce the saddle in some cases. This suggests that the model was able to capture the phenomenon, but that some of the parameters were probably not optimized.
Second, the model slightly but consistently overestimated the length of the longest saccades. This was evident in the lower right panels of Figures 11-1 through 11-76, where the simulated distribution function was consistently lower than the empirical curve near the top of the chart. The likely cause of the problem is that the variance parameter in the SACC node was overestimated for the “four words or longer forward saccade” category. This category had relatively few but very heterogeneous cases, which tended to produce an unstable variance estimate. It is also possible that in these cases the landing positions no longer followed a normal distribution but a skewed one, which would also inflate variance estimates under the normal distribution assumption. Future research is needed to explore ways to model this heterogeneous category using non-normal distributions.
Lastly, the model predicted a small but visible number of between-line regressions, that is, extremely long saccades from the beginning of a line to the end of the previous line. These saccades did occur in the data, but not as frequently as the model predicted. Between-line regressive saccades did not require any special mechanism in the present model. The ST node generated a regression command without knowing on which line the word was located. If the target happened to be on the previous line, a long, between-line regression was generated. Thus, according to the current model, this type of regression was no less frequent than regular regressions. However, the empirical data suggest that between-line saccades are less frequent than the model predicts. In the future, information such as line number could be added to the model to suppress these between-line regressions.
SHARE in Conventional Eye-movement Measures
To relate the SHARE model to traditional eye-movement theories, and to demonstrate its ability to capture moment-by-moment processes, the following analysis compared simulated and empirical eye movements in terms of several conventional eye-movement measures.
The structure of the analysis was intentionally borrowed from the E-Z Reader modeling
(Reichle et al., 1998). Reichle et al. classified words into five frequency categories and
summarized eye-movement data using six word-based measures. The measures were: (a) first
fixation duration, (b) single fixation duration, (c) gaze duration, (d) skipping rate, (e) the
probability of making a single fixation on the word, and (f) the probability of making two
fixations on the word.
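The six word-based measures can be computed from a sequence of fixations by first isolating each word's first-pass fixations. The sketch uses common first-pass definitions (a word counts as skipped when the eye moves past it before ever fixating it) and computes the probabilities over all words that were either fixated or skipped in first pass. These definitions, the input format `(word_index, duration_ms)`, and the function names are assumptions, not necessarily the exact procedure of Reichle et al. (1998). In the analysis described here, the measures would be computed separately for each word-frequency class.

```python
from statistics import fmean


def first_pass_runs(fixations):
    """Split a fixation sequence into first-pass runs.

    fixations: list of (word_index, duration_ms) in temporal order.
    Returns (runs, skipped): runs maps each word to the durations of its
    first-pass fixations; skipped holds words the eye moved past before
    ever fixating them.
    """
    runs, skipped = {}, set()
    furthest = -1  # rightmost word fixated so far
    i = 0
    while i < len(fixations):
        word = fixations[i][0]
        if word > furthest:
            skipped.update(range(furthest + 1, word))
            durations = []
            while i < len(fixations) and fixations[i][0] == word:
                durations.append(fixations[i][1])
                i += 1
            runs[word] = durations
            furthest = word
        else:
            i += 1  # regression or rereading: not part of any first pass
    return runs, skipped


def word_measures(fixations):
    """The six word-based measures, averaged over words."""
    runs, skipped = first_pass_runs(fixations)
    n_words = len(runs) + len(skipped)
    counts = [len(d) for d in runs.values()]
    return {
        "first_fixation_duration": fmean(d[0] for d in runs.values()),
        "single_fixation_duration": fmean(d[0] for d in runs.values() if len(d) == 1),
        "gaze_duration": fmean(sum(d) for d in runs.values()),
        "p_skip": len(skipped) / n_words,
        "p_one_fixation": counts.count(1) / n_words,
        "p_two_fixations": counts.count(2) / n_words,
    }
```

For example, the sequence `[(0, 200), (1, 180), (1, 150), (3, 220), (2, 250), (4, 190)]` skips word 2 in first pass, so its later fixation (250 msec) does not enter any of the duration measures.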
In the current analysis, the same procedure was followed, except that word frequency was coded in three levels (as part of the model specification) instead of five. Moreover, instead of predicting one set of group means, the current model had to predict 76 sets of individual means. This was a
more stringent test because the model needed to accommodate a wide range of individual
differences – from beginning readers to adults. The added degrees of freedom also made the
results more interpretable, in view of the collinearity problem in these measures (see Appendix
A).
Figures 12 through 17 compare the simulated and empirical values of the six measures.
Each point represents an individual mean. As seen in the figures, not only were the empirical and
simulated values highly correlated, but the model also reproduced the absolute values with
reasonable accuracy. It is worth noting that in Figure 17, the probability of making double
fixations on a word had a fairly restricted range, and yet the model was still able to predict the
values.
On the other hand, there were some systematic differences between the simulated and empirical data. For example, the model reproduced fairly closely the probability of making a single fixation on a word (Figure 16) but consistently (although only slightly) underpredicted single fixation duration (Figure 15). The model had no special mechanism for programming a single fixation, so its mean single fixation duration should be identical to its mean first fixation duration. The comparison therefore suggests that, for real readers, average fixation duration increased when only one fixation was made on a word, compared with cases with multiple fixations. Future research is needed to examine whether this increase in mean fixation duration results from a change in the weights of the fixation duration categories or in the means of these components.
Overall, the analysis showed that, with few assumptions about mechanism, the SHARE model was able to reproduce eye-movement details as assessed by conventional eye-movement measures. Furthermore, SHARE provides a set of constructs, such as saccade-targeting probabilities and fixation duration categories, with which individual readers’ eye-movement distributional data can be reproduced. Because the parameters of this model are more tractable and accessible than the raw distributions, this can be an important step toward developing an empirical methodology for implementing and evaluating the claims of contending models of eye-movement control in reading.
Summary
Once a SHARE model was trained with an individual reader’s eye movement data, it
captured the essence of the data and encapsulated it in the model parameters. Given the same
reading materials, the model could reproduce eye movements that were quantitatively similar to
the original empirical data, as the above simulation study demonstrated.
The simulation also showed that the SHARE architecture was able to adapt to both beginning and skilled readers. In addition, the Markovian structure at the control level of the model naturally accounted for temporal dynamics in reading. When assessed using conventional eye-movement measures, the model was able to quantitatively reproduce empirical values.
Compared with many existing models, the graphical model is simple and its statistical characteristics are well understood. Therefore, the SHARE structure is suitable as a general platform for communication in the field of reading eye-movement research.
The simulation and the analyses illustrate only a small portion of the potential of the SHARE architecture. For example, it would be interesting to study its ability to predict the next eye movement on the basis of the eye movements that a reader has already made. The simulation also suggested several aspects of the current implementation of the model that need refinement, including the handling of refixations in some readers and the issue of single fixation duration. The next chapter will show how analysis of the parameters of individual SHARE models can shed light on which aspects of eye-movement control develop as children become more skilled readers.
CHAPTER 6. DEVELOPMENTAL CHANGES OF READING EYE MOVEMENTS
The previous chapter has shown that the SHARE model is able to capture a wide range of individual differences in reading eye movements. It may also prove useful for concisely capturing developmental differences in reading eye movements, which would in turn provide a basis for theorizing about which cognitive processes change with the acquisition of reading skill.
Previous Research on the Development of Reading Eye Movements
How do eye movements change with age and reading proficiency? A few studies have investigated this question, and most of them have reported only global statistics. Table 1 (from Table 4 in Rayner et al., 1998) summarizes some global measures of reading eye movements from previous studies (Buswell, 1922; McConkie et al., 1991; Rayner, 1986; Taylor, 1965). Mean fixation duration declines with age, although the absolute values of the means and the range of developmental changes vary among studies. Developmental changes in saccade patterns are more difficult to describe. Based on the incomplete data for the two saccade-related variables in Table 1, skilled readers cover the same text with fewer fixations, although studies are inconsistent as to whether proficient readers make fewer regressions than beginning readers.
The only study that went significantly beyond global statistics is McConkie et al. (1991). McConkie et al. examined distributions of fixation durations for first- through fifth-grade students. Three findings were evident from the distributions. First, fixation duration distributions typically had a single mode at approximately 180 msec, regardless of age. Therefore, what drives the developmental changes in mean fixation duration appears to be the right tails of the distributions. Second, there were substantial individual differences in the distributions of fixation durations, especially among beginning readers. Lastly, McConkie et al. also showed that the means and higher moments of fixation duration distributions were strongly correlated with reading ability.
With regard to saccade control, McConkie et al. (1991) found that first-grade students
showed distributions of landing positions similar to those of adults (McConkie et al., 1988).
Another eye-movement characteristic shared by beginning and skilled readers is within-word
refixations. McConkie et al. (1988, 1991) demonstrated that the probability of making a
refixation on a word is a U-shaped function of the landing site of the initial fixation on the word.
McConkie et al. (1991) also showed that the probability of skipping a word as a function of
saccade launching site increases with age, and the forms of the functions at different grades
resemble adult data (McConkie et al., 1994). Thus, it appears that many of the basic mechanisms
of eye-movement control in reading English are in place after a year of reading experience,
possibly even before any formal reading instruction.
Developmental Analyses Using SHARE
To the extent that a SHARE model can simulate individual readers’ eye-movement patterns, developmental differences in reading eye movements can be studied by analyzing the parameters of individual models. McConkie et al. (1991) showed that developmental changes are more complicated than what can be described by global measures such as mean fixation duration. The SHARE architecture is particularly suitable for studying these complex changes, because it provides a rich set of structures and parameters to describe these differences and is able to closely simulate readers’ eye movements, as shown in the previous chapter.
This chapter focuses on two developmental issues: changes in eye-movement control across age, and changes in the effects of linguistic, perceptual, and oculomotor factors on eye-movement control. These correspond to two levels in the SHARE model, namely the control layer and the relationship between the input and control layers. Age differences in the individual parameters of these layers are analyzed. Because age is only a crude indication of reading skill, grouping by age risks overlooking meaningful within-group differences. In the absence of an independent reading proficiency measure, reading speed (measured in words per minute, WPM) was used as an indicator of readers’ reading proficiency. Past research has shown that reading speed is highly correlated with standardized reading test scores.
Development of Reading Eye-movement Control
One of the core assumptions of the SHARE model is discrete control of eye movements
in the control layer. The probabilities of making each eye-movement command – for example
“forward 2 words” or “long fixation” – form the basis of individual readers’ eye-movement
control strategy. The following analyses explore developmental differences in controlling
saccades and fixation duration.
Saccade targeting. Of the seven potential saccade targets in the ST node, what is the
probability of selecting a particular target? Figure 18 shows the probabilities40 of making
regressions (ST=-1 or -2*), refixations (ST=0), progressing one word (ST=1), and progressing
more than one word (ST= 2, 3, or 4*) as a function of age group and reading speed. Some
categories were combined to simplify data presentation.
40 These are the unconditional probabilities, i.e., ignoring the effects of word frequency and the like. They were computed by collapsing the multidimensional frequency tables in the ST node into a one-dimensional table.
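Collapsing a multidimensional frequency table onto the ST dimension amounts to summing the counts over all other dimensions and normalizing. The sketch assumes the table is stored as a dictionary whose keys start with the ST value (with -2 and 4 standing for -2* and 4*); the grouping into the four categories plotted in Figure 18 follows the text, while the data layout and function names are hypothetical.

```python
from collections import defaultdict


def collapse_to_st(counts):
    """Sum a multidimensional frequency table over every dimension except ST.

    counts maps tuples whose first element is the ST value to frequencies.
    Returns the unconditional probabilities P(ST = x).
    """
    totals = defaultdict(float)
    for key, n in counts.items():
        totals[key[0]] += n
    grand_total = sum(totals.values())
    return {st: n / grand_total for st, n in sorted(totals.items())}


def figure18_groups(p_st):
    """Combine ST categories into the four groups shown in Figure 18."""
    return {
        "regression": p_st.get(-2, 0.0) + p_st.get(-1, 0.0),
        "refixation": p_st.get(0, 0.0),
        "forward_1": p_st.get(1, 0.0),
        "forward_2plus": sum(p_st.get(k, 0.0) for k in (2, 3, 4)),
    }
```

Because the counts are summed before normalizing, each parent configuration contributes in proportion to how often it occurred in the reader's data.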
The probability of regressions did not differ across age, F(2, 73)=1.25, p=0.29, MSE=0.0018. Regression rates were around 15% for all age groups, which is remarkable given that the adult readers were reading simple, elementary-school-level stories. Some college students reading as fast as 600 words per minute made 25% regressions, more than any third-grade student did.
Refixation rates showed a significant decrease with age, F(2, 73)=105.3, p<0.001,
MSE=0.0036. A post-hoc comparison with Bonferroni adjustments showed that each age group was significantly different from the others.
The probability of progressing one word showed a significant decrease with age, F(2,
73)=12.1, p<0.001, MSE=0.0039. A Bonferroni post-hoc comparison showed that while both
third- and fifth-grade groups differed significantly from adults, they did not differ significantly
from each other. The magnitude of the difference was rather small – approximately 32% for
children versus 25% for adults.
Finally, the largest developmental difference was an increase in the probability of
progressing two or more words, F(2, 73)=134.4, p<0.001, MSE=0.0052. Each age group differed significantly from the others.
To summarize results on the developmental patterns of saccade control, the largest
differences between beginning and skilled readers lie in the tradeoff between making refixations
and making long (2 or more words) forward saccades. Comparatively, the differences in
regression rate and the probability of moving forward one word at a time are small.
Fixation duration. According to the present model, the distribution of fixation durations is a mixture of three components, each of which follows a lognormal distribution. Developmental changes in the proportions, modes, and variances of the components are analyzed below.
Figure 19 shows the proportions of the three types of fixations. There was no significant
age effect on the probability of making short fixations, F(2, 73)=2.157, p=0.123, MSE=0.0021.
The probability of making long fixations showed a significant decrease with age, F(2, 73)=27.3,
p<0.001, MSE=0.0056. A post-hoc test showed that third- and fifth-grade students did not differ significantly from each other, but both differed significantly from adults. Because the three proportions sum to 1, the probability of making medium fixations also showed a significant age effect, increasing with age, with similar post-hoc results.
Figure 20 shows the modes (corresponding to the means of the logarithm of fixation
durations in the model) of the three components of fixation durations as a function of age and
reading speed. Overall, the largest change appears to be the decrease of long fixation modes with
age and reading speed.
For short fixations, the average mode increased slightly with age (from 62 msec to 67 msec to 73 msec), and the difference was statistically significant, F(2, 73)=8.591, p<0.001, MSE=0.0080. The difference between the two child groups was not significant in a post hoc test, but the child-adult difference was significant.
There was also a significant but small age effect in the modes of medium fixations, F(2,
73)=14.8, p<0.001, MSE=0.0034. The drop from 202 msec (3rd grade) to 198 msec (5th grade)
was not significant, but the mode for adults, 179 msec, was significantly lower than that of either child group.
A strong age effect was observed for long fixations, F(2, 73)=35.9, p<0.001,
MSE=0.0017. Again, third-grade (319 msec) and fifth-grade (292 msec) values did not differ
significantly, but both differed significantly from that of adults (221 msec).
Lastly, variances of the three components as functions of age and reading speed are
shown in Figure 21. There was no significant age effect for the variance of short fixations, F(2,
73)=1.52, p=0.225, MSE=0.0028. The variances of medium and long fixations decreased significantly with age, F(2, 73)=11.80, p<0.001, MSE=0.0004, and F(2, 73)=17.74, p<.001,
MSE=0.0014, respectively. In both cases, the two young age groups did not differ significantly
from each other but both differed significantly from adults. Again, the largest age-related
difference is the decrease in the variance for long fixations.
The above analyses of the control-layer parameters add new information to the understanding of reading development. Regarding saccade programming, beginning and skilled readers do not differ in regression rate or in the overall probability of making forward saccades. The developmental change is rather specific: skilled readers tend to make fewer refixations and more long forward saccades (two words or more).
With respect to fixations, findings from the present study concur with McConkie et al.’s (1991) observations that the modes of fixation duration distributions do not change much with age but the tails of the distributions become less heavy. In addition, the discrete FDC node in the SHARE model provides a quantitative description of these developmental changes. Decomposing the overall distributions into three components shows that the characteristics of the briefest fixations do not change substantially with age. The medium fixation component, corresponding to the modes of the distribution, becomes slightly shorter and less variable, but these changes are small compared with those of the third component. What really accounts for the developmental changes is the third, long fixation component: its proportion, mode, and variance decrease substantially with age.
Effects of Input Variables on Eye-movement Control
The above analyses ignore effects of input variables on the control layer. However,
Chapter 4 has shown clearly that these input variables contribute significantly to the explanatory
ability of the SHARE model. Their effects are investigated below.
Under the present implementation, the input variables were represented as ordinal discrete variables (e.g., low, medium, and high frequency), although they may be treated as continuous in the future. Therefore, their relations with the control nodes ST and FDC, which are also discrete and ordinal, are represented in multidimensional contingency tables. The current report focuses only on the main effect of each individual input variable; that is, only the two-dimensional contingency table between an input variable and a control node is analyzed. Interactions between these variables will be explored in future research.
The strength of association in a two-dimensional contingency table is summarized using Goodman and Kruskal’s Gamma (1954; 1963; see also Agresti, 1990). Gamma, a scalar ranging from -1 to 1, measures the association between two ordinal, discrete variables. It is defined as the difference between the numbers of concordant and discordant pairs divided by their sum, $\gamma = (C - D)/(C + D)$, where concordant pairs ($C$) are pairs of cases in which the two variables change in the same direction, and discordant pairs ($D$) are pairs in which they change in opposite directions (ties are excluded; for the mathematical definition, see Agresti, 1990). Goodman (1963) showed that a Gamma computed from a sample follows an asymptotic normal distribution, whose mean is the population Gamma and whose variance is a complex function of the frequencies of concordant and discordant pairs. In the following analyses, the effect of an input variable on eye-movement control is represented by the corresponding Gamma. The present report concentrates on two issues related to development: the proportion of readers in each age group showing statistically significant effects, and the change in Gamma values across age. The input variables include word frequency, the length of the next word, the landing position of the current fixation, and the state of the previous eye movement (i.e., the previous values of the ST and FDC nodes).
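Gamma can be computed from a two-way table of counts by enumerating pairs of cells. The sketch below counts concordant and discordant pairs directly, skipping pairs tied on either variable; it assumes rows and columns are listed in increasing order, and the function name is hypothetical. It does not compute the asymptotic variance used for the significance tests.

```python
def goodman_kruskal_gamma(table):
    """Goodman and Kruskal's Gamma for an ordinal r x c contingency table.

    table[i][j] is the count for row category i and column category j, with
    rows and columns both listed in increasing order.
    """
    rows, cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(rows):
        for j in range(cols):
            n = table[i][j]
            if n == 0:
                continue
            # Pair cell (i, j) with every cell in a strictly higher row;
            # cells in the same row or the same column are ties and are skipped.
            for k in range(i + 1, rows):
                for m in range(cols):
                    if m > j:
                        concordant += n * table[k][m]
                    elif m < j:
                        discordant += n * table[k][m]
    return (concordant - discordant) / (concordant + discordant)
```

For instance, rows could be the low, medium, and high word-frequency levels and columns the ordered ST categories of one reader's collapsed table.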
Input variables and saccade programming. Figure 22 shows the effects of the four input
variables on saccade targeting (the ST node). The two horizontal lines in each graph mark the
95% confidence interval of the Gamma41. In other words, data points that fall between the lines are not statistically significantly different from zero.
Word frequency has a significant effect on saccade programming for nearly all young
readers but for fewer adults. The Gammas were significantly different from zero for 95% of
third-grade and 93.3% of fifth-grade students, but only for 67.7% of adults. A Chi-square test
showed a significant age effect, χ2(2)=9.63, p=0.008. The ANOVA of the Gammas by age group
was significant, F(2, 73)=18.6, p<0.001, MSE=0.0018. A post hoc analysis showed that both
third- and fifth-grade students differed from adults but not from each other.
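Because the chi-square tests in this section have two degrees of freedom and the ANOVAs have two numerator degrees of freedom, both tail probabilities have simple closed forms, which makes the reported p-values easy to check by hand. The following Python sketch uses only the standard library; the function names are illustrative, and the code is a checking aid rather than part of the original analysis.

```python
import math


def chi2_pvalue_df2(chi2):
    """Upper-tail p-value of a chi-square statistic with 2 degrees of freedom.

    A chi-square variable with 2 df is exponential with mean 2,
    so P(X > chi2) = exp(-chi2 / 2).
    """
    return math.exp(-chi2 / 2.0)


def f_pvalue_df1_2(f, df2):
    """Upper-tail p-value of an F statistic with (2, df2) degrees of freedom.

    With a numerator df of 2 the survival function reduces to
    P(F > f) = (1 + 2 * f / df2) ** (-df2 / 2).
    """
    return (1.0 + 2.0 * f / df2) ** (-df2 / 2.0)


# Word-frequency effect on saccade targeting, as reported above.
print(round(chi2_pvalue_df2(9.63), 3))   # 0.008
print(f_pvalue_df1_2(18.6, 73) < 0.001)  # True
```

Applied to the other statistics in this section, the same functions reproduce, for example, p=0.214 for χ2(2)=3.09 and p=0.312 for F(2, 73)=1.184; for F(2, 73)=6.89 they give p ≈ 0.0018.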
The length of the next word shows the opposite pattern. The Gammas were
significantly different from zero for 40% of third-grade readers, 86.7% of fifth-grade students,
and 100% of adults. A Chi-square test showed a significant age effect, χ2(2)=15.2, p<0.001. The
ANOVA of the Gammas by age group was significant, F(2, 73)=42.8, p<0.001, MSE=0.0033. A
post hoc analysis showed that all three age groups differed from one another.
⁴¹ The confidence interval of Gamma varies somewhat across readers. To represent the interval visually,
the average confidence interval is used here.
The picture for landing position is different. The Gammas differed significantly from
zero for 35% of third-grade students, 23.3% of fifth-grade students, and 7.7% of adults. The Chi-square test
showed no significant age effect, χ2(2)=0.615, p>0.5, and the ANOVA was also nonsignificant.
In contrast, for the effect of the previous saccade move, the Gamma for every reader was
significantly different from zero. The ANOVA of the Gammas by age group was significant,
F(2, 73)=10.45, p<0.001, MSE=0.0039. A post hoc analysis showed that the third-grade students
did not differ from adults but both differed from fifth-graders.
Input variables and fixation duration control. Similar analyses also examined the effects
of input variables on the FDC node, and are presented in Figure 23.
Word frequency showed a significant effect on fixation duration control for 60% of third-grade
students, 46.7% of fifth-grade students, and 50% of adults. The Chi-square test was not significant,
χ2(2)=0.314, p>0.50. The ANOVA of the Gammas by age group was also nonsignificant, F(2,
73)=1.184, p=0.312, MSE=0.0053.
The length of the next word showed a similar developmental pattern but overall weaker
effects. The Gammas were significantly different from zero for 10% of third-grade readers, 30%
of fifth-grade students, and 19.2% of adults. A Chi-square test was not significant, χ2(2)=0.3479,
p>0.50. The ANOVA was nonsignificant, F(2, 73)=1.232, p=0.298, MSE=0.0035.
Landing position showed a developmental effect. The Gammas differed significantly from
zero for 55% of third-grade students, 50% of fifth-grade students, and 7.7% of adults. The Chi-square test
was not significant, χ2(2)=3.09, p=0.214. However, the ANOVA was significant, F(2,
73)=14.37, p<0.001, MSE=0.0055.
Lastly, for the effect of the previous saccade move, the Gammas differed significantly
from zero for 80% of third-grade students, 53.3% of fifth-grade students, and 30.8% of adults. The
Chi-square test was not significant, χ2(2)=4.189, p=0.123. However, the ANOVA of the Gammas by
age group was significant, F(2, 73)=6.89, p<0.01, MSE=0.0130. A post hoc analysis showed
that the third-grade students did not differ from adults but both differed from fifth-graders.
Overall, the above results demonstrated that readers at different proficiency levels are
sensitive to different information in programming reading eye movements. When programming
the next saccade, beginning readers are more affected by the frequency of the currently fixated
word but are less affected by the length of the next word, compared to skilled readers. Landing
position also seems to have a larger impact on young readers’ WHEN decision.
Additionally, not all variables have equal effects on different parameters of eye
movements. For example, the length of the next word has very little effect on the duration of the
current fixation but significant effects on the programming of the next saccade, at least for more
skilled readers.
Discussion
What develops in reading eye-movement control? Analyses of the parameters of
individual SHARE models suggest that as readers become more proficient, their eye
movements are less affected by features of the currently fixated word (e.g., word frequency and
landing position) or by the state of the previous eye movements (e.g., the previous values of the ST and
FDC nodes). Skilled readers take the length of the next word into account in programming the
next saccade, and they tend to move further into the unread text.
Results based on analyses of SHARE’s parameter space confirmed many previous findings
about the development of reading eye movements. Furthermore, SHARE made it possible to
explore important questions that were unanswered in prior research. For example, temporal
dependency in eye-movement control was found to decrease slightly with age, but the effects
remain for most adult readers.
A unique feature of the SHARE model is that it models temporal dependencies between
consecutive eye movements. Evidently, these temporal dependencies were among the largest and
most consistent effects on eye movement control. More interestingly, temporal dependencies
decrease in strength with reading proficiency, which suggests that skilled reading eye
movements become more like a zero-order Markov (random-walk) process.
CHAPTER 7. DISCUSSION
The goal of the current research is to describe reading eye movements mathematically
with minimal assumptions about the mechanisms of the processes. A stochastic, hierarchical
architecture for reading eye-movement, or SHARE, is developed, and a simple model based on
this architecture is tested.
What is SHARE?
SHARE is a mathematical model that is able to reproduce many essential characteristics
of reading eye movements. It is, to my knowledge, the first model that simultaneously accounts
for fixation duration and saccade length in their distributional details, as opposed to only group
means. Its Markovian architecture also gives straightforward explanations of the moment-to-moment
dynamics of eye movements with fewer a priori assumptions than some
existing models.
SHARE is also unique because of its completely individualized modeling approach,
which contrasts strongly with most, if not all, previous models’ focus on “the average person.”
Reading eye movements are as diverse as are readers themselves. There is no reason to presume
a common set of parameters, or even mechanisms, for all readers. Besides the bias in psychology
to think in terms of “the average person,” a practical obstacle preventing individualized modeling
is that there may not be enough data collected from an individual reader to obtain sound
parameter estimates. The Bayesian method used in SHARE provides a promising way to get
around the problem.
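One common way to let sparse individual data borrow strength from prior knowledge is a conjugate Dirichlet prior on each row of a transition matrix. The Python sketch below assumes such a prior, centred on a group-level estimate and weighted by an assumed number of pseudo-counts; it illustrates the general idea and is not necessarily the prior specification used in SHARE. The function and parameter names are hypothetical.

```python
def posterior_mean_row(counts, prior_probs, prior_strength):
    """Posterior mean of one transition-matrix row under a Dirichlet prior.

    counts         : observed transitions out of one state, for one reader
    prior_probs    : prior (e.g., group-level) transition probabilities, summing to 1
    prior_strength : total pseudo-count weight given to the prior

    The prior is Dirichlet(prior_strength * prior_probs); adding the counts
    gives the posterior, whose mean is (count + pseudo-count) / total.
    """
    if len(counts) != len(prior_probs):
        raise ValueError("counts and prior_probs must have the same length")
    total = sum(counts) + prior_strength
    if total == 0:
        raise ValueError("need data or a positive prior strength")
    return [(c + prior_strength * p) / total for c, p in zip(counts, prior_probs)]


# A reader observed only 3 transitions out of some state (hypothetical numbers).
reader_counts = [0, 3, 0]
group_probs = [0.2, 0.6, 0.2]
print(posterior_mean_row(reader_counts, group_probs, prior_strength=0))  # maximum likelihood
print(posterior_mean_row(reader_counts, group_probs, prior_strength=5))  # shrunk toward group
```

With no prior weight the estimate is the maximum-likelihood row [0, 1, 0], which rules out two transitions after only three observations; with five pseudo-counts it becomes [0.125, 0.75, 0.125], and the data dominate again as the reader's counts grow.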
However, the most important contribution of SHARE is not the model in its current form.
Rather, the hope of this research is to introduce a language for describing reading eye
movements.
I argued in Chapter 1 that researchers have struggled to depict reading eye movements
since the discovery of the basic phenomena over a century ago (Javal, 1878). The solutions,
ranging from early attempts to use verbal analogies and visual aids to the latest flourish of
composite eye-movement measures and theories of mechanisms, are far from satisfactory.
The direction of the current research is to separate description from mechanism, and
focus squarely on the former. As a result, the SHARE architecture was designed to satisfy three
logical requirements for describing reading eye movements – that they are probabilistic in nature,
that they are time-series data, and that they are affected by other factors. Of course, some of the
details – e.g., the choice of input variables, the specifications of the nodes (discrete vs
continuous, etc.), or the independence assumptions – are specific to the current implementation.
Nevertheless, the general hierarchical, stochastic architecture has been shown to be flexible
enough to capture much of the essence of reading eye movements, and it has the potential to
become a common language for talking about eye movements.
This brings up the fine but crucial distinction between an architecture and a specific theory
implemented under that architecture. It can be argued that SHARE, as implemented in the current
study, is a particular theory of reading eye movements, because it restricts linguistic effects
on reading to word frequency only and assumes conditional independence between the WHEN
and WHERE pathways. However, the author has no intention of defending such a theory. Rather, it
was implemented as an example of modeling within the new architecture. The fact that even a
simple-minded “theory” like this can account for many facts about eye movements demonstrates
the power of the architecture.
What SHARE is Not
First and foremost, SHARE in its current implementation is not a theory of reading
eye movements that the author wishes to promote. As argued above, it is merely a demonstration of an
architecture for mathematically describing patterns of reading eye movements.
Moreover, the SHARE architecture is not a theory of eye-movement control mechanisms.
On the contrary, it is assumed that data and mechanism can be described separately, and SHARE
is intended to be as independent of the mechanism assumptions as possible. For example,
SHARE models what effect word frequency has on saccade targeting, but makes no assumption
about how that effect arises. It says nothing about whether the effect of word
frequency occurs earlier or later than that of word length, or whether attention has been shifted.
The arrows in the hierarchical architecture represent the direction of causality only; they do not
imply serial processing or even temporal order. In short, eye-movement description is at the
phenomenological level.
SHARE does not compete with existing theories of eye movements; it is a complement.
In a sense it provides a test-bed, where different theories may be implemented on a common
ground and compete with each other. For example, it is conceivable that the E-Z Reader model
could be implemented in a SHARE environment. It would add many processing assumptions to
SHARE, and would make specific predictions about how, for example, word frequency would
affect the control of saccade and fixation duration. In other words, the model would fix some of
the free parameters. The model would then fit empirical data (a built-in feature of the SHARE
architecture), and the result could be compared to a “full” model where the corresponding
parameters were not fixed. Standard statistical tests could be carried out to evaluate the power of
the model.
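A likelihood-ratio test is one standard way to carry out such a comparison when the theory-constrained model is nested within the full model and both are fitted by maximum likelihood (with fully Bayesian fits, Bayes factors would be a natural alternative). The Python sketch below assumes the statistic is referred to a chi-square distribution whose degrees of freedom equal the number of parameters fixed by the theory; the log-likelihood values in the example are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df (closed form)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: the df = 1 tail plus the remaining half-integer terms.
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half) * half ** (i - 0.5) / math.gamma(i + 0.5)
    return total


def likelihood_ratio_test(loglik_restricted, loglik_full, n_fixed_params):
    """Compare a restricted model (some parameters fixed by a theory) with the full model."""
    statistic = 2.0 * (loglik_full - loglik_restricted)
    return statistic, chi2_sf(statistic, n_fixed_params)


# Hypothetical fits: a theory fixes 3 parameters that the full model estimates freely.
print(likelihood_ratio_test(loglik_restricted=-240.0, loglik_full=-235.0, n_fixed_params=3))
```

A small p-value indicates that the constraints imposed by the theory cost significantly more fit than expected by chance; a large one means the constrained model describes the data about as well as the unconstrained one.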
Of course, it is arguable whether description and mechanism are truly separable. Our
survey of existing eye-movement models suggests that, to a large extent, they are. McConkie and
Dyre (2000) have shown that different mechanisms may result in almost identical fits to
empirical data. Conversely, the Mr. Chips model (Legge et al., 1997) demonstrated that a
complex deterministic process could be modeled successfully with simple probabilistic
heuristics. On the other hand, the probabilistic nature of the SHARE architecture precludes
implementing models such as READER, in which eye movements are deterministically decided.
Nonetheless, the Mr. Chips model hints that SHARE may also be compatible with a
deterministic model if the distributional properties of the model are well understood.
Composite Variables Revisited: Implications for Psycholinguistic Research
The proliferation of composite eye-movement measures may reflect researchers’
increasing frustration in describing complex eye movement patterns. However, new measures
have not solved the problem, and in many cases they only complicate matters further.
SHARE suggests a different approach. Instead of summing up fixation duration over time
in idiosyncratic ways, SHARE captures temporal dynamics with its Markovian structure at the
eye-movement control layer. Paired with the power of the hierarchical structure (input variables),
SHARE’s probabilistic representation naturally summarizes endless combinations of eye-
movement patterns. What is variable and elusive in the sample domain can be expressed as stable
parameters in the Markov transition matrices. An analogy is the two representations of speech
signals – what is difficult to perceive in the waveform may be obvious in the spectrogram, and
vice versa. Eye-movement patterns may be difficult to capture in the sample domain, but much
easier to deal with as a Markov transition matrix.
To psycholinguistic researchers, this points to a change in data analysis. For example, in
a hypothetical reading experiment, the researcher manipulates a target word in a sentence so that
in the experimental condition the word does not fit the sentence context whereas in the control
condition it does. The researcher is interested in whether readers detect the improper word within
the region of the next n words (or within x fixations, etc.). Instead of using gaze duration over the
region (or the measures proposed by Liversedge et al., 1998), s/he may define each of the n words as a
state, feed the eye movements within the region into a simple SHARE model⁴², and estimate the
transition matrices of the ST node for the experimental and control conditions. If readers change
saccade patterns when they see an inappropriate word, different transition matrices are expected
for the two conditions. Fixation duration may be modeled similarly.
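The analysis just described can be sketched in a few lines. The Python example assumes each trial is recorded as the sequence of word indices (0 to n-1) fixated within the region, pools trials by condition, and reads off one possible summary: the probability of jumping back to the target from the word immediately after it. The trial data, the function names, and that particular summary are hypothetical choices for illustration.

```python
def region_transition_matrix(trials, n_words):
    """Pool fixation sequences into a first-order transition matrix over word positions.

    Each trial is the list of word indices (0 .. n_words - 1) fixated in the region.
    Rows for words that were never left stay all-zero.
    """
    counts = [[0] * n_words for _ in range(n_words)]
    for seq in trials:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs


def regression_to_target(probs, target):
    """P(next fixation is on the target | current fixation is on the word after it)."""
    return probs[target + 1][target]


TARGET = 1  # index of the manipulated word within a four-word region
control = [[0, 1, 2, 3], [0, 1, 2, 3], [0, 2, 3]]
experimental = [[0, 1, 2, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 1, 3]]

for name, trials in [("control", control), ("experimental", experimental)]:
    probs = region_transition_matrix(trials, n_words=4)
    print(name, round(regression_to_target(probs, TARGET), 2))
```

In this toy data the regression probability is 0.0 in the control condition and 0.5 in the experimental condition; with real data the two full matrices, not just one cell, would be compared.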
There are several potential benefits of using this approach. First, the results may be more
interpretable. Instead of “mean gaze duration increased 15 msec,” one may report something like
“the probability of regressing back to the target word increased from 0.1 to 0.5, and the
probability of making long fixations increased from 0.3 to 0.4.” Furthermore, with enough data
one may be able to estimate instantaneous transition probabilities, e.g., the probability of fixating
the target word in the 2nd, 3rd, … fixation after the first fixation on the target word. This is
valuable information that many researchers have tried to infer from traditional measures such as
first fixation duration and gaze duration. Last but not least, experimental effects may be estimated
for individual readers, so that individual differences in reading eye movements can be assessed.
⁴² A simple first-order Markov model may suffice in this hypothetical study.
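The lag-specific probabilities mentioned above (the probability of fixating the target on the 2nd, 3rd, … fixation after the first fixation on it) could be estimated directly from data. The Python sketch below assumes each trial is a list of fixated word indices and, for each lag k, counts how often the k-th fixation after the first target fixation lands on the target again. Names and data are hypothetical.

```python
def target_probability_by_lag(trials, target, max_lag):
    """Empirical P(fixation k after the first target fixation is on the target).

    trials : lists of fixated word indices, one list per trial
    Returns {k: probability} for k = 1..max_lag; lags that no trial reaches are omitted.
    """
    hits = [0] * (max_lag + 1)
    totals = [0] * (max_lag + 1)
    for seq in trials:
        if target not in seq:
            continue  # this trial never fixated the target
        first = seq.index(target)
        for k in range(1, max_lag + 1):
            if first + k < len(seq):
                totals[k] += 1
                hits[k] += int(seq[first + k] == target)
    return {k: hits[k] / totals[k] for k in range(1, max_lag + 1) if totals[k]}


trials = [[0, 1, 1, 2], [0, 1, 2, 1], [0, 2, 3]]
print(target_probability_by_lag(trials, target=1, max_lag=3))  # {1: 0.5, 2: 0.5}
```

Under a homogeneous first-order model these probabilities would instead follow from powers of a single transition matrix, so comparing the empirical values with that prediction is one way to check whether such a simple model suffices.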
In sum, for psycholinguistic reading research, the SHARE architecture may provide a
complementary or possibly alternative solution to the eye-movement measurement problem,
although many details have to be worked out in the future.
Applications in Reading Education
One of the original motivations of this research was to use eye movements to detect
processing difficulties in reading. In the early days of eye-movement research, the pioneers
(Buswell, 1922; Buswell, 1937; Dearborn, 1906; Gray, 1922; Huey, 1908) did not hesitate to
point to an “abnormal” eye-movement pattern and conclude that the reader was experiencing
difficulties. Buswell (1937) also distinguished general reading deficiencies from having trouble
with specific words. The problem, of course, was that the inference process was qualitative and
holistic. The “art” of detecting reading difficulties from eye movements disappeared soon after.
Logically, if readers have different eye-movement patterns when they are reading
normally versus experiencing difficulties, one should be able to compare the patterns and detect
the state of the reader. To carry out this process quantitatively, however, one has to be able to
faithfully describe eye-movement patterns associated with different states and probabilistically
infer the state from observed eye movement patterns.
SHARE was developed exactly for this purpose. It is able to summarize a wide range of
eye-movement patterns with the stochastic, hierarchical structure. The Bayesian method can be
used to probabilistically infer the state of an unobserved node in the structure given observed
data (e.g., the value of the FDC node was hidden and was estimated from data). In addition, its
ability to adapt to each reader’s eye-movement parameters is also essential for performing
the detection task.
As an extension of the current research, a prototype of a reading difficulty detection
model has been developed. In its simplest form, it consists of an input layer with FREQ and
WLEN as input variables, a cognitive-state layer that contains a binary node (troubled versus
normal reading), and an eye-movement layer that contains a discrete node similar to the ST
node in the current model. The cognitive-state node is assumed to be unobserved, and the goal of
the model is to estimate the probability of each state given the input variables and an observed
sequence of saccade movements. Initial testing shows that the prototype model is able to
distinguish different patterns of eye movements.
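A minimal way to infer such an unobserved state from a saccade sequence is forward filtering in a hidden Markov model. The Python sketch below assumes the cognitive state can switch between saccades according to a fixed transition matrix and that each state emits saccade categories with fixed probabilities; conditioning on FREQ and WLEN is omitted, and every number is hypothetical rather than taken from the prototype.

```python
def filter_cognitive_state(observations, init, trans, emit):
    """Forward filtering for a hidden Markov model of the reader's cognitive state.

    observations : sequence of saccade categories (ints indexing emit columns)
    init         : prior probabilities of each state before the first saccade
    trans        : trans[i][j] = P(state_t = j | state_{t-1} = i)
    emit         : emit[i][o] = P(saccade category o | state i)
    Returns P(state_t | observations up to t) for every t.
    """
    n_states = len(init)
    belief = list(init)
    history = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Predict: propagate the previous belief through the state transitions.
            belief = [sum(belief[i] * trans[i][j] for i in range(n_states))
                      for j in range(n_states)]
        # Update: weight each state by the likelihood of the observed saccade.
        belief = [belief[j] * emit[j][obs] for j in range(n_states)]
        norm = sum(belief)
        belief = [b / norm for b in belief]
        history.append(belief)
    return history


# Hypothetical parameters. States: 0 = normal, 1 = troubled.
# Saccade categories: 0 = regression, 1 = short forward, 2 = long forward.
INIT = [0.8, 0.2]
TRANS = [[0.9, 0.1], [0.2, 0.8]]
EMIT = [[0.1, 0.4, 0.5], [0.5, 0.4, 0.1]]

saccades = [2, 2, 0, 0, 0]
for t, belief in enumerate(filter_cognitive_state(saccades, INIT, TRANS, EMIT)):
    print(t, saccades[t], round(belief[1], 3))  # P(troubled) after each saccade
```

With these made-up parameters, the probability of the troubled state stays low during long forward saccades and rises steadily across a run of regressions.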
Although the prototype model is far from complete, the initial results are promising. A
next step for the current research is to explore the full potential of the SHARE architecture in
describing and detecting reading difficulties.
SHARE, the mathematical model developed in this research, grows out of the need to
quantitatively account for reading eye movements in both theoretical research and educational
applications. It demonstrates the feasibility and utility of modeling eye movements at a level
other than mechanisms and processes. Although researchers may have different theories about
mechanisms and processes, it is my hope that at least we can share a common description of eye
movements.
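TABLES
Table 1. Developmental Characteristics of Reading Eye Movements

| Article and characteristic | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Grade 6 | Adult |
|---|---|---|---|---|---|---|---|
| Taylor (1965): Fixation duration (msec) | 330 | 300 | 280 | 270 | 270 | 270 | 240 |
| Taylor (1965): Fixations per 100 words | 224 | 174 | 155 | 139 | 129 | 120 | 90 |
| Taylor (1965): Frequency of regressions (%) | 23 | 23 | 22 | 22 | 21 | 21 | 17 |
| Buswell (1922): Fixation duration (msec) | 432 | 364 | 316 | 268 | 252 | 236 | 252 |
| Buswell (1922): Fixations per 100 words | 182 | 126 | 113 | 92 | 87 | 87 | 75 |
| Buswell (1922): Frequency of regressions (%) | 26 | 21 | 20 | 19 | 20 | 21 | 8 |
| Rayner (1985b): Fixation duration (msec) | | 290 | | 276 | | 242 | 239 |
| Rayner (1985b): Fixations per 100 words | | 165 | | 122 | | 110 | 92 |
| Rayner (1985b): Frequency of regressions (%) | | 27 | | 25 | | 24 | 9 |
| McConkie et al. (1991): Fixation duration (msec) | 304 | 268 | 262 | 248 | 243 | | 200 |
| McConkie et al. (1991): Fixations per 100 words | 168 | 138 | 125 | 132 | 135 | | 118 |
| McConkie et al. (1991): Frequency of regressions (%) | 34 | 33 | 34 | 36 | 36 | | 21 |
| Overall mean: Fixation duration (msec) | 355 | 306 | 286 | 266 | 255 | 249 | 233 |
| Overall mean: Fixations per 100 words | 191 | 151 | 131 | 121 | 117 | 106 | 94 |
| Overall mean: Frequency of regressions (%) | 28 | 26 | 25 | 26 | 26 | 22 | 14 |

Reproduced from: Table 4 in Rayner (1998)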
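The Overall mean rows of Table 1 are consistent with an unweighted average of the studies that report each grade level. The Python sketch below recomputes the fixation-duration row under that assumption; the averaging rule is inferred from the numbers rather than stated in the source table, and the grade columns assigned to the Rayner (1985b) and McConkie et al. (1991) entries follow the same inference.

```python
import math

# Fixation duration (msec) for grades 1-6 and adults; None marks cells not reported.
FIXATION_DURATION = {
    "Taylor (1965)": [330, 300, 280, 270, 270, 270, 240],
    "Buswell (1922)": [432, 364, 316, 268, 252, 236, 252],
    "Rayner (1985b)": [None, 290, None, 276, None, 242, 239],
    "McConkie et al. (1991)": [304, 268, 262, 248, 243, None, 200],
}


def unweighted_column_means(rows):
    """Average each column over the studies that report it, rounding half up."""
    n_cols = len(next(iter(rows.values())))
    means = []
    for col in range(n_cols):
        values = [r[col] for r in rows.values() if r[col] is not None]
        means.append(math.floor(sum(values) / len(values) + 0.5))
    return means


print(unweighted_column_means(FIXATION_DURATION))  # [355, 306, 286, 266, 255, 249, 233]
```

The same function applied to the fixations-per-100-words and regression rows reproduces their Overall mean rows as well.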
Table 2. Log Likelihood of Bayesian and MLE for Fixation Duration Fitting

| Group and component | Mode, BNT (sec) | Mode, GMM (sec) | Variance of log(dur), BNT | Variance of log(dur), GMM | Weight, BNT | Weight, GMM |
|---|---|---|---|---|---|---|
| 3rd-grade (N=481; log likelihood: BNT = -386.86, GMM = -386.68) | | | | | | |
| S | 0.062 | 0.082 | 0.171 | 0.284 | 0.081 | 0.133 |
| M | 0.204 | 0.212 | 0.133 | 0.105 | 0.537 | 0.530 |
| L | 0.302 | 0.321 | 0.184 | 0.171 | 0.382 | 0.337 |
| 5th-grade (N=586; log likelihood: BNT = -415.85, GMM = -417.53) | | | | | | |
| S | 0.061 | 0.169 | 0.217 | 0.657 | 0.037 | 0.158 |
| M | 0.191 | 0.194 | 0.103 | 0.092 | 0.595 | 0.611 |
| L | 0.305 | 0.351 | 0.227 | 0.143 | 0.368 | 0.231 |
| Adults (N=416; log likelihood: BNT = -231.08, GMM = -231.40) | | | | | | |
| S | 0.061 | 0.072 | 0.21 | 0.281 | 0.081 | 0.104 |
| M | 0.177 | 0.173 | 0.089 | 0.079 | 0.673 | 0.531 |
| L | 0.221 | 0.218 | 0.146 | 0.122 | 0.246 | 0.365 |
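Table 2 compares three-component (S, M, L) fits to log-transformed fixation durations. A log likelihood of this kind can be computed as in the Python sketch below, assuming Gaussian components on the log-duration scale. The component means are inputs because the table reports modes in seconds rather than log-scale means; the example's use of the log of the tabulated modes as means, and its durations, are assumptions for illustration, so it does not reproduce the tabulated log likelihoods.

```python
import math


def mixture_log_likelihood(log_durations, weights, means, variances):
    """Total log likelihood of log-transformed durations under a Gaussian mixture.

    weights, means and variances describe the components (e.g., S, M, L) on the
    log-duration scale; the weights must sum to 1.
    """
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("mixture weights must sum to 1")
    total = 0.0
    for x in log_durations:
        density = sum(
            w * math.exp(-((x - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        )
        total += math.log(density)
    return total


# Weights and variances from the 3rd-grade BNT columns of Table 2; the means are
# assumed to be the log of the tabulated modes, and the durations are made up.
weights = [0.081, 0.537, 0.382]
variances = [0.171, 0.133, 0.184]
means = [math.log(0.062), math.log(0.204), math.log(0.302)]
durations_sec = [0.18, 0.22, 0.25, 0.31, 0.12]
print(mixture_log_likelihood([math.log(d) for d in durations_sec], weights, means, variances))
```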
FIGURES
Figure 1. Architecture of Reilly’s Connectionist Model of Eye-movement Control
Figure 1. Architecture of Reilly’s Connectionist Model of Eye-movement Control (reproduced from Reilly,
1997, Figure 1). The circles represent connectionist modules and the rectangles non-connectionist control modules.
Thick lines indicate a flow of activation, thin lines a flow of control. The asymptote detectors determine when the
cascading outputs from the lexical and saccadic modules have reached asymptote.
Figure 2. Illustration of Parafoveal Preview Effects in E-Z Reader 5.
Figure 2. Illustration of Parafoveal Preview Effects in E-Z Reader 5 (reproduced from Reichle et al., 1997,
Figure 6). Preview benefit (gray area) increases as the frequency of the foveal word increases (x-axis). At time t(f_n),
the familiarity check has completed and a saccade to the next word is ordered, which takes a constant time,
t(m_{n+1}) + t(M_{n+1}), to prepare and execute. During this time, if the lexical completion process is able to finish at t(la_n),
there will be some time left for parafoveal processing, marked in gray. Because the slope for t(la_n) is larger than that for
t(f_n), the gray area shrinks for low-frequency words.
Figure 3. Order-of-processing diagram for E-Z Reader 5
Figure 3. An order-of-processing diagram for E-Z Reader 5 (reproduced from Reichle et al., 1998, Figure
7). The boxes are possible states that the model could be in, with the ongoing processes represented in the box. Each
arrow is labeled by the process that has completed, and dotted arrows indicate that attention has shifted forward
(indicated by n = n + 1 on the diagram). Note that n indexes the attended word, not the fixated word. (The numbers
given to the boxes are essentially arbitrary.) f = familiarity check of the word; lc = completion of the lexical access
of the word; m = a labile stage of saccade programming that can be canceled by a subsequent saccade; M = a
subsequent nonlabile stage of saccade programming. The additional states added are for planning and executing
intraword saccades.
Figure 4. Illustration of components of the Mr. Chips model
Figure 4. Illustration of components of the Mr. Chips model (reproduced from Legge et
al., 1997, Figure 1). See Chapter 2 (page 34) for details.
Figures 5A and 5B. Landing Position of Fixations During Reading
Figures 5A and 5B. Landing position of fixations during reading (reproduced from McConkie et al., 1994,
Figures 1 and 2). Figure 5A shows empirical frequency distributions of fixation landing position as a function of
launching sites. The corresponding fitted Normal curves are plotted. Figure 5B shows the mean landing position as a
function of launch site, for seven-letter words. It can be seen that the range error is zero when the launch site is 7 letter
spaces.
Figure 6. Frequency of skipping four- and eight-letter words
Figure 6. Frequency of skipping four- and eight-letter words (reproduced from McConkie et al., 1994,
Figure 3). The probability of word skipping can be modeled with a logistic function (see Chapter 2, page 43 for
more details).
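The logistic form mentioned in this caption can be written as P(skip) = 1 / (1 + exp(-(b0 + b1 x))). The Python sketch below assumes x is the launch distance in letters and uses made-up coefficients, one set per word length; the predictors and fitted values of the actual model are given in Chapter 2, not here.

```python
import math


def skip_probability(launch_distance, intercept, slope):
    """Logistic model of word skipping: 1 / (1 + exp(-(intercept + slope * x)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * launch_distance)))


# Made-up coefficients: skipping becomes less likely as the launch site moves away,
# and longer words are skipped less often overall.
COEFFICIENTS = {"four-letter": (2.0, -0.6), "eight-letter": (0.5, -0.6)}
for word, (b0, b1) in COEFFICIENTS.items():
    print(word, [round(skip_probability(d, b0, b1), 2) for d in range(1, 8)])
```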
Figure 7. Mean Landing Positions of Regressive Saccades as a Function of Launch Site
Figure 7. Mean landing positions of regressive saccades as a function of launch site (reproduced from
Radach & McConkie, 1998, Figure 3). The x-axis is numbered relative to the space following the target word, with
negative numbers indicating launch sites from within the word, and positive numbers indicating launch positions to
the right of the word boundary. The y-axis indicates mean landing position, and is numbered with respect to the
center of the word. Interword regressions do not show systematic range errors.
Figure 8. Fitting Fixation Duration Distribution with a Two-stage Mixture Model
Figure 8. Fitting fixation duration distribution with a two-stage mixture model (reproduced from McConkie
et al., 1994, Figure 5). See Chapter 2 (page 46) for details.
Figure 9. Distributions of Fixation Durations in Yang and McConkie (in press)
[Plot: percentage of fixations (y-axis, 0 to 20) by fixation duration in 25-ms bins (x-axis, 25 to 725), one curve per condition: Normal+, Normal-, Nonword+, X’s+, X’s-, Dash+, Blank-]
Figure 9. Distributions of fixation durations in Yang and McConkie (in press, reproduced from Figure 2).
Normal+ is the control condition in which the original text was displayed. In the Normal- condition all spaces were
replaced by the @ character. In the Nonword+ condition letters were replaced by randomly selected letters. In the
X’s+ condition all characters except for spaces were replaced by X’s. In the X’s- condition all characters, including
spaces, were replaced by X’s. In the Dash+ condition all characters except for spaces were replaced by dashes. In the
Blank- condition all characters were replaced by spaces.
Figure 10. Graphical representation of the SHARE model
[Diagram nodes and their values: FREQ_n (L | M | H); WLEN_{n+1} (S | M | L); ECCEN_n (C | E); ST_t (-2* | -1 | 0 | 1 | 2 | 3 | 4*); FDC_t (S | M | L); PSL_t; SACC_t; DUR_t]
Figure 10. Graphical representation of the SHARE model. Each node represents a random variable. FREQ_n
is the frequency of the current word. WLEN_{n+1} is the length of the next word. ECCEN_n is the eccentricity of the
current landing position. ST_t is the saccade targeting node that plans the current saccade (the one following the
current fixation t). FDC_t is the fixation duration category of the current fixation. PSL_t is the planned saccade length
of the current saccade. SACC_t is the actual length of the current saccade. DUR_t is the log-transformed duration of
the current fixation. Nodes in rectangular boxes are discrete variables; nodes in oval boxes are continuous variables.
Clear boxes represent observed variables; the shaded box (FDC) represents a hidden variable. An arrow from one
node to another shows that the latter variable depends on the former; the lack of an arrow between two nodes
shows that the two nodes are conditionally independent. The circular arrows beside the ST and FDC nodes signify
temporal dependency, i.e., the value of a node at fixation t depends on its value at fixation t-1.
[Plots: a series of four-panel figures comparing empirical and simulated data (legend: Empirical, Simulated). Panels: Distribution of Fixation Duration (frequency, in empirical data scale, vs. fixation duration in sec, 0 to 1); Cumulative Distribution Function of fixation duration (cumulative probability vs. fixation duration in sec); Distribution of Saccade Length (frequency, in empirical data scale, vs. saccade length, -600 to 600); Cumulative Distribution Function of saccade length (cumulative probability vs. saccade length).]
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
200
400
600
800
1000Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400
500
600
700Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
100
200
300
400Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
200
400
600
800Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
50
100
150
200
250
300Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400
500
600
700Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400
500
600
700Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
100
200
300
400Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400
500Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
20
40
60
80
100
120
140Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400
500
600
700Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400
500Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
50
100
150
200
250
300Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400
500
600Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
20
40
60
80
100
120
140Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
200
400
600
800Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400
500
600Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
50
100
150
200
250
300
350Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
100
200
300
400Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
200
400
600
800Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400
500
600Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400
500
600
700Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
100
200
300
400Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
200
400
600
800
1000
1200Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
50
100
150
200
250
300
350Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
50
100
150
200
250
300Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
20
40
60
80
100
120Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
50
100
150
200
250
300
350Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400
500
600Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400
500Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250
300Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
200
400
600
800Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
100
200
300
400Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400
500
600
700Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
100
200
300
400
500
600Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
200
400
600
800
1000Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400
500
600
700Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400
500
600Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400
500Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400
500
600Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
50
100
150
200
250
300
350Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
50
100
150
200
250Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400
500Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400
500Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250
300Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400
500Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
50
100
150
200
250Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
50
100
150
200
250
300Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250
300
350Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400
500Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250
300Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250
300Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
50
100
150
200
250Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250
300Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400
500
600
700Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
50
100
150
200Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
50
100
150
200
250
300
350Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
50
100
150
200
250
300
350Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
50
100
150
200
250
300Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
50
100
150
200
250Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250
300Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
50
100
150
200
250
300Distribution of Fixation Duration
Fixation Duration (in sec)
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
EmpiricalSimulated
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Fixation Duration (in sec)
Cum
ulat
ive
Pro
b
−600 −400 −200 0 200 400 6000
100
200
300
400
500
600
700Distribution of Saccade Length
Saccade Length
Fre
quen
cy (
in e
mpi
rical
dat
a sc
ale)
−600 −400 −200 0 200 400 6000
0.2
0.4
0.6
0.8
1Cumulative Distribution Function
Saccade Length
Cum
ulat
ive
Pro
b
0 0.2 0.4 0.6 0.8 10
[Figure residue: repeated sets of four panels comparing empirical and simulated data: "Distribution of Fixation Duration" (fixation duration in sec; frequency in empirical data scale), "Cumulative Distribution Function" of fixation duration (cumulative probability), "Distribution of Saccade Length" (saccade length; frequency in empirical data scale), and "Cumulative Distribution Function" of saccade length (cumulative probability).]
Figure 12. Simulated and Empirical First Fixation Duration by Word Frequency
[Panels: Low, Medium, and High Frequency Words; x-axis Empirical (sec.), y-axis Simulated (sec.); groups G3, G5, Adult.]
Figure 13. Simulated and Empirical Single Fixation Duration by Word Frequency
[Panels: Low, Medium, and High Frequency Words; x-axis Empirical (sec.), y-axis Simulated (sec.); groups G3, G5, Adult.]
Figure 14. Simulated and Empirical Gaze Duration by Word Frequency
[Panels: Low, Medium, and High Frequency Words; x-axis Empirical (sec.), y-axis Simulated (sec.); groups G3, G5, Adult.]
Figure 15. Simulated and Empirical Skipping Probability by Word Frequency
[Panels: Low, Medium, and High Frequency Words; x-axis Empirical prob., y-axis Simulated prob.; groups G3, G5, Adult.]
Figure 16. Simulated and Empirical Probability of Making Single Fixation by Word Frequency
[Panels: Low, Medium, and High Frequency Words; x-axis Empirical prob., y-axis Simulated prob.; groups G3, G5, Adult.]
Figure 17. Simulated and Empirical Probability of Making Two Fixations by Word Frequency
[Panels: Low, Medium, and High Frequency Words; x-axis Empirical prob., y-axis Simulated prob.; groups G3, G5, Adult.]
Figure 18. Developmental Changes in Saccade Targeting Probabilities
[Panels: Probability of Regressions, Probability of Refixations, Prob. of Progressing 1 Word, Prob. of Progressing 2 or More Words; x-axis Reading Speed (WPM), y-axis Prob.; groups G3, G5, Adult.]
Figure 19. Developmental Changes in Fixation Duration Control: Probabilities of Making Short,
Medium, and Long Fixations
[Panels: Probability of Short, Medium, and Long Fixations; x-axis Reading Speed (WPM), y-axis Prob.; groups G3, G5, Adult.]
Figure 20. Developmental Changes in Fixation Duration Control: Modes of Short, Medium, and
Long Fixation Durations
[Panels: Mode of Short, Medium, and Long Fixations; x-axis Reading Speed (WPM), y-axis Time (sec.); groups G3, G5, Adult.]
Figure 21. Developmental Changes in Fixation Duration Control: Variance of Short, Medium,
and Long Fixation Durations
[Panels: Variance of Short, Medium, and Long Fixations; x-axis Reading Speed (WPM), y-axis Var. (sec.); groups G3, G5, Adult.]
Figure 22. What Affects Saccade Targeting: Effects of Word Frequency, Length of the Next
Word, Fixation Landing Position, and the Previous Saccade Move
[Panels: Word Frequency, Length of Next Word, Landing Position, Last Saccade Move; x-axis Reading Speed (WPM), y-axis Gamma; groups G3, G5, Adult.]
Figure 23. What Affects Fixation Duration Control: Effects of Word Frequency, Length of the
Next Word, Fixation Landing Position, and the Previous Saccade Move
[Panels: Word Frequency, Length of Next Word, Landing Position, Last Fixation; x-axis Reading Speed (WPM), y-axis Gamma; groups G3, G5, Adult.]
Figure 24. BNT Mixture of Gaussian Model Diagram
[Diagram: hidden node FDC (states S | M | L) with an arrow to the observed node DUR.]
Figure 24. Graphical representation of the BNT mixture-of-Gaussians model for fitting fixation duration distributions. FDC is a hidden node representing the fixation duration category, and DUR is the log-transformed duration of the current fixation. FDC is a discrete variable with three states, S, M, and L, with prior probabilities of 0.10, 0.55, and 0.35, respectively. DUR is a continuous variable following normal (Gaussian) distributions. The priors for DUR conditioned on the value of FDC are set as follows: $\mathrm{DUR}_S \sim N(75, 80)$, $\mathrm{DUR}_M \sim N(180, 130)$, $\mathrm{DUR}_L \sim N(320, 320)$.
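The two-node network in Figure 24 can be read as a generative recipe: first draw the hidden category FDC, then draw the log duration DUR from the Gaussian attached to that category. The sketch below assumes the caption's prior weights (0.10, 0.55, 0.35) and centers the three log-scale Gaussians at log(75), log(180), and log(320) ms; the log-scale standard deviations are hypothetical placeholders, not values from this study.

```python
import math
import random

# Prior probabilities of the hidden fixation-duration category (from the Figure 24 caption).
CATEGORY_WEIGHTS = {"S": 0.10, "M": 0.55, "L": 0.35}

# Log-scale centers at log(75), log(180), log(320) ms.
# The standard deviations are hypothetical, chosen only for illustration.
LOG_MEANS = {"S": math.log(75.0), "M": math.log(180.0), "L": math.log(320.0)}
LOG_SDS = {"S": 0.4, "M": 0.3, "L": 0.35}


def sample_fixation(rng: random.Random) -> tuple:
    """Draw one (category, duration in ms) pair from the FDC -> DUR model."""
    categories = list(CATEGORY_WEIGHTS)
    weights = [CATEGORY_WEIGHTS[c] for c in categories]
    fdc = rng.choices(categories, weights=weights)[0]   # hidden category
    log_dur = rng.gauss(LOG_MEANS[fdc], LOG_SDS[fdc])   # DUR is Gaussian on the log scale
    return fdc, math.exp(log_dur)                        # back-transform to milliseconds


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_fixation(rng) for _ in range(10_000)]
    share_medium = sum(1 for c, _ in draws if c == "M") / len(draws)
    print(f"share of M fixations: {share_medium:.2f}")
```

Fitting the model reverses this process: given only the durations, the hidden category of each fixation is inferred probabilistically.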
[Figure 25 panels (25-1 through 25-15): probability density versus time (in seconds) for the empirical fixation duration distributions, with best-fit n-component Gaussian-mixture models, n = 1 to 5. The panel annotations are listed below; each component is given as mode on the linear scale (sec) / variance on the log scale / mixing weight w.]

Third grade (N = 45995, mean = 0.27084 sec)
  n = 1: LogLikelihood = −40497; 0.230 / 0.341 / 1.000
  n = 2: LogLikelihood = −38832; 0.218 / 0.636 / 0.402; 0.238 / 0.139 / 0.598
  n = 3: LogLikelihood = −38498; 0.081 / 0.353 / 0.088; 0.212 / 0.120 / 0.608; 0.362 / 0.246 / 0.305
  n = 4: LogLikelihood = −38437; 0.067 / 0.230 / 0.071; 0.170 / 0.066 / 0.346; 0.274 / 0.079 / 0.399; 0.444 / 0.210 / 0.183
  n = 5: LogLikelihood = −38458; 0.064 / 0.208 / 0.066; 0.155 / 0.057 / 0.239; 0.223 / 0.050 / 0.339; 0.338 / 0.061 / 0.245; 0.533 / 0.180 / 0.111

Fifth grade (N = 57015, mean = 0.24816 sec)
  n = 1: LogLikelihood = −43961; 0.217 / 0.274 / 1.000
  n = 2: LogLikelihood = −41707; 0.215 / 0.528 / 0.393; 0.218 / 0.109 / 0.607
  n = 3: LogLikelihood = −41175; 0.086 / 0.341 / 0.080; 0.198 / 0.091 / 0.599; 0.326 / 0.201 / 0.321
  n = 4: LogLikelihood = −40986; 0.077 / 0.248 / 0.074; 0.170 / 0.050 / 0.396; 0.266 / 0.069 / 0.361; 0.393 / 0.195 / 0.169
  n = 5: LogLikelihood = −40947; 0.059 / 0.155 / 0.046; 0.133 / 0.055 / 0.119; 0.182 / 0.034 / 0.364; 0.281 / 0.052 / 0.325; 0.422 / 0.173 / 0.147

Adult (N = 40478, mean = 0.19254 sec)
  n = 1: LogLikelihood = −23607; 0.176 / 0.188 / 1.000
  n = 2: LogLikelihood = −21839; 0.150 / 0.404 / 0.276; 0.187 / 0.093 / 0.724
  n = 3: LogLikelihood = −21812; 0.110 / 0.328 / 0.156; 0.173 / 0.072 / 0.564; 0.237 / 0.133 / 0.279
  n = 4: LogLikelihood = −21814; 0.096 / 0.252 / 0.133; 0.154 / 0.050 / 0.391; 0.216 / 0.052 / 0.348; 0.285 / 0.133 / 0.128
  n = 5: LogLikelihood = −21764; 0.075 / 0.193 / 0.075; 0.124 / 0.043 / 0.181; 0.166 / 0.021 / 0.323; 0.231 / 0.033 / 0.295; 0.298 / 0.115 / 0.127
APPENDIX A. PROBLEMS IN THE E-Z READER MODEL
Reichle et al. (1998; Reichle et al., 1999) developed a series of “E-Z Reader” models of
eye-movement control during reading. They concluded that the E-Z Reader models fit the data
well. However, as will be shown below, evaluating the goodness of fit of the model turned out to
be impossible because of serious problems in their goodness-of-fit index and limitations of the
empirical data used for modeling.
The Goodness-of-fit Index.
A goodness-of-fit index is arguably the most important part of a model. On the one hand, it is the criterion by which a model is "optimized" and its parameters are estimated. On the other hand, it is an important criterion for comparing and selecting models. It is the link between theory and data. However, the way goodness of fit was handled in Reichle et al. is questionable.
According to Reichle et al. (1998):
The model's overall performance was measured by using the root mean square of the
normalized difference scores (errors) between the observed and predicted means of the
five frequency classes for each of the dependent measures. The normalization process
allowed the errors to be evaluated on a common scale (i.e., milliseconds and probabilities
were converted to unitless scores). The normalization process that we used was to square
the difference between the observed and predicted values and then to divide this
difference by the standard deviation of the observed values. (p. 157)
To facilitate further discussion, let's put the above into formulas. Let X, an eye-movement measure, be a random variable with expected value $\mu_x$ and standard deviation $\sigma_x$. Let $\{x_1, \ldots, x_N\}$ be a random sample of X, with sample mean $\bar{x}$ and sample standard deviation sd. For large N, the distribution of the sample mean is approximately normal, with a standard deviation estimated by the sample standard error, $se = sd/\sqrt{N}$. Finally, let $\bar{x}_s$ be the mean of measure X from the E-Z Reader simulation. Because N in the simulation is large (1,000 "statistical subjects"), $\bar{x}_s$ should be very stable and can practically be treated as a constant. With the above notation, we can write Reichle et al.'s normalization algorithm and the goodness-of-fit index (RMS) as follows. For each of the M = 30 measures $X_i$, the normalized difference score, according to Reichle et al. (1998, cited above), is

$$y_i = \frac{(\bar{x}_{s,i} - \bar{x}_i)^2}{sd_i},$$

and the goodness-of-fit index of a model, the root mean square (RMS), is calculated as

$$RMS = \sqrt{\frac{1}{M}\sum_{i=1}^{M} y_i^2}.$$
There are at least two serious errors in the above goodness-of-fit index, each of which
will be shown to have a large impact on the evaluation and interpretation of the models. In
addition, the use of RMS as goodness-of-fit is also questionable. I will discuss each of them
below.
The "normalization." Reichle et al. claim that their normalization process "allowed the errors to be evaluated on a common scale," that is, rendered them unitless. The idea was probably to normalize using Z-scores. However, their normalization formula does not serve this purpose:

$$y_i = \frac{(\bar{x}_{s,i} - \bar{x}_i)^2}{sd_i} = (\bar{x}_{s,i} - \bar{x}_i) \times \frac{\bar{x}_{s,i} - \bar{x}_i}{sd_i} = (\bar{x}_{s,i} - \bar{x}_i) \times Z_i$$
Clearly $Z_i$ is a unitless Z-score, but Reichle et al.'s "normalized difference score" scales $Z_i$ by the difference between the observed and simulated means of measure X, a quantity that carries the units of that measure. As a consequence, when the $y_i$'s were used to calculate overall goodness of fit, different measures made different contributions to the loss function, and the weight of each measure depended on its scale. Specific to the E-Z Reader models, a rough estimate from Reichle et al. (1999) Table 1 shows that $|\bar{x}_{s,i} - \bar{x}_i|$ for gaze duration, first fixation duration, and single-fixation duration ranges anywhere from 2 to 18 ms (not counting 0's), while $|\bar{x}_{s,i} - \bar{x}_i|$ for the probabilities of skipping, making single fixations, and making two fixations lies in the range of 0.01 to 0.1. The two groups of measures differ by a factor of roughly 100. Without doing any mathematical analysis, it's obvious that the effects on the probabilities were grossly suppressed during the model-fitting and parameter-estimating process. An immediate consequence of using this 100:1 "normalization" formula is that the E-Z Reader models were sensitive to fixation duration data but practically ignored effects on skipping and refixation probabilities. It is not surprising, then, given this optimization criterion, that model fitting did not improve in any real sense from E-Z Reader 2 to 6, and in many cases the fit was actually worse.
It's interesting, though, that even under this extremely unfavorable treatment, the three
probability measures were fit reasonably well, judged by simply looking at the observed and
estimated means. A possible explanation is that the different measures of eye movements may
not be independent (as indeed they should not be if the E-Z Reader model is correct), and
consequently fitting a subset of the variables would guarantee that the rest of the variables are
also fit well. This hypothesis will be examined later.
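The scale dependence of the "normalized difference score" is easy to check numerically. The sketch below uses hypothetical numbers, not values from Reichle et al.: a duration measure (ms) and a probability measure whose misfits are equally large relative to their standard deviations (Z = 0.5 for both). Because $y_i = (\bar{x}_{s,i} - \bar{x}_i) \times Z_i$, the duration measure ends up weighted 100 times more heavily.

```python
import math


def reichle_score(simulated_mean: float, observed_mean: float, observed_sd: float) -> float:
    """Normalized difference score as described by Reichle et al. (1998): squared error / SD."""
    return (simulated_mean - observed_mean) ** 2 / observed_sd


def z_score(simulated_mean: float, observed_mean: float, observed_sd: float) -> float:
    """Unitless Z-score of the same discrepancy."""
    return (simulated_mean - observed_mean) / observed_sd


def rms(scores: list) -> float:
    """Root mean square of a list of scores."""
    return math.sqrt(sum(s * s for s in scores) / len(scores))


# Hypothetical measures with the same relative misfit (Z = 0.5):
duration = dict(simulated_mean=260.0, observed_mean=250.0, observed_sd=20.0)  # milliseconds
skipping = dict(simulated_mean=0.40, observed_mean=0.30, observed_sd=0.20)    # probability

for name, m in (("duration", duration), ("skipping", skipping)):
    print(name, "Z =", round(z_score(**m), 3), "y =", round(reichle_score(**m), 3))

print("RMS of y:", round(rms([reichle_score(**duration), reichle_score(**skipping)]), 3))
print("RMS of Z:", round(rms([z_score(**duration), z_score(**skipping)]), 3))
```

Both measures receive the same Z, yet y is 5.0 for the duration and 0.05 for the probability, so an optimizer minimizing the RMS of y is driven almost entirely by the duration measure.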
Standard deviation versus standard error. In calculating

$$Z_i = \frac{\bar{x}_{s,i} - \bar{x}_i}{sd_i},$$

Reichle et al. (1998; Reichle et al., 1999) used the standard deviation of the observed sample as the denominator. Because the comparisons here were between the means of observed and simulated observations, the sample standard error should be used in the denominator (see Hayes, 1988). I suspect that the confusion might stem from a seemingly similar situation, model training in artificial neural networks, where RMS is calculated after each cycle on the basis of the sample standard deviation. This use of the sample standard deviation is legitimate there because a single observation (the activation level after the cycle) is the center of concern, rather than a mean of some sort. However, the Monte Carlo simulation that Reichle et al. were doing is fundamentally based on the Law of Large Numbers and is concerned only with means.
What impact does this have on goodness-of-fit indices? The answer depends on the sample size. A rough guess of the N for each of the 30 means from Schilling et al. is approximately 3,000 (48 sentences, 12 words long on average, 30 subjects, divided by 5 frequency categories). Because $se = sd/\sqrt{N}$ and $\sqrt{3000} \approx 55$, if Reichle et al. had used the standard error instead of the standard deviation of each measure, the Z-scores, and hence the overall goodness-of-fit index, would have been roughly 50 times larger. The RMS for E-Z Reader 6, for example, would have been in the neighborhood of 10 instead of 0.218. When the model fits, the Z-scores (computed with the correct formula) approximately follow a unit Normal distribution (for N = 3,000). Therefore any $|Z_i| > 2$ clearly indicates a poor fit at point i, at an α level of .05. If sd were used in place of se, as Reichle et al. did, the magnitude of $Z_i$ would shrink some 50-fold and would practically never reach significance.
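The size of the discrepancy follows directly from $se = sd/\sqrt{N}$: dividing by se instead of sd multiplies each Z-score by $\sqrt{N}$. A minimal sketch with a hypothetical 5 ms gap between simulated and observed means, a hypothetical SD of 100 ms, and the N ≈ 3,000 estimated above:

```python
import math


def z_with_sd(diff: float, sd: float) -> float:
    """Z computed with the sample standard deviation (the approach criticized above)."""
    return diff / sd


def z_with_se(diff: float, sd: float, n: int) -> float:
    """Z computed with the standard error of the mean, se = sd / sqrt(n)."""
    return diff / (sd / math.sqrt(n))


# Hypothetical example: 5 ms gap between simulated and observed means, SD = 100 ms, N = 3000.
diff, sd, n = 5.0, 100.0, 3000
print(f"Z using sd: {z_with_sd(diff, sd):.3f}")     # far below 2: the misfit looks negligible
print(f"Z using se: {z_with_se(diff, sd, n):.3f}")  # above 2: significant at alpha = .05
print(f"inflation factor sqrt(N): {math.sqrt(n):.1f}")
```

The same 5 ms discrepancy is invisible under the sd-based index but significant under the se-based one.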
RMS and goodness-of-fit testing. Reichle et al. chose to use the root mean square of error (RMS) as an index of the goodness of fit during grid searches for optimal parameters. There is nothing wrong with this choice. However, RMS is rarely used as a goodness-of-fit index in statistical modeling or Monte Carlo simulations because (a) it is difficult to test the fit of a model to data or to compare different models on the basis of RMS, and (b) there are easier ways to do the job.
One classical goodness-of-fit test statistic, Chi-square, is actually closely related to RMS. When each of the M error components is independently and identically distributed (i.i.d.) as a unit Normal distribution (Z), the sum of squared errors (SSE), with RMS computed from the $Z_i$,

$$SSE = \sum_{i=1}^{M} Z_i^2 = M \times RMS^2,$$

is distributed as a Chi-square distribution with M degrees of freedom (df). Thus SSE can be tested against the appropriate Chi-square distribution to see whether the hypothesis that the model fits the data set should be rejected. Not only can the fit of a single model be tested this way, but a series of two or more hierarchically constructed models, with increasing numbers of free parameters, can also be compared using the Chi-square test in order to decide whether the improvement in fit from additional parameters is statistically justifiable.
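The test described above can be sketched in a few lines. To stay within the standard library, this sketch uses the exact closed form of the Chi-square upper tail for even degrees of freedom, $P(\chi^2_{2k} > x) = e^{-x/2}\sum_{j=0}^{k-1}(x/2)^j/j!$, which covers M = 30; it assumes the $Z_i$ were computed with the standard error, as argued above. The Z-scores in the usage lines are hypothetical.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(chi2_df > x); exact closed form valid only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j          # builds (x/2)^j / j! incrementally
        total += term
    return math.exp(-half) * total


def sse_fit_test(z_scores: list) -> tuple:
    """Return SSE = sum(Z_i^2) = M * RMS^2 and its Chi-square p-value with df = M."""
    sse = sum(z * z for z in z_scores)
    return sse, chi2_sf_even_df(sse, len(z_scores))


# Hypothetical Z-scores for M = 30 means.
for label, zs in (("|Z| = 1 everywhere", [1.0] * 30), ("|Z| = 2 everywhere", [2.0] * 30)):
    sse, p = sse_fit_test(zs)
    print(f"{label}: SSE = {sse:.1f}, p = {p:.4g}")
```

A small p-value means the hypothesis that the model fits should be rejected; in practice a library routine for the Chi-square survival function would handle odd df as well.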
Reichle et al. did not formally test the fit of their models to the data or base model selection on clear empirical criteria, being primarily concerned with psychological validity. Well-developed statistical methods of model fitting exist and can provide a more systematic means of developing and comparing models.
Correlations, Multicollinearity, and Parsimonious Modeling.
A question raised previously is why E-Z Reader was able to model eye-movement probability data fairly well even when these measures had little weight in model optimization and parameter estimation. One possibility is that the probability measures were highly correlated with the duration data. There is a hint in the report that this was true, as Reichle et al. (1998) stated that "the single-fixation duration and refixation means were not included in this [RMS] measure because their values are largely redundant with the other measures."
To test this hypothesis, I computed pairwise correlations among the six means of eye-movement measures, the mean category word frequency, and the logarithm of the frequencies given in Reichle et al. (1999) Table 1. All eye-movement measure means are highly correlated. The correlation coefficients range from .85 (between skipping rate and first fixation duration, p = .069, n.s. for n = 5) to .998 (between first fixation duration and single fixation duration, highly significant). A Principal Component Analysis of the six eye-movement measures showed that the first component accounts for 94.6% of the total variance, the first two components account for 98.6%, and the first three components account for 99.999% of the total variance. In addition, all eye-movement measures are highly correlated with the logarithm of word frequency (all p's < .05). In short, the six eye-movement variables can effectively be reduced to a single variable with only about 5% loss of information. The model fitting on the 30-point empirical data was practically based on 5 points, which have an almost perfect linear relationship with log-transformed word frequency.
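Why near-perfect correlations collapse six measures into one component can be illustrated with synthetic data. The numbers below are made up (they are not the Table 1 values): six measures, each a noisy linear function of log frequency over five classes. The share of variance on the first principal component is the largest eigenvalue of the correlation matrix divided by its trace, found here by power iteration so that only the standard library is needed.

```python
import math
import random


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long data columns."""
    def standardize(col):
        mean = sum(col) / len(col)
        norm = math.sqrt(sum((v - mean) ** 2 for v in col))
        return [(v - mean) / norm for v in col]   # centered, unit-length column
    z = [standardize(c) for c in columns]
    return [[sum(a * b for a, b in zip(zi, zj)) for zj in z] for zi in z]


def first_component_share(corr, iterations=500):
    """Share of total variance on the first principal component, via power iteration."""
    k = len(corr)
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    largest = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k)) for i in range(k))
    return largest / k   # the trace of a correlation matrix equals the number of variables


# Synthetic illustration (hypothetical numbers): six measures, five frequency classes.
rng = random.Random(42)
log_freq = [0.0, 1.0, 2.0, 3.0, 4.0]
slopes = [-12.0, -10.0, -11.0, 0.05, 0.04, -0.02]   # three durations, three probabilities
columns = [[b * x + rng.gauss(0.0, 0.1 * abs(b)) for x in log_freq] for b in slopes]
print(f"first component share: {first_component_share(correlation_matrix(columns)):.3f}")
```

With only five data points per measure and all measures tracking log frequency, one component absorbs nearly all of the variance, which mirrors the 94.6% reported above.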
The multicollinearity also explains another puzzling aspect of the E-Z Reader models. As E-Z Reader evolved from 1 to 5, its goodness of fit (measured by RMS) did not improve, and often got worse. This goes against common experience in modeling, where added mechanisms and parameters usually improve fit. Part of the reason is the errors in the loss function. On the other hand, it could also be that E-Z Reader 1 was already almost perfect given such a simple structure in the data, so that the mechanisms and parameters added in subsequent models could not possibly improve the fit.
Obviously, the most parsimonious model, possibly the only model, for this data set is "any eye-movement measure is a linear function of log-transformed word frequency." Given the extremely high correlations among all variables, a good model for one variable is automatically a good one for another variable. The rest of the modeling process is to find the intercepts and slopes of the linear functions, an easy job for the grid-search algorithm.
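The parsimonious model above amounts to one ordinary least-squares line per measure. A minimal sketch, in which the five frequency classes and gaze durations are hypothetical placeholders rather than data from Reichle et al.:

```python
import math


def fit_line(x: list, y: list) -> tuple:
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical class frequencies (per million) and mean gaze durations (ms).
frequencies = [2.0, 20.0, 60.0, 200.0, 3000.0]
gaze_ms = [290.0, 265.0, 252.0, 240.0, 215.0]
log_f = [math.log10(f) for f in frequencies]

a, b = fit_line(log_f, gaze_ms)
print(f"gaze duration = {a:.1f} + ({b:.1f}) * log10(frequency)")
```

Repeating the fit for each of the six measures gives the two parameters per measure that, on this reading, the grid search effectively recovered.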
The E-Z Reader modeling effort is one of the most ambitious attempts to model eye-movement control parameters in a psychologically plausible fashion, but important errors in the modeling approach severely limit the conclusions that can be drawn from this research.
APPENDIX B. FITTING MIXTURE MODELS TO EMPIRICAL FIXATION DURATION
DISTRIBUTIONS
Introduction
There is empirical evidence that fixation durations may not follow a single distribution but may instead consist of a mixture of distributions (Gezeck et al., 1997; McConkie & Dyre, 2000; Yang & McConkie, in press; see Chapter 3 for discussions of these studies). Therefore, two critical modeling decisions are (a) the choice of component distributions and (b) the number of components.
To date, the most successful models of the fixation duration distribution are the three models from McConkie and Dyre (2000): the two-state transition model, the two-stage race model, and the two-stage mixture model. All three models are essentially mixture models of an early, short component and a late, long component.⁴³ The choices of component distributions varied (Weibull, exponential, convolutions of Weibull and exponentials), but they were largely motivated by empirical hazard functions.
43 For the two-state transition model, the short fixations are assumed to follow a Weibull distribution with a power (shape) parameter equal to 2 (which has a linearly rising hazard function). The long component is assumed to be exponential. The mixing rate, i.e., the proportion switching from State 1 to State 2, increases over time. For the two-stage mixture and two-stage race models, McConkie and Dyre (2000) assumed that the duration of Stage 1 is a mixture of short and long components; the duration of Stage 1 is then convolved with that of Stage 2, which is an exponentially distributed random variable. This is mathematically equivalent to saying that the final distribution is composed of short and long fixations, each of which is a convolution of two distributions: the corresponding Stage 1 distribution and the exponential. Therefore, all three models are essentially mixture models.
There is no unique way to fit an empirical distribution with mixture models (cf. McLachlan & Peel, 2000). The success of these models suggests that other mixture models with different component assumptions may also achieve good results. In addition, the components in McConkie and Dyre's (2000) models are complex and difficult to handle mathematically. The current study searched for a simpler solution.
A simple distribution, the lognormal distribution, was chosen as the distribution of the mixture components.⁴⁴ There were two reasons for this choice. First, the hazard function of the lognormal distribution has the characteristics of the empirical hazard rates: an initially slow but accelerating rise to a peak, followed by a very slow, gradually decreasing tail (Johnson et al., 1994). Second, a mixture-of-lognormal distribution is easy to handle because on the log scale it becomes a mixture-of-normal distribution, which is the most extensively studied class of mixture models. Its mathematical properties are well understood, and many statistical algorithms are available for model estimation.
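The hazard-function argument can be checked directly. The sketch below computes the lognormal hazard $h(t) = f(t)/S(t)$ with the standard library's `math.erfc`, next to the linearly rising hazard of a Weibull distribution with shape 2 (the short component of McConkie and Dyre's two-state model). The lognormal parameters (median 0.2 s, log-scale SD 0.4) and the Weibull scale of 0.25 s are hypothetical.

```python
import math


def lognormal_hazard(t: float, mu: float, sigma: float) -> float:
    """Hazard h(t) = f(t) / S(t) of a lognormal distribution (mu, sigma on the log scale)."""
    z = (math.log(t) - mu) / sigma
    density = math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))
    survival = 0.5 * math.erfc(z / math.sqrt(2.0))   # 1 - Phi(z), accurate in the upper tail
    return density / survival


def weibull2_hazard(t: float, scale: float) -> float:
    """Hazard of a Weibull distribution with shape 2: h(t) = 2t / scale^2, linear in t."""
    return 2.0 * t / scale ** 2


mu, sigma = math.log(0.2), 0.4   # hypothetical lognormal with a median of 0.2 s
for t in (0.05, 0.1, 0.2, 0.4, 0.8, 1.6):
    print(f"t = {t:.2f} s  lognormal h = {lognormal_hazard(t, mu, sigma):6.2f}  "
          f"Weibull(k=2) h = {weibull2_hazard(t, 0.25):6.2f}")
```

The lognormal hazard rises, peaks, and then declines slowly, while the shape-2 Weibull hazard keeps rising without bound.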
Method
Data and Apparatus. See Chapter 4.
Modeling procedure. Model fitting was done in MATLAB, a numerical computation software package. Fixation durations were first log-transformed, and the log durations were then fit with Gaussian mixture models. Two fitting methods were used.
For maximum likelihood estimation, the Gaussian Mixture Model (GMM) Toolbox was used (Cadez, Smyth, McLachlan, & McLaren, 2001). The GMM algorithm fits a mixture of n Gaussian components, where n is a pre-specified integer, to the data and iteratively adjusts the model parameters until the likelihood of observing the data given the model reaches a (possibly local) maximum. For more discussion of mixture models in general, or of maximum likelihood estimation of Gaussian mixture models, see McLachlan and colleagues (McLachlan & Basford, 1988; McLachlan & Peel, 2000) and Titterington, Smith, and Makov (1985). The logarithms of fixation durations were fitted with Gaussian mixture models of n = 1 to 7 components, and for each n the best-fitting parameters over 5 repetitions (with different random initial values) were used.

44 The log-normal distribution is closely related to the Normal distribution: if x > 0 and log(x) is Normally distributed, then x follows a log-normal distribution.
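The maximum likelihood fitting described above is commonly carried out with the EM algorithm. The following is a minimal one-dimensional EM sketch, not the GMM Toolbox itself. It differs from the procedure above in one respect: it uses deterministic quantile-based starting values instead of five random restarts. The synthetic "log durations" in the usage lines are hypothetical.

```python
import math
import random


def log_normal_pdf(x: float, mean: float, var: float) -> float:
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def mixture_loglik(data, weights, means, variances) -> float:
    """Log-likelihood of the data under a 1-D Gaussian mixture (log-sum-exp for stability)."""
    total = 0.0
    for x in data:
        logs = [math.log(w) + log_normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        total += top + math.log(sum(math.exp(lg - top) for lg in logs))
    return total


def fit_gmm_1d(data, k: int, n_iter: int = 100):
    """EM maximum likelihood fit of a k-component 1-D Gaussian mixture.

    Returns (weights, means, variances, log_likelihood)."""
    n = len(data)
    ordered = sorted(data)
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]   # quantile starting values
    grand_mean = sum(data) / n
    variances = [sum((x - grand_mean) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation
        resp = []
        for x in data:
            logs = [math.log(w) + log_normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            top = max(logs)
            unnorm = [math.exp(lg - top) for lg in logs]
            s = sum(unnorm)
            resp.append([u / s for u in unnorm])
        # M-step: responsibility-weighted proportions, means, and variances
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj, 1e-6)
    return weights, means, variances, mixture_loglik(data, weights, means, variances)


rng = random.Random(1)
# Hypothetical log durations: 30% near log(0.10 s), 70% near log(0.25 s).
sample = [
    rng.gauss(math.log(0.10), 0.2) if rng.random() < 0.3 else rng.gauss(math.log(0.25), 0.2)
    for _ in range(1000)
]
for k in (1, 2, 3):
    print(k, "components: log-likelihood =", round(fit_gmm_1d(sample, k)[3], 1))
```

Because EM only guarantees a local maximum, restarting from several initial values and keeping the best log-likelihood, as described above, is the usual safeguard.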
In addition to the maximum likelihood method, Bayesian estimation was done with the
Bayes Net Toolbox (BNT) developed by Kevin Murphy (2001). A graphical representation of
the BNT Gaussian mixture model is shown in Figure 24.
The Bayesian method takes into account the prior probability distribution of a parameter, which represents prior knowledge, and combines it with the information in the data to maximize the posterior probability, that is, the probability of the parameter values given the observed data. This use of prior knowledge about the likely values of parameters is the unique advantage of the Bayesian method over maximum likelihood estimation. In the current case, the prior knowledge came from the empirical results of Yang and McConkie (in press), i.e., the modes of the distributions in their Figure 9.
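How a prior and data combine can be seen in the simplest textbook case: a single normal mean with known observation variance and a normal prior (the conjugate case). This is only an illustration of the principle, not the BNT mixture computation. The prior center log(0.180 s) echoes the medium-fixation prior; the prior variance, noise variance, and observations are hypothetical.

```python
import math


def posterior_normal_mean(prior_mean: float, prior_var: float, data: list, noise_var: float):
    """Posterior (mean, variance) of a normal mean with known noise variance and a normal prior."""
    precision = 1.0 / prior_var + len(data) / noise_var     # precisions add
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var


prior_mean, prior_var = math.log(0.180), 0.05   # hypothetical prior on a log-scale mean
noise_var = 0.1                                  # hypothetical variance of one log duration
few = [math.log(0.25)] * 3
many = [math.log(0.25)] * 300
print("3 observations:  ", posterior_normal_mean(prior_mean, prior_var, few, noise_var))
print("300 observations:", posterior_normal_mean(prior_mean, prior_var, many, noise_var))
```

With few observations the estimate stays near the prior; with many, the data dominate. This is why informative priors help most where the data alone give unstable estimates.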
Results
Maximum likelihood estimates. Figures 25-1 through 25-15 show the empirical fixation duration distributions, the best-fit n-component Gaussian-mixture models, and the (weighted) component distributions for third-grade, fifth-grade, and adult data. A visual inspection suggests that 3-component Gaussian-mixture models fit the empirical data very well. Most importantly, the three components in each age group correspond fairly well with the results from Yang and McConkie.
Formally determining the number of components, however, was difficult. The typical log-likelihood ratio test, a statistical procedure for comparing a "full" versus a "reduced" model by weighing the gain in goodness of fit against the additional number of parameters, cannot be applied directly in this case, because a 2-component Gaussian-mixture model is not strictly a "reduced" model of a 3-component Gaussian-mixture model, so the usual Chi-square approximation does not hold (McLachlan & Basford, 1988; McLachlan & Peel, 2000; Titterington et al., 1985). Many alternative tests have been proposed (McLachlan & Peel, 2000). Here I adopted a modified log-likelihood ratio test by Wolfe (1971; see also Everitt, 1981), which has been shown to work well when the number of cases is at least five times larger than the number of components. Wolfe proposed that, under the null hypothesis that the data arise from a mixture of $g_1$ populations versus the alternative that they arise from $g_2$ ($g_1 < g_2$) populations, the usual likelihood ratio statistic $-2\log\lambda$ be corrected and approximated as

$$-2c\log\lambda \sim \chi^2_d,$$

where the degrees of freedom, d, is taken to be twice the difference in the number of parameters in the two hypotheses, not including the mixing proportions, and the correction factor c is given by

$$c = \frac{n - 1 - p - \tfrac{1}{2}g_2}{n},$$

with n the number of observations and p the dimension of the data (here p = 1). In the current case n is sufficiently large, so c is practically 1.
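Wolfe's statistic and its degrees of freedom can be written out as two small functions. This is a sketch under the definitions above. It uses $-2\log\lambda = 2(\ell_{g_2} - \ell_{g_1})$, where $\ell$ denotes a maximized log-likelihood, and it counts 2 parameters (mean and variance) per univariate Normal component. The usage line plugs in the third-grade log-likelihoods for 2 and 3 components from the Figure 25 panel annotations.

```python
def wolfe_df(params_per_component: int, g1: int, g2: int) -> int:
    """Degrees of freedom: twice the difference in parameters, excluding mixing proportions."""
    return 2 * params_per_component * (g2 - g1)


def wolfe_statistic(loglik_g1: float, loglik_g2: float, n: int, p: int, g2: int) -> float:
    """Wolfe's corrected likelihood ratio statistic, -2c log(lambda), for g1 vs. g2 components."""
    c = (n - 1 - p - 0.5 * g2) / n             # correction factor; close to 1 for large n
    return 2.0 * c * (loglik_g2 - loglik_g1)   # -2 log(lambda) = 2 * (loglik_g2 - loglik_g1)


# Third-grade log-likelihoods for 2 vs. 3 components (Figure 25 annotations), N = 45995.
stat = wolfe_statistic(-38832.0, -38498.0, n=45995, p=1, g2=3)
df = wolfe_df(2, 2, 3)
print(f"statistic = {stat:.1f}, df = {df}, reject 2 components: {stat > 14.8602}")
```

With n in the tens of thousands, c differs from 1 only in the fifth decimal place, which is why it can be ignored here.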
Wolfe's test was carried out sequentially to find the minimal number of mixture components that provided a satisfactory fit to the empirical data.⁴⁵ Each additional Normal component added two new parameters (a mean and a variance), hence d = 4, and the corresponding Chi-square critical value for α = 0.005 is 14.8602. In other words, if the test statistic for two consecutive models (in terms of the number of components), which with c ≈ 1 is twice the difference of their log-likelihoods, was larger than 14.86, the null hypothesis (the smaller number of components) should be rejected, and the hypothesis associated with the larger number of components should be adopted.
In all age groups the 3-component Gaussian-mixture models provided a significantly better fit than 2-component models and seemed to capture the basic characteristics of the distributions. The statistical tests showed that one should prefer a 4-component model for third-grade data, a 5-component model for fifth-grade data, and a 3-component model for adults. The improvement in fit gained by moving beyond 3 components was relatively small (e.g., the log-likelihood for the third-grade distribution increased by 334 when 3 components were used instead of 2, but it increased by only 61 and 21 for each additional component above 3), although significant. Because the differences were so small, and in order to facilitate comparison between age groups, 3-component models were used for all groups in the analysis of the parameters.
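The sequential procedure can be replayed on the log-likelihoods printed in the Figure 25 panel annotations. Only n = 1 to 5 are shown there, so at most five components can be selected. This sketch assumes that the three panel groups are, in order, third grade, fifth grade, and adult, and it takes c = 1. It uses twice the log-likelihood gain as the statistic; comparing the raw gain itself to 14.86 happens to give the same selections for these numbers.

```python
CRITICAL_CHI2_DF4 = 14.8602   # Chi-square critical value quoted in the text (d = 4, alpha = .005)

# Maximized log-likelihoods for n = 1..5 components, from the Figure 25 panel annotations.
LOGLIK = {
    "third grade": [-40497.0, -38832.0, -38498.0, -38437.0, -38458.0],
    "fifth grade": [-43961.0, -41707.0, -41175.0, -40986.0, -40947.0],
    "adult": [-23607.0, -21839.0, -21812.0, -21814.0, -21764.0],
}


def select_components(logliks: list, critical: float = CRITICAL_CHI2_DF4) -> int:
    """Add one component at a time while 2 * (gain in log-likelihood) exceeds the critical value."""
    chosen = 1
    for k in range(1, len(logliks)):
        if 2.0 * (logliks[k] - logliks[k - 1]) > critical:
            chosen = k + 1
        else:
            break   # stop at the first non-significant improvement
    return chosen


for group, values in LOGLIK.items():
    print(f"{group}: {select_components(values)} components")
```

The selections (4, 5, and 3 components) agree with the preferences stated above, with the fifth-grade result capped by the five models available in the figure.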
Although the maximum likelihood estimates of the 3-component means are generally consistent with Yang and McConkie's (in press) findings, the estimates for the first component (the short fixations) were not numerically stable from run to run, and its estimated mean and variance had a sizable effect on the parameter estimates of the third (the longest) component. There was a need to "anchor" the first component so as to obtain more stable estimates of the other components.

45 Here the potential problem of correlated tests in sequential testing was handled simply by using a more stringent α level, α = 0.005.
Bayesian estimates. The Bayesian estimation method was used to achieve these goals. In these analyses, the number of components was fixed at three. Rather than letting the maximum likelihood algorithm randomly guess the initial parameter values, the Bayesian method allows constraints to be imposed on the parameter values through prior distributions. Based on Yang and McConkie (in press), the prior distributions of the components were set to three normal distributions: N(log(75), 80), N(log(180), 130), and N(log(320), 320)⁴⁶. Following Bayesian modeling conventions, the prior distribution of the mixture weights was set to a Dirichlet distribution with π = {0.10, 0.55, 0.35}. These prior weights were based on the maximum likelihood estimates of the weights for the 3-component models.
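The sketch below shows how normal priors on the component means and a Dirichlet prior on the weights anchor a 3-component fit to log fixation durations. The text does not specify the sampler or optimizer used for the Bayesian estimates. The sketch therefore swaps in a simpler technique: it computes a posterior mode by EM (MAP-EM) rather than the full posterior.

Several assumptions are hypothetical:
- The Dirichlet parameters are taken to be 1 + strength·π, with strength = 100.
- The prior spreads given above (80, 130, 320) are not converted. A log-scale prior SD of 0.2 is used instead.
- The variances get no prior.
- The function and parameter names are illustrative only.

```python
import math
import random
import statistics


def normal_logpdf(x, mu, sd):
    """Log density of N(mu, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - 0.5 * ((x - mu) / sd) ** 2


def map_em(logdur, prior_mu, prior_sd, prior_w, strength=100.0,
           n_iter=100, tol=1e-4, min_sd=0.05):
    """Posterior-mode (MAP) EM for a univariate Gaussian mixture on log durations.

    prior_mu, prior_sd: Normal priors on the component means (log scale, hypothetical).
    prior_w: prior mixing proportions; the weight prior is Dirichlet(1 + strength * prior_w).
    Returns (weights, means, sds, log-likelihood at the last E-step).
    """
    k, n = len(prior_mu), len(logdur)
    alpha = [1.0 + strength * p for p in prior_w]
    mu = list(prior_mu)                              # anchor starting means at the priors
    sd = [0.5 * statistics.pstdev(logdur)] * k       # common starting spread
    w = list(prior_w)
    prev = -math.inf
    for _ in range(n_iter):
        # E-step: responsibilities, computed with log-sum-exp for numerical stability.
        resp, loglik = [], 0.0
        for x in logdur:
            lp = [math.log(w[j]) + normal_logpdf(x, mu[j], sd[j]) for j in range(k)]
            m = max(lp)
            lse = m + math.log(sum(math.exp(v - m) for v in lp))
            loglik += lse
            resp.append([math.exp(v - lse) for v in lp])
        # M-step.
        nk = [max(sum(r[j] for r in resp), 1e-12) for j in range(k)]
        # Mode of the Dirichlet posterior for the weights (requires every alpha >= 1).
        w = [(nk[j] + alpha[j] - 1.0) / (n + sum(alpha) - k) for j in range(k)]
        for j in range(k):
            # Mode of the mean's Normal posterior, holding the current variance fixed.
            sx = sum(r[j] * x for r, x in zip(resp, logdur))
            prec_data, prec_prior = nk[j] / sd[j] ** 2, 1.0 / prior_sd[j] ** 2
            mu[j] = (sx / sd[j] ** 2 + prior_mu[j] * prec_prior) / (prec_data + prec_prior)
            # Maximum likelihood SD given the updated mean, floored to avoid collapse.
            ss = sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, logdur))
            sd[j] = max(math.sqrt(ss / nk[j]), min_sd)
        if abs(loglik - prev) < tol:
            break
        prev = loglik
    return w, mu, sd, loglik


# Synthetic log durations from a known 3-component mixture (illustration only).
rng = random.Random(1)
true_mu = [math.log(75), math.log(180), math.log(320)]
true_w = [0.10, 0.55, 0.35]
data = [rng.gauss(m, 0.2) for m, p in zip(true_mu, true_w) for _ in range(round(2000 * p))]
w, mu, sd, ll = map_em(data, true_mu, [0.2, 0.2, 0.2], true_w)
print([round(math.exp(m)) for m in mu])   # component medians in ms
```

Because the starting means sit at the prior means and the prior keeps pulling the first component toward its expected location, the short-fixation component cannot wander from run to run. That is the anchoring role described above.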
Because Bayesian estimation is notoriously time-consuming, random samples of 10% of the original data were used for it. This procedure was repeated three times to ensure the stability of the estimates. In fact, the estimates were very stable even when only 1% of the data (approximately 200-500 cases in each age group) were used. For comparison, the same random samples were also subjected to maximum likelihood estimation.
Table 2 shows the parameters and log-likelihood indices of the Bayesian estimates and of the corresponding maximum likelihood estimates. The results of the two methods were generally in agreement. In terms of log-likelihood, the Bayesian estimates fit at least as well as the maximum likelihood estimates. The differences were often within the range of the random fluctuations caused by different random starting points in the maximum likelihood method. As expected, the Bayesian method provided a more consistent estimate of the mean of the first component, so that it was less likely to interfere with the parameters of the third component.
46 The unit for the means is milliseconds. Note that fixation durations were first log-transformed and then fit with mixture-of-Gaussian models, which is equivalent to fitting the untransformed fixation durations with mixture-of-lognormal models.
To summarize the fitting results of the lognormal-mixture models, 3-component models provided a very close fit to the fixation duration distributions of both child and adult readers. The goodness of fit of the lognormal-mixture model cannot be compared directly with that of McConkie and Dyre’s models, but the two appear largely comparable on the basis of the distribution plots. In addition, the parameters of the three component distributions were reasonably close to the empirical findings of Yang and McConkie (in press). This is encouraging support for the choice of the lognormal-mixture model.
Additional analyses showed that the 3-component lognormal-mixture model could also fit the distributions of individual readers. Fixation durations on low-frequency words had a higher proportion of the “long” component, and the mode of that component was larger. A further investigation of the frequency effect showed that the effect could be accounted for solely by the component weights. When the parameters of the three components were fixed and only the weights were allowed to vary, the model fit was not significantly different from the fit obtained when all parameters were allowed to vary.
Discussion
The current study showed that a 3-component mixture-of-lognormal model could successfully model the empirical fixation duration distributions of beginning readers and adults. The fit appeared to be as good as that of McConkie and Dyre’s (2000) models.
The 3-component lognormal-mixture model provides a simple, straightforward interpretation of Yang and McConkie’s (in press) results. According to the current model, there are three classes of fixations, each with different distributional properties. In normal reading, the mixture rates of these fixation classes may change with linguistic or other factors, but they remain relatively stable, and the resulting mixture shows the typical unimodal, long-tailed distribution. Under extreme experimental manipulations such as those in Yang and McConkie’s study, however, the proportions are pushed out of their normal balance, and the individual components are therefore revealed. The current mixture model thus predicts that each reader should have the same component parameters in normal reading and under Yang and McConkie’s experimental conditions. It would be interesting to see this hypothesis tested.
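The hypothesis that only the mixture rates change, while the components stay fixed, corresponds to the frequency analysis in which only the weights were allowed to vary. It can be sketched as EM over the mixing weights alone. This minimal sketch assumes the component means and SDs (on the log scale) are already known, for example from a pooled fit.

The following are hypothetical and for illustration only:
- the component values,
- the two word-frequency conditions and their weights,
- the function names.

Comparing the returned log-likelihood with that of an unrestricted fit would give the likelihood-ratio comparison, which is not implemented here.

```python
import math
import random


def normal_pdf(x, mu, sd):
    """Density of N(mu, sd^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def fit_weights(logdur, mu, sd, n_iter=500, tol=1e-6):
    """EM for the mixing weights only; component means and SDs are held fixed.

    Returns (weights, log-likelihood).
    """
    k = len(mu)
    dens = [[normal_pdf(x, mu[j], sd[j]) for j in range(k)] for x in logdur]
    w = [1.0 / k] * k
    for _ in range(n_iter):
        totals = [0.0] * k
        for row in dens:
            mix = sum(w[j] * row[j] for j in range(k))
            for j in range(k):
                totals[j] += w[j] * row[j] / mix      # responsibility of component j
        new_w = [t / len(logdur) for t in totals]
        converged = max(abs(a - b) for a, b in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    loglik = sum(math.log(sum(w[j] * row[j] for j in range(k))) for row in dens)
    return w, loglik


def simulate(weights, mu, sd, n, rng):
    """Draw log durations with fixed components and condition-specific weights."""
    out = []
    for w_j, m, s in zip(weights, mu, sd):
        out += [rng.gauss(m, s) for _ in range(round(n * w_j))]
    return out


# Hypothetical shared components (log ms) and condition weights.
MU = [math.log(75), math.log(180), math.log(320)]
SD = [0.2, 0.2, 0.2]
rng = random.Random(7)
high = simulate([0.10, 0.60, 0.30], MU, SD, 2000, rng)   # e.g., high-frequency words
low = simulate([0.10, 0.45, 0.45], MU, SD, 2000, rng)    # e.g., low-frequency words
w_high, _ = fit_weights(high, MU, SD)
w_low, _ = fit_weights(low, MU, SD)
print([round(v, 2) for v in w_high], [round(v, 2) for v in w_low])
```

In this setup a shift toward the “long” component shows up only in the third weight, while the component densities themselves are shared across conditions.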
Interpreting Yang and McConkie’s (in press) findings within McConkie and Dyre’s (2000) modeling framework is difficult, because that framework assumes a two-component structure. In this sense, the current model seems to be more readily interpretable.
Unlike McConkie and Dyre (2000), the current study made no attempt to infer the underlying processing mechanism from the forms of the distributions. Reasoning about stochastic processes from their marginal distributions is often risky, as many different mechanisms can produce similar distributions. The choice of lognormal components may still invite skepticism, although they are no more arbitrary than the components in McConkie and Dyre’s models. Admittedly, the lognormal distribution was chosen for modeling convenience, but the results suggest that the choice was a reasonable one. At the same time, nothing in the model requires a lognormal distribution, and any other reasonable distribution might work just as well.
REFERENCES
Agresti, A. (1990). Categorical data analysis. New York: Wiley.
Andriessen, J. J., & De Voogd, A. H. (1973). Analysis of eye movement pattern in silent
reading. IPO Annual Progress Report, 30-35.
Bengio, Y. (1999). Markovian models for sequential data. Neural Computing Surveys, 2,
129-162.
Bengio, Y., & Frasconi, P. (1996). Input/output HMMs for sequence processing. IEEE
Transactions on Neural Networks, 7, 1231-1249.
Bernardo, J. M., & Smith, A. F. M. (1994). Bayesian theory. Chichester, England: John
Wiley.
Birnbaum, Z. W. (1952). Numerical tabulation of the distribution of Kolmogorov's
statistic for finite sample size. Journal of the American Statistical Association, 47, 425-441.
Boyen, X., & Koller, D. (1998a). Approximate learning of dynamic models. Paper
presented at the Neural Information Processing Systems (NIPS-11).
Boyen, X., & Koller, D. (1998b). Tractable inference for complex stochastic processes.
Paper presented at the 14th Annual Conference on Uncertainty in AI (UAI), San Francisco.
Brysbaert, M., & Vitu, F. (1998). Word skipping: Implications for theories of eye
movement control in reading. In G. Underwood (Ed.), Eye guidance in reading and scene
perception (pp. 125-147). Oxford, England UK: Anonima Romana.
Brysbaert, M., Vitu, F., & Schroyens, W. (1996). The right visual field advantage and the
optimal viewing position effect: On the relation between foveal and parafoveal word recognition.
Neuropsychology, 10, 385-395.
Buswell, G. T. (1922). Fundamental reading habits: A study of their development.
Supplementary Educational Monographs, 21.
Buswell, G. T. (1937). How adults read. Chicago, IL: University of Chicago.
Cadez, I. V., Smyth, P., McLachlan, G. J., & McLaren, C. E. (2001). Maximum
likelihood estimation of mixture densities for binned and truncated multivariate data. Machine
Learning, special issue on unsupervised learning, in press.
Carpenter, P. A. (1984). The influence of methodologies on psycholinguistic research: A
regression to the Whorfian hypothesis. In D. E. Kieras & M. A. Just (Eds.), New methods in
reading comprehension research (pp. 1-12). Hillsdale, NJ: Lawrence Erlbaum Associates.
Carpenter, R. H. S. (1988). Movements of the eyes (2nd rev. & enlarged ed.). London,
England UK: Pion Limited.
Conover, W. J. (1999). Practical nonparametric statistics. (3rd ed.). New York: Wiley.
Cowell, R. (1998a). Advanced inference in Bayesian networks. In M. Jordan (Ed.), Learning in
graphical models (pp. 27-50). Cambridge, MA: MIT Press.
Cowell, R. (1998b). Introduction to inference for Bayesian networks. In M. Jordan (Ed.),
Learning in graphical models (pp. 9-26). Cambridge, MA: MIT Press.
Dearborn, W. F. (1906). The Psychology of Reading. (Vol. XIV). New York: The
Science Press.
Everitt, B. S. (1981). A Monte Carlo investigation of the likelihood ratio test for the
number of components in a mixture of normal distributions. Multivariate Behavioral Research,
16, 171-180.
Feng, G., Miller, K. F., Zhang, H., & Shu, H. (2001). Towed to recovery: the use of
phonological and orthographic information in reading Chinese and English. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 27, 1079-1100.
Findlay, J. M., & Walker, R. (1999). A model of saccade generation based on parallel
processing and competitive inhibition. Behavioral & Brain Sciences, 22, 661-721.
Francis, W. N., & Kucera, H. (1982). Frequency analysis of English usage: lexicon and
grammar. Boston: Houghton Mifflin.
Gezeck, S., Fischer, B., & Timmer, J. (1997). Saccadic reaction times: A statistical
analysis of multimodal distributions. Vision Research, 37, 2119-2131.
Goodman, L. A., & Kruskal, W. H. (1954). Measures of Association for Cross
Classifications. Journal of the American Statistical Association, 49, 732-764.
Goodman, L. A., & Kruskal, W. H. (1963). Measures of Association for Cross
Classifications III: Approximate sampling theory. Journal of the American Statistical
Association, 58, 310-364.
Gray, C. T. (1922). Deficiencies in reading ability: Their diagnosis and remedies.
Chicago, IL: Heath & Co.
Hacisalihzade, S. S., Stark, L. W., & Allen, J. S. (1992). Visual perception and sequences
of eye movement fixations: A stochastic modeling approach. IEEE Transactions on Systems,
Man & Cybernetics, 22, 474-481.
Hall, W. J., & Wellner, J. A. (1980). Confidence bands for a survival curve from
censored data. Biometrika, 67, 133-143.
Harris, C. M., Hainline, L., Abramov, I., Lemerise, E., et al. (1988). The distribution of
fixation durations in infants and naive adults. Vision Research, 28, 419-432.
Heckerman, D. (1998). A tutorial on learning with Bayesian networks. In M. Jordan
(Ed.), Learning in graphical models (pp. 301-354). Cambridge, MA: MIT Press.
Heller, D. (1982). Eye movements in reading. In R. Groner & P. Fraisse (Eds.), Cognition
and eye movements (pp. 139-154). Amsterdam: North Holland.
Henderson, J. M., & Ferreira, F. (1993). Eye movement control during reading: Fixation
measures reflect foveal but not parafoveal processing difficulty. Canadian Journal of
Experimental Psychology, 47, 201-221.
Hogaboam, T. (1983). Reading patterns in eye movement data. In K. Rayner (Ed.), Eye
movements in reading: Perceptual and language processes (pp. 309-332). New York: Academic
Press.
Hollander, M., & Wolfe, D. A. (1999). Nonparametric statistical methods. (2nd ed.). New
York: Wiley.
Huey, E. B. (1908). The psychology and pedagogy of reading: with a review of the
history of reading and writing and of methods, texts, and hygiene in reading. Cambridge, Mass.:
MIT Press.
Inhoff, A. W., & Radach, R. (1998). Definition and computation of oculomotor measures
in the study of cognitive processes. In G. Underwood (Ed.), Eye guidance in reading and scene
perception (pp. 29-53). Oxford, England UK: Anonima Romana.
Irwin, D. E. (1998). Lexical processing during saccadic eye movements. Cognitive
Psychology, 36, 1-27.
Javal, É. (1878). Essai sur la physiologie de la lecture. Annales d'Oculistique, 79, 97-117, 240-274.
Johnson, N. L., Kotz, S., & Balakrishnan, N. (1994). Continuous univariate distributions.
(2nd ed.). New York: Wiley & Sons.
Jordan, M., Ghahramani, Z., & Saul, L. K. (1997). Hidden Markov decision trees. In M.
C. Mozer, M. I. Jordan, & T. Petsche (Eds.), Advances in neural information processing systems
(Vol. 9, pp. 501-507). Cambridge, MA: MIT Press.
Jordan, M., & Jacobs, R. A. (1994). Hierarchical mixtures of experts and the EM
algorithm. Neural Computation, 6, 181-214.
Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., & Saul, L. K. (1998). An introduction to
variational methods for graphical models. In M. Jordan (Ed.), Learning in graphical models (pp.
105-159). Cambridge, MA: MIT Press.
Just, M. A., & Carpenter, P. A. (1980). A theory of reading: From eye fixations to
comprehension. Psychological Review, 87, 329-354.
Kennison, S. M., & Clifton, C. (1995). Determinants of parafoveal preview benefit in
high and low working memory capacity readers: Implications for eye movement control. Journal
of Experimental Psychology: Learning, Memory, & Cognition, 21, 68-81.
Kerr, P. W. (1992). Eye movement control during reading: The selection of where to send
the eyes. Unpublished Doctoral thesis, University of Illinois, Urbana-Champaign, IL.
Kingstone, A., & Klein, R. M. (1993). Visual offsets facilitate saccadic latency: Does
predisengagement of visuospatial attention mediate this gap effect? Journal of Experimental
Psychology: Human Perception & Performance, 19, 1251-1265.
Kliegl, R. M., Olson, R. K., & Davidson, B. J. (1982). Regression analyses as a tool for
studying reading processes: Comment on Just and Carpenter's eye fixation theory. Memory &
Cognition, 10, 287-296.
Legge, G. E., Klitz, T. S., & Tjan, B. S. (1997). Mr. Chips: An ideal-observer model of
reading. Psychological Review, 104, 524-553.
Liversedge, S. P., Paterson, K. B., & Pickering, M. J. (1998). Eye movements and
measures of reading time. In G. Underwood (Ed.), Eye guidance in reading and scene perception
(pp. 55-75). Oxford, England UK: Anonima Romana.
Liversedge, S. P., & Underwood, G. (1998). Foveal processing load and landing position
effects in reading. In G. Underwood (Ed.), Eye guidance in reading and scene perception (pp.
201-221). Oxford, England UK: Anonima Romana.
McConkie, G. W. (1981). Evaluating and reporting data quality in eye movement
research. Behavior Research Methods & Instrumentation, 13, 97-106.
McConkie, G. W., & Dyre, B. P. (2000). Eye fixation durations in reading: Models of
frequency distributions. In A. Kennedy, R. Radach, D. Heller, & J. Pynte (Eds.), Reading as a
perceptual process. Amsterdam: Elsevier Science Ltd.
McConkie, G. W., Kerr, P. W., & Dyre, B. P. (1994). What are "normal" eye movements
during reading: Toward a mathematical description. In J. Ygge & G. Lennerstrand (Eds.), Eye
movements in reading. Tarrytown, NY: Pergamon.
McConkie, G. W., Kerr, P. W., Reddix, M. D., & Zola, D. (1988). Eye movement control
during reading: I. The location of initial eye fixations on words. Vision Research, 28, 1107-1118.
McConkie, G. W., Kerr, P. W., Reddix, M. D., Zola, D., et al. (1989). Eye movement
control during reading: II. Frequency of refixating a word. Perception & Psychophysics, 46, 245-
253.
McConkie, G. W., & Rayner, K. (1973). An on-line computer technique for studying
reading: Identifying the perceptual span. In P. L. Nacke (Ed.), Diversity in mature reading:
theory and research (Vol. 1, pp. 119-130). National Reading Conference, Inc.
McConkie, G. W., & Rayner, K. (1975). The span of the effective stimulus during a
fixation in reading. Perception & Psychophysics, 17, 578-586.
McConkie, G. W., Reddix, M. D., & Zola, D. (1992). Perception and cognition in
reading: Where is the meeting point? In K. Rayner (Ed.), Eye movements and visual cognition:
Scene perception and reading (pp. 293-303). New York, NY: Springer.
McConkie, G. W., Zola, D., Grimes, J., Kerr, P. W., Bryant, N. R., & Wolff, P. M.
(1991). Children's eye movements during reading. In J. F. Stein (Ed.), Vision and visual dyslexia
(pp. 251-262). London: Macmillan Press.
McCullagh, P., & Nelder, J. A. (1983). Generalized linear models. London; New York:
Chapman and Hall.
McLachlan, G. J., & Basford, K. E. (1988). Mixture models: Inference and applications
to clustering. New York: Marcel Dekker.
McLachlan, G. J., & Peel, D. (2000). Finite mixture models. New York: Wiley.
Miller, K., & Feng, G. (in prep.). Reading English and Chinese: A developmental eye-
movement study.
Morrison, R. E. (1984). Manipulation of stimulus onset delay in reading: Evidence for
parallel programming of saccades. Journal of Experimental Psychology: Human Perception &
Performance, 10, 667-682.
Murphy, K. (2001). Bayes Net Toolbox for Matlab 5. Available:
http://www.cs.berkeley.edu/~murphyk/Bayes/bnt.html.
Murray, W. S. (2000). Commentary on Section 4. Sentence processing: Issues and
measures. In A. Kennedy & R. Radach (Eds.), Reading as a perceptual process (pp. 649-664).
Amsterdam, Netherlands: North-Holland/Elsevier Science Publishers.
O'Regan, J. K. (1990). Eye-movements and reading. In E. Kowler (Ed.), Eye movements
and their role in visual and cognitive processes (pp. 395-453). Amsterdam: Elsevier.
O'Regan, J. K., & Jacobs, A. M. (1992). Optimal viewing position effect in word
recognition: A challenge to current theory. Journal of Experimental Psychology: Human
Perception & Performance, 18, 185-197.
Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, UK: Cambridge University Press.
Rabiner, L. (1989). A tutorial on hidden Markov models and selected applications in
speech recognition. Proceedings of the IEEE, 77, 257-286.
Radach, R., & McConkie, G. W. (1998). Determinants of fixation positions in words
during reading. In G. Underwood (Ed.), Eye guidance in reading and scene perception (pp. 77-
100). Oxford, England UK: Anonima Romana.
Rayner, K. (1986). Eye movements and the perceptual span in beginning and skilled
readers. Journal of Experimental Child Psychology, 41, 211-236.
Rayner, K. (1995). Eye movements and cognitive processes in reading, visual search, and
scene perception. In J. M. Findlay & R. Walker (Eds.), Eye movement research: Mechanisms,
processes and applications. Studies in visual information processing, 6 (pp. 3-22). Amsterdam,
Netherlands: Elsevier Science Publishing Co, Inc.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of
research. Psychological Bulletin, 124, 372-422.
Rayner, K., & McConkie, G. W. (1976). What guides a reader's eye movements? Vision
Research, 16, 829-837.
Rayner, K., & Pollatsek, A. (1989). The psychology of reading. Englewood Cliffs, N.J.:
Prentice Hall.
Rayner, K., Reichle, E. D., & Pollatsek, A. (1998). Eye movement control in reading: An
overview and model. In G. Underwood (Ed.), Eye guidance in reading and scene perception (pp.
243-268). Oxford, England UK: Anonima Romana.
Reichle, E. D., Pollatsek, A., Fisher, D. L., & Rayner, K. (1998). Toward a model of eye
movement control in reading. Psychological Review, 105, 125-157.
Reichle, E. D., Rayner, K., & Pollatsek, A. (1999). Eye movement control in reading:
Accounting for initial fixation locations and refixations within the E-Z Reader model. Vision
Research, 39, 4403-4411.
Reilly, R. (1993). A connectionist framework for modeling eye-movement control in
reading. In G. d'Ydewalle & J. Van Rensbergen (Eds.), Perception and cognition: Advances in
eye movement research. Studies in visual information processing (Vol. 4, pp. 193-212).
Amsterdam, Netherlands: North-Holland/Elsevier Science Publishers.
Reilly, R. G., & O'Regan, J. K. (1998). Eye movement control during reading: A
simulation of some word-targeting strategies. Vision Research, 38, 303-317.
Schilling, H. E. H., Rayner, K., & Chumbley, J. I. (1998). Comparing naming, lexical
decision, and eye fixation times: Word frequency effects and individual differences. Memory &
Cognition, 26, 1270-1281.
Shillcock, R., Ellison, T. M., & Monaghan, P. (2000). Eye-fixation behavior, lexical
storage, and visual word recognition in a split processing model. Psychological Review, 107,
824-851.
Stark, L. (1994). Sequences of fixations and saccades in reading. In J. Ygge & G.
Lennerstrand (Eds.), Eye Movements in Reading (pp. 135-161). Tarrytown, NY: Pergamon.
Stark, L., & Ellis, S. (1981). Scanpaths revisited: cognitive models direct active looking.
In R. A. Monty & J. W. Senders (Eds.), Eye movements, cognition and visual perception (pp.
193-226). Hillsdale, NJ: Erlbaum.
Suppes, P. (1990). Eye-movement models for arithmetic and reading performance. In E.
Kowler (Ed.), Eye movements and their role in visual and cognitive processes (Vol. 4, pp. 455-
477). Amsterdam: Elsevier.
Suppes, P. (1994). Stochastic models of reading. In J. Ygge & G. Lennerstrand (Eds.),
Eye movements in reading (pp. 349-364). Oxford, England: Pergamon Press.
Suppes, P., et al. (1983). A procedural theory of eye movements in doing arithmetic.
Journal of Mathematical Psychology, 27, 341-369.
Taylor, S. E. (1965). Eye movements in reading: Facts and fallacies. American
Educational Research Journal, 2, 187-202.
Thibadeau, R. (1983). CAPS: A language for modeling highly skilled knowledge-
intensive behavior. Behavior Research Methods, Instruments, & Computers, 15, 300-304.
Thibadeau, R., Just, M. A., & Carpenter, P. A. (1982). A model of the time course and
content of human reading. Cognitive Science, 6, 101-155.
Titterington, D. M., Smith, A. F. M., & Makov, U. E. (1985). Statistical analysis of finite
mixture distributions. Chichester; New York: Wiley.
van Gisbergen, J. A. M., Gielen, S., Cox, H., Brujins, J., & Schaars, K. H. (1981).
Relation between metrics of saccades and stimulus trajectory in visual target tracking:
implications for models of the saccadic system. In A. F. Fuchs & W. Becker (Eds.), Progress in
oculomotor research. North Holland: Elsevier.
Vitu, F., & McConkie, G. W. (2000). Regressive saccades and word perception in adult
reading. In A. Kennedy & R. Radach (Eds.), Reading as a perceptual process (pp. 301-326).
Amsterdam, Netherlands: North-Holland/Elsevier Science Publishers.
Vitu, F., McConkie, G. W., & Zola, D. (1998). About regressive saccades in reading and
their relation to word identification. In G. Underwood (Ed.), Eye guidance in reading and scene
perception (pp. 101-124). Oxford, England UK: Anonima Romana.
Wagner, R. A., & Fischer, M. J. (1974). The string-to-string correction problem. Journal
of the Association for Computing Machinery, 21, 168-173.
Walker, R., Kentridge, R. W., & Findlay, J. M. (1995). Independent contributions of the
orienting of attention, fixation offset and bilateral stimulation on human saccadic latencies.
Experimental Brain Research, 103, 294-310.
Wolfe, J. H. (1971). A Monte Carlo study of sampling distribution of the likelihood ratio
for mixtures of multinormal distributions (Technical Bulletin STB 72-2). San Diego, CA: U.S.
Naval Personnel and Training Research Laboratory.
Yang, S.-N., & McConkie, G. W. (in press). Eye movements during reading: A theory of
saccade initiation times.
Zangemeister, W. H., Sherman, K., & Stark, L. (1995). Evidence for a global scanpath
strategy in viewing abstract compared with realistic images. Neuropsychologia, 33, 1009-1025.
CURRICULUM VITAE
Biographical Information
Name: Gang Feng
Date of Birth: March 16, 1968
Place of Birth: Beijing, China
Education
2001 Ph.D., University of Illinois at Urbana-Champaign, Department of Psychology
Major area: Developmental Psychology; Minor area: Quantitative Psychology
1999 M.S., University of Illinois at Urbana-Champaign, Department of Statistics
1998 M.A., University of Illinois at Urbana-Champaign, Department of Psychology
1990 B. Ed., Beijing Normal University, Beijing, China, Department of Psychology
Awards and Honors
1999-2000 Beckman Institute Graduate Fellow
1999 Cognitive Science/AI Summer Fellowship, UIUC
1990 Honor Graduate, Beijing Normal University
1986-1990 Government fellowships, Beijing Normal University
Research Experience
1999 - 2000 Beckman Institute Graduate Fellow, Beckman Institute, UIUC
Summer, 1999 CogSci/AI Steering Committee Summer Fellowship, UIUC
Summer, 1998 Data Analyst, Center for Reading Research, UIUC
1994 - 2000 Research Assistant, Beckman Institute, UIUC
1990 - 1994 Assistant Researcher, Institute of Psychology, Chinese Academy of
Sciences
Teaching Experience
1998-1999 Teaching Assistant, Child Psychology
1996-1997 Teaching Assistant, Research methods in developmental psychology
Publications
Feng, G., Miller, K. F., Zhang, H., & Shu, H. (2001). Towed to recovery: the use of
phonological and orthographic information in reading Chinese and English. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 27, 1079-1100.
Kelly, M., Miller, K., Fang, G., & Feng, G. (1999). When Days Are Numbered:
Calendar Structure and the Development of Calendar Processing in English and Chinese.
Journal of Experimental Child Psychology, 73, 289-314.
Feng, G. (1998). Homophone confusion in reading English and Chinese. Unpublished
master’s thesis, University of Illinois at Urbana-Champaign.
Fang, G., Fang, F., & Feng, G. (1995). A comparative study of elementary school
students’ mathematics achievement and motivations. Chinese University of Hong Kong
Elementary Education, 2, 51-56.
Fang, G., Feng, G., Fang, F., & Jiang, T. (1994). Preschoolers' estimation of time
duration and their cognitive strategies. Psychological Science (China), 17, 3-9.
Fang, G., Feng, G., Jiang, T., & Fang, F. (1993). Time duration estimated by preschoolers
and their strategies. Acta Psychologica Sinica, 25, 346-352.