Feng (2001) - Dissertation

SHARE: A STOCHASTIC, HIERARCHICAL ARCHITECTURE FOR READING EYE-MOVEMENT

GANG FENG

B. Ed., Beijing Normal University, 1990

M.A., University of Illinois, 1998

M.S., University of Illinois, 1999

THESIS

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Psychology

in the Graduate College of the University of Illinois at Urbana-Champaign, 2001

Urbana, Illinois

SHARE: A STOCHASTIC, HIERARCHICAL ARCHITECTURE FOR READING EYE-MOVEMENT

Gang Feng, Ph. D.

Department of Psychology

University of Illinois at Urbana-Champaign, 2001

Kevin F. Miller, Advisor

Advances in methods for capturing patterns of eye-movements in reading have not yet

been matched by corresponding methods for turning those data into a comprehensive quantitative

model that is able to account for patterns of reading eye movements.

The primary objective of the current research is to identify a set of mathematical tools

that are able to describe reading eye movements, which are complex time-series data that covary

with linguistic, perceptual, and other variables. A survey of existing quantitative models of

reading eye movements shows that many of the models are unable to account for the empirical

distributions of eye movements. Nonetheless, the variety of modeling approaches also points to

promising solutions to the problem.

Based on analysis of modeling constraints, and inspired by previous efforts, a stochastic,

hierarchical architecture for reading eye-movement, SHARE, is proposed. An advanced Markov

model helps to capture the temporal dependency between reading eye movements, and the

hierarchical structure concisely represents the logical relationships between covariate factors,

eye-movement decisions, and observable eye-movement behaviors. Model parameter estimation

is based on Bayesian theory, which provides a natural way to incorporate prior knowledge and to

conduct probabilistic reasoning.

A simple model based on the SHARE architecture has been developed. Although it only

takes into account a limited number of covariates and only models dependency between adjacent

eye movements, it nevertheless is able to capture much of the dynamics of reading eye

movements. A simulation study shows that with its simple structure, the model is able to

reproduce the distributions of fixation durations and saccade length, as well as composite eye-

movement variables. Because each reader is modeled individually, analyses of model parameters

for readers of varying age and reading proficiency also shed light on the development of reading

skills.

The SHARE architecture is shown to be flexible enough to characterize both beginning

and fluent reading, which is particularly attractive for the study of reading development. Its

ability to capture eye-movement patterns also opens a wide range of possibilities for real-world

applications of the eye-movement technology.

ABSTRACT

Advances in methods for capturing patterns of eye-movements in reading have not yet

been matched by corresponding methods for turning those data into a comprehensive quantitative

model that is able to account for patterns of reading eye movements.

The primary objective of the current research is to identify a set of mathematical tools

that are able to describe reading eye movements, which are complex time-series data that covary

with linguistic, perceptual, and other variables. A survey of existing quantitative models of

reading eye movements shows that many of the existing models are unable to account for

the empirical distributions of eye movements. Nonetheless, the variety of modeling approaches also

points to promising solutions to the problem.

Based on analysis of modeling constraints, and inspired by previous efforts, a stochastic,

hierarchical architecture for reading eye-movement, SHARE, is proposed. An advanced Markov

model helps to capture the temporal dependency between reading eye movements, and the

hierarchical structure concisely represents the logical relationships between covariate factors,

eye-movement decisions, and observable eye-movement behaviors. Model parameter estimation

is based on Bayesian theory, which provides a natural way to incorporate prior knowledge and to

conduct probabilistic reasoning.

A simple model based on the SHARE architecture has been developed. Although it only

takes into account a limited number of covariates and only models dependency between adjacent

eye movements, it nevertheless is able to capture much of the dynamics of reading eye

movements. A simulation study shows that with its simple structure, the model is able to

reproduce the distributions of fixation durations and saccade length and to predict eye-movement

variables with reasonable accuracy. Because each reader is modeled individually, analyses of

model parameters for readers of varying age and reading proficiency also shed light on the

development of reading skills.

A distinctive strength of the SHARE architecture is that it makes minimal assumptions

about psychological mechanisms but concentrates on mathematical descriptions of eye-

movement patterns. To the extent that it separates objective descriptions from hypothetical

mechanisms, it presents a way to implement and test a variety of theories of reading eye

movement in a common platform. The SHARE architecture is shown to be flexible enough to

characterize both beginning and fluent reading, which is particularly attractive for the study of

reading development. Its ability to capture eye-movement patterns also opens a wide range of

possibilities for real-world applications of the eye-movement technology.

DEDICATION

To My Family

ACKNOWLEDGEMENTS

I would like to recognize those people who have helped me meet this part of the Ph. D.

requirement. I thank the members of my dissertation review committee, Richard C. Anderson,

Cynthia Fisher, George W. McConkie, Kevin F. Miller, and Douglas Simpson for the distinct

expertise that each person brought to the project.

I am greatly indebted to my academic and dissertation advisor, Kevin Miller, who has

given me generous support, intellectually, financially, and emotionally, for the past seven years. I

cannot think of any other labs where I could enjoy the total freedom to pursue my intellectual

interests, the thoughtful and timely guidance, and the extraordinary research facility that Kevin

offered me. His influence on me, both professionally and personally, will be felt in the years to

come.

George McConkie showed me the way to eye-movement research. But more importantly,

he provided me with an example of an extraordinary scholar, an enthusiastic advisor, and simply

a good person. He has never refused a single request for help, no matter how big or small it was.

I cherish every opportunity to work with him, and am grateful for all the help he gave me over

the years.

The other members of my committee also made major contributions to my understanding

of reading, language, and statistical issues in modeling, as well as helping me to clarify my

thinking. Cynthia Fisher introduced me to many new concepts in linguistics and language

acquisition, and has read and given thoughtful comments on many papers over the years. Doug

Simpson made many incisive and constructive suggestions about the statistical aspects of this

project; his patience and encouragement had a major impact on this project. Richard Anderson

has been consistently supportive throughout my career at UIUC, and even made the supreme

sacrifice of returning from his summer home in Wisconsin to a hot and humid Champaign-

Urbana for my final orals meeting. Of course, none of my committee members can be held

responsible for the errors that remain in this project.

The greatest support throughout my graduate program comes from my family. My

parents, Sunqi Feng and Mei Chen, are always confident in me and forever encouraging. No

words can express my thanks to my wife, Xiuhong Cao, and daughter, Jessie. During the dull

moments of thesis writing, those joyful albeit brief after-dinner family moments were the only

source of power that recharged me after the many hours of daily work and carried me through

the long journey.

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

CHAPTER 1. INTRODUCTION
    Describing a Single Reading Eye Movement
    Composite Eye-movement Variables: Measuring Local Dynamics
    Eye Movements as Stochastic Processes
    From Measurement to Modeling

CHAPTER 2. A SURVEY OF QUANTITATIVE MODELS
    “Direct Control” Model and the READER Simulation
    “Attentional Shift” Theory and Reilly’s Connectionist Model
    “E-Z Reader” Models
    “Strategy-tactics” Theory and the Reilly and O’Regan Simulations
    Mr. Chips: The Ideal Observer
    Stochastic Models by Stark and Suppes
    Normal Eye Movements: McConkie and colleagues' mathematical modeling

CHAPTER 3. DESIGN PRINCIPLES
    Theory-driven vs. Data-driven Modeling
    Deterministic vs. Probabilistic Modeling
    The WHEN and WHERE Decisions
    Linguistic vs. Low-level Variables
    Time-series vs. Independent Data
    Discrete vs. Continuous Control
    Group vs. Individual Models
    Descriptive vs. Predictive Applications
    Choosing the Mathematical Tools

CHAPTER 4. SHARE: STRUCTURE, DYNAMICS, AND MODEL FITTING
    Modeling Environment
    Modeling Data
    Structure of the SHARE Model
    Temporal Dynamics
    Model Fitting and Parameter Learning
    Model Adequacy and Comparison

CHAPTER 5. SIMULATION RESULTS
    Simulation Method
    Distributions of fixation durations
    Distributions of Saccade Length
    SHARE in Conventional Eye-movement Measures
    Summary

CHAPTER 6. DEVELOPMENTAL CHANGES OF READING EYE MOVEMENTS
    Previous Research on the Development of Reading Eye Movements
    Developmental Analyses Using SHARE
    Development of Reading Eye-movement Control
    Effects of Input Variables on Eye-movement Control
    Discussion

CHAPTER 7. DISCUSSION
    What is SHARE?
    What SHARE is Not
    Composite Variables Revisited: Implications to Psycholinguistic Research
    Applications in Reading Education

TABLES

FIGURES

APPENDIX A. PROBLEMS IN THE E-Z READER MODEL
    The Goodness-of-fit Index
    Correlations, Multicollinearity, and Parsimonious Modeling

APPENDIX B. FITTING MIXTURE MODELS TO EMPIRICAL FIXATION DURATION DISTRIBUTIONS
    Introduction
    Method
    Results
    Discussion

REFERENCES

CURRICULUM VITAE

LIST OF TABLES

Table 1. Developmental Characteristics of Reading Eye Movements

Table 2. Log Likelihood of Bayesian and MLE for Fixation Duration Fitting

LIST OF FIGURES

Figure 1. Architecture of Reilly’s Connectionist Model of Eye-movement Control

Figure 2. Illustration of Parafoveal Preview Effects in E-Z Reader 5

Figure 3. Order-of-processing diagram for E-Z Reader 5

Figure 4. Illustration of components of the Mr. Chips model

Figures 5A and 5B. Landing Position of Fixations During Reading

Figure 6. Frequency of skipping four- and eight-letter words

Figure 7. Mean Landing Positions of Regressive Saccades as a Function of Launch Site

Figure 8. Fitting Fixation Duration Distribution with a Two-stage Mixture Model

Figure 9. Distributions of Fixation Durations in Yang and McConkie (in press)

Figure 10. Graphical representation of the SHARE model

Figures 11-1 through 76. Simulating Fixation Duration and Saccade Length Distributions

Figure 12. Simulated and Empirical First Fixation Duration by Word Frequency

Figure 13. Simulated and Empirical Single Fixation Duration by Word Frequency

Figure 14. Simulated and Empirical Gaze Duration by Word Frequency

Figure 15. Simulated and Empirical Skipping Probability by Word Frequency

Figure 16. Simulated and Empirical Probability of Making Single Fixation by Word Frequency

Figure 17. Simulated and Empirical Probability of Making Two Fixations by Word

Figure 18. Developmental Changes in Saccade Targeting Probabilities

Figure 19. Developmental Changes in Fixation Duration Control: Probabilities of Making Short, Medium, and Long Fixations

Figure 20. Developmental Changes in Fixation Duration Control: Modes of Short, Medium, and Long Fixation Durations

Figure 21. Developmental Changes in Fixation Duration Control: Variance of Short, Medium, and Long Fixation Durations

Figure 22. What Affects Saccade Targeting: Effects of Word Frequency, Length of the Next Word, Fixation Landing Position, and the Previous Saccade Move

Figure 23. What Affects Fixation Duration Control: Effects of Word Frequency, Length of the Next Word, Fixation Landing Position, and the Previous Saccade Move

Figure 24. BNT Mixture of Gaussian Model Diagram

Figure 25-1 through 15. Fitting 3rd-grade, 5th-grade, and Adult Fixation Duration with n-Component Lognormal Mixture Models

We are all working toward daylight in the

matter, and many of the discrepancies of facts

and theories are more apparent than real.

(E. B. Huey, 1908, p. 102)

CHAPTER 1. INTRODUCTION

The fact that the eyes travel through a line of text with a series of stops and jumps was

first documented over a century ago (Javel, 1878, cited in Huey, 1908). From the very beginning,

eye movements held great promise for revealing the mental processes involved in silent reading:

These movements [of the eyes during reading] are not only subject to the influence of the

direction of thought as words and phrases are read and assimilated, but they are also

directly concerned in the sensory processes of perception. ... This two-fold relation of

these movements with the control activities on the one hand, and on the other hand as the

necessary accessory to a peripheral organ of sensation gives them an intermediary

position between sensation and recognition and between thought and motor expressions

which is of particular interest for the cues or indices which study of them may give of

some of the workings of the mind. (Dearborn, 1906, quoted in Gray, 1922, pp. 173-174)

However, the road from eye movements to the understanding of mental processes has not

been an easy one. What we can learn from reading eye movements depends on our ability to

quantitatively describe them. More than 80 years after Dearborn, O'Regan (1990) outlined the

basic logic for inferring the workings of the mind from eye movements:

The first step in making use of eye movements as a clue to cognitive and perceptual

processes is to proceed backwards: manipulate processing in a known way, and try to

understand the accompanying changes in eye movements. Later, when it is known how

eye movements react to processing changes, one can use eye movements to understand

the cognitive and perceptual processing that occurs in particular cases. (O'Regan, 1990,

p. 400)

In other words, the ability to describe eye-movement patterns, particularly how they

change in response to other factors, precedes and limits our ability to understand the

psychological processes of interest.

The central concern of this research is how to quantitatively describe reading eye

movements. The first two chapters briefly summarize some of the previous approaches and

problems associated with them. A stochastic, hierarchical architecture for reading eye

movements (SHARE) is developed and a simple model is implemented using this architecture.

Model fitting and simulation results are also presented.

Describing a Single Reading Eye Movement

Reading eye movements are generally described as an alternating sequence of fixations

(stops) and saccades (jumps). At this level of abstraction, the eye is assumed to be stationary

during a fixation and to make a fast, ballistic movement during a saccade. Oculomotor details

below this level of abstraction are rarely discussed. Two measures of eye movements – fixation

duration and saccade length – are most widely used in the reading literature (Inhoff & Radach,

1998).

The use of these two measures, however, is not without controversies. First of all, the

boundary between saccade and fixation is blurred. The transition between a saccade and a

fixation is gradual, and micro-saccades, tremors, and drifts happen during a fixation (Carpenter,

1988; Inhoff & Radach, 1998). Thus, in practice the numerical values of fixation duration and

saccade length depend on many factors, such as the temporal and spatial resolution of the eye-

tracking device and the algorithm that detects fixations and saccades (McConkie, 1981).

Secondly, even at the above level of abstraction, there may be a need for additional

measures. For example, Irwin (1998) showed that linguistic processing does not stop during a

saccade, and therefore saccade duration should be included when measuring processing time.

Last but not least, eye-movement measures, particularly fixation duration, are often

subject to censoring. It is common practice to discard fixations that are, for example, shorter than 100

msec or longer than some threshold. The theoretical motivation for censoring seems to be the

belief that these fixations are not produced by cognitive processes and are thus uninteresting or

unrepresentative (see Inhoff & Radach, 1998, for a discussion). Because extreme scores can

greatly affect means and standard deviations, censoring also has the effect of making these

measures more representative of the data as well as “improving” the significance of statistical

analyses. This is particularly a concern with models that try to fit group data such as averages

rather than individual observations.
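
The effect of censoring on summary statistics can be illustrated with a minimal sketch. The durations below are made up for illustration and are not data from this study. The 100-msec lower cutoff follows the example in the text, whereas the 800-msec upper cutoff and the function name censor are assumptions.

```python
import statistics

# Hypothetical fixation durations in msec (illustrative only, not study data).
durations_ms = [60, 85, 190, 210, 225, 240, 260, 275, 310, 900]


def censor(values, low=100, high=800):
    """Keep only durations within [low, high] msec.

    The lower bound follows the 100-msec example in the text; the upper
    bound of 800 msec is an assumed placeholder for "some threshold".
    """
    return [v for v in values if low <= v <= high]


kept = censor(durations_ms)

# Compare the summary statistics before and after censoring.
for label, data in (("all fixations", durations_ms), ("censored", kept)):
    print(label, len(data),
          round(statistics.mean(data), 1),
          round(statistics.stdev(data), 1))
```

With these illustrative values, discarding the extreme fixations lowers both the mean and the standard deviation, which is how censoring can make group-level tests appear more significant.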

In the current study, we focus on the two traditional eye movement measures – fixation

duration and saccade length. No censoring of data is used in the current study, and individual

fixations and saccades are used as the unit of analysis.

Composite Eye-movement Variables: Measuring Local Dynamics

Beyond measuring individual eye movements, reading researchers face the challenge of

quantifying a series of eye movements. Psycholinguists are particularly interested in how eye-

movement patterns change in response to experimental manipulations. This requires a way to

summarize the dynamics of processing over multiple eye movements.

This turns out to be a difficult undertaking. Reading eye movements are intrinsically

dynamic. They occur in order, and the characteristics of one fixation depend in part on those of

the previous ones (e.g., Henderson & Ferreira, 1993; McConkie, Kerr, Reddix, Zola, et al.,

1989). Eye movements also respond in real time to the content under the current fixation, or even

in the periphery (see Rayner, 1998, for a review). Finally, reading eye movements are extremely

variable. In fact, Huey (1908) commented “…the variation [of fixation duration] is so very great

that any average is misleading, and the pauses may really be of almost any length” (p. 33).

The use of composite eye-movement variables is an attempt to summarize eye-movement

dynamics over a short period of time. A composite variable, such as gaze duration or skipping

rate (see footnote 1), is essentially a sample statistic computed from a set of eye movements that satisfy certain

criteria (for example, all fixations that landed on a particular word or word group). This

effectively turns an eye-movement pattern into a single number, which then can be used in

statistical analyses. For example, in a hypothetical psycholinguistic study, a researcher interested

in how word frequency affects reading processes might manipulate the frequencies of some

designated words in the reading materials, and calculate readers’ gaze duration on the

experimental words. These data are then fed to an ANOVA to determine whether readers’ eye

movements were affected by the frequency manipulation.

Footnote 1. Gaze duration is typically defined as the sum of the duration of all fixations on a word (or a predefined region)

provided that the eye has not left the word (region). Skipping rate is the probability of a word (or region) not being

fixated. A finer distinction may be made between cases where the word was later regressed to and those where the

word was never fixated during the entire reading.
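
The definitions of gaze duration and skipping rate given above can be made concrete with a short sketch. It assumes that each fixation has already been assigned to a word index. The function names, the input format, and the example sequence are hypothetical. Gaze duration is taken here to be the summed duration of the first uninterrupted visit to a word, which is one common reading of the definition.

```python
def gaze_durations(fixations):
    """Sum fixation durations over the first uninterrupted visit to each word.

    fixations: list of (word_index, duration_ms) pairs in temporal order.
    Later returns to a word are ignored, because the eye has left the word.
    """
    gaze = {}
    prev_word = None
    first_visit_word = None
    for word, duration in fixations:
        if word != prev_word:
            # The eye has just arrived on this word.
            if word not in gaze:
                gaze[word] = 0
                first_visit_word = word
            else:
                first_visit_word = None  # a revisit, not part of gaze duration
        if word == first_visit_word:
            gaze[word] += duration
        prev_word = word
    return gaze


def skipping_rates(fixations, n_words):
    """Return (first-pass skipping rate, never-fixated rate) over n_words words."""
    fixated = set()
    skipped_first_pass = set()
    furthest = -1
    for word, _ in fixations:
        if word > furthest:
            # Words jumped over on the way forward were skipped on first pass.
            skipped_first_pass.update(range(furthest + 1, word))
            furthest = word
        fixated.add(word)
    never_fixated = set(range(n_words)) - fixated
    return len(skipped_first_pass) / n_words, len(never_fixated) / n_words


# Hypothetical sequence: word 1 is skipped and later regressed to,
# word 3 is never fixated.
example = [(0, 200), (0, 150), (2, 250), (1, 180), (2, 300), (4, 220)]
print(gaze_durations(example))
print(skipping_rates(example, n_words=5))
```

The two rates returned by skipping_rates correspond to the finer distinction mentioned above between words that were later regressed to and words that were never fixated.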

This familiar scenario illustrates several problems with the use of composite eye-

movement variables. First, no single statistic can completely summarize the eye-movement

dynamics on a word. Therefore, multiple composite variables have to be computed with the hope

that collectively they will give a full description of the eye movement pattern. In a recent review,

Inhoff and Radach (1998) enumerated at least seven time-related composite variables: single

fixation duration, the duration of the first and second of two target fixations, first fixation

duration, gaze duration, mean fixation duration, total time, and total repair time. Each of them is

a different way of selecting from and summing over the set of fixations on a word (due to space

limitations their definitions are not listed here). New measures have been introduced to select

and sum fixations over time in order to capture additional eye movement patterns (e.g.,

Liversedge, Paterson, & Pickering, 1998). In addition to reading time measures, a variety of

variables have been used to describe saccade patterns, including the probability of skipping,

refixating, or regressing to and from a word (or a region) and the length of saccade going in and

out of a region, among others.

Having too many options may not be advantageous. In practice, it is impossible to search

through all of the composite variables for an effect. Researchers have to rely on rules of thumb to

select a small set of “reasonable” variables, and hope they will capture the desired effects.

Second, the correlations between these measures make it difficult to interpret findings.

One rationale for the multitude of composite variables is that each is sensitive to different aspects

of reading (e.g., Liversedge et al., 1998) or taps into different processing stages (Murray, 2000;

Rayner, 1998). In reality, however, very few of these variables are independent of each other,

and some pairs are often highly correlated. This is not surprising given that the various time-

related measures are just different and overlapping ways of selecting from the same pool of

fixations and saccades. As a result, it is difficult to establish a direct link between a composite

variable and a reading process. Similarly, because of the composite nature of the variables (see footnote 2), one

cannot be certain that an effect found in one variable is not caused by others. When an effect in

gaze duration is found, for example, it is impossible to conclude whether the difference is caused

by prolonged individual fixation duration or elevated refixation probability, both of which are

part of the definition of gaze duration. The complex relations between these variables create

obstacles in attributing and interpreting empirical discoveries (see Inhoff & Radach, 1998).

Moreover, the composite variables give the appearance of measures of independent

events, which may mislead researchers. It is easy to forget that the fixation duration is not only

determined by the characteristics of the currently fixated word, but also affected by that of the

neighboring words, for instance, through parafoveal previewing (e.g., Henderson & Ferreira,

1993). The probabilities of refixating and skipping a word are also strongly related to the

location of the previous fixation. Such information is lost when the composite variables are

calculated and entered in statistical procedures such as ANOVA, which are designed for testing

independent samples. When these temporal correlates are excluded from data analysis, one runs

the risk of overestimating the effects of factors related to the foveal words and overlooking

potentially important temporal effects.

Footnote 2. Strictly speaking, the same problem also applies to measures such as first fixation duration and single fixation

duration. Although they do not involve summation over multiple eye movements, they are in fact contingent on the

fact that the word is being fixated (i.e., first fixation duration is defined as missing for skipped words), and thus are

not statistically independent of other variables. They have to be interpreted in relation to other measures such as skipping rate.

Finally, one’s choice of composite variables is often tied to a favored theory of eye

movement control. For instance, researchers who believe that lexical processes drive reading eye

movements tend to focus exclusively on measures related to fixation-duration, whereas

proponents of oculomotor or perceptually-oriented theories pay more attention to saccade

patterns. Some researchers believe that measurement and theory should be tightly bound. For

example, Rayner (1995) complained that many psycholinguistic researchers “probably don't have

a model of eye-movement control in mind. In fact, they probably feel that it's not necessary to

specify a model dealing with where the eye lands. All they care about is that gaze durations are

variable as a function of various linguistic variables” (p. 12). Philosophically it may be

impossible to separate measurement from theory. But this does not mean that one has to

subscribe to a particular theory in order to describe eye movements.

Underlying the problem of composite variables is the mismatch between the dynamic,

stochastic nature of reading eye movements and the mathematical tools chosen to represent eye-

movement patterns. As illustrated above, a small set of simple statistics cannot sufficiently

summarize a series of eye movements. And the problem is exacerbated by simply adding

more composite variables, which causes confusion at both the conceptual and empirical

levels.

Eye Movements as Stochastic Processes

The solution is to describe reading eye movements as stochastic processes rather than

independent events. Reading eye movements may be conceptualized as a series of events

(fixations), each of which may be measured by two continuous variables – fixation duration and

saccade length (of the saccade that follows the fixation). Eye movements are stochastic because

the values of fixation duration and saccade length at fixation t are probabilistically determined by

those of the previous fixation at t-1, or even those at t-2 and earlier.

To further simplify the problem, we can code saccades in terms of a finite number of

moves, corresponding to the number of words each saccade covers (e.g., +2 for moving forward

2 words, -1 for moving backward 1 word, etc.). We assume that at the time of planning a

saccade, each move has a certain probability of being chosen. Reading saccades are now described

as discrete events (different moves) happening at discrete times (when saccades are made, which

are assumed to be instantaneous at the current level of abstraction).

Such a stochastic system may be well modeled by a classical Markov model. In a simple

first-order Markov model, a system x is assumed to have a finite number of states, Xi, i=1..k (in

our case, there are k possible moves). The system may change from one state to another at a

designated time (making different kinds of saccades), and the probability of being in state Xi at

time t depends only on the previous state but not any earlier state history. Mathematically, the

probability of making move Xi at fixation t is

$$P(x_t = X_i \mid x_{t-1}, x_{t-2}, x_{t-3}, \ldots, x_1) = P(x_t = X_i \mid x_{t-1})$$

In other words, the probability of the current state is independent of events prior to the last state.

The above model is referred to as the first-order Markov model because the conditional

dependency extends one step back. In a zero-order Markov model, also known as the random-

walk model, the current state is completely independent of any previous history. The system

effectively describes a sample of independent events. It is also possible to derive higher-order

Markov models, in which the current state depends on the previous n states, but computational

cost becomes prohibitive as n increases. The first-order Markov model often offers a good

approximation of short-term temporal relations in data.

How would the Markov model help in conceptualizing and describing reading eye

movements (in this case, saccades)? Assuming that a first-order Markov model as described

above is applicable, all the dynamics of eye movements are summarized in the model’s transition

probability matrix. Any saccade-related composite variable, such as skipping rate or average

saccade length (in words), can be mathematically derived from the matrix. In fact, they can be

computed from the marginal probabilities of the transition matrix. With the transition matrix one

may answer much more detailed questions about saccade programming, such as “if the current

saccade is a refixation versus a regression, is it more likely to skip a word in the next saccade?”

Markov models have been used to summarize eye movement patterns in picture viewing and

scene perception (e.g., Stark & Ellis, 1981).
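
To make the role of the transition matrix concrete, the following Python sketch estimates first-order transition probabilities from a coded saccade sequence and then reads off one marginal probability (a skipping rate) and two conditional probabilities. It is only an illustration: the move coding (0 for a refixation, +1 for the next word, +2 for skipping one word, -1 for a regression), the function names, and the short move sequence are assumptions invented for this example, not data or code from any study.

```python
from collections import Counter


def transition_matrix(moves, states):
    """Estimate P(next move | current move) from consecutive pairs of moves."""
    pair_counts = Counter(zip(moves[:-1], moves[1:]))
    matrix = {}
    for prev in states:
        row_total = sum(pair_counts[(prev, nxt)] for nxt in states)
        matrix[prev] = {
            nxt: (pair_counts[(prev, nxt)] / row_total if row_total else 0.0)
            for nxt in states
        }
    return matrix


def marginal_of_next_move(moves, move):
    """Marginal probability that a transition ends in `move`."""
    n_transitions = len(moves) - 1
    return sum(1 for nxt in moves[1:] if nxt == move) / n_transitions


# Hypothetical coded saccades (invented for illustration only):
# 0 = refixation, +1 = next word, +2 = skip one word, -1 = regression.
moves = [1, 1, 2, 0, 1, -1, 2, 1, 1, 0, 1, 2]
states = [-1, 0, 1, 2]

P = transition_matrix(moves, states)
skip_rate = marginal_of_next_move(moves, 2)   # composite-style summary
p_skip_after_refixation = P[0][2]             # conditional on a refixation
p_skip_after_regression = P[-1][2]            # conditional on a regression
```

In this sketch the skipping rate is a single marginal number, whereas the two conditional probabilities address the more detailed question of whether a skip is more likely after a refixation or after a regression.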

Describing reading eye movements, however, is a different matter. There are at least two

major obstacles to using a simple Markov model for reading. First, the Markov model described

above can only deal with discrete events, but fixation duration and saccade length are continuous

measures. Fixation duration is highly informative in reading research, perhaps more so than in

picture perception studies. Although one may be able to code saccade length as discrete values

(number of words), fixation duration cannot be treated in the same way. How to model

continuous data is a problem to be solved.

In addition, the classical Markov framework assumes a constant transition matrix, i.e., the

transition probabilities remain unchanged. This is unrealistic for reading, because it excludes the

possibility of linguistic or other factors affecting eye-movement programming. One possible

extension of the model is to allow relevant factors to change the transition probabilities. In other

words, the transition probabilities are probabilistically dependent on the values of linguistic and

oculomotor variables. The current research is an exploration in this direction.
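
One way to picture transition probabilities that depend on covariates is sketched below in Python. The sketch is an assumption made for illustration and is not the parameterization developed in this research: it uses a softmax (multinomial logistic) form, a single covariate (the length of the upcoming word), and weights that were chosen by hand rather than estimated from data.

```python
import math

MOVES = [-1, 0, 1, 2]  # regression, refixation, next word, skip one word


def move_probabilities(prev_move, next_word_length, weights):
    """Softmax over moves; scores depend on the previous move and one covariate."""
    scores = [
        weights["bias"][m]
        + weights["prev"].get((prev_move, m), 0.0)
        + weights["length"][m] * next_word_length
        for m in MOVES
    ]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return {m: e / total for m, e in zip(MOVES, exps)}


# Hypothetical weights, chosen only for illustration (not estimated from data).
weights = {
    "bias": {-1: -1.0, 0: -0.5, 1: 1.0, 2: 0.5},
    "prev": {(0, 1): 0.5},  # e.g., a forward move is favored after a refixation
    "length": {-1: 0.0, 0: 0.0, 1: 0.1, 2: -0.3},
}

after_short_word = move_probabilities(prev_move=1, next_word_length=2, weights=weights)
after_long_word = move_probabilities(prev_move=1, next_word_length=8, weights=weights)
```

With these made-up weights the row of the transition matrix is no longer constant: the probability of a skip is higher when the upcoming word is short than when it is long.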

From Measurement to Modeling

This chapter started by identifying the problem of describing eye-movement patterns as a

critical link for reading eye-movement research. It then pointed out problems associated with the

use of composite variables and argued that reading eye movements should be treated as time-

series data. It does not, however, offer a simple solution for describing eye

movements; instead, it calls for more sophisticated mathematical models.

The conclusion may come as a surprise, but it sheds light on the nature of the problem.

Describing eye-movement patterns is not a measurement problem. It is squarely in the domain of

mathematical modeling because it deals with numbers – measures of eye movements, not eye

movements themselves. Reading eye movements are complex, so their description requires more

than basic mathematical tools.

In searching for the right tools to model reading eye movements, it is critical to

understand their mathematical properties. There have been a number of quantitative models of

reading eye movements. Although they come from different perspectives, each summarized

constraints and regularities of reading eye movements. They present a natural starting point for

the current exploration.

CHAPTER 2. A SURVEY OF QUANTITATIVE MODELS

Much of the history of reading eye-movement research can be characterized by debates

over eye-movement control mechanisms (see Rayner, 1998, for a brief review of different

theories). Until recently, reading eye-movement theories were largely verbal descriptions of

hypothetical mechanisms with some supportive evidence. Testing these theories was difficult, if

not impossible, because they were often too vague and flexible to be disconfirmed by empirical

evidence. The past decade has seen a spurt of quantitative models that specify theories in the

language of mathematics or computer algorithms.

The current chapter reviews previous attempts at quantitative modeling of reading eye

movements, with emphasis on their modeling approaches, including mathematical models,

assumptions about eye movements and model fitting. The goal is to discover facts about reading

eye movements, successful modeling approaches, and reasons for failure. The findings of this

survey suggest principles for the design of the current model.

The survey is not intended to be a review of eye movement theories, although a brief

introduction to the theoretical background of a model is given when necessary. Comments after

each review are only relevant to the current research and are not meant to be comprehensive

discussions.

“Direct Control” Model and the READER Simulation

Just and Carpenter (1980) proposed two assumptions that linked eye movements and

cognitive processes. The immediacy assumption states that a reader tries to interpret each content

word of a text as it is encountered, making guesses if uncertain. The eye-mind assumption

asserts that the eye remains fixated on a word as long as the word is being processed. Together,

these two assumptions formed the basis for a reading model in which eye movements, measured

by gaze duration, are controlled entirely by cognitive processes. They supported the two

assumptions with regression analyses of reading eye movements, which showed that gaze

duration could be predicted from linguistic variables.

READER was a computer implementation of their theory of reading and eye movement

control (Thibadeau, Just, & Carpenter, 1982). It was designed to be “a natural language

understanding system that reads the text word by word, and whose processing time on each word

corresponds to the human gaze duration on that word” (Thibadeau et al., 1982, p.158). With

respect to eye-movement control, the only eye-movement variable it attempted to model was

gaze duration, which, according to Just and Carpenter, was equal to the mental processing time

on words.

Model structure. READER was implemented as a LISP program. To give the flavor of

the system, a partial representation of the word "are" in "Flywheels are …" would be in the

following form:

… (WORD2: HAS FEATURE1) (FEATURE1: IS 'A') (WORD2: HAS FEATURE2) (FEATURE2: IS 'R') … (WORD2: IS 'ARE') (WORD2: HAS SUBJECT2) (SUBJECT2: IS WORD1)

As a complete comprehension system, READER included a variety of components,

ranging from a lexicon to a schema-based knowledge representation. Reading started with

encoding letters one by one, until the word was found in the lexicon. The ultimate goal was to

produce a summary of the passage it “read.” At any moment lexical, syntactic, semantic, and

discourse-level analyses were being carried out concurrently and interactively.

READER’s gaze duration was measured by a linear transformation of the machine cycles

the model spent on processing a word. Just and Carpenter (1980) were explicit about when the

eyes should move: “When the perceptual and semantic stages have done all of the requisite

processing on a particular word, the eye is directed to land in a new place where it continues to

rest until the requisite processing is done” (p. 336). The “requisite processing” could be any

(combination) of the reading processes, for example, lexical access or text integration. What is

considered “required” depends on the goal of reading.

READER assumed a word-by-word reading strategy, targeting the next word in line after

finishing processing the current word. The model, however, did allow word skipping when the

comprehension processes were able to “predict” the next word – when the lexical activation of

the next word was elevated beyond a threshold by other reading processes. The skipped words

turned out to be short function words such as “of” and no content word was ever skipped in the

model.

Parameter estimation. The empirical data for modeling were gaze duration results

obtained from a study in which undergraduate students were asked to read some short scientific

passages, including the “flywheel” passage that READER read. Gaze durations on each word

were first averaged across participants, and then entered as the dependent variable in multiple

regression analyses in order to determine the contributions of various textual factors, such as

word length and syntactic role.

Although primarily a symbolic processing system, READER had quite a few activation

weights, memory decay rates, and thresholds in the system that required parameterization. The

authors did not mention how values were assigned, nor did they perform any systematic

optimization of the parameters.

Model fitting. READER’s “reading” performance was evaluated in several ways.

READER did a fair job as a comprehension system because it was able to “recall” a reasonable

amount of information after reading the passage. Thibadeau, Just, and Carpenter (1982) also

compared the effects of various linguistic factors on human and model performances, and

concluded that the effects were qualitatively, and sometimes quantitatively, similar. However,

they did not perform formal statistical tests to support their conclusions. In fact, Carpenter (1984)

argued against overall statistical goodness-of-fit tests and preferred examining mismatches

between the model and data. The only quantitative index of model fit was the correlation

coefficient between human and READER’s gaze duration over the 140 words, which was

approximately r=0.80.

Comments. READER might be a successful model of reading comprehension, but it is

quite limited as an eye-movement control model. The most obvious problem is that it accounted

for only gaze duration and left no explanations for any other eye-movement phenomena. Equally

problematic is the fact that the READER simulation was based on a single 140-word passage.

The model was never extended to “read” other stories, and there was no evidence that it could be

easily generalized to other reading materials.

Methodologically, Kliegl, Olson, and Davidson (1982) pointed out that, because the

independent variables (linguistic factors) were correlated in Just and Carpenter's regression analyses, the

regression coefficients might not reflect the effects of the factors in the presence of other factors.

The validity of the model is consequently undermined because the READER model was tuned to

reflect the effects as shown in the regression coefficients.

“Attentional Shift” Theory and Reilly’s Connectionist Model

In contrast to Just and Carpenter's ambitious project, Morrison's (1984) model was

designed to explain basic eye-movement patterns with minimal assumptions. Morrison suggested

that eye movements were driven by word recognition. It was assumed that during a fixation,

attention would focus on the foveally fixated word until it was recognized. At this moment a

signal was sent to the oculomotor system to start programming a saccade to the next word, while

in the meantime attention shifted to work on the next word based on peripheral visual

information. If the peripheral word was recognized quickly, before the oculomotor system

finished programming the saccade, the saccade command was cancelled and the oculomotor

system was instructed to program a new saccade to the word after it. Even if the peripheral word

was not completely recognized by the end of the current fixation, the partial processing would

still improve word recognition in the next fixation.
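
The skipping rule in Morrison's account reduces to a comparison of two times, which the following Python sketch makes explicit. The function name and the millisecond values in the example are hypothetical; only the rule itself (cancel the pending saccade and target the following word if the parafoveal word is recognized first) comes from the description above.

```python
def morrison_next_target(n, parafoveal_recognition_ms, saccade_programming_ms):
    """Index of the word the eyes move to after word n has been recognized."""
    if parafoveal_recognition_ms < saccade_programming_ms:
        # Word n+1 was recognized before the saccade to it was ready:
        # that saccade is cancelled and a new one is programmed to word n+2.
        return n + 2
    # Otherwise the saccade to word n+1 is executed as planned.
    return n + 1


# Hypothetical timings in milliseconds, for illustration only.
skipped_case = morrison_next_target(5, parafoveal_recognition_ms=100, saccade_programming_ms=150)
normal_case = morrison_next_target(5, parafoveal_recognition_ms=200, saccade_programming_ms=150)
```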

Various modifications to the model have since been proposed (Henderson & Ferreira,

1993; Kennison & Clifton, 1995; Rayner & Pollatsek, 1989; Reilly, 1993). The most recent

version of the Morrison model is the E-Z Reader family of models (Reichle, Pollatsek, Fisher, & Rayner,

1998; Reichle, Rayner, & Pollatsek, 1999), discussed in the next section.

Reilly (1993) aimed to build a common platform, based on a connectionist framework,

for testing different reading eye-movement control models. He chose a connectionist modeling

approach because of its “ability to model the blending and merging of constraints in lexical

encoding and in the production of saccadic shifts” (p. 210). The Morrison model, termed the

“Attentional Shift Model (ASM),” is the only model implemented in the paper.

Model architecture. Reilly’s connectionist model was composed of three main

components: (a) a visual input module, (b) a lexical module, and (c) a saccade programming

module (see Figure 1).

The visual input module mimicked some interesting details of the human retina. It

consisted of a matrix of 26x20 units, representing a horizontal visual field of 20 English letters.

When the model “fixated” on a word, letters within the visual field would activate the

corresponding units. The farther away a letter was from the center of the fovea3, the lower its

overall activation level. In addition, the model implemented two blurring mechanisms – spatial

blurring and category blurring – to simulate decreased acuity for eccentric letters. Reilly's model

provides a fairly intuitive and physiologically plausible account for visual input during reading.

Visual attention was modeled as an inverted “spotlight” on the visual field, which

functioned as a filter that severely suppressed the activation of unattended regions4. Attention

could be shifted by moving the “spotlight,” which in turn would modify the visual input and

trigger saccade programming.

The lexical module was a fully connected feed-forward network, which took input from

the visual input module. The network represented 222 word types in the training corpus. During

simulations, a word was considered “identified” if the output activation level became stable.

3 The center of the fovea was the 8th letter position from the left, not the geometric center of the visual field. This

simulated the asymmetric perceptual span (McConkie & Rayner, 1975).

4 Reilly (1993) was unclear about the size of the spotlight, but suggested that it had to be small enough to provide a

relatively noise-free target for saccade programming. He was also vague on how the movement of the spotlight was

guided. Presumably it always jumped to the center of the next word in the periphery.

The saccadic control module was a feed-forward network that also took input from the

visual module, and activation levels for each letter position were averaged to simulate low-level

visual information. The two output units represented saccade directions (left and right); their

activation values corresponded to the distance of the saccade, which was used to update visual

input after each saccade was carried out.

Following Morrison (1984) and Henderson and Ferreira (1993), the saccadic control

module was activated either when there was an attention shift or when the fixation “timed out.”5

An attention shift was only triggered when the current word was identified. This lexical access

time, in turn, depended on the frequency of the word in the training corpus. Thus, the decision of

when to move the eyes was primarily lexically based but was affected by the eccentricity of the

word relative to the fovea.

Model training and testing. The connectionist model had approximately 65,000

modifiable weights, and the values of these parameters were set through back-propagation

training. The lexical and saccadic modules were trained independently.

The lexical module was trained using a corpus of three short stories consisting of 222

word types and 863 word tokens. During training, the lexical module learned to identify words at

random “retinal” positions (i.e., the word and the attention “spotlight” were randomly placed).

The training stopped when the lexical network was able to identify 98.7% of the fixated words.

The saccade control module was trained to move to the location of the attention

“spotlight.” Special care was taken in Reilly (1993) so that the proportions of progressions,

regressions, and refixations in the training samples closely matched those found in normal adult

reading. The saccade module was trained to reach an 80% accuracy level so as to mimic the less-

than-perfect performance of the human saccadic mechanism.

5 Henderson and Ferreira (1993) suggested that if during a fixation lexical access was not completed after a

deadline, the fixation would be terminated automatically.

Reilly (1993) presented some example output from the simulation study, demonstrating

that the model was able to reproduce a range of empirical eye-movement phenomena, including

skipping, refixations, the word frequency effect, and the penalty of eccentric viewing. Reilly

(1993) acknowledged that the model was preliminary, and needed fine-tuning to ensure a

quantitative fit to empirical processing time and saccade length measures, particularly their

distributional properties. Therefore, no formal goodness-of-fit testing was performed.

Comments. Reilly’s (1993) neural network implementation of the Morrison (1984) model

is unique among the models reviewed here. The model’s connectionist framework and less-than-

perfect training criteria imply that eye-movement control is probabilistic. In addition,

consecutive eye movements are not independent because parafoveal processing would change

the activations in the lexical unit and thus facilitate or hinder word recognition during the next

fixation. In short, Reilly’s model strongly suggests a stochastic control mechanism of reading

eye movements.

“E-Z Reader” Models

"E-Z Reader" (Rayner, Reichle, & Pollatsek, 1998; Reichle et al., 1998; Reichle et al.,

1999), a series of six computer simulation models, is the latest incarnation of Morrison’s theory.

One of the problems with the original Morrison model is that it predicted that the time to process

a parafoveal word, which was the time to execute the current saccade, is independent of the

characteristics of the word under the current fixation. Experimental evidence suggests that

parafoveal processing benefits diminish when the word under fixation is difficult to process

(Henderson & Ferreira, 1993).

To solve this problem, Reichle et al. (1998) proposed that the signal to shift attention and

the signal to program a saccade should be decoupled. Saccade programming was moved to an

earlier point, allowing variable time for parafoveal preview of the next word(s). This is arguably

the most significant change from Morrison's original model. Other improvements included

incorporating contextual predictability to capture effects of higher processes, adding a default

refixation strategy in the oculomotor system, implementing penalties for processing non-

centrally fixated words, and the incorporation of landing position effects (see McConkie, Kerr,

Reddix, & Zola, 1988). The E-Z Reader model is probably the most ambitious modeling

endeavor among all the models reviewed here, and it therefore deserves more detailed scrutiny.

One of the most impressive features of the E-Z Reader modeling effort is the way in

which the models have evolved over time. E-Z Reader models were initially built on simplistic

assumptions, and became progressively more complex as more assumptions were added to make

them more psychologically plausible. The “E-Z Reader 1” model included the basic structure of

the models, but did not utilize contextual predictability information and did not have the ability

to simulate within-word refixations. Contextual predictability was incorporated into the “E-Z

Reader 2” model. “E-Z Reader 3” added a mechanism for intra-word refixations. Penalties for

eccentric viewing positions were implemented in “E-Z Reader 4 and 5.” “E-Z Reader 6”6

(Reichle et al., 1999) is a recent attempt to improve Model 5 by adding the capability to model

the effect of within word landing positions (McConkie et al., 1988). Our discussion focuses on

the E-Z Reader 5 and 6 models as they were considered the state-of-the-art models by the

authors.

Model architecture of E-Z Reader 5. E-Z Reader 5 was composed of a lexical module and

an oculomotor module. In order to decouple the signal for attention shift from that for saccade

programming, lexical access was divided into two sequential processes. The first was the

familiarity check (fc), which corresponded to “a rapid feeling of familiarity” or “matching on the

basis of global similarity” (Reichle et al., 1998) to all entries in the mental lexicon. It was

followed by a process called completion of lexical access (lc), which actually finished word

identification. The signal to start programming the next saccade was triggered at the end of the fc

stage, before the fixated word was completely identified. Attention shift, on the other hand, was

triggered only after the lc stage, when lexical processing was finished.

The oculomotor module also included two sequential processes – (a) an early, labile stage

(m) of saccade programming that could be cancelled by subsequent saccadic programming, and

(b) a later, nonlabile stage (M) in which saccades could no longer be cancelled. The original

Morrison model did not have a mechanism for refixations. To explain refixations, Reichle et al.

(1998) hypothesized a default refixation mechanism that was essentially the same as that of

Reilly and O’Regan (1998, 1998): the oculomotor system was assumed to plan a refixation at the

beginning of each fixation, which was subject to cancellation by a progressive saccade triggered

by lexical processing.

6 Reichle, Rayner, and Pollatsek (1999) declined to call it “E-Z Reader 6” because they considered it an

incremental improvement over the E-Z Reader 5 rather than a qualitatively different one. However, the name “E-Z

Reader 6” appeared in data tables. It is referred to as “E-Z Reader 6” in this paper, because the addition of landing

position modeling significantly changed the basic architecture of E-Z Reader 5.

As in all Morrison family models, reading phenomena in the E-Z Reader model result

from the interplay of concurrent processes that take different amounts of time to

complete. With respect to the lexical processes, it assumed that the

processing times for both fc and lc were linear functions of the logarithm of word frequency,

albeit with different slopes, which allowed more parafoveal processing time for high-frequency

words (see Figure 2). Additionally, the fc and lc processing times were also functions of

contextual predictability and eccentricity of words relative to the retina. To avoid determinism,

random variation was explicitly introduced. The lexical processing times were assumed to follow

Gamma distributions, with standard deviations equal to one third of their means.

For the oculomotor system, the times to complete the labile and nonlabile programming

processes were assumed to follow Gamma distributions with means of 150 msec and 50 msec,

respectively, and standard deviations of 1/3 of their respective means7. The oculomotor

processing times were independent of lexical processes.
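
The constraint that each standard deviation equals one third of the mean fixes the shape of the Gamma distribution. For a Gamma distribution with shape k and scale θ, the mean is kθ and the standard deviation is √k·θ, so the ratio of standard deviation to mean is 1/√k; a ratio of 1/3 therefore implies k = 9 and θ = mean/9. The Python sketch below samples the two oculomotor stage durations under this reading. The function name, the random seed, and the sample size are arbitrary choices for illustration and are not taken from the E-Z Reader implementation.

```python
import random
import statistics

CV = 1.0 / 3.0         # ratio of standard deviation to mean stated in the text
SHAPE = 1.0 / CV ** 2  # Gamma shape: sd/mean = 1/sqrt(shape), hence shape = 9


def sample_duration(mean_ms, rng):
    """Draw one duration (ms) from a Gamma with the given mean and sd = mean/3."""
    scale = mean_ms / SHAPE  # mean = shape * scale
    return rng.gammavariate(SHAPE, scale)


rng = random.Random(0)  # arbitrary seed, for reproducibility only
labile = [sample_duration(150.0, rng) for _ in range(20000)]
nonlabile = [sample_duration(50.0, rng) for _ in range(20000)]

labile_mean = statistics.mean(labile)
labile_sd = statistics.stdev(labile)
nonlabile_mean = statistics.mean(nonlabile)
```

The sample mean and standard deviation of the labile stage should come out close to 150 msec and 50 msec, respectively.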

7 The Gamma distributions were chosen because they showed similar shapes to the empirical distributions. All

Gamma distributions in the E-Z Reader series had standard deviations equal to 1/3 of their means. The ratio was

picked for convenience by the authors.

The E-Z Reader model was able to generate fairly complex eye-movement behaviors.

The computer simulations were implemented as stochastic finite state machines, as illustrated in

Figure 3. Each of the square boxes represents a possible state of the whole system, which is a

combination of the states of the lexical and the oculomotor modules. There were 14 states in E-Z

Reader 5. The model moved from one state to another if one of the processes terminated and a

new process started. The arrows on the diagram mark legal transitions from one state to another.

For example, at State 1 the lexical system was performing a familiarity check on word N (f(n)) while the

oculomotor system was planning a refixation on word N (r(n)). If after some time the labile

programming stage (r(n)) of the refixation on word N ended and turned into nonlabile programming

(R(n)), the system would move from State 1 (f(n) r(n)) to State 2 (f(n) R(n)).

It should be emphasized that although the lexical processes may appear to “drive”

reading eye movements in the model, every decision was in fact a result of an interaction, or

more precisely competition, between the lexical and oculomotor processing time. This is clearly

illustrated in Figure 3.
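
The competition between lexical and oculomotor processing times can be sketched as a race between two random timers. The Python example below simulates only the transition out of State 1 and is a deliberately simplified illustration, not the E-Z Reader code: the mean familiarity-check time of 120 msec is a hypothetical value, the outcome labels are my own, and the outcome in which the familiarity check wins is described in words because its state number is not given in the text.

```python
import random

SHAPE = 9.0  # Gamma shape that gives sd = mean / 3


def gamma_time(mean_ms, rng):
    """Draw a process duration (ms) with the given mean and sd = mean / 3."""
    return rng.gammavariate(SHAPE, mean_ms / SHAPE)


def step_from_state_1(fc_mean_ms, labile_mean_ms, rng):
    """One transition out of State 1: f(n) races against the labile stage r(n)."""
    fc_time = gamma_time(fc_mean_ms, rng)
    labile_time = gamma_time(labile_mean_ms, rng)
    if labile_time < fc_time:
        # The refixation program became nonlabile before the familiarity check ended.
        return "State 2: f(n) R(n)", labile_time
    # The familiarity check finished first: the labile refixation program is
    # cancelled and programming of a saccade to the next word begins.
    return "refixation cancelled; programming saccade to word n+1", fc_time


rng = random.Random(1)  # arbitrary seed
# 120 ms is a hypothetical familiarity-check mean; 150 ms is the labile-stage mean.
outcomes = [step_from_state_1(120.0, 150.0, rng)[0] for _ in range(10000)]
p_refixation_committed = outcomes.count("State 2: f(n) R(n)") / len(outcomes)
```

Neither process decides the outcome alone; the proportion of committed refixations depends on how the two duration distributions overlap.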

Improvement of E-Z Reader 6. The primary motivation of the E-Z Reader 6 model

(Reichle et al., 1999) was to extend the E-Z Reader 5 model to account for landing position

effects (McConkie et al., 1988). McConkie et al. found that saccades tend to overshoot targets

closer than approximately 7 letter spaces and undershoot those farther than 7 letter spaces. The

magnitude of this systematic error was in the range of 0.5 letters per letter of planned saccade length (PSL). The landing

positions were also subject to random error, which follows a Normal distribution. The longer the

distance of a saccade the greater the variance in the Normal distribution.

These effects were implemented in E-Z Reader 6 with a pair of linear regression

formulas. For a given planned saccade length (PSL, the distance between the current fixation

position and the center of the intended word; same as launch site in McConkie et al., 1988), the

actual saccade length was

$$\text{Saccade length} = PSL + (\Psi_b - PSL) \cdot \Psi_m + E,$$

where Ψb=7 and Ψm=0.4 were fixed parameters derived from McConkie et al.’s (1988) study,

and E was a normally distributed random error with a mean of zero and standard deviation given by8

$$\sigma = \beta_b + \beta_m \cdot PSL,$$

where βb and βm were free parameters to be estimated.
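
As a worked illustration, the Python sketch below reads the two formulas as saccade length = PSL + (Ψb − PSL)·Ψm + E, with E drawn from a Normal distribution whose standard deviation is βb + βm·PSL. The values Ψb = 7 and Ψm = 0.4 come from the text; the values passed for βb and βm are hypothetical placeholders, since the estimated values are not given here.

```python
import random

PSI_B = 7.0  # fixed parameter from the text (letter spaces)
PSI_M = 0.4  # fixed parameter from the text


def expected_saccade_length(psl):
    """Systematic part: PSL plus a correction toward a 7-letter saccade."""
    return psl + (PSI_B - psl) * PSI_M


def sample_saccade_length(psl, beta_b, beta_m, rng):
    """Add Normal random error whose sd grows linearly with PSL."""
    sigma = beta_b + beta_m * psl
    return expected_saccade_length(psl) + rng.gauss(0.0, sigma)


short_target = expected_saccade_length(3.0)   # overshoots a near target
pivot_target = expected_saccade_length(7.0)   # no systematic error at 7 letters
far_target = expected_saccade_length(10.0)    # undershoots a far target

rng = random.Random(0)  # arbitrary seed
# beta_b = 0.5 and beta_m = 0.1 are hypothetical values for illustration only.
one_draw = sample_saccade_length(10.0, beta_b=0.5, beta_m=0.1, rng=rng)
```

For planned lengths of 3, 7, and 10 letters the systematic part gives 4.6, 7.0, and 8.8 letters, reproducing the overshoot of near targets and undershoot of far targets described above.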

Parameter estimation and model fitting. E-Z Reader 5 was modeled on a corpus of adult

reading data (Schilling, Rayner, & Chumbley, 1998). Words in the corpus were classified into

five categories based on their word frequency. Six eye-movement variables were calculated for

each of the categories: (1) mean gaze duration, (2) mean first fixation duration, (3) mean single

fixation duration, (4) the mean probability that the word was skipped, (5) the mean probability of

making a single fixation, and (6) the mean probability of making two fixations. Model

parameters were estimated based on these 30 means.

An E-Z Reader model was essentially a Monte Carlo simulation. It took texts, coded in

terms of word frequency and contextual predictability, and traveled through the state transition

diagram (Figure 3) by random sampling from the Gamma distributions. The simulations were

run 1,000 times and the above six eye-movement measures were calculated from the simulated

“eye-movement” data.

8 McConkie et al. (1988) estimated that the standard deviation was a cubic function of PSL (see discussions on

Reilly & O’Regan’s model in the next section). Reichle et al. (1999) apparently simplified it to a linear function.

Model fitting was done using a “grid search” procedure, which involved repeated Monte

Carlo simulations with different parameter values that covered the whole (or a reasonable part9

of the) parameter space. The parameter values that maximized the overall fit between the model

and empirical data were reported.
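
The general shape of such a procedure can be sketched in a few lines of Python. The example is a toy and makes several assumptions that are not from Reichle et al.: the simulated quantity is a single Gamma-distributed duration whose mean is a linear function of log word frequency, the five "observed" means are invented, the fit index is a plain unnormalized RMS error, and the grid has only nine points.

```python
import itertools
import random


def simulate_mean(intercept, slope, log_freq, n_runs, rng):
    """Toy Monte Carlo: average of Gamma-distributed times with sd = mean / 3."""
    mean = intercept - slope * log_freq
    shape = 9.0
    draws = [rng.gammavariate(shape, mean / shape) for _ in range(n_runs)]
    return sum(draws) / n_runs


def grid_search(observed, intercepts, slopes, n_runs=1000, seed=0):
    """Return (rms, intercept, slope) for the grid point with the smallest RMS error."""
    best = None
    for a, b in itertools.product(intercepts, slopes):
        rng = random.Random(seed)  # same random stream at every grid point
        squared_error = sum(
            (simulate_mean(a, b, lf, n_runs, rng) - obs) ** 2
            for lf, obs in observed
        )
        rms = (squared_error / len(observed)) ** 0.5
        if best is None or rms < best[0]:
            best = (rms, a, b)
    return best


# Invented "observed" means: (log word frequency, mean duration in ms).
observed = [(0.0, 300.0), (1.0, 280.0), (2.0, 260.0), (3.0, 240.0), (4.0, 220.0)]
rms, a_hat, b_hat = grid_search(observed, intercepts=[280, 300, 320], slopes=[10, 20, 30])
```

Because every grid point requires a full set of simulations, the cost grows multiplicatively with the number of parameters and the resolution of the grid, which is one reason the choice of parameter ranges matters.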

E-Z Reader is clearly the most ambitious and systematic attempt to date to model the control

of eye movements in reading. At the same time, E-Z Reader’s parameter estimation and model

fitting suffered from two serious shortcomings. These

problems are briefly summarized here; further discussion can be found in Appendix A.

First, the computation formula for the goodness-of-fit measure, as described in Reichle et

al. (1998), contains two errors. Reichle et al. mistakenly squared one of the elements in the

formula, which, instead of normalizing differences, scaled the differences by as much as 100

times. In addition, they used standard deviations where standard errors (of the means) should have been

employed, which resulted in another unintended scaling of a magnitude of about 50. The resulting

RMS values, measuring how much variation was left after model-fitting, were reported as

statistically nonsignificant, but should have been highly significant.

This computational mistake can help to explain another puzzle in the evolution of the E-Z

Reader models: the goodness-of-fit measure, RMS, did not improve much, and sometimes even

worsened, when new structures and free parameters were introduced. Reichle et al. ignored this

warning sign and based their model selection on theoretical arguments rather than on fit with

empirical data.

9 Reichle et al. (1998, 1999) were vague on how they chose the range of parameter space.

Another problem with the modeling effort was a severe multicollinearity in the measures

being fit. I analyzed the basic data for the E-Z Reader modeling, which consisted of 30 means of

eye-movement variables. As shown in Appendix A, all six eye-movement measures were so

highly correlated in the empirical dataset that after a principle component analysis, a single

factor explained 94.6% of variance, and three factors accounted for 99.999% of total variance. In

effect, the free parameters in E-Z Reader 1 through 6 were estimated on only 5 points. In

addition, the first component was also a linear function of (log-transformed) word frequency.

Thus, the only “correct” model based on this dataset of 30 means would be “any eye-movement

measure is a linear function of log-transformed word frequency.” Given that this linearity was

built-in since E-Z Reader 1, it is not surprising that the later models did not improve model fits.
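The multicollinearity argument can be illustrated with a sketch on synthetic data. The numbers below are not the E-Z Reader dataset: they are six hypothetical measures, each generated as a linear function of log frequency plus a little noise, observed in five frequency classes. The share of variance carried by the first principal component is obtained from the leading eigenvalue of the correlation matrix, computed here by power iteration.

```python
import math
import random

def correlation_matrix(columns):
    """Pearson correlations between equally long columns of numbers."""
    def standardize(col):
        m = sum(col) / len(col)
        sd = math.sqrt(sum((x - m) ** 2 for x in col) / (len(col) - 1))
        return [(x - m) / sd for x in col]
    z = [standardize(c) for c in columns]
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def leading_eigenvalue(matrix, iterations=500):
    """Largest eigenvalue of a symmetric positive semidefinite matrix."""
    v = [1.0] * len(matrix)
    value = 0.0
    for _ in range(iterations):
        w = [sum(r * x for r, x in zip(row, v)) for row in matrix]
        value = math.sqrt(sum(x * x for x in w))
        v = [x / value for x in w]
    return value

random.seed(0)
log_freq = [0.0, 1.0, 2.0, 3.0, 4.0]                 # five frequency classes
slopes = [-20.0, -15.0, -30.0, 0.05, 0.08, -0.04]    # six hypothetical measures
columns = [[300.0 + b * f + random.gauss(0.0, 0.01 * abs(b)) for f in log_freq]
           for b in slopes]

corr = correlation_matrix(columns)
# The eigenvalues of a correlation matrix sum to the number of measures.
share = leading_eigenvalue(corr) / len(corr)
print(round(share, 4))
```

When every measure is close to a linear function of the same predictor, almost all of the variance falls on one component, so the 30 means carry little more information than the five frequency classes themselves.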

Comments. At the conceptual level, the E-Z Reader model represents a substantial

improvement over the original Morrison (1984) model. In particular, two new mechanisms

proposed by Reichle et al. (1998) – the decoupling of attention-shift and saccade signals and the

default refixation strategy – enabled the model to simulate more phenomena than the original

Morrison model. On the other hand, there is as yet little empirical evidence to support the two

new assumptions. Their psychological plausibility remains to be seen.

As a quantitative simulation endeavor, E-Z Reader has major limitations. Besides the

mathematical errors, fitting the model on a small set of means proved to be very problematic.

Even if there were not the multicollinearity problem in the data and the modeling were carried

out correctly, there would still be no guarantee that the model really described reading eye

movements. In fact, it would almost certainly not capture the distributional characteristics of

fixation duration and saccade length, given the arbitrary use of gamma distributions.

“Strategy-tactics” Theory and the Reilly and O’Regan Simulations

O'Regan (1990) suggested that the oculomotor guidance system works according to the

following two heuristics:

1. Between-word strategy. Readers fixate on a word until the completion of lexical access

or some other significant stage of recognition. Then they pick a target word from the right

periphery and attempt to move to the generally optimal viewing position (word center) of that word.

In other words, triggering of the between-word saccades is under the control of ongoing

psycholinguistic processing, but word targeting is simply an oculomotor process.

2. Within-word tactics. If the landing position is too far from the generally optimal

position, the system immediately makes a saccade to the other side of the word, and then returns

to the between-word strategy. These tactics are purely oculomotor phenomena and fixation

duration and saccade length are independent of psycholinguistic factors.

Most models assume a word-by-word reading strategy, but word targeting in the

Strategy-tactics model is flexible. O’Regan (1990) presented analyses based on a “careful, word-

by-word” reading strategy, but also explored alternative scanning routines. An important

challenge for the strategy-tactics theory is to find the word-targeting strategy used in normal

reading.

The Reilly and O’Regan (1998) simulation study was an attempt to answer this question.

The study was based on McConkie et al.’s (1988) finding that the distributions of landing sites

on a word tend to follow a normal distribution. Reilly and O’Regan (1998), however, noticed

that there were systematic mismatches between the observed distributions and the predicted

normal curves. They argued that the mismatches resulted when the over/undershooting fixations

ended up landing on neighboring words. They further predicted that different word-aiming

strategies (e.g. “jump to each successive word,” or “skip high frequency words”) would result in

different patterns of over/undershooting, and therefore different patterns of deviation from the

normal curves. By simulating different word targeting strategies and comparing the simulated

landing position distributions to empirical data, Reilly and O’Regan (1998) hoped to identify the

most likely word aiming strategy in reading.

Reilly and O’Regan (1998) hypothesized at least six potential word-targeting

strategies, which fell in two categories – oculomotor strategies and linguistic strategies. The

oculomotor strategies do not require any lexical processing in selecting the next word. They

included (1) Random Control10, (2) Word by Word (WBW), (3) Target long word (TLW), and

(4) Skip short words (SSW). The linguistic strategies included (5) Skip high-frequency word

(SHFW) and (6) Attention shift (AS). The first five strategies are self-explanatory based on their

names. The AS strategy was the Rayner and Pollatsek (1989) version of the Morrison (1984)

model without the Henderson and Ferreira (1993) deadline hypothesis.

Model architecture. All word-targeting strategies were simulated within the same basic

framework and differed only in the strategy used for selection of the next target word. Like E-Z

Reader, Reilly and O'Regan's model was implemented as a finite-state simulation program.

There were three main modules in the model – a lexical system, an oculomotor system for

generating refixations, and a saccade triggering system. Before going into details of the modules,

let us first get a flavor of how the simulation worked.

10 The Random Control strategy was not modeled because it was rejected outright as impossible.

At the onset of a fixation on a word, the lexical and the oculomotor systems worked in

parallel. The latter would start to prepare a refixation by default. When the lexical process was

completed, it would program a progressive saccade, the target of which was determined by the

word-targeting strategy being modeled. When the refixation generation process finished, it

would program a refixation. Eye-movement commands such as "move forward" or "stay" were

taken by the saccade-triggering module, which handled the oculomotor details of saccade

programming. Each programmed saccade took a random time to be triggered. Thus, during each

fixation there was a competition between "move forward" and "stay," and the result depended

probabilistically on the processing times of the three modules.

The above illustrates two interesting features of Reilly and O'Regan's (1998)

simulation. First, although the goal was to simulate landing position distributions, processing

times played the most significant role during the simulations. Thus, the Reilly and O'Regan

simulations qualify as comprehensive eye-movement models. Second, the default-refixation

mechanism clearly reminds us of the E-Z Reader model. In fact, despite the heated debates

between the strategy-tactics and Morrison’s theories, they were remarkably similar when

implemented as quantitative models, as will be seen in the following discussion of model details.

In the Reilly-O’Regan model, the average lexical identification time was a linear function

of the logarithm of word frequency. It was also a function of the length of the currently fixated

word and landing position eccentricity. Individual lexical access times followed a normal

distribution, whose standard deviation was 1/10 of its mean (chosen for convenience).

Refixations have a special importance in the Strategy-tactics theory11. The probability of

refixation was a function of word length and eccentricity of landing position (McConkie et al.,

1989). The time to prepare a refixation was a linear function of eccentricity (off-center fixations

resulted in shorter refixation latencies) but was independent of word frequency. It was assumed

to be normally distributed with a standard deviation of, again, 1/10 of its mean.

The time between programming and actually triggering a saccade – the oculomotor delay

– was assumed to be a random variable12 with a mean of 150 msec and a standard deviation of 50

msec, and was not affected by lexical or any other processes.
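The competition within a fixation can be sketched as a race between two programmed saccades. This is only an illustration of the timing assumptions described above, not Reilly and O'Regan's program: the mean processing times passed in are hypothetical, the oculomotor delay is assumed to be normal (truncated at zero), the rule that the first saccade to be triggered wins is an assumption, and the refixation-probability function is omitted.

```python
import random

def simulate_fixation(lexical_mean, refix_mean, rng):
    """Race between a progressive saccade and a default refixation.

    Returns (winner, fixation_duration_msec).
    """
    # Processing times: normal, with SD equal to one tenth of the mean.
    lexical_time = rng.gauss(lexical_mean, lexical_mean / 10)
    refix_time = rng.gauss(refix_mean, refix_mean / 10)
    # Each programmed saccade waits for its own oculomotor delay
    # (mean 150 msec, SD 50 msec; truncation at zero is an assumption).
    forward_at = lexical_time + max(0.0, rng.gauss(150, 50))
    stay_at = refix_time + max(0.0, rng.gauss(150, 50))
    if forward_at <= stay_at:
        return "move forward", forward_at
    return "stay", stay_at

rng = random.Random(1)
# Hypothetical mean times (msec) for lexical access and refixation preparation.
results = [simulate_fixation(180.0, 220.0, rng) for _ in range(10000)]
p_forward = sum(1 for winner, _ in results if winner == "move forward") / len(results)
print(p_forward)
```

Even in this reduced form, the outcome of each fixation is probabilistic, and both the saccade type and the fixation duration fall out of the same race.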

The landing position of a saccade was a normally distributed random variable whose

mean and standard deviation were determined according to the original McConkie et al. (1988)

formulas:

m = 3.3 + 0.49 d,

sd = 1.318 + 0.000518 d^3,

where d is the distance (in letters) between the launch site and center of the intended word, which

was effectively the PSL in the E-Z Reader 6 model.
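As a rough sketch, the two formulas can be evaluated directly, reading the exponent in the standard-deviation formula as a cube. The function names are hypothetical, and the reference point from which the mean is measured is not specified in the text above, so the code simply returns the values as the formulas give them.

```python
import random

def landing_distribution(d):
    """Mean and SD of landing position for launch distance d (in letters)."""
    mean = 3.3 + 0.49 * d
    sd = 1.318 + 0.000518 * d ** 3
    return mean, sd

def sample_landing(d, rng):
    """Draw one landing position from the normal distribution for distance d."""
    mean, sd = landing_distribution(d)
    return rng.gauss(mean, sd)

rng = random.Random(0)
for d in (2, 6, 10):
    mean, sd = landing_distribution(d)
    print(d, round(mean, 3), round(sd, 3), round(sample_landing(d, rng), 2))
```

Because the standard deviation grows with the cube of the distance, the spread of landing positions changes little for short saccades and increases quickly for long ones.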

11 Interestingly, Reilly and O'Regan (1998) did not specify where refixations are targeted. It is possible that, like

inter-word saccades, they all aim at the center of words. However, O'Regan (1990) maintained that refixations tend

to land on the opposite side of the launching site. There is no basis in Reilly and O'Regan to judge how this was

implemented in their simulations.

12 Reilly and O'Regan (1998) did not state the distributional form of the oculomotor delay. I assume it is a normally

distributed random variable, just like all other random variables in the model.

Parameter estimation. Most parameters of the model were fixed. They were assigned

either on the basis of previous findings or with convenient values. There were, however, a few

free parameters, all of which were part of the word-targeting strategies. For example, in the

Target Longest Word (TLW) strategy one had to determine the size of the visual field from

which the "long" word would be picked. When there were one or more free parameters, Reilly

and O'Regan (1998) picked some reasonable and convenient values and ran the simulation

multiple times. There was little systematic parameter estimation.

Modeling results and Model testing. Simulation materials were taken from the same text

as in McConkie et al. (1988); only word length and frequency information were used. For each

strategy, 20 trials were run with different random seeds. For each simulation, analyses similar to

McConkie et al. were conducted. Simulated landing site distributions were subtracted from the

hypothetical normal distributions for individual words. The authors looked at the patterns of

discrepancies for each word-targeting strategy and searched for ones that were close to the

empirical pattern.

Simulation results were reported mostly qualitatively. Reilly and O'Regan (1998) did not

perform any statistical test to compare the fit of models based on different strategies because the

strategies had different numbers of parameters and might not be readily comparable. The only

quantitative measure of the models' goodness of fit with empirical patterns was correlation

coefficients13, along with statistical tests of whether each was significantly different from zero.

Reilly and O'Regan relied heavily on the magnitude of the correlation coefficients to choose the

most likely word-targeting strategy.

13 The "concordance measure (rc)" in Reilly and O’Regan (1998) was a correlation coefficient. When there were free

parameters and there was a "grid-search", rc's of all simulation trials were reported in a table.

Findings of the simulations were complicated and will not be reported here in detail. The

Word-by-Word strategy was shown to fit the data poorly. As for Morrison’s Attention Shift

model, Reilly and O'Regan concluded that there was not enough time to identify words in the

parafovea with the attentional shift mechanism14, and that the details of the AS model might need

some revision15. Reilly and O'Regan (1998) favored the “Target the Longest Word” strategy.

They concluded, “The results, therefore, suggest that the eye-movement guidance system does

not generally use linguistic information, but exploits word-length information in the right

parafovea to target the next saccade” (p. 316).

14 Reilly and O'Regan rejected an alternative explanation that the time estimates for word identification were too

long. They argued that the lexical processing time estimates were based on those of Rayner & Pollatsek (1989, p.

176), which had been shown to be quite reliable and were supported by other sources. Without direct evidence, this

argument does not seem strong. In fact, even if individual parameters of lexical processing time were accurately

estimated, the overall time could still be an overestimate. See later discussion on the use of regression coefficients

when independent variables are correlated.

15 Reilly and O'Regan suggested adding contextual predictability to reduce lexical identification time, which,

interestingly, was exactly one of the new features in Reichle et al.'s (1998) E-Z Reader models.

Comments. These conclusions, however, are highly suspect because of several

methodological and conceptual problems. The first concern is whether Reilly and O'Regan's

findings were robust. The effects they tried to model (deviations of fixation position distributions

from normal distributions) were very small. Comparing models based on these statistics thus

becomes very tricky. With an arbitrary simulation sample size of n=20, the statistical power of

these tests is very questionable. In addition, the normal distribution hypothesis was a convenient

modeling choice16 in McConkie et al. (1988). If the actual landing position distribution

were slightly positively skewed (e.g., lognormal), it might well require

a word-targeting strategy other than TLW to produce a pattern that would match the empirical

data.

The second problem is the use of a correlation coefficient rc as the goodness-of-fit index.

Given that Reilly and O'Regan were modeling a fairly small effect, all deviations would be close

to zero and thus correlation coefficients would be expected to be low and variable. Choosing a

model on the basis of absolute values of correlation coefficients, as Reilly and O'Regan did, is

risky. There is no guarantee that a model with r= 0.34 is statistically better than one with r= 0.30.

A better goodness-of-fit indicator is needed to evaluate Reilly and O'Regan's conclusions.
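The point about r = 0.34 versus r = 0.30 can be made concrete with Fisher's z transformation. The sketch below is an illustration under stated assumptions, not a reanalysis: it treats the two coefficients as coming from independent samples (the simulated strategies were in fact compared against the same empirical pattern), and the sample size of 100 data points per coefficient is hypothetical.

```python
import math

def fisher_z_difference(r1, n1, r2, n2):
    """Two-sided z test for the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)         # Fisher's z transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))          # two-sided normal p-value
    return z, p

# Hypothetical sample sizes; only the two r values come from the text.
z, p = fisher_z_difference(0.34, 100, 0.30, 100)
print(round(z, 2), round(p, 2))
```

Under these assumptions the difference is far from statistically significant, which is why ranking strategies by the raw size of such coefficients is risky.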

In addition, many modeling decisions were quite arbitrary. The assumption that

processing times are normally distributed implies that fixation durations, the sum of the

component times, would also be normally distributed. This contradicts the well-known fact that

fixation durations, like reaction times, follow a positively skewed distribution that systematically

differs from normal (McConkie, Kerr, & Dyre, 1994). Similarly, most of the parameters in the

model were fixed to convenient values rather than being systematically estimated from data. A

different set of values may yield a different conclusion.

16 Reilly and O'Regan dismissed the choice of any distribution other than the normal as "unparsimonious."

At the conceptual level, it is unclear why readers would necessarily follow a single word-

targeting strategy. It is conceivable that the eye may be attracted by a host of different features,

such as word length, orthographic structure (Liversedge & Underwood, 1998), or the likelihood

of being identified parafoveally (Brysbaert & Vitu, 1998). There may also be individual

differences in word-targeting strategies. If these are true, Reilly and O’Regan’s attempt to

identify strategies is doomed to fail. A more fruitful approach seems to be to describe directly

how readers actually target words in reading, instead of presupposing any fixed strategy.

Mr. Chips: The Ideal Observer

The ideal observer models take a different modeling approach from the previous ones.

“An ideal observer is an algorithm that yields the best possible performance in a task that has a

well-specified goal…” (Legge, Klitz, & Tjan, 1997, p. 525). In other words, an ideal observer

model begins by specifying a goal and task constraints and tries to find an optimal solution. Its

objective is not to describe human data but to compare human performance to that of the optimal

algorithm. “The ideal observer provides an index of task-relevant information by showing the

performance level that can be achieved when all of the information is used optimally.

Comparison of human performance to ideal performance can establish whether human

performance is limited by the information available in the stimulus or by information-processing

limitations within the human” (p. 525).

Mr. Chips (Legge et al., 1997), a computer simulation program, attempted to identify the

optimal strategy for saccade programming that minimizes uncertainty in word recognition. In the

simple world Mr. Chips lived in, reading had one goal – to identify each and every word – and

two constraints – the limited visual acuity of the retina and inaccurate control of eye movements.

Mr. Chips attempted to “read” a word list with the minimum number of saccades and identify

each word in order. This was achieved by carefully calculating the best landing position of the

next saccade so as to minimize uncertainty in word identification. Its calculation was based on its

lexical knowledge, the (partial) information from its "retina," and characteristics of the

oculomotor system. Note that Legge et al. did not try to simulate the temporal dimension of

reading17.

Model architecture. As shown in Figure 4, Mr. Chips had three main modules – the

retina, the lexicon, and the oculomotor system.

Mr. Chips' retina consisted of three regions: (a) high-resolution vision in which letters

can be identified, (b) low-resolution vision (relative scotomas) in which spaces can be

distinguished from letters but letters cannot be identified, and (c) blind spots (absolute scotomas)

where there is no vision.

Mr. Chips had a lexicon composed of the 542 most common words in written English,

along with their relative frequencies. The reading materials (word lists) were randomly sampled

from Mr. Chips' lexicon.

At the core of Mr. Chips was the algorithm for calculating and minimizing uncertainty

about the current word. This was done in two steps. Based on the partial visual information from

the retina (some identified letters and word length), Mr. Chips extracted from the lexicon a list of

candidate words. If the list had more than one word (i.e., the word could not be uniquely

identified) Mr. Chips would compute an entropy value, an index of the amount of uncertainty,

based on the frequencies of the candidate words, for every possible landing position of the next

saccade (most likely refixations) and select the movement that was most likely to identify the

word. This is the "entropy-minimization principle" underlying the ideal-observer model.
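The two steps can be sketched on a toy lexicon. The words and frequencies below are hypothetical, and the sketch ignores the retina layout and saccade noise that the full model takes into account; it only shows how candidate filtering and an expected-entropy comparison between two possible next views might look.

```python
import math

def candidates(lexicon, length, known):
    """Words of the given length that match the letters identified so far.

    `known` maps a 0-based letter position to the identified letter.
    """
    return {w: f for w, f in lexicon.items()
            if len(w) == length and all(w[i] == c for i, c in known.items())}

def entropy(freqs):
    """Shannon entropy (in bits) of the normalized frequencies."""
    total = sum(freqs)
    return -sum((f / total) * math.log2(f / total) for f in freqs if f > 0)

def expected_entropy_after(lexicon, length, known, new_positions):
    """Expected remaining entropy if the letters at new_positions became visible."""
    cands = candidates(lexicon, length, known)
    total = sum(cands.values())
    groups = {}
    for w, f in cands.items():
        key = tuple(w[i] for i in new_positions)   # what the retina would see
        groups.setdefault(key, []).append(f)
    return sum(sum(g) / total * entropy(g) for g in groups.values())

# Hypothetical lexicon with relative frequencies.
lexicon = {"the": 50, "then": 10, "them": 10, "this": 20, "that": 10}
known = {0: "t", 1: "h"}                           # first two letters identified
current = entropy(list(candidates(lexicon, 4, known).values()))
see_third = expected_entropy_after(lexicon, 4, known, [2])
see_fourth = expected_entropy_after(lexicon, 4, known, [3])
print(round(current, 3), round(see_third, 3), round(see_fourth, 3))
```

In this toy case, seeing the fourth letter removes all uncertainty, whereas seeing the third letter leaves "then" and "them" unresolved, so a movement that exposes the fourth letter would be preferred.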

17 Legge, Klitz and Tjan (1997) did include a section discussing the "reading speed" of Mr. Chips, but this speed was

estimated from its saccade length by assuming an average 250 msec fixation duration.

As in humans, Mr. Chips' saccade execution could be imperfect. In one version of the

model, its saccade length followed a normal distribution. Mr. Chips had to incorporate this

statistical information into saccade programming.

Parameter estimation. Because it is an ideal-observer model, Mr. Chips’ parameters were

manipulated by the modeler rather than estimated from human data. For example, Legge et al. (1997)

explored the effects of a smaller vocabulary size and an abnormal retina on reading saccade

programming.

Modeling results. The virtue of an ideal-observer model is not how well it approximates

behavioral data, but how it can help to understand human behavior. Several human eye-

movement phenomena, such as refixations, regressions, word skipping, etc., emerged from

following the simple entropy-minimization algorithm. Mr. Chips also showed an “optimal

viewing position” – it tended to land on the third letter position on a word.

Interestingly, Legge et al. (1997) showed that the “eye-movement behaviors” of Mr.

Chips could be characterized with a few simple heuristics, despite the complex internal

mechanisms of the model. For example, Legge et al. (1997) demonstrated that almost identical

performance could be obtained when only word length information was used. This is consistent

with the finding in the reading literature that eye-movement guidance is primarily based on word

boundary information (McConkie & Rayner, 1975; Rayner, 1986). Legge and colleagues also

showed that Mr. Chips’ eye-movement strategies, such as the optimal viewing position effect,

could be summarized by a set of simple if-then heuristics. Together these findings suggest that an

eye-movement control system may achieve optimal reading performance without actually doing

expensive entropy calculations or using high-level information.

Comments. The Mr. Chips model sheds light on some important issues in modeling eye

movements. It demonstrated that eye movements could be described at a behavioral level

separate from the underlying mechanisms. Another important insight is that simple discrete

algorithms (“targeting word centers”) could achieve near optimal performance compared to the

costly “continuous” control (“minimizing entropy”). These became important design principles

for my research.

Stochastic Models by Stark and Suppes

Two scholars, notably not mainstream reading researchers, have tried to describe reading

eye movements with stochastic models (Stark, 1994; Suppes, 1990, 1994). Both of them chose to

use Markov models (see the first chapter for a brief introduction) to capture the dynamics of eye

movements.

Scanpath theory of reading. Based on his research on scanpaths (Hacisalihzade, Stark, &

Allen, 1992; Stark, 1994; Stark & Ellis, 1981; Zangemeister, Sherman, & Stark, 1995), Stark

(1994) proposed that the sequence of reading fixations could be modeled as a Markov process, or

a “scanpath.” Stark proceeded by treating each word in a text as a possible state and describing

reading as going through a series of states. The probability of jumping from one state (word) to

another constituted a Markov transition matrix, and the transition matrix could fully describe the

stochastic properties of reading fixation sequences. Furthermore, Stark introduced string-editing

distance (Wagner & Fischer, 1974) as a measure of the similarity between two fixation

sequences, which could be desirable for reading research.
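Both ideas can be sketched in a few lines. The fixation sequences below are hypothetical word indices, and the code is only an illustration of the general technique (a first-order transition matrix estimated from counts, and a unit-cost string-editing distance), not Stark's own implementation.

```python
from collections import Counter, defaultdict

def transition_matrix(sequence):
    """Maximum-likelihood first-order transition probabilities between states."""
    counts = defaultdict(Counter)
    for a, b in zip(sequence, sequence[1:]):
        counts[a][b] += 1
    return {a: {b: c / sum(row.values()) for b, c in row.items()}
            for a, row in counts.items()}

def edit_distance(s, t):
    """String-editing distance with unit costs (dynamic programming)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # substitute a by b
        prev = cur
    return prev[-1]

# Hypothetical fixation sequences: each number is the index of the fixated word.
scan_a = [1, 2, 3, 3, 4, 6, 5, 6, 7]   # one refixation, one skip, one regression
scan_b = [1, 2, 3, 4, 5, 6, 7]         # strict word-by-word reading

matrix = transition_matrix(scan_a)
distance = edit_distance(scan_a, scan_b)
print(matrix[3], distance)
```

With one state per word, the matrix has as many rows as the text has words, which illustrates why this formulation scales poorly for reading.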

Comments on the scanpath model. Stark’s scanpath model has been largely overlooked in

the reading research community. One of the reasons is that the way Stark formulated the Markov

transition matrix originated from picture perception studies and might not be suitable for reading

research. By setting each word as a state, Stark implied that the eye might jump from a word to

any other word in reading. While this is possible, such wild saccades are very rare.

Compared to picture viewing, reading is a much more constrained task, where the eyes almost

always move to adjacent words. It is more intuitive to consider a more

localized Markov process, in which the possible moves of the eye are limited to nearby words.

Suppes’ Stochastic model. Suppes' (1990, 1994) reading eye-movement control model

provides a relatively comprehensive treatment of eye movements – modeling both fixation

duration and saccade programming – and thus is discussed in more detail.

The stochastic model was derived from Suppes’ earlier models of eye movements in

doing multi-digit arithmetic (Suppes et al., 1983). The reading counterpart consisted of two

increasingly complex models – the minimal-control model and the text-dependent probabilistic

control (TDPC) model. In the minimal-control model, Suppes attempted to simulate fixation

duration as a pure random variable that was not affected by on-going reading processes. In

contrast, saccade direction and size were under complete cognitive control18. The minimal-control

model did not cover many empirical findings; therefore a revised model, the TDPC

model, was derived to “take into account the local variables that have the largest effects on eye

movements” (p. 472). Because the revised model does not change the fundamental architecture

of the “minimal control” model, the following discussion is primarily based on the initial model.

18 Suppes (1990) was inconsistent about this. Despite the facts that (a) the axioms unequivocally showed that

saccade targeting was determined by the underlying cognitive processing, and (b) he clearly stated that “direction

and size of saccade are under cognitive control in this minimal model” (p. 466), Suppes maintained the following:

“It was assumed that most of the process is an automatic low-level process, little disturbed by cognitive and

linguistic aspects of reading. The two basic assumptions of the minimal control model were (a) durations of

fixations are not affected by the content of the reading text, and (b) the length of saccades is not influenced by text

context but only by the physical layout of the page” (p. 465).

Model architecture. Suppes’ models were defined in terms of axioms, or fundamental

hypotheses about the principles of eye-movement control. A system of axioms was then

translated into mathematical functions, for instance, a distribution density function of fixation

duration. Some of the axioms would undoubtedly surprise mainstream reading researchers. For

example,

AXIOM F1. The execution time of each eye-control instruction is independent of past processing and the present stimulus context.

… AXIOM D1. If processing is complete in a given region of regard,

then move to the next word of text.

… AXIOM D5. A saccade is independent of past motion and earlier

stimuli.

With respect to fixation duration, the axioms implied that it should be a mixture of (a) an

exponential random variable and (b) a convolution of two identical exponential distributions.

For saccade programming, Suppes proposed a Markov model that was more intuitive

than Stark’s scanpath formulation. He categorized saccade moves into five states: move forward,

regress, refixate, skip the next word, and others. According to the axioms in the minimal-control

model, saccade programming was a zero-order Markov process, also known as a “random walk.”

At any point in time, the probabilities of making the five moves were constants,

independent of previous states19.
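The two parts of the minimal-control model can be sketched as follows. This is a minimal illustration under assumptions, not Suppes' own formulation: the single exponential and the two convolved exponentials are assumed to share one rate parameter, and the mixing weight, the rate, and the move probabilities are hypothetical values.

```python
import math
import random

MOVES = ["forward", "regress", "refixate", "skip", "other"]

def fixation_density(t, w, rate):
    """Mixture of an exponential and a convolution of two identical exponentials."""
    single = rate * math.exp(-rate * t)
    double = rate ** 2 * t * math.exp(-rate * t)   # sum of two Exp(rate) variables
    return w * single + (1.0 - w) * double

def sample_fixation(w, rate, rng):
    """Draw one fixation duration from the mixture."""
    t = rng.expovariate(rate)
    if rng.random() >= w:          # with probability 1 - w, add a second stage
        t += rng.expovariate(rate)
    return t

def sample_moves(probs, n, rng):
    """Zero-order process: every move is drawn with the same fixed probabilities."""
    return rng.choices(MOVES, weights=probs, k=n)

rng = random.Random(0)
w, rate = 0.3, 0.01                                # hypothetical; rate per msec
durations = [sample_fixation(w, rate, rng) for _ in range(20000)]
moves = sample_moves([0.6, 0.1, 0.15, 0.1, 0.05], 1000, rng)
print(sum(durations) / len(durations), moves[:5])
```

Because the move probabilities never depend on the previous move or on the text, neither word frequency nor any other covariate can enter this part of the model without changing its structure.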

The revised TDPC model added only one change to the fixation duration axioms – the

execution time of each eye-control instruction decreases monotonically along the line of text

(Heller, 1982). Factors that have been central to other models, such as word frequency or

syntactic effects, were dismissed as having “only relatively small effects” (Suppes, 1990, p. 473).

More changes were made to the axioms for saccade control, incorporating the effects of the

optimal viewing position, word length, and syntactic difficulty. However, these patches were

added in such a haphazard fashion that it became impossible to evaluate the mathematical

properties of the model.

Parameter estimation, model testing, and model comparison. The distribution of fixation

durations was a fully parameterized mathematical model, which had been fitted to eye-

movement data from Suppes’ arithmetic experiments. Models with the best fitting parameters

showed a “reasonably good” fit, but Suppes acknowledged that they would have been rejected by

a formal goodness-of-fit test. He did not report the fitting of any reading data. There are reasons

to believe that the fit would not be better than that of the arithmetic data20.

19 Suppes (1990) was not consistent on the nature of the Markov process. While he clearly intended to promote a

random-walk model (p. 467), a few axioms referred to an undefined concept of “processing.” Depending on the

outcome of the processing, different saccadic moves might be taken. This violated the basic assumptions of a

random-walk model.

20 Suppes (1990) acknowledged that fixation durations in reading were typically less variable than those in doing

arithmetic; an exponential-based model therefore may not work well. Furthermore, the mixture distribution Suppes

proposed typically shows two modes, but the distribution of reading fixation durations is usually unimodal.

Suppes did not develop the saccade control system in any depth beyond the five axioms.

This part of the model was not explicitly expressed in a mathematical form. No quantitative test

of the models was given in Suppes (1990; 1994). The choice of the TDPC model over the

minimal control model was based solely on theoretical analyses.

Comments. Although his was an extremely limited attempt, Suppes (1990; 1994) outlined the

possibility of using Markovian models to describe reading eye movements, both fixation durations

and saccades. An obvious problem with the Markov models in both Stark’s (1994) and Suppes’

models is that they were not flexible enough to take into account other factors, such as word

frequency. A Markov model with a hierarchical structure will be explored in the current research.

In addition, Suppes’ model is one of the first attempts to explicitly model the distribution

of fixation durations. Although it failed (McConkie & Dyre, 2000), it called much-needed

attention to the importance of modeling not only the means but also their distributions.

Normal Eye Movements: McConkie and colleagues' mathematical modeling

The goal of McConkie and colleagues' research is best summarized by the title of

McConkie, Kerr, and Dyre (1994) – “What are ‘normal’ eye movements during reading: toward

a mathematical description.” Some of their representative studies include the modeling of

landing position distributions (McConkie et al., 1988; Radach & McConkie, 1998), refixation

frequencies (McConkie et al., 1989; Radach & McConkie, 1998), skipping rates (Kerr, 1992;

McConkie et al., 1994), regressions (Vitu & McConkie, 2000; Vitu, McConkie, & Zola, 1998),

and distributions of fixation durations (McConkie & Dyre, 2000; McConkie et al., 1994).

Summarizing this line of research turns out to be difficult, because models for individual

components are still evolving and pieces of the model have not been completely put together.

Nevertheless, the central theme of this line of research is to mathematically describe regularities

and constraints that are inherent in eye-movement data. Many of its findings have become the

foundations of other modeling efforts (e.g., Reichle et al., 1998; Reilly & O'Regan, 1998).

McConkie and colleagues decomposed the problem of reading eye-movement control

into two separate decisions: (a) where to move the eyes and (b) when to move them. With respect

to the WHERE decision, a further distinction has been made between where the eyes are

intended to go and where they actually land. Therefore there are three main components in

McConkie and colleagues’ eye-movement control model: saccade target selection, saccade

execution, and fixation duration control.

Saccade execution. McConkie et al. (1988) found that the landing positions of fixations

relative to a word formed a bell-shaped curve centered near the center of the word (see Figure 5A

and 5B). The shape of the curve could be approximated with a normal distribution, whose mean

and variance were functions of the launch site (planned saccade length, PSL, in Reichle et al.,

1998) and word length, among other factors. McConkie et al. (1988) proposed that saccades

were targeted at word centers but missed the targets because of two sources of error in the visuo-

motor system. A saccadic range error was responsible for the systematic overshooting of near

targets and undershooting of far away targets. A random placement error caused the random

spread in landing positions. Together the landing position distribution could be summarized with

a linear regression function, as discussed in the E-Z Reader model and the Reilly and O’Regan

model.

McConkie, Kerr, and Dyre (1994) concluded that landing position was not under the

control of higher-level processes. McConkie et al. (1994) reported that the landing position

distributions on pseudo-words or nonsense letter strings, embedded in continuous text, were

essentially the same as those for normal words. This was further confirmed in Radach and

McConkie (1998), which found that landing position distribution was affected by word length

and word position in a line, but not by the duration of the previous fixation or the

“informativeness” of the initial trigram of the next word. These findings suggested that saccade

execution should be modeled independently from cognitive processes.

Saccade target selection. An essential assumption in McConkie and colleagues’

framework is that eye movements are targeted at the center of words when they are planned.

Which words are selected to be the targets, then, becomes the key question. Three types of eye

movements are particularly interesting – refixations, word skipping, and regressions.

1. Refixations. McConkie et al. (1989) examined the frequency of refixating a word

immediately following the first fixation on it. Based on a large corpus of reading eye

movements, they found that the frequency of refixation is a U-shaped function of the initial

landing position on the word. The probability of making a refixation is higher if the eye lands

near the ends of a word than at the word center. McConkie et al. concluded that the initial

landing position is the primary determinant of refixations. In addition, Radach and McConkie

(1998) analyzed landing positions as a function of launching site for both forward and regressive

saccades and concluded that there is no evidence for different mechanisms, which questioned the

basic hypothesis of the strategy-tactic theory (O'Regan, 1990).

2. Skipping. McConkie, Kerr, and Dyre (1994; see also Kerr, 1992) found that the

frequency of skipping the next word could be expressed in a three-parameter function21:

$$p_{\mathrm{skip}} = \mathit{Max} - \frac{\mathit{Max} - \mathit{Min}}{1 + e^{A \times \mathit{LaunchSite} - B}}$$

where Max is the maximum of the curve and equals 1, Min is the minimum value reached by the

function, A controls how rapidly the function rises, and B is the inflection point of the curve. The

parameter values depended on word length, as shown in Figure 6.
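
The skipping function can be written as a short routine. The sketch below assumes the logistic form p_skip = Max - (Max - Min) / (1 + e^(A × LaunchSite - B)), with launch site expressed as a signed letter position relative to the word (more negative means farther away). The parameter values in the usage lines are hypothetical and are not the estimates shown in Figure 6.

```python
import math

def skip_probability(launch_site, a, b, min_p, max_p=1.0):
    """Logistic skipping function.

    Returns max_p - (max_p - min_p) / (1 + exp(a * launch_site - b)).
    With a > 0 the probability rises from min_p toward max_p as the
    launch site approaches the word.
    """
    return max_p - (max_p - min_p) / (1.0 + math.exp(a * launch_site - b))

# Hypothetical parameter values for one word length (assumptions only).
p_far = skip_probability(-10, a=0.5, b=-2.0, min_p=0.1)
p_mid = skip_probability(-4, a=0.5, b=-2.0, min_p=0.1)
p_near = skip_probability(-1, a=0.5, b=-2.0, min_p=0.1)
```

With these values the curve is halfway between Min and Max at a launch site of B / A = -4, which is where the exponent equals zero.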

McConkie, Kerr, and Dyre (1994) hypothesized a word-skipping mechanism based on

the concept of a visual clarity threshold that must be met for a word to be skipped. The above

equation could be interpreted as the proportion of words exceeding the threshold for a given

distance (measured as launching site). Brysbaert and Vitu (1998) proposed a similar theory based

on the “Extended Optimal Viewing Position (EOVP)” effect (Brysbaert, Vitu, & Schroyens,

1996), where the eye guidance system constantly estimated the probability of recognizing a

peripheral word within typical fixation duration. The system would probabilistically skip words

that were highly likely to be recognized at the end of the current fixation. Brysbaert and Vitu

(1998) obtained good fit to empirical skipping rate data with a one-parameter model.

Determining whether or not to skip a word is only part of saccade programming. To

complete the picture one needs to know how the saccade targeting system selects among many

potential targets. Neither McConkie et al. (1994) nor Brysbaert and Vitu (1998)

addressed this issue.

21 McConkie, Kerr, and Dyre (1994) presented the equation in an equivalent but slightly confusing form:

$$p_{\mathrm{skip}} = 1 - \frac{1 - \mathit{Min}}{1 + e^{A \times \mathit{LaunchSite} - B}}$$

3. Regressions. The phenomenon of regressions has been less well understood, in part

because of the long-held belief that they were results of comprehension break-down and thus

should be excluded from analysis (e.g., Reichle et al., 1998). Most recently, McConkie and

colleagues (Radach & McConkie, 1998; Vitu et al., 1998) have made some intriguing discoveries

about regressions. Vitu et al. found that both low-level factors (e.g., the length of the previous

saccades) and linguistic factors (e.g., word frequency of skipped words) affected the likelihood

of regressing after a word is skipped. Their results indicated that the phenomenon is complex and

is unlikely to have a single cause.

Radach and McConkie (1998) looked at the question of whether regressions are

generated by a different mechanism from that which produces other kinds of saccades. The

analyses of launch site effects showed that there was little systematic range error in interword

regressions (see Figure 7). Regressive refixations, on the other hand, showed the same range

errors and random errors as forward saccades did. Their results indicated that the control of

interword regressions was functionally different from that in making forward saccades or

refixations.

Fixation duration. Early attempts to model the distribution of fixation durations have

been incomplete and unsuccessful (Harris, Hainline, Abramov, Lemerise, et al., 1988; Suppes,

1990), in part because their model choices were mainly based on theoretical speculations22. In

contrast, McConkie, Kerr, and Dyre (1994) and McConkie and Dyre (2000) emphasized the

inherent constraints in the data.

22 Suppes’ (1990) fixation duration model was derived from axioms that had no empirical support (at least

in reading research). Harris et al. (1988) presumed that saccade latency involved two (independent) consecutive

processes. This is logically possible, but there has not been experimental evidence to support it.

McConkie, Kerr, and Dyre (1994) studied the hazard function23 of the first fixation

duration distribution, and found it could be approximated by three piecewise linear functions – a

slow-rising early piece, a fast-rising period, and a flat, constant tail. Their subsequent modeling

effort capitalized on this characteristic form of a hazard function.
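
To make the notion of an empirical hazard function concrete, the following sketch estimates a binned hazard rate from a list of fixation durations: the number of fixations ending in a bin divided by the number still in progress at the start of that bin, scaled by the bin width. The function name and the 25 msec bin width are assumptions made for illustration and are not taken from McConkie, Kerr, and Dyre (1994).

```python
def empirical_hazard(durations, bin_width=25.0):
    """Estimate a binned hazard rate from non-negative durations (msec).

    Returns a list of (bin_start, rate) pairs, where
    rate = (fixations ending in the bin) / (fixations at risk * bin_width).
    """
    if not durations:
        return []
    n_bins = int(max(durations) // bin_width) + 1
    counts = [0] * n_bins
    for d in durations:
        counts[int(d // bin_width)] += 1
    hazards = []
    at_risk = len(durations)  # fixations that have not yet ended
    for i, ended in enumerate(counts):
        hazards.append((i * bin_width, ended / (at_risk * bin_width)))
        at_risk -= ended
    return hazards

# Tiny made-up sample, used only to show the bookkeeping.
example = empirical_hazard([10, 30, 30, 60], bin_width=25.0)
```

Plotting the rate against the bin start for a real dataset would show whether the curve has the slow-rising, fast-rising, and flat segments described above.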

Like Harris et al. (1988), McConkie, Kerr, and Dyre (1994) hypothesized a two-step

process – ordering a saccade and executing a saccade. They further assumed that once a saccade

was ordered, there was a random waiting time before the saccade was executed. The random

waiting time was assumed to follow an exponential distribution24. The time to order a saccade

was modeled by a mixture of two Weibull components with linear, rising hazard functions (for

discussion of the Weibull distribution, see Johnson, Kotz, & Balakrishnan, 1994). There was no

theoretical reason to choose the Weibull distributions except that they characterized the empirical

hazard functions. Putting the two steps together, the distribution of fixation durations (sum of

ordering and executing times) was the convolution25 of the two components. This “two-stage

mixture” model fitted the empirical distribution very well, as seen in Figure 8, although no

goodness-of-fit statistics were reported.

23 A hazard function, loosely speaking, characterizes the instantaneous probability of an event happening given that

it has not yet happened. Formally, it can be defined as a function of the probability density function, f(t):

$$h(t) = \frac{f(t)}{1 - \int_0^t f(x)\,dx}$$

Luce (1986) demonstrated that, compared to the cumulative probability function or the probability density function,

the hazard function was more readily interpretable and more sensitive in differentiating distributions.

24 Interestingly, in Harris et al.’s (1988) model, the exponential component, the “β-period,” corresponded to the

wait-time for ordering the next saccade, not the executing time. McConkie et al.’s (1994; McConkie & Dyre, 2000)

interpretation is problematic because a mechanism with exponential wait-time would be too unreliable to carry

out saccadic movements, one of the most frequent movements in humans. In the reaction time literature, there had been

similar confusions, and the consensus now is that the exponential component corresponds to cognitive or signal

processing rather than to the execution (see Luce, 1986).
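
The structure of the two-stage mixture model can be sketched by simulation: an ordering time drawn from a mixture of two Weibull components, plus an exponentially distributed waiting time. A Weibull shape parameter of 2 is used because it yields a hazard that rises linearly with time. The mixing weight, the two scale parameters, and the mean waiting time below are hypothetical values, not the estimates reported by McConkie, Kerr, and Dyre (1994).

```python
import random

def sample_fixation_duration(rng, mix_weight=0.3, scale_early=150.0,
                             scale_late=250.0, mean_wait=30.0):
    """Draw one fixation duration (msec) as ordering time + waiting time.

    All default parameter values are illustrative assumptions.
    """
    # Step 1, ordering a saccade: mixture of two Weibull components.
    # Shape 2 gives a linearly rising hazard, h(t) = 2t / scale**2.
    scale = scale_early if rng.random() < mix_weight else scale_late
    ordering_time = rng.weibullvariate(scale, 2.0)
    # Step 2, executing the saccade: exponential wait (constant hazard).
    waiting_time = rng.expovariate(1.0 / mean_wait)
    # The sum of the two independent times follows the convolution
    # of their distributions.
    return ordering_time + waiting_time

rng = random.Random(42)
simulated = [sample_fixation_duration(rng) for _ in range(20000)]
mean_duration = sum(simulated) / len(simulated)
```

Sampling the sum directly avoids computing the convolution integral; a histogram of the simulated values approximates the density that the model would fit to empirical data.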

Following this initial success, McConkie and Dyre (2000) explored two additional

models – a “two-state transition” model and a “two-stage race” model. Although the three

models, including the 1994 “two-stage mixture” model, have different assumptions about the

underlying mechanisms that determine fixation duration, they were designed to closely mimic

the piecewise linear hazard function of the empirical data. Consequently, they fit empirical data

equally well. There was no evidence that one mechanism was more plausible than another.

Comments. While there has not been a unified model, this line of research has

contributed much quantitative knowledge to our understanding of reading eye movements. The

power of the data-driven modeling approach is self-evident as two competing models – the E-Z

Reader 6 model (Reichle et al., 1999) and Reilly and O’Regan’s (1998) model – both

implemented McConkie et al.’s (1988) formulas.

With respect to saccade programming, McConkie et al.’s (1988) proposal of a two-level

saccade control model has been widely accepted. In this hierarchical model, cognitive effects are

confined to the level of selecting target words, and have only discrete control – selecting

which word but not where in the word to land the eyes. The continuous nature of saccade length

is a result of random and systematic errors, and saccade execution is conditionally independent

of higher processes. This conceptualization greatly simplified the interpretation of saccade

control in reading. The SHARE architecture is an extension of this probabilistic, hierarchical

structure.

25 The distribution of the sum of two independent random variables is the convolution of the two distributions. Mathematically,

$$h_{f+g}(t) = \int_0^t f(x) \cdot g(t - x)\,dx$$

McConkie et al.’s (1994; McConkie & Dyre, 2000) modeling of fixation duration

distribution is also inspiring. The reason for their unprecedented successes is not a superior

theory or mechanism, but their data-driven modeling approach – the choice of using piece-wise

linear models to estimate empirical hazard functions. This suggested that one might go a step

further and question the only major a priori mechanism hypothesis in their models, the

assumption of the saccade ordering and executing steps.

CHAPTER 3. DESIGN PRINCIPLES

The previous chapter surveyed some of the previous attempts to quantitatively account

for reading eye movements. Their successes and failures illustrate some important issues that any

quantitative model trying to describe reading eye movements has to address. A modeler has to

make conscious decisions about them. The choices will constrain his or her modeling

approaches.

Eight such issues are presented below as dichotomies, although the choices are often

neither mutually exclusive nor limited to two. They represent the decision process through which

the current model has been shaped, and provide a framework for presenting the rationale for the

basic modeling choices made in the research to follow.

Theory-driven vs. Data-driven Modeling

Rayner (1995, see chapter 1) raised an important issue – do we need a theory of eye

movements in order to measure and describe them? The question may be pursued in two senses:

whether we should try to describe eye movements without subscribing to a particular theory, and

whether we are able to do so.

My response to the first question is that we should try to develop a theory-neutral

descriptive framework for eye-movements, to the extent we can. Current theories of reading eye-

movement control – e.g., the strategy-tactics theory (O'Regan, 1990; O'Regan & Jacobs, 1992)

and theories based on Morrison (1984; e.g., Rayner & Pollatsek, 1989) – are collections of

hypotheses about the underlying mechanisms and processing. While these hypotheses are

inspired by empirical findings, there is no evidence that any particular theory is indisputable.

The field of reading eye movement research has not reached a stage where theories are well

established and few facts are left to be found. On the contrary, as some of the most recent studies

suggest (e.g., McConkie & Dyre, 2000; Shillcock, Ellison, & Monaghan, 2000), we are just

starting to discover some of the basic constraints and regularities of eye movements. At this

point, our observations should not be limited and biased by existing theories and models.

The extent to which we can describe reading eye movements without subscribing to a

particular theory is an empirical question. The SHARE architecture is an attempt to model eye

movements with a minimal number of assumptions about the underlying mechanisms and

processes. The current research approaches the problem by analyzing the logical constraints for

the modeling task, carefully selecting the mathematical model, and employing powerful

algorithms to estimate model parameters. The goal of the model is to capture the “essence” of

eye movement patterns so that it can reproduce eye movements with the same pattern, or predict

the next fixation, among other things.

What can we gain from an “atheoretical26” model, assuming it does achieve its goal? First

of all, such a data-driven modeling approach is just an extension of several lines of successful

research looking for structures in the eye movement data. By using a more powerful

mathematical model (see discussion in Chapter 1), more should be learned about the inherent

regularities in the data. Secondly, although the model does not hypothesize about the

mechanisms, it tests whether a mathematical structure is adequate to describe some aspect of eye

movements, which in turn constrains potential mechanisms. Last but certainly not least, the

ability to faithfully describe eye-movement patterns will enable many applications of eye-

movement methodology that were previously unavailable.

26 The term is used in contrast with a model based on a particular existing theory, in particular a theory that heavily

emphasizes hypothetical mechanisms. There is no such thing as atheoretical modeling. Every mathematical

operation imposes, explicitly or implicitly, structure and assumptions on the subject matter, and these assumptions

are part of the theory. Consider, for example, why the model “1+1=2” fails to model the volume of a cup of sugar

mixed with a cup of water, or what a better-fit model “1+1=1” (more correctly f(1,1)=1) reveals about the

underlying mechanism of the above mixing process. The assumptions of the current model will be discussed in the

rest of this chapter and the next chapter.

In short, a data-driven modeling approach is a valuable way to contribute to our

understanding about reading eye movements, and at the current state of knowledge it is a much-

needed complement to the development of eye-movement mechanisms. The rest of the chapter

discusses some of the important modeling decisions in choosing the modeling structures and

tools.

Deterministic vs. Probabilistic Modeling

There is enormous variation in reading eye movements. One may try to account for every

bit of the variation in a model, or assume at least part of the variation is due to random

fluctuation. The models surveyed in the last chapter vary along this dimension. The READER

model (Thibadeau, 1983; Thibadeau et al., 1982) exemplifies the deterministic approach, where

variation in gaze duration was precisely determined by the intricate comprehension processes. At

the other extreme, Suppes (1990; 1994) hypothesized that fixation duration was a pure random

variable independent of any other factors.

Most models took the middle ground, but the sources of random variance were

introduced very differently. The noise in Reilly’s (1993) connectionist model was built into the

neural network architecture and training. Both the E-Z Reader and the strategy-tactics models

introduced arbitrary (and different) random variance to lexical and oculomotor processes. It is

particularly interesting for the E-Z Reader model, because Morrison’s original model was

presented as a deterministic machine. Neither model took the step to verify that their models

have probabilistic characteristics similar to the empirical data27. In contrast, distributional

properties of random components, such as means and standard deviation, were directly taken

from McConkie and colleagues’ estimates (McConkie & Dyre, 2000; McConkie et al., 1988;

McConkie et al., 1989).

The most illuminating example on the issue of deterministic versus probabilistic

modeling is Mr. Chips. The basic model was purely deterministic. Every move was carefully

calculated to minimize lexical uncertainty. However, the outcome of the complex deterministic

process could be modeled with surprisingly simple probabilistic heuristics. It suggests the

strength of probabilistic modeling, even if there is a complex deterministic underlying

mechanism. The current research employs a probabilistic framework.

The WHEN and WHERE Decisions

The WHEN and WHERE decisions refer to the mechanisms that determine fixation

duration and saccade length, respectively. Not all models reviewed above considered both

dimensions. Of those that did, the READER (Thibadeau et al., 1982) assumed a single

mechanism – reading comprehension – determined both, whereas in Suppes (1990) the two

decisions were completely independent. In both E-Z Reader and strategy-tactics models the two

decisions were made through interactions between the lexical and the oculomotor systems.

27 Reichle et al. (1998) showed figures of distributions of simulated and empirical fixation duration measures and

claimed that they were similar without any quantitative support. The fittings were far from satisfactory compared to

McConkie and Dyre’s (2000) work. The simulated distributions would almost certainly be rejected as appropriate

models if any statistical analysis were performed.

There is strong neurophysiological evidence that there exist two separate pathways, one

carrying spatially coded information and the other conveying the triggering signal of saccades

(e.g., van Gisbergen, Gielen, Cox, Brujins, & Schaars, 1981). Behavioral data also support the

separation of the two pathways (Kingstone & Klein, 1993; Walker, Kentridge, & Findlay, 1995).

These motivated Findlay and Walker (1999) to model the two pathways as a loosely coupled

parallel system, in which cognitive factors may affect both pathways but via different

mechanisms.

Whether the WHERE and WHEN pathways are closely or loosely coupled systems has to

be determined empirically. As a general architecture, the two pathways should be represented

separately, while still allowing interdependencies between the two systems. On the other hand, a

modular model, in which subsystems are only loosely connected, seems to be more desirable for

model fitting and interpreting. Therefore, in the SHARE model the two pathways are

implemented as separate subsystems that can be statistically dependent on each other. But the

first model built on the basis of SHARE will assume they are conditionally independent

subsystems. Whether or not they should be modeled as stochastically dependent processes is a

question to be answered by the fit of the model to empirical data.

Linguistic vs. Low-level Variables

There is no doubt that eye-movement decisions are not independent of what is on the

page. But whether eye movements are driven by high-level linguistic variables (e.g. word

frequency and contextual predictability) or by low-level visual factors (e.g. word length and

landing position) is under theoretical debate. This is clearly reflected in the various quantitative

models, each of which proposed some idiosyncratic set (including the empty set in the case of

Suppes’ model) of variables that determine fixation duration and saccade targeting.

The strategy for the SHARE architecture is to give all variables equal opportunities, and

let data determine which variable is relevant to which eye-movement outcome. As a first step,

the current implementation includes two relatively uncontroversial variables, namely the

frequency of the currently fixated word and the length of the next word (see Rayner, 1998),

which represent linguistic and low-level information, respectively. The model is not limited to

these two variables, however. It is designed to make it easy to incorporate other variables

without changing the fundamental structure of the model.

Time-series vs. Independent Data

Eye movements occur in order; therefore, they naturally constitute time-series data. Most

eye-movement research tries to summarize eye movements using statistical models designed for

independent samples, for example, by using composite variables and analysis of variance.

However, unless one can prove eye movements are time-independent, they should be modeled as

time-series data. In other words, the burden of proof is on those who treat eye movements as

independent samples.

There have been attempts to study the temporal relations of eye movements. Several

studies calculated autocorrelations among eye movements and found them to be negligible

(Andriessen & De Voogd, 1973; Hogaboam, 1983; Rayner & McConkie, 1976). However, a zero

correlation coefficient does not guarantee statistical independence. There is empirical evidence

that eye movements are not independent samples. For example, regressions are more likely to

occur after long forward saccades (Andriessen & De Voogd, 1973). McConkie et al. (1988;

1989) found that various aspects of an eye movement (e.g., probability of word skipping) depend

on the characteristics of the previous eye movement (e.g., landing position and launch site).
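
The point that a zero correlation coefficient does not guarantee statistical independence can be shown with a toy example. The numbers below are made up and have nothing to do with any eye-movement dataset; they only demonstrate that one variable can be completely determined by another while their Pearson correlation is exactly zero.

```python
def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]  # y is a deterministic function of x
r = pearson_r(xs, ys)     # the linear association is nevertheless zero
```

Correlation measures only linear association, so a negligible autocorrelation between successive eye movements leaves open other forms of dependence, such as the conditional effects cited above.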

The survey of quantitative models leads to a similar conclusion. Although Suppes’ (1990)

minimal-control model assumed that both fixation duration and saccade moves were

independent, identically distributed random variables, all other models treated fixation duration

and saccade length as time-dependent.

In conclusion, there is no strong a priori reason to believe eye movements can be

modeled as independent samples. Therefore, reading eye movements should be modeled as time-

series data. On the other hand, most temporal connections proposed in the literature are relatively

short term – in most cases between adjacent eye movements. This suggests a relatively simple

stochastic model may be sufficient to capture these relations.

Discrete vs. Continuous Control

Eye-movement data – fixation duration and saccade length – are continuous, but that does

not necessarily preclude the possibility that they were “intended” to be discrete. For example,

Radach and McConkie (1998) argued that saccade programming is discrete. They suggested that

saccades are targeted at word centers, and the spread of landing position is a result of errors in

the oculomotor system (McConkie et al., 1988; O'Regan, 1990; Radach & McConkie, 1998; see

also Rayner, 1998). The discrete-control model is in contrast with continuous-control theories

(e.g., Liversedge & Underwood, 1998), in which eye movements are directly aimed at particular

locations in words.

Theoretical debates aside, the discrete-control conceptualization offers some advantages

from a modeling point of view. For example, it insulates the effects of cognitive factors from

saccade execution details, so that the subsystems can be modeled separately. A discrete

stochastic system is also easier to model than a continuous one, and is often more interpretable.

One concern with the discrete-control approach is that the underlying mechanism may be

truly continuous. The Mr. Chips model sheds some light on this issue. The Mr. Chips model was

a strict continuous-control model, in which saccade length is meticulously calculated to

maximize information. However, the saccadic “behaviors” could be well modeled as outcomes

of a probabilistic, discrete control system in which eye movements were directed to the optimal

viewing position of each word. Therefore, to the degree that descriptions of eye movements can

be separated from the possible underlying mechanisms, a discrete-control model provides at least

a good approximation of the eye movement outcome. Because of its relative simplicity and the

likelihood that continuous data can be modeled via discrete underlying processes, it makes sense

to begin with a discrete model of eye-movement control.

While a discrete-control theory for saccade programming (McConkie et al., 1988) has

been widely accepted, fixation duration, on the other hand, has almost always been assumed to

be under continuous control. Our survey shows that the most popular, unchallenged assumption

is that fixation duration (e.g., first fixation duration or gaze duration) is a linear function of the

logarithm of word frequency (Just & Carpenter, 1980; Reichle et al., 1998; Reilly & O'Regan,

1998; see also Rayner, 1998, for a review). In some quantitative models (Reichle et al., 1998;

Reichle et al., 1999; Reilly, 1993; Reilly & O'Regan, 1998), it is also a continuous function of

landing position (eccentricity), word length, and duration of the previous fixation.

In fact, there is empirical evidence hinting at a discrete control system in the WHEN

pathway. Distributional analyses of fixation duration have shown that linguistic factors such as

word frequency (McConkie, Reddix, & Zola, 1992) or semantics (Feng, Miller, Zhang, & Shu,

2001) tend to have strong effects on some fixations and little effect on others. These findings

contradict traditional continuous-control models based on linear regressions (Reichle et al., 1998;

Reilly & O'Regan, 1998; Thibadeau et al., 1982), which assume linguistic factors affect all

fixations by changing the means of fixation durations.

The clearest demonstration of the existence of different kinds of reading fixations is Yang

and McConkie (in press), in which they experimentally manipulated the information readers

could perceive at any given fixation using the eye-movement contingent display change

technique (McConkie & Rayner, 1973). The manipulations to the text ranged from extreme (such

as blanking the whole page or replacing a line of text with X’s) to modest (replacing text with

non-words or filling all spaces with a symbol). Yang and McConkie found three categories of

fixations (see Figure 9). The first group included short fixations (shorter than approximately 125

msec), which occurred even when all visual information was removed. The second group peaked

at approximately 175 to 200 msec. These fixations did not require linguistic information but the

content being fixated needed to be “text-like.” For instance, the positions of the peaks of these

fixations were largely unaffected when a line of text was replaced with X’s but the spaces were

preserved, but the distributions were severely suppressed when the spaces were removed. Lastly,

there was a group of long fixations that peaked roughly at around 350 msec and extended well

beyond 700 msec in some cases.

Corroborating evidence for the existence of three distinct types of fixations also came

from oculomotor research. Gezeck, Fischer, & Timmer (1997) also found, in simple saccadic

reaction time experiments, three distinct categories of fixations – “express” (90-120 msec), “fast

regular” (135-170 msec), and “slow regular” (200-220 msec). Interestingly, the three peaks are at

the same position for naive and trained subjects but the weights differ, with more express

saccades for trained subjects. The positions of the peaks differed from those in Yang &

McConkie (in press), not surprisingly given the task differences, but both strongly suggest the

existence of different categories of fixations, each having distinct parameters and possibly

responding to different information.

To determine whether fixation duration in normal reading can be modeled with a discrete

model, I fitted a mixture-of-lognormal model to fixation duration from a large dataset (details of

the study are presented in Appendix B). The hypotheses are similar to the discrete-control

framework for saccade programming. The mixture-of-lognormal model assumes a two-level

fixation duration control system. At the “control” level, there are n discrete categories of

fixations, each having different parameters (e.g., intended duration). For each fixation, the

control system chooses the appropriate kind of fixation and sends the command to the “output”

level. At the output level, the command is carried out but with random error added, which is

assumed to follow lognormal distributions (the justifications are discussed in Appendix B). Thus,

over the long run, the distribution of all fixation durations follows a mixture of lognormal

distributions.
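
The two-level scheme can be sketched as follows: the control level picks one of n discrete fixation categories, and the output level draws the observed duration from that category's lognormal distribution. The three components below (weights, medians, and spreads) are hypothetical placeholders loosely inspired by the three groups of fixations described above; they are not the fitted estimates reported in this study.

```python
import math
import random

def lognormal_pdf(t, mu, sigma):
    """Density of a lognormal distribution at t > 0."""
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))

def mixture_pdf(t, components):
    """Density of the mixture; components is a list of (weight, mu, sigma)."""
    return sum(w * lognormal_pdf(t, mu, sigma) for w, mu, sigma in components)

def sample_duration(components, rng):
    """Draw one fixation duration from the two-level model."""
    # Control level: choose a discrete fixation category by its weight.
    r = rng.random()
    cumulative = 0.0
    for weight, mu, sigma in components:
        cumulative += weight
        if r < cumulative:
            break
    # Output level: carry out the command with lognormal error.
    return rng.lognormvariate(mu, sigma)

# Hypothetical components: (weight, log of median duration in msec, sigma).
COMPONENTS = [
    (0.1, math.log(110.0), 0.20),
    (0.6, math.log(190.0), 0.25),
    (0.3, math.log(330.0), 0.35),
]
```

Fitting such a model to data would require estimating the weights and the per-component parameters, for example with an expectation-maximization procedure; the sketch covers only the generative side.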

To summarize the findings, the distributions of fixation durations can be very well fitted

with a 3-component mixture-of-lognormal model. This model not only fits group data from

children and adults, but also fits individual distributions (these results are presented in detail in

Chapter 5). Most importantly, the parameters of the three classes of fixations are largely

consistent with the estimates from Yang & McConkie (in press). This suggests that the good

fit achieved by the 3-component lognormal-mixture model is not coincidental. Based on

McConkie et al. (1988) and the above fixation duration modeling study, both WHERE and

WHEN pathways are modeled by a hierarchical probabilistic model, where eye-movement

commands are discrete at the control level and random errors come into play at the output level.

Group vs. Individual Models

Individual differences28 in reading eye movements are enormous, and they were probably

the very reason why the eye movement method attracted early researchers (Buswell, 1937; Huey,

1908). The value of the eye-movement methodology, especially in reading education, largely

depends on our ability to describe and understand these individual differences.

Nonetheless, practically all models of eye movement control are designed to eliminate

individual differences so as to model an “average skilled reader.” An understandable argument is

that after the general mechanism is discovered, individual differences may be accounted for by

simply adjusting some model parameters. Although this is not an unreasonable modeling

approach, there is no sign that many of the existing models can be easily modified to

accommodate individual differences. For example, in most of the models in the survey, the rules

(e.g., the axioms in Suppes, 1990), mechanisms (e.g., familiarity check versus lexical completion

in Reichle et al., 1998), and constraints (e.g., minimizing lexical uncertainty in Legge et al.,

1997) are hard-coded. It is unlikely that the same rules, mechanisms, or constraints will apply to

each individual under every circumstance.

28 The term “individual differences” is used loosely here to represent both inter-personal differences and

intra-personal differences under different situations, e.g., reading for different purposes.

As a descriptive model, the current model is designed to be flexible – it can be used to

describe group as well as individual eye-movement data. It imposes as few hard-coded

constraints as possible so that it can be maximally flexible in accounting for variance in eye

movements. In the meantime, its hierarchical framework helps to structure individual

differences, captured in model parameters, in a meaningful way.

Descriptive vs. Predictive Applications

The original motivation for developing the descriptive model was to use it in a predictive

application – detecting processing difficulties during reading. The idea was that if we could

faithfully describe the different eye-movement patterns during normal reading versus reading

difficulties, we would be able to predict whether the reader was experiencing processing

difficulty based on a sample of his/her reading eye movements. Furthermore, if the diagnosis can

be done accurately and quickly enough, it may be possible to provide real-time assistance to

readers who experience difficulties in reading.

There are several major obstacles in achieving this goal. Firstly, the eye-movement model

has to be flexible enough to capture both normal and troubled reading. Most previous theories or

models were unable to do this (e.g., E-Z Reader models excluded regressions). The current

model is designed to be able to accommodate a wide range of eye-movement patterns.

Secondly, prediction or diagnosis requires the model to be individualized; a set of

predefined criteria will not fit all readers. This is especially critical because the application is

intended for children, whose reading proficiency and eye movements vary substantially. In a

real-world computer assisted reading instruction setting, the system needs to quickly adapt to a

particular reader, preferably within a few practice trials. Learning a model from sparse and

incomplete data is computationally challenging because parameter estimates become unstable

and possibly biased. One of the most promising solutions to this problem is to incorporate prior

domain knowledge to guide parameter estimation (Heckerman, 1998). For example, if the reader

is a third-grade student, what we know about third-grade readers’ reading eye movements should

be used to help estimate the parameters for this particular reader.

Finally, a computer assisted reading system needs to support probabilistic decision-

making. Given a set of parameters and observed eye movements, it needs to probabilistically

decide whether or not the reader was in trouble. Previous models do not have a mechanism to

perform this task. The current model is designed to support such probabilistic classifications.
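
The probabilistic classification described above can be illustrated with Bayes' rule. The following is a minimal sketch, not the SHARE implementation: it assumes that two models have already been fitted (one to normal reading, one to troubled reading) and that each returns a log-likelihood for an observed sample of eye movements. The function name, the prior of 0.2, and the log-likelihood values are hypothetical.

```python
import math

def posterior_troubled(loglik_normal, loglik_troubled, prior_troubled=0.5):
    """Posterior probability that the reader is in trouble, via Bayes' rule.

    Both log-likelihoods are assumed to come from already-fitted models of
    normal and troubled reading (hypothetical inputs for this sketch).
    """
    log_joint_t = math.log(prior_troubled) + loglik_troubled
    log_joint_n = math.log(1.0 - prior_troubled) + loglik_normal
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    m = max(log_joint_t, log_joint_n)
    num = math.exp(log_joint_t - m)
    return num / (num + math.exp(log_joint_n - m))

# Hypothetical log-likelihoods of one short sample under the two models.
p = posterior_troubled(loglik_normal=-120.0, loglik_troubled=-118.0,
                       prior_troubled=0.2)
print(round(p, 3))
```

A decision rule would then compare the posterior with a threshold chosen for the application; the sketch leaves that choice open.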

Choosing the Mathematical Tools

This chapter identified the goals and task constraints of the current model. The model

attempts to summarize reading eye-movement patterns mathematically while being neutral about

eye-movement control mechanisms as much as possible. The eight design principles listed

above have outlined its basic structure – a hierarchical, stochastic model that fully supports

individualization and probabilistic decision-making.

What mathematical tools will serve these needs?

The Markov models (see Chapter 1) are a natural choice for modeling stochastic

processes (e.g., Bengio, 1999). Suppes (1990; 1994) used a zero-order Markov model

(independent) for fixation duration and saccade targeting. Stark’s scanpath model employed a first-

order Markov transition matrix to describe reading fixation sequences. However, as discussed

previously, classical Markov models have at least two limitations: they are only suitable for

modeling discrete events, and they do not allow the hierarchical structure necessary for

modeling reading eye movements.
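
The difference between a zero-order and a first-order Markov description can be made concrete with a short sketch. It estimates both from a sequence of discrete events by simple counting. The three-way coding of saccadic moves and the example sequence are hypothetical and are not taken from Suppes' or Stark's models.

```python
from collections import Counter

def zero_order_probs(seq, n_states):
    """Zero-order model: events are independent, so only marginal frequencies matter."""
    counts = Counter(seq)
    return [counts[s] / len(seq) for s in range(n_states)]

def first_order_matrix(seq, n_states):
    """First-order model: row i holds P(next state = j | current state = i)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for prev, nxt in zip(seq, seq[1:]):
        counts[prev][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no observations are left uniform rather than undefined.
        matrix.append([c / total if total else 1.0 / n_states for c in row])
    return matrix

# Hypothetical coding: 0 = regression, 1 = refixation, 2 = forward saccade.
moves = [2, 2, 1, 2, 0, 2, 2, 2, 1, 2]
marginal = zero_order_probs(moves, 3)
transitions = first_order_matrix(moves, 3)
```

With only ten events most transition cells are estimated from one or two observations, which is the sparsity problem that grows quickly as the number of states increases.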

In light of these problems, I chose to use the Hidden Markov decision tree (HMDT)

model (Jordan, Ghahramani, & Saul, 1997; Jordan, Ghahramani, Jaakkola, & Saul, 1998). The

HMDT is a marriage between a Hidden Markov model (HMM; Rabiner, 1989) and a

Hierarchical Mixture of Experts (HME) model (Jordan & Jacobs, 1994).

The HMM is a class of Markovian models known for its successful applications in

automatic speech recognition. It is a two-layered (representing two random variables)

probabilistic model that unfolds over time. The “state” variable is assumed to be unobservable

and follows the classical Markov process (thus the term Hidden Markov); the “output” variable

is observable and is conditionally independent of everything except for the concurrent value of

the state variable. Temporal dynamics are captured by the discrete, unobservable state variable,

whose value is probabilistically revealed by the observed output variable. For example, words

are composed of phonemes; phonemes are discrete, abstract categories that are not directly

observable, but they are probabilistically related to the observable acoustic waveforms. One way

to do speech recognition is to model this relationship with an HMM, where phonemes

correspond to the different states of the state variable, the output variable represents various

acoustic features of the speech, and words are characterized by different state-transition

probabilities, i.e., phoneme sequences. The goal of the HMM is often to probabilistically

determine the most likely value or value sequence of the (unobservable) state variable from a

given sequence of input, i.e., to “recognize” phonemes or words from the waveforms. In order to

do this, the HMM has to be “trained” with training data to optimize model parameters – the

recognition accuracy clearly depends on the model’s ability to capture the statistical regularities

in data.
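
The recognition step described above, finding the most likely hidden state sequence given the observations, is commonly solved with the Viterbi algorithm. The sketch below is a generic textbook version for a discrete HMM, not code from this project; the two-state toy parameters are hypothetical.

```python
import math

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden-state sequence for a discrete HMM (log-space Viterbi)."""
    n_states = len(start_p)
    log = lambda p: math.log(p) if p > 0 else float("-inf")
    # score[s] = best log-probability of any path ending in state s.
    score = [log(start_p[s]) + log(emit_p[s][obs[0]]) for s in range(n_states)]
    back = []
    for o in obs[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda r: score[r] + log(trans_p[r][s]))
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log(trans_p[best_prev][s])
                             + log(emit_p[s][o]))
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Toy example with two hidden states and two output symbols (all numbers hypothetical).
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]   # states tend to persist
emit = [[0.8, 0.2], [0.2, 0.8]]    # state 0 mostly emits symbol 0, state 1 symbol 1
path = viterbi([0, 0, 1, 1, 1], start, trans, emit)
```

In the speech analogy, the hidden states play the role of phonemes and the output symbols stand in for quantized acoustic features.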

The HME is a probabilistic decision tree model for classifying independent samples.

Statistically it is closely related to the multinomial logit models, a special form of generalized

linear models (GLIM; McCullagh & Nelder, 1983). In its simplest form (e.g., see the HME

example in Murphy, 2001), the HME may be reduced to a piece-wise linear regression model.

However, its power lies in the hierarchical structure, where there are multiple layers of “gating”

variables, or “experts.” As the input goes down the hierarchy of “experts,” the data space is

recursively divided, until at the end the final categorization is reached. Thus, HME outperforms

pure linear models and other models in complex data clustering tasks (Jordan & Jacobs, 1994).
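
The reduction of the simplest HME to piece-wise linear regression can be sketched with a single gating level. This is an illustrative sketch only: a full HME nests several such gates, and the gate and expert parameters below are hypothetical values chosen so that the gate switches sharply at x = 0.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_predict(x, gate_params, expert_params):
    """One-level mixture of experts for a scalar input x.

    gate_params:   list of (weight, bias) pairs, one per expert, fed to a softmax gate
    expert_params: list of (slope, intercept) pairs, one linear expert each
    Returns the gate-weighted average of the experts' predictions.
    """
    gate = softmax([w * x + b for w, b in gate_params])
    preds = [slope * x + intercept for slope, intercept in expert_params]
    return sum(g * p for g, p in zip(gate, preds))

# Hypothetical parameters: a sharp gate that switches experts near x = 0, so the
# overall fit is approximately piece-wise linear (y = -x for x < 0, y = 2x for x > 0).
gate_params = [(-20.0, 0.0), (20.0, 0.0)]
expert_params = [(-1.0, 0.0), (2.0, 0.0)]
left = moe_predict(-3.0, gate_params, expert_params)
right = moe_predict(3.0, gate_params, expert_params)
```

A softer gate (smaller weights) would blend the two experts near the boundary instead of switching abruptly.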

The HMDT architecture integrates the best features from both HMM and HME models. It

may be viewed as an HMM with multiple “state” layers instead of one, which makes it possible

to model more complex control mechanisms. Alternatively, it can also be seen as an HME with

temporal structures, which allows it to model not only independent data but also time-series data.

The current model uses a three-layer HMDT model, which is also known as the Input-

Output Hidden Markov model (IOHMM; Bengio, 1999; Bengio & Frasconi, 1996). In the

IOHMM terminology, word frequency, word length, and landing position are “input” variables,

and fixation duration and saccade length are “output” variables. Between the input and output

layers is the eye-movement control layer, represented as the “state” variables in the IOHMM.

Looking at the static structure, the generation of eye movement commands (word targeting and

fixation categories) at the control layer is probabilistically affected by the linguistic and visual

input variables, and the actual eye movements, the output variables, are probabilistically

controlled by the eye movement commands. In the temporal dimension, an eye-movement

decision is probabilistically based on not only the current input variables but also on the previous

eye movement.
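
The defining feature of an input-output HMM, state transitions that depend on the current input as well as on the previous state, can be sketched as a small generative procedure. This toy has one binary input and one two-valued state, far simpler than the model described here; the table values, the log-millisecond output scale, and all names are hypothetical.

```python
import random

def sample_iohmm(inputs, trans, emit_mean, emit_sd, rng, init_state=0):
    """Generate states and outputs from a toy input-output HMM.

    trans[u][i][j] = P(state_t = j | input_t = u, state_{t-1} = i)
    emit_mean[j], emit_sd[j] = Gaussian output parameters for state j
    """
    state, states, outputs = init_state, [], []
    for u in inputs:
        # The input selects which transition matrix applies at this time step.
        probs = trans[u][state]
        state = rng.choices(range(len(probs)), weights=probs)[0]
        states.append(state)
        outputs.append(rng.gauss(emit_mean[state], emit_sd[state]))
    return states, outputs

# Hypothetical coding: input 0 = high-frequency word, 1 = low-frequency word;
# state 0 = short fixation, 1 = long fixation; outputs are log durations.
trans = [
    [[0.9, 0.1], [0.6, 0.4]],   # after a high-frequency word
    [[0.3, 0.7], [0.1, 0.9]],   # after a low-frequency word
]
inputs = [0, 0, 1, 1, 0, 1, 0, 0]
states, outputs = sample_iohmm(inputs, trans, [5.2, 5.8], [0.2, 0.2],
                               random.Random(0))
```

Model fitting runs in the opposite direction: given inputs and outputs, it estimates the tables and infers the hidden states.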

It should be noted, however, that the SHARE architecture is not limited to one layer of

eye movement control. For example, in order to model the fact that eye movement patterns are

different when a reader experiences problems in reading, one may implement a four-layer

HMDT model, in which a “cognitive state” node with two states – troubled and normal reading –

is linked to the “control” layer described above. Such an implementation allows modeling of

long-term changes in eye-movement patterns, in addition to the effects between adjacent eye

movements. The modular, hierarchical structure of SHARE minimizes the effects of model

extension on existing structures.

In addition to the HMDT structure, another important element of the SHARE architecture

is the use of Bayesian methods for estimating model parameters and conducting statistical

inferences. Unlike other commonly used methods such as maximum likelihood methods,

Bayesian methods provide a natural way to combine prior knowledge and observed data during

estimation (see Bernardo & Smith, 1994, for Bayesian theory in general, and Bengio, 1999, and

Jordan et al., 1998, for an introduction to the use of Bayesian methods in stochastic modeling).

At least two aspects of the Bayesian method are attractive for the current application.

First, because the model will be fitted at the individual reader level, there may not be enough

data to reliably estimate all parameters using traditional methods. By using prior knowledge

(e.g., the distribution of parameters for third-grade readers), the Bayesian method is able to

stabilize estimations and deal with missing data naturally. The other advantage of the Bayesian

method is that it provides a way to adapt a generic model to an individual. One may start with a

model with parameter values based on the grade level, but as eye movements are collected, the

model parameters may be updated using the Bayesian method and the model gradually and

quickly becomes individualized. Few other methods provide flexibility like this.
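
The gradual individualization described above can be illustrated with the conjugate Dirichlet update for a single discrete distribution. This is a minimal sketch under stated assumptions: the prior is expressed as pseudo-counts worth ten observations, and both the grade-level prior and the reader's observations are hypothetical.

```python
def dirichlet_update(alpha, observation):
    """Add one observed category to Dirichlet pseudo-counts (conjugate update)."""
    updated = list(alpha)
    updated[observation] += 1.0
    return updated

def posterior_mean(alpha):
    total = sum(alpha)
    return [a / total for a in alpha]

# Hypothetical grade-level prior over (short, medium, long) fixations, worth
# 10 pseudo-observations, and a hypothetical reader who mostly makes long fixations.
alpha = [3.0, 5.0, 2.0]
prior_mean = posterior_mean(alpha)
for obs in [2, 2, 1, 2, 2, 2, 1, 2, 2, 2]:
    alpha = dirichlet_update(alpha, obs)
individual_mean = posterior_mean(alpha)
```

After ten observations the estimate sits halfway between the group prior and the reader's own frequencies; with more data the prior's influence shrinks further.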

To summarize, the objectives of the current research require a probabilistic description

of reading eye movements, and the stochastic model based on IOHMM provides the

mathematical tool for modeling. The architectural and computational details of the current model

are discussed in the next chapter.

CHAPTER 4. SHARE: STRUCTURE, DYNAMICS, AND MODEL FITTING

SHARE, a stochastic, hierarchical architecture for reading eye-movement, is designed to

mathematically describe reading eye movements. The rationales for choosing the IOHMM

framework have been laid out in the previous chapter. The current chapter focuses on the

specifications and the workings of the model.

Modeling Environment

The model was implemented using MatLab, with the Bayes Net Toolbox (BNT;

Murphy, 2001). BNT is an open source MatLab package that supports graphical modeling

(Jordan et al., 1998) and Bayesian inference (Bernardo & Smith, 1994; Heckerman, 1998), which

are two crucial elements of the SHARE model. The source code for the SHARE model is

available on request.

Modeling Data

The eye-movement data used for model fitting came from Miller & Feng (in prep.), in

which English- and Chinese-speaking children (third- and fifth-graders) and adults

(undergraduate students) were asked to read ordinary short stories on a computer screen. The

current study focused only on the English data. There were 20 third-grade students, 26 fifth-

grade students, and 30 adults, each reading 16, 18, and 27 pages of text, respectively. The stories

were selected to be at the children’s age levels (third- and fifth-grade levels, respectively); adult

readers read the children’s stories for comparison.

Eye movements were recorded using the EyeLink system, a video-based system with

sampling rate of 250 Hz and spatial resolution of 0.005°. Typical calibration-recalibration

accuracy is approximately 0.5° to 1°. The default saccade detection algorithm in the system was

used. Eye-movement recording was binocular, but data from only the left eye were analyzed.

Reading materials were presented on a 17-inch monitor in the standard VGA mode (640 x 480

pixels), 60-70 cm away from the reader. English materials were displayed in Espy Sans font, a

font optimized for screen display. Each letter subtended an average of 0.31 visual degrees or 7.9

screen pixels.

The whole dataset consisted of more than 140,000 fixations. Eye movement variables

such as gaze location, fixation duration, and saccade length were recorded, along with relevant

information such as word frequency (Francis & Kucera, 1982), word length (in letters and

pixels), and landing position within words (in pixels).

Structure of the SHARE Model

A graphical representation of the SHARE model is shown in Figure 10. Each node in the

graph represents a random variable. Nodes with rectangular boxes are discrete variables; nodes

with oval boxes are continuous variables. Clear nodes represent observed variables; the

shadowed box (FDC) represents a hidden variable. An arrow from one node to another shows

that the latter variable is dependent on the former; the lack of an arrow between two nodes shows

that the two nodes are conditionally independent. The circular arrows beside the ST and FDC

nodes signify temporal dependency, i.e., the value of a node at time t depends on that at time t-1.

There were eight nodes in the SHARE model, forming three layers.

The top three nodes form the input layer. Three variables represented linguistic, low-level

visual, and oculomotor input information to the eye-movement control layer.

1. FREQn is the word frequency (Francis & Kucera, 1982) of the currently fixated word.

Numerous studies have shown that word frequency affects fixation durations and saccade

programming (see Rayner, 1998, for a review). For computational simplicity29, frequency was

divided into three categories – less than 100 occurrences per million (L), between 100 and 1000

per million (M), and more than 1000 per million (H). The three categories had roughly equal

sizes. Although the cut-off point for the low frequency category – 100 occurrences per million –

was higher than that typically used for adult psycholinguistic studies (around 40 per million), it is

more appropriate for third- and fifth-graders.

2. WLENn+1 is the word length of the word following the one currently fixated30. The

length of the word in the right periphery has been shown to affect skipping rates (Kerr, 1992) and

landing position (McConkie et al., 1988), among other eye-movement parameters. As with word

frequency, word length was classified into three levels – less than 4 letters long (S), between 4

and 8 letters (M), and longer than 8 letters (L). By token or by type, there were more short words

than long words in the reading materials.

3. ECCENn is the eccentricity of the current fixation relative to the fixated word.

McConkie et al. (1988) and O’Regan (1990) have shown that refixation rate is a function of

landing position. Fixations that land at or near word centers are less likely to result in refixations

than are eccentric fixations. In the current model, ECCENn was a binary variable: eccentric (E)

fixations were those that landed on the beginning or end quarter of a word; those that landed on

the central two quarters were central (C) fixations. This served as a simplified measure for

landing position effects.

29 In general discrete variables are less computationally demanding in Bayesian network modeling. Although in the

current study the cut-off points are more or less arbitrary and probably not optimal, the discrete variables should

show qualitatively similar effects as the continuous ones. As the very first step it was more important to implement

a simple but working model than to perfect all details. In the future continuous input variables may be used to avoid

these arbitrary decisions.

30 In case the current word is the last word of a line, WLENn+1 is the length of the first word in the next line.

Although psychologically return sweep planning may be different from that of normal saccades, no special

mechanism is implemented in the current model for simplicity.

The middle layer is the eye-movement control layer, which includes the saccade targeting

(ST), fixation duration class (FDC), and planned saccade length (PSL) nodes. The control layer

receives information from the input variables and probabilistically determines the target of the

next saccade and the category of the current fixation duration. These two eye-movement

commands are passed to the output layer to generate actual eye movements.

4. STt is the saccade-targeting node. In the current model, it was assumed to be directly

observable from data.31 It was modeled as a discrete variable with seven values, or “states,”

representing seven different kinds of saccadic moves – (a) regress two or more words32, (b)

regress one word, (c) refixate the current word, (d) move forward one word, (e) move two words

forward, (f) move three words ahead, (g) move forward four or more words. Each state was

associated with a probability, which was in turn conditioned on the values of the input variables

and the previous value of ST (STt-1). In other words, the probability of each movement might go

up or down depending on the current linguistic, visual, and oculomotor information, as well as

the last saccadic move. The ST node achieved this by keeping track of all combinations of the

input variables. Internally, it had a table of 3 (FREQ) x 3 (WLEN) x 2 (ECCEN) x 7 (STt-1) x 7

(STt) = 882 probabilities, 144 (2x2x1x6x6) of which were free parameters. How these

parameters were adjusted during model fitting will be discussed in the next section.

31 It is a standard assumption that the word the eye lands on is the intended word. According to this assumption, the

value of ST is directly observed. However, the assumption ignores the possibility that the eye missed the intended

target because of oculomotor errors (McConkie et al., 1988). In the current model, I chose to ignore these cases

during model fitting because it greatly simplified computation. These cases were dealt with in simulations.

32 Because only around 1-1.5% of saccades were regressions longer than 2 words, these were combined with 2-word

regressions. For the same reason, forward saccades longer than 4 words (about 1% for children, 2% for adults)

were combined with 4-word forward saccades.

Modeling saccade targeting as a discrete, word-based process is consistent with

McConkie et al. (1988; McConkie et al., 1994) and many other theories (e.g., O'Regan, 1990;

Rayner & Pollatsek, 1989; Stark, 1994; Suppes, 1990; but see Legge et al., 1997, and Shillcock

et al., 2000). Unlike models that assume a default word-by-word reading strategy (e.g., Morrison,

1984; O'Regan, 1990; Reichle et al., 1998), the current model assumes that each word within the

window of the ST node has a certain probability of being fixated, and the actual decision is made

probabilistically. It also differs from the two previous Markov models (Stark, 1994; Suppes,

1990). In Suppes’ model WHERE and WHEN decisions were made independent of previous eye

movements. The current model extends it to represent dependencies between consecutive eye

movements. One problem with Stark’s model is that by making every word a potential target at

any moment, the model has a necessarily large transition matrix that contains mostly near-zero

probabilities, making probability estimation very difficult. In contrast, the current model uses a

local representation – only words near the current fixation are considered, which allows more

accurate estimation.
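
The coding of the three input variables and of the seven saccade-targeting states can be summarized in a short sketch. The cut-off values follow the text, but the handling of values that fall exactly on a boundary (for example, a frequency of exactly 100 per million) is an assumption, as are the function names and the convention that landing position is measured in pixels from the start of the word.

```python
def freq_category(per_million):
    """Word frequency class: L (< 100), M (100 to 1000), H (> 1000 per million)."""
    if per_million < 100:
        return "L"
    return "M" if per_million <= 1000 else "H"

def wlen_category(n_letters):
    """Word length class: S (< 4 letters), M (4 to 8), L (> 8)."""
    if n_letters < 4:
        return "S"
    return "M" if n_letters <= 8 else "L"

def eccen_category(landing_px, word_px):
    """C if the fixation lands in the central two quarters of the word, else E."""
    relative = landing_px / word_px
    return "C" if 0.25 <= relative < 0.75 else "E"

def st_state(word_offset):
    """Map a signed word offset to one of the seven saccade-targeting states (a-g)."""
    clipped = max(-2, min(4, word_offset))
    return "abcdefg"[clipped + 2]

# Size of the ST conditional probability table described above.
parent_combinations = 3 * 3 * 2 * 7      # FREQ x WLEN x ECCEN x ST(t-1)
table_entries = parent_combinations * 7  # one probability per current ST state
```

For example, a saccade that lands five words ahead is coded as state g, and one that lands three words back as state a, reflecting the pooling of rare long saccades.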

5. FDCt represents the fixation duration category of the current fixation. As shown in

Appendix B, fixation duration could be modeled as a mixture of three lognormal distributions.

The FDC node controlled the mixture rate. It was modeled as a discrete random variable with

three states – short (S), medium (M), and long (L) fixation. The FDC was a hidden node because

its state was not directly observable. Its value was probabilistically inferred (estimated) from

observed fixation duration. Like the ST node, the probability of making a short, medium, or long

fixation was conditioned on the input variables and the previous fixation duration category

(FDCt-1). Internally, it kept a table of 3 (FREQ) x 3 (WLEN) x 2 (ECCEN) x 3 (FDCt-1) x 3

(FDCt) = 162 adjustable probabilities, 16 of which were free parameters.

6. PSLt is the planned saccade length, which is the distance (in pixels) from the current

fixation location to the center of the intended word. It was modeled as a continuous random

variable. It was an observed node during model fitting, because it was calculated from empirical

eye-movement data. Therefore, the arrow between STt and PSLt should be ignored during model

fitting. During simulations, it was computed based on the current fixation position and the

coordinates of the intended word, which was determined by the value of the ST node. The arrow

with a dotted line between STt and PSLt signifies this dependency during simulation.

At the bottom of the figure is the output layer of the model, which includes SACCt and

DURt nodes. They take commands from the eye-movement control nodes and “execute” eye

movements. Both of the variables were continuous, corresponding directly to what would be

measured by an eye-tracker.

7. SACCt is the saccade to be carried out at the end of the current fixation t. It is

measured in pixels in the current model. A positive number corresponds to a saccade to the right

of the current fixation position. Normally this means a forward saccade, but under rare

conditions it would also be a regressive saccade going from the beginning of a line to the end of

the previous line. Conversely, a negative number typically means a regression, except for return

sweeps, in which the eye goes from the end of a line to the beginning of the next.

Following McConkie et al. (1988)33, SACCt was assumed to follow a normal distribution,

whose parameters – mean and variance – were determined by the STt and PSLt nodes. More

specifically,

mean(SACCt)= ai + bi * PSLt , and

var(SACCt)= si ,

where i (i= 1..7) corresponds to the current state of the ST node, PSLt is the currently intended

length of saccade, and ai , bi, and si are constants estimated during model fitting. In other words,

the SACCt node kept a different set of parameters (ai , bi, and si ) for each type of saccade move.

Note that the current parameterization was a simplified version of McConkie et al.’s results34. In

the current model no assumption about the variance for each saccade move (which determined

the planned saccade length) was made; it was left for the model to learn from data.

33 Using the notations in E-Z Reader model (Reichle et al., 1998; see Chapter 2), McConkie et al.’s formula for

landing position may be reformulated in terms of mean saccade length:

$$\text{Mean Saccade Length} = PSL + (\Psi_b - PSL) \cdot \Psi_m = \Psi_b \cdot \Psi_m + (1 - \Psi_m) \cdot PSL = a + b \cdot PSL$$

34 Some factors, for example word length, were not taken into account. In addition, McConkie et al. estimated that

the variance of the landing position distribution was a cubic function of launch sites (PSLt). This cubic function is

not implemented in the current model because the scatter plot in their paper (Figure 4) showed that the cubic trend

was not strong.
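
The parameterization of the SACC node, a normal distribution whose mean is linear in PSL and whose parameters depend on the saccade type, can be sketched as follows. The parameter values are hypothetical, only two of the seven saccade types are filled in, and the zero-based state indices are a convention of this sketch.

```python
import math
import random

def saccade_mean(st_index, psl, params):
    """Mean executed saccade length: a_i + b_i * PSL for saccade type i."""
    a, b, _ = params[st_index]
    return a + b * psl

def saccade_log_density(sacc, st_index, psl, params):
    """Log density of an observed saccade under Normal(a_i + b_i * PSL, s_i)."""
    _, _, s = params[st_index]          # s is a variance, not a standard deviation
    mean = saccade_mean(st_index, psl, params)
    return -0.5 * math.log(2 * math.pi * s) - (sacc - mean) ** 2 / (2 * s)

def sample_saccade(st_index, psl, params, rng):
    """Draw one executed saccade length (used in simulation, not in fitting)."""
    _, _, s = params[st_index]
    return rng.gauss(saccade_mean(st_index, psl, params), math.sqrt(s))

# Hypothetical (a_i, b_i, s_i) for two of the seven saccade types, in pixels.
params = {2: (1.0, 0.8, 100.0),    # refixation
          3: (5.0, 0.9, 225.0)}    # forward one word
mean_forward = saccade_mean(3, psl=60.0, params=params)
draw = sample_saccade(3, 60.0, params, random.Random(1))
```

A slope below 1 means that long planned saccades tend to undershoot and short ones to overshoot, which is the pattern the linear form is meant to capture.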

8. DURt represents the logarithm of the duration of the current fixation. DURt followed a

normal distribution, with a different mean and variance for each state of the FDCt node. More

specifically,

mean(DURt)= ai, and

var(DURt)= si,

where i (i= 1..3) corresponds to the current state of the FDC node, and ai and si are constants

estimated during model fitting.

Over the long run the output of the DURt node would be a mixture of three normal

distributions because of the three different sets of parameters. The exponential of the DUR variable,

consequently, would follow a mixture-of-lognormal distribution, which has been shown to be a

good model of the distribution of fixation durations (Appendix B). During model fitting the

empirical fixation duration was first log-transformed. In the simulation the reverse

transformation (exponential) was applied to the output values of the DUR node.
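
The relation between the FDC and DUR nodes and the mixture-of-lognormal distribution can be sketched directly: a state is chosen, a log duration is drawn from that state's normal distribution, and the result is exponentiated. The mixture rates, means, and variances below are hypothetical and ignore the conditioning of FDC on the inputs and on FDCt-1.

```python
import math
import random

def lognormal_mixture_pdf(duration_ms, weights, means, variances):
    """Density of a fixation duration whose logarithm is a normal mixture."""
    x = math.log(duration_ms)
    density = 0.0
    for w, mu, var in zip(weights, means, variances):
        normal = math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
        density += w * normal / duration_ms   # Jacobian of the log transform
    return density

def sample_duration(weights, means, variances, rng):
    """Pick an FDC state, draw DUR on the log scale, then exponentiate."""
    state = rng.choices(range(len(weights)), weights=weights)[0]
    log_duration = rng.gauss(means[state], math.sqrt(variances[state]))
    return state, math.exp(log_duration)

# Hypothetical mixture: short, medium, and long fixations (log-millisecond scale).
weights = [0.2, 0.6, 0.2]
means = [math.log(120), math.log(230), math.log(450)]
variances = [0.04, 0.05, 0.09]
state, duration = sample_duration(weights, means, variances, random.Random(7))
```

Dividing by the duration in the density is what converts a normal density on the log scale into a lognormal density on the millisecond scale.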

In addition to the nodes, the arrows in the figure were equally important to the structure

of the model. They represented the direction of causality in the model (Heckerman, 1998; Pearl,

2000). In particular, the current model assumed that both WHEN and WHERE decisions were

affected by the three input variables – FREQ, WLEN, and ECCEN. The strength of these factors

was to be estimated from empirical data.

From the control layer to output layer, the current model assumed that the WHERE and

WHEN pathways are (conditionally) independent. There was no arrow between ST and FDC

nodes, ST and DUR nodes, or FDC and SACC nodes. The model also excluded any cross-

pathway connections from fixation t to fixation t+1. These independence assumptions were

made to simplify model conception and computation. However, this did not imply that SACC

and DUR nodes are independent. On the contrary, statistically and conceptually, saccade length

and fixation duration in the current model were correlated because they both shared the same

“parents” – the input nodes. If a close examination of the model shows that the empirical relation

between saccade length and fixation duration cannot be captured by the current model structure,

some of the independence assumptions may be relaxed.
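
The point that conditional independence does not imply marginal independence can be checked with a small exact calculation. The sketch assumes a single discrete parent and two children that are independent given the parent; the probabilities and conditional means are hypothetical and serve only to show that a shared parent induces a nonzero covariance.

```python
def marginal_covariance(p_parent, mean_x, mean_y):
    """Cov(X, Y) when X and Y are independent given a discrete parent.

    With conditional independence, E[XY | parent] = E[X | parent] * E[Y | parent],
    so the covariance comes entirely from how the conditional means co-vary.
    """
    ex = sum(p * mx for p, mx in zip(p_parent, mean_x))
    ey = sum(p * my for p, my in zip(p_parent, mean_y))
    exy = sum(p * mx * my for p, mx, my in zip(p_parent, mean_x, mean_y))
    return exy - ex * ey

# Hypothetical numbers: the parent is word frequency (low, medium, high); low-frequency
# words get longer fixations (ms) and shorter outgoing saccades (pixels).
p_freq = [1 / 3, 1 / 3, 1 / 3]
mean_duration = [280.0, 240.0, 210.0]
mean_saccade = [50.0, 58.0, 64.0]
cov = marginal_covariance(p_freq, mean_duration, mean_saccade)
```

With these numbers the covariance is negative: durations and saccade lengths move in opposite directions across frequency classes even though neither affects the other directly.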

Temporal Dynamics

SHARE modeled three kinds of variation in reading eye movements – (a) the inherent

randomness of perceptual, cognitive, and oculomotor processes, (b) the variation of the current

linguistic and other input, and (c) the time-dependency of the eye-movement process. The first

two were captured by the hierarchical, probabilistic model structure. The time-series nature of

eye movements was modeled with the temporal links (the two self-pointing arrows beside ST

and FDC nodes) at the eye-movement control level.

Like other arrows, the self-pointing arrows indicated that the state of the random

variables (ST or FDC) at time35 t was dependent on that of t-1. Conditioned on the input nodes,

the ST and FDC nodes followed a first-order Markov model. The model used this short-term

temporal dependency to approximate possibly complex time-series effects in eye-movement

programming. Given that most temporal effects reported (e.g., the spill-over effect, optimal

viewing position effects) are in fact confined to consecutive eye movements, it was expected that

the first-order Markovian process should capture most of the temporal dynamics in reading eye

movements.

35 Eye movements were treated as discrete time events.

Model Fitting and Parameter Learning

Three features distinguished the fitting of the SHARE model from the modeling efforts

reviewed previously. First, the model was completely individualized, which means that every

parameter was adjusted so that the model best captured the reading eye movements of a

particular reader. Wide ranges of individual differences in reading eye movements have been

well documented for over a century. One of the goals of this research was to find a way to fully

describe these differences. I did not attempt to construct age-group-average models because

without an understanding of the differences between individuals, a group average model would

be impossible to interpret.

In addition, the model parameters were not estimated from a set of statistics computed

from eye movements, as all previous models have done. Instead, the present model was fitted to

the raw data. In other words, every fixation and saccade a reader made was used to adjust, or

“train,” model parameters. The goal of the model fitting process was to maximize the overall

goodness of fit of the model. The goodness-of-fit index used here was the log-likelihood of the

model, which is the logarithm of the probability of the data being produced by the model.

Finally, the Bayesian method was employed to achieve the above two goals. A critical

challenge for fully individualized modeling is that there may not be enough data to reliably

estimate all parameters. For example, the overall probability of making a 5-word forward

saccade was often less than 0.05 for third-grade readers. If a child made 2,000 fixations, there

would be fewer than 100 in this category. Further divide these 100 fixations by the number of

combinations of FREQ, WLEN, ECCEN, and STt-1 nodes, which is 126, and some of the cells

were bound to be empty. Thus, estimating parameters of these cells would have been impossible.
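
The arithmetic behind this sparsity argument can be written out explicitly. The sketch uses the figures from the text (2,000 fixations, a probability of at most 0.05, and 126 combinations of the parent nodes); the variable names are illustrative.

```python
import math

n_fixations = 2000          # fixations from one child (figure used in the text)
p_rare_move = 0.05          # upper bound on the probability of the rare saccade type
n_cells = 3 * 3 * 2 * 7     # FREQ x WLEN x ECCEN x ST(t-1) combinations = 126

max_rare_fixations = round(n_fixations * p_rare_move)   # at most 100 observations
mean_per_cell = max_rare_fixations / n_cells            # under one observation per cell
# Even if every observation fell in a different cell, some cells must stay empty.
min_empty_cells = n_cells - math.floor(max_rare_fixations)
```

Because the observations are in practice concentrated in a few common combinations, the number of empty cells would normally be much larger than this lower bound.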

Conceptually, a sensible way to deal with this situation is to estimate them with group

averages – when data from many readers are pooled together, hopefully these parameters become

estimable. The Bayesian method is uniquely suited to implement this intuition. With the

Bayesian method, we first impose a prior probability distribution, centered at the group average,

over the parameter we want to estimate. The prior probability distribution represents our belief or

knowledge about the value of the parameter. When there is no observed evidence regarding this

parameter, the posterior probability distribution is simply the prior distribution, and our best

guess in this case is the group mean. In addition to these trivial cases, the true power of the

Bayesian method is its ability to estimate posterior probability distribution when there are limited

observed data, in which case the combination of prior knowledge and empirical data narrows

down the posterior distribution, resulting in accurate parameter estimation (see Bernardo &

Smith, 1994, and Smyth, Heckerman, & Jordan, 1996, for the Bayesian methods). Therefore, in

the current model priors were used in estimating all parameters.

The fitting of an individual SHARE model involved two major steps – (a) specifying the

prior distributions for each parameter and (b) looping through eye movements of a reader and

adjusting the parameters according to the Bayes rule.

Specifying prior distributions. Because the input variables FREQ, WLEN, ECCEN, and

PSL (during model fitting) were observed, they were not estimated and therefore did not need

priors.

The prior distributions for parameters of the ST node were assumed to follow Dirichlet

distributions (the most common prior distribution for discrete variables; see Bernardo & Smith,

1994, and Murphy, 2001). The parameters of the Dirichlet distributions were determined in the

following way. First, the overall probabilities of the seven saccadic moves (see previous

discussion on the ST node) were calculated over the whole age-group dataset36. This set of

probabilities was replicated 126 times to fill all combinations of FREQ, WLEN, ECCEN, and

STt-1, and these 882 probabilities were set as the parameters of the 126 Dirichlet distributions.

The above steps defined our a priori knowledge about the individual reader – we assumed that

the reader was an average reader of his/her age group, and that none of the input factors had any

effects on his/her saccade programming.
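
The construction of the ST prior can be sketched as a table with one Dirichlet parameter vector per combination of parent values, each initialized to the same age-group probabilities. The group probabilities and the observation counts below are hypothetical; using the probabilities themselves as Dirichlet parameters follows the description above, and the posterior-mean formula is the standard conjugate result rather than the toolbox's internal computation.

```python
from itertools import product

FREQ, WLEN, ECCEN = "LMH", "SML", "EC"
ST_STATES = "abcdefg"

def build_st_prior(group_probs):
    """One Dirichlet parameter vector per parent combination, all set to group_probs."""
    return {combo: list(group_probs)
            for combo in product(FREQ, WLEN, ECCEN, ST_STATES)}

def posterior_mean(alpha, counts):
    """Posterior mean of a categorical distribution under a Dirichlet prior."""
    total = sum(alpha) + sum(counts)
    return [(a + c) / total for a, c in zip(alpha, counts)]

# Hypothetical age-group probabilities of the seven saccadic moves (sum to 1).
group_probs = [0.03, 0.07, 0.20, 0.45, 0.17, 0.05, 0.03]
prior = build_st_prior(group_probs)

empty_cell = posterior_mean(prior[("L", "L", "E", "a")], [0] * 7)
busy_cell = posterior_mean(prior[("H", "S", "C", "d")], [0, 1, 4, 30, 12, 2, 1])
```

A cell with no observations keeps the group average, whereas a well-populated cell is dominated by the reader's own data.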

The prior distributions for the FDC node were also Dirichlet distributions, but their

parameters were estimated differently from those of the ST node because FDC was unobservable.

The first step was to estimate the overall probabilities of making short, medium, or long

fixations. This was done by fitting the reader’s fixation duration to a simple Gaussian-mixture

model (McLachlan & Peel, 2000), as in Appendix B37. Once the personalized overall

probabilities were estimated, they were copied 54 times to fill all combinations of FREQ,

WLEN, ECCEN, and FDCt-1, and these 162 probabilities were set as the parameters of the 54

Dirichlet distributions. This was equivalent to the assumption that neither the input variables nor

the previous state of the FDC node had any effect on the current state of FDC.

36 These simple probabilities would be all the information necessary for a zero-order Markovian minimal-control

model (Suppes, 1990, 1994).

37 Note that the fitting of the Gaussian-mixture model itself involved Bayesian modeling, where its prior was set to

a Dirichlet distribution with group averages as parameters.

There were three parameters for the SACC node – the intercept (ai), the slope (bi), and

the variance (si), all of which were conditioned on the state of STt. The SACC node itself

followed a normal distribution whose mean was determined by ai, bi, and PSLt. The priors were

assumed to follow normal-gamma distributions (the most common prior distribution for normal-

distributed random variables; see Bernardo & Smith, 1994). The initial values of ai, bi, and si

were estimated using regression analyses of all eye-movement data from the appropriate age

group. For example, to obtain estimates of the intercept, slope, and variance for refixations, all

refixations in the age group were entered into the regression model

SACC = a + b * PSL,

and the estimated parameters were used as the parameters for the prior distribution for

refixations.
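
The regression step used to initialize the SACC priors can be sketched with ordinary least squares for one saccade type. The data points are hypothetical, and the use of the unbiased residual variance (dividing by n - 2) is an assumption of this sketch; the text does not specify which variance estimator was used.

```python
def fit_saccade_regression(psl, sacc):
    """Ordinary least squares for SACC = a + b * PSL; returns (a, b, residual variance)."""
    n = len(psl)
    mean_x = sum(psl) / n
    mean_y = sum(sacc) / n
    sxx = sum((x - mean_x) ** 2 for x in psl)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(psl, sacc))
    b = sxy / sxx
    a = mean_y - b * mean_x
    residuals = [y - (a + b * x) for x, y in zip(psl, sacc)]
    # Unbiased residual variance: two degrees of freedom are used by a and b.
    s = sum(r ** 2 for r in residuals) / (n - 2)
    return a, b, s

# Hypothetical pooled refixation data (pixels): planned vs. executed saccade length.
psl = [-20.0, -10.0, 0.0, 10.0, 20.0, 30.0]
sacc = [-14.0, -9.0, 1.0, 8.0, 18.0, 23.0]
a, b, s = fit_saccade_regression(psl, sacc)
```

Repeating the fit for each of the seven saccade types would give one (a, b, s) triple per state of the ST node.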

Finally, the DUR node also followed a normal distribution, but its parameters were

assumed to be “clamped,” meaning that they were fixed and were not adjusted during model

fitting. The reason to clamp the parameters was to be consistent with the 3-component

lognormal-mixture model (Appendix B). If the mean and variance were allowed to change under

different combinations of the input variables, the resulting distribution of the DUR node would

be a mixture of many normal distributions rather than a 3-component normal mixture. The values

of the parameters (means and variances) were estimated as a by-product of estimating the prior

distribution for the FDC node. In fitting the (personalized) Gaussian-mixture model, the mixture

rate was used as the prior for FDC, and the estimated mean and variance for each component

normal distribution were set as fixed parameters for the DUR node. Therefore, although the DUR

parameters did not change in model fitting, they were still fully individualized.

Bayesian parameter estimation. Once the priors were set, the model was ready to be

trained with empirical eye-movement data. An exact inference version of the Boyen-Koller

inference algorithm for dynamic belief networks (Boyen & Koller, 1998a; Boyen & Koller,

1998b; see Murphy, 2001, for implementation details) was used. The technical details of the

algorithm will not be discussed here. Conceptually it looked for the maximum posterior

probability solution given the prior distribution and data (Cowell, 1998a; Cowell, 1998b;

Heckerman, 1998). The iterative algorithm stopped when the improvement of the goodness of fit

index – the log-likelihood – was under a threshold. The chance of stopping at a local maximum

instead of the global optimum was minimized both by using reasonable prior

distributions and by using multiple (3) runs with different random seeds in estimating the

Gaussian-mixture model.

Model Adequacy and Comparison

From the perspective of an empirical researcher, it is natural to ask the question of

whether a model is adequate. However, the question is difficult to answer in the absolute sense.

Statistically, it is more sensible to compare the relative goodness of fit of different models. The

ultimate answer to the question depends on one’s goals.

The adequacy of the SHARE model is addressed in two ways. First, compared to various

reduced versions of the model, the complete and trained model gained significantly in likelihood

ratio tests. The improvements were examined separately for the WHEN and WHERE pathways,

because they were conditionally independent and the overall log-likelihood was the sum of the

log-likelihood indices for the two channels. Likelihood ratio tests were performed for each

individual and the following findings held for each individual reader.

For the WHEN pathway, there was a statistically significant gain in goodness of fit of the

simple Gaussian-mixture model when the parameters were individualized. When the Gaussian-

mixture model was further compared with the full SHARE model (WHEN pathway only) that

took into account the input variables and temporal dynamics, there was a statistically significant

gain by the latter. Similarly, the complete WHERE pathway was shown to be statistically

superior to a model that assumed no individual differences, no effects from the input variables,

and no temporal connections. Together, these results suggest that the more complex structure in

SHARE is necessary to account for reading eye-movement data, and its performance was better

than some simple models of eye movements.

Because the emphasis of the present research is to establish the SHARE architecture, a

comprehensive analysis of the model is beyond the scope of the current report. Future studies

will address some important issues, such as the relative contribution of different input variables

to the two pathways and whether some interaction between the two pathways would increase

model fitting. The next chapter will focus on simulation studies of the SHARE model and

compare eye-movement behaviors of the model to real readers.

CHAPTER 5. SIMULATION RESULTS

The Markovian structure of the SHARE model is very suitable for running simulations.

The model took a text, coded in terms of word frequency, word length, and the x-coordinates of

the beginning and end of the words, “read” through it according to its parameters, and stopped

reading when it reached or passed the last word of the text.

In the simulation study, each individualized model read through the same texts that the

corresponding human reader had read. Eye-movement characteristics of the reader and the model

were compared.

Simulation Method

Materials. Preparing reading materials for the model was straightforward. Each word in

the texts used in Miller and Feng (in prep.) was simply coded with four variables – FREQ,

WLEN, x1, and x2. The latter two simply marked the horizontal position of the word in screen

coordinates38. FREQ and WLEN were defined in the last chapter (see Figure 10).

Procedures. Model parameters of the particular reader were loaded. For each trial, the

model was assumed to always start with a fixation on the first word. Other parameters were

initialized as follows39: STt=0 was set to “forward 1 word,” FDCt=0 was set to a medium fixation,

and ECCENt=1 was “central fixation.”

With these initial values and the values of the input variables FREQN=1 and WLENN=2,

SHARE was able to find the appropriate STt=1 and FDCt=1. For example, STt=1=x was the

38 In the Miller and Feng (in prep.) study there were multiple lines of text per screen. However, the y-coordinates

are not interesting in reading except for distinguishing lines. They were trivial to model and were not included here.

conditional probability:

P(ST = x | ST_{t-1} = ST_{t=0}, FREQ = FREQ_{N=1}, WLEN_{N+1} = WLEN_{N=2}, ECCEN = ECCEN_{t=1})

All combinations of these probabilities were estimated and stored internally in a parameter table

in the ST node. Therefore, finding P(STt=1=x) was simply a table lookup with the values of the

input variables and the previous ST as indices. The procedure for finding P(FDCt=1=x) was

similar.

The next step was to generate eye-movement commands. The value of the ST node was

randomly generated from the discrete distribution P(STt=1=x), where x was one of the seven

possible saccadic moves. The resulting random sample indicated the target word for the next

saccade. Similarly, the value of the FDC node was also randomly generated, which was the

category of the current fixation duration. An additional step in the WHERE pathway was to

calculate the value of the PSL node. The planned saccade length was the displacement between

the current position of the fixation and the center of the targeted word, as indicated by the current

value of ST. The calculation of PSL from ST was completely deterministic.

Next, the eye-movement commands were passed down to the WHEN and WHERE

pathways for execution. For the WHEN pathway, the conditional mean and variance of the DUR

node, given the current FDC value, were retrieved from the table of parameters stored in the

DUR node. Then a random sample from the normal distribution specified by the conditional

mean and variance was drawn. The exponential of this random sample was the duration of the

current fixation. The processes in the WHERE pathway were similar. The SACC node was also

39 Hereby t represents the current fixations, and N represents the current word number.

assumed to be a normally distributed variable, whose mean and variance were determined by the

current values of ST and PSL nodes:

mean(SACCt)= ai + bi * PSLt , and

var(SACCt)= si ,

where i is the current value of ST, ai, bi and si are parameters associated with i that were

estimated during model training. The actual saccade length was a random sample from the

normal distribution specified above.
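
The generation of a single fixation and saccade described above can be sketched as follows. This is an illustrative reading of the procedure, not the original implementation: the function names, the dictionary-based parameter tables, and all numeric values are assumptions, and the table lookup on the input variables is taken to have already produced the two probability tables.

```python
import math
import random


def sample_category(rng, probs):
    """Draw one key from a {category: probability} table."""
    categories = list(probs)
    weights = [probs[c] for c in categories]
    return rng.choices(categories, weights=weights, k=1)[0]


def simulate_fixation(rng, st_probs, fdc_probs, dur_params, sacc_params,
                      fixation_x, word_index, word_centers):
    """Generate one fixation duration and the saccade that follows it."""
    # Control layer: sample the saccade target and the duration category.
    st = sample_category(rng, st_probs)
    fdc = sample_category(rng, fdc_probs)

    # PSL is deterministic: distance from the current fixation to the
    # center of the targeted word.
    psl = word_centers[word_index + st] - fixation_x

    # WHEN pathway: normal sample on the log scale, then exponentiate.
    log_mean, log_var = dur_params[fdc]
    duration = math.exp(rng.gauss(log_mean, math.sqrt(log_var)))

    # WHERE pathway: normal with mean a + b * PSL and variance s.
    a, b, s = sacc_params[st]
    sacc = rng.gauss(a + b * psl, math.sqrt(s))
    return st, fdc, psl, duration, sacc


# Hypothetical parameters for illustration only; none come from the study.
rng = random.Random(1)
word_centers = [20.0, 70.0, 130.0, 180.0, 240.0, 300.0, 350.0, 410.0, 470.0, 520.0]
st_probs = {-2: 0.05, -1: 0.10, 0: 0.20, 1: 0.30, 2: 0.20, 3: 0.10, 4: 0.05}
fdc_probs = {"short": 0.1, "medium": 0.7, "long": 0.2}
dur_params = {"short": (math.log(65.0), 0.05),
              "medium": (math.log(190.0), 0.04),
              "long": (math.log(300.0), 0.06)}
sacc_params = {k: (0.0, 1.0, 25.0) for k in st_probs}
step = simulate_fixation(rng, st_probs, fdc_probs, dur_params, sacc_params,
                         fixation_x=185.0, word_index=3, word_centers=word_centers)
```

The sketch assumes that the targeted word lies inside the text. Stopping at the end of the text and off-target landings would be handled by the caller.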

Now the first fixation on a page had terminated and the first saccade had been made.

Some information needed to be updated at this point. Now, t=2, and N=N+STt=1 (i.e., the current

word was set to the targeted word; see below for exceptions). The ECCENt=2 was computed as

specified in the last chapter. The FREQ and WLENN+1 values were also updated. With all values

of the input nodes updated and the past values of ST and FDC nodes available, the model was

ready to repeat the above process and generate the next fixation duration and saccade move. The

process would iterate until the targeted word in ST node was beyond the last word in the text.

Problems arose when the difference between PSL and SACC was so large that the next

fixation would land on a word other than the targeted word. In this case the model simply took

the fixated word as if it were the targeted word, and calculated ECCEN, FREQ, etc. based on the

actual fixated word. Other treatments were possible but not explored here. If, after a regression,

the “eye” was sent to a word before the first word, it was simply redirected to the first word.
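
The off-target rule can be sketched as a lookup from the landing position to the word actually fixated. The function name and the word boundaries are hypothetical. Assigning a landing position that falls between words to the nearest word is an assumption of this sketch, since the text does not say how such cases were resolved.

```python
def resolve_fixated_word(landing_x, word_bounds):
    """Return the index of the word treated as fixated.

    word_bounds is a list of (x1, x2) screen coordinates, one pair per word.
    A landing position inside a word selects that word. Otherwise the nearest
    word edge decides, so a position left of the text maps to the first word.
    """
    best_index, best_dist = 0, float("inf")
    for i, (x1, x2) in enumerate(word_bounds):
        if x1 <= landing_x <= x2:
            return i
        dist = min(abs(landing_x - x1), abs(landing_x - x2))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


# Hypothetical word boundaries in screen coordinates.
bounds = [(0.0, 30.0), (40.0, 90.0), (100.0, 120.0)]
on_target = resolve_fixated_word(50.0, bounds)     # inside the second word
before_text = resolve_fixated_word(-15.0, bounds)  # redirected to the first word
```

Whatever word ST had targeted, the simulation would then continue from the returned index when updating ECCEN, FREQ, and WLEN.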

Ten simulation trials were run for each of the 76 individualized models, including both

children and adults, with different random seeds. The “eye movements” and the corresponding

word information were recorded for further analyses.

Distributions of fixation durations

The upper left panels of Figures 11-1 through 76 (one for each participant) show the

frequency distributions of empirical and simulated fixation duration. Note that the simulated

frequencies were divided by the number of simulation trials (10) so as to be on the same scale

as the empirical figures. In general, the simulated data appeared to follow empirical distributions

closely and were responsive to individual differences.

A formal statistical test of the hypothesis that two distributions are identical is the

Kolmogorov-Smirnov (K-S) test (Birnbaum, 1952; Conover, 1999; Hall & Wellner, 1980). The

K-S test involves calculating a critical value, w1-α, which is a function of the significance

level α. If at any point along the distribution, the cumulative distribution function of another

distribution is more than w1-α away from that of the sample distribution, we reject, at

significance level α, the hypothesis that the other distribution is the population distribution of the

sample. For large n Hollander & Wolfe (1999) introduced an approximation formula:

w_{1-\alpha} = \sqrt{\frac{-\ln(\alpha/2)}{2n}} .

For α=0.05 and n=1000 (most readers have between 1,000 and 2,000 fixations), w1-α is

approximately 0.043.
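
As a quick arithmetic check, the critical value can be computed directly. The sketch assumes the standard large-sample form of the approximation, the square root of -ln(α/2) divided by 2n. The function name is hypothetical.

```python
import math


def ks_critical_value(n, alpha=0.05):
    """Large-sample approximation to the K-S critical value."""
    return math.sqrt(-math.log(alpha / 2.0) / (2.0 * n))


w_1000 = ks_critical_value(1000)  # about 0.043
w_2000 = ks_critical_value(2000)  # about 0.030
```

Readers with more fixations therefore face a stricter criterion, which is why the bar in each figure is computed for that particular reader.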

The K-S test can be carried out visually. The lower left panels of Figures 11-1 through 76

show the cumulative distribution functions of empirical and simulated fixation duration. A

vertical bar at the top-left corner of each figure shows the magnitude of w1-α for that particular

reader. If the vertical difference between the two cumulative distribution functions exceeds the

length of the bar, SHARE is not a statistically adequate model of fixation duration. In fact, none

of the 76 individual simulations differed statistically significantly from the empirical data.
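
The visual comparison corresponds to finding the largest vertical gap between the two cumulative distribution functions and comparing it with the critical value. A minimal sketch follows. The function names and the toy durations are hypothetical, and the decision rule simply mirrors the one described in the text.

```python
import math


def max_cdf_distance(sample_a, sample_b):
    """Largest vertical gap between two empirical cumulative distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both CDFs past every observation equal to x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / len(a) - j / len(b)))
    return gap


def exceeds_critical_value(empirical, simulated, alpha=0.05):
    """True if the gap is larger than the critical value for the empirical n."""
    w = math.sqrt(-math.log(alpha / 2.0) / (2.0 * len(empirical)))
    return max_cdf_distance(empirical, simulated) > w


# Toy fixation durations in msec (illustration only).
gap_same = max_cdf_distance([180, 200, 220, 260], [180, 200, 220, 260])
gap_disjoint = max_cdf_distance([100, 120], [300, 320])
```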

Distributions of Saccade Length

The empirical and simulated distributions of saccade length were compared for each

individual model with the same procedures as for fixation duration. Frequency distributions for

saccade length are shown in the upper right panels of Figures 11-1 through 76. The simulated

frequency distributions appear to fit fairly well with the empirical data. The model was able to

generate return sweeps as well as progressive and regressive saccades in approximately correct

proportions. Cumulative distribution functions are shown in the lower right panels of Figures

11-1 through 76. The small vertical bar at center-top of each figure represents the magnitude of

the w1-α for the reader. The simulated distributions, the smoother curve, also appear to follow the

empirical distributions closely. However, the K-S tests showed that in each of the 76 trials the

simulated distribution was statistically significantly different from the empirical one.

Three systematic discrepancies were apparent in the frequency distributions and

cumulative distributions. First, the model sometimes failed to show the dual-peak structure near

zero in some of the empirical data. The saddle around zero indicated that readers were

unlikely to make very small saccades. This is consistent with O’Regan’s (1990) finding that a

refixation tends to land on the opposite side of the word from the previous landing position.

Interestingly, however, not every reader showed the fine structure (adult readers were more

likely to show the saddle), and the model was able to demonstrate the saddle in some cases. This

suggests that the model was able to capture the phenomenon, but some of the parameters were

probably not optimized.

In addition, the model slightly but consistently overestimated the longest saccades, which

was evident in the lower right panels of Figures 11-1 through 76, where the simulated

distribution function was consistently lower than the empirical curve near the top of the chart.

The likely cause of the problem is that the variance parameter in SACC node was overestimated

for the “four words or longer forward saccade” category. This category had relatively few but

very heterogeneous cases, which tended to lead to an unstable variance estimate. It is also

possible that in these cases the landing position distributions might no longer follow a normal

distribution but instead a skewed distribution. This would also lead to elevated variance

estimates under the normal distribution assumption. Future research is needed to explore ways to

model this heterogeneous category by using non-normal distributions.

Lastly, the model predicted a small but visible number of between-line regressions –

extremely long saccades involving regressions from the beginning of a line to the end of the previous line.

These saccades did occur in data, but were not as frequent as the model suggested. Between-line

regressive saccades did not require any special mechanism in the present model. The ST node

generated a regression command without knowing in which line the word was located. If the

target happened to be in the previous line, a long, between-line regression was generated. Thus,

the frequency of this type of regression was no different from regular regressions, according to

the current model. However, the empirical data suggest that the frequency of between-line

saccades is lower than expected. In the future, information such as line number may be added to

the model to suppress these between-line regressions.

SHARE in Conventional Eye-movement Measures

To relate the SHARE model to traditional eye-movement theories, and to demonstrate its

ability to capture moment-by-moment processes, the following analysis compared simulated and

empirical eye movements in terms of some conventional eye-movement measures.

The structure of the analysis was intentionally borrowed from the E-Z Reader modeling

(Reichle et al., 1998). Reichle et al. classified words into five frequency categories and

summarized eye-movement data using six word-based measures. The measures were: (a) first

fixation duration, (b) single fixation duration, (c) gaze duration, (d) skipping rate, (e) the

probability of making a single fixation on the word, and (f) the probability of making two

fixations on the word.
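
To make the six measures concrete, the sketch below derives them from a sequence of fixations. It rests on simplifying assumptions that may differ from the definitions used by Reichle et al. (1998): only first-pass fixations are counted, a word counts as skipped when a later word is fixated before it, and probabilities are taken over all words reached in the text. The function names and the toy data are hypothetical.

```python
def first_pass_measures(fixations):
    """Per-word first-pass summary from (word_index, duration_msec) pairs."""
    measures = {}
    furthest = -1  # right-most word fixated so far
    i = 0
    while i < len(fixations):
        word = fixations[i][0]
        j = i
        while j < len(fixations) and fixations[j][0] == word:
            j += 1  # extend the run of consecutive fixations on this word
        run = [duration for _, duration in fixations[i:j]]
        if word > furthest:  # first-pass visit; later revisits are ignored
            for skipped in range(furthest + 1, word):
                measures[skipped] = {"skipped": True}
            measures[word] = {"skipped": False, "first_fixation": run[0],
                              "gaze": sum(run), "n_fixations": len(run)}
            furthest = word
        i = j
    return measures


def summarize(measures):
    """Aggregate the per-word records into the six word-based measures."""
    n = len(measures)
    visited = [m for m in measures.values() if not m["skipped"]]
    single = [m for m in visited if m["n_fixations"] == 1]
    return {
        "first_fixation": sum(m["first_fixation"] for m in visited) / len(visited),
        "single_fixation": sum(m["gaze"] for m in single) / len(single) if single else None,
        "gaze": sum(m["gaze"] for m in visited) / len(visited),
        "skipping_rate": (n - len(visited)) / n,
        "p_single": len(single) / n,
        "p_double": sum(m["n_fixations"] == 2 for m in visited) / n,
    }


# Toy trial: word 0 fixated twice, word 1 skipped and later regressed to.
trial = [(0, 200), (0, 150), (2, 250), (1, 180), (3, 220)]
summary = summarize(first_pass_measures(trial))
```

In the analysis described above, such summaries would be computed separately for each frequency category and each reader, once from the empirical record and once from the simulated one.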

In the current analysis, the same procedure was followed, except word frequency was

coded in three levels (as part of the model specification) instead of five. But instead of predicting

one set of group means, the current model had to predict 76 sets of individual means. This was a

more stringent test because the model needed to accommodate a wide range of individual

differences – from beginning readers to adults. The added degrees of freedom also made the

results more interpretable, in view of the collinearity problem in these measures (see Appendix

Figures 12 through 17 compared the simulated and empirical values of the six measures.

Each point represents an individual mean. As seen in the figures, not only were the empirical and

simulated values highly correlated, but the model also reproduced the absolute values with

reasonable accuracy. It is worth noting that in Figure 17, the probability of making double

fixations on a word had a fairly restricted range, and yet the model was still able to predict the

values.

On the other hand, there were some systematic differences between the simulated and

empirical data. For example, the model was able to reproduce fairly closely the probability of

making single fixations on a word (Figure 16), but consistently (although only slightly)

under-predicted single fixation duration (Figure 15). The model did not have a special mechanism to

program a single fixation, and therefore its single fixation duration means should be identical to

first fixation duration. The simulation results suggested that average fixation duration increased

when only one fixation was made on a word, compared to cases with multiple fixations. Future

research is needed to examine whether the increase of mean fixation duration is a result of

change in the weights of the fixation duration categories or in the means of these components.

Overall, the analysis showed that, with few assumptions about mechanism, the SHARE

model was able to reproduce eye-movement details, measured by conventional eye-movement

measures. Furthermore, SHARE provided a set of terminology, such as saccade targeting

probabilities and fixation duration categories, that can reproduce eye-movement distributional

data for individuals. Because the parameters of this model are more tractable and accessible than

the raw distributions, this can be an important step toward developing an empirical methodology

for implementing and evaluating the claims of contending models of eye-movement control in

reading.

Summary

Once a SHARE model was trained with an individual reader’s eye movement data, it

captured the essence of the data and encapsulated it in the model parameters. Given the same

reading materials, the model could reproduce eye movements that were quantitatively similar to

the original empirical data, as the above simulation study demonstrated.

The simulation also showed that the SHARE architecture was able to adapt to beginning

and skilled readers. In addition, the Markovian structure at the control level of the model

naturally accounted for temporal dynamics in reading. When assessed using the conventional

eye-movement measures, the model was able to quantitatively reproduce empirical values.

Compared to many existing models, the graphical model is simple and its statistical

characteristics are well understood. Therefore, the SHARE structure is suitable as a general

platform of communication in the field of reading eye movement research.

The simulation and the analyses only illustrated a small portion of the potential of the

SHARE architecture. For example, it would be interesting to study its ability to predict the next

eye movement on the basis of eye movements that a reader has already made. The simulation

also suggested several aspects of the current implementation of the model that need refinement,

including the handling of refixations in some readers and the issue of single fixation duration.

The next chapter will show how analysis of the parameters of individual SHARE models can

shed light on what aspects of eye-movement control develop as children become more skilled

readers.

CHAPTER 6. DEVELOPMENTAL CHANGES OF READING EYE MOVEMENTS

The previous chapter has shown that the SHARE model is able to capture a wide range of

individual differences in reading eye movements. It may also prove useful in capturing in a

concise manner developmental differences in reading eye-movements, which will in turn provide

the basis for theorizing about what cognitive processes change with the acquisition of reading

skill.

Previous Research on the Development of Reading Eye Movements

How do eye movements change with age and reading proficiency? A few studies have

investigated this question, and most of them have reported only global statistics. Table 1 (from

Table 4 in Rayner et al., 1998) summarized some global measures of reading eye movements

from previous studies (Buswell, 1922; McConkie et al., 1991; Rayner, 1986; Taylor, 1965).

Mean fixation duration declines with age, although the absolute values of the means and the

range of developmental changes vary among studies. Developmental changes in saccade patterns

are more difficult to describe. Based on the incomplete list of two variables in Table 1, skilled

readers cover the same text with fewer fixations, although it is not consistent across studies

whether proficient readers have fewer regressions than beginning readers.

The only study that went significantly beyond global statistics is McConkie et al. (1991).

McConkie et al. examined distributions of fixation durations for first- through fifth-grade

students. Three findings were evident from the distributions. First, fixation duration distributions

typically had a single mode at approximately 180 msec, regardless of age. Therefore, what drives

the developmental changes in mean fixation duration appears to be the right tails of the

distributions. In addition, there were substantial individual differences in the distributions of

fixation durations, especially among beginning readers. Lastly, McConkie et al. also showed that

the means and higher moments of fixation duration distributions were strongly correlated with

reading abilities.

With regard to saccade control, McConkie et al. (1991) found that first-grade students

showed distributions of landing positions similar to those of adults (McConkie et al., 1988).

Another eye-movement characteristic shared by beginning and skilled readers is within-word

refixations. McConkie et al. (1988, 1991) demonstrated that the probability of making a

refixation on a word is a U-shaped function of the landing site of the initial fixation on the word.

McConkie et al. (1991) also showed that the probability of skipping a word as a function of

saccade launching site increases with age, and the forms of the functions at different grades

resemble adult data (McConkie et al., 1994). Thus, it appears that many of the basic mechanisms

of eye-movement control in reading English are in place after a year of reading experience,

possibly even before any formal reading instruction.

Developmental Analyses Using SHARE

To the extent a SHARE model can simulate individual readers’ eye-movement patterns,

developmental differences in reading eye movements can be studied by analyzing parameters of

individual models. McConkie et al. (1991) showed that developmental changes are more

complicated than what can be described by global measures such as mean fixation duration. The

SHARE architecture is particularly suitable for studying these complex changes, because it

provides a rich set of structures and parameters to describe these differences and is able to

closely simulate readers’ eye movements, as shown in the previous chapter.

This chapter focuses on two developmental issues – the changes of eye-movement

control across age, and the changes of the effects of linguistic, perceptual, and oculomotor

factors on eye-movement control. These correspond to two levels in the SHARE model, namely

the control layer and the relationship between the input and the control layer. Age differences in

individual parameters of these layers are analyzed. Grouping by age risks overlooking

meaningful within-group differences, as age is only a crude indication of reading skill. In the

absence of an independent reading proficiency measure, reading speed (measured in words per

minute, WPM) was used as an indicator of readers’ reading proficiency. Past research has

shown that reading speed is highly correlated with standardized reading test scores.

Development of Reading Eye-movement Control

One of the core assumptions of the SHARE model is discrete control of eye movements

in the control layer. The probabilities of making each eye-movement command – for example

“forward 2 words” or “long fixation” – form the basis of individual readers’ eye-movement

control strategy. The following analyses explore developmental differences in controlling

saccades and fixation duration.

Saccade targeting. Of the seven potential saccade targets in the ST node, what is the

probability of selecting a particular target? Figure 18 shows the probabilities40 of making

regressions (ST=-1 or –2*), refixations (ST=0), progressing one word (ST=1), and progressing

more than one word (ST= 2, 3, or 4*) as a function of age group and reading speed. Some

categories were combined to simplify data presentation.

40 These are the unconditional probabilities, i.e., ignoring the effects of word frequency and the like. They were

computed by collapsing the multidimensional frequency tables in the ST node into a single dimension table.

The probability of regressions did not differ across age, F(2, 73)=1.25, p=1.86,

MSE=0.0018. Regression rates were around 15% for all age groups, which is remarkable given

that the adult readers were reading simple, elementary-school-level stories. Some college students

reading as fast as 600 words per minute made 25% regressions, more than any third-grade

student did.

Refixation rates showed a significant decrease with age, F(2, 73)=105.3, p<0.001,

MSE=0.0036. A post-hoc comparison with Bonferroni adjustments showed that each age group

was significantly different from others.

The probability of progressing one word showed a significant decrease with age, F(2,

73)=12.1, p<0.001, MSE=0.0039. A Bonferroni post-hoc comparison showed that while both

third- and fifth-grade groups differed significantly from adults, they did not differ significantly

from each other. The magnitude of the difference was rather small – approximately 32% for

children versus 25% for adults.

Finally, the largest developmental difference was an increase in the probability of

progressing two or more words, F(2, 73)=134.4, p<0.001, MSE=0.0052. Each age group differed

significantly from others.

To summarize results on the developmental patterns of saccade control, the largest

differences between beginning and skilled readers lie in the tradeoff between making refixations

and making long (2 or more words) forward saccades. Comparatively, the differences in

regression rate and the probability of moving forward one word at a time are small.

Fixation duration. According to the present model the distribution of fixation durations is

a mixture of three components, each of which follows a lognormal distribution. Developmental

changes in the proportions, modes, and variance of the components are analyzed below.

Figure 19 shows the proportions of the three types of fixations. There was no significant

age effect on the probability of making short fixations, F(2, 73)=2.157, p=0.123, MSE=0.0021.

The probability of making long fixations showed a significant decrease with age, F(2, 73)=27.3,

p<0.001, MSE=0.0056. A post-hoc test showed that third- and fifth-grade students did not differ

significantly from each other, but both differed significantly from adults. Because they add up to

1, the probability of making medium fixations also had a significant age effect, increasing with

age, with similar post-hoc results.

Figure 20 shows the modes (corresponding to the means of the logarithm of fixation

durations in the model) of the three components of fixation durations as a function of age and

reading speed. Overall, the largest change appears to be the decrease of long fixation modes with

age and reading speed.

For short fixations, the average mode increased slightly with age (from 62 msec to 67

msec to 73 msec) although the difference was statistically significant, F(2, 73)=8.591, p<0.001,

MSE=0.0080. The differences between the children were not significant in a post hoc test, but the

child-adult difference was significant.

There was also a significant but small age effect in the modes of medium fixations, F(2,

73)=14.8, p<0.001, MSE=0.0034. The drop from 202 msec (3rd grade) to 198 msec (5th grade)

was not significant, but the mode for adults, 179 msec, was significantly lower than either of the

children groups.

A strong age effect was observed for long fixations, F(2, 73)=35.9, p<0.001,

MSE=0.0017. Again, third-grade (319 msec) and fifth-grade (292 msec) values did not differ

significantly, but both differed significantly from that of adults (221 msec).

Lastly, variances of the three components as functions of age and reading speed are

shown in Figure 21. There was no significant age effect for the variance of short fixations, F(2,

73)=1.52, p=0.225, MSE=0.0028. The variances of medium and long fixations decreased

significantly with age, F(2, 73)=11.80, p<0.001, MSE=0.0004, and F(2, 73)=17.74, p<.001,

MSE=0.0014, respectively. In both cases, the two young age groups did not differ significantly

from each other but both differed significantly from adults. Again, the largest age-related

difference is the decrease in the variance for long fixations.

The above analyses of the control layer parameters provide some new pieces of

information to the understanding of reading development. Regarding saccade programming,

beginning and skilled readers differ not in regression rate, or in the overall probability of making

forward saccades. The developmental change is rather specific – skilled readers tend to make

fewer refixations, and make more long forward saccades (two words or more).

With respect to fixations, findings from the present study concur with McConkie et al.’s

(1991) observations that the modes of fixation duration distributions do not change much with

age but the tails of the distributions become less heavy. In addition, the discrete FDC node in

the SHARE model provides a quantitative description of these developmental changes. By

decomposing the overall distributions into three components, it is shown that the characteristics

of the briefest fixations do not change substantially with age. The medium fixation component,

corresponding to the modes of the distribution, becomes slightly shorter and denser, but the

changes are small compared to the third component. What really accounts for the developmental

changes is the third, long fixation component – its proportion, mode, and variance decreased

substantially with age.

Effects of Input Variables on Eye-movement Control

The above analyses ignore effects of input variables on the control layer. However,

Chapter 4 has shown clearly that these input variables contribute significantly to the explanatory

ability of the SHARE model. Their effects are investigated below.

Under the present implementation, the input variables were represented as ordinal

discrete variables (e.g., low, medium, and high frequency; although in the future they may be

continuous). Therefore their relations with the control nodes – ST and FDC, which are also

discrete and ordinal – are represented in multidimensional contingency tables. The current report

will focus only on the main effects of each individual input variable, that is, only the

two-dimensional contingency table between an input variable and a control node will be analyzed.

Interactions between these variables will be explored in future research.

The strength of association of a two-dimensional contingency table is summarized using

Goodman and Kruskal’s Gamma (1954; 1963; see also Agresti, 1990). Gamma, a scalar ranging

from -1 to 1, measures the association between two ordinal, discrete variables. It is defined as the

difference between numbers of concordant and discordant pairs divided by the sum of the two

counts, where discordant pairs are cases where the two variables vary in opposite directions, and

concordant pairs are cases where the two variables change in the same direction (ties are

excluded; for the mathematical definition, see Agresti, 1990). Goodman (1963) showed that a

Gamma computed from a sample follows an asymptotic normal distribution, whose mean is the

population Gamma and whose variance is a complex function of the frequencies of concordant and

discordant pairs. In the following analyses, the effect of an input variable on eye-movement

control is represented with the corresponding Gamma. The present report concentrates on two

issues related to development: the proportion of readers in each age group showing

statistically significant effects, and the change in the absolute value of Gamma across age groups. The input

variables include word frequency, the length of the next word, landing position of the current

fixation, and the state of the previous eye movement (i.e., the previous values of ST and FDC

nodes).
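
As a minimal sketch of the definition above (not the code used in this research), Gamma can be computed from a two-dimensional table of counts as (C - D) / (C + D), where C and D are the numbers of concordant and discordant pairs. The function name and the example table are hypothetical, and the rows and columns are assumed to be listed in ordinal order.

```python
def goodman_kruskal_gamma(table):
    """Goodman and Kruskal's Gamma for a 2-D table of counts.

    `table[i][j]` is the number of cases in row category i and column
    category j; both sets of categories must be in ordinal order.
    Tied pairs (same row or same column) are excluded by construction.
    """
    n_rows, n_cols = len(table), len(table[0])
    concordant = 0
    discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Pair each cell only with cells in later rows, so every
            # untied pair of cases is counted exactly once.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:      # both variables increase together
                        concordant += table[i][j] * table[k][m]
                    elif m < j:    # the variables move in opposite directions
                        discordant += table[i][j] * table[k][m]
    if concordant + discordant == 0:
        raise ValueError("Gamma is undefined: the table has no untied pairs")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical counts: rows = low/medium/high word frequency,
# columns = three ordered categories of a control node.
example_table = [
    [20, 10, 5],
    [10, 15, 10],
    [5, 10, 20],
]
example_gamma = goodman_kruskal_gamma(example_table)
print(round(example_gamma, 3))
```

In this hypothetical table there are 2000 concordant and 575 discordant pairs, so Gamma is 1425/2575, a moderately strong positive association. The significance test described above additionally requires the asymptotic variance of Gamma, which is not sketched here.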

Input variables and saccade programming. Figure 22 shows the effects of the four input

variables on saccade targeting (the ST node). The two horizontal lines in each graph mark the

95% confidence interval of the Gamma (see footnote 41). In other words, data points that fall between the lines

were not statistically significantly different from zero.

Word frequency has a significant effect on saccade programming for nearly all young

readers but for fewer adults. The Gammas were significantly different from zero for 95% of

third-grade and 93.3% of fifth-grade students, but only for 67.7% of adults. A Chi-square test

showed a significant age effect, χ2(2)=9.63, p=0.008. The ANOVA of the Gammas by age group

was significant, F(2, 73)=18.6, p<0.001, MSE=0.0018. A post hoc analysis showed that both

third- and fifth-grade students differed from adults but not from each other.

The length of the next word appears to have the opposite pattern. The Gammas were

significantly different from zero for 40% of third-grade readers, 86.7% of fifth-grade students,

and 100% of adults. A Chi-square test showed a significant age effect, χ2(2)=15.2, p<0.001. The

ANOVA of the Gammas by age group was significant, F(2, 73)=42.8, p<0.001, MSE=0.0033. A

post hoc analysis showed that all three age groups differed from one another.

41 The confidence interval of Gamma varies somewhat for each reader. In order to visually represent the interval,

the average confidence interval is used here.

The picture for landing position is different. The Gammas differed significantly from

zero for 35% of third-grade students, 23.3% of fifth-grade students, and 7.7% of adults. A Chi-square test

showed no significant age effect, χ2(2)=0.615, p>0.5. The ANOVA was also nonsignificant.

In contrast, for the effect of the previous saccade move, the Gamma for every reader was

significantly different from zero. The ANOVA of the Gammas by age group was significant,

F(2, 73)=10.45, p<0.001, MSE=0.0039. A post hoc analysis showed that the third-grade students

did not differ from adults but both differed from fifth-graders.

Input variables and fixation duration control. Similar analyses also examined the effects

of input variables on the FDC node, and are presented in Figure 23.

Word frequency showed a significant effect on fixation duration control for 60% of third-

grade students, 46.7% of fifth-grade students, and 50% of adults. The Chi-square test was not significant,

χ2(2)=0.314, p>0.50. The ANOVA of the Gammas by age group was also nonsignificant, F(2,

73)=1.184, p=0.312, MSE=0.0053.

The length of the next word showed a similar developmental pattern but overall weaker

effects. The Gammas were significantly different from zero for 10% of third-grade readers, 30%

of fifth-grade students, and 19.2% of adults. A Chi-square test was not significant, χ2(2)=0.3479,

p>0.50. The ANOVA was nonsignificant, F(2, 73)=1.232, p=0.298, MSE=0.0035.

Landing position showed a developmental effect. The Gammas differed significantly from

zero for 55% of third-grade, 50% of fifth-grade students, and 7.7% of adults. The Chi-square test

was not significant, χ2(2)=3.09, p=0.214. However, the ANOVA test was significant, F(2,

73)=14.37, p<.001, MSE=0.0055.

Lastly, for the effect of the previous saccade move, the Gammas differed significantly

from zero for 80% of third-grade, 53.3% of fifth-grade students, and 30.8% of adults. The Chi-

square test was not significant, χ2(2)=4.189, p=0.123. However, the ANOVA of the Gammas by

age group was significant, F(2, 73)=6.89, p<0.001, MSE=0.0130. A post hoc analysis showed

that the third-grade students did not differ from adults but both differed from fifth-graders.
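
The Chi-square tests reported above all have 2 degrees of freedom, and in that special case the upper-tail probability has the closed form p = exp(-χ2/2). The following sketch, offered only as a reader's check and not as the analysis code of this research, applies that identity to the reported statistics; the function name and dictionary labels are hypothetical.

```python
import math


def chi2_upper_tail_df2(statistic):
    """Upper-tail probability of a chi-square variable with 2 df.

    With 2 degrees of freedom the chi-square distribution is an
    exponential distribution with mean 2, so P(X > x) = exp(-x / 2).
    """
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.exp(-statistic / 2.0)


# Chi-square statistics as reported in the text (all with 2 df).
reported = {
    "ST: word frequency": 9.63,
    "ST: next-word length": 15.2,
    "ST: landing position": 0.615,
    "FDC: word frequency": 0.314,
    "FDC: next-word length": 0.3479,
    "FDC: landing position": 3.09,
    "FDC: previous state": 4.189,
}

p_values = {label: chi2_upper_tail_df2(x) for label, x in reported.items()}
for label, p in p_values.items():
    print(f"{label}: p = {p:.3f}")
```

Under this identity, only the two largest statistics (9.63 and 15.2) fall below the conventional 0.05 level; a statistic of 0.615 corresponds to a p-value well above 0.5.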

Overall, the above results demonstrated that readers at different proficiency levels are

sensitive to different information in programming reading eye movements. When programming

the next saccade, beginning readers are more affected by the frequency of the currently fixated

word but are less affected by the length of the next word, compared to skilled readers. Landing

position also seems to have a larger impact on young readers’ WHEN decision.

Additionally, not all variables have equal effects on different parameters of eye

movements. For example, the length of the next word has very little effect on the duration of the

current fixation but significant effects on the programming of the next saccade, at least for more

skilled readers.

Discussion

What develops in reading eye-movement control? Analyses of the parameters of

individual SHARE models suggest that as readers become more proficient, their eye

movements are less affected by features of the currently-fixated word (e.g., word frequency and

landing position) or the state of the previous eye movements (e.g., previous values of ST and

FDC nodes). Skilled readers take into account the length of the next word in programming the

next saccade, and they tend to move further into the unread text.

Results based on analyses of SHARE’s parameter space confirmed much previous

knowledge about the development of reading eye movements. Furthermore, SHARE was able to

explore important questions that were unanswered in prior research. For example, it was found that

temporal dependency in eye-movement control decreases slightly with age, but the effects

remain for most adult readers.

A unique feature of the SHARE model is that it models temporal dependencies between

consecutive eye movements. Evidently, these temporal dependencies were among the largest and

most consistent effects on eye movement control. More interestingly, temporal dependencies

decrease in strength with reading proficiency, which suggests that skilled reading eye

movements become more like a zero-order Markov (random-walk) process.

CHAPTER 7. DISCUSSION

The goal of the current research is to describe reading eye movements mathematically

with minimal assumptions about the mechanisms of the processes. A stochastic, hierarchical

architecture for reading eye-movement, or SHARE, is developed, and a simple model based on

this architecture is tested.

What is SHARE?

SHARE is a mathematical model that is able to reproduce many essential characteristics

of reading eye movements. It is, to my knowledge, the first model that simultaneously accounts

for fixation duration and saccade length in their distributional details, as opposed to only group

means. Its Markovian architecture also gives straightforward explanations for the moment-by-

moment dynamics of eye movements with few a priori assumptions, compared with some

existing models.

SHARE is also unique because of its completely individualized modeling approach,

which contrasts strongly with most, if not all, previous models’ focus on “the average person.”

Reading eye movements are as diverse as are readers themselves. There is no reason to presume

a common set of parameters, or even mechanisms, for all readers. Besides the bias in psychology

to think in terms of “the average person,” a practical obstacle preventing individualized modeling

is that there may not be enough data collected from an individual reader to obtain sound

parameter estimates. The Bayesian method used in SHARE provides a promising way to get

around the problem.

However, the most important contribution of SHARE is not the model in its current form.

Rather, the hope of this research is to introduce a language for describing reading eye

movements.

I argued in Chapter 1 that researchers have struggled to depict reading eye movements

since the discovery of the basic phenomena over a century ago (Javal, 1878). The solutions,

ranging from early attempts to use verbal analogies and visual aids to the latest flourish of

composite eye-movement measures and theories of mechanisms, are far from satisfactory.

The direction of the current research is to separate description from mechanism, and

focus squarely on the former. As a result, the SHARE architecture was designed to satisfy three

logical requirements for describing reading eye movements – that they are probabilistic in nature,

that they are time-series data, and that they are affected by other factors. Of course, some of the

details – e.g., the choice of input variables, the specifications of the nodes (discrete vs.

continuous, etc.), or the independence assumptions – are specific to the current implementation.

Nevertheless, the general hierarchical, stochastic architecture has been shown to be flexible

enough to capture much of the essence of reading eye movements, and it has the potential to

become a common language to talk about eye movements.

This brings up the fine but crucial distinction between architecture and a specific theory

implemented under the architecture. It can be argued that SHARE, as implemented in the current

study, is a particular theory of reading eye movements, because it has restricted linguistic effects

on reading to word frequency only, and assumed conditional independence between the WHEN

and WHERE pathways. However, the author has no intention to defend such a theory. Rather, it

was implemented as an example of modeling in the new architecture. The fact that even a

simple-minded “theory” like this could account for many facts of eye movements demonstrates

the power of the architecture.

What SHARE is Not

First and foremost, SHARE, in its current implementation, is not a theory of reading

eye movements the author wants to promote. As argued above, it is merely a demonstration of an

architecture to mathematically describe patterns of reading eye movements.

Moreover, the SHARE architecture is not a theory of eye-movement control mechanisms.

On the contrary, it is assumed that data and mechanism can be described separately, and SHARE

is intended to be as independent of the mechanism assumptions as possible. For example,

SHARE models what effect word frequency has on saccade targeting, but makes no assumption

about how the effect is possible. It does not say anything about whether the effect of word

frequency happens earlier or later than word length, or whether or not attention has been shifted.

The arrows in the hierarchical architecture represent the direction of causality only; they do not

imply serial processing or even temporal order. In short, eye-movement description is at the

phenomenological level.

SHARE does not compete with existing theories of eye movements; it is a complement.

In a sense it provides a test-bed, where different theories may be implemented on a common

ground and compete with each other. For example, it is conceivable that the E-Z Reader model

could be implemented in a SHARE environment. It would add many processing assumptions to

SHARE, and would make specific predictions about how, for example, word frequency would

affect the control of saccade and fixation duration. In other words, the model would fix some of

the free parameters. The model would then fit empirical data (a built-in feature of the SHARE

architecture), and the result could be compared to a “full” model where the corresponding

parameters were not fixed. Standard statistical tests could be carried out to evaluate the power of

the model.

Of course, it is arguable whether description and mechanism are truly separable. Our

survey of existing eye-movement models suggests that, to a large extent, they are. McConkie and

Dyre (2000) have shown that different mechanisms may result in almost identical fits to

empirical data. Conversely, the Mr. Chips model (Legge et al., 1997) demonstrated that a

complex deterministic process could be modeled successfully with simple probabilistic

heuristics. On the other hand, the probabilistic nature of the SHARE architecture precludes

implementing models such as READER, in which eye movements are deterministically decided.

Nonetheless, the Mr. Chips model hints that SHARE may also be compatible with a

deterministic model if the distributional properties of the model are well understood.

Composite Variables Revisited: Implications for Psycholinguistic Research

The proliferation of composite eye-movement measures may reflect researchers’

increasing frustration in describing complex eye movement patterns. However, new measures

have not solved the problem, and in many cases only complicate the matter more.

SHARE suggests a different approach. Instead of summing up fixation duration over time

in idiosyncratic ways, SHARE captures temporal dynamics with its Markovian structure at the

eye-movement control layer. Paired with the power of the hierarchical structure (input variables),

SHARE’s probabilistic representation naturally summarizes endless combinations of eye-

movement patterns. What is variable and elusive in the sample domain can be expressed as stable

parameters in the Markov transition matrices. An analogy is the two representations of speech

signals – what is difficult to perceive in the waveform may be obvious in the spectrogram, and

vice versa. Eye-movement patterns may be difficult to capture in the sample domain, but much

easier to deal with as a Markov transition matrix.

To psycholinguistic researchers, this points to a change in data analysis. For example, in

a hypothetical reading experiment, the researcher manipulates a target word in a sentence so that

in the experimental condition the word does not fit the sentence context whereas in the control

condition it does. The researcher is interested in whether readers detect the improper word within

the region of the next n words (or within x fixations, etc.). Instead of using gaze duration over the

region (or using Liversedge et al., 1998, measures), s/he may define each of the n words as a

state, feed the eye movements within the region into a simple SHARE model (see footnote 42), and estimate the

transition matrices of the ST node for the experimental and control conditions. If readers change

saccade patterns when they see an inappropriate word, different transition matrices are expected

for the two conditions. Fixation duration may be modeled similarly.
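
The hypothetical analysis above can be sketched as follows, assuming a simple first-order Markov model over word positions. The function name, the word indexing, and the fixation sequences are invented for illustration only. The pseudocount is also an assumption: it acts as a symmetric Dirichlet prior, so each row is a posterior-mean estimate, which is in the Bayesian spirit of SHARE but is not the estimation procedure used in this research.

```python
def estimate_transition_matrix(sequences, n_states, pseudocount=1.0):
    """Estimate a first-order Markov transition matrix from state sequences.

    Each sequence lists the word (state) index fixated on successive
    fixations. `pseudocount` is added to every cell before normalizing,
    so rows for rarely visited states remain well defined.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    counts = [[pseudocount] * n_states for _ in range(n_states)]
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1.0
    # Normalize each row so it is a probability distribution.
    return [[c / sum(row) for c in row] for row in counts]


# Hypothetical fixation sequences over a four-word region (indices 0-3);
# word 1 plays the role of the manipulated target word.
TARGET = 1
control = [[0, 1, 2, 3], [0, 1, 1, 2, 3], [0, 2, 3]]
experimental = [[0, 1, 2, 1, 2, 3], [0, 1, 2, 1, 3], [0, 1, 0, 1, 2, 3]]

p_control = estimate_transition_matrix(control, n_states=4)
p_experimental = estimate_transition_matrix(experimental, n_states=4)

# Probability of regressing from word 2 back to the target word.
print(p_control[2][TARGET], p_experimental[2][TARGET])
```

With these made-up data, the estimated probability of moving from word 2 back to the target is 1/7 in the control condition and 3/8 in the experimental condition, which is the kind of contrast the text proposes to report in place of a difference in mean gaze duration.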

There are several potential benefits of using this approach. First, the results may be more

interpretable. Instead of “mean gaze duration increased 15 msec,” one may report something like

“the probability of regressing back to the target word increased from 0.1 to 0.5, and the

probability of making long fixations increased from 0.3 to 0.4.” Furthermore, with enough data

one may be able to estimate instantaneous transition probabilities, e.g., the probability of fixating

the target word in the 2nd, 3rd, … fixation after the first fixation on the target word. This is

valuable information that many researchers have tried to infer from traditional measures such as

first fixation duration and gaze duration. Last but not least, individual differences in reading eye

movements may be estimated and experimental effects may be estimated for individual readers.

42 A simple first-order Markov model may suffice in this hypothetical study.

In sum, for psycholinguistic reading research, the SHARE architecture may provide a

complementary or possibly alternative solution to the eye-movement measurement problem,

although many details have to be worked out in the future.

Applications in Reading Education

One of the original motivations of this research was to use eye movements to detect

processing difficulties in reading. In the early days of eye-movement research, the pioneers

(Buswell, 1922; Buswell, 1937; Dearborn, 1906; Gray, 1922; Huey, 1908) did not hesitate to

point to an “abnormal” eye-movement pattern and conclude that the reader was experiencing

difficulties. Buswell (1937) also distinguished general reading deficiencies from having trouble

with specific words. The problem, of course, was that the inference process was qualitative and

holistic. The “art” of detecting reading difficulties from eye movements disappeared soon after.

Logically, if readers have different eye-movement patterns when they are reading

normally versus experiencing difficulties, one should be able to compare the patterns and detect

the state of the reader. To carry out this process quantitatively, however, one has to be able to

faithfully describe eye-movement patterns associated with different states and probabilistically

infer the state from observed eye movement patterns.

SHARE was developed exactly for this purpose. It is able to summarize a wide range of

eye-movement patterns with the stochastic, hierarchical structure. The Bayesian method can be

used to probabilistically infer the state of an unobserved node in the structure given observed

data (e.g., the value of the FDC node was hidden and was estimated from data). In addition, its

ability to adapt to an individual reader’s eye-movement parameters is also essential in performing

the detection task.

As an extension of the current research, a prototype of a reading difficulty detection

model has been developed. In its simplest form, it consists of an input layer with FREQ and

WLEN as input variables, a cognitive-state layer that contains a binary node (troubled versus

normal reading), and an eye-movement layer that contains a discrete node similar to the ST

node in the current model. The cognitive state node is assumed to be unobserved, and the goal of

the model is to estimate the probability of the states given input variables and an observed

sequence of saccade movements. Initial testing shows that the prototype model is able to

distinguish different patterns of eye movements.
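
The inference step described above can be illustrated with a deliberately simplified sketch. It is not the prototype itself: it ignores the FREQ and WLEN inputs, assumes the cognitive state is constant over a short window of saccades, and uses invented probabilities. The saccade coding (0 = regression, 1 = refixation, 2 = forward saccade) and all names are hypothetical.

```python
import math


def sequence_log_likelihood(seq, initial, transition):
    """Log-probability of a saccade-category sequence under a
    first-order Markov model (initial distribution + transition matrix)."""
    log_p = math.log(initial[seq[0]])
    for prev, nxt in zip(seq, seq[1:]):
        log_p += math.log(transition[prev][nxt])
    return log_p


def posterior_troubled(seq, prior_troubled, models):
    """P(state = troubled | observed sequence) by Bayes' rule.

    `models` maps "troubled" and "normal" to (initial, transition) pairs.
    """
    priors = {"troubled": prior_troubled, "normal": 1.0 - prior_troubled}
    log_joint = {
        state: math.log(priors[state])
        + sequence_log_likelihood(seq, *models[state])
        for state in priors
    }
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_joint.values())
    weights = {s: math.exp(v - peak) for s, v in log_joint.items()}
    return weights["troubled"] / sum(weights.values())


# Invented parameters: troubled reading is assumed to produce more
# regressions (0) and refixations (1) than normal reading.
MODELS = {
    "normal": ([0.1, 0.2, 0.7], [[0.1, 0.2, 0.7]] * 3),
    "troubled": (
        [0.3, 0.3, 0.4],
        [[0.4, 0.3, 0.3], [0.3, 0.3, 0.4], [0.3, 0.3, 0.4]],
    ),
}

p_regressive = posterior_troubled([2, 0, 0, 1, 0], 0.2, MODELS)
p_forward = posterior_troubled([2, 2, 2, 2, 2], 0.2, MODELS)
print(round(p_regressive, 3), round(p_forward, 3))
```

Under these assumed parameters, a window dominated by regressions yields a high posterior probability of troubled reading, whereas a run of forward saccades yields a low one, even though the prior is the same in both cases.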

Although the prototype model is far from complete, the initial results are promising. A

next step for the current research is to explore the full potential of the SHARE architecture in

describing and detecting reading difficulties.

SHARE, the mathematical model developed in this research, grows out of the need to

quantitatively account for reading eye movements in both theoretical research and educational

applications. It demonstrates the feasibility and utility of modeling eye movements at a level

other than mechanisms and processes. Although researchers may have different theories about

mechanisms and processes, it is my hope that at least we can share a common description of eye

movements.

TABLES

Table 1. Developmental Characteristics of Reading Eye Movements

                                            Grade level
Article and characteristic          1     2     3     4     5     6   Adult

Taylor (1965)
  Fixation duration (msec)        330   300   280   270   270   270   240
  Fixations per 100 words         224   174   155   139   129   120    90
  Frequency of regressions (%)     23    23    22    22    21    21    17

Buswell (1922)
  Fixation duration (msec)        432   364   316   268   252   236   252
  Fixations per 100 words         182   126   113    92    87    87    75
  Frequency of regressions (%)     26    21    20    19    20    21     8

Rayner (1985b)
  Fixation duration (msec)          -   290     -   276     -   242   239
  Fixations per 100 words           -   165     -   122     -   110    92
  Frequency of regressions (%)      -    27     -    25     -    24     9

McConkie et al. (1991)
  Fixation duration (msec)        304   268   262   248   243     -   200
  Fixations per 100 words         168   138   125   132   135     -   118
  Frequency of regressions (%)     34    33    34    36    36     -    21

Overall mean
  Fixation duration (msec)        355   306   286   266   255   249   233
  Fixations per 100 words         191   151   131   121   117   106    94
  Frequency of regressions (%)     28    26    25    26    26    22    14

Reproduced from: Table 4 in Rayner (1998)

Table 2. Log Likelihood of Bayesian and MLE for Fixation Duration Fitting

              Component modes (sec)   Variance of log(dur)       Weights
Component        BNT       GMM          BNT       GMM         BNT      GMM

3rd-grade: N=481, BNT Log likelihood=-386.86, GMM Log likelihood=-386.68
S               0.062     0.082        0.171     0.284       0.081    0.133
M               0.204     0.212        0.133     0.105       0.537    0.530
L               0.302     0.321        0.184     0.171       0.382    0.337

5th-grade: N=586, BNT Log likelihood=-415.85, GMM Log likelihood=-417.53
S               0.061     0.169        0.217     0.657       0.037    0.158
M               0.191     0.194        0.103     0.092       0.595    0.611
L               0.305     0.351        0.227     0.143       0.368    0.231

Adults: N=416, BNT Log likelihood=-231.08, GMM Log likelihood=-231.40
S               0.061     0.072        0.21      0.281       0.081    0.104
M               0.177     0.173        0.089     0.079       0.673    0.531
L               0.221     0.218        0.146     0.122       0.246    0.365

FIGURES

Figure 1. Architecture of Reilly’s Connectionist Model of Eye-movement Control (reproduced from Reilly,

1997, Figure 1). The circles represent connectionist modules and the rectangles non-connectionist control modules.

Thick lines indicate a flow of activation, thin lines a flow of control. The asymptote detectors determine when the

cascading outputs from the lexical and saccadic modules have reached asymptote.

Figure 2. Illustration of Parafoveal Preview Effects in E-Z Reader 5 (reproduced from Reichle et al., 1997,

Figure 6). Preview benefit (gray area) increases as the frequency of the foveal word increases (x-axis). At time t(fn),

familiarity check has completed and a saccade to the next word is ordered, which would take a constant time,

t(mn+1)+t(Mn+1), to prepare and execute. During this time, if the lexical completion process is able to finish, t(lan),

there will be some time for parafoveal processing, marked in gray. Because the slope for t(lan) is larger than that for

t(fn), the gray area shrinks for low-frequency words.

Figure 3. An order-of-processing diagram for E-Z Reader 5 (reproduced from Reichle et al., 1998, Figure

7). The boxes are possible states that the model could be in, with the ongoing processes represented in the box. Each

arrow is labeled by the process that has completed, and dotted arrows indicate that attention has shifted forward

(indicated by n = n + 1 on the diagram). Note that n indexes the attended word, not the fixated word. (The numbers

given to the boxes are essentially arbitrary.) f = familiarity check of the word; lc = completion of the lexical access

of the word; m = a labile stage of saccade programming that can be canceled by a subsequent saccade; M = a

subsequent nonlabile stage of saccade programming. The additional states added are for planning and executing

intraword saccades.

Figure 4. Illustration of components of the Mr. Chips model (reproduced from Legge et

al., 1997, Figure 1). See Chapter 2 (page 34) for details.

Figures 5A and 5B. Landing position of fixations during reading (reproduced from McConkie et al., 1994,

Figures 1 and 2). Figure 5A shows empirical frequency distributions of fixation landing position as a function of

launching sites. The corresponding fitted Normal curves are plotted. Figure 5B shows the mean landing position as a

function of launch site, for seven-letter words. It can be seen that the range error is zero when the launch site is 7 letter

spaces.

Figure 6. Frequency of skipping four- and eight-letter words (reproduced from McConkie et al., 1994,

Figure 3). The probability of word skipping can be modeled with a logistic function (see Chapter 2, page 43 for

more details).

Figure 7. Mean landing positions of regressive saccades as a function of launch site (reproduced from

Radach & McConkie, 1998, Figure 3). The x-axis is numbered relative to the space following the target word, with

negative numbers indicating launch sites from within the word, and positive numbers indicating launch positions to

the right of the word boundary. The y-axis indicates mean landing position, and is numbered with respect to the

center of the word. Interword regressions do not show systematic range errors.

Figure 8. Fitting fixation duration distribution with a two-stage mixture model (reproduced from McConkie

et al., 1994, Figure 5). See Chapter 2 (page 46) for details.

[Figure 9 plot. X-axis: fixation durations in 25-ms bins (25 to 725 ms). Conditions: Normal+, Normal-, Nonword+, X’s+, X’s-, Dash+, Blank-.]

Figure 9. Distributions of fixation durations in Yang and McConkie (in press, reproduced from Figure 2).

Normal+ is the control condition in which the original text was displayed. In the Normal- condition all spaces were

replaced by the @ character. In the Nonword+ condition letters were replaced by randomly selected letters. In the

X’s+ condition all characters except for spaces were replaced by X’s. In the X’s- condition all characters, including

spaces, were replaced by X’s. In the Dash+ condition all characters except for spaces were replaced by dashes. All

characters were replaced by spaces in the Blank- condition.

[Figure 10 diagram. Discrete nodes and their states: FREQn (L | M | H); WLENn+1 (S | M | L); ECCENn (C | E); STt (-2* | -1 | 0 | 1 | 2 | 3 | 4*); FDCt (S | M | L). Continuous nodes: SACCt, DURt.]

Figure 10. Graphical representation of the SHARE model. Each node represents a random variable. FREQn

is the frequency of the current word. WLENn+1 is the length of the next word. ECCENn is the eccentricity of the

current landing position. STt is the saccade targeting node that plans the current saccade (the one following the

current fixation t). FDCt is the fixation duration category of the current fixation. PSLt is the planned saccade length

of the current saccade. SACCt is the actual length of the current saccade. DURt is the log-transformed duration of

the current fixation. Nodes with rectangular boxes are discrete variables; nodes with oval boxes are continuous variables.

Clear boxes represent observed variables; the shadowed box (FDC) represents a hidden variable. An arrow from one

node to another shows that the latter variable is dependent on the former; the lack of an arrow between two nodes

shows that the two nodes are conditionally independent. The circular arrows beside the ST and FDC nodes signify

temporal dependency, i.e., the value of a node at fixation t depends on that at fixation t-1.

[Figure panels: plot residue from a repeated series of panels comparing empirical and simulated data. Panel titles: Distribution of Fixation Duration; Cumulative Distribution Function; Distribution of Saccade Length. Axes: Fixation Duration (in sec), 0 to 1; Saccade Length, -600 to 600. Legend: Empirical, Simulated.]

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

0 0.2 0.4 0.6 0.8 10

EmpiricalSimulated

0 0.2 0.4 0.6 0.8 10

−600 −400 −200 0 200 400 6000

Saccade Length

−600 −400 −200 0 200 400 6000

Saccade Length

Figure 12. Simulated and Empirical First Fixation Duration by Word Frequency

[Plot not reproduced. Panels: Low Frequency Words, Medium Frequency Words, High Frequency Words. Horizontal axis: Empirical (sec.), 0 to 0.5. Legend: G3, G5, Adult.]

Figure 13. Simulated and Empirical Single Fixation Duration by Word Frequency

[Plot not reproduced. Three panels. Horizontal axis: Empirical (sec.), 0 to 0.5. Legend: G3, G5, Adult.]

Figure 14. Simulated and Empirical Gaze Duration by Word Frequency

[Plot not reproduced. Panels: Low Frequency Words (0 to 1.4), Medium Frequency Words (0 to 1.4), High Frequency Words (0 to 1). Horizontal axis: Empirical (sec.). Legend: G3, G5, Adult.]

Figure 15. Simulated and Empirical Skipping Probability by Word Frequency

[Plot not reproduced. Three panels, with horizontal axes running 0 to 0.1, 0 to 0.25, and 0 to 0.5. Horizontal axis: Empirical prob. Legend: G3, G5, Adult.]

Figure 16. Simulated and Empirical Probability of Making Single Fixation by Word Frequency

[Plot not reproduced. Three panels, with horizontal axes running 0 to 0.25, 0 to 0.25, and 0 to 0.5. Horizontal axis: Empirical prob. Legend: G3, G5, Adult.]

Figure 17. Simulated and Empirical Probability of Making Two Fixations by Word

[Plot not reproduced. Panels: Low Frequency Words (0 to 0.14), Medium Frequency Words (0 to 0.14), and a third panel (0 to 0.2). Horizontal axis: Empirical prob. Legend: G3, G5, Adult.]

Figure 18. Developmental Changes in Saccade Targeting Probabilities

[Plot not reproduced. Panels: Probability of Regressions, Probability of Refixations, Prob. of Progressing 1 Word, Prob. of Progressing 2 or More Words. Horizontal axis: Reading Speed (WPM), 0 to 800. Legend: G3, G5, Adult.]

Figure 19. Developmental Changes in Fixation Duration Control: Probabilities of Making Short, Medium, and Long Fixations

[Plot not reproduced. Panels: Probability of Short Fixations, Probability of Medium Fixations, Probability of Long Fixations. Horizontal axis: Reading Speed (WPM), 0 to 800. Legend: G3, G5, Adult.]

Figure 20. Developmental Changes in Fixation Duration Control: Modes of Short, Medium, and Long Fixation Durations

[Plot not reproduced. Panels: Mode of Short Fixations, Mode of Medium Fixations, Mode of Long Fixations. Horizontal axis: Reading Speed (WPM), 0 to 800. Legend: G3, G5, Adult.]

Figure 21. Developmental Changes in Fixation Duration Control: Variance of Short, Medium, and Long Fixation Durations

[Plot not reproduced. Panels: Variance of Short Fixations, Variance of Medium Fixations, Variance of Long Fixations. Horizontal axis: Reading Speed (WPM), 0 to 800. Legend: G3, G5, Adult.]

Figure 22. What Affects Saccade Targeting: Effects of Word Frequency, Length of the Next Word, Fixation Landing Position, and the Previous Saccade Move

[Plot not reproduced. Panels: Word Frequency, Length of Next Word, Landing Position, Last Saccade Move. Horizontal axis: Reading Speed (WPM), 0 to 800. Vertical axis ticks from −0.5 to 0.5. Legend: G3, G5, Adult.]

Figure 23. What Affects Fixation Duration Control: Effects of Word Frequency, Length of the Next Word, Fixation Landing Position, and the Previous Saccade Move

[Plot not reproduced. Panels: Word Frequency, Length of Next Word, Landing Position, Last Fixation. Horizontal axis: Reading Speed (WPM), 0 to 800. Vertical axis ticks from −0.5 to 0.5. Legend: G3, G5, Adult.]

Figure 24. BNT Mixture of Gaussian Model Diagram

[Diagram not reproduced. Recovered node label: FDC (S | M | L).]

Figure 24. Graphical representation of the BNT Mixture of Gaussian model for fitting fixation duration distributions. FDC is a hidden node representing the fixation duration category. DUR is the log-transformed duration of the current fixation. FDC is a discrete variable with three states, S, M, and L, with prior probabilities of 0.10, 0.55, and 0.35, respectively. DUR is a continuous variable following normal (Gaussian) distributions. The priors for DUR conditioned on the FDC value are set as follows: DUR_S ~ N(75, 80), DUR_M ~ N(180, 130), DUR_L ~ N(320, 320).

[Fixation duration distribution panels not reproduced. Horizontal axis: Time (in second), 0 to 1. Recoverable panel annotations follow.]

N = 45995, mean = 0.27084, LogLikelihood = −40497

1 component:
Mode(linear) = 0.230, var(log) = 0.341, w = 1.000

2 components:
Mode(linear) = 0.218, var(log) = 0.636, w = 0.402
Mode(linear) = 0.238, var(log) = 0.139, w = 0.598

3 components:
Mode(linear) = 0.081, var(log) = 0.353, w = 0.088
Mode(linear) = 0.212, var(log) = 0.120, w = 0.608
Mode(linear) = 0.362, var(log) = 0.246, w = 0.305

4 components:
Mode(linear) = 0.067, var(log) = 0.230, w = 0.071
Mode(linear) = 0.170, var(log) = 0.066, w = 0.346
Mode(linear) = 0.274, var(log) = 0.079, w = 0.399
Mode(linear) = 0.444, var(log) = 0.210, w = 0.183

5 components:
Mode(linear) = 0.064, var(log) = 0.208, w = 0.066
Mode(linear) = 0.155, var(log) = 0.057, w = 0.239
Mode(linear) = 0.223, var(log) = 0.050, w = 0.339
Mode(linear) = 0.338, var(log) = 0.061, w = 0.245
Mode(linear) = 0.533, var(log) = 0.180, w = 0.111

Two further panel headers:
N = 40478, mean = 0.19254, LogLikelihood = −21839
N = 40478, mean = 0.19254, LogLikelihood = −21812

APPENDIX A. PROBLEMS IN THE E-Z READER MODEL

Reichle et al. (1998; Reichle et al., 1999) developed a series of “E-Z Reader” models of

eye-movement control during reading. They concluded that the E-Z Reader models fit the data

well. However, as will be shown below, evaluating the goodness of fit of the model turned out to

be impossible because of serious problems in their goodness-of-fit index and limitations of the

empirical data used for modeling.

The Goodness-of-fit Index.

A goodness-of-fit index is arguably the most important part of a model. On one hand, it is

the criterion based on which a model is "optimized" and parameters are estimated. On the other

hand, it is an important criterion for comparing and selecting models. It is the link between

theory and data. However, the way goodness-of-fit was handled in Reichle et al. is questionable.

According to Reichle et al. (1998):

The model's overall performance was measured by using the root mean square of the

normalized difference scores (errors) between the observed and predicted means of the

five frequency classes for each of the dependent measures. The normalization process

allowed the errors to be evaluated on a common scale (i.e., milliseconds and probabilities

were converted to unitless scores). The normalization process that we used was to square

the difference between the observed and predicted values and then to divide this

difference by the standard deviation of the observed values. (p. 157)

To facilitate further discussion, let's put the above into formulas. Let X, an eye-movement measure, be a random variable with expected value $\mu_x$ and standard deviation $\sigma_x$. Let $\{x_1, \ldots, x_N\}$ be a random sample of X, with sample mean $\bar{x}$ and sample standard deviation $sd$. For large N, the distribution of the sample mean is approximately normal, with a standard deviation that is estimated by the sample standard error, $se = sd/\sqrt{N}$. Finally, let $\bar{x}_s$ be the mean of measure X from the E-Z Reader simulation. Because of the large size of N in the simulation (1,000 "statistical subjects"), $\bar{x}_s$ should be very stable and can be practically treated as a constant. With the above notation, we can write Reichle et al.'s normalization algorithm and the goodness-of-fit index (RMS) in the following formulas. For each of the M = 30 measures, $X_i$, the normalized difference score, according to Reichle et al. (1998, cited above), is

$$y_i = \frac{(\bar{x}_i - \bar{x}_{is})^2}{sd_i},$$

and the goodness-of-fit index, root mean square (RMS), of a model is calculated as

$$RMS = \sqrt{\frac{1}{M} \sum_{i=1}^{M} y_i^2}.$$

There are at least two serious errors in the above goodness-of-fit index, each of which

will be shown to have a large impact on the evaluation and interpretation of the models. In

addition, the use of RMS as goodness-of-fit is also questionable. I will discuss each of them

below.

The "normalization." Reichle et al. claim that their normalization process "allowed the errors to be evaluated on a common scale," that is, rendered them unitless. The idea was probably to normalize using Z-scores. But their normalization formula does not serve this purpose:

$$y_i = \frac{(\bar{x}_i - \bar{x}_{is})^2}{sd_i} = (\bar{x}_i - \bar{x}_{is}) \times \frac{\bar{x}_i - \bar{x}_{is}}{sd_i} = (\bar{x}_i - \bar{x}_{is}) \times Z_i .$$

Clearly $Z_i$ is a unitless Z-score, but Reichle et al.'s "normalized difference score" scales $Z_i$ by the difference between the observed and estimated means of measure X. As a consequence, when the $y_i$'s were used to calculate overall goodness of fit, different measures made different contributions to the loss function, and the weight of each depended on the scale of the measure.

Specific to the E-Z Reader models, a rough estimate from Reichle et al. (1999) Table 1 shows that $(\bar{x}_i - \bar{x}_{is})$ for gaze duration, first fixation duration, and single-fixation duration ranges anywhere from 2 to 18 (not counting 0's), while $(\bar{x}_i - \bar{x}_{is})$ for the probabilities of skipping, making a single fixation, and making two fixations is in the range of 0.01 to 0.1. The two groups of measures thus differ by a factor of roughly 100. Without doing any mathematical analysis, it is obvious that the effects on the probabilities were grossly suppressed during the model-fitting and parameter-estimating process. An immediate consequence of using this 100:1 "normalization" formula is that the E-Z Reader models were sensitive to fixation duration data but practically ignored effects on skipping and refixation probabilities. It is not surprising then, given this optimization criterion, that model fit did not improve in any real sense from E-Z Reader 2 to 6, and in many cases the fit was actually worse.
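The scale dependence described above can be checked numerically. The sketch below uses made-up values for a single duration measure (the numbers are hypothetical and are not taken from Reichle et al.) and expresses the same measure once in milliseconds and once in seconds. A true Z-score is unchanged by the change of units, whereas the squared-difference-over-sd score is not.

```python
def z_score(observed_mean, simulated_mean, sd):
    """Unitless difference: (observed - simulated) / sd."""
    return (observed_mean - simulated_mean) / sd


def squared_over_sd(observed_mean, simulated_mean, sd):
    """Squared difference divided by sd, as in the quoted normalization."""
    return (observed_mean - simulated_mean) ** 2 / sd


# Hypothetical values for one duration measure, in milliseconds.
obs_ms, sim_ms, sd_ms = 250.0, 240.0, 80.0
# The same three quantities expressed in seconds.
obs_s, sim_s, sd_s = obs_ms / 1000.0, sim_ms / 1000.0, sd_ms / 1000.0

z_ms = z_score(obs_ms, sim_ms, sd_ms)
z_s = z_score(obs_s, sim_s, sd_s)
y_ms = squared_over_sd(obs_ms, sim_ms, sd_ms)
y_s = squared_over_sd(obs_s, sim_s, sd_s)

print("Z in ms vs s:", z_ms, z_s)   # identical up to rounding
print("y in ms vs s:", y_ms, y_s)   # differ by the unit conversion factor
```

Because the score still carries the units of the measure, a measure recorded on a numerically larger scale automatically receives more weight in the summed loss.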

It's interesting, though, that even under this extremely unfavorable treatment, the three

probability measures were fit reasonably well, judged by simply looking at the observed and

estimated means. A possible explanation is that the different measures of eye movements may

not be independent (as indeed they should not be if the E-Z Reader model is correct), and

consequently fitting a subset of the variables would guarantee that the rest of the variables are

also fit well. This hypothesis will be examined later.

Standard deviation versus standard error. In calculating

$$Z_i = \frac{\bar{x}_i - \bar{x}_{is}}{sd_i},$$

Reichle et al. (1998; Reichle et al., 1999) used the standard deviation of the observed sample as the denominator. Because the comparisons here were between means of observed versus simulated observations, the sample standard error should be used in the denominator (see Hayes, 1988). I suspect that the confusion might stem from a seemingly similar situation, model training in artificial neural networks, where after each cycle RMS is calculated on the basis of the sample standard deviation. That use of the sample standard deviation is legitimate because a single observation (the activation level after the cycle) is the center of concern, rather than a mean of some sort. However, the Monte Carlo simulation that Reichle et al. were doing is fundamentally based on the Law of Large Numbers and is only concerned with means.

What impact does this have on goodness-of-fit indices? The answer depends on the sample size. A rough guess at the N for each of the 30 means from Schilling et al. is approximately 3,000 (48 sentences, 12 words long on average, 30 subjects, divided by 5 frequency categories). If Reichle et al. had used the standard error instead of the standard deviation of each measure, the Z-scores, and hence the overall goodness-of-fit index, would have been roughly 50 times larger (the square root of 3,000 is about 55). The RMS for E-Z Reader 6, for example, would have been in the neighborhood of 10, instead of 0.218. The Z-scores (using the correct formula) follow a unit Normal distribution (for N=3,000). Therefore any |Zi| > 2 clearly indicates a poor fit at point i, at an α level of .05. If sd is used in place of se, as Reichle et al. did, the magnitude of Zi shrinks some 50-fold and would never be significant.
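The arithmetic behind the "roughly 50 times" figure can be laid out in a few lines. This is only a sketch of the scaling argument: it assumes every one of the 30 means rests on the same N, and it treats the reported RMS as if it were built from Z-scores, which is the simplification the text itself makes.

```python
import math

# Rough cell size from the text: 48 sentences x 12 words x 30 subjects,
# divided into 5 frequency classes.
n_exact = 48 * 12 * 30 / 5      # 3456 observations per mean
n_rounded = 3000                # the text's rounder figure


def inflation_factor(n):
    """Ratio sd / se = sqrt(n): how much Z grows when se replaces sd."""
    return math.sqrt(n)


reported_rms = 0.218            # E-Z Reader 6, as quoted in the text
rescaled_rms = reported_rms * inflation_factor(n_rounded)

print("factor for N = 3000:", round(inflation_factor(n_rounded), 1))
print("factor for N = 3456:", round(inflation_factor(n_exact), 1))
print("rescaled RMS:", round(rescaled_rms, 1))
```

Under these assumptions the rescaled index lands a little above 10, in line with the "neighborhood of 10" estimate.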

RMS and goodness-of-fit testing. Reichle et al. chose to use the root mean square of error (RMS) as an index of the goodness of fit during grid searches for optimal parameters. There is nothing wrong with the choice. However, RMS is rarely used in statistical modeling or Monte Carlo simulations as a goodness-of-fit index because (a) it is difficult to test the fit of a model to data or to compare different models on the basis of RMS, and (b) there are easier ways to do so.

One classical goodness-of-fit test statistic, Chi-square, is actually closely related to RMS. When each of the M error components is independently and identically distributed (i.i.d.) as a unit Normal distribution (Z), the sum of squared errors (SSE),

$$SSE = M \times RMS^2 = \sum_{i=1}^{M} Z_i^2 ,$$

is distributed as a Chi-square distribution with M degrees of freedom (df). Thus SSE can be tested against an appropriate Chi-square distribution to see if the hypothesis that the model fits the data set should be rejected. Not only can the fit of a single model be tested this way, but a series of two or more hierarchically constructed models, with increasing numbers of free parameters, can also be compared using the Chi-square test in order to decide whether the improvement in fit with additional parameters is statistically justifiable.
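As an illustration of the test just described, the sketch below converts an RMS value into SSE and looks up its upper-tail Chi-square probability. To stay dependency-free it uses the closed-form tail probability that holds only for an even number of degrees of freedom (M = 30 qualifies); the function names are mine, and the RMS values passed in are illustrative.

```python
import math


def chi2_sf_even_df(x, df):
    """P(chi-square_df > x), using the closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k        # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


def sse_from_rms(rms, m):
    """SSE = M * RMS^2 when RMS is the root mean square of M Z-scores."""
    return m * rms ** 2


M = 30
# An RMS near 1 is what unit-Normal errors would produce on average.
p_good = chi2_sf_even_df(sse_from_rms(1.0, M), M)
# An RMS near 10 (the rescaled estimate discussed earlier) is far in the tail.
p_poor = chi2_sf_even_df(sse_from_rms(10.0, M), M)

print("p for RMS = 1:", p_good)
print("p for RMS = 10:", p_poor)
```

The same function can serve for nested-model comparisons, provided the difference in free parameters happens to be even; otherwise a general Chi-square routine from a statistics library would be needed.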

Reichle et al. did not formally test the fit of their models to the data or base model selection on clear empirical criteria, being primarily concerned with psychological validity. Well-developed statistical methods of model fitting exist, and can provide a more systematic means of developing and comparing models.

Correlations, Multicollinearity, and Parsimonious Modeling.

A question raised previously is why E-Z Reader was able to model eye-movement

probability data fairly well even when these measures had little weight in model optimization

and parameter estimation. A possibility is that the probability measures were highly correlated

with duration data. There was a hint in the report that this was true, as Reichle et al. (1998) stated

that “the single-fixation duration and refixation means were not included in this [RMS] measure

because their values are largely redundant with the other measures.”

To test this hypothesis, I computed pairwise correlations between the six means of eye-movement measures, mean category word frequency, and the logarithm of the frequencies given in Reichle et al. (1999) Table 1. All eye-movement measure means are highly correlated. The correlation coefficients range from .85 (between skipping rate and first fixation duration, p=.069, N.S. for n=5), to .998 (between first fixation duration and single fixation duration, highly significant). A principal component analysis on the six eye-movement measures showed that the first component accounts for 94.6% of the total variance, the first two components account for 98.6%, and the first three components account for 99.999% of the total variance. In addition, all eye-movement measures are highly correlated with the logarithm of word frequency (all p's<.05). In short, the six eye-movement variables can be effectively reduced to a single variable, with only about 5% loss of information. The model fitting on the 30-point empirical data was practically based on 5 points, which have an almost perfect linear relationship with log-transformed word frequency.
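The remark that a correlation of .85 is not significant with n = 5 follows from the t test for a correlation coefficient with n − 2 = 3 degrees of freedom. The sketch below reproduces that check using the closed-form CDF of Student's t with 3 degrees of freedom, so it is valid for n = 5 only; the helper names are hypothetical.

```python
import math


def correlation_t(r, n):
    """t statistic for testing a Pearson correlation r based on n pairs."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def t_cdf_df3(t):
    """CDF of Student's t with exactly 3 degrees of freedom (closed form)."""
    x = t / math.sqrt(3.0)
    return 0.5 + (x / (1.0 + x * x) + math.atan(x)) / math.pi


def two_tailed_p_n5(r):
    """Two-tailed p-value for a correlation computed from n = 5 points."""
    t = abs(correlation_t(r, 5))
    return 2.0 * (1.0 - t_cdf_df3(t))


p_lowest = two_tailed_p_n5(0.85)     # the smallest correlation reported
p_highest = two_tailed_p_n5(0.998)   # the largest correlation reported
print("p for r = .85:", p_lowest)
print("p for r = .998:", p_highest)
```

With only five frequency classes, even a correlation of .85 falls just short of the .05 level, which is why the text marks it as not significant.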

The multicollinearity explains another puzzling aspect of the E-Z Reader models. As E-Z Reader evolved from 1 to 5, its goodness-of-fit (measured by RMS) did not improve, and often got worse. This goes against common experience in modeling. Part of the reason lies in the errors in the loss function. On the other hand, it could also be that E-Z Reader 1 was almost perfect given such a simple structure in the data. Any additional mechanisms and parameters added in subsequent models could not possibly improve the fit.

Obviously, the most parsimonious model, possibly the only model, for this data set is

"any eye-movement measure is a linear function of log-transformed word frequency." Given the

extremely high correlations among all variables, a good model for one variable is automatically a

good one for another variable. The rest of the modeling process is to find out the intercepts and

slopes of the linear functions – an easy job for the grid-search algorithm.

The E-Z Reader modeling effort is one of the most ambitious attempts to model eye-movement control parameters in a psychologically plausible fashion, but important errors in the modeling approach severely limit the conclusions that can be drawn from this research.

APPENDIX B. FITTING MIXTURE MODELS TO EMPIRICAL FIXATION DURATION

DISTRIBUTIONS

Introduction

There has been empirical evidence that fixation duration may not follow a single

distribution but instead consist of a mixture of distributions (Gezeck et al., 1997; McConkie &

Dyre, 2000; Yang & McConkie, in press; see Chapter 3 for discussions on these studies).

Therefore, two critical modeling decisions are (a) the component distributions and (b) the

number of components.

To date, the most successful models of fixation duration distribution are the three models from McConkie and Dyre (2000): the two-state transition model, the two-stage race model, and the two-stage mixture model. All three models are essentially mixture models of an early, short component and a late, long component (see footnote 43). The choices of component distributions varied (Weibull, exponential, convolutions of Weibull and exponentials), but they were largely motivated by empirical hazard functions.

43 For the two-state transition model, the short fixations are assumed to follow a Weibull distribution with a power (shape) parameter equal to 2 (which has a linearly rising hazard function). The long component is assumed to be exponential. The mixing rate, i.e., the proportion switching from State 1 to State 2, increases over time. For the two-stage mixture and two-stage racing models, McConkie and Dyre (2000) assumed that the duration of Stage 1 is a mixture of short and long components; the duration of Stage 1 is then convolved with that of Stage 2, which is an exponentially distributed random variable. This is mathematically equivalent to saying that the final distribution is composed of short and long fixations, each of which is a convolution of two random distributions: the corresponding State 1 distribution and the exponential. Therefore, all three models are essentially mixture models.

There is no unique way to fit an empirical distribution with mixture models (cf. McLachlan & Peel, 2000). The success of these models suggests that other mixture models with different component assumptions may also achieve good results. In addition, the components in McConkie and Dyre's (2000) models are complex and difficult to handle mathematically. The current study was a search for a simpler solution.

A simple distribution, the lognormal distribution, was chosen as the distribution of the mixture components (see footnote 44). There were two reasons for this choice. First, the hazard function of the lognormal distribution has the characteristics of the empirical hazard rates: an initially slow but accelerating curve that reaches a peak, followed by a very slowly and gradually decreasing tail (Johnson et al., 1994). Second, a mixture-of-lognormal distribution is easy to handle because on the log scale it becomes a mixture-of-normal distribution, which is the most extensively studied mixture model class. Its mathematical properties are well understood, and many statistical algorithms are available for model estimation.
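The shape of the lognormal hazard function mentioned above (rising to a peak, then declining slowly) can be seen by evaluating the hazard, pdf divided by survival function, at a few durations. This is a minimal sketch; the parameters MU and SIGMA are illustrative assumptions and are not estimates from the dissertation's data.

```python
import math


def lognormal_pdf(t, mu, sigma):
    """Density of a lognormal variable whose log has mean mu and sd sigma."""
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


def lognormal_sf(t, mu, sigma):
    """Survival function P(T > t), computed with the complementary erf."""
    z = (math.log(t) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def lognormal_hazard(t, mu, sigma):
    """Hazard rate: density divided by the probability of surviving to t."""
    return lognormal_pdf(t, mu, sigma) / lognormal_sf(t, mu, sigma)


# Hypothetical parameters: median duration 0.25 s, sd of 0.5 on the log scale.
MU, SIGMA = math.log(0.25), 0.5

for t in (0.05, 0.1, 0.25, 0.5, 1.0, 2.0, 5.0):
    print(f"t = {t:4.2f} s  hazard = {lognormal_hazard(t, MU, SIGMA):.3f}")
```

With these parameters the hazard starts near zero, climbs over the first few hundred milliseconds, and then falls off much more slowly than it rose.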

44 The log-normal distribution is closely related to the Normal distribution in that if log(x), x>0, is Normally distributed, then x follows a log-normal distribution.

Method

Data and Apparatus. See Chapter 4.

Modeling procedure. Model fitting was done in MATLAB, a numeric computation software package. Fixation durations were first log-transformed, and the log durations were then fit with mixture-of-Gaussian models. Two fitting methods were used.

For maximum likelihood estimation, the Gaussian Mixture Model (GMM) Toolbox was used (Cadez, Smyth, McLachlan, & McLaren, 2001). The GMM algorithm fits a mixture of n Gaussians, where n is a pre-specified integer, to the data and iteratively changes model parameters until it maximizes the likelihood of observing the data given the model. For more discussion of mixture models in general or of maximum likelihood estimation of mixture-of-Gaussian models, see McLachlan and colleagues (McLachlan & Basford, 1988; McLachlan & Peel, 2000) and Titterington, Smith, and Makov (1985). The logarithm of fixation durations was fitted with Gaussian mixture models of n = 1 to 7 components, and the best-fitting parameters over 5 repetitions (with different random initial values) were used.
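The general procedure (fit an n-component Gaussian mixture to log durations by maximum likelihood, repeat from several random starts, keep the best) can be sketched with a plain EM loop. This is not the GMM Toolbox code, whose internals are not described here; it is a generic textbook EM written for illustration, and the synthetic sample, function names, and default settings are all assumptions.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log density of a Normal(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def em_gmm_1d(data, k, n_iter=200, seed=0, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture by EM from one random start.

    Returns (loglik, weights, means, variances); loglik is evaluated at the
    parameters that entered the final E-step.
    """
    rng = random.Random(seed)
    n = len(data)
    grand_mean = sum(data) / n
    grand_var = sum((x - grand_mean) ** 2 for x in data) / n
    means = rng.sample(data, k)            # random initial means
    variances = [grand_var] * k
    weights = [1.0 / k] * k
    loglik = float("-inf")
    for _ in range(n_iter):
        # E-step: responsibilities via log-sum-exp for numerical stability.
        resp, loglik = [], 0.0
        for x in data:
            logs = [math.log(weights[j]) + log_gauss(x, means[j], variances[j])
                    for j in range(k)]
            top = max(logs)
            log_total = top + math.log(sum(math.exp(v - top) for v in logs))
            loglik += log_total
            resp.append([math.exp(v - log_total) for v in logs])
        # M-step: weighted updates; the tiny constant guards empty components.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                min_var)
    return loglik, weights, means, variances


def best_of_restarts(data, k, n_restarts=5):
    """Keep the highest-likelihood fit over several random starts."""
    fits = [em_gmm_1d(data, k, seed=s) for s in range(n_restarts)]
    return max(fits, key=lambda fit: fit[0])


# Synthetic log-durations (hypothetical, not the dissertation's data).
_rng = random.Random(1)
sample = ([_rng.gauss(-2.5, 0.25) for _ in range(150)] +
          [_rng.gauss(-1.0, 0.25) for _ in range(150)])

fit1 = best_of_restarts(sample, 1)
fit2 = best_of_restarts(sample, 2)
print("log-likelihood, 1 vs 2 components:", fit1[0], fit2[0])
```

Restarting from several initial values matters because EM only climbs to a local maximum of the likelihood, so different starts can end at different solutions.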

In addition to the maximum likelihood method, Bayesian estimation was done with the

Bayes Net Toolbox (BNT) developed by Kevin Murphy (2001). A graphical representation of

the BNT Gaussian mixture model is shown in Figure 24.

The Bayesian method takes into account the prior probability distribution of a parameter,

which represents prior knowledge, and incorporates it with the information in data to maximize

the posterior probability, or the probability of parameter values given observed data. A unique

advantage of the Bayesian method over the maximum likelihood estimation is that it incorporates

prior knowledge about the likely values of parameters. In the current case, the prior knowledge

came from the empirical results of Yang and McConkie (in press), i.e., the modes of the

distributions in their Figure 9.
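How a prior "pulls" an estimate can be illustrated with the simplest conjugate case: a Normal prior on a single mean with known observation variance. This sketch is not the BNT procedure, which estimates all mixture parameters jointly; it only shows the one-parameter intuition, and every number in it is hypothetical.

```python
def posterior_normal_mean(prior_mean, prior_var, data, noise_var):
    """Posterior mean and variance of a Normal mean under a Normal prior.

    Assumes observations are Normal with known variance noise_var.
    """
    n = len(data)
    if n == 0:
        return prior_mean, prior_var      # no data: posterior equals prior
    sample_mean = sum(data) / n
    prior_precision = 1.0 / prior_var
    data_precision = n / noise_var
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean
                            + data_precision * sample_mean)
    return post_mean, post_var


# Hypothetical example: a prior centred at 0 and a single observation at 2.
mean_one, var_one = posterior_normal_mean(0.0, 1.0, [2.0], 1.0)
# With many observations the data dominate the prior.
mean_many, var_many = posterior_normal_mean(0.0, 1.0, [2.0] * 1000, 1.0)
print(mean_one, var_one, mean_many, var_many)
```

A tight prior (small prior variance) keeps the estimate near the prior mean even with noisy data, which is the sense in which a prior can stabilize a poorly identified component.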

Results

Maximum likelihood estimates. Figures 25-1 through 15 show the empirical fixation

duration distribution, the best-fit n-component Gaussian-mixture models, and the (weighted)

component distributions for third-grade, fifth-grade, and adult data. A visual inspection suggests

that 3-component Gaussian-mixture models fit the empirical data very well. Most importantly,

the three components in each age group correspond fairly well with the results from Yang and

McConkie.

Formally determining the number of components, however, was difficult. The typical log-likelihood ratio test, a statistical procedure for comparing a "full" versus a "reduced" model by weighing the gain in goodness of fit against the additional number of parameters, cannot be applied directly in this case, because a 2-component Gaussian-mixture model is not strictly a "reduced" model of a 3-component Gaussian-mixture model (McLachlan & Basford, 1988; McLachlan & Peel, 2000; Titterington et al., 1985). Many alternative tests have been proposed (McLachlan & Peel, 2000). Here I adopted a modified log-likelihood ratio test by Wolfe (1971; see also Everitt, 1981), which has been shown to work well when the number of cases is at least five times larger than the number of components. Wolfe proposed that under the null hypothesis that the data arise from a mixture of $g_1$ populations versus the alternative that they arise from $g_2$ ($g_1 < g_2$) populations, the usual log-likelihood ratio statistic $-2 \log\lambda$ can be rescaled so that, approximately,

$$-2c \log\lambda \sim \chi^2_d ,$$

where the degrees of freedom, d, is taken to be twice the difference in the number of parameters in the two hypotheses, not including the mixing proportions, and the correction factor, c, is given by

$$c = (n - 1 - p - \tfrac{1}{2} g_2)/n ,$$

where n is the number of cases and p is the number of variables (p = 1 for the univariate durations here). In the current case n is sufficiently large, so c is practically 1.

Wolfe’s test was carried out in sequence to test the minimal number mixture components

that provided satisfactory fit to empirical data45. Each additional Normal component added two

new parameters, and hence d=4, and the corresponding Chi-square critical value for α=0.005 is

14.8602. In other words, if the test statistic -2c logλ, that is, about twice the difference in log-likelihood between two consecutive models (in terms

of the number of components), was larger than 14.86, the null hypothesis (having a smaller

number of components) should be rejected and the hypothesis associated with a larger number of

components should be adopted.
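The decision rule can be sketched in Python as follows. This is a minimal sketch, not the code used in this study: the function names (`chi2_sf_even_df`, `wolfe_test`), the log-likelihood values, and the sample size are hypothetical. The chi-square tail probability uses the closed form that holds for even degrees of freedom, which covers the d = 4 case here.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For even df the tail has the closed form
    exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


def wolfe_test(loglik_null, loglik_alt, n, p, g_alt, extra_params, alpha=0.005):
    """Wolfe's modified likelihood ratio test for fewer vs. more components.

    extra_params: additional parameters in the alternative, excluding mixing
    proportions; the degrees of freedom are taken as twice this number.
    Returns (statistic, df, p_value, reject_null).
    """
    c = (n - 1 - p - g_alt / 2.0) / n
    statistic = 2.0 * c * (loglik_alt - loglik_null)  # equals -2c log(lambda)
    df = 2 * extra_params
    p_value = chi2_sf_even_df(statistic, df)
    return statistic, df, p_value, p_value < alpha


# Hypothetical log-likelihoods for a 2- vs 3-component fit (illustrative only).
stat, df, p_value, reject = wolfe_test(
    loglik_null=-12500.0, loglik_alt=-12480.0, n=20000, p=1, g_alt=3, extra_params=2
)
```

Evaluating `chi2_sf_even_df(14.8602, 4)` returns approximately 0.005, which matches the critical value quoted above.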

In all age groups the 3-component Gaussian-mixture models provided significantly better

fit than 2-component models, and seemed to capture the basic characteristics of the distributions.

The statistical tests showed that one should prefer a 4-component model for third-grade data, a

5-component model for fifth-grade data, and a 3-component model for adult data. The additional gain in fit from

moving beyond 3 components was relatively small, although statistically significant (e.g., the log-likelihood for the third-grade

distribution increased by 334 when 3 components were used instead of 2, but it only increased by

61 and 21 for each additional component above 3). Because the differences

were so small and in order to facilitate comparison between age groups, 3-component models

were used for all groups in analysis of the parameters.

Although the maximum likelihood estimates of the 3-component means generally corroborate

Yang and McConkie’s (in press) findings, the estimates for the first component (the

short fixations) were not numerically stable from run to run, and its estimated mean and

variance had a sizable effect on the parameter estimates of the third (the longest)

component. There was a need to “anchor” the first component so as to obtain more stable

estimates of the other components.

45 Here the potential problem of correlation in sequential testing was simply dealt with by using a more stringent α

level, α=0.005.

Bayesian estimates. The Bayesian estimation method was used to achieve these goals. In

these analyses, the number of components was fixed to three. Rather than having the maximum

likelihood algorithm randomly guess the initial values of parameters, the Bayesian method

allows constraints to be imposed on parameter values through prior distributions. Based on Yang and

McConkie (in press), the prior distributions of the components were set to three normal

distributions: N(log(75), 80), N(log(180), 130), and N(log(320), 320)46. The prior distribution of

the mixture weights was set to a Dirichlet distribution, following Bayesian modeling

conventions, with pi= {0.10, 0.55, 0.35}. These prior weights were based on the maximum

likelihood estimates of the weights for 3-component models.
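Because the components are specified on the log scale, it may help to see the implied density on the millisecond scale. The sketch below evaluates a 3-component lognormal-mixture density whose component centers and weights are taken from the prior settings above. The log-scale standard deviations (`SIGMAS`) are hypothetical placeholders, not values used in this study, and all function names are illustrative.

```python
import math

# Centers (ms) and weights taken from the prior settings in the text.
CENTERS_MS = [75.0, 180.0, 320.0]
WEIGHTS = [0.10, 0.55, 0.35]
# Hypothetical log-scale standard deviations (assumed for illustration only).
SIGMAS = [0.30, 0.30, 0.30]


def lognormal_mixture_pdf(t_ms, centers=CENTERS_MS, weights=WEIGHTS, sigmas=SIGMAS):
    """Density of fixation duration t_ms under a lognormal mixture.

    Each component is Gaussian in log(t) with mean log(center); dividing by t
    is the change of variables back to the millisecond scale.
    """
    if t_ms <= 0:
        return 0.0
    y = math.log(t_ms)
    density_log_scale = 0.0
    for center, w, s in zip(centers, weights, sigmas):
        z = (y - math.log(center)) / s
        density_log_scale += w * math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi))
    return density_log_scale / t_ms


def integrate_pdf(lower=1.0, upper=3000.0, step=0.5):
    """Trapezoid-rule check that the density integrates to about 1."""
    n_steps = int(round((upper - lower) / step))
    total = 0.5 * (lognormal_mixture_pdf(lower) + lognormal_mixture_pdf(upper))
    for i in range(1, n_steps):
        total += lognormal_mixture_pdf(lower + i * step)
    return total * step


total_mass = integrate_pdf()
```

The division by `t_ms` is what makes a Gaussian mixture on log durations equivalent to a lognormal mixture on raw durations.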

Because Bayesian estimation is notoriously time consuming, random samples of 10% of

the original data were used in Bayesian estimation. This procedure was repeated three times to

ensure stability of estimates. In fact, the estimates were very stable even if only 1% of data

(which corresponds to approximately 200-500 cases in each age group) were used. For

comparison, the same random samples were subjected to maximum likelihood estimation as well.

Table 2 shows the parameters and log-likelihood indices of the Bayesian estimates and the

corresponding maximum likelihood estimates. The results of the two methods were generally in

agreement. The fit of the Bayesian estimates (in terms of log-likelihood) was at least as good as that of the

maximum likelihood estimates, and the differences were often within the range of random

fluctuations caused by different random starting points in the maximum likelihood method. As

expected, the Bayesian method provided a more consistent estimate of the mean of the

first component, so that it was less likely to interfere with the parameters of the third component.

46 The unit for the means is millisecond. Note that fixation duration was log-transformed first and then fit to

mixture-of-Gaussian models, which is equivalent to fitting fixation duration with mixture-of-lognormal models.

To summarize the fitting results of lognormal-mixture models, 3-component models

provided a very close fit to the fixation duration distributions of both children and adult readers.

Although it is impossible to compare the goodness of fit of the lognormal-mixture model directly to that

of McConkie and Dyre’s models, they appear to be largely comparable based on the distribution

plots. In addition, the parameters of the three component distributions were reasonably close to

the empirical findings of Yang and McConkie (in press). This provided encouraging support for the choice of

the lognormal-mixture model.

Additional analyses showed that the 3-component lognormal-mixture model could also fit

the distributions of individual readers. Fixation durations on low-frequency words had a higher

proportion of the “long” component, and the mode of that component was larger. Further

investigation of the frequency effect showed that the effect could be accounted for solely by the

component weights; i.e., when the parameters of the three components were fixed and only the

weights were allowed to vary, the model fit was not significantly different from that obtained when all

parameters were allowed to vary.

Discussion

The current study showed that a 3-component mixture-of-lognormal model could

successfully model empirical fixation duration distributions of beginning readers and adults. The

fit appeared to be as good as that of McConkie and Dyre’s (2000) models.

The 3-component lognormal-mixture model provided a simple, straightforward

interpretation for Yang and McConkie’s (in press) results. According to the current model, there

are three classes of fixations, each with different distributional properties. In normal reading, the

mixture rate of these fixation classes may change with linguistic or other factors, but is relatively

stable. The resulting mixture shows the typical unimodal, long-tailed distribution. Under

extreme experimental manipulations such as those in Yang and McConkie’s study, however, the

proportions are knocked out of their normal balance, and the individual components are therefore

revealed. The current mixture model predicts that each individual reader should have

stable component parameters across normal reading and Yang and McConkie’s experimental

conditions. It would be interesting to see this hypothesis tested.

Interpreting Yang and McConkie’s findings (in press) in McConkie and Dyre’s (2000)

modeling framework is difficult, because McConkie and Dyre assumed a two-component structure. In this sense,

the current model seems to be more readily interpretable.

Unlike McConkie and Dyre (2000), no attempt was made to infer the underlying

processing mechanism from the forms of distributions. Reasoning about stochastic processes

from their marginal distributions is often risky, as many mechanisms may result in similar

distributions. The choice of lognormal components, which is no more arbitrary than

the choice of components in McConkie and Dyre’s models, may raise skepticism. There is no doubt that

the lognormal distribution was chosen for modeling convenience, but the results suggest that

the decision was not a particularly bad one. At the same time, there is nothing in the model that

requires a lognormal distribution, and any other reasonable distribution may work just as well.

REFERENCES

Agresti, A. (1990). Categorical data analysis. New York: Wiley.

Andriessen, J. J., & De Voogd, A. H. (1973). Analysis of eye movement pattern in silent

reading. IPO Annual Program Report, 30-35.

Bengio, Y. (1999). Markovian models for sequential data. Neural computing surveys, 2,

129-162.

Bengio, Y., & Frasconi, P. (1996). Input/output HMMs for sequence processing. IEEE

Transactions on Neural Networks, 1231-1249.

Bernardo, J. M., & Smith, A. F. M. (1994). Bayesian theory. Chichester, England: John

Wiley.

Birnbaum, Z. W. (1952). Numerical tabulation of the distribution of Kolmogorov's

statistic for finite sample size. Journal of the American Statistical Association, 47, 425-441.

Boyen, X., & Koller, D. (1998a). Approximate learning of dynamic models. Paper

presented at the Neural Information Processing Systems (NIPS-11).

Boyen, X., & Koller, D. (1998b). Tractable inference for complex stochastic processes.

Paper presented at the 14th Annual Conference on Uncertainty in AI (UAI), San Francisco.

Brysbaert, M., & Vitu, F. (1998). Word skipping: Implications for theories of eye

movement control in reading. In G. Underwood (Ed.), Eye guidance in reading and scene

perception (pp. 125-147). Oxford, England UK: Anonima Romana.

Brysbaert, M., Vitu, F., & Schroyens, W. (1996). The right visual field advantage and the

optimal viewing position effect: On the relation between foveal and parafoveal word recognition.

Neuropsychology, 10, 385-395.

Buswell, G. T. (1922). Fundamental reading habits: A study of their development.

Supplementary Educational Monographs, 21.

Buswell, G. T. (1937). How adults read. Chicago, Ill.: University of Chicago.

Cadez, I. V., Smyth, P., McLachlan, G. J., & McLaren, C. E. (2001). Maximum

likelihood estimation of mixture densities for binned and truncated multivariate data. Machine

learning journal, special edition on unsupervised learning, in press.

Carpenter, P. A. (1984). The influence of methodologies on psycholinguistic research: A

regression to the Whorfian hypothesis. In D. E. Kieras & M. A. Just (Eds.), New methods in

reading comprehension research (pp. 1-12). Hillsdale, NJ: Lawrence Erlbaum Asso.

Carpenter, R. H. S. (1988). Movements of the eyes (2nd rev. & enlarged ed.). London,

England UK: Pion Limited.

Conover, W. J. (1999). Practical nonparametric statistics. (3rd ed.). New York: Wiley.

Cowell, R. (1998a). Advanced inference in Bayesian networks. In M. Jordan (Ed.), Learning in graphical

models (pp. 27-50). Cambridge, MA: MIT Press.

Cowell, R. (1998b). Introduction to inference for Bayesian networks. In M. Jordan (Ed.), Learning in graphical

models (pp. 9-26). Cambridge, MA: MIT Press.

Dearborn, W. F. (1906). The Psychology of Reading. (Vol. XIV). New York: The

Science Press.

Everitt, B. S. (1981). A Monte Carlo investigation of the likelihood ratio test for the

number of components in a mixture of normal distributions. Multivariate Behavioral Research,

16, 171-180.

Feng, G., Miller, K. F., Zhang, H., & Shu, H. (2001). Towed to recovery: the use of

phonological and orthographic information in reading Chinese and English. Journal of

Experimental Psychology: Learning, Memory, and Cognition, 27, 1079-1100.

Findlay, J. M., & Walker, R. (1999). A model of saccade generation based on parallel

processing and competitive inhibition. Behavioral & Brain Sciences, 22, 661-721.

Francis, W. N., & Kucera, H. (1982). Frequency analysis of English usage: lexicon and

grammar. Boston: Houghton Mifflin.

Gezeck, S., Fischer, B., & Timmer, J. (1997). Saccadic reaction times: A statistical

analysis of multimodal distributions. Vision Research, 37, 2119-2131.

Goodman, L. A., & Kruskal, W. H. (1954). Measures of Association for Cross

Classifications. Journal of the American Statistical Association, 49, 732-764.

Goodman, L. A., & Kruskal, W. H. (1963). Measures of Association for Cross

Classifications III: Approximate sampling theory. Journal of the American Statistical

Association, 58, 310-364.

Gray, C. T. (1922). Deficiencies in reading ability: Their diagnosis and remedies.

Chicago, IL: Heath & Co.

Hacisalihzade, S. S., Stark, L. W., & Allen, J. S. (1992). Visual perception and sequences

of eye movement fixations: A stochastic modeling approach. IEEE Transactions on Systems,

Man & Cybernetics, 22, 474-481.

Hall, W. J., & Wellner, J. A. (1980). Confidence bands for a survival curve from

censored data. Biometrika, 67, 133-143.

Harris, C. M., Hainline, L., Abramov, I., Lemerise, E., & et al. (1988). The distribution of

fixation durations in infants and naive adults. Vision Research, 28, 419-432.

Heckerman, D. (1998). A tutorial on learning with Bayesian networks. In M. Jordan

(Ed.), Learning in graphical models (pp. 301-354). Cambridge, MA: MIT Press.

Heller, D. (1982). Eye movements in reading. In R. Groner & P. Fraisse (Eds.), Cognition

and eye movements (pp. 139-154). Amsterdam: North Holland.

Henderson, J. M., & Ferreira, F. (1993). Eye movement control during reading: Fixation

measures reflect foveal but not parafoveal processing difficulty. Canadian Journal of

Experimental Psychology, 47, 201-221.

Hogaboam, T. (1983). Reading patterns in eye movement data. In K. Rayner (Ed.), Eye

movements in reading: Perceptual and language processes (pp. 309-332). New York: Academic

Press.

Hollander, M., & Wolfe, D. A. (1999). Nonparametric statistical methods. (2nd ed.). New

York: Wiley.

Huey, E. B. (1908). The psychology and pedagogy of reading: with a review of the

history of reading and writing and of methods, texts, and hygiene in reading. Cambridge, Mass.:

MIT Press.

Inhoff, A. W., & Radach, R. (1998). Definition and computation of oculomotor measures

in the study of cognitive processes. In G. Underwood (Ed.), Eye guidance in reading and scene

Irwin, D. E. (1998). Lexical processing during saccadic eye movements. Cognitive

Psychology, 36, 1-27.

Javel, E. (1878). Essai sur la physiologie de la lecture. Ann. Oculist, 79, 97-117, 240-274.

Johnson, N. L., Kotz, S., & Balakrishnan, N. (1994). Continuous univariate distributions.

(2nd ed.). New York: Wiley & Sons.

Jordan, M., Ghahramani, Z., & Saul, L. K. (1997). Hidden Markov decision trees. In M.

C. Mozer, M. I. Jordan, & T. Petsche (Eds.), Advances in neural information processing systems

(Vol. 9, pp. 501-507). Cambridge, MA: MIT Press.

Jordan, M., & Jacobs, R. A. (1994). Hierarchical mixtures of experts and the EM

algorithm. Neural Computation, 6, 181-214.

Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., & Saul, L. K. (1998). An introduction to

variational methods for graphical models. In M. Jordan (Ed.), Learning in graphical models (pp.

105-159). Cambridge, MA: MIT Press.

Just, M. A., & Carpenter, P. A. (1980). A theory of reading: From eye fixations to

comprehension. Psychological Review, 87, 329-354.

Kennison, S. M., & Clifton, C. (1995). Determinants of parafoveal preview benefit in

high and low working memory capacity readers: Implications for eye movement control. Journal

of Experimental Psychology: Learning, Memory, & Cognition, 21, 68-81.

Kerr, P. W. (1992). Eye movement control during reading: The selection of where to send

the eyes. Unpublished Doctoral thesis, University of Illinois, Urbana-Champaign, IL.

Kingstone, A., & Klein, R. M. (1993). Visual offsets facilitate saccadic latency: Does

predisengagement of visuospatial attention mediate this gap effect? Journal of Experimental

Psychology: Human Perception & Performance, 19, 1251-1265.

Kliegl, R. M., Olson, R. K., & Davidson, B. J. (1982). Regression analyses as a tool for

studying reading processes: Comment on Just and Carpenter's eye fixation theory. Memory &

Cognition, 10, 287-296.

Legge, G. E., Klitz, T. S., & Tjan, B. S. (1997). Mr. Chips: An ideal-observer model of

reading. Psychological Review, 104, 524-553.

Liversedge, S. P., Paterson, K. B., & Pickering, M. J. (1998). Eye movements and

measures of reading time. In G. Underwood (Ed.), Eye guidance in reading and scene perception

(pp. 55-75). Oxford, England UK: Anonima Romana.

Liversedge, S. P., & Underwood, G. (1998). Foveal processing load and landing position

effects in reading. In G. Underwood (Ed.), Eye guidance in reading and scene perception (pp.

201-221). Oxford, England UK: Anonima Romana.

McConkie, G. W. (1981). Evaluating and reporting data quality in eye movement

research. Behavior Research Methods & Instrumentation, 13, 97-106.

McConkie, G. W., & Dyre, B. P. (2000). Eye fixation durations in reading: Models of

frequency distributions. In A. Kennedy, R. Radach, D. Heller, & J. Pynte (Eds.), Reading as a

perceptual process. Amsterdam: Elsevier Science Ltd.

McConkie, G. W., Kerr, P. W., & Dyre, B. P. (1994). What are "normal" eye movements

during reading: Toward a mathematical description. In J. Ygge & G. Lennerstrand (Eds.), Eye

movements in reading. Tarrytown, NY: Pergamon.

McConkie, G. W., Kerr, P. W., Reddix, M. D., & Zola, D. (1988). Eye movement control

during reading: I. The location of initial eye fixations on words. Vision Research, 28, 1107-1118.

McConkie, G. W., Kerr, P. W., Reddix, M. D., Zola, D., & et al. (1989). Eye movement

control during reading: II. Frequency of refixating a word. Perception & Psychophysics, 46, 245-

McConkie, G. W., & Rayner, K. (1973). An on-line computer technique for studying

reading: Identifying the perceptual span. In P. L. Nacke (Ed.), Diversity in mature reading:

theory and research (Vol. 1, pp. 119-130): National Reading Conference, Inc.

McConkie, G. W., & Rayner, K. (1975). The span of the effective stimulus during a

fixation in reading. Perception & Psychophysics, 17, 578-586.

McConkie, G. W., Reddix, M. D., & Zola, D. (1992). Perception and cognition in

reading: Where is the meeting point. In K. Rayner (Ed.), Eye movements and visual cognition:

Scene perception and reading (pp. 293-303). New York, NY: Springer.

McConkie, G. W., Zola, D., Grimes, J., Kerr, P. W., Bryant, N. R., & Wolff, P. M.

(1991). Children's eye movements during reading. In J. F. Stein (Ed.), Vision and visual dyslexia

(pp. 251-262). London: Macmillan Press.

McCullagh, P., & Nelder, J. A. (1983). Generalized linear models. London ; New York:

Chapman and Hall.

McLachlan, G. J., & Basford, K. E. (1988). Mixture models : inference and applications

to clustering. New York, N.Y.: M. Dekker.

McLachlan, G. J., & Peel, D. (2000). Finite Mixture Models. NY: Wiley.

Miller, K., & Feng, G. (in prep.). Reading English and Chinese: A developmental eye-

movement study.

Morrison, R. E. (1984). Manipulation of stimulus onset delay in reading: Evidence for

parallel programming of saccades. Journal of Experimental Psychology: Human Perception &

Performance, 10, 667-682.

Murphy. (2001). Bayes Net Toolbox for Matlab 5. Available:

http://www.cs.berkeley.edu/~murphyk/Bayes/bnt.html.

Murray, W. S. (2000). Commentary on Section 4. Sentence processing: Issues and

measures. In A. Kennedy & R. Radach (Eds.), Reading as a perceptual process (pp. 649-664).

Amsterdam, Netherlands: North-Holland/Elsevier Science Publishers.

O'Regan, J. K. (1990). Eye-movements and reading. In E. Kowler (Ed.), Eye movements

and their role in visual and cognitive processes (pp. 395-453). Amsterdam: Elsevier.

O'Regan, J. K., & Jacobs, A. M. (1992). Optimal viewing position effect in word

recognition: A challenge to current theory. Journal of Experimental Psychology: Human

Perception & Performance, 18, 185-197.

Pearl, J. (2000). Causality: Models, reasoning, and inference. Cambridge, UK: Cambridge University Press.

Rabiner, L. (1989). A tutorial on hidden Markov models and selected applications in

speech recognition. Proceedings of the IEEE, 77.

Radach, R., & McConkie, G. W. (1998). Determinants of fixation positions in words

during reading. In G. Underwood (Ed.), Eye guidance in reading and scene perception (pp. 77-

100). Oxford, England UK: Anonima Romana.

Rayner, K. (1986). Eye movements and the perceptual span in beginning and skilled

readers. Journal of Experimental Child Psychology, 41, 211-236.

Rayner, K. (1995). Eye movements and cognitive processes in reading, visual search, and

scene perception. In J. M. Findlay & R. Walker (Eds.), Eye movement research: Mechanisms,

processes and applications. Studies in visual information processing, 6 (pp. 3-22). Amsterdam,

Netherlands: Elsevier Science Publishing Co, Inc.

Rayner, K. (1998). Eye movements in reading and information processing: 20 years of

research. Psychological Bulletin, 124, 372-422.

Rayner, K., & McConkie, G. W. (1976). What guides a reader's eye movements? Vision

Research, 16, 829-837.

Rayner, K., & Pollatsek, A. (1989). The psychology of reading. Englewood Cliffs, N.J.:

Prentice Hall.

Rayner, K., Reichle, E. D., & Pollatsek, A. (1998). Eye movement control in reading: An

overview and model. In G. Underwood (Ed.), Eye guidance in reading and scene perception (pp.

243-268). Oxford, England UK: Anonima Romana.

Reichle, E. D., Pollatsek, A., Fisher, D. L., & Rayner, K. (1998). Toward a model of eye

movement control in reading. Psychological Review, 105, 125-157.

Reichle, E. D., Rayner, K., & Pollatsek, A. (1999). Eye movement control in reading:

Accounting for initial fixation locations and refixations within the E-Z Reader model. Vision

Research, 39, 4403-4411.

Reilly, R. (1993). A connectionist framework for modeling eye-movement control in

reading. In G. d'Ydewalle & J. Van Rensbergen (Eds.), Perception and cognition: Advances in

eye movement research. Studies in visual information processing (Vol. 4, pp. 193-212).

Reilly, R. G., & O'Regan, J. K. (1998). Eye movement control during reading: A

simulation of some word-targeting strategies. Vision Research, 38, 303-317.

Schilling, H. E. H., Rayner, K., & Chumbley, J. I. (1998). Comparing naming, lexical

decision, and eye fixation times: Word frequency effects and individual differences. Memory &

Cognition, 26, 1270-1281.

Shillcock, R., Ellison, T. M., & Monaghan, P. (2000). Eye-fixation behavior, lexical

storage, and visual word recognition in a split processing model. Psychological Review, 107,

US: American Psychological Assn.

Stark, L. (1994). Sequences of fixations and saccades in reading. In J. Ygge & G.

Lennerstrand (Eds.), Eye Movements in Reading (pp. 135-161). Tarrytown, NY: Pergamon.

Stark, L., & Ellis, S. (1981). Scanpaths revisited: cognitive models direct active looking.

In R. A. Monty & J. W. Senders (Eds.), Eye movements, cognition and visual perception (pp.

193-226). Hillsdale, NJ: Erlbaum.

Suppes, P. (1990). Eye-movement models for arithmetic and reading performance. In E.

Kowler (Ed.), Eye movements and their role in visual and cognitive processes (Vol. 4, pp. 455-

477). Amsterdam: Elsevier.

Suppes, P. (1994). Stochastic models of reading. In J. Ygge & G. Lennerstrand (Eds.),

Eye movements in reading (pp. 349-364). Oxford, England: Pergamon Press.

Suppes, P., & et al. (1983). A procedural theory of eye movements in doing arithmetic.

Journal of Mathematical Psychology, 27, 341-369.

Taylor, S. E. (1965). Eye movements in reading: Facts and fallacies. American

Educational Research Journal, 2, 187-202.

Thibadeau, R. (1983). CAPS: A language for modeling highly skilled knowledge-

intensive behavior. Behavior Research Methods, Instruments, & Computers, 15, 300-304.

Thibadeau, R., Just, M. A., & Carpenter, P. A. (1982). A model of the time course and

content of human reading. Cognitive Science, 6, 101-155.

Titterington, D. M., Smith, A. F. M., & Makov, U. E. (1985). Statistical analysis of finite

mixture distributions. Chichester ; New York: Wiley.

van Gisbergen, J. A. M., Gielen, S., Cox, H., Brujins, J., & Schaars, K. H. (1981).

Relation between metrics of saccades and stimulus trajectory in visual target tracking:

implications for models of the saccadic system. In A. F. Fuchs & W. Becker (Eds.), Progress in

oculomotor research. North Holland: Elsevier.

Vitu, F., & McConkie, G. W. (2000). Regressive saccades and word perception in adult

reading. In A. Kennedy & R. Radach (Eds.), Reading as a perceptual process (pp. 301-326).

Vitu, F., McConkie, G. W., & Zola, D. (1998). About regressive saccades in reading and

their relation to word identification. In G. Underwood (Ed.), Eye guidance in reading and scene

Wagner, R. A., & Fischer, M. J. (1974). The string-to-string correction problem. Journal

of the Association for Computing Machinery, 21, 168-173.

Walker, R., Kentridge, R. W., & Findlay, J. M. (1995). Independent contributions of the

orienting of attention, fixation offset and bilateral stimulation on human saccadic latencies.

Experimental Brain Research, 103, 294-310.

Wolfe, J. H. (1971). A Monte Carlo study of sampling distribution of the likelihood ratio

for mixtures of multinormal distributions (Technical Bulletin STB 72-2). San Diego, CA: U.S.

Naval Personnel and Training Research Laboratory.

Yang, S.-N., & McConkie, G. W. (in press). Eye movements during reading: A theory of

saccade initiation times.

Zangemeister, W. H., Sherman, K., & Stark, L. (1995). Evidence for a global scanpath

strategy in viewing abstract compared with realistic images. Neuropsychologia, 33, 1009-1025.

CURRICULUM VITAE

Biographical Information

Name: Gang Feng

Date of Birth: March 16, 1968

Place of Birth: Beijing, China

Education

2001 Ph.D. University of Illinois at Urbana-Champaign, Department of Psychology. Major area: Developmental Psychology. Minor area: Quantitative Psychology

1999 M.S. University of Illinois at Urbana-Champaign, Department of Statistics

1998 M.A. University of Illinois at Urbana-Champaign, Department of Psychology

1990 B. Edu. Beijing Normal University, Beijing, China, Department of Psychology

Awards and Honors

1999-2000 Beckman Institute Graduate Fellow

1999 Cognitive Science/AI Summer Fellowship, UIUC

1990 Honor Graduate, Beijing Normal University

1986-1990 Government fellowships, Beijing Normal University

Research Experience

1999 - 2000 Beckman Institute Graduate Fellow, Beckman Institute, UIUC

Summer, 1999 CogSci/AI Steering Committee Summer Fellowship, UIUC

Summer, 1998 Data Analyst, Center for Reading Research, UIUC

1994 - 2000 Research Assistant, Beckman Institute, UIUC

1990 - 1994 Assistant Researcher, Institute of Psychology, Chinese Academy of

Sciences

Teaching Experience

1998-1999 Teaching Assistant, Child Psychology

1996-1997 Teaching Assistant, Research methods in developmental psychology

Publications

Feng, G., Miller, K. F., Zhang, H., & Shu, H. (2001). Towed to recovery: the use of

phonological and orthographic information in reading Chinese and English. Journal of

Experimental Psychology: Learning, Memory, and Cognition, 27, 1079-1100.

Kelly, M., Miller, K., Fang, G., & Feng, G. (1999). When Days Are Numbered:

Calendar Structure and the Development of Calendar Processing in English and Chinese.

Journal of Experimental Child Psychology, 73, 289-314.

Feng, G. (1998). Homophone confusion in reading English and Chinese. Unpublished

master’s thesis, University of Illinois at Urbana-Champaign.

Fang, G., Fang, F., & Feng, G. (1995). A comparative study of elementary school

students’ mathematics achievement and motivations. Chinese University of Hong Kong

Elementary Education, 2, 51-56.

Fang, G., Feng, G., Fang, F., & Jiang, T. (1994). Preschoolers' estimation of time

duration and their cognitive strategies. Psychological Science (China), 17, 3-9.

Fang, G., Feng, G., Jiang, T., & Fang, F. (1993). Time duration estimated by preschoolers

and their strategies. Acta Psychologica Sinica, 25, 346-352.
