Two new readability predictors for the professional writer: pilot trials


Journal of Research in Reading, ISSN 0141-0423, Volume 21, Issue 2, 1998, pp. 121–138

Sandra Harrison

University of Coventry, UK

Paul Bakker

Eindhoven University of Technology, The Netherlands

ABSTRACT

A combination of two new, but very different, approaches to the prediction of readability could be particularly valuable in evaluating English texts written for native and non-native speakers. One approach develops the concept of lexical density, whereas the other is a novel measurement of two mechanical variables of text. The paper first outlines the traditional approach to readability. Next, it explains the concept of lexical density, and presents the results of a pilot study into reader preferences for different levels of density in text. It then offers an alternative approach to readability that uses sentence and 'packet' lengths. Finally, it indicates how the two approaches have been combined into a computer software program, and suggests the direction of future work.

Our conclusions are threefold. Firstly, the lexical density of text is a better indicator of its readability than the scores given by many of the more common readability formulae. Secondly, the effective breaking up of sentences into 'packets' is as important to readability as sentence length. Finally, in looking at the mechanical variables of texts, we should not only be concerned with averages, but with distributions and most frequently occurring values.


INTRODUCTION

Readability formulae have been used for much of this century to predict the level of difficulty of text. Reactions to such formulae are highly varied: their use has been mandatory in some documentation contracts (particularly for the US government), yet their value has been questioned by many users. This research addresses three important questions: (1) can we devise more meaningful predictors of readability? (2) is it possible to devise formulae and/or models which would provide positive feedback to writers in revising texts? and (3) could these objectives be realised in a utility program running within a PC wordprocessor?

Definitions of readability are of two kinds. One is to take into account all the elements which might affect the ease with which readers can comprehend a text: 'writing is readable to the extent that it provides the information they need, located where they can quickly find it, in a form in which they can easily use it' (Huckin and Olsen, 1991). This is a comprehensive definition, which leads to a discussion of new and given information, text structure, paragraph design, etc. Another definition, by Turk and Kirkman (1989), identifies 'three components: writer, text and readers'. The writer affects readability by 'careful selection of material, by organization, signposting and variation of emphasis'. The text affects readability by 'the language (structures and vocabulary) and the physical appearance (layout, headings, white space)'. Finally, readers themselves affect readability by their motivation and attitudes (Klare and MacDonald-Ross, 1981). This suggests that there are very many factors that we might consider, and indeed Bormuth investigated over 150 factors that influence readability (Selzer, 1983). This comprehensive view of readability points the way forward to teaching writers how to write, and the above definitions by Huckin and Olsen and by Turk and Kirkman are taken from books that were written for this purpose.

The other approach to readability is to look for a way to give writers a quick approximate guide to the level of difficulty of the texts they write. Traditional methods of assessing the readability of texts are based on the counting of sentences, words, syllables and characters. The counts are entered into a formula to give an indication of reading ease (the Gunning Fog Index, the Flesch Reading Ease Scale, etc.). The result is an indication of the reading attainment level which the user of the document needs to understand it. This process has been automated, and various readability scores can now be calculated from within a wordprocessor such as Microsoft's Word. This kind of approach does not claim to take all possible factors into account. Indeed, even if it were possible to create an 'expert system' computer program to do this, such a program might run too slowly to be of practical use. Instead we need to identify significant factors which might give an approximate guide to the level of difficulty of the text: factors which are restricted to the surface features of a text (in other words, those that ignore textual structure, spacing, reader motivation, etc.). It is in this sense of an approximate guide that this paper considers descriptors and predictors of readability.

An interesting alternative to traditional readability formulae is the cloze procedure, in which the text under evaluation is prepared by removing every nth word (usually every fifth word) and subjects of known reading age are asked to supply the missing items. Taylor (1953) reports positive results in comparison to two traditional readability formulae, while later researchers have also found it to be a successful technique (for example, Nystrand, 1979, who uses it to evaluate student writing, and Guillemette, 1989, who uses it to evaluate an IEEE Standard). However, this procedure requires special preparation of the text and a representative sample of subjects of known reading age to carry out the evaluation. It is therefore costly and time-consuming to administer, and unlike traditional readability formulae cannot be used by individual writers as a quick approximate guide to the readability of their texts.

There are many different traditional readability formulae. (Even as long ago as 1963, Klare summarised and compared 31 formulae.) Most follow the same basic pattern: a constant, plus a weighted word factor, such as the number of long words, plus a weighted sentence factor, such as the number of words per sentence (Duffy, 1985). Some studies have found the clause a more useful unit than the sentence: for example, Coleman (1965) shows that clause length is a better indicator of reading difficulty than sentence length. The concept of 'packets', used in the second pilot study described in this paper, has some similarities with this finding. The differences between the traditional formulae are mostly a matter of emphasis: each is designed for optimum accuracy in a specific context or for a specific group of users (for example, for users in the armed forces, or for use by children in a specific age range).
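To make this shared pattern concrete, the sketch below (our illustration, not part of the original paper) writes out three formulae referred to in this paper in their commonly published forms; the versions built into any particular wordprocessor may differ in detail.

```python
# The classic formulae in their commonly published forms. Each follows the
# basic pattern described above: a constant plus weighted word and sentence
# factors.

def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Higher scores mean easier text (roughly 0-100)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """US school grade level."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def gunning_fog(words: int, sentences: int, complex_words: int) -> float:
    """US school grade level; 'complex' words have three or more syllables."""
    return 0.4 * (words / sentences + 100 * complex_words / words)
```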

The usefulness of traditional readability formulae has been questioned by writers who feel coerced into breaking up sentences simply to get a better result from a formula, and by readers who have to struggle with passages that are full of short but unfamiliar words. Indeed, Hartley (1994) warns that 'if one just simplifies text by splitting sentences, removing connectives, and simplifying multi-syllabic words, then the resulting text is likely to be stilted, lacking in clear organization and, in fact, harder to read'. However, there has been a dearth of recent research in this area, and traditional readability formulae continue to be provided for use within wordprocessing packages in the absence of better alternatives. (An explanation of the scores available in Microsoft Word is provided by Bakker and Hinson, 1997.)

Readability formulae are also used because they are perceived as 'objective' (Duffy, 1985). A maximum readability score may be specified in contracts, or by organisations to their writers. In this situation, it is clearly important that the formula is meaningful to the writer who is using it.


Existing readability formulae are problematic for several reasons:

Lack of clarity about the purpose. Readability formulae are often used as guidelines for revision, even though most were intended not as guidelines but as predictors of text difficulty. However, prediction itself can be subdivided into (1) the ranking of texts according to difficulty, and (2) the prediction of actual reading age. Duffy (1985) finds that the use of readability formulae to rank texts is generally successful, but the prediction of actual reading age is less accurate. Moreover, different formulae produce different results on the same text: Stokes (1978), after a comparison of seven readability formulae, concludes that 'no formula consistently predicted a similar grade level to any other formula'.

The questionable validity of prescribing short sentences. It is natural for a writer, faced with a readability score which rates a text as 'too difficult', to attempt to resolve the problem by shortening the words and sentences, especially when a specific readability level is required in a contract. But even as early as 1963, Klare proclaimed that 'merely chopping sentences in half, or substituting one word for another, will result in a better formula score but probably in poorer writing as well'. There is some research to show that sentences can be too short: 'Pearson found that combining sentences and therefore increasing sentence length can improve comprehension if the two sentences are causally related' (Selzer, 1983). And Huckin (1983) argues that subject experts can be misled by simplified vocabulary and need 'proper use of standard terminology' even when this consists of 'long strings of compound nouns'.

Readability formulae ignore the reader, the reader's reading style and the context. Klare (1963) summarised this problem: 'If it is not readable to an intended reader it is not readable, no matter how good a formula score it may receive'. Readers use texts for a wide range of reasons: to get an overview of a subject, to answer specific questions, to carry out a task, etc. They may read sequentially, or dip into the text where they think it might be useful; they may skim read, search for specific information, or study in depth. They may be subject experts or novices; well-motivated or reluctant readers. Existing readability formulae do not (and cannot) take such factors into account.

Lack of clarity about the meaning of readability scores. It is probable that most users of readability formulae are not fully aware of what the scores mean. If a passage has a Reading Grade Level (RGL) of 7, does this mean that a person with an RGL of 7 would be able to understand 100% of the text, or 75%, or 50%? And what is meant by 'understand'? Would readers be able to remember the content after the text had been taken away, or be able to carry out an action with the text in front of them, or achieve a specific rating on a certain type of test? It is often possible to state very precisely what is meant by a specific score. Duffy (1985) gives an example: 'if a text has a 10.0 RGL score on the Kincaid-Flesch formula . . . it means that at least 50% of the readers with a 10.0 RGL or higher on the Gates-MacGinitie test may be expected to score at least 35% on a Cloze test of the text.' However, most writers would have difficulty in relating this to real world tasks.

Nevertheless, people want readability formulae. Some employers and clients put their trust in them because the results are replicable and therefore assumed to be valid. Some writers want them too: in a small informal survey carried out at a technical communication conference (Forum 95, Dortmund, November 1995), 18 of 24 respondents said they would like to use a readability formula, both for checking the level of difficulty of their text and for revising text to an appropriate level of difficulty. At the 1993 Institute of Scientific and Technical Communicators conference in the UK, a poll of 80 lecture delegates showed that over 60% wanted good, straightforward scoring tools, but under 10% used those currently available in wordprocessors. Our aim in this research was to devise a system which is simple to calculate yet of more practical use than existing formulae.

In this paper we report the results of two pilot tests. Experiment 1 is based on the premise that writers can consciously manipulate the lexical density of the texts they are writing. Experiment 2 uses the concept of information 'packets'. These experiments were carried out independently, and are therefore described separately below.

EXPERIMENT 1: LEXICAL DENSITY TESTS

Method

Background

The first approach to readability which this paper proposes is based on the idea that general readers (as distinct from technical experts) will find written text easier to engage with if it is written using language structures which convey a more oral style. (This style is described by Ulijn and Strother, 1995, as 'l'écrit oralisé', written oral language.) One factor which shows a marked difference between written and spoken language is lexical density, as defined by Halliday (1989). Words can be classed as lexical items (content words) and grammatical items (also known as structural items; these are function words such as articles and pronouns). In general, written discourse contains a higher ratio of lexical items to grammatical items than does spoken discourse, and more lexical items per clause than spoken discourse. In other words, in writing there is a higher level of information content for a given number of words.

Halliday proposes three definitions of lexical density. At the simplest level, lexical density is 'the number of lexical items as a proportion of the number of running words'. However, he argues that a more useful definition is the number of lexical items per clause. (In the current paper, 'clause' is taken to include both finite and non-finite clauses.) Finally, Halliday suggests a more sophisticated definition which would involve subdividing the lexical items, giving a low weighting to words which are more frequent in occurrence and a higher weighting to words which are less frequent. Tables of frequently occurring words are readily available and are already used in readability formulae (e.g. Dale and Chall, 1948). A weighted lexical density formula would be derived from three factors: the number of words per clause, the number of familiar lexical items per clause, and the number of unfamiliar lexical items per clause.

Halliday (1989) has shown that it is possible to reduce the lexical density of written texts in order to make them more readily understandable (as speech is), without at the same time resorting to the contractions and colloquialisms of actual speech. He gives the following example:

The use of this method of control unquestionably leads to safer and faster train running in the most adverse weather conditions.

This sentence has one clause and contains twelve lexical items (which we have italicised), so using Halliday's second definition (the number of lexical items per clause) it has a lexical density of 12. Our wordprocessor gave this sentence a Flesch Reading Grade of 13.2, and a Gunning Fog Index of 12.2.

If this method of control is used trains will unquestionably run more safely and faster even when the weather conditions are most adverse.

This sentence has three clauses and eleven lexical items, giving a lexical density of 3.7 (again using the second definition). Our wordprocessor gave this sentence a Flesch Reading Grade of 12.6, and a Gunning Fog Index of 12.6.

Traditional readability formulae therefore give the second version almost the same score as the first: the second version is found to be slightly easier than the first by the Flesch scale, and slightly more difficult than the first by the Gunning Fog Index. However, we suggest that most readers would find the second version more immediately understandable than the first version, and indeed the second version has a much lower lexical density.

A simple lexical density calculation, such as the one used above, can be used to compare the density of texts. If the texts contain similar lexical words, this comparison will also show which version is more accessible to the reader.

To examine the relationship (if any) between simple lexical density, conventional readability scores and perceived readability, a small-scale pilot study was carried out. Our hypothesis was that perceived readability might be more closely correlated with lexical density than with conventional readability scores.
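As an illustration of how mechanical this calculation can be, the sketch below (our addition, not part of the published study) computes Halliday's first two measures for the rewritten example sentence. It assumes the caller supplies the clause count, and it approximates the lexical/grammatical split with a small, necessarily incomplete stoplist of function words; a full implementation would need a much larger word list.

```python
# A minimal sketch of simple lexical density. The clause count is supplied
# by hand, and the lexical/grammatical split is approximated with a small
# stoplist; a real implementation would need a far larger list.
GRAMMATICAL = {
    "the", "a", "an", "this", "that", "these", "those", "of", "to", "in",
    "and", "or", "but", "if", "when", "even", "is", "are", "will", "be",
    "more", "most", "it", "its", "we", "us", "they", "them", "by", "for",
}

def lexical_density(text: str, clauses: int) -> tuple[float, float]:
    """Return (lexical items per running word, lexical items per clause)."""
    words = [w.strip(".,;:!?()").lower() for w in text.split()]
    lexical = [w for w in words if w and w not in GRAMMATICAL]
    return len(lexical) / len(words), len(lexical) / clauses

# Halliday's rewritten example: 11 lexical items over 3 clauses -> 3.7
sentence = ("If this method of control is used trains will unquestionably "
            "run more safely and faster even when the weather conditions "
            "are most adverse.")
per_word, per_clause = lexical_density(sentence, clauses=3)
print(f"{per_word:.2f} per running word, {per_clause:.1f} per clause")
```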

Subjects

Fourteen subjects took the tests. They were all university students on a technical communication degree course, and were aged between 18 and 40, with an equal male-female mix. They came from a range of backgrounds, the younger ones having come to university directly after leaving school, and the more mature students having worked in a range of fields including business and engineering. All were native speakers of English.

Materials, procedure and scoring

Subjects were given tests containing pairs of passages and, for each pair, were asked to tick a box to indicate which passage out of that pair they found easier to read, or whether they found that the two passages were at about the same level of difficulty. We were asking for a subjective perception of difficulty. We expected that, in general, the subjects would understand the vocabulary used in the test passages, but that they would feel differently about the passages according to the level of lexical density. They were not given any information about the concept of lexical density before the tests took place.

For each of these comparisons, a short extract was taken from an existing text, then re-written to provide a text of similar content and vocabulary but at a different level of lexical density from the original. The original texts were a range of professional, technical, and academic texts.

Two versions of the tests were prepared. Each test had the same pairs of texts, and each test had the denser text first in some pairs, and the less dense text first in other pairs. The difference between the first and second version was that within each pair the order was reversed in the second version. Half of the subjects worked on the first version, and half on the second version. This was done to cancel out any effects which might have resulted from the order of presentation of the passages.

We believed that altering the lexical density of a text could affect the reader's perception of its readability, irrespective of the perceived ease or difficulty of the vocabulary. To verify this, after the subjects had finished the test they were asked to read through the passages again and underline any words which they did not understand.

The five pairs of samples used in the test are shown in Table 1. In Table 1, the less dense passage is given first in each of the pairs in order to facilitate comparison between the passages. As explained above, this was not the case in the test. Lexical words are shown in the table in italics. These lexical words were not italicised on the test papers. Some words have attributes of both lexical and grammatical items: with such words, it is necessary to make an arbitrary decision. The presence or absence of italics shows our decision in individual cases.

Table 1. The five pairs of samples used in the test.

Comparison 1
A  'A good theory reveals patterns in the observations we make. It enables us to reduce many complex facts to a few facts and to identify the interactions among them.' [Kalat, 1990, Introduction to Psychology. Second edition, Wadsworth, p. 23.]
B  A good theory reveals patterns in our observations. It enables the replacement of many complex facts by a small number of facts and their interactions.

Comparison 2
C  If you are directly in front of the remote control sensor, you can use the remote control unit at a distance of up to approximately 7 metres. But this distance will be shorter if there are obstacles in the way or if you operate the remote control unit from an angle.
D  'The remote control unit can be used a straight distance of approximately 7 metres from the remote control sensor, but this distance will be shorter if there are obstacles in the way or if the remote control unit is operated from an angle.' [Denon, Personal Component System Manual, p. 10.]

Comparison 3
E  'Your VCR is fitted with a counter which can be used to identify the start and end of a recording in the middle of a tape. This is particularly useful when you have made more than one recording on the same tape.' [Matsui, How to use your new Video Recorder, p. 30.]
F  Your VCR is fitted with a counter for the identification of the start and end of a mid-tape recording. This is particularly useful for tapes with more than one recording.

Comparison 4
G  'A further complication is the effect of the atmosphere which preferentially absorbs certain wavelengths out of the incoming radiation. In particular, the very high energy component in the ultra-violet is almost completely absorbed. This means that the apparent solar temperature as seen at the surface of the earth is slightly lower than that as seen at the top of the atmosphere. The reason why it does not feel so hot is because the energy density is low, and because much of the heat is removed by convection currents.' [McMullan, Morgan and Murray, Energy Resources. Second Edition (1983) Edward Arnold, p. 70.]
H  The preferential absorption by the atmosphere of certain wavelengths out of the incoming radiation is a further complication. In particular, the very high energy component in the ultra-violet is almost completely absorbed. Thus the apparent solar temperature observable at the surface of the earth is slightly lower than that observable at the top of the atmosphere. It feels less hot because of the low energy density and because of the removal of most of the heat by convection currents.

Comparison 5
J  The Internet has grown very rapidly, and is now about to be commercialised. As a result, industry and the universities have had to come to terms with the technology which they need in order to use the Internet.
K  'In both industry and academia the impact of the growth of the Internet, and its impending commercialisation, has driven a rapid broad-based engagement with the technology.' [Forum 95, Preseedings.]

Results and discussion

Results of the five comparisons are presented in Table 2. In each comparison, the lexical density has been calculated according to Halliday's second definition (number of lexical items per clause). The Flesch Reading Grade, Flesch-Kincaid and Gunning Fog readability scores were calculated using the 'Grammar' function of Microsoft Word.

To establish whether simple lexical density or readability formulae were the better predictors of readability in the above five tests, we performed regression analyses on the data from the five comparisons and plotted the results. For the first chart (Figure 1), we calculated the dependent variable of preference as a percentage and regressed this on the simple lexical density scores. This gave a reasonably good fit to a straight line, with a correlation coefficient of 0.77. For the second chart (Figure 2) we calculated the dependent variable of preference as a percentage and regressed this on the simple average of the three conventional readability scores. This gave a poor fit to a straight line, with a correlation coefficient of only 0.36.
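For readers who want to replay this kind of analysis, the sketch below is an illustrative reconstruction rather than the authors' own computation: it treats each passage in Table 2 as one observation, takes preference as the percentage of the 14 subjects who chose that passage, and fits a least-squares line. Because the paper does not state exactly how preference percentages were derived (for instance, how 'both the same' responses were handled), the resulting coefficient need not match the reported 0.77.

```python
import numpy as np

# Passage-level data transcribed from Table 2: lexical items per clause and
# raw preference counts out of 14 subjects. The preference definition here
# is an assumption; ties ('both the same') are simply ignored.
density = np.array([3, 7, 4.4, 5.25, 3.4, 6.5, 3.45, 8.75, 2.17, 13])
preferred = np.array([9, 4, 7, 2, 10, 4, 7, 1, 14, 0]) / 14 * 100

slope, intercept = np.polyfit(density, preferred, 1)
r = np.corrcoef(density, preferred)[0, 1]
print(f"preference = {slope:.1f} * density + {intercept:.1f}, r = {r:.2f}")
```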

Comparison 1: Passages A and B. The scores given by the readability formulae for these two passages vary by about three years. Nevertheless, each formula gives almost the same score for both passages, indicating that the level of difficulty is approximately the same for both. However, the lexical density score for the second is much higher than for the first, suggesting that the second might be less easily accessible to the reader than the first. Subjects showed a very marked preference for the less dense passage. This would be expected, because there is a marked difference in lexical density between the two passages (difference=4).

Subjects found none of the vocabulary in these passages difficult to understand.

Comparison 2: Passages C and D. All three readability formulae suggest a dramatic increase in difficulty in the second passage (an increase of reading age by 4.9 years, 7.8 years, and 6.6 years respectively) because the passage is now one sentence instead of two, giving an increase in sentence length, and also because the active verb in the first passage ('you operate') is in a passive form in the second ('is operated'). There is an increase in lexical density, but this is less dramatic and results mainly from a decrease in the number of clauses from five to four. Seven subjects showed a preference for the less dense text, but five subjects found the passages to be at about the same level of difficulty. The traditional readability formulae show a marked difference in reading age, and using these alone we would expect the subjects to find the second passage more difficult. However, the lexical density scores, which are much closer (difference=0.85) than in Comparison 1, would lead us to predict that some subjects would rate the passages as equal in difficulty.

Subjects found none of the vocabulary in these passages difficult to understand.

Table 2. Summary of results of Experiment 1.

Comparison   Passage   Flesch Reading   Flesch-   Gunning   Lexical items   Reader
                       Grade            Kincaid   Fog       per clause      preferences
1            A         12.3             9.5       12.6      3               9
             B         12.3             9.1       13.0      7               4
             (same)                                                         1
2            C         10.5             11.4      13.3      4.4             7
             D         15.4             19.2      19.9      5.25            2
             (same)                                                         5
3            E         8.3              9.1       12.2      3.4             10
             F         11.9             9.5       11.3      6.5             4
             (same)                                                         0
4            G         13.8             12.5      16.5      3.45            7
             H         15.2             13.3      18.5      8.75            1
             (same)                                                         6
5            J         11.8             10.4      12.8      2.17            14
             K         17.0             17.2      19.6      13              0
             (same)                                                         0


Figure 1. Perceived readability as a function of simple lexical density.

Figure 2. Perceived readability as a function of average readability formulae scores.


Comparison 3: Passages E and F. The readability formulae give different scores. Flesch suggests an increase in difficulty of 3.6 years for the second passage, Flesch-Kincaid shows little change with an increase in difficulty of just 0.4 years, and Gunning Fog shows the second passage as easier by 0.9 years. The lexical density is greater for the second passage, and there is a strong preference for the less dense text, giving a clear correlation between the lexical density scores and perceived readability. The Flesch-Kincaid and Gunning Fog scores would lead us to assume that these passages were about equal in difficulty.

Subjects found none of the vocabulary in these passages difficult to understand.

Comparison 4: Passages G and H. Here all three readability formulae show a slight increase in difficulty (Flesch 1.4 years, Flesch-Kincaid 0.8 years, Gunning Fog 2 years). However, there is a marked increase in lexical density, which suggests that the second passage is significantly harder to read than the first. The results show that seven subjects preferred the less dense text, while only one preferred the denser text. This corresponds well to the lexical density scores (difference=5.3). But six subjects rated the passages as 'Both the same'. A possible explanation is that readers perceived the content of these two passages to be the most difficult of the five comparisons, as indicated by the fact that several subjects underlined difficult words. Perhaps some subjects did not understand either passage completely, and so rated them equally.

Comparison 5: Passages J and K. Here all three readability formulae show that the second passage is much more difficult than the first (Flesch 5.2 years, Flesch-Kincaid 6.8 years, Gunning Fog 6.8 years). There is a similarly marked difference in lexical density (10.83). Thus all the scores show that passage K was much more difficult than passage J, and accordingly all subjects felt that the first passage was easier.

Two words were highlighted as being difficult: three subjects did not understand 'broad-based', and one did not understand 'academia'. Both of these words were found only in the second passage, K.

Conclusions from our pilot Lexical Density tests

The regression analyses on the results of this pilot study show clearly that subjects perceived less lexically dense text as easier to read, even when this was at variance with the predictions made by existing readability indicators. However, a much more comprehensive research project would be necessary in order to determine whether a lexical density score would give consistent results, and whether it could reliably predict reading difficulty. (A simple lexical density calculation such as the one used in this research is not sufficient to rank passages which have different lexical content. In order to do this, we would need to take into account word frequency, but that is beyond the scope of this paper.)

A formula based on lexical density might be useful as a guideline for writers: it is possible to rewrite a passage to achieve a lower lexical density by increasing both the number of clauses and the number of non-lexical items. However, again, further research would be necessary in order to determine whether lexical density scoring could be used reliably in this way.


It is also important to note that the subjects for this pilot study were all British. Readers for whom English is a second language may react differently to those for whom it is their first language. Indeed, informal discussions with a small number of readers whose first language was German suggested that when they read these passages some preferred the denser versions, possibly because of expectations arising from their first language.

EXPERIMENT 2: THE 'PACKET LENGTH' APPROACH

Method

Background

We have always had doubts about the importance attached to average sentence length as the prime determinant of readability. Clearly it is an important factor. But there are good sentences that happen to be long, and impenetrable sentences that happen to be short. Calculating the average of a variable only gives limited information. In the context of sentence length, an average tells us nothing about either deviations or variability. If the distribution of sentence length were normal, we could calculate a meaningful standard deviation. But it seems more likely that a distribution of sentence length would follow a Poisson distribution, since there is randomness and no theoretical upper limit to sentence length.

During one of our technical writing courses, we invited a group of fifteen British students to read once a lengthy article about the Maastricht Treaty taken from the Financial Times newspaper. Afterwards, we asked them to guess the number of words in the longest sentence and the average number of words per sentence. All the written estimates significantly underestimated both figures. The highest guess at the longest sentence was 35 words. In fact, it was 78 words long. By most conventional rules of good written communication, such a sentence length is unacceptable.

More recently, we conducted a test with 35 subjects on a 912-word British newspaper passage entitled Scotland's Secret Prosperity. These subjects were university undergraduate students, 16 male and 19 female, aged between 18 and 26. After reading the passage once only, subjects were invited to estimate two factors: average sentence length and the number of words in the longest sentence. No indication was given before the test as to what would be asked of the subjects. Although the average sentence length was actually 24 words, the mean estimate was 17. Even though the longest sentence was 55 words long, the mean estimate was 28, and only one subject came close to the correct figures.

In both tests, when subjects read the texts carefully a second time, they did then notice that some sentences were long by conventional standards, but were broken up effectively by means of punctuation into what we now term 'packets'. These results tended to further confirm our impression that readers are oblivious to long sentences in well written passages, provided that these are effectively broken up into 'packets'.

By a 'packet', we mean a group of words between any 'syntactical punctuation marks', and by 'packet length', we mean the number of words between such punctuation marks, excluding zero counts (caused by adjacent punctuation marks). Hyphenated words count as one word, as do abbreviations, acronyms, and quantities. Headings, captions, tables and other non-continuous prose are all ignored. It is important to emphasise that our definition of 'packet' is purely mechanical. Although a packet might equate to a clause, it would not necessarily do so.

By 'syntactical punctuation marks', we mean only those eight marks that are mainly used to break up text into logical units for the reader, namely full-stops, commas, colons, semi-colons, exclamation marks, question marks, long (em) dashes and parentheses. We do not mean those punctuation marks that are usually employed to signal (for example) contraction or speech. Grammarians will rightly point out that the functions of punctuation are more complex than we pretend. Some draw a distinction between punctuation functions of 'separation' and 'specification' (Quirk, 1989). However, the model we propose in this paper is purely mechanical. It is only able to find occurrences of defined punctuation marks in defined positions, not to make complex discriminations according to grammatical function.

Our aim in this part of the research was to investigate 'packet length' as a factor in readability.
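The definition above is mechanical enough to express directly in code. The following sketch is our illustration, not the authors' macro: it splits text on the eight syntactical punctuation marks and counts the words in each resulting packet, discarding the zero counts produced by adjacent marks. Hyphenated words fall out naturally as single words because they contain no spaces; abbreviations and decimal numbers would need extra handling that is omitted here.

```python
import re

# The eight 'syntactical' marks: full stop, comma, colon, semi-colon,
# exclamation mark, question mark, em dash, and parentheses.
SYNTACTICAL = r"[.,:;!?()\u2014]"

def packet_lengths(text: str) -> list[int]:
    """Word counts of the packets in text, zero-length packets excluded."""
    packets = re.split(SYNTACTICAL, text)
    return [len(p.split()) for p in packets if p.split()]

sample = ("Clearly it is an important factor. But there are good sentences "
          "that happen to be long, and impenetrable sentences that happen "
          "to be short.")
lengths = packet_lengths(sample)                         # [6, 10, 8]
print(lengths, "average:", sum(lengths) / len(lengths))  # average: 8.0
```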

Materials, procedure and scoring

Any mechanical examination of readability requires a substantial corpus of uniform, readable text. We decided to conduct our research using The Economist international newspaper published weekly from the United Kingdom. The Economist was chosen because it is written to a highly uniform style, to which texts are rigorously edited. Indeed, it prides itself on its unique style and publishes its highly-acclaimed Style Guide (1993). Three-quarters of its 250,000 circulation is outside the UK, and many readers are not native speakers of English. Crown Prince Willem Alexander of the Netherlands cited it in a recent television interview as his main weekly source of international news and current affairs. Only very exceptionally are the names of the writers given. It is virtually impossible to identify substantive differences of style between the anonymous articles. All these attributes combine to make The Economist an excellent subject for readability research.

Taking four issues of The Economist, we counted sentence lengths across a corpus of 58,387 words and 'packet' lengths across a smaller corpus of 14,397 words.

Results

This experiment produced some surprising results. Firstly, there were (a few) readable sentences over 55 words in length. Secondly, average sentence length, at almost 20 words per sentence, is high by standards set out in many manuals of technical writing.

Plotting the results gave the skewed distributions shown in Figures 3 and 4. The average sentence length was 19.7 words per sentence, with a most frequently occurring value of 13 words per sentence. The average packet length was 7.74 words per packet, with most frequently occurring values of 2, 3 and 4 words per packet.

Figure 3. Words/sentence vs. frequency in The Economist.

Figure 4. Words/packet vs. frequency in The Economist.
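The gap between the average and the most frequently occurring value is exactly what one expects of a right-skewed length distribution. The toy example below, with invented lengths rather than the Economist data, shows the effect:

```python
from statistics import mean, mode

# Hypothetical sentence lengths with a long right tail: the mean is pulled
# well above the mode, as in the Economist corpus (mean 19.7, mode 13).
sentence_lengths = [8, 11, 13, 13, 13, 15, 18, 22, 27, 34, 43, 55]
print(mean(sentence_lengths))  # 22.666...
print(mode(sentence_lengths))  # 13
```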

DISCUSSION AND DEVELOPMENTS

From the results, we began to conclude that sentence length and packet length taken together might be a good predictor of readability. One message that stands out clearly is that sentence length is not the prime determinant of readability. Occasional very long sentences (over 55 words) appear to be acceptable, provided that they are effectively broken into 'packets' by means of 'syntactical punctuation'. A second message is that averages are quite different from 'most frequently occurring' values, since distributions for both variables are skewed.

Since first obtaining these results, we have tested the approach on many other texts using macros written by us for a wordprocessor. In general, we have found good agreement with our models for texts that we consider readable. Texts that the authors subjectively considered difficult to read tended to have far higher average packet lengths even when (as was not always the case) the average sentence length lay close to The Economist average of 19.7 words per sentence. The plots have been particularly affected when attempts have been made to improve conventional readability scores by breaking up sentences. Using our approach, it is virtually impossible to achieve a smooth shift of a distribution along the horizontal axis without highly artificial manipulation of a text.

One well-known comparison between two versions of a passage appears in the book Effective Writing (Turk and Kirkman, 1989, pp. 17–18). These are referred to as Brown's version and Smith's version. Brown and Smith share the same content, and so comparisons between the versions are meaningful. It will be seen that the second version, which is usually regarded as the more readable, comes close to our average figures for The Economist (see Table 3).

Table 3. Comparison between Brown's version, Smith's version and The Economist.

                   words/sentence   words/packet   corpus (words)
Brown's version    35.8             10.9           178
Smith's version    17.4             8.2            158 (11% decrease on Brown)
The Economist      19.7             7.74           58,387

This suggests a simple guideline: aim for an average sentence length of under 20 words and an average packet length of under 8 words. To evaluate texts for these two factors, a sample size of as little as 200 words proved to be sufficient in tests using our wordprocessor macros.
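Expressed in code, and reusing the packet_lengths() sketch above, the guideline might look like the following. This is our hypothetical checker, not the authors' macro; splitting sentences on full stops, exclamation marks and question marks is a crude approximation.

```python
import re

def meets_guideline(text: str) -> bool:
    """Apply the suggested thresholds: under 20 words per sentence and
    under 8 words per packet, ideally on a sample of 200+ words."""
    sentences = [s for s in re.split(r"[.!?]", text) if s.split()]
    avg_sentence = sum(len(s.split()) for s in sentences) / len(sentences)
    packets = packet_lengths(text)  # from the earlier sketch
    avg_packet = sum(packets) / len(packets)
    return avg_sentence < 20 and avg_packet < 8
```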

Applications

Implementing both readability models in Word for Windows

Our main goal was to devise more meaningful predictors of readability that would help writers in writing and revising text. But we also wanted approaches that could be implemented in a practical way on a personal computer, so that writers could derive some real benefit from them. It seemed to us that both approaches outlined in this paper are suited to accurate measurement by computer.



During our research, the leading PC wordprocessor in terms of market share was Microsoft Word, so we decided to concentrate on this as a platform for our implementation. It has the advantage of well-documented calls, the ability to add utilities seamlessly and the calculation within existing subroutines of some of the variables we require.

We have written a utility program that runs within Word under Windows 95. This program calculates and displays the readability descriptors introduced in this paper. Other more conventional readability scores are also displayed, and on-screen help is provided to give explanations of the data presented to the user. This readability assessment tool, named EinCov, is available as freeware over the internet. (To download a copy send a message to [email protected] with only EINCOV in the subject line and a blank message. You will then receive the current URL of where the program is stored.)

Future work

Concepts of lexical density presented in this paper can be extended and made more useful by bringing in a word frequency factor, and future work will concentrate on this area. Our aim will be to devise a meaningful and relevant weighted lexical density calculation derived from three factors: the number of words per clause, the number of frequent lexical items per clause, and the number of less frequent lexical items per clause.
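One possible shape for such a weighted measure, sketched here purely as an illustration (the weights and the frequency list are placeholders of our own, not values proposed by the paper, and the planned words-per-clause factor is omitted):

```python
# Hypothetical weighted lexical density: infrequent lexical items count more
# than frequent ones. FREQUENT stands in for a published word-frequency list
# (cf. Dale and Chall, 1948); the 1.0/2.0 weights are arbitrary placeholders.
FREQUENT = {"theory", "control", "method", "run", "use"}  # stand-in list

def weighted_density(lexical_items: list[str], clauses: int,
                     w_frequent: float = 1.0,
                     w_infrequent: float = 2.0) -> float:
    score = sum(w_frequent if w.lower() in FREQUENT else w_infrequent
                for w in lexical_items)
    return score / clauses
```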

On sentence length and packet length, future work will concentrate on two other variables. Firstly, we intend to examine whether deviations around the mean in the distributions have any effects on readability, and if so, what those effects are. Conventional wisdom in effective technical writing holds that in a self-contained passage, sentences should start off by being short, be allowed to lengthen and then shorten again towards the end. So, secondly, we intend to model sentence length variability in complete passages, and to see whether there is any correlation between different levels of variability and readability.

CONCLUSIONS

We believe that lexical density is a powerful readability concept which can give writers and editors valuable feedback on prose passages. We have shown through a pilot study that correlation between lexical density scores and perceived readability can be much higher than is the case with more conventional readability scores, although, clearly, there is more work to be done to achieve a fully robust lexical density scoring system.

Furthermore, we have proposed a novel approach to the mechanical prediction of readability. Our model is based on measurement of two variables: sentence length and 'packet' length in a large corpus of readable, stylistically consistent text taken from The Economist. Our conclusion is that average sentence length alone is not a good predictor of readability, but that distributions of sentence length and 'packet' length taken together make for a better predictive model than those relying heavily on the first variable. This model has been validated against a number of texts of varying quality, and early results strongly support our conclusion.


Towards the beginning of this paper, we discussed four problems affecting readability formulae. Although we do not claim to have resolved all of these, we do feel that we have made a useful contribution to the readability formulae debate. Two readability predictors have been presented. Each follows a quite different approach, but, precisely because of this difference, we feel that their individual values are further improved by using the approaches in combination. Our view is that the approaches proposed may be particularly valuable for the evaluation of English passages intended for non-native readers of English. We do not wish to overstate our claims: our models remain purely mechanical and have little to say about the complexities of language or reader motivation. But, used with an awareness of its limitations, the combination of our models may serve as a good overall predictor of readability.

Finally, our models have been implemented in a utility program to run within the Microsoft Word wordprocessing application, and this is available as freeware over the internet. Researchers are invited to comment critically and constructively on the value, relevance and accuracy of the EinCov utility.

REFERENCES

BAKKER, W. P. and HINSON, D. (1997) Using and understanding readability statistics in Microsoft Word. Communicator, 5, 20–25.

COLEMAN, E. B. (1965) Learning of prose written in four grammatical transformations. Journal of Applied Psychology, 49, 332–341.

DALE, E. and CHALL, J. S. (1948) A formula for predicting readability: instructions. Educational Research Bulletin, 27, 37–54.

DUFFY, T. M. (1985) Readability formulas: what's the use? In T. M. Duffy and R. Waller (Eds) Designing Usable Texts. London: Academic Press, Harcourt Brace Jovanovich, pp. 113–143.

ECONOMIST (1993) The Economist Style Guide. London: Hamish Hamilton.

GUILLEMETTE, R. A. (1989) The cloze procedure: assessing the understandability of an IEEE standard. IEEE Transactions on Professional Communication, 32, 41–47.

HALLIDAY, M. A. K. (1989) Spoken and Written Language. (2nd edition). Oxford: Oxford University Press.

HARTLEY, J. (1994) Designing Instructional Text. (3rd edition). London: Kogan Page.

HUCKIN, T. N. (1983) A cognitive approach to readability. In P. V. Anderson, R. J. Brockmann and C. R. Miller (Eds) New Essays in Technical and Scientific Communication: research, theory, practice. Farmingdale, NY: Baywood, pp. 90–108.

HUCKIN, T. N. and OLSEN, L. A. (1991) Technical Writing and Professional Communication for Nonnative Speakers of English. New York: McGraw-Hill.

KLARE, G. R. and MACDONALD-ROSS, M. (1981) Practical Aspects of Readability. Milton Keynes, England: Institute of Educational Technology, The Open University.

KLARE, G. R. (1963) Measurement of Readability. Ames, Iowa: Iowa State University Press.

NYSTRAND, M. (1979) Using readability research to investigate writing. Research in the Teaching of English, 13, 231–242.

QUIRK, R., GREENBAUM, S., LEECH, G. and SVARTVIK, J. (1989) A Comprehensive Grammar of the English Language. (7th edition). Harlow, England: Longman.

SELZER, J. (1983) What constitutes a 'readable' technical style? In P. V. Anderson, R. J. Brockmann and C. R. Miller (Eds) New Essays in Technical and Scientific Communication: research, theory, practice. Farmingdale, NY: Baywood, pp. 71–89.

STOKES, A. (1978) The reliability of readability formulae. Journal of Research in Reading, 1, 21–34.

TAYLOR, W. (1953) Cloze procedure: a new tool for measuring readability. Journalism Quarterly, 30, 415–433.

TURK, C. and KIRKMAN, J. (1989) Effective Writing: Improving scientific, technical and business communication. (2nd edition). London: E. & F.N. Spon.

ULIJN, J. M. and STROTHER, J. B. (1995) Communicating in Business and Technology: from psycholinguistic theory to international practice. Frankfurt: Peter Lang.

Address for correspondence: SANDRA HARRISON, Coventry School of Art and Design, Coventry University, Priory Street, Coventry, CV1 5FB, UK. Email: [email protected]
