14 Beyond Accuracy (1)

8/10/2019 14 Beyond Accuracy (1)

1/30

Beyond Accuracy: What Data Quality Means to Data Consumers

Author(s): Richard Y. Wang and Diane M. StrongSource: Journal of Management Information Systems, Vol. 12, No. 4 (Spring, 1996), pp. 5-33Published by: M.E. Sharpe, Inc.Stable URL: http://www.jstor.org/stable/40398176.

Accessed: 26/10/2013 17:54

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at.http://www.jstor.org/page/info/about/policies/terms.jsp

.JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of

content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new formsof scholarship. For more information about JSTOR, please contact [email protected].

.

M.E. Sharpe, Inc.is collaborating with JSTOR to digitize, preserve and extend access toJournal of

Management Information Systems.

http://www.jstor.org

This content downloaded from 128.95.130.223 on Sat, 26 Oct 2013 17:54:36 PMAll use subject to JSTOR Terms and Conditions
http://www.jstor.org/action/showPublisher?publisherCode=meshttp://www.jstor.org/stable/40398176?origin=JSTOR-pdfhttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/stable/40398176?origin=JSTOR-pdfhttp://www.jstor.org/action/showPublisher?publisherCode=mes


2/30

Beyond

Accuracy:

What

Data

Quality

Means to

Data

Consumers

RICHARD

Y. WANG AND DIANE M.

STRONG

Richard

Y. Wang

is Associate

Professor f

Information

echnologies

(IT)

and

Co-Director

forTotal Data

Quality Management

TDQM)

at the MIT

Sloan School

of

Management,

wherehe received

a

Ph.D.

degree

with n

IT

concentration. e is a

major

proponent

f data

quality

research,

with

more than

twenty

apers

written o

develop

a

setof

concepts,

models,

nd methods or

his

field.

Professor

Wang

received

more than

one million dollars

of research

grants

from

both the

public

and

private

sector.His work on data qualitywas applied, bytheNavy,to theNaval Command,

Control,

ommunication,

omputers,

nd

Intelligence

C4I)

informationrchitecture.

He

presented

he state-of-the-art

f

data

quality

research nd

practice

n

the Chief

Information

fficer

CIO)

conference

n

1993,

and

spoke

on

"Data

Quality

in

the

Information

ighways"

t the

Enterprise

93 conference.

r.

Wang organized

he

first

Workshop

n

Information

echnologies

and

Systems

WITS)

in 1991. At WITS-94

he

was

elected

chairman

of the

Executive

Steering

Committee.

He also chaired

or

participated

n

the data

quality

panel

at

WITS and at

the nternational

onference

n

Information

ystems.

Dr.

Wang

is

the ditor

f

nformation

echnologies:

Trends

nd

Perspectives.

Diane M. Strong is AssistantProfessor fManagement tWorcesterPolytechnic

Institute.

he

received

her

Ph.D.

in information

ystems

from

Carnegie

Mellon

University,

n

M.S.

in

computer

cience

from

heNew

Jersey

nstitute f

Technology,

and

a

B.S.

in

computer

cience

from

he

University

f

South

Dakota.

Dr.

Strong's

research

enters

n data

and

nformation

uality

nd

software

uality.

Her

publications

have

appeared

n

A

CM

Transactions

n

nformation

ystems,

MIS

Quarterly,

ecision

Support

ystems,

nd

other

eading

ournals.

ABSTRACT:

oor data

quality

DQ)

can

have substantial

ocial

and economic

mpacts.

Although

irms re

improving

ata

quality

with

practical

pproaches

and

tools,

their

improvement

fforts end

to focus

narrowly

on

accuracy.

We believe

that data

consumershave a muchbroaderdataquality onceptualization han S professionals

realize.

The

purpose

of this

paper

s to

develop

a

framework

hat

aptures

he

aspects

of data

quality

hat re

important

o data

consumers.

A

two-stage

urvey

nd

a

two-phase

sorting

tudy

were conducted

to

develop

a

hierarchical

framework

or

organizing

data

quality

dimensions.

This framework

Acknowledgments'.

esearch

onducted

erein

as

been

upported,

n

part, y

MIT's

TotalData

Quality

Management

TDQM)

Research

rogram

nd

MIT's International

inancial

ervices

Research

enter

IFSRC).

The authors

ish o thank

isa Guarascio

or

herfieldwork

nd

factor

nalysis,

rofessors onald

Bailou,

zak

Benbasat,

nd Stuart

Madnick or

roviding

insightful

omments

nd

ncouragement,

rofessors

rance

eClerc nd

Paul

Berger

or

heir

input

n

the

design

nd

execution

f this

research,

nd

Daphne Png

for

conducting

he

confirmatoryxperimentnd ummarizingts esults.

Journal

fManagement

nformation

ystems

Spring

996,

Vol.

12,

No.

4,

pp.

5-34

Copyright

1996

M.E.

Sharpe,

nc.

http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp


3/30

6

RICHARD Y. WANG

AND DIANE M. STRONG

captures

dimensions f data

quality

hat

re

mportant

o data consumers.

ntrinsic

Q

denotes that data have

quality

in

their

own

right.

Contextual

DQ highlights

he

requirement

hat

ata

quality

must e

consideredwithin

he ontext

fthe ask

t

hand.

RepresentationalDQ and accessibilityDQ emphasizethe mportance fthe role of

systems.

These

findings

re consistent

withour

understanding

hat

high-quality

ata

should

be

intrinsically ood,

contextually

ppropriate

or he

ask,

learly

epresented,

and accessible

to the data consumer.

Our framework

has been

used

effectively

n

industry

nd

government.

Using

this

framework,

S

managers

were able

to better

understand

nd

meet

theirdata

consumers'

data

quality

needs.

The salient

feature

of

this

research

study

s

that

quality

attributes

of data

are collected

from

data

consumers

instead

of

being

defined

theoretically

r based

on researchers'

experience.

Although

exploratory,

this research

provides

a

basis

forfuture

tudies

that

measure

data

quality

along

the

dimensions

of

this framework.

Key

words and phrases: data

administration,

ata

quality,

atabase

systems.

Introduction

Many

databases

are not

error-free,

and

some

contain

a

surprisingly

arge

number

f errors

3,

4,

5,

21, 28,

34, 38,

40].

A recent

ndustry

eport,

or

xample,

notes

thatmorethan60

percent

f the

urveyed

irms

500

medium-size

orporations

withannual sales of

more than

$20

million)

have

problems

with

data

quality.1

Data

qualityproblems,however,go beyond accuracy to include otheraspects such as

completeness

and

accessibility.

A

big

New York

bank found

that

the data

in its

credit-risk

management

atabase

were

only

60

percent

omplete,

necessitating

ou-

ble-checking

y anyone

using

t.2

A

major

manufacturing

ompany

found hat

tcould

not access all sales data for

a

single

customer

because

many

different

ustomer

numbers

were

assigned

to

represent

he ame customer.

n

short,

oor

data

quality

an

have substantial

ocial and economic

impacts.

To

improve

data

quality,

we need to understand

what

data

quality

means to

data

consumers

those

who

use

data).

The

purpose

of this

research,

herefore,

s to

develop

a framework

hat

captures

the

aspects

of

data

quality

that are

important

o

data

consumers.

Related

Research

The

concept

of

"fitnessfor use" is

now

widely

adopted

in

the

quality

iterature.

t

emphasizes

the

importance

of

taking

a

consumer

viewpoint

of

quality

because

ultimately

t s

the

consumerwho will

udge

whether r not

product

s fit

or se

[13,

1

5, 22,

23].

In

this

research,

we

also

take theconsumer

iewpoint

f "fitness oruse"

in

conceptualizing

the

underlying

spects

of data

quality.

Following

this

general

quality literature,we define "data quality" as data that are fitfor use by data

consumers.

n

addition,

we

define

"data

quality

dimension"

s

a

set

of

data

quality

attributes

hat

epresent

single

aspect

or construct f data

quality.



4/30

BEYOND ACCURACY:

DATA

QUALITY

7

Three

approaches

re used

in the iterature

o

study

ata

quality:

1)

an

intuitive,

2)

a

theoretical,

nd

3)

an

empirical pproach.

The

intuitive

pproach

s taken

when

the

selection

of data

quality

ttributes

or

ny particular tudy

s

based on

the

researchers'

experience

r ntuitive

nderstanding

boutwhat ttributesre

"important."

Most data

quality

tudiesfall

nto his

ategory.

he

cumulative ffect f these tudies s a

small

set

of

data

quality

attributes hat

are

commonly

elected. For

example, many

data

quality

tudies

nclude

accuracy

as

either he

only

or one of several

key

dimensions

[4,

28,

32].

In the

accounting

nd

auditing

iterature,

eliability

s a

key

attribute

sed

in

studying

ata

quality

7,

11 2

1

24, 25,

49].

In the nformation

ystems

iterature,

nformationuality

and user

satisfaction

re

two

major

dimensions

or

valuating

he success of

nformation

ystems

12].

These

two dimensions

generally

nclude

some data

quality

attributes,

uch as

accuracy,

timeliness, recision,reliability, urrency, ompleteness, nd relevancy 2, 20, 27].

Other

attributes

uch as

accessibility

nd

interpretability

re also used in the data

quality

iterature

44,

45].

Most

of these studies

dentify

multiple

dimensionsof data

quality.

Furthermore,

although

hierarchical

iew of data

quality

s less

common,

t

s

reported

n several

studies

27,

34,

44].

None

of these

tudies,however,

mpirically

ollects data

quality

attributes

rom ata consumers.

A

theoretical

pproach

to data

quality

focuses on

how data

may

become deficient

during

he

data

manufacturing

rocess.

Although

heoretical

pproaches

are

often

recommended,

esearch

offers

ew

examples.

One such

study

uses an

ontological

approach nwhichattributesf dataquality re derivedon thebasis of data deficien-

cies,

which re defined

s the

nconsistencies

etween

heview of a real-world

ystem

that an

be inferred

rom

representing

nformation

ystem

nd the view that an be

obtained

by directly

bserving

he real-

world

ystem

42].

The

advantage

of

using

an

intuitive

pproach

is that each

study

can select

the

attributes

most relevant

to the

particular

goals

ofthat

study.

The

advantage

of

a

theoretical

pproach

is the

potential

o

provide

a

comprehensive

et of data

quality

attributes

hat are

intrinsic

o

a

data

product.

The

problem

with both of these

approaches

is

that

hey

focus

on

the

product

n

terms f

development

characteris-

tics instead

of use characteristics.

hey

fail to

capture

the voice of

the consumer.

Evaluations

of theoretical

pproaches

to

defining

product

ttributes s a basis for

improving

quality

find that

they

are not

an

adequate

basis

for

mprovingquality

and are

significantly

worse

than

empirical approaches.

To

capture

the data

quality

attributes

hat are

important

o

data

consumers,

we

take

an

empirical approach.

An

empirical

approach

to data

quality analyzes

data

collected

fromdata consumers

to

determine

he

characteristics

hey

use to assess

whether

ata

are fitforuse

in their asks.

Therefore,

hese

characteristics annot be

theoretically

etermined r

intuitively

elected

by

researchers. he

advantage

of an

empirical approach

is that

t

captures

the voice

of customers.

Furthermore,

t

may

revealcharacteristicshat esearchers ave notconsidered s part fdataquality.The

disadvantage

s

that he correctness

r

completeness

f the

results

annot be

proven

via

fundamental

rinciples.



5/30

8

RICHARD

Y.

WANG AND DIANE

M.

STRONG

Research

pproach

We follow

the methods

developed

in

marketing

esearch

for

determining

he

quality

characteristicsf

products.

Our

approach

mplicitly

ssumesthatdata can be treated

as

a

product.

t is

an

appropriate

pproach

because

an information

ystem

can

be

viewed

as a data

manufacturing

ystem cting

on

raw data

input

o

produce

output

data

or data

products

1,6,

16,

19, 35, 43,

46].

While

most data

consumers

re

not

purchasing

ata,

they

re

choosing

to use

or

not

to

use

data

in a

variety

f tasks.

Approaches

for

ssessing product

uality

ttributes

hat

re

mportant

o consumers

are

well established

n the

marketing

iscipline

8,

26].

Three

tasks

are

suggested

n

identifying

uality

ttributes

f

a

product:

1)

identifying

onsumer

needs,

2)

identi-

fying

he

hierarchical

tructure

f

consumer

needs,

and

3) measuring

he

mportance

of each consumerneed [17, 18].

Following

the

marketing

iterature,

his

research

dentifies

he

attributes

f

data

quality

that re

important

o

data consumers.

We

first ollect

data

quality

attributes

from

data

consumers,

nd

then collect

importance

atings

for

these

attributes

nd

structure

hem

nto

hierarchical

epresentation

f

data

consumers'

data

quality

needs.

Our

goal

is

to

develop

a

comprehensive,

hierarchical

framework

f

data

quality

attributes

hat re

important

o data

consumers.

Some

researchers

may

doubt

the

validity

of

asking

consumers

about

important

quality

ttributes

ecause

of

the well-known

ifficulties

ith

valuating

users'

satis-

faction

with

information

ystems

[30]. Importance

ratings

and

user

satisfaction,

however, re two differentonstructs. riffinndHuser 17],for xample,demonstr-

ate

that

determining

ttributes

f

importance

o

consumers,

ollecting

mportance

ratings

f these

attributes,

nd

measuring

ttribute

alues

are

valid

characterizations

of consumers'

actions

uch as

purchasing

he

product,

ut

atisfaction

atings

f these

attributes

re

uncorrelated

with onsumer

ctions.

Research

Method

We first

eveloped

two

surveys

hat

were

used to collect

data

fromdata

consumers

(referred

o as

the

two-stage urvey ater).

The first

urvey roduced

listof

possible

data

quality

ttributes,

ttributes

hat ame

to mind

when the data

consumer

hought

about data

quality.

The second

survey

ssessed the

mportance

f these

possible

data

quality

attributes

o

data consumers.

The

importance

atings

rom

he second

survey

were used in an

exploratory

actor

nalysis

o

yield

an intermediate

et of data

quality

dimensions hatwere

important

o

data consumers.

Because thedetailed

urveys roduced comprehensive

etof data

quality

ttributes

for

nput

o

factor

nalysis,

broad

spectrum

f ntermediate ata

quality

dimensions

were revealed. We conducted

follow-up mpirical

tudy

o

group

hese ntermediate

data

quality

dimensionsfor he

following

easons.

First,

t s

probably

not

critical

for

evaluationpurposesto considerso many qualitydimensions 27]. Second, although

these dimensions

can

be

ranked

by

the

importanceratings,

he

highest

ranking

dimensions

may

not

capture

he

essential

spects

of data

quality.

Third,

he nterme-



6/30


7/30

10

RICHARD

Y.

WANG

AND DIANE M. STRONG

we startwith

an

exploratory

pproach

that

ncludes not

only

the attributes

n

our

preliminary

ramework,

ut also the attributes

entioned

n

the iterature.

or exam-

ple,

our first

uestionnaire

tartswith ome attributes

timeliness

nd

availability)

hat

are not

part

of this

preliminary

ramework.

The

Two-Stage urvey

The

purpose of this two-stage survey

is

to identify data

quality

dimensions

perceived by

data consumers.

n the

following,

we

summarize

he method

nd

key

results f these

surveys.

The reader

s

referred

o

[47]

for

more detailed

results.

Method

The

method

for

the

two-stage

urvey

was

as follows:

For

stage

1,

we conducted

a

survey

to

generate

a

list

of data

quality

attributes

hat

capture

data

consumers'

perspectives

f data

quality.

For

stage

2,

we conducted

survey

o collect

data

on the

importance

of each

of

these

attributes

o

data

consumers,

nd

then

performed

n

exploratory

actor

nalysis

on

the

mportance

ata

to

develop

an

intermediate

et

of

data

quality

dimensions.3

The First

Survey

Thepurposeof hefirsturveywastogenerate n extensive ist fpotential ataquality

attributes.

ince

thedata

quality

dimensions

esulting

rom

actor

nalysis

depend,

o

a

large

extent,

n the

istof attributes

enerated

rom

he

first

urvey,

we decided

that

(1)

the

subjects

should

be data

consumers

who

have

used data

to make

decisions

in

diverse

ontexts

within

rganizations,

nd

2)

we should

be

able to

probe

nd

question

the

subjects

n

order

o

fully

nderstand

heir

nswers.

Subjects:

Two

pools

of

subjects

were

selected.

The

first onsisted

of 25

data

consumers

urrently orking

n

industry.

he second

was M.B.A.

students

t a

large

U.S.

university.

We selected

1

12

students

who

had

work

experience

s data

consum-

ers.

The

averageage

of

these students

was over 30.

Survey

nstrument:

he

survey

nstrument

see

appendix

A)

included

two sections

for

liciting

ata

quality

ttributes.

he first ection licited

respondents'

irst eaction

to

data

quality

by asking

hem o ist hose

ttributes

hat irst ame

to mind

when

they

thought

of

data

quality (beyond

the common attributes

f

timeliness,

ccuracy,

availability,

nd

interpretability).

he

second

section

provided

furtherues

by listing

32 attributes

eyond

hefour ommonones to

"spark"

ny

dditional

ttributes.

hese

32

attributes

ere

obtained

from

data

quality

iterature nd discussions

among

data

quality

researchers.

Procedure:

For the

elected M.B.A.

students,

he

urvey

was

self-administered.or

thesubjectsworking n industry,headministrationf thesurveywas followedbya

discussion of the

meanings

of

the

attributes

he

subjectsgenerated.

Results: This

process

resulted

n

179

attributes,

s shown

n

figure



8/30

BEYOND

ACCURACY:

DATA

QUALITY

1 1

Ability

o

be

Ability

o

Download

Ability

o

Identify Ability

o

Upload

Joined With

Errors

Acceptability

Access

by Accessibility

Accuracy

Competition

Adaptability

Adequate

Detail

Adequate

Volume Aestheticism

Age

Aggregatability

Alterabi

ty

Amountof Data

Auditable

Authority

Availability Believability

Breadth

of Data

Brevity

Certified ata

Clarity

Clarity

f

Origin

Clear Data

Compactness Compatibility

Responsibility

Competitive

Edge

Completeness

Comprehensiveness Compressibility

Concise

Conciseness

Confidentiality Conformity

Consistency

Content

Context

Continuity

Convenience

Correctness

Corruption

Cost

Cost

of

Accuracy

Cost

of Collection

Creativity

Critical

Current

Customizability

Data

Hierarchy

Data

Improves

Efficiency

Data Overload

Definability

Dependability

Depth

of Data

Detail

Detailed

Source

Dispersed

Distinguishable

Updated

Files

Dynamic

Ease

of Access

Ease of

Comparison

Ease

of

Correlation

Ease

of Data

Ease

of Maintenance

Ease of

Retrieval Ease of

Exchange

Understanding

Ease

of

Update

Ease

of Use

Easy

to

Change Easy

to

Question

Efficiency

Endurance

Enlightening Ergonomie

Error-Free

Expandability

Expense Extendibility

Extensibility

Extent

Finalization

Flawlessness

Flexibility

Form

of Presentation

Format

Integrity

Friendliness Generality Habit Historical

Compatibility

Importance

Inconsistencies

Integration Integrity

Interactive

Interesting

Level of Abstraction Level of

Standardization

Localized

Logically

Connected

Manageability

Manipulate

Measurable

Medium

Meets

Requirements

Minimality

Modularity

Narrowly

efined No lost

information

Normality

Novelty

Objectivity

Optimality

Orderliness

Origin

Parsimony Partitionability

Past

Experience

Pedigree

Personalized

Pertinent

Portability

Preciseness

Precision

Proprietary

ature

Purpose

Quantity Rationality Redundancy Regularityf Format

Relevance

Reliability

Repetitive

Reproducibility

Reputation

Resolution

of

GraphicsResponsibility

Retrievability

Revealing

Reviewability Rigidity

Robustness

Scope

of Info

Secrecy Security Self-Correcting

Semantic

Semantics Size Source

Interpretation

Specificity

Speed

Stability Storage

Synchronization

Time-independence

Timeliness

Traceable

Translatable

Transportability

Unambiguity

Unbiased

Understandable

Uniqueness

Unorganized Up-to-Date

Usable Usefulness

User

Friendly

Valid

Value

Variability

Variety

Verifiable

Volatility Well-Documented Well-Presented

Figure

1.

Data

Quality

Attributes

enerated

from he

First

urvey



9/30

12

RICHARD Y.

WANG AND DIANE M. STRONG

The

Second

Survey

The

purpose

of the

second

survey

was

to

collect data

about

the

mportance

f

quality

attributes s perceived by data consumers.The resultsof the second surveywere

ratings

f

the

mportance

fthedata

quality

ttributes.

hese

importance

atings

were

the

nput

or n

exploratory

actor

nalysis

to consolidate

hese attributes

nto set

of

data

quality

dimensions.

Subjects:

Since we needed a

sample

consisting

f a wide

range

of

data

consumers

withdifferent

erspectives,

we

selected

the alumni

of

the M.B.A.

program

f

a

large

university

ho reside

n

the United

States.These

alumni onsisted

of ndividuals

n

a

variety

f

ndustries,

epartments,

nd

management

evels

who

regularly

sed

data

to

make

decisions,

thus

satisfying

he

requirement

or data consumers

with

diverse

perspectives.

Fromover

3,200 alumni,

we

randomly

elected

1,500subjects.

Survey

nstrument:

he list of attributes

hown

n

figure

was

used to

develop

the

second

survey uestionnaire

see

appendix

B).

The

questionnaire

sked

the

respondent

to rate he

mportance

f

each

data

quality

ttribute

or heir

ata on

a scale from to

9,

where

1

was

extremely mportant

nd

9 not

important.

he

questionnaire

was

divided ntofour

ections,

depending

n

the

ppropriate

wording

f

the

ttributes,

or

example,

as stand-alone

djectives

or

as

complete

entences.

ince we

did not nclude

definitionswith the

attributes,

t is

possible

that

data consumers

responding

o the

surveys

ould

interpret

he

meanings

of the

attributes

ifferently.

ttributes

hat re

not

mportant

r that re not

consistentlynterpreted

cross

data consumers

will

not

show up as significantnthe factor nalysis.

A

pretest

fthe

questionnaire

was

administered

o fifteen

espondents:

ive

ndustry

executives,

ix

professionals,

wo

professors,

nd

two M.B.A. students.

Minor

hanges

were

made

in

the format f the

survey

s a result f the

pretest.

ased on

the results

from he

pretest,

he

final second

survey

questionnaire

ncluded 118 data

quality

attributes

i.e.,

1

1

8 itemsfor

actor

nalysis)

to be rated or heir

mportance,

s shown

in

appendix

B.

Procedure:

This

survey

was mailed

along

with

cover etter

xplaining

he nature

of

the

study,

he

time to

complete

the

survey

less

than

twenty

minutes),

nd its

criticality.

Most of

the lumni

ddresses

were home

addresses.

To

assure

a

successful

survey,

we sent he

survey

uestionnaires

ia first-classmail. We

gave respondents

six-

week cut-off

eriod

to

respond

o

the

survey.

Response

Rate: Of the 1

500

surveys

mailed,

ixteen

were

returneds

undeliverable.

Of the

remaining

,484,

355

viable

surveys

an

effective

esponse

rateof

24

percent)

were

returned

y

the

six-week

deadline.4

Missing Responses:

While

none of

the

1

18

attributes

items)

had

355

responses,

none

had fewer

han

329

responses.

There did not

ppear

to

be

any

significant

attern

to

the

missing

responses.

Results:

Descriptive

tatistics f

the 1 1

items

attributes)

re

presented

n

appendix

C. Most of the 118 itemshad a fullrangeof values from1 to 9, where 1 means

extremely

mportant

nd 9

not

mportant.

he

exceptions

were

accuracy,

reliability,

level

of

detail,

and

easy

identification

f errors.

Accuracy

and

reliability

ad the



10/30

BEYOND

ACCURACY: DATA

QUALITY

13

smallest

range,

withvalues

ranging

rom

to

7;

level of detail

nd

easy

identification

of errors

anged

from

1

to 8.

Ninety-nine

f the 1

18 items

85

percent)

had

means

less

than or

equal

to

5;

that

s,

most

of

the items

surveyed

were

considered to be

important

ata

quality

ttributes.wo items

accuracy

and correct had means ess

than

2 and thus

were

overallthemost

mportant

ata

quality

ttributes,

ith

means

of

1 77

1

and

1.816,

respectively.

n

exploratory

actor

nalysis

ofthe

mportance atings

produced

twenty

dimensions,

as shown

in Table

1. As

mentioned

earlier,

more

detailed

results an

be

found

n

[47].

Factor

Interpretation:

actor

analysis

is

appropriate

for this

study

because

its

primary

pplication

s

to uncover

n

underlying

ata structure

10, 37].

An alternative

method

would

be to ask

subjects

to

group

he attributes

nto

common

dimensions,

s

we did

in the second

phase

of

thisresearch.

Grouping

asks

provide

more assurance

that he factors re interprtable.While bothfactor nalysisand grouping asks are

used

in the

iteratureo

uncover

dimensions,

rouping

asksbecome

impractical

when

the

number

f tems

ncreases.

Furthermore,

actor

nalysis

can

uncoverdimensions

that

re not

obvious

to researchers.

A

potential

disadvantage

of factor

nalysis

in

this research

s that ttributes

with

nothing

n

common

could

load

on the same

factorbecause

they

have the same

importance

atings.

his

would

ead to

problems

n

nterpreting

he

factors.

mportance

ratings

ollected

from

sample

of 355 data

consumers

with diverse

backgrounds,

however,

will be

similar

nly

f data consumers

perceive

these

tems

onsistently.

f

these

items

form different

onstructs,

ata consumers

will

rate

their

mportance

differently,esultingndifferentactors. urthermore,hetwenty imensions how

face

validity

ecause

it was

easy

for

us

to name

these factors.

Factor

Stability.

Since

the

number

f

survey

responses

relativeto the

number

of

attributes

s

lower than

recommended,

t is

possible

that

he factor tructure

s not

sufficiently

table.

To testfactor

tability,

we reran

he

analysis using

two different

approaches.

First,

n

a

series of

factor

nalysis

runs,

we varied

the number

f

factors

to

test

whether he attributes

oading

on those

factors

hanged

as the number

of

factors

changed.

Second,

we

ran the factor

analysis

using

as

input only

those

attributes

hat

ctually

oaded

on the

twenty

imensions

o

test

whether he

nsignif-

icantattributes

ffected

he results.

In the first

pproach,

our

analysis

of

factor

tability

ound hatfourteen

ut of the

twenty

imensions

were stable across

the eries

of runs.That

s,

the ame dimensions

consisting

f

the same

attributes

merged

see

Table

1).

Two

dimensions,

15

ease

of

operation)

nd

20

(flexibility),

ere stable

n terms f the attributes

oading

on

them,

but

were

combined

nto a

single

dimension

n runsfixed at

fewer

dimensions.

Four

dimensions,5,

9, 16,

and

19,

which are all

single-attribute

actors,

have less than

desirable

stability.Specifically,

n

some

runs,

these dimensions either were not

significant

r

they

were

combined

with nother

ingle-attribute

imension.

Inthe econd

pproach,

he actor

nalysis

sedthe

1

attributes

hown

n

Table

1

which

meets esponses-to-attributeatio ecommendations.he secondapproachproduced he

same results

s the irst.

hus,

we

conclude hat hese

imensions re table

with he aveat

that

dditional

esearch

s needed o

verify

most f

the

ingle-attribute

imensions.



11/30

14

RICHARD Y. WANG

AND DIANE

M. STRONG

Table

1

Description

f he

Dimensions

Dim.

Nameof

dimension

Mean

S.D.

C.I.

Cronbacha

(attributeist)

1

Believability

believable)

271

2.51-2.91

N/

2 Value-added

data

giveyou

2.83

0.09

2.65-3.01

0.70

a

competitivedge,

data add

value to

your

perations)

3

Relevancy

applicable,

2.95

0.06

2.82-3.08

0.69

relevant,

nteresting,

sable)

4

Accuracy

data

are certified

3.05

0.10

2.86-3.24

0.87

error-free,

ccurate,

correct, lawless,

reliable,

errorsan be easily

identified,

he

ntegrity

f he

data,

precise)

5

Interpretability

interprtable)

3.20

0.09

3.03-3.37

N/A

6

Ease

of

understanding

3.22

0.07

3.07-3.37

0.79

(easily

understood,

lear,

readable)

7

Accessibility

accessible,

3.47

0.08

3.32-3.62

0.81

retrievable,

peed

of

access,

available,

up-to-

date)

8

Objectivity

unbiased,

3.58

0.09

3.40-3.76

0.73

objective)

9

Timeliness

age

of

data)

3.64

0.11

3.43-3.85

N/A

10

Completeness

breadth,

3.88

0.09

3.74-4.06

0.98

depth,

nd

scope

of

information

ontained

n he

data)

11

Traceability(well-

3.97

0.09

3.7^^.14

0.79

documented,

asily

raced,

verifiable)

12

Reputation

reputation

f he

4.04

0.10

3.83-4.25 0.87

data

source,

reputation

f

the

data)

13

Representational

4.22

0.09

4.04-4.39

0.84

consistency

data

are

continuously

resented

n

same

format,

onsistently

represented,

onsistently

formatted,

ata are

compatible

ith

revious

data)

14 Cost-effectivenesscostof 4.25 0.10 4.05-4.44 0.85

data

accuracy,

ost

ofdata

collection,

ost-effective)



12/30

BEYOND

ACCURACY:

DATA

QUALITY

15

Table

1.

Continued

Dim. Name of dimension

Mean

S.D.

C.I.

Cronbacha

(attributeist)

15 Ease

of

operation

easily

4.28 0.08

4.13-4.44 0.90

joined,

easily

changed,

easily

updated,

easily

downloaded/uploaded,

data

can be used

for

multiple

purposes,

manipulate,

easily aggregated,

easily

reproduced,

data can be

easily

integrated, easily

customized)

16 Variety f data and data 4.71 0.12 4.48-4.95 N/A

sources

(you

have

a

variety

of

data and

data

sources)

17 Concise

(well-presented,

4.75

0.08 4.5^-4.92

0.92

concise,

compactly

represented,

well-organized,

aesthetically

pleasing,

form

of

presentation,

well-

formatted,

ormat

f the

data)

18

Access

security

data

cannot

4.92

0.11

4.70-5.14 0.84

be accessed

by

competitors,

data are

of a

proprietary

nature,

access

to

data

can

be

restricted,

ecure)

19

Appropriate

mount

of data

5.01

0.11 4.79-5.23

N/A

(the

amount

of

data)

20

Flexibility

adaptable,

5.34

0.09

5.17-5.51

0.88

flexible,

extendable,

expandable)

A dimension

mean was

computed

s

the

average

of

the

responses

o

all of the tems

with

loading

of 0.5

or

greater

n

thedimension.

or

example,

thedimension

ase of

understandingonsistedof the three tems:easily understood, eadable, and clear.

The mean

importance

or ase

of

understanding

as

the

average

of

the

mportance

ratings

or

asily

understood,

eadable,

nd clear.

See

Table

1 for he

means,

tandard

deviations,

nd confidence

ntervals.)

Cronbach's

alpha,

a measure

of construct

eliability,

as

computed

or ach dimen-

sion

to assess

the

reliability

f the set of

items

forming

hatdimension.

As shown

n

the

rightmost

olumn

ofTable

1

these

lpha

coefficients

anged

rom

.69 to 0.98.

As

a

rule,

alphas

of 0.70

or above

represent

atisfactory eliability

f

the set of items

measuring

he construct

dimension).

Thus,

the

tems

measuring

ur dimensions re

sufficiently

eliable.



13/30

16

RICHARD Y. WANG AND

DIANE

M.

STRONG

Table

2.

Four

Target

Categories

for

he 20

Dimensions

Targetcategory

Dimension

Adjustment

Accuracy

ofdata

Believability

None

Accuracy

None

Objectivity

None

Completeness

Moved

to

category

2

Traceability

Eliminated

Reputation

None

Variety

f

Data Sources

Eliminated

Relevancy

of data

Value-added

None

Relevancy

None

Timeliness

None

Ease of

operation

Eliminated

Appropriate mountof data None

Flexibility

Eliminated

Representation

of data

Interpretability

None

Ease

of

understanding

None

Representational

consistency

None

Concise

representation

None

Accessibility

f data

Accessibility

None

Cost-effectiveness

Eliminated

Access

security

None

Note:A target ategorys a hypothesizedategory asedon ourpreliminaryonceptual

framework.

The

Two-Phase

orting

tudy

Twenty dimensions

were

too many

for practical

evaluation

purposes.

In

addition,

although

these

dimensions

were ranked

by

the

importance

ratings,

he

highest-ranking

imensions

might

not

capture

he essential

aspects

of data

quality.

Finally,

grouping

f

these

dimensions

was consistent

with esearch

n

the

marketing

discipline,

nd substantiated

hierarchical tructure

f data

quality

dimensions.

Using ourpreliminary onceptualframework, e conducted a two-phasesorting

study.

The

first

hase

of the

study

was

to sort hese

ntermediate

imensions

nto a

small

set of

categories.

The

second

phase

was

to confirm hat

hesedimensions

ndeed

belonged

to

the

categories

n our

preliminaryonceptual

framework.

Method

We first

reatedfour

ategories

see

column

1 of Table

2)

based on

our

preliminary

conceptual

framework,

ollowing

Moore and Benbasat

[31].

We then

grouped

he 20

intermediate imensions nto thesefour ategories see column2 of Table 2). Our

initial

grouping

was

based on our

understanding

f these

categories

nd dimensions.

The

sorting tudy

rovided

he

datato

test his nitial

rouping

nd to make

adjustments



14/30

BEYOND

ACCURACY: DATA

QUALITY

17

in the

ssignment

f dimensions

o

target

ategories

see

column 3 of Table

2),

which

will be

further

iscussed.

The

Sorting tudy:

Phase

1

Subjects:

Thirty

ubjects

from

ndustry

were selected

to

participate

n

the overall

sorting rocedure.

hese

subjects

were enrolled

n

an

evening

M.B. A. class

in

another

large

university. ighteen

f these 30

subjects

were

randomly

elected

to

participate

in the first

hase.

Design:

Each

of the

20

dimensions,

long

with a

description,

was

printed

n a

3x5-inch

card,

as shown

in

appendix

Dl. These

cards were used

by

each of

the

subjects

in the

study

o

group

the 20 dimensions

nto a small

set of

categories.

In

contrast o phase 2, the subjects forphase 1 were not given a prespecified et of

categories

each

with

a name

and

description.

The

study

was

pretestedby

two

graduate-level

MIS students

o

clarify ny

ambiguity

n the

design

or instruction.

Procedure:

The

study

was run

by

a third

arty

who

was not

ware

of the

goal

of

this

research,

n order

o avoid

any

bias

by

the

uthors.

efore

performing

he ctual

sorting

task,

ubjects

performed

trial ort

using

dimensions

ther han hese 20 dimensions

to

ensure hat

hey

understood

he

procedure.

n

the

ctual

sorting

ask,

ubjects

were

given

nstructionso

group

he20 cards

nto hree

o five

piles.

The

subjects

were

then

asked

to

label each

of

their

iles.

The

Sorting

tudy:

Phase

2

The

original ssignment

f dimensions

o

categories

was

adjusted

based on the

results

from

he

phase

1

study.

For

example,

as shown

n column

3 of Table

2,

completeness

is moved

from

he

accuracy category

o the

relevancy ategory

because

only

four

subjects

assigned

this

dimension

o the former

ategory,

whereas twelve

assigned

it

to

the latter.

This

was a reasonable

adjustment

because

completeness

could be

interpreted

ithin the

context

of the data

consumer's

task instead of our

initial

interpretation

hat

ompleteness

was

part

f

the

accuracycategory.

In addition, ive dimensionswere eliminated: raceability, ariety f data sources,

ease

of

operation,

lexibility,

nd

cost-effectiveness.

hese dimensions

were elimi-

nated for oth

of the

following

wo reasons:

First,

ubjects

did

not

consistently

ssign

the dimension

o

any

category.

For

example,

seven

subjects assigned

cost-effective-

ness to the

relevancy

ategory,

hree

ssigned

t o the

other hree

ategories,

nd

eight

assigned

it

to a self-defined

ategory.

econd,

the

dimensionwas not

ranked

highly

in

terms

f

mportance.

or

example,

cost-effectiveness

as ranked

14

out of 20.

The

purpose

of the

second

phase

of

the

sorting tudy

was to

confirm

hat the

dimensions

ndeed

belonged

to

these

adjusted

categories.

Subjects:

The

remaining

welve

subjects

from

ur

subject

pool participated

n this

phase ofstudy.

Design:

For each

category

of

dimensions

revealed from

phase

1,

the authors

provided

a

label,

as

shown

in

appendix

D2,

based

on

the

underlying

imensions.



15/30

18

RICHARD

Y. WANG AND DIANE

M. STRONG

Descriptive

phrases,

rather

han

ingle

words,

were used as labels

to avoid confound-

ing

category

abels with

ny

of the dimension abels.

Procedure:

The

third

arty

hat an

he

phase

1

study

lso ran

he

phase

2

study.

he

procedure

for

phase

2

was

similar o

that f

phase

1 with he

exception

hat

ubjects

were instructed

o

place

each of

the

dimension

cards

into the

category

that

best

represents

hatdimension.

Findings

In

this

ection,

we

present

heresults

rom he

wo-phase

tudy.

We

using

he

djusted

target ategories

o tabulate

he results rom

he

phase

1

study.

As

shown

n Table

3,

theoverall

placement

atio f

dimensionswithin

arget

ategories

was 70

percent.

his

indicatedthatthese 15 dimensionswere generallybeing placed in theappropriate

categories.

These

results,

ogether

with

he

adjustment

f dimensions

within

he

target

atego-

ries,

ed

us to refine he four

arget

ategories

s

follows:

1 The

extent

o

which data

values are

in

conformance

with he actual

or

true

values;

2.

The

extent o

whichdata are

applicable

pertinent)

o the

askof

hedata

user;

3. The extent

o which data

are

presented

n

an

intelligible

nd

clear

manner;

and

4. The extent o whichdata areavailable orobtainable.

These four

descriptions

were used

as the

category

abels

for he

phase

2

study.

The

results

from

he

phase

2

study

Table 4)

showed

that

he overall

placement

ratio

of

dimensionswithin

arget ategories

was 8

1

percent.

Toward

Hierarchical

rameworkf

Data

Quality

In

our sorting

study,

we

labeled each category on the

basis

of our

preliminary

conceptual

framework

nd our initial

grouping

f the dimensions.For

example,

we

labeled

as

accuracy

the

ategory

hat ncludes

ccuracy,objectivity,elievability,

nd

reputation.

imilarly,

we labeled thethree ther

ategories

s

relevancy, epresenta-

tion,

nd

accessibility.

We

used such a

labeling

so thatwe would not ntroduce

ny

additional

nterpretations

r

biases into

he

sorting

asks.

However,

such

representative

abels did not

necessarily

apture

he

essence

of the

underlying

imensions

s

a

group.

For

example,

as a

whole,

the

group

of dimensions

labeled

accuracy

was richer han

that

conveyed

by

the label

accuracy.

Thus,

we

reexamined he

underlying

imensions

onfirmed or ach of

the four

ategories

nd

picked

a

label

that

captured

the

essence of

the

entire

ategory.

For

example,

we

relabeled

accuracy

as

intrinsic

Q

because the

underlying

imensions

aptured

he

intrinsic spectofdataquality.

As

a result

of

this

reexamination,

we

relabeled

two of

the four

categories.

The

resulting

ategories,

herefore,

re:

ntrinsic

Q,

contextual

Q,

representational

Q,



16/30

BEYOND

ACCURACY: DATA

QUALITY

19

Table

3.

Results rom

hePhase 1

Study

15

Dimensionsnd 18

Subjects)

Actual

ategories

Target Accuracy elevancy epresen-Access- N/A Total Target

categories

tation

ibility

(%)

Accuracy

57

10

2

1

2

72

79

Relevancy

16

56

11 2

5

90

62

Repres.

8

4

50

5 5 72

69

Access.

1

2

3

26

4

36

72

Total

item

placements:

270

Hits:

189

Overall hitsratio:70%

Note:

A

target

ategory

s a

hypothesized

ategory

ased

on our

preliminaryonceptual

ramework.

An actual

ategory

s

the

ategory

elected

y

the

ubjects

or dimension.

N/A"

denotes Not

Applicable,"

whichmeans

hat he

ctual

ategory

oes

notfit nto

ny

arget ategory.

Table

4.

Results

rom

he hase

2

Study

15

Dimensionsnd

12

Subjects)

Actual

ategories

Target

Accuracy

Relevancy

Represen-

Access-

Total

Target

%)

categories tation ibility

Accuracy

43

3

1

0 48 90

Relevancy

7

44

3

6 60

73

Repres.

2

6

40

0 48

83

Access.

1

1

3

19

24

79

Total item

placements:

180

Hits:

146

Overall

hits

ratio:81%

and

ccessibility

Q

(see

figure

).

Intrinsic

Q

denotes

hat ata

have

uality

n

heir

own

right.

ccuracy

s

merely

ne

ofthefour imensions

nderlying

his

ategory.

Contextual

Q

highlights

he

equirement

hat ata

uality

must

e

considered ithin

the ontext

f the

ask t

hand;

hat

s,

datamust e

relevant,

imely,omplete,

nd

appropriate

nterms f mount

o

as to

add

value.

Representational

Q

and ccessi-

bility

Q

emphasize

he

mportance

fthe ole

of

systems;

hat

s,

the

ystem

must

be

accessible

ut

ecure,

nd he

ystem

must

resent

ata

n

uch

way

hat

hey

re

interprtable,

asy

o

understand,

nd

represented

oncisely

nd

consistently.

Thishierarchicalrameworkonfirmsnd ubstantiateshe reliminaryramework

that

we

proposed.

elow

we elaborate

n these

our

ategories,

elate

hem o the

literature,

nd

discuss omefutureesearch irections.



17/30

20

RICHARD Y. WANG

AND DIANE

M. STRONG

Data

uality

I

"

'

I

Intrinsic Contextual

Representational Accessibility

Data

uality

Data

uality

Data

uality

Data

uality

Believability(i)

^j

Data

uality

||

Value-added

^j

Data

uality

(2)

I

j

|_g,

I

Data

uality

Interpretability

Lgl

(5)

Data

uality

I

Believability(i)

||

Value-added

2)

j

I

Interpretability

5)

Accessibility7)

I

Accuracy

4)

Relevancy3) Ease

f

nderstanding

6)

Access

ecurity18)1

Objectivity

8)

Timeliness

9)

Representational

onsistency

13)

l^- - ^-J

Reputation

12)

Completeness

10)

Concise

epresentation

LaMaaMj

Appropriate

mount

f ata

1

)

^^hmmmmJ

Figure

. A

Conceptual

ramework

f Data

Quality

Intrinsic

ata

Quality

Intrinsic

DQ

includes not

only

accuracy

and

objectivity,

which

are

evident

to IS

professionals,

ut lso

believability

nd

reputation.

his

suggests

hat,

ontrary

o

the

traditional

evelopment

iew,

data consumers

lso

view

believability

nd

reputation

as an

integral art

of intrinsic

Q;

accuracy

and

objectivity

lone

are

not sufficient

fordatato be considered fhighquality.This is analogoustosomeaspectsofproduct

quality.

n the

product

uality

rea,

dimensions

f

quality mphasized

by

consumers

are broader han

hose

emphasized

by product

manufacturers.

imilarly,

ntrinsic

Q

encompasses

more han

he

ccuracy

nd

objectivity

imensions

hat S

professionals

strive o deliver. This

finding mplies

that S

professionals

hould

also

ensure the

believability

nd

reputation

f data. Research

on data source

tagging

45,

48]

is a

step

in this

direction.

Contextual ata

Quality

Some

individualdimensions

nderlying

ontextual

Q

were

reported reviously;

or

example,

completeness

nd

timeliness

4].

However,

ontextual

Q

was

not

xplicitly

recognized

n

the

data

quality

iterature. ur

grouping

f dimensions

for

ontextual

DQ

revealed that

data

quality

must

be consideredwithin he

context

of the task at

hand.

This was

consistentwith

he

iterature

n

graphical

data

representation,

hich

concluded that he

quality

of

a

graphicalrepresentation

ust

be

assessed

within he

context f

the

data

consumer's task

41].

Since

tasks and

their ontexts

vary

cross time nd

data

consumers,

ttaining igh

contextual

data

quality

s

a

research

hallenge

[29,

39].

One

approach

s to

parame-

terizecontextualdimensionsfor ach task so that dataconsumer an specifywhat

type

f

task s

beingperformed

nd

the

ppropriate

ontextual

arameters

or

hat ask.

Below

we

illustrate uch

a

research

rototype.



18/30

BEYOND ACCURACY:

DATA

QUALITY

21

During

Desert

Storm combat

operations

n

the Persian

Gulf,

naval researchers

recognized

heneed

to

explicitly

ncorporate

ontextual

Q

into

nformation

ystems

in order

o deliver more

timely

nd accurate

nformation.

s a

result,

prototype

s

being

developed

that

will be

deployed

o theU.S. aircraftarriers s

stand-alone

mage

exploitation

ools

[33].

This

prototype

arameterizes

ontextual

imensions

for ach

task

so that

pilot

or a strike

lanner

an

specify

what

type

of task

e.g.,

strike

lan

or

damage

assessment)

s

being performed

nd

the

ppropriate

ontextual

arameters

(relevant mages

in terms

f

location,

urrency,

esolution,

nd

target

ype)

for hat

task.

Representational

ata

Quality

Representational Q includesaspects related o the format f the data {concise and

consistent

epresentation)

nd

meaning

of

data

(interpretability

nd ease

of

under-

standing).

These two

aspects

suggest

hat

or

ata consumers

o

conclude that ata are

well

represented,

hey

mustnot

only

be concise and

consistently epresented,

ut

also

interprtable

nd

easy

to

understand.

Issues

related o

meaning

and format rise

in

database

systems

research

n which

format

s addressed s

part

f

syntax,

nd

meaning

s

part

f semantic

econciliation.

One

focusof current

esearch

n

that

rea

s

context

nterchangemong

heterogeneous

database

systems

36].

For

example, urrency igures

n the

ontext f a U.S.

database

are

typically

n

dollars,

whereas those

n

a

Japanese

database are

likely

o

be

in

yen.

This typeof contextbelongs to therepresentational Q, instead ofcontextualDQ,

which

deals

with he

data

consumer's task.

Accessibility

ata

Quality

Information

ystems

professionals

understand

ccessibility

DQ

well. Our research

findings

how

that

ata

consumers lso

recognize

ts

mportance.

ur

findings

ppear

to differ

rom he

iteraturehat

reats

ccessibility

s

distinct rom

nformation

uality

(see,

e.g.,

9]).

A

closer examination eveals hat

ccessibility

s

presumed

i.e.,

perfect

accessibilityDQ) in earlier nformationuality iterature ecause hard-copy eports

were used insteadof on-linedata.

In

contrast,

ata

consumers

n

our

research

ccess

computers

fortheir nformation

eeds,

and

therefore,

iew

accessibility

DQ

as an

important

ata

quality

aspect.

However,

there

s

little

difference

etween

treating

accessibility

DQ

as a

category

of

overall

data

quality,

or

separating

t

from

other

categories

of data

quality.

n either

ase,

accessibility

needs to

be taken

nto

ccount.

Summary

nd

Conclusions

TO

IMPROVE

DATA

QUALITY,

WE

NEED TO

UNDERSTAND

WHAT DATA

QUALITY

means

to data consumers (those who use data). This researchdevelops a hierarchical

frameworkhat

aptures

he

spects

of

data

quality

hat re

mportant

o

data

consum-

ers.

Specifically,

118

data

quality

attributes ollected

from

data

consumers

are



19/30


20/30

BEYOND

ACCURACY: ATA

QUALITY

23

quality

ategories

nd

their

nderlying

imensions

n

thisframework ould

provide

the constructs

o

be measured.

Second,

methodsfor

mproving

he

quality

of

data

as

perceivedby

data

consumers

ould be

developed.

Such methods ould include user

training

o

change

the

quality

of data

as

perceived by

data consumers.

Third,

the

framework

ould

be useful

s

a checklist

during

ata

requirements

nalysis.

That

is,

many

of the data

quality

characteristics

re

actually

system requirements

r user

training

equirements.

inally,

since a

single empirical

study

s never

sufficient

o

validate

the

completeness

of

a

framework,

urther

esearch

s

needed to

apply

this

framework

n

specific

workcontexts.

NOTES

1

Computerworld,eptember8, 1992, .

80-84.

2. TheWall treet ournal,

ay

26, 1992,p.B6.

3.

We

refer

o the

haracteristics

fdata

quality

s data

quality

ttributes

r as measure-

ment

tems

o

distinguish

hem

rom

ata

quality

imensions

hich

esult

rom he

factor

analysis

hroughout

his

ection.

4.

Surveys

with

significant

missing

values

(nine

surveys)

r

surveys

eturned

y

academics

fourteen urveys)

were

not considered

iable

and

therefore

ere

eliminated

from

ur

nalysis.

REFERENCES

1.

Arnold,

.E.

Information

anufacturing:

he

oad

o database

uality.

atabase,

15,

5

(1992),32.

2.

Bailey,

J.E.,

nd

Pearson,

.W.

Development

f a tool

for

measuring

nd

analyzing

computer

ser

atisfaction.

anagement

cience,

9,

5

(1

983),

530-545.

3.

Bailou,

D.P.,

and

Pazer,

H.L.

Designing

nformation

ystems

o

optimize

he

ccuracy-

timeliness

radeoff.

nformation

ystems

esearch

ISR),

6,

1

1995),

51-72.

4.

Bailou,

D.P.,

and

Pazer,

H.L.

Modeling

ata

nd

process

uality

n

multi-input,

ulti-

output

nformation

ystems.

anagement

cience,

1,2

(1985),

150-162.

5.

Bailou,

D.P.,

and

Tayi,

K.G.

Methodology

or

llocating

esources

ordata

quality

enhancement.

ommunications

f

he

ACM,

2,

3

(1989),

320-329.

6.

Bailou, D.P.;

Wang,

R.Y.;

Pazer,H.;

and

Tayi,

K.G.

Modeling

ata

manutactunng

systems

o

determine

ata

product

uality.

otal

Data

Quality

Management

TDQM)

Research

Program,

IT

Sloan

School

f

Management,

o.

TDQM-93-O9,

1993.

7.

Bodnar,

.

Reliability

odeling

t internal

ontrol

ystems.

ccounting

eview,

u,

4

(1975),747-757.

8.

Churchill,

.A.

Marketing

esearch:

Methodological

oundations,

th

d.

Chicago:

Dryden

ress,

991.

9.

Culnan,

M.

The

dimensions

f

accessibility

o

online

nformation:

mplications

or

implementing

ffice

nformation

ystems.

CM

Transactions

n

Office

nformationystems,

2,2(1984),

141-150.

10.

Cureton,

.E.,

and

D'Agostino,

.B.

Factor

Analysis:

n

Applied pproach.

illsdale,

NJ:

Lawrence

rlbaum,

983.

11

Cushing,

.E.

A mathematical

pproach

o the

nalysis

nd

design

f

nternalontrol

systems.

ccounting

eview,

9,

1

1974),

24-41.

12.

Delone,

W.H.,

nd

McLean,

E.R.

Information

ystems

uccess: he

uest

or

he

epen-

dent

ariable.

nformation

ystems

esearch,

,

1

1992),

60-95.

13. Deming, .W.OutoftheCrisis.Cambridge, A: Center orAdvanced ngineering

Study,

MIT,

1986.

1 .

Deshpande,

. The

organizational

ontext

fmarket

esearch

se.Journal

fMarketing,

46,

4

(1992),

92-101.



21/30

24

RICHARD Y. WANG

AND DIANE M. STRONG

1

.

Dobyns,

.,

and

Crawford-Mason,

.

Quality

rElse:

TheRevolution

n World usiness.

Boston:

Houehton

Mifflin,

991.

1 .

Emery,

.C.

Organizational

lanning

nd

Control

ystems:

heory

nd

Technology.

New York:

Macmillan,

969.

1 . Griffin,., ndHauser, .R. hevoice f he ustomer.

arketing

cience, 2,1 1993),

1-27.

18.

Hauser, .R.,

nd

Clausing,

.

The houseof

quality.

arvardBusiness

Review,

6,

3

(1988),

63-73.

1

.

Huh, .U.; Keller, .R.; Redman,

.C.;

and

Watkins,

.R. Data

quality.

nformation

nd

Software echnology,

2,

8

(1990),

559-565.

20.

Ives,B.; Olson,M.H.;

and

Baroudi,

.J. he

measurement

f user

nformation

atisfac-

tion.

Communications

f

he

ACM,

26,

10

1983),

785-793.

21

Johnson,

.R.;

eitch,

.A.;

and

Neter,

.Characteristics

f rrorsn ccounts

eceivable

and

nventory

udits.

ccounting

eview,

6,

2

(1981),

270-293.

22.

Juran,

.M.Juran

n

LeadershiporQuality:

n

Executive

andbook.

ew

York:The

Free

Press,

989.

23. Juran,.M.,ndGryna,.M.Quality lanningndAnalysis,ded.NewYork:McGraw-

Hill,

1980.

24.

Knechel,

W.R. A simulation

odel

or

valuating

ccounting

ystems

eliability.

udit-

ing:

A

Journal

f Theory

nd

Practice,

,

2

(1985),

38-62.

25.

Knechel,

W.R.

The

use of

quantitative

odels

n the

eview

nd

evaluation

f

nternal

control:

survey

nd

review.

ournal

f

Accounting

iterature,

(1983),

205-219.

26.

Kotier,

.

Marketing

ssentials.

nglewood

liffs,

J:

Prentice-Hall,

984.

27.

Kriebel,

.H.

Evaluating

he

quality

f nformation

ystems.

n N.

Szysperski

nd

E.

Grochla

eds.),

Design

nd

mplementation

fComputer

ased

nformation

ystems.

erman-

town,

A:

Sijthtoff

nd

Noordhoff,

79.

28.

Laudon,

K.C.

Data

quality

nddue

process

n

arge

nterorganizational

ecord

ystems.

Communications

f

he

ACM,

29,

1

1986),

4-11.

29. Madnick, . Integrating

nformation

rom

lobal systems: ealing

with

he "on-

and

off-ramps"

f henformation

uperhighway.

ournal

fOrganizational

omputing,,2 1 95),

69-82.

30.

Melone,

. A theoretical

ssessment

f

the

user-satisfaction

onstruct

n information

systems

esearch.

anagement

cience,

6,

1

1990),

598-613.

3

1

Moore,G.C.,

and

Benbasat,

.

Development

f n nstrument

o

measure

he

erceptions

of

dopting

n nformation

echnology

nnovation.

nformation

ystems

esearch,

,

3

(1991),

192-222.

32.

Morey,

.C.

Estimating

nd

mproving

he

uality

f

nformation

n the

MIS.

Commu-

nications

f

he

ACM,

25,

5

(1982),

337-342.

33.

Page,

W.,

and

Kaomea,

P.

Using uality

ttributes

o

produce

ptimal

actical

nforma-

tion. n

Proceedings

f

he

ourth

nnual

Workshop

n

nformation

echnologies

nd

Systems

(WITS).

Vancouver,

ritish

olumbia,

anada:

1994,

p.

145-154.

34. Redman,.C.DataQuality: anagementndTechnology.ewYork: antamooks, 92.

35.

Ronen,

.,

and

piegler,

. Information

s

inventory:

new

onceptual

iew.

nformation

and

Management,

1,

1991),

239-247.

36.

Sciore,

.;

Siegel,

M.;

and

Rosenthal,

.

Using

emantic

alues

ofacilitate

nteroperabil-

ity mong

eterogeneous

nformation

ystems.

CM

Transactions

n Database

Systems,

9,2

(1994),

254-290.

37.

Stevens,

.

Applied

multivariatetatisticsor

he ocial cience.

Hillsdale,

J:

Lawrence

Erlbaum,

986.

38.

Strong,

.M. Decision

support

or

xception andling

nd

quality

ontrol n

office

operations.

ecision

upport ystems,,

3

(

1

92),

2 1 -227.

39.

Strong,

.M.; Lee,

Y.W.;

and

Wang,

R.Y.

Data

quality

n context.

orthcoming,

996.

40.

Strong,

.M.,

and

Miller,

.M.

Exceptions

nd

exception

andling

n

computerized

information

rocesses.

CM

Transactionsn

nformationystems,3,

2

(1995),

206-233.

41 Tan,J.K., ndBenbasat,.

Processing

f

graphical

nformation:

decomposition

axon-

omy

o

match ata extractionasks

nd

graphical

epresentations.

nformation

ystems

e-

search,

,4

(1990),

416^439.



22/30


23/30

26 RICHARD

Y. WANG

AND

DIA

14 Beyond Accuracy (1)

Documents

Transcript of 14 Beyond Accuracy (1)