14 Beyond Accuracy (1)
-
Upload
adeelsiddiquepso -
Category
Documents
-
view
219 -
download
0
Transcript of 14 Beyond Accuracy (1)
-
8/10/2019 14 Beyond Accuracy (1)
1/30
Beyond Accuracy: What Data Quality Means to Data Consumers
Author(s): Richard Y. Wang and Diane M. StrongSource: Journal of Management Information Systems, Vol. 12, No. 4 (Spring, 1996), pp. 5-33Published by: M.E. Sharpe, Inc.Stable URL: http://www.jstor.org/stable/40398176.
Accessed: 26/10/2013 17:54
Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at.http://www.jstor.org/page/info/about/policies/terms.jsp
.JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of
content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new formsof scholarship. For more information about JSTOR, please contact [email protected].
.
M.E. Sharpe, Inc.is collaborating with JSTOR to digitize, preserve and extend access toJournal of
Management Information Systems.
http://www.jstor.org
This content downloaded from 128.95.130.223 on Sat, 26 Oct 2013 17:54:36 PMAll use subject to JSTOR Terms and Conditions
http://www.jstor.org/action/showPublisher?publisherCode=meshttp://www.jstor.org/stable/40398176?origin=JSTOR-pdfhttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/stable/40398176?origin=JSTOR-pdfhttp://www.jstor.org/action/showPublisher?publisherCode=mes -
8/10/2019 14 Beyond Accuracy (1)
2/30
Beyond
Accuracy:
What
Data
Quality
Means to
Data
Consumers
RICHARD
Y. WANG AND DIANE M.
STRONG
Richard
Y. Wang
is Associate
Professor f
Information
echnologies
(IT)
and
Co-Director
forTotal Data
Quality Management
TDQM)
at the MIT
Sloan School
of
Management,
wherehe received
a
Ph.D.
degree
with n
IT
concentration. e is a
major
proponent
f data
quality
research,
with
more than
twenty
apers
written o
develop
a
setof
concepts,
models,
nd methods or
his
field.
Professor
Wang
received
more than
one million dollars
of research
grants
from
both the
public
and
private
sector.His work on data qualitywas applied, bytheNavy,to theNaval Command,
Control,
ommunication,
omputers,
nd
Intelligence
C4I)
informationrchitecture.
He
presented
he state-of-the-art
f
data
quality
research nd
practice
n
the Chief
Information
fficer
CIO)
conference
n
1993,
and
spoke
on
"Data
Quality
in
the
Information
ighways"
t the
Enterprise
93 conference.
r.
Wang organized
he
first
Workshop
n
Information
echnologies
and
Systems
WITS)
in 1991. At WITS-94
he
was
elected
chairman
of the
Executive
Steering
Committee.
He also chaired
or
participated
n
the data
quality
panel
at
WITS and at
the nternational
onference
n
Information
ystems.
Dr.
Wang
is
the ditor
f
nformation
echnologies:
Trends
nd
Perspectives.
Diane M. Strong is AssistantProfessor fManagement tWorcesterPolytechnic
Institute.
he
received
her
Ph.D.
in information
ystems
from
Carnegie
Mellon
University,
n
M.S.
in
computer
cience
from
heNew
Jersey
nstitute f
Technology,
and
a
B.S.
in
computer
cience
from
he
University
f
South
Dakota.
Dr.
Strong's
research
enters
n data
and
nformation
uality
nd
software
uality.
Her
publications
have
appeared
n
A
CM
Transactions
n
nformation
ystems,
MIS
Quarterly,
ecision
Support
ystems,
nd
other
eading
ournals.
ABSTRACT:
oor data
quality
DQ)
can
have substantial
ocial
and economic
mpacts.
Although
irms re
improving
ata
quality
with
practical
pproaches
and
tools,
their
improvement
fforts end
to focus
narrowly
on
accuracy.
We believe
that data
consumershave a muchbroaderdataquality onceptualization han S professionals
realize.
The
purpose
of this
paper
s to
develop
a
framework
hat
aptures
he
aspects
of data
quality
hat re
important
o data
consumers.
A
two-stage
urvey
nd
a
two-phase
sorting
tudy
were conducted
to
develop
a
hierarchical
framework
or
organizing
data
quality
dimensions.
This framework
Acknowledgments'.
esearch
onducted
erein
as
been
upported,
n
part, y
MIT's
TotalData
Quality
Management
TDQM)
Research
rogram
nd
MIT's International
inancial
ervices
Research
enter
IFSRC).
The authors
ish o thank
isa Guarascio
or
herfieldwork
nd
factor
nalysis,
rofessors onald
Bailou,
zak
Benbasat,
nd Stuart
Madnick or
roviding
insightful
omments
nd
ncouragement,
rofessors
rance
eClerc nd
Paul
Berger
or
heir
input
n
the
design
nd
execution
f this
research,
nd
Daphne Png
for
conducting
he
confirmatoryxperimentnd ummarizingts esults.
Journal
fManagement
nformation
ystems
Spring
996,
Vol.
12,
No.
4,
pp.
5-34
Copyright
1996
M.E.
Sharpe,
nc.
This content downloaded from 128.95.130.223 on Sat, 26 Oct 2013 17:54:36 PMAll use subject to JSTOR Terms and Conditions
http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp -
8/10/2019 14 Beyond Accuracy (1)
3/30
6
RICHARD Y. WANG
AND DIANE M. STRONG
captures
dimensions f data
quality
hat
re
mportant
o data consumers.
ntrinsic
Q
denotes that data have
quality
in
their
own
right.
Contextual
DQ highlights
he
requirement
hat
ata
quality
must e
consideredwithin
he ontext
fthe ask
t
hand.
RepresentationalDQ and accessibilityDQ emphasizethe mportance fthe role of
systems.
These
findings
re consistent
withour
understanding
hat
high-quality
ata
should
be
intrinsically ood,
contextually
ppropriate
or he
ask,
learly
epresented,
and accessible
to the data consumer.
Our framework
has been
used
effectively
n
industry
nd
government.
Using
this
framework,
S
managers
were able
to better
understand
nd
meet
theirdata
consumers'
data
quality
needs.
The salient
feature
of
this
research
study
s
that
quality
attributes
of data
are collected
from
data
consumers
instead
of
being
defined
theoretically
r based
on researchers'
experience.
Although
exploratory,
this research
provides
a
basis
forfuture
tudies
that
measure
data
quality
along
the
dimensions
of
this framework.
Key
words and phrases: data
administration,
ata
quality,
atabase
systems.
Introduction
Many
databases
are not
error-free,
and
some
contain
a
surprisingly
arge
number
f errors
3,
4,
5,
21, 28,
34, 38,
40].
A recent
ndustry
eport,
or
xample,
notes
thatmorethan60
percent
f the
urveyed
irms
500
medium-size
orporations
withannual sales of
more than
$20
million)
have
problems
with
data
quality.1
Data
qualityproblems,however,go beyond accuracy to include otheraspects such as
completeness
and
accessibility.
A
big
New York
bank found
that
the data
in its
credit-risk
management
atabase
were
only
60
percent
omplete,
necessitating
ou-
ble-checking
y anyone
using
t.2
A
major
manufacturing
ompany
found hat
tcould
not access all sales data for
a
single
customer
because
many
different
ustomer
numbers
were
assigned
to
represent
he ame customer.
n
short,
oor
data
quality
an
have substantial
ocial and economic
impacts.
To
improve
data
quality,
we need to understand
what
data
quality
means to
data
consumers
those
who
use
data).
The
purpose
of this
research,
herefore,
s to
develop
a framework
hat
captures
the
aspects
of
data
quality
that are
important
o
data
consumers.
Related
Research
The
concept
of
"fitnessfor use" is
now
widely
adopted
in
the
quality
iterature.
t
emphasizes
the
importance
of
taking
a
consumer
viewpoint
of
quality
because
ultimately
t s
the
consumerwho will
udge
whether r not
product
s fit
or se
[13,
1
5, 22,
23].
In
this
research,
we
also
take theconsumer
iewpoint
f "fitness oruse"
in
conceptualizing
the
underlying
spects
of data
quality.
Following
this
general
quality literature,we define "data quality" as data that are fitfor use by data
consumers.
n
addition,
we
define
"data
quality
dimension"
s
a
set
of
data
quality
attributes
hat
epresent
single
aspect
or construct f data
quality.
This content downloaded from 128.95.130.223 on Sat, 26 Oct 2013 17:54:36 PMAll use subject to JSTOR Terms and Conditions
http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp -
8/10/2019 14 Beyond Accuracy (1)
4/30
BEYOND ACCURACY:
DATA
QUALITY
7
Three
approaches
re used
in the iterature
o
study
ata
quality:
1)
an
intuitive,
2)
a
theoretical,
nd
3)
an
empirical pproach.
The
intuitive
pproach
s taken
when
the
selection
of data
quality
ttributes
or
ny particular tudy
s
based on
the
researchers'
experience
r ntuitive
nderstanding
boutwhat ttributesre
"important."
Most data
quality
tudiesfall
nto his
ategory.
he
cumulative ffect f these tudies s a
small
set
of
data
quality
attributes hat
are
commonly
elected. For
example, many
data
quality
tudies
nclude
accuracy
as
either he
only
or one of several
key
dimensions
[4,
28,
32].
In the
accounting
nd
auditing
iterature,
eliability
s a
key
attribute
sed
in
studying
ata
quality
7,
11 2
1
24, 25,
49].
In the nformation
ystems
iterature,
nformationuality
and user
satisfaction
re
two
major
dimensions
or
valuating
he success of
nformation
ystems
12].
These
two dimensions
generally
nclude
some data
quality
attributes,
uch as
accuracy,
timeliness, recision,reliability, urrency, ompleteness, nd relevancy 2, 20, 27].
Other
attributes
uch as
accessibility
nd
interpretability
re also used in the data
quality
iterature
44,
45].
Most
of these studies
dentify
multiple
dimensionsof data
quality.
Furthermore,
although
hierarchical
iew of data
quality
s less
common,
t
s
reported
n several
studies
27,
34,
44].
None
of these
tudies,however,
mpirically
ollects data
quality
attributes
rom ata consumers.
A
theoretical
pproach
to data
quality
focuses on
how data
may
become deficient
during
he
data
manufacturing
rocess.
Although
heoretical
pproaches
are
often
recommended,
esearch
offers
ew
examples.
One such
study
uses an
ontological
approach nwhichattributesf dataquality re derivedon thebasis of data deficien-
cies,
which re defined
s the
nconsistencies
etween
heview of a real-world
ystem
that an
be inferred
rom
representing
nformation
ystem
nd the view that an be
obtained
by directly
bserving
he real-
world
ystem
42].
The
advantage
of
using
an
intuitive
pproach
is that each
study
can select
the
attributes
most relevant
to the
particular
goals
ofthat
study.
The
advantage
of
a
theoretical
pproach
is the
potential
o
provide
a
comprehensive
et of data
quality
attributes
hat are
intrinsic
o
a
data
product.
The
problem
with both of these
approaches
is
that
hey
focus
on
the
product
n
terms f
development
characteris-
tics instead
of use characteristics.
hey
fail to
capture
the voice of
the consumer.
Evaluations
of theoretical
pproaches
to
defining
product
ttributes s a basis for
improving
quality
find that
they
are not
an
adequate
basis
for
mprovingquality
and are
significantly
worse
than
empirical approaches.
To
capture
the data
quality
attributes
hat are
important
o
data
consumers,
we
take
an
empirical approach.
An
empirical
approach
to data
quality analyzes
data
collected
fromdata consumers
to
determine
he
characteristics
hey
use to assess
whether
ata
are fitforuse
in their asks.
Therefore,
hese
characteristics annot be
theoretically
etermined r
intuitively
elected
by
researchers. he
advantage
of an
empirical approach
is that
t
captures
the voice
of customers.
Furthermore,
t
may
revealcharacteristicshat esearchers ave notconsidered s part fdataquality.The
disadvantage
s
that he correctness
r
completeness
f the
results
annot be
proven
via
fundamental
rinciples.
This content downloaded from 128.95.130.223 on Sat, 26 Oct 2013 17:54:36 PMAll use subject to JSTOR Terms and Conditions
http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp -
8/10/2019 14 Beyond Accuracy (1)
5/30
8
RICHARD
Y.
WANG AND DIANE
M.
STRONG
Research
pproach
We follow
the methods
developed
in
marketing
esearch
for
determining
he
quality
characteristicsf
products.
Our
approach
mplicitly
ssumesthatdata can be treated
as
a
product.
t is
an
appropriate
pproach
because
an information
ystem
can
be
viewed
as a data
manufacturing
ystem cting
on
raw data
input
o
produce
output
data
or data
products
1,6,
16,
19, 35, 43,
46].
While
most data
consumers
re
not
purchasing
ata,
they
re
choosing
to use
or
not
to
use
data
in a
variety
f tasks.
Approaches
for
ssessing product
uality
ttributes
hat
re
mportant
o consumers
are
well established
n the
marketing
iscipline
8,
26].
Three
tasks
are
suggested
n
identifying
uality
ttributes
f
a
product:
1)
identifying
onsumer
needs,
2)
identi-
fying
he
hierarchical
tructure
f
consumer
needs,
and
3) measuring
he
mportance
of each consumerneed [17, 18].
Following
the
marketing
iterature,
his
research
dentifies
he
attributes
f
data
quality
that re
important
o
data consumers.
We
first ollect
data
quality
attributes
from
data
consumers,
nd
then collect
importance
atings
for
these
attributes
nd
structure
hem
nto
hierarchical
epresentation
f
data
consumers'
data
quality
needs.
Our
goal
is
to
develop
a
comprehensive,
hierarchical
framework
f
data
quality
attributes
hat re
important
o data
consumers.
Some
researchers
may
doubt
the
validity
of
asking
consumers
about
important
quality
ttributes
ecause
of
the well-known
ifficulties
ith
valuating
users'
satis-
faction
with
information
ystems
[30]. Importance
ratings
and
user
satisfaction,
however, re two differentonstructs. riffinndHuser 17],for xample,demonstr-
ate
that
determining
ttributes
f
importance
o
consumers,
ollecting
mportance
ratings
f these
attributes,
nd
measuring
ttribute
alues
are
valid
characterizations
of consumers'
actions
uch as
purchasing
he
product,
ut
atisfaction
atings
f these
attributes
re
uncorrelated
with onsumer
ctions.
Research
Method
We first
eveloped
two
surveys
hat
were
used to collect
data
fromdata
consumers
(referred
o as
the
two-stage urvey ater).
The first
urvey roduced
listof
possible
data
quality
ttributes,
ttributes
hat ame
to mind
when the data
consumer
hought
about data
quality.
The second
survey
ssessed the
mportance
f these
possible
data
quality
attributes
o
data consumers.
The
importance
atings
rom
he second
survey
were used in an
exploratory
actor
nalysis
o
yield
an intermediate
et of data
quality
dimensions hatwere
important
o
data consumers.
Because thedetailed
urveys roduced comprehensive
etof data
quality
ttributes
for
nput
o
factor
nalysis,
broad
spectrum
f ntermediate ata
quality
dimensions
were revealed. We conducted
follow-up mpirical
tudy
o
group
hese ntermediate
data
quality
dimensionsfor he
following
easons.
First,
t s
probably
not
critical
for
evaluationpurposesto considerso many qualitydimensions 27]. Second, although
these dimensions
can
be
ranked
by
the
importanceratings,
he
highest
ranking
dimensions
may
not
capture
he
essential
spects
of data
quality.
Third,
he nterme-
This content downloaded from 128.95.130.223 on Sat, 26 Oct 2013 17:54:36 PMAll use subject to JSTOR Terms and Conditions
http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp -
8/10/2019 14 Beyond Accuracy (1)
6/30
-
8/10/2019 14 Beyond Accuracy (1)
7/30
10
RICHARD
Y.
WANG
AND DIANE M. STRONG
we startwith
an
exploratory
pproach
that
ncludes not
only
the attributes
n
our
preliminary
ramework,
ut also the attributes
entioned
n
the iterature.
or exam-
ple,
our first
uestionnaire
tartswith ome attributes
timeliness
nd
availability)
hat
are not
part
of this
preliminary
ramework.
The
Two-Stage urvey
The
purpose of this two-stage survey
is
to identify data
quality
dimensions
perceived by
data consumers.
n the
following,
we
summarize
he method
nd
key
results f these
surveys.
The reader
s
referred
o
[47]
for
more detailed
results.
Method
The
method
for
the
two-stage
urvey
was
as follows:
For
stage
1,
we conducted
a
survey
to
generate
a
list
of data
quality
attributes
hat
capture
data
consumers'
perspectives
f data
quality.
For
stage
2,
we conducted
survey
o collect
data
on the
importance
of each
of
these
attributes
o
data
consumers,
nd
then
performed
n
exploratory
actor
nalysis
on
the
mportance
ata
to
develop
an
intermediate
et
of
data
quality
dimensions.3
The First
Survey
Thepurposeof hefirsturveywastogenerate n extensive ist fpotential ataquality
attributes.
ince
thedata
quality
dimensions
esulting
rom
actor
nalysis
depend,
o
a
large
extent,
n the
istof attributes
enerated
rom
he
first
urvey,
we decided
that
(1)
the
subjects
should
be data
consumers
who
have
used data
to make
decisions
in
diverse
ontexts
within
rganizations,
nd
2)
we should
be
able to
probe
nd
question
the
subjects
n
order
o
fully
nderstand
heir
nswers.
Subjects:
Two
pools
of
subjects
were
selected.
The
first onsisted
of 25
data
consumers
urrently orking
n
industry.
he second
was M.B.A.
students
t a
large
U.S.
university.
We selected
1
12
students
who
had
work
experience
s data
consum-
ers.
The
averageage
of
these students
was over 30.
Survey
nstrument:
he
survey
nstrument
see
appendix
A)
included
two sections
for
liciting
ata
quality
ttributes.
he first ection licited
respondents'
irst eaction
to
data
quality
by asking
hem o ist hose
ttributes
hat irst ame
to mind
when
they
thought
of
data
quality (beyond
the common attributes
f
timeliness,
ccuracy,
availability,
nd
interpretability).
he
second
section
provided
furtherues
by listing
32 attributes
eyond
hefour ommonones to
"spark"
ny
dditional
ttributes.
hese
32
attributes
ere
obtained
from
data
quality
iterature nd discussions
among
data
quality
researchers.
Procedure:
For the
elected M.B.A.
students,
he
urvey
was
self-administered.or
thesubjectsworking n industry,headministrationf thesurveywas followedbya
discussion of the
meanings
of
the
attributes
he
subjectsgenerated.
Results: This
process
resulted
n
179
attributes,
s shown
n
figure
This content downloaded from 128.95.130.223 on Sat, 26 Oct 2013 17:54:36 PMAll use subject to JSTOR Terms and Conditions
http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp -
8/10/2019 14 Beyond Accuracy (1)
8/30
BEYOND
ACCURACY:
DATA
QUALITY
1 1
Ability
o
be
Ability
o
Download
Ability
o
Identify Ability
o
Upload
Joined With
Errors
Acceptability
Access
by Accessibility
Accuracy
Competition
Adaptability
Adequate
Detail
Adequate
Volume Aestheticism
Age
Aggregatability
Alterabi
ty
Amountof Data
Auditable
Authority
Availability Believability
Breadth
of Data
Brevity
Certified ata
Clarity
Clarity
f
Origin
Clear Data
Compactness Compatibility
Responsibility
Competitive
Edge
Completeness
Comprehensiveness Compressibility
Concise
Conciseness
Confidentiality Conformity
Consistency
Content
Context
Continuity
Convenience
Correctness
Corruption
Cost
Cost
of
Accuracy
Cost
of Collection
Creativity
Critical
Current
Customizability
Data
Hierarchy
Data
Improves
Efficiency
Data Overload
Definability
Dependability
Depth
of Data
Detail
Detailed
Source
Dispersed
Distinguishable
Updated
Files
Dynamic
Ease
of Access
Ease of
Comparison
Ease
of
Correlation
Ease
of Data
Ease
of Maintenance
Ease of
Retrieval Ease of
Exchange
Understanding
Ease
of
Update
Ease
of Use
Easy
to
Change Easy
to
Question
Efficiency
Endurance
Enlightening Ergonomie
Error-Free
Expandability
Expense Extendibility
Extensibility
Extent
Finalization
Flawlessness
Flexibility
Form
of Presentation
Format
Integrity
Friendliness Generality Habit Historical
Compatibility
Importance
Inconsistencies
Integration Integrity
Interactive
Interesting
Level of Abstraction Level of
Standardization
Localized
Logically
Connected
Manageability
Manipulate
Measurable
Medium
Meets
Requirements
Minimality
Modularity
Narrowly
efined No lost
information
Normality
Novelty
Objectivity
Optimality
Orderliness
Origin
Parsimony Partitionability
Past
Experience
Pedigree
Personalized
Pertinent
Portability
Preciseness
Precision
Proprietary
ature
Purpose
Quantity Rationality Redundancy Regularityf Format
Relevance
Reliability
Repetitive
Reproducibility
Reputation
Resolution
of
GraphicsResponsibility
Retrievability
Revealing
Reviewability Rigidity
Robustness
Scope
of Info
Secrecy Security Self-Correcting
Semantic
Semantics Size Source
Interpretation
Specificity
Speed
Stability Storage
Synchronization
Time-independence
Timeliness
Traceable
Translatable
Transportability
Unambiguity
Unbiased
Understandable
Uniqueness
Unorganized Up-to-Date
Usable Usefulness
User
Friendly
Valid
Value
Variability
Variety
Verifiable
Volatility Well-Documented Well-Presented
Figure
1.
Data
Quality
Attributes
enerated
from he
First
urvey
This content downloaded from 128.95.130.223 on Sat, 26 Oct 2013 17:54:36 PMAll use subject to JSTOR Terms and Conditions
http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp -
8/10/2019 14 Beyond Accuracy (1)
9/30
12
RICHARD Y.
WANG AND DIANE M. STRONG
The
Second
Survey
The
purpose
of the
second
survey
was
to
collect data
about
the
mportance
f
quality
attributes s perceived by data consumers.The resultsof the second surveywere
ratings
f
the
mportance
fthedata
quality
ttributes.
hese
importance
atings
were
the
nput
or n
exploratory
actor
nalysis
to consolidate
hese attributes
nto set
of
data
quality
dimensions.
Subjects:
Since we needed a
sample
consisting
f a wide
range
of
data
consumers
withdifferent
erspectives,
we
selected
the alumni
of
the M.B.A.
program
f
a
large
university
ho reside
n
the United
States.These
alumni onsisted
of ndividuals
n
a
variety
f
ndustries,
epartments,
nd
management
evels
who
regularly
sed
data
to
make
decisions,
thus
satisfying
he
requirement
or data consumers
with
diverse
perspectives.
Fromover
3,200 alumni,
we
randomly
elected
1,500subjects.
Survey
nstrument:
he list of attributes
hown
n
figure
was
used to
develop
the
second
survey uestionnaire
see
appendix
B).
The
questionnaire
sked
the
respondent
to rate he
mportance
f
each
data
quality
ttribute
or heir
ata on
a scale from to
9,
where
1
was
extremely mportant
nd
9 not
important.
he
questionnaire
was
divided ntofour
ections,
depending
n
the
ppropriate
wording
f
the
ttributes,
or
example,
as stand-alone
djectives
or
as
complete
entences.
ince we
did not nclude
definitionswith the
attributes,
t is
possible
that
data consumers
responding
o the
surveys
ould
interpret
he
meanings
of the
attributes
ifferently.
ttributes
hat re
not
mportant
r that re not
consistentlynterpreted
cross
data consumers
will
not
show up as significantnthe factor nalysis.
A
pretest
fthe
questionnaire
was
administered
o fifteen
espondents:
ive
ndustry
executives,
ix
professionals,
wo
professors,
nd
two M.B.A. students.
Minor
hanges
were
made
in
the format f the
survey
s a result f the
pretest.
ased on
the results
from he
pretest,
he
final second
survey
questionnaire
ncluded 118 data
quality
attributes
i.e.,
1
1
8 itemsfor
actor
nalysis)
to be rated or heir
mportance,
s shown
in
appendix
B.
Procedure:
This
survey
was mailed
along
with
cover etter
xplaining
he nature
of
the
study,
he
time to
complete
the
survey
less
than
twenty
minutes),
nd its
criticality.
Most of
the lumni
ddresses
were home
addresses.
To
assure
a
successful
survey,
we sent he
survey
uestionnaires
ia first-classmail. We
gave respondents
six-
week cut-off
eriod
to
respond
o
the
survey.
Response
Rate: Of the 1
500
surveys
mailed,
ixteen
were
returneds
undeliverable.
Of the
remaining
,484,
355
viable
surveys
an
effective
esponse
rateof
24
percent)
were
returned
y
the
six-week
deadline.4
Missing Responses:
While
none of
the
1
18
attributes
items)
had
355
responses,
none
had fewer
han
329
responses.
There did not
ppear
to
be
any
significant
attern
to
the
missing
responses.
Results:
Descriptive
tatistics f
the 1 1
items
attributes)
re
presented
n
appendix
C. Most of the 118 itemshad a fullrangeof values from1 to 9, where 1 means
extremely
mportant
nd 9
not
mportant.
he
exceptions
were
accuracy,
reliability,
level
of
detail,
and
easy
identification
f errors.
Accuracy
and
reliability
ad the
This content downloaded from 128.95.130.223 on Sat, 26 Oct 2013 17:54:36 PMAll use subject to JSTOR Terms and Conditions
http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp -
8/10/2019 14 Beyond Accuracy (1)
10/30
BEYOND
ACCURACY: DATA
QUALITY
13
smallest
range,
withvalues
ranging
rom
to
7;
level of detail
nd
easy
identification
of errors
anged
from
1
to 8.
Ninety-nine
f the 1
18 items
85
percent)
had
means
less
than or
equal
to
5;
that
s,
most
of
the items
surveyed
were
considered to be
important
ata
quality
ttributes.wo items
accuracy
and correct had means ess
than
2 and thus
were
overallthemost
mportant
ata
quality
ttributes,
ith
means
of
1 77
1
and
1.816,
respectively.
n
exploratory
actor
nalysis
ofthe
mportance atings
produced
twenty
dimensions,
as shown
in Table
1. As
mentioned
earlier,
more
detailed
results an
be
found
n
[47].
Factor
Interpretation:
actor
analysis
is
appropriate
for this
study
because
its
primary
pplication
s
to uncover
n
underlying
ata structure
10, 37].
An alternative
method
would
be to ask
subjects
to
group
he attributes
nto
common
dimensions,
s
we did
in the second
phase
of
thisresearch.
Grouping
asks
provide
more assurance
that he factors re interprtable.While bothfactor nalysisand grouping asks are
used
in the
iteratureo
uncover
dimensions,
rouping
asksbecome
impractical
when
the
number
f tems
ncreases.
Furthermore,
actor
nalysis
can
uncoverdimensions
that
re not
obvious
to researchers.
A
potential
disadvantage
of factor
nalysis
in
this research
s that ttributes
with
nothing
n
common
could
load
on the same
factorbecause
they
have the same
importance
atings.
his
would
ead to
problems
n
nterpreting
he
factors.
mportance
ratings
ollected
from
sample
of 355 data
consumers
with diverse
backgrounds,
however,
will be
similar
nly
f data consumers
perceive
these
tems
onsistently.
f
these
items
form different
onstructs,
ata consumers
will
rate
their
mportance
differently,esultingndifferentactors. urthermore,hetwenty imensions how
face
validity
ecause
it was
easy
for
us
to name
these factors.
Factor
Stability.
Since
the
number
f
survey
responses
relativeto the
number
of
attributes
s
lower than
recommended,
t is
possible
that
he factor tructure
s not
sufficiently
table.
To testfactor
tability,
we reran
he
analysis using
two different
approaches.
First,
n
a
series of
factor
nalysis
runs,
we varied
the number
f
factors
to
test
whether he attributes
oading
on those
factors
hanged
as the number
of
factors
changed.
Second,
we
ran the factor
analysis
using
as
input only
those
attributes
hat
ctually
oaded
on the
twenty
imensions
o
test
whether he
nsignif-
icantattributes
ffected
he results.
In the first
pproach,
our
analysis
of
factor
tability
ound hatfourteen
ut of the
twenty
imensions
were stable across
the eries
of runs.That
s,
the ame dimensions
consisting
f
the same
attributes
merged
see
Table
1).
Two
dimensions,
15
ease
of
operation)
nd
20
(flexibility),
ere stable
n terms f the attributes
oading
on
them,
but
were
combined
nto a
single
dimension
n runsfixed at
fewer
dimensions.
Four
dimensions,5,
9, 16,
and
19,
which are all
single-attribute
actors,
have less than
desirable
stability.Specifically,
n
some
runs,
these dimensions either were not
significant
r
they
were
combined
with nother
ingle-attribute
imension.
Inthe econd
pproach,
he actor
nalysis
sedthe
1
attributes
hown
n
Table
1
which
meets esponses-to-attributeatio ecommendations.he secondapproachproduced he
same results
s the irst.
hus,
we
conclude hat hese
imensions re table
with he aveat
that
dditional
esearch
s needed o
verify
most f
the
ingle-attribute
imensions.
This content downloaded from 128.95.130.223 on Sat, 26 Oct 2013 17:54:36 PMAll use subject to JSTOR Terms and Conditions
http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp -
8/10/2019 14 Beyond Accuracy (1)
11/30
14
RICHARD Y. WANG
AND DIANE
M. STRONG
Table
1
Description
f he
Dimensions
Dim.
Nameof
dimension
Mean
S.D.
C.I.
Cronbacha
(attributeist)
1
Believability
believable)
271
2.51-2.91
N/
2 Value-added
data
giveyou
2.83
0.09
2.65-3.01
0.70
a
competitivedge,
data add
value to
your
perations)
3
Relevancy
applicable,
2.95
0.06
2.82-3.08
0.69
relevant,
nteresting,
sable)
4
Accuracy
data
are certified
3.05
0.10
2.86-3.24
0.87
error-free,
ccurate,
correct, lawless,
reliable,
errorsan be easily
identified,
he
ntegrity
f he
data,
precise)
5
Interpretability
interprtable)
3.20
0.09
3.03-3.37
N/A
6
Ease
of
understanding
3.22
0.07
3.07-3.37
0.79
(easily
understood,
lear,
readable)
7
Accessibility
accessible,
3.47
0.08
3.32-3.62
0.81
retrievable,
peed
of
access,
available,
up-to-
date)
8
Objectivity
unbiased,
3.58
0.09
3.40-3.76
0.73
objective)
9
Timeliness
age
of
data)
3.64
0.11
3.43-3.85
N/A
10
Completeness
breadth,
3.88
0.09
3.74-4.06
0.98
depth,
nd
scope
of
information
ontained
n he
data)
11
Traceability(well-
3.97
0.09
3.7^^.14
0.79
documented,
asily
raced,
verifiable)
12
Reputation
reputation
f he
4.04
0.10
3.83-4.25 0.87
data
source,
reputation
f
the
data)
13
Representational
4.22
0.09
4.04-4.39
0.84
consistency
data
are
continuously
resented
n
same
format,
onsistently
represented,
onsistently
formatted,
ata are
compatible
ith
revious
data)
14 Cost-effectivenesscostof 4.25 0.10 4.05-4.44 0.85
data
accuracy,
ost
ofdata
collection,
ost-effective)
This content downloaded from 128.95.130.223 on Sat, 26 Oct 2013 17:54:36 PMAll use subject to JSTOR Terms and Conditions
http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp -
8/10/2019 14 Beyond Accuracy (1)
12/30
BEYOND
ACCURACY:
DATA
QUALITY
15
Table
1.
Continued
Dim. Name of dimension
Mean
S.D.
C.I.
Cronbacha
(attributeist)
15 Ease
of
operation
easily
4.28 0.08
4.13-4.44 0.90
joined,
easily
changed,
easily
updated,
easily
downloaded/uploaded,
data
can be used
for
multiple
purposes,
manipulate,
easily aggregated,
easily
reproduced,
data can be
easily
integrated, easily
customized)
16 Variety f data and data 4.71 0.12 4.48-4.95 N/A
sources
(you
have
a
variety
of
data and
data
sources)
17 Concise
(well-presented,
4.75
0.08 4.5^-4.92
0.92
concise,
compactly
represented,
well-organized,
aesthetically
pleasing,
form
of
presentation,
well-
formatted,
ormat
f the
data)
18
Access
security
data
cannot
4.92
0.11
4.70-5.14 0.84
be accessed
by
competitors,
data are
of a
proprietary
nature,
access
to
data
can
be
restricted,
ecure)
19
Appropriate
mount
of data
5.01
0.11 4.79-5.23
N/A
(the
amount
of
data)
20
Flexibility
adaptable,
5.34
0.09
5.17-5.51
0.88
flexible,
extendable,
expandable)
A dimension
mean was
computed
s
the
average
of
the
responses
o
all of the tems
with
loading
of 0.5
or
greater
n
thedimension.
or
example,
thedimension
ase of
understandingonsistedof the three tems:easily understood, eadable, and clear.
The mean
importance
or ase
of
understanding
as
the
average
of
the
mportance
ratings
or
asily
understood,
eadable,
nd clear.
See
Table
1 for he
means,
tandard
deviations,
nd confidence
ntervals.)
Cronbach's
alpha,
a measure
of construct
eliability,
as
computed
or ach dimen-
sion
to assess
the
reliability
f the set of
items
forming
hatdimension.
As shown
n
the
rightmost
olumn
ofTable
1
these
lpha
coefficients
anged
rom
.69 to 0.98.
As
a
rule,
alphas
of 0.70
or above
represent
atisfactory eliability
f
the set of items
measuring
he construct
dimension).
Thus,
the
tems
measuring
ur dimensions re
sufficiently
eliable.
This content downloaded from 128.95.130.223 on Sat, 26 Oct 2013 17:54:36 PMAll use subject to JSTOR Terms and Conditions
http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp -
8/10/2019 14 Beyond Accuracy (1)
13/30
16
RICHARD Y. WANG AND
DIANE
M.
STRONG
Table
2.
Four
Target
Categories
for
he 20
Dimensions
Targetcategory
Dimension
Adjustment
Accuracy
ofdata
Believability
None
Accuracy
None
Objectivity
None
Completeness
Moved
to
category
2
Traceability
Eliminated
Reputation
None
Variety
f
Data Sources
Eliminated
Relevancy
of data
Value-added
None
Relevancy
None
Timeliness
None
Ease of
operation
Eliminated
Appropriate mountof data None
Flexibility
Eliminated
Representation
of data
Interpretability
None
Ease
of
understanding
None
Representational
consistency
None
Concise
representation
None
Accessibility
f data
Accessibility
None
Cost-effectiveness
Eliminated
Access
security
None
Note:A target ategorys a hypothesizedategory asedon ourpreliminaryonceptual
framework.
The
Two-Phase
orting
tudy
Twenty dimensions
were
too many
for practical
evaluation
purposes.
In
addition,
although
these
dimensions
were ranked
by
the
importance
ratings,
he
highest-ranking
imensions
might
not
capture
he essential
aspects
of data
quality.
Finally,
grouping
f
these
dimensions
was consistent
with esearch
n
the
marketing
discipline,
nd substantiated
hierarchical tructure
f data
quality
dimensions.
Using ourpreliminary onceptualframework, e conducted a two-phasesorting
study.
The
first
hase
of the
study
was
to sort hese
ntermediate
imensions
nto a
small
set of
categories.
The
second
phase
was
to confirm hat
hesedimensions
ndeed
belonged
to
the
categories
n our
preliminaryonceptual
framework.
Method
We first
reatedfour
ategories
see
column
1 of Table
2)
based on
our
preliminary
conceptual
framework,
ollowing
Moore and Benbasat
[31].
We then
grouped
he 20
intermediate imensions nto thesefour ategories see column2 of Table 2). Our
initial
grouping
was
based on our
understanding
f these
categories
nd dimensions.
The
sorting tudy
rovided
he
datato
test his nitial
rouping
nd to make
adjustments
This content downloaded from 128.95.130.223 on Sat, 26 Oct 2013 17:54:36 PMAll use subject to JSTOR Terms and Conditions
http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp -
8/10/2019 14 Beyond Accuracy (1)
14/30
BEYOND
ACCURACY: DATA
QUALITY
17
in the
ssignment
f dimensions
o
target
ategories
see
column 3 of Table
2),
which
will be
further
iscussed.
The
Sorting tudy:
Phase
1
Subjects:
Thirty
ubjects
from
ndustry
were selected
to
participate
n
the overall
sorting rocedure.
hese
subjects
were enrolled
n
an
evening
M.B. A. class
in
another
large
university. ighteen
f these 30
subjects
were
randomly
elected
to
participate
in the first
hase.
Design:
Each
of the
20
dimensions,
long
with a
description,
was
printed
n a
3x5-inch
card,
as shown
in
appendix
Dl. These
cards were used
by
each of
the
subjects
in the
study
o
group
the 20 dimensions
nto a small
set of
categories.
In
contrast o phase 2, the subjects forphase 1 were not given a prespecified et of
categories
each
with
a name
and
description.
The
study
was
pretestedby
two
graduate-level
MIS students
o
clarify ny
ambiguity
n the
design
or instruction.
Procedure:
The
study
was run
by
a third
arty
who
was not
ware
of the
goal
of
this
research,
n order
o avoid
any
bias
by
the
uthors.
efore
performing
he ctual
sorting
task,
ubjects
performed
trial ort
using
dimensions
ther han hese 20 dimensions
to
ensure hat
hey
understood
he
procedure.
n
the
ctual
sorting
ask,
ubjects
were
given
nstructionso
group
he20 cards
nto hree
o five
piles.
The
subjects
were
then
asked
to
label each
of
their
iles.
The
Sorting
tudy:
Phase
2
The
original ssignment
f dimensions
o
categories
was
adjusted
based on the
results
from
he
phase
1
study.
For
example,
as shown
n column
3 of Table
2,
completeness
is moved
from
he
accuracy category
o the
relevancy ategory
because
only
four
subjects
assigned
this
dimension
o the former
ategory,
whereas twelve
assigned
it
to
the latter.
This
was a reasonable
adjustment
because
completeness
could be
interpreted
ithin the
context
of the data
consumer's
task instead of our
initial
interpretation
hat
ompleteness
was
part
f
the
accuracycategory.
In addition, ive dimensionswere eliminated: raceability, ariety f data sources,
ease
of
operation,
lexibility,
nd
cost-effectiveness.
hese dimensions
were elimi-
nated for oth
of the
following
wo reasons:
First,
ubjects
did
not
consistently
ssign
the dimension
o
any
category.
For
example,
seven
subjects assigned
cost-effective-
ness to the
relevancy
ategory,
hree
ssigned
t o the
other hree
ategories,
nd
eight
assigned
it
to a self-defined
ategory.
econd,
the
dimensionwas not
ranked
highly
in
terms
f
mportance.
or
example,
cost-effectiveness
as ranked
14
out of 20.
The
purpose
of the
second
phase
of
the
sorting tudy
was to
confirm
hat the
dimensions
ndeed
belonged
to
these
adjusted
categories.
Subjects:
The
remaining
welve
subjects
from
ur
subject
pool participated
n this
phase ofstudy.
Design:
For each
category
of
dimensions
revealed from
phase
1,
the authors
provided
a
label,
as
shown
in
appendix
D2,
based
on
the
underlying
imensions.
This content downloaded from 128.95.130.223 on Sat, 26 Oct 2013 17:54:36 PMAll use subject to JSTOR Terms and Conditions
http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp -
8/10/2019 14 Beyond Accuracy (1)
15/30
18
RICHARD
Y. WANG AND DIANE
M. STRONG
Descriptive
phrases,
rather
han
ingle
words,
were used as labels
to avoid confound-
ing
category
abels with
ny
of the dimension abels.
Procedure:
The
third
arty
hat an
he
phase
1
study
lso ran
he
phase
2
study.
he
procedure
for
phase
2
was
similar o
that f
phase
1 with he
exception
hat
ubjects
were instructed
o
place
each of
the
dimension
cards
into the
category
that
best
represents
hatdimension.
Findings
In
this
ection,
we
present
heresults
rom he
wo-phase
tudy.
We
using
he
djusted
target ategories
o tabulate
he results rom
he
phase
1
study.
As
shown
n Table
3,
theoverall
placement
atio f
dimensionswithin
arget
ategories
was 70
percent.
his
indicatedthatthese 15 dimensionswere generallybeing placed in theappropriate
categories.
These
results,
ogether
with
he
adjustment
f dimensions
within
he
target
atego-
ries,
ed
us to refine he four
arget
ategories
s
follows:
1 The
extent
o
which data
values are
in
conformance
with he actual
or
true
values;
2.
The
extent o
whichdata are
applicable
pertinent)
o the
askof
hedata
user;
3. The extent
o which data
are
presented
n
an
intelligible
nd
clear
manner;
and
4. The extent o whichdata areavailable orobtainable.
These four
descriptions
were used
as the
category
abels
for he
phase
2
study.
The
results
from
he
phase
2
study
Table 4)
showed
that
he overall
placement
ratio
of
dimensionswithin
arget ategories
was 8
1
percent.
Toward
Hierarchical
rameworkf
Data
Quality
In
our sorting
study,
we
labeled each category on the
basis
of our
preliminary
conceptual
framework
nd our initial
grouping
f the dimensions.For
example,
we
labeled
as
accuracy
the
ategory
hat ncludes
ccuracy,objectivity,elievability,
nd
reputation.
imilarly,
we labeled thethree ther
ategories
s
relevancy, epresenta-
tion,
nd
accessibility.
We
used such a
labeling
so thatwe would not ntroduce
ny
additional
nterpretations
r
biases into
he
sorting
asks.
However,
such
representative
abels did not
necessarily
apture
he
essence
of the
underlying
imensions
s
a
group.
For
example,
as a
whole,
the
group
of dimensions
labeled
accuracy
was richer han
that
conveyed
by
the label
accuracy.
Thus,
we
reexamined he
underlying
imensions
onfirmed or ach of
the four
ategories
nd
picked
a
label
that
captured
the
essence of
the
entire
ategory.
For
example,
we
relabeled
accuracy
as
intrinsic
Q
because the
underlying
imensions
aptured
he
intrinsic spectofdataquality.
As
a result
of
this
reexamination,
we
relabeled
two of
the four
categories.
The
resulting
ategories,
herefore,
re:
ntrinsic
Q,
contextual
Q,
representational
Q,
This content downloaded from 128.95.130.223 on Sat, 26 Oct 2013 17:54:36 PMAll use subject to JSTOR Terms and Conditions
http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp -
8/10/2019 14 Beyond Accuracy (1)
16/30
BEYOND
ACCURACY: DATA
QUALITY
19
Table
3.
Results rom
hePhase 1
Study
15
Dimensionsnd 18
Subjects)
Actual
ategories
Target Accuracy elevancy epresen-Access- N/A Total Target
categories
tation
ibility
(%)
Accuracy
57
10
2
1
2
72
79
Relevancy
16
56
11 2
5
90
62
Repres.
8
4
50
5 5 72
69
Access.
1
2
3
26
4
36
72
Total
item
placements:
270
Hits:
189
Overall hitsratio:70%
Note:
A
target
ategory
s a
hypothesized
ategory
ased
on our
preliminaryonceptual
ramework.
An actual
ategory
s
the
ategory
elected
y
the
ubjects
or dimension.
N/A"
denotes Not
Applicable,"
whichmeans
hat he
ctual
ategory
oes
notfit nto
ny
arget ategory.
Table
4.
Results
rom
he hase
2
Study
15
Dimensionsnd
12
Subjects)
Actual
ategories
Target
Accuracy
Relevancy
Represen-
Access-
Total
Target
%)
categories tation ibility
Accuracy
43
3
1
0 48 90
Relevancy
7
44
3
6 60
73
Repres.
2
6
40
0 48
83
Access.
1
1
3
19
24
79
Total item
placements:
180
Hits:
146
Overall
hits
ratio:81%
and
ccessibility
Q
(see
figure
).
Intrinsic
Q
denotes
hat ata
have
uality
n
heir
own
right.
ccuracy
s
merely
ne
ofthefour imensions
nderlying
his
ategory.
Contextual
Q
highlights
he
equirement
hat ata
uality
must
e
considered ithin
the ontext
f the
ask t
hand;
hat
s,
datamust e
relevant,
imely,omplete,
nd
appropriate
nterms f mount
o
as to
add
value.
Representational
Q
and ccessi-
bility
Q
emphasize
he
mportance
fthe ole
of
systems;
hat
s,
the
ystem
must
be
accessible
ut
ecure,
nd he
ystem
must
resent
ata
n
uch
way
hat
hey
re
interprtable,
asy
o
understand,
nd
represented
oncisely
nd
consistently.
Thishierarchicalrameworkonfirmsnd ubstantiateshe reliminaryramework
that
we
proposed.
elow
we elaborate
n these
our
ategories,
elate
hem o the
literature,
nd
discuss omefutureesearch irections.
This content downloaded from 128.95.130.223 on Sat, 26 Oct 2013 17:54:36 PMAll use subject to JSTOR Terms and Conditions
http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp -
8/10/2019 14 Beyond Accuracy (1)
17/30
20
RICHARD Y. WANG
AND DIANE
M. STRONG
Data
uality
I
"
'
I
Intrinsic Contextual
Representational Accessibility
Data
uality
Data
uality
Data
uality
Data
uality
Believability(i)
^j
Data
uality
||
Value-added
^j
Data
uality
(2)
I
j
|_g,
I
Data
uality
Interpretability
Lgl
(5)
Data
uality
I
Believability(i)
||
Value-added
2)
j
I
Interpretability
5)
Accessibility7)
I
Accuracy
4)
Relevancy3) Ease
f
nderstanding
6)
Access
ecurity18)1
Objectivity
8)
Timeliness
9)
Representational
onsistency
13)
l^- - ^-J
Reputation
12)
Completeness
10)
Concise
epresentation
LaMaaMj
Appropriate
mount
f ata
1
)
^^hmmmmJ
Figure
. A
Conceptual
ramework
f Data
Quality
Intrinsic
ata
Quality
Intrinsic
DQ
includes not
only
accuracy
and
objectivity,
which
are
evident
to IS
professionals,
ut lso
believability
nd
reputation.
his
suggests
hat,
ontrary
o
the
traditional
evelopment
iew,
data consumers
lso
view
believability
nd
reputation
as an
integral art
of intrinsic
Q;
accuracy
and
objectivity
lone
are
not sufficient
fordatato be considered fhighquality.This is analogoustosomeaspectsofproduct
quality.
n the
product
uality
rea,
dimensions
f
quality mphasized
by
consumers
are broader han
hose
emphasized
by product
manufacturers.
imilarly,
ntrinsic
Q
encompasses
more han
he
ccuracy
nd
objectivity
imensions
hat S
professionals
strive o deliver. This
finding mplies
that S
professionals
hould
also
ensure the
believability
nd
reputation
f data. Research
on data source
tagging
45,
48]
is a
step
in this
direction.
Contextual ata
Quality
Some
individualdimensions
nderlying
ontextual
Q
were
reported reviously;
or
example,
completeness
nd
timeliness
4].
However,
ontextual
Q
was
not
xplicitly
recognized
n
the
data
quality
iterature. ur
grouping
f dimensions
for
ontextual
DQ
revealed that
data
quality
must
be consideredwithin he
context
of the task at
hand.
This was
consistentwith
he
iterature
n
graphical
data
representation,
hich
concluded that he
quality
of
a
graphicalrepresentation
ust
be
assessed
within he
context f
the
data
consumer's task
41].
Since
tasks and
their ontexts
vary
cross time nd
data
consumers,
ttaining igh
contextual
data
quality
s
a
research
hallenge
[29,
39].
One
approach
s to
parame-
terizecontextualdimensionsfor ach task so that dataconsumer an specifywhat
type
f
task s
beingperformed
nd
the
ppropriate
ontextual
arameters
or
hat ask.
Below
we
illustrate uch
a
research
rototype.
This content downloaded from 128.95.130.223 on Sat, 26 Oct 2013 17:54:36 PMAll use subject to JSTOR Terms and Conditions
http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp -
8/10/2019 14 Beyond Accuracy (1)
18/30
BEYOND ACCURACY:
DATA
QUALITY
21
During
Desert
Storm combat
operations
n
the Persian
Gulf,
naval researchers
recognized
heneed
to
explicitly
ncorporate
ontextual
Q
into
nformation
ystems
in order
o deliver more
timely
nd accurate
nformation.
s a
result,
prototype
s
being
developed
that
will be
deployed
o theU.S. aircraftarriers s
stand-alone
mage
exploitation
ools
[33].
This
prototype
arameterizes
ontextual
imensions
for ach
task
so that
pilot
or a strike
lanner
an
specify
what
type
of task
e.g.,
strike
lan
or
damage
assessment)
s
being performed
nd
the
ppropriate
ontextual
arameters
(relevant mages
in terms
f
location,
urrency,
esolution,
nd
target
ype)
for hat
task.
Representational
ata
Quality
Representational Q includesaspects related o the format f the data {concise and
consistent
epresentation)
nd
meaning
of
data
(interpretability
nd ease
of
under-
standing).
These two
aspects
suggest
hat
or
ata consumers
o
conclude that ata are
well
represented,
hey
mustnot
only
be concise and
consistently epresented,
ut
also
interprtable
nd
easy
to
understand.
Issues
related o
meaning
and format rise
in
database
systems
research
n which
format
s addressed s
part
f
syntax,
nd
meaning
s
part
f semantic
econciliation.
One
focusof current
esearch
n
that
rea
s
context
nterchangemong
heterogeneous
database
systems
36].
For
example, urrency igures
n the
ontext f a U.S.
database
are
typically
n
dollars,
whereas those
n
a
Japanese
database are
likely
o
be
in
yen.
This typeof contextbelongs to therepresentational Q, instead ofcontextualDQ,
which
deals
with he
data
consumer's task.
Accessibility
ata
Quality
Information
ystems
professionals
understand
ccessibility
DQ
well. Our research
findings
how
that
ata
consumers lso
recognize
ts
mportance.
ur
findings
ppear
to differ
rom he
iteraturehat
reats
ccessibility
s
distinct rom
nformation
uality
(see,
e.g.,
9]).
A
closer examination eveals hat
ccessibility
s
presumed
i.e.,
perfect
accessibilityDQ) in earlier nformationuality iterature ecause hard-copy eports
were used insteadof on-linedata.
In
contrast,
ata
consumers
n
our
research
ccess
computers
fortheir nformation
eeds,
and
therefore,
iew
accessibility
DQ
as an
important
ata
quality
aspect.
However,
there
s
little
difference
etween
treating
accessibility
DQ
as a
category
of
overall
data
quality,
or
separating
t
from
other
categories
of data
quality.
n either
ase,
accessibility
needs to
be taken
nto
ccount.
Summary
nd
Conclusions
TO
IMPROVE
DATA
QUALITY,
WE
NEED TO
UNDERSTAND
WHAT DATA
QUALITY
means
to data consumers (those who use data). This researchdevelops a hierarchical
frameworkhat
aptures
he
spects
of
data
quality
hat re
mportant
o
data
consum-
ers.
Specifically,
118
data
quality
attributes ollected
from
data
consumers
are
This content downloaded from 128.95.130.223 on Sat, 26 Oct 2013 17:54:36 PMAll use subject to JSTOR Terms and Conditions
http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp -
8/10/2019 14 Beyond Accuracy (1)
19/30
-
8/10/2019 14 Beyond Accuracy (1)
20/30
BEYOND
ACCURACY: ATA
QUALITY
23
quality
ategories
nd
their
nderlying
imensions
n
thisframework ould
provide
the constructs
o
be measured.
Second,
methodsfor
mproving
he
quality
of
data
as
perceivedby
data
consumers
ould be
developed.
Such methods ould include user
training
o
change
the
quality
of data
as
perceived by
data consumers.
Third,
the
framework
ould
be useful
s
a checklist
during
ata
requirements
nalysis.
That
is,
many
of the data
quality
characteristics
re
actually
system requirements
r user
training
equirements.
inally,
since a
single empirical
study
s never
sufficient
o
validate
the
completeness
of
a
framework,
urther
esearch
s
needed to
apply
this
framework
n
specific
workcontexts.
NOTES
1
Computerworld,eptember8, 1992, .
80-84.
2. TheWall treet ournal,
ay
26, 1992,p.B6.
3.
We
refer
o the
haracteristics
fdata
quality
s data
quality
ttributes
r as measure-
ment
tems
o
distinguish
hem
rom
ata
quality
imensions
hich
esult
rom he
factor
analysis
hroughout
his
ection.
4.
Surveys
with
significant
missing
values
(nine
surveys)
r
surveys
eturned
y
academics
fourteen urveys)
were
not considered
iable
and
therefore
ere
eliminated
from
ur
nalysis.
REFERENCES
1.
Arnold,
.E.
Information
anufacturing:
he
oad
o database
uality.
atabase,
15,
5
(1992),32.
2.
Bailey,
J.E.,
nd
Pearson,
.W.
Development
f a tool
for
measuring
nd
analyzing
computer
ser
atisfaction.
anagement
cience,
9,
5
(1
983),
530-545.
3.
Bailou,
D.P.,
and
Pazer,
H.L.
Designing
nformation
ystems
o
optimize
he
ccuracy-
timeliness
radeoff.
nformation
ystems
esearch
ISR),
6,
1
1995),
51-72.
4.
Bailou,
D.P.,
and
Pazer,
H.L.
Modeling
ata
nd
process
uality
n
multi-input,
ulti-
output
nformation
ystems.
anagement
cience,
1,2
(1985),
150-162.
5.
Bailou,
D.P.,
and
Tayi,
K.G.
Methodology
or
llocating
esources
ordata
quality
enhancement.
ommunications
f
he
ACM,
2,
3
(1989),
320-329.
6.
Bailou, D.P.;
Wang,
R.Y.;
Pazer,H.;
and
Tayi,
K.G.
Modeling
ata
manutactunng
systems
o
determine
ata
product
uality.
otal
Data
Quality
Management
TDQM)
Research
Program,
IT
Sloan
School
f
Management,
o.
TDQM-93-O9,
1993.
7.
Bodnar,
.
Reliability
odeling
t internal
ontrol
ystems.
ccounting
eview,
u,
4
(1975),747-757.
8.
Churchill,
.A.
Marketing
esearch:
Methodological
oundations,
th
d.
Chicago:
Dryden
ress,
991.
9.
Culnan,
M.
The
dimensions
f
accessibility
o
online
nformation:
mplications
or
implementing
ffice
nformation
ystems.
CM
Transactions
n
Office
nformationystems,
2,2(1984),
141-150.
10.
Cureton,
.E.,
and
D'Agostino,
.B.
Factor
Analysis:
n
Applied pproach.
illsdale,
NJ:
Lawrence
rlbaum,
983.
11
Cushing,
.E.
A mathematical
pproach
o the
nalysis
nd
design
f
nternalontrol
systems.
ccounting
eview,
9,
1
1974),
24-41.
12.
Delone,
W.H.,
nd
McLean,
E.R.
Information
ystems
uccess: he
uest
or
he
epen-
dent
ariable.
nformation
ystems
esearch,
,
1
1992),
60-95.
13. Deming, .W.OutoftheCrisis.Cambridge, A: Center orAdvanced ngineering
Study,
MIT,
1986.
1 .
Deshpande,
. The
organizational
ontext
fmarket
esearch
se.Journal
fMarketing,
46,
4
(1992),
92-101.
This content downloaded from 128.95.130.223 on Sat, 26 Oct 2013 17:54:36 PMAll use subject to JSTOR Terms and Conditions
http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp -
8/10/2019 14 Beyond Accuracy (1)
21/30
24
RICHARD Y. WANG
AND DIANE M. STRONG
1
.
Dobyns,
.,
and
Crawford-Mason,
.
Quality
rElse:
TheRevolution
n World usiness.
Boston:
Houehton
Mifflin,
991.
1 .
Emery,
.C.
Organizational
lanning
nd
Control
ystems:
heory
nd
Technology.
New York:
Macmillan,
969.
1 . Griffin,., ndHauser, .R. hevoice f he ustomer.
arketing
cience, 2,1 1993),
1-27.
18.
Hauser, .R.,
nd
Clausing,
.
The houseof
quality.
arvardBusiness
Review,
6,
3
(1988),
63-73.
1
.
Huh, .U.; Keller, .R.; Redman,
.C.;
and
Watkins,
.R. Data
quality.
nformation
nd
Software echnology,
2,
8
(1990),
559-565.
20.
Ives,B.; Olson,M.H.;
and
Baroudi,
.J. he
measurement
f user
nformation
atisfac-
tion.
Communications
f
he
ACM,
26,
10
1983),
785-793.
21
Johnson,
.R.;
eitch,
.A.;
and
Neter,
.Characteristics
f rrorsn ccounts
eceivable
and
nventory
udits.
ccounting
eview,
6,
2
(1981),
270-293.
22.
Juran,
.M.Juran
n
LeadershiporQuality:
n
Executive
andbook.
ew
York:The
Free
Press,
989.
23. Juran,.M.,ndGryna,.M.Quality lanningndAnalysis,ded.NewYork:McGraw-
Hill,
1980.
24.
Knechel,
W.R. A simulation
odel
or
valuating
ccounting
ystems
eliability.
udit-
ing:
A
Journal
f Theory
nd
Practice,
,
2
(1985),
38-62.
25.
Knechel,
W.R.
The
use of
quantitative
odels
n the
eview
nd
evaluation
f
nternal
control:
survey
nd
review.
ournal
f
Accounting
iterature,
(1983),
205-219.
26.
Kotier,
.
Marketing
ssentials.
nglewood
liffs,
J:
Prentice-Hall,
984.
27.
Kriebel,
.H.
Evaluating
he
quality
f nformation
ystems.
n N.
Szysperski
nd
E.
Grochla
eds.),
Design
nd
mplementation
fComputer
ased
nformation
ystems.
erman-
town,
A:
Sijthtoff
nd
Noordhoff,
79.
28.
Laudon,
K.C.
Data
quality
nddue
process
n
arge
nterorganizational
ecord
ystems.
Communications
f
he
ACM,
29,
1
1986),
4-11.
29. Madnick, . Integrating
nformation
rom
lobal systems: ealing
with
he "on-
and
off-ramps"
f henformation
uperhighway.
ournal
fOrganizational
omputing,,2 1 95),
69-82.
30.
Melone,
. A theoretical
ssessment
f
the
user-satisfaction
onstruct
n information
systems
esearch.
anagement
cience,
6,
1
1990),
598-613.
3
1
Moore,G.C.,
and
Benbasat,
.
Development
f n nstrument
o
measure
he
erceptions
of
dopting
n nformation
echnology
nnovation.
nformation
ystems
esearch,
,
3
(1991),
192-222.
32.
Morey,
.C.
Estimating
nd
mproving
he
uality
f
nformation
n the
MIS.
Commu-
nications
f
he
ACM,
25,
5
(1982),
337-342.
33.
Page,
W.,
and
Kaomea,
P.
Using uality
ttributes
o
produce
ptimal
actical
nforma-
tion. n
Proceedings
f
he
ourth
nnual
Workshop
n
nformation
echnologies
nd
Systems
(WITS).
Vancouver,
ritish
olumbia,
anada:
1994,
p.
145-154.
34. Redman,.C.DataQuality: anagementndTechnology.ewYork: antamooks, 92.
35.
Ronen,
.,
and
piegler,
. Information
s
inventory:
new
onceptual
iew.
nformation
and
Management,
1,
1991),
239-247.
36.
Sciore,
.;
Siegel,
M.;
and
Rosenthal,
.
Using
emantic
alues
ofacilitate
nteroperabil-
ity mong
eterogeneous
nformation
ystems.
CM
Transactions
n Database
Systems,
9,2
(1994),
254-290.
37.
Stevens,
.
Applied
multivariatetatisticsor
he ocial cience.
Hillsdale,
J:
Lawrence
Erlbaum,
986.
38.
Strong,
.M. Decision
support
or
xception andling
nd
quality
ontrol n
office
operations.
ecision
upport ystems,,
3
(
1
92),
2 1 -227.
39.
Strong,
.M.; Lee,
Y.W.;
and
Wang,
R.Y.
Data
quality
n context.
orthcoming,
996.
40.
Strong,
.M.,
and
Miller,
.M.
Exceptions
nd
exception
andling
n
computerized
information
rocesses.
CM
Transactionsn
nformationystems,3,
2
(1995),
206-233.
41 Tan,J.K., ndBenbasat,.
Processing
f
graphical
nformation:
decomposition
axon-
omy
o
match ata extractionasks
nd
graphical
epresentations.
nformation
ystems
e-
search,
,4
(1990),
416^439.
This content downloaded from 128.95.130.223 on Sat, 26 Oct 2013 17:54:36 PMAll use subject to JSTOR Terms and Conditions
http://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsphttp://www.jstor.org/page/info/about/policies/terms.jsp -
8/10/2019 14 Beyond Accuracy (1)
22/30
-
8/10/2019 14 Beyond Accuracy (1)
23/30
26 RICHARD
Y. WANG
AND
DIA