.gov
Toward Digital Government: The Case of Government Statistics
Gary Marchionini
University of North Carolina at Chapel Hill
www.ils.unc.edu/govstat
NSF Grants EIA 0131824 and EIA 0129978
Principal Investigators: Gary Marchionini, Stephanie Haas, Ben Shneiderman, Catherine Plaisant, and Carol Hert
Digital Government: Leveraging IT
• Government information dissemination
– Websites
– Other publications (no mass emailings yet)
• Transactions
– Registrations
– Census, regulatory filings
– Taxes
• Policy making
– E-voting
– E-rules
• Our work focuses on statistical information and agencies, as many important decisions by policy makers and citizens depend on statistics
Preliminary Work 1996-2000
• Human needs
– Interviews (agencies, public)
– Transaction log analysis
– Email content analysis
• System development and testing
– Novel interfaces
– Information architecture
– Usability studies
Focus on Tables 1998-2000
• Table browser
– Java applet
– DTD for tables (DC and DDI influence)
– XML protocol
– Mapping metadata elements to interface control mechanisms
– Piping data from large databases to applet
– User studies
• Metadata to aid understanding
Statistical Knowledge Network 2003-2006
• Create SKN prototype with agency partners
• Integration
– Horizontal integration across federal agencies (BLS, EIA, NCHS, Census, SSA, NASS)
– Vertical integration from local/state
• Focus on non-specialists
– Help crucial
– Metadata drives help
• User interfaces are the intermediaries to link people and data
• Find what you need, understand what you find
Data Flow
[Diagram. Labels:]
• Agency sources: agency data with integrated metadata; agency with multiple metadata repositories; agency backend data and metadata (shown twice)
• firewall
• Distributed Public Intermediary: variable/concept level, XML-based, incorporating ISO 11179 and DDI, providing Java-based statistical literacy tools to user interfaces
• Statistical Ontology
• Domain Ontologies (Domain Experts, End User Communities)
• User Interfaces
• membrane
• End users: interact with data from information/concept perspective, not agency perspective
Statistical Knowledge Network Architecture
[Diagram. Labels:]
• Agencies
• SKN Registry
• SKN Consortium
• Ontology Rules & Constraints
• Actions: Contribute, Find, Display, Annotate, Understand, Manipulate, Collaborate, …
• Objects: Reports + metadata, Tables + metadata, People + metadata, Glossary, Annotations, …
• Private Work Space (Objects, Actions), shown three times
Interface Prototypes: Find, Display, Understand; Leverage Metadata, Glossary, Ontology
• Relation Browser
• Multilayered help: treemaps, video help
• Animated Glossary
• Contextualizer
• PairTrees
• Spatial audio for maps
• Missing Data
Use Case Scenarios to Guide Design
• Based on discussions with agency partners
• 20 scenarios
• 4 detailed, with in-depth resources located
• Used to ground ongoing work
Multi-layered interfaces
1 level / 3 levels of growing complexity
[Screenshot labels:]
– map + table
– map + table + filters
– map + table + filters + scatterplot
– map + table + filters + scatterplot
Script Guidelines
• Base the script on a live demonstration (never on a written description)
– Focus on tasks (not tours of widgets or conceptual overviews)
– Act out the interaction (with minimum description), then describe results in context of task
– Start with a tour of main screen components (orient and introduce vocabulary), 5-10 sec. max
– Plan a linear sequence made of very short autonomous chunks (15-60 sec.)
• Map the chunks to existing online documentation
• Show text title at beginning of each chunk
• Carefully synchronize voice and visual (hard when alone)
• Provide duration and file size for each individual chunk
Interactive Glossary Development Tools
• Provide foundation for content development
• Separate content development from presentation development
• Reduce overall development time
• Maximize reuse of existing elements
• Create multiple presentations from a single content development effort
Content Foundation Template (SIG)
• Question: initial motivation
• Answer: overview, definition
• Process: explanation, equation
• Example
• Result: statistic, answer
• Review: summary, interpretation
Animation Template
• Consistent display and interaction for all animations
• Presents animation and explanatory text simultaneously
• Navigate (forward and back) through animation segments
• Complete review of text at any time
Animation Template
• Three pieces: text, animations, template
• Text is tagged with content section tags in a separate text file
• Animation consists of segments in individual animation files
• Text and animation segments coordinated by placement in template
Ontology
[Diagram. Labels:]
• Semantic level (ontology): Classes, Relationships, Constraint rules
• Structural level (DTD/XML Schema): Elements, Attributes, Datatypes
• SKN: Ontology, DTD / XML Schema, Interface Tools, Statistical Interactive Glossary (SIG)
• Ontology Applications: Knowledge organization; Content and terminology control; Data integration; Query support; Automatic classification support; Reasoning mechanism; Others
• Other labels: modeling, implementation
[Diagram of related concepts. Node labels: unit, aged unit, age, household, family, income, earning, salary, wage, benefit, poverty, poverty estimate, estimate, distribution, SSA, Census Bureau FIFARS. Region labels: Domain knowledge, Operational knowledge.]
<anlyUnit>aged unit</anlyUnit>
<universe>married couples living together, with husband or wife aged 65 or older</universe>
Project DTD
• Investigate DDI and ISO 11179
• Leverage DDI and data cubes
• Mark up a set of objects
– Tables
– Reports/press releases
• Use markup to build value-added search (find what you need) and help (understand what you find) support into interfaces
The Basic Structure
• docDscr: description of the markup (what is being marked up, who marked it up, etc.)
• stdygrpDscr: describes the “group” to which an entity or document belongs, such as a survey program
• entDscr_1, entDscr_2: description of an entity within the marked-up document
• varDscr_1, varDscr_2: description of each variable within an entity, study group, or document
• nCubeDscr: used when the entity is an aggregated table
• fileDscr: describes physical file structures for nCubes
One Example of How the DTD Helps
The DTD can help bring “expert knowledge” to the less expert user and bring relevant information together by enabling searching via variables as well as subjects/keywords.
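As a rough illustration of searching via variables as well as subjects/keywords, the sketch below builds two small inverted indexes over marked-up objects and queries both. The record layout, the function names (`build_index`, `search`), and the sample objects are all hypothetical and are not taken from the project's DTD or tools.

```python
from collections import defaultdict

# Hypothetical marked-up objects. Titles, keywords, and variable names are
# invented for illustration; they are not real agency data.
DOCUMENTS = [
    {"title": "Hypothetical table A", "keywords": ["income"], "variables": ["age", "median income"]},
    {"title": "Hypothetical report B", "keywords": ["poverty"], "variables": ["age", "household"]},
    {"title": "Hypothetical table C", "keywords": ["age"], "variables": ["benefit"]},
]


def build_index(documents, field):
    """Map each lower-cased term in `field` to the titles that contain it."""
    index = defaultdict(set)
    for doc in documents:
        for term in doc[field]:
            index[term.lower()].add(doc["title"])
    return index


def search(term, *indexes):
    """Return sorted titles matching `term` in any of the given indexes."""
    hits = set()
    for index in indexes:
        hits |= index.get(term.lower(), set())
    return sorted(hits)


keyword_index = build_index(DOCUMENTS, "keywords")
variable_index = build_index(DOCUMENTS, "variables")

# Keyword-only search misses objects that merely contain an "age" variable.
keyword_hits = search("age", keyword_index)
# Searching variables as well brings the related objects together.
combined_hits = search("age", keyword_index, variable_index)
```

In this toy data, the keyword index alone finds one object for "age", while adding the variable index finds all three.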
Median income, by age, 2001
<var name="age" dcml="0" intrvl="discrete" aggrMeth="count" measUnit="aged units" scale="x1" origin="0" nature="interval" additivity="" temporal="no" geog="no" geoVocab="" catQnty="4">
<labl source="producer" level="variable">age</labl>
<universe level="variable" clusion="I">persons</universe>
<catgryGrp ID="CG1_1" catgry="C1_1 C1_2 C1_3 C1_4">
<labl source="producer" level="catgryGrp">Age</labl>
</catgryGrp>
<catgry ID="C1_1">
<catValu ID="CV1_1">1</catValu>
<labl source="producer" level="catgry">65-69</labl>
</catgry>
<catgry ID="C1_2">
<catValu ID="CV1_2">2</catValu>
<labl source="producer" level="catgry">70-74</labl>
</catgry>
<catgry ID="C1_3">
<catValu ID="CV1_3">3</catValu>
<labl source="producer" level="catgry">75-79</labl>
</catgry>
<catgry ID="C1_4">
<catValu ID="CV1_4">4</catValu>
<labl source="producer" level="catgry">80 or older</labl>
</catgry>
</var>
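To show how an interface might read this markup, here is a minimal sketch that parses an abridged copy of the `var` element above with Python's standard `xml.etree.ElementTree` and collects the category codes and labels. The function name `category_labels` is an assumption for illustration, and the attribute list on `var` is shortened; the project's own tools were Java-based and may have worked differently.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the <var> element shown above (most attributes omitted).
VAR_XML = """
<var name="age" measUnit="aged units" catQnty="4">
  <labl source="producer" level="variable">age</labl>
  <universe level="variable" clusion="I">persons</universe>
  <catgry ID="C1_1"><catValu ID="CV1_1">1</catValu>
    <labl source="producer" level="catgry">65-69</labl></catgry>
  <catgry ID="C1_2"><catValu ID="CV1_2">2</catValu>
    <labl source="producer" level="catgry">70-74</labl></catgry>
  <catgry ID="C1_3"><catValu ID="CV1_3">3</catValu>
    <labl source="producer" level="catgry">75-79</labl></catgry>
  <catgry ID="C1_4"><catValu ID="CV1_4">4</catValu>
    <labl source="producer" level="catgry">80 or older</labl></catgry>
</var>
"""


def category_labels(var_xml):
    """Return (variable name, {category value: category label})."""
    var = ET.fromstring(var_xml)
    labels = {}
    # Only direct <catgry> children carry the value/label pairs.
    for catgry in var.findall("catgry"):
        labels[catgry.findtext("catValu")] = catgry.findtext("labl")
    return var.get("name"), labels


name, labels = category_labels(VAR_XML)
```

A display tool could use `labels` to replace the coded values 1 to 4 with the age ranges a non-specialist expects to see.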
Discovering Metadata
• Hybrid machine learning approach
– Crawl website
– Create term-document matrices
– Use k-means clustering with small k to fit on screen in RB++
– Revise
• Use structure in the existing sites to train a classifier
• For small n of concepts, classify site
Combining Machine Learning and Dynamic Interfaces
What should these topics be, and how do we know if we’ve found the right names for them?
Combining Machine Learning and Dynamic Interfaces
How do we assign thousands of documents to their respective topics?
Initial, Unstructured Approach
Clustering Based on Word Distributions
[Diagram: documents grouped into clusters]
This approach yielded intuitively coherent clusters. But the clusters fall at too fine a level of granularity, while also wasting large portions of the data.
New Approach, Semi-Supervised
[Diagram: documents grouped under agency-defined categories]
This approach capitalizes on the agencies’ efforts and expertise, and so far seems to yield superior results. However, the training data is very sparse, and the observed categories have high correlation in some cases. Our current work addresses these tuning issues.
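One way to picture training on the agencies' existing structure is a small text classifier fit on pages whose category is already known from the site, then applied to pages that have none. The slides do not name the classifier used, so the multinomial naive Bayes below is a swapped-in assumption, and the category names and training texts are invented.

```python
import math
from collections import Counter, defaultdict

# Invented examples of pages whose category comes from existing site structure.
LABELED = [
    ("crop harvest acreage yield", "agriculture"),
    ("livestock farm crop prices", "agriculture"),
    ("wage salary earnings income", "income"),
    ("poverty income household benefit", "income"),
]


def train(labeled_docs):
    """Collect per-class word counts, class counts, and the vocabulary."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in labeled_docs:
        words = text.lower().split()
        word_counts[label].update(words)
        class_counts[label] += 1
        vocab.update(words)
    return word_counts, class_counts, vocab


def classify(text, model):
    """Return the class with the highest log-probability (add-one smoothing)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(LABELED)
unlabeled_page = "farm crop yield report"
predicted = classify(unlabeled_page, model)
```

With only a handful of labeled pages per category, as in this toy set, smoothing does most of the work, which echoes the sparsity concern raised on the slide.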
Vertical Integration: Agriculture
[Diagram. Labels:]
• USDA / NASS
• State Statistical Office
• State Cooperative Agency (Dept. of Agriculture, etc.)
• Collection agents
• Farmers & Producers: supply data to agencies
• Statistical Consumers: obtain data from agencies