Presentation Agenda
• Introduction
• NSF Project Overview
• Current State Of The Art
• Our Understanding Of Your Requirements
• Design
• Implementation / Demo
• Progress
• Questions?
eRulemaking: CS501 Presentation 1

The Workgroup: Who We Are
• Sam Phillips – MEng in CS
• Dan Rassi – Junior in CS
• Michael Wang – MEng in CS
• Krzysztof Findeisen – Senior in Astro and CS
• Raymond McGill – Senior in IS
Federal Rulemaking
• Executive agencies issue over 4,000 regulations per year [1]
– Preliminary regulations published daily as Notices of Proposed Rulemaking (NPRMs)
– Public can submit feedback on NPRMs
– Usually ~100, up to 500,000 comments per regulation
[1] C. Cardie, C. Farina, & T. Bruce. Using Natural Language Processing to Improve eRulemaking. In Proceedings of the 2006 International Conference on Digital Government Research, San Diego, 2006.
Rules and Comments
• Rules tend to be long and address several “issue topics”
– Well organized, “written like laws”
• Comments vary significantly in type
– From individuals in organizations (e.g. Sierra Club, NRA)
– From professionals (e.g. lawyers / lobbyists / domain experts)
– From potential stakeholders (beneficiaries, those potentially hurt)
– From the general public
• Comments may address anywhere from none to several of the “issue topics”
Putting the “e” in “eRulemaking”
• Federal directive to read and consider all comments
• Currently comments are read and sorted by hand
• For controversial issues, this is a lot of work!
• Natural language processing (NLP) can be used to classify comments
• NLP software is “trained” through annotation of a subset of comments
• Ideally the system can be automated
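To make the “trained through annotation” idea concrete, here is a minimal sketch of the kind of supervised classification involved: a classifier learns word statistics from a hand-annotated subset of comments, then labels new comments. The comment texts and issue labels below are invented for illustration; real systems use far more sophisticated NLP models.

```python
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (comment_text, issue_label) pairs from annotators."""
    word_counts = defaultdict(Counter)
    for text, issue in examples:
        word_counts[issue].update(text.lower().split())
    return word_counts

def classify(model, text):
    """Label a new comment with the issue whose training vocabulary
    best overlaps the comment's words."""
    words = text.lower().split()
    scores = {issue: sum(counts[w] for w in words)
              for issue, counts in model.items()}
    return max(scores, key=scores.get)

# Tiny hand-annotated "training set" (invented data)
model = train([
    ("the emission limits are too strict for small plants", "emissions"),
    ("reporting deadlines burden small businesses", "paperwork"),
])
print(classify(model, "these emission limits hurt our plant"))  # emissions
```

This is only a bag-of-words toy, but it shows why annotation matters: the quality of the labeled subset directly determines what the classifier can learn.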
The Project
• The Legal Information Institute (LII) is working on automating the sorting process
• From the project proposal [1]: “We propose to apply and develop a range of methods from the field of natural language processing (NLP) to create NLP tools to aid agency rule writers in:
– organization, analysis, and management of the sometimes overwhelming volume of comments, studies, and other supporting documents associated with a proposed rule; and
– analyzing proposed rules to flag possibly relevant mandates from the large number of statutes and Executive Orders that require studies, consultations, or certifications during rulemaking.”
The Stakeholders
[Stakeholder diagram: Rulemakers / NSF, the Cornell eRulemaking Initiative, LII Annotators, Our Group, the Natural Language Processor Group, and Other Universities.]
Related Projects
• Carnegie Mellon is working on a set of analysis tools [2]
– Comment statistics
• Redundancy
• Stakeholder phrases
– Correlations between issues
• Unknown interest groups
• The University of Pittsburgh and the University of Southern California are also working on eRulemaking
[2] J. Callan, R. Krishnan, & P. Suen. CMU eRulemaking Project Description. http://erulemaking.cs.cmu.edu/.
Current Analyst Workflow
• Analysts receive comments by e-mail
• They filter comments for useful statements
• They build an issues-by-comment-summary matrix as they read comments
• Categorize the type of commenter
• Organize by section of the regulation
• Combine massive charts, discuss, analyze
• If the rule is adopted, analysts publish a statement on how they addressed the comments [3]
[3] C. Cardie, C. Farina, & T. Bruce. Using Natural Language Processing to Improve eRulemaking. In Proceedings of the 2006 International Conference on Digital Government Research, San Diego, 2006.
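The issues-by-comment-summary matrix the analysts build by hand can be sketched as a simple nested mapping: rows are issue topics, columns are comments, and cells hold the analyst's summary of what that comment says about that issue. All issue names, comment IDs, and summaries below are invented for illustration.

```python
from collections import defaultdict

# issue -> {comment_id: analyst's summary of that comment on that issue}
matrix = defaultdict(dict)

def record(issue, comment_id, summary):
    matrix[issue][comment_id] = summary

record("emission limits", "C-014", "limits too strict for rural plants")
record("emission limits", "C-102", "supports stricter limits")
record("reporting burden", "C-014", "quarterly reports too frequent")

# All comments that touched a given issue:
print(sorted(matrix["emission limits"]))  # ['C-014', 'C-102']
```

The point of automating the workflow is that NLP classification could pre-fill entries in a structure like this, leaving analysts to verify and summarize rather than sort from scratch.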
Analyst Flowchart
Current LII Annotator Workflow
• Annotators have a set of ~300 comments from the Department of Transportation
• Annotators agree a priori on a set of issues
• The issue set is relatively large (38 issues)
• Annotators associate phrases in each comment with one or more issues (this is annotating)
– Multiple annotators per comment, for research purposes
• Early annotating picks up overlooked issues; Tom Bruce updates the issue set
• Annotated comments are delivered to the NLP group
Callisto Demo
• Callisto is the software LII annotators use to annotate
• Callisto is published by the MITRE Corporation
• Although it works, it is not well suited for eRulemaking
Term Dictionary
• Rule / Reg.: A proposed rule by a federal agency
• Rulemaker / Analyst: A domain expert in the agency
• Issue: A logical facet which the Rule impacts
• Annotate / Tag (v.): To “highlight” text and associate it with a specific issue
• Tag (n.): The implementation of an annotation as metadata
• Flag (n.): Non-issue-related metadata (e.g. workflow state)
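One possible in-memory representation of the terms above, to make the tag/flag distinction concrete: a Tag records which character span of a comment's text is associated with which Issue, while Flags carry workflow metadata not tied to any issue. The field names and example values are assumptions for illustration, not the system's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class Tag:
    comment_id: str
    start: int   # character offsets of the "highlighted" phrase
    end: int
    issue: str   # the issue this phrase is associated with

@dataclass
class Comment:
    comment_id: str
    text: str
    tags: list = field(default_factory=list)
    flags: set = field(default_factory=set)  # e.g. {"needs-review"}

c = Comment("C-014", "The proposed limits are too strict.")
c.tags.append(Tag("C-014", 13, 19, "emission limits"))  # tags "limits"
c.flags.add("needs-review")                             # workflow flag
print(c.text[c.tags[0].start:c.tags[0].end])  # limits
```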
Requirements
• Our understanding of your immediate requirements is:
– The system is accessible from any reasonable client system
– The system can display several hundred annotated or NLP-processed comments and indicate how each comment is classified
– The system must be extensible, so that the LII can continue working towards a production system
– The system can display the annotations associated with each comment
– The system allows users to add or modify annotations
Requirements
• Our understanding of your optional requirements is:
– The system can feed comments with changed annotations into the NLP
– The system allows users (or a subset thereof) to change the set of issues associated with a regulation (grow/collapse)
– The system allows comments to have flags not directly related to issues
– The system can handle large numbers of regulations (thousands) and comments per regulation (tens or hundreds of thousands)
Requirements
• Our understanding of your long-term requirements is:
– The system supports hierarchies of issues
– The system blends into the federal department’s workflow
– The system must be easy to set up and install
Assumptions
• Government agencies work roughly as summarized in the transcripts provided
• When government agencies adopt annotation, they will do so similarly to the LII
• The LII prefers a solid but feature-sparse prototype to a feature-laden but less extensible version
• The LII prefers a system designed for Rulemakers first, with “research” interests as a secondary concern
Design / Implementation
• Based on your requirements, we have selected an iterative design process
– Several iterations over the whole project
– Implement mock-ups to help clarify requirements
• Many stakeholders
– Full requirements unknown
– The UI is underspecified but very important
• Prototype system
– Desires may change as practical issues crop up
Design
• Our system design is a standard relational database-driven website
– Lots of implementation software available
– “Drag and drop” content modules
– Minimal retraining of team members
– Natural “three-tier” architecture
– Front-end / middleware / back-end can be replaced independently
– Simple cross-platform compatibility because of the web interface
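The relational design can be sketched as a handful of tables linking regulations, issues, comments, and tags. The schema below is an illustrative assumption, not the system's actual schema; the real back end uses MySQL under a content management system, but SQLite shows the same relational structure.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE regulation (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE issue      (id INTEGER PRIMARY KEY,
                         regulation_id INTEGER REFERENCES regulation(id),
                         name TEXT);
CREATE TABLE comment    (id INTEGER PRIMARY KEY,
                         regulation_id INTEGER REFERENCES regulation(id),
                         body TEXT);
CREATE TABLE tag        (id INTEGER PRIMARY KEY,
                         comment_id INTEGER REFERENCES comment(id),
                         issue_id INTEGER REFERENCES issue(id),
                         start_pos INTEGER, end_pos INTEGER);
""")

# Invented sample data: one regulation, one issue, one tagged comment
db.execute("INSERT INTO regulation (title) VALUES ('Sample NPRM')")
db.execute("INSERT INTO issue (regulation_id, name) VALUES (1, 'emission limits')")
db.execute("INSERT INTO comment (regulation_id, body) VALUES (1, 'Limits too strict.')")
db.execute("INSERT INTO tag (comment_id, issue_id, start_pos, end_pos) VALUES (1, 1, 0, 6)")

# Which issue is tagged on comment 1?
row = db.execute("""SELECT i.name FROM tag t JOIN issue i ON t.issue_id = i.id
                    WHERE t.comment_id = 1""").fetchone()
print(row[0])  # emission limits
```

Keeping tags in their own table is what lets each tier be replaced independently: the front end only needs the query interface, not the storage details.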
Main Module
[Flowchart: Start → Log in (or the Registration model for new users) → Is admin? (Y: Admin model, N: User model) → View info → Log off → End? → End]
Registration Module
[Flowchart: Registration module start → License agreement → Agree? → Input user info → Info satisfied? → Update database → Registration module end]
Admin Module
[Flowchart: Admin module start → Edit announcement / Collapse tags / Add new tags → End? → Admin module end]
User Module
[Flowchart: User module start → Choose regulation → Choose annotation or view model → Annotation model / View model → Change regulation? → End? → User module end]
View Module
[Flowchart: View module start → Filter by tags or flags → Read matching comments or sections → End? → View module end]
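The View module's “filter by tags or flags” step amounts to selecting comments whose metadata matches a requested issue tag or workflow flag. A minimal sketch, with invented comment data:

```python
comments = [
    {"id": "C-014", "issues": {"emission limits"}, "flags": {"needs-review"}},
    {"id": "C-102", "issues": {"emission limits"}, "flags": set()},
    {"id": "C-310", "issues": {"reporting burden"}, "flags": set()},
]

def matching(comments, issue=None, flag=None):
    """Return IDs of comments matching the requested issue and/or flag.
    A criterion left as None matches everything."""
    return [c["id"] for c in comments
            if (issue is None or issue in c["issues"])
            and (flag is None or flag in c["flags"])]

print(matching(comments, issue="emission limits"))  # ['C-014', 'C-102']
print(matching(comments, flag="needs-review"))      # ['C-014']
```

In the deployed system this filtering would be a database query rather than an in-memory scan, but the selection logic is the same.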
Annotation Module
[Flowchart: Annotation module start → Add / Edit / Remove tag or flags → Submit to NLP for learning → End? → Annotation module end]
Implementation
• The website will be written using the Drupal content management system
– Designed to produce dynamic websites with minimal administration
– LII was already considering Drupal for another project
• The database will run on a MySQL system already present on LII servers
– No installation required
– Most content management systems require an SQL-based relational database
UI Alpha 0.1 Demo
• First-release alpha website
ID Task Name Duration Start Finish
1 Interface Requirements 2 days? Fri 2/16/07 Sat 2/17/07
2 Web Concept Sketches 3 days? Sat 2/17/07 Mon 2/19/07
3 Pick Web Management System 1 day? Tue 2/20/07 Tue 2/20/07
4 Get Sample Data 5 days? Sat 2/17/07 Wed 2/21/07
5 Learn About Annotating 2 days? Sun 2/25/07 Mon 2/26/07
6 Refine and Select Web Layout 3 days? Tue 2/27/07 Thu 3/1/07
7 Install Web Manager 1 day? Sun 2/25/07 Sun 2/25/07
8 Install CVS System 7 days? Fri 2/16/07 Thu 2/22/07
9 Dummy Website 4 days? Wed 2/28/07 Sat 3/3/07
10 Refine Website 4 days? Sat 3/3/07 Tue 3/6/07
11 Write Back-End Documentation 8 days? Fri 3/2/07 Fri 3/9/07
12 1st Stage Presentation 4 days? Fri 3/2/07 Mon 3/5/07
13 Presentation and Report 1 day Tue 3/6/07 Tue 3/6/07
14 Website Feedback 3 days? Wed 3/7/07 Fri 3/9/07
15 Install DBMS 7 days? Wed 2/21/07 Tue 2/27/07
16 Learn About NLP 16 days? Sun 2/25/07 Mon 3/12/07
17 Design Database 7 days Tue 3/6/07 Mon 3/12/07
18 Implement Database 5 days? Tue 3/13/07 Sat 3/17/07
19 Design Middle Tier 7 days Tue 3/13/07 Mon 3/19/07
20 Implement Middle Tier 7 days Mon 3/19/07 Sun 3/25/07
21 Refine Middle Tier 7 days Mon 3/26/07 Sun 4/1/07
22 Write Manual 7 days Sat 3/31/07 Fri 4/6/07
23 Write Back-End Documentation 4 days Tue 4/3/07 Fri 4/6/07
24 2nd Stage Presentation 7 days Tue 3/27/07 Mon 4/2/07
25 Presentation and Report 1 day Tue 4/3/07 Tue 4/3/07
26 Major Review 4 days Tue 4/3/07 Fri 4/6/07
27 Design Annotation Interface 30 days? Sat 3/10/07 Sun 4/8/07
28 Implement Annotation Interface 3 days Sun 4/8/07 Tue 4/10/07
Questions?
• Any questions?