Presentation Agenda
• Introduction
• NSF Project Overview
• Current State Of The Art
• Our Understanding Of Your Requirements
• Design
• Implementation / Demo
• Progress
• Questions?
eRulemaking: CS501 Presentation 1

The Workgroup: Who We Are
• Sam Phillips – MEng in CS
• Dan Rassi – Junior in CS
• Michael Wang – MEng in CS
• Krzysztof Findeisen – Senior in Astro and CS
• Raymond McGill – Senior in IS
Federal Rulemaking
• Executive agencies issue over 4,000 regulations per year [1]
– Preliminary regulations published daily as Notices of Proposed Rulemaking (NPRMs)
– Public can submit feedback on NPRMs
– Usually ~100, up to 500,000 comments per regulation
[1] C. Cardie, C. Farina, & T. Bruce. Using Natural Language Processing to Improve eRulemaking. In Proceedings of the 2006 International Conference on Digital Government Research, San Diego, 2006.
Rules and Comments
• Rules tend to be long and address several “issue topics”
– Well organized, “written like laws”
• Comments vary significantly in type
– From individuals in organizations (e.g. Sierra Club, NRA)
– From professionals (e.g. lawyers / lobbyists / domain experts)
– From potential stakeholders (beneficiaries, those potentially hurt)
– From the general public
• Comments may address anywhere from none to several of the “issue topics”
Putting the “e” in “eRulemaking”
• Federal directive to read and consider all comments
• Currently comments are read and sorted by hand
• For controversial issues, this is a lot of work!
• Natural language processing (NLP) can be used to classify comments
• NLP software is “trained” through annotation of a subset of comments
• Ideally the system can be automated
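To make the “trained through annotation” idea concrete, here is a minimal sketch of the kind of supervised classification involved: a classifier learns word statistics from a hand-annotated subset of comments, then labels new comments. The comment texts and issue labels below are invented for illustration; real systems use far more sophisticated NLP models.

```python
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (comment_text, issue_label) pairs from annotators."""
    word_counts = defaultdict(Counter)
    for text, issue in examples:
        word_counts[issue].update(text.lower().split())
    return word_counts

def classify(model, text):
    """Label a new comment with the issue whose training vocabulary
    best overlaps the comment's words."""
    words = text.lower().split()
    scores = {issue: sum(counts[w] for w in words)
              for issue, counts in model.items()}
    return max(scores, key=scores.get)

# Tiny hand-annotated "training set" (invented data)
model = train([
    ("the emission limits are too strict for small plants", "emissions"),
    ("reporting deadlines burden small businesses", "paperwork"),
])
print(classify(model, "these emission limits hurt our plant"))  # emissions
```

This is only a bag-of-words toy, but it shows why annotation matters: the quality of the labeled subset directly determines what the classifier can learn.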
The Project
• The Legal Information Institute (LII) is working on automating the sorting process
• From the project proposal [1]: “We propose to apply and develop a range of methods from the field of natural language processing (NLP) to create NLP tools to aid agency rule writers in:
– organization, analysis, and management of the sometimes overwhelming volume of comments, studies, and other supporting documents associated with a proposed rule; and
– analyzing proposed rules to flag possibly relevant mandates from the large number of statutes and Executive Orders that require studies, consultations, or certifications during rulemaking.”
The Stakeholders
[Stakeholder diagram: Rulemakers / NSF, the Cornell eRulemaking Initiative, LII Annotators, Our Group, the Natural Language Processor Group, and Other Universities.]
Related Projects
• Carnegie Mellon is working on a set of analysis tools [2]
– Comment statistics
• Redundancy
• Stakeholder phrases
– Correlations between issues
• Unknown interest groups
• The University of Pittsburgh and the University of Southern California are also working on eRulemaking
[2] J. Callan, R. Krishnan, & P. Suen. CMU eRulemaking Project Description. http://erulemaking.cs.cmu.edu/.
Current Analyst Workflow
• Analysts receive comments by e-mail
• They filter comments for useful statements
• They build an issues-by-comment-summary matrix as they read comments
• Categorize the type of commenter
• Organize by section of the regulation
• Combine massive charts, discuss, analyze
• If the rule is adopted, analysts publish a statement on how they addressed the comments [3]
[3] C. Cardie, C. Farina, & T. Bruce. Using Natural Language Processing to Improve eRulemaking. In Proceedings of the 2006 International Conference on Digital Government Research, San Diego, 2006.
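The issues-by-comment-summary matrix the analysts build by hand can be sketched as a simple nested mapping: rows are issue topics, columns are comments, and cells hold the analyst's summary of what that comment says about that issue. All issue names, comment IDs, and summaries below are invented for illustration.

```python
from collections import defaultdict

# issue -> {comment_id: analyst's summary of that comment on that issue}
matrix = defaultdict(dict)

def record(issue, comment_id, summary):
    matrix[issue][comment_id] = summary

record("emission limits", "C-014", "limits too strict for rural plants")
record("emission limits", "C-102", "supports stricter limits")
record("reporting burden", "C-014", "quarterly reports too frequent")

# All comments that touched a given issue:
print(sorted(matrix["emission limits"]))  # ['C-014', 'C-102']
```

The point of automating the workflow is that NLP classification could pre-fill entries in a structure like this, leaving analysts to verify and summarize rather than sort from scratch.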
Analyst Flowchart
Current LII Annotator Workflow
• Annotators have a set of ~300 comments from the Department of Transportation
• Annotators agree a priori on a set of issues
• The issue set is relatively large (38 issues)
• Annotators associate phrases in each comment with one or more issues (this is annotating)
– Multiple annotators per comment, for research purposes
• Early annotating picks up overlooked issues; Tom Bruce updates the issue set
• Annotated comments are delivered to the NLP group
Callisto Demo
• Callisto is the software LII annotators use to annotate
• Callisto is published by the MITRE Corporation
• Although it works, it is not well suited for eRulemaking
Term Dictionary
• Rule / Reg.: A proposed rule by a federal agency
• Rulemaker / Analyst: A domain expert in the agency
• Issue: A logical facet which the Rule impacts
• Annotate / Tag (v.): To “highlight” text and associate it with a specific issue
• Tag (n.): The implementation of an annotation as metadata
• Flag (n.): Non-issue-related metadata (e.g. workflow state)
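One possible in-memory representation of the terms above, to make the tag/flag distinction concrete: a Tag records which character span of a comment's text is associated with which Issue, while Flags carry workflow metadata not tied to any issue. The field names and example values are assumptions for illustration, not the system's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class Tag:
    comment_id: str
    start: int   # character offsets of the "highlighted" phrase
    end: int
    issue: str   # the issue this phrase is associated with

@dataclass
class Comment:
    comment_id: str
    text: str
    tags: list = field(default_factory=list)
    flags: set = field(default_factory=set)  # e.g. {"needs-review"}

c = Comment("C-014", "The proposed limits are too strict.")
c.tags.append(Tag("C-014", 13, 19, "emission limits"))  # tags "limits"
c.flags.add("needs-review")                             # workflow flag
print(c.text[c.tags[0].start:c.tags[0].end])  # limits
```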
Requirements
• Our understanding of your immediate requirements is:
– The system is accessible from any reasonable client system
– The system can display several hundred annotated or NLP-processed comments and indicate how each comment is classified
– The system must be extensible, so that the LII can continue working towards a production system
– The system can display the annotations associated with each comment
– The system allows users to add or modify annotations
Requirements
• Our understanding of your optional requirements is:
– The system can feed comments with changed annotations into the NLP
– The system allows users (or a subset thereof) to change the set of issues associated with a regulation (grow/collapse)
– The system allows comments to have flags not directly related to issues
– The system can handle large numbers of regulations (thousands) and comments per regulation (tens or hundreds of thousands)
Requirements
• Our understanding of your long-term requirements is:
– The system supports hierarchies of issues
– The system blends into the federal department’s workflow
– The system must be easy to set up and install
Assumptions
• Government agencies work roughly as summarized in the transcripts provided
• When government agencies adopt annotation, they will do so similarly to the LII
• The LII prefers a solid but feature-sparse prototype to a feature-laden but less extensible version
• The LII prefers a system designed for Rulemakers first, with “research” interests as a secondary concern
Design / Implementation
• Based on your requirements, we have selected an iterative design process
– Several iterations over the whole project
– Implement mock-ups to help clarify requirements
• Many stakeholders
– Full requirements unknown
– The UI is underspecified but very important
• Prototype system
– Desires may change as practical issues crop up
Design
• Our system design is a standard relational database-driven website
– Lots of implementation software available
– “Drag and drop” content modules
– Minimal retraining of team members
– Natural “three-tier” architecture
– Front-end / middleware / back-end can be replaced independently
– Simple cross-platform compatibility because of the web interface
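The relational design can be sketched as a handful of tables linking regulations, issues, comments, and tags. The schema below is an illustrative assumption, not the system's actual schema; the real back end uses MySQL under a content management system, but SQLite shows the same relational structure.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE regulation (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE issue      (id INTEGER PRIMARY KEY,
                         regulation_id INTEGER REFERENCES regulation(id),
                         name TEXT);
CREATE TABLE comment    (id INTEGER PRIMARY KEY,
                         regulation_id INTEGER REFERENCES regulation(id),
                         body TEXT);
CREATE TABLE tag        (id INTEGER PRIMARY KEY,
                         comment_id INTEGER REFERENCES comment(id),
                         issue_id INTEGER REFERENCES issue(id),
                         start_pos INTEGER, end_pos INTEGER);
""")

# Invented sample data: one regulation, one issue, one tagged comment
db.execute("INSERT INTO regulation (title) VALUES ('Sample NPRM')")
db.execute("INSERT INTO issue (regulation_id, name) VALUES (1, 'emission limits')")
db.execute("INSERT INTO comment (regulation_id, body) VALUES (1, 'Limits too strict.')")
db.execute("INSERT INTO tag (comment_id, issue_id, start_pos, end_pos) VALUES (1, 1, 0, 6)")

# Which issue is tagged on comment 1?
row = db.execute("""SELECT i.name FROM tag t JOIN issue i ON t.issue_id = i.id
                    WHERE t.comment_id = 1""").fetchone()
print(row[0])  # emission limits
```

Keeping tags in their own table is what lets each tier be replaced independently: the front end only needs the query interface, not the storage details.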
Main Module
[Flowchart: Start → Log in (or the Registration model for new users) → Is admin? (Y: Admin model, N: User model) → View info → Log off → End? → End]
Registration Module
[Flowchart: Registration module start → License agreement → Agree? → Input user info → Info satisfied? → Update database → Registration module end]
Admin Module
[Flowchart: Admin module start → Edit announcement / Collapse tags / Add new tags → End? → Admin module end]
User Module
[Flowchart: User module start → Choose regulation → Choose annotation or view model → Annotation model / View model → Change regulation? → End? → User module end]
View Module
[Flowchart: View module start → Filter by tags or flags → Read matching comments or sections → End? → View module end]
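The View module's “filter by tags or flags” step amounts to selecting comments whose metadata matches a requested issue tag or workflow flag. A minimal sketch, with invented comment data:

```python
comments = [
    {"id": "C-014", "issues": {"emission limits"}, "flags": {"needs-review"}},
    {"id": "C-102", "issues": {"emission limits"}, "flags": set()},
    {"id": "C-310", "issues": {"reporting burden"}, "flags": set()},
]

def matching(comments, issue=None, flag=None):
    """Return IDs of comments matching the requested issue and/or flag.
    A criterion left as None matches everything."""
    return [c["id"] for c in comments
            if (issue is None or issue in c["issues"])
            and (flag is None or flag in c["flags"])]

print(matching(comments, issue="emission limits"))  # ['C-014', 'C-102']
print(matching(comments, flag="needs-review"))      # ['C-014']
```

In the deployed system this filtering would be a database query rather than an in-memory scan, but the selection logic is the same.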
Annotation Module
[Flowchart: Annotation module start → Add / Edit / Remove tag or flags → Submit to NLP for learning → End? → Annotation module end]
Implementation
• The website will be written using the Drupal content management system
– Designed to produce dynamic websites with minimal administration
– LII was already considering Drupal for another project
• The database will run on a MySQL system already present on LII servers
– No installation required
– Most content management systems require an SQL-based relational database
UI Alpha 0.1 Demo
• First-release alpha website
ID Task Name Duration Start Finish
1 Interface Requirements 2 days? Fri 2/16/07 Sat 2/17/07
2 Web Concept Sketches 3 days? Sat 2/17/07 Mon 2/19/07
3 Pick Web Management System 1 day? Tue 2/20/07 Tue 2/20/07
4 Get Sample Data 5 days? Sat 2/17/07 Wed 2/21/07
5 Learn About Annotating 2 days? Sun 2/25/07 Mon 2/26/07
6 Refine and Select Web Layout 3 days? Tue 2/27/07 Thu 3/1/07
7 Install Web Manager 1 day? Sun 2/25/07 Sun 2/25/07
8 Install CVS System 7 days? Fri 2/16/07 Thu 2/22/07
9 Dummy Website 4 days? Wed 2/28/07 Sat 3/3/07
10 Refine Website 4 days? Sat 3/3/07 Tue 3/6/07
11 Write Back-End Documentation 8 days? Fri 3/2/07 Fri 3/9/07
12 1st Stage Presentation 4 days? Fri 3/2/07 Mon 3/5/07
13 Presentation and Report 1 day Tue 3/6/07 Tue 3/6/07
14 Website Feedback 3 days? Wed 3/7/07 Fri 3/9/07
15 Install DBMS 7 days? Wed 2/21/07 Tue 2/27/07
16 Learn About NLP 16 days? Sun 2/25/07 Mon 3/12/07
17 Design Database 7 days Tue 3/6/07 Mon 3/12/07
18 Implement Database 5 days? Tue 3/13/07 Sat 3/17/07
19 Design Middle Tier 7 days Tue 3/13/07 Mon 3/19/07
20 Implement Middle Tier 7 days Mon 3/19/07 Sun 3/25/07
21 Refine Middle Tier 7 days Mon 3/26/07 Sun 4/1/07
22 Write Manual 7 days Sat 3/31/07 Fri 4/6/07
23 Write Back-End Documentation 4 days Tue 4/3/07 Fri 4/6/07
24 2nd Stage Presentation 7 days Tue 3/27/07 Mon 4/2/07
25 Presentation and Report 1 day Tue 4/3/07 Tue 4/3/07
26 Major Review 4 days Tue 4/3/07 Fri 4/6/07
27 Design Annotation Interface 30 days? Sat 3/10/07 Sun 4/8/07
28 Implement Annotation Interface 3 days Sun 4/8/07 Tue 4/10/07
Questions?
• Any questions?