Faceted browsing for ACL Anthology Praveen Bysani.


Faceted browsing for ACL Anthology

Praveen Bysani

ACL Anthology

• a digital archive of research papers in CL and NLP

• contains over 20,100 papers

• free of cost

• archive for sister conferences and journals

Current browser

• direct and navigational search

• hard to navigate

• non-customized search

• non-sortable results

Faceted browsing

• Combination of navigational and direct search paradigms

• Facets are properties of information elements

• Access to organized information

• Ability to explore the collection in multiple dimensions through filters
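As a minimal sketch of the idea, the snippet below filters a tiny in-memory collection by selected facet values and recounts the remaining facet values. The records and the field names (`year`, `venue`, `authors`) are made up for illustration and are not taken from the Anthology.

```python
from collections import Counter

# Hypothetical records; field names and values are assumptions.
PAPERS = [
    {"id": "P1", "year": 2010, "venue": "ACL", "authors": ["Lee", "Kim"]},
    {"id": "P2", "year": 2010, "venue": "EMNLP", "authors": ["Kim"]},
    {"id": "P3", "year": 2011, "venue": "ACL", "authors": ["Rao"]},
]


def _values(paper, facet):
    """Return a facet's values as a list, whether stored as scalar or list."""
    value = paper[facet]
    return value if isinstance(value, list) else [value]


def apply_filters(papers, filters):
    """Keep only the papers that match every selected facet value."""
    return [
        p for p in papers
        if all(wanted in _values(p, facet) for facet, wanted in filters.items())
    ]


def facet_counts(papers, facet):
    """Count how many of the given papers carry each value of a facet."""
    return Counter(v for p in papers for v in _values(p, facet))


selected = apply_filters(PAPERS, {"venue": "ACL"})
print([p["id"] for p in selected])
print(facet_counts(selected, "year"))
```

Each added filter narrows the result set, and the counts shown next to the remaining facet values are recomputed on the narrowed set. That is what lets a user explore along several dimensions at once.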

Faceted Browsing

• Ruby on Rails (RoR) + Blacklight plugin

• Apache Solr

• Metadata from XML

• Blacklight customization for XML
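The "metadata from XML" step can be pictured roughly as follows. This is a sketch only: the XML layout, the element names and the output field names are assumptions, since the slides do not show the actual schema, and the real system is built on Ruby on Rails rather than Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata file; element and attribute names are assumptions.
SAMPLE_XML = """
<volume id="P10">
  <paper id="1001">
    <title>An Example Title</title>
    <author>Jane Doe</author>
    <author>John Roe</author>
    <year>2010</year>
  </paper>
</volume>
"""


def papers_to_docs(xml_text):
    """Turn each <paper> element into a flat dict suitable for indexing."""
    root = ET.fromstring(xml_text)
    docs = []
    for paper in root.iter("paper"):
        year_text = paper.findtext("year")
        docs.append({
            "id": f"{root.get('id')}-{paper.get('id')}",
            "title": (paper.findtext("title") or "").strip(),
            # Multi-valued field: one entry per <author> element.
            "author": [a.text.strip() for a in paper.findall("author") if a.text],
            "year": int(year_text) if year_text else None,
        })
    return docs


print(papers_to_docs(SAMPLE_XML))
```

Flat documents of this kind, with multi-valued fields such as `author`, are the shape a search index like Solr typically ingests. Facets can then be defined over fields such as author or year.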

Show view

Index view

More cookies..

• User feedback

  • Comment / Share / Like

  • Suggestions for correcting the metadata

• Ability to export bib in six formats

• Author pages

  • List of publications

  • Co-authors

• Third-party annotations

  • Automatically annotate articles with new metadata

• Anthology as a corpus

  • API to make the Anthology an object of study

• OAI compatible

  • Allows metadata harvesting

• Available at http://aclanthology.heroku.com/
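To illustrate what OAI compatibility means for a harvester, the sketch below builds OAI-PMH `ListRecords` request URLs. The verb and the argument names (`metadataPrefix`, `from`, `resumptionToken`) come from the OAI-PMH protocol. The endpoint `http://example.org/oai` is a placeholder, because the slides do not give the path of the Anthology's OAI endpoint.

```python
from urllib.parse import urlencode

# Placeholder endpoint; the real OAI path is not given in the slides.
BASE_URL = "http://example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # A resumption token is sent on its own, without other arguments.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            # Selective harvesting: only records changed since this date.
            params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(BASE_URL))
print(list_records_url(BASE_URL, from_date="2012-01-01"))
```

A harvester would fetch the first URL, read the records from the XML response, and repeat the request with the returned resumption token until none is returned.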

Challenges

• Normalizing the quality of the Anthology metadata

• SIG information

  • YAML files

  • No identifiers provided

• DOI

  • From ACM

  • Changes in names of papers, authors
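One common way to cope with author names that vary between sources is to compare them through a normalized key. The sketch below is an illustration of that general technique, not the method used in this work, and the example names are invented.

```python
import unicodedata


def normalize_name(name):
    """Reduce a personal name to a lowercase, accent-free matching key."""
    # Reorder "Last, First" into "First Last".
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase and collapse runs of whitespace.
    return " ".join(stripped.lower().split())


print(normalize_name("Müller, Hans") == normalize_name("Hans  Muller"))
```

A key like this only catches spelling and ordering variants. Initials, name changes and distinct authors who share a name would still need separate handling.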

Similar works

ACL Author Network

• bibliometrics

ACL Search Bench

• Semantic search

Plans for the future

• A common data schema to integrate all

• Indexing the full text of the papers

• Range queries for year facet

• Exporting the bibliography of an entire volume

• Enriching author pages
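For the planned year range queries, Solr already offers range filters and range faceting. The sketch below assembles the request parameters for such a query. The parameter names (`fq`, `facet.range`, `facet.range.start`, `facet.range.end`, `facet.range.gap`) are standard Solr parameters, while the field name `year` and the helper function are assumptions made for illustration.

```python
from urllib.parse import urlencode


def year_range_params(start, end, gap=5, field="year"):
    """Build Solr query parameters for a year range filter plus range facet."""
    return {
        "q": "*:*",
        # Filter query: square brackets include both endpoints.
        "fq": f"{field}:[{start} TO {end}]",
        "facet": "true",
        # Ask Solr to bucket the field into ranges of `gap` years.
        "facet.range": field,
        "facet.range.start": start,
        "facet.range.end": end,
        "facet.range.gap": gap,
    }


params = year_range_params(2000, 2010)
print(urlencode(params))
```

The encoded string would be appended to the select handler URL of whichever Solr core holds the Anthology index.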