Some Thoughts on HPC in Natural Language Engineering

Steven Bird, University of Melbourne & University of Pennsylvania


Sponsorship

Natural Language Engineering: Integrating Parallel and Parametric Processing

Victorian Partnership for Advanced Computing Expertise Grant EPPNME092.2003

NLE Application Areas

Information extraction, information retrieval, authoring tools, language analysis, language understanding, knowledge representation, knowledge discovery

Spoken language input, written language input, natural language generation, spoken output, multilinguality, multimodality, discourse and dialogue

Spoken dialogue systems, cross-language information retrieval, word-sense disambiguation, multi-document summarisation, natural language database interfaces

Some NLE Applications in Detail

Information extraction from broadcast news: tokenization, alignment, entity detection, coreference resolution, semantic mapping

Spoken language dialogue systems (SLDS): speech recognition, parsing, user modelling, discourse management, generation, synthesis

Language analysis: interlinear text annotation, lexicon development, morphosyntactic grammar development

Meta Activities

Discovery: What tools work with data in format X? What lexical resources exist for language Y?

Reuse: diverse implementation frameworks; component integration, wrapping, etc.

Training and evaluation: parametric and parallel processing; comparing systems running on the same data; gold standard vs. theory comparison; analyzing interaction logs
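
The two discovery questions amount to queries over resource metadata. The sketch below is a minimal illustration, assuming a hypothetical in-memory catalogue whose fields loosely follow Dublin Core elements (title, type, language, format); the resource names are placeholders, not real tools or lexicons.

```python
from dataclasses import dataclass, field

# Hypothetical metadata records loosely modelled on Dublin Core elements.
# Field names and catalogue entries are illustrative only.
@dataclass
class Resource:
    title: str
    kind: str                 # plays the role of DC "type": "tool" or "lexicon"
    language: str = ""        # language code; empty if language-independent
    formats: list[str] = field(default_factory=list)  # data formats a tool reads

CATALOGUE = [
    Resource("ExampleTagger", "tool", formats=["plain-text"]),
    Resource("ExampleAligner", "tool", formats=["annotation-graph", "plain-text"]),
    Resource("Example Lexicon", "lexicon", language="nl"),
]

def tools_for_format(fmt: str) -> list[str]:
    """What tools work with data in format X?"""
    return [r.title for r in CATALOGUE if r.kind == "tool" and fmt in r.formats]

def lexicons_for_language(lang: str) -> list[str]:
    """What lexical resources exist for language Y?"""
    return [r.title for r in CATALOGUE if r.kind == "lexicon" and r.language == lang]

print(tools_for_format("annotation-graph"))  # ['ExampleAligner']
print(lexicons_for_language("nl"))           # ['Example Lexicon']
```

In a grid setting the catalogue would live in shared metadata repositories rather than in a Python list; only the shape of the query is meant to carry over.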

Learn about NLE

This department hosts a mirror of the ACL Anthology, a digital archive of 50k pages spanning 40 years: http://www.cs.mu.oz.au/acl/

SLDS Architecture

SLDS Components

Another SLDS Architecture

Observations

Common components, different arrangements

Multiple components for doing the same task

Most NLE components convert between information types, e.g. parser: from strings to trees; ASR: from speech to text; summariser: from text to selected text

But many processes benefit from other information sources (e.g. exploiting intonation in the input)

Input and output can be aligned

Solution: multilayer annotations
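
The "components convert between information types" view can be sketched as typed converters that are arranged by search. This is a minimal illustration with hypothetical component names; it shows common components in different arrangements, but a pure chain of conversions has no way to consult side information such as intonation, which is the limitation that multilayer annotations address.

```python
from collections import deque
from typing import Optional

# Each hypothetical component is described only by the information type it
# consumes and the type it produces (names are illustrative, not a real API).
COMPONENTS = {
    "asr": ("speech", "text"),
    "tokenizer": ("text", "tokens"),
    "parser": ("tokens", "trees"),
    "summariser": ("text", "selected-text"),
}

def find_pipeline(source: str, target: str) -> Optional[list[str]]:
    """Breadth-first search for the shortest chain of components that
    converts `source` information into `target` information."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        current, path = queue.popleft()
        if current == target:
            return path
        for name, (inp, out) in COMPONENTS.items():
            if inp == current and out not in seen:
                seen.add(out)
                queue.append((out, path + [name]))
    return None  # no arrangement of known components reaches the target

print(find_pipeline("speech", "trees"))  # ['asr', 'tokenizer', 'parser']
```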

Multilayer Annotations

Multilayer Annotations (continued)

Annotation Graphs

Labelled digraphs with timestamped nodes
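
A minimal sketch of such a graph, assuming nodes carry optional timestamps in seconds and each arc carries a layer name and a label. This is not the AGTK API; the class and method names are hypothetical. Because the word layer and an intonation layer share nodes, both can be queried against the same time interval, which is the kind of alignment a strict pipeline loses.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AnnotationGraph:
    """Hypothetical minimal annotation graph: node id -> optional timestamp,
    plus labelled arcs (start node, end node, layer, label)."""
    times: dict[str, Optional[float]] = field(default_factory=dict)
    arcs: list[tuple[str, str, str, str]] = field(default_factory=list)

    def add_node(self, node_id: str, time: Optional[float] = None) -> None:
        self.times[node_id] = time

    def add_arc(self, start: str, end: str, layer: str, label: str) -> None:
        self.arcs.append((start, end, layer, label))

    def overlapping(self, t0: float, t1: float, layer: str) -> list[str]:
        """Labels on `layer` whose timestamped span overlaps [t0, t1).
        Arcs touching an untimed node are skipped."""
        result = []
        for start, end, arc_layer, label in self.arcs:
            s, e = self.times[start], self.times[end]
            if arc_layer == layer and s is not None and e is not None and s < t1 and e > t0:
                result.append(label)
        return result

ag = AnnotationGraph()
for node, t in [("n1", 0.0), ("n2", 0.4), ("n3", 0.9)]:
    ag.add_node(node, t)
ag.add_arc("n1", "n2", "word", "hello")
ag.add_arc("n2", "n3", "word", "world")
ag.add_arc("n1", "n3", "intonation", "rise")  # spans both words
print(ag.overlapping(0.5, 0.8, "word"))        # ['world']
print(ag.overlapping(0.5, 0.8, "intonation"))  # ['rise']
```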

Annotation Graphs: complex example

AGTK: Annotation Graph Toolkit (library and applications), agtk.sourceforge.net

NLE and Grids

NLE applications are typically constructed out of numerous components, each responsible for a specialised task and executed against large data sets

To use grids in NLE, subscribe to a model which allows automated discovery of data and components, flexible design of applications, coordination of execution, and storage of results

Ideally, view the grid as a commodity, hidden from application developers
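
One way to picture "grid as a commodity" for parametric processing: the application hands a component, data and parameter settings to an executor and never sees where the jobs run. In this minimal sketch a local `ThreadPoolExecutor` stands in for a grid scheduler (an assumption; a real grid back end would have its own interface), and `run_parametric` and `truncate_summary` are hypothetical names, the latter a toy placeholder for a summariser.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from itertools import product

def run_parametric(component, documents, param_grid, executor: Executor) -> dict:
    """Apply `component` to every (document, parameter setting) pair.
    Swapping the executor changes where jobs run, not the application code."""
    jobs = list(product(documents, param_grid))
    results = executor.map(lambda job: component(job[0], **job[1]), jobs)
    # Key each result by document and a hashable view of its parameters.
    return {(doc, tuple(sorted(p.items()))): r for (doc, p), r in zip(jobs, results)}

def truncate_summary(text: str, max_words: int) -> str:
    """Toy stand-in for a summariser: keep the first `max_words` words."""
    return " ".join(text.split()[:max_words])

docs = ["the cat sat on the mat", "grids hide execution details"]
grid = [{"max_words": 2}, {"max_words": 4}]
with ThreadPoolExecutor(max_workers=4) as pool:
    out = run_parametric(truncate_summary, docs, grid, pool)
print(out[("the cat sat on the mat", (("max_words", 2),))])  # prints: the cat
```

Running every parameter setting over the same documents also gives the "comparing systems running on the same data" setup for evaluation.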

Architectural Components

Data: language resources for analysis, e.g. Switchboard, 2400 annotated telephone conversations (26 CDs)

Software components: minimal individual functional units, e.g. Annotation Server, Alignment, ASR, Data Source Packaging, Format Conversion, Text Annotation, Lexicon Server, Semantic Mapping; common interface specification

Metadata repositories: Dublin Core Application Profile for NLE resources

Application: data + components + processing instructions; declarative specification in XML

Grid service: computational and storage resources for application execution
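
The idea of an application as data plus components plus processing instructions, declared in XML, can be sketched with Python's standard `xml.etree.ElementTree`. The element and attribute names below are hypothetical, not a published schema, and `load_application` is an illustrative helper; it turns the declaration into a structure a scheduler could act on.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML application specification (invented element names).
SPEC = """
<application name="switchboard-entities">
  <data source="switchboard" format="annotation-graph"/>
  <component name="tokenizer"/>
  <component name="entity-detector">
    <param name="model" value="news"/>
  </component>
  <output store="results/ie"/>
</application>
"""

def load_application(xml_text: str) -> dict:
    """Parse the declarative spec into its data source, an ordered list of
    (component, parameters) pairs, and an output location."""
    root = ET.fromstring(xml_text.strip())
    data = root.find("data")
    components = [
        (c.get("name"), {p.get("name"): p.get("value") for p in c.findall("param")})
        for c in root.findall("component")
    ]
    return {
        "name": root.get("name"),
        "data": (data.get("source"), data.get("format")),
        "components": components,
        "output": root.find("output").get("store"),
    }

app = load_application(SPEC)
print(app["components"])
# [('tokenizer', {}), ('entity-detector', {'model': 'news'})]
```

Keeping the specification declarative means the same file could, in principle, be executed on different grid resources without changing it.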

Architecture

Conclusion

Natural Language Engineering is an interesting test case for grid services: many mature component technologies; applications that are both data- and processor-intensive; applications for building the multilingual information society of the future...