Cross Language Information Exploitation of Arabic
Dr. Elizabeth D. Liddy
Center for Natural Language Processing
School of Information Studies
Syracuse University
Why Cross-Language Systems Matter
• There are approximately 4,500 living languages
• 32 million Americans switch from English to another language when they get home from work (U.S. Census 1990)
• Internationally, some who wish to do harm to the US communicate in other languages
• There are too few intel analysts who know the languages of interest
Internet Language Statistics
http://global-reach.biz/globstats/
Internet Language Statistics (2)
http://www.glreach.com/globstats/evol.html
How Cross-Language Retrieval Works
• A user who speaks just one language asks their question of the system in that language
• Cross-language retrieval system:
– Will have indexed documents (e.g. foreign reports, emails, message traffic) written in other languages
– Translates the user’s query into the language of the documents
– Matches the translated query against the document index
– Produces a ranked list of relevant documents that are automatically translated into the user’s language
• User then reads the documents in their own language
• User can now make more fully informed decisions
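The query-translation and matching steps can be illustrated with a minimal sketch. It is not the Syracuse system: the toy English-to-Spanish dictionary `EN_TO_ES`, the three-document collection `DOCS`, and the function names are all hypothetical, and a simple TF-IDF weighting stands in for whatever ranking the real system uses. The final step, translating retrieved documents back into the user's language, is left out.

```python
import math
from collections import Counter

# Hypothetical toy bilingual dictionary (assumption, not from the slides).
# A real system would rely on a large lexicon or machine translation.
EN_TO_ES = {
    "report": ["informe"],
    "border": ["frontera"],
    "attack": ["ataque", "atentado"],
}

# Hypothetical foreign-language collection standing in for reports or emails.
DOCS = {
    "d1": "informe sobre el ataque en la frontera",
    "d2": "mensaje sobre el mercado de la ciudad",
    "d3": "el atentado fue descrito en un informe",
}


def tokenize(text):
    """Lowercase and split on whitespace (enough for this toy data)."""
    return text.lower().split()


def build_index(docs):
    """Return per-document term counts and per-term document frequencies."""
    term_counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    return term_counts, doc_freq


def translate_query(query, dictionary):
    """Replace each query word with all of its translations; drop unknown words."""
    translated = []
    for word in tokenize(query):
        translated.extend(dictionary.get(word, []))
    return translated


def rank(query, docs, dictionary):
    """Translate the query, score each document by TF-IDF, return a ranked list."""
    term_counts, doc_freq = build_index(docs)
    n_docs = len(docs)
    terms = translate_query(query, dictionary)
    scores = {}
    for doc_id, counts in term_counts.items():
        score = 0.0
        for term in terms:
            if counts[term]:
                score += counts[term] * math.log(1 + n_docs / doc_freq[term])
        if score > 0:
            scores[doc_id] = score
    # Highest-scoring documents first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for doc_id, score in rank("border attack report", DOCS, EN_TO_ES):
        print(doc_id, round(score, 3), DOCS[doc_id])
```

Keeping every dictionary translation of an ambiguous word (here "attack" maps to both "ataque" and "atentado") trades some precision for recall; the choice is made in this sketch only to keep it short.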
SU’s Cross-Language Retrieval Research
• Have produced systems for French, Spanish, Japanese
– DARPA, Intel, & corporate funding of $3.5 million
• Currently working in Dutch and Chinese
– 2nd demo is of cross-language English-Chinese on a patent database from China
• Today’s funding announcement will enable us to specialize our current cross-language retrieval capabilities for Arabic
• Future work on information extraction and visualization in Arabic is of keen interest
1. LIVIA – English/English IR System
• Accepts users’ natural language expressions of complex information needs
• Provides precise retrieval against government compiled documents about terrorist activities
• Core technology funded by DARPA and Syracuse Research Corporation
• Demo’d by Ozgur Yilmazel
2. English-Chinese Retrieval Demo
• Cross-Language Retrieval of English queries against a Chinese patent database
– Development funded by Unilever Corp, a multinational corporation which owns 140 companies in more than 100 countries
• Jiang Ping Chen
– PhD student in School of Information Studies
Look to the Future
• Incorporate the next level of sophistication in Information Exploitation into Arabic
• Here seen in English
– Adds Information Extraction as next step to Information Retrieval
• Seek your ongoing support for its extension into Arabic
Thank You!
Questions?
Care to try a query on LIVIA?