Page 1: Cross Language Information Exploitation of Arabic Dr. Elizabeth D. Liddy Center for Natural Language Processing School of Information Studies Syracuse.

Cross Language Information Exploitation of Arabic

Dr. Elizabeth D. Liddy

Center for Natural Language Processing

School of Information Studies

Syracuse University

Why Cross-Language Systems Matter

• There are approximately 4,500 living languages

• 32 million Americans switch from English to another language when they get home from work (U.S. Census 1990)

• Internationally, some who wish to do harm to the US communicate in other languages

• There are too few intel analysts who know the languages of interest

Internet Language Statistics

http://global-reach.biz/globstats/

Internet Language Statistics (2)

http://www.glreach.com/globstats/evol.html

How Cross-Language Retrieval Works

• User who speaks just one language asks their question of the system in that language

• Cross-language retrieval system:

– Will have indexed documents (e.g. foreign reports, emails, message traffic) written in other languages

– Translates user query into language of the documents

– Matches translated query against document index

– Produces a ranked list of relevant documents that are automatically translated into user’s language

• User then reads documents in their own language

• User can now make more fully informed decisions
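
The retrieval steps above can be illustrated with a minimal sketch. It is not the system described in these slides: the bilingual dictionary, the three French documents, and every function name are hypothetical, and simple dictionary lookup with term-count ranking stands in for whatever translation and matching methods a real system would use. The last step, translating the retrieved documents back into the user's language, is left out.

```python
from collections import Counter, defaultdict

# Hypothetical toy bilingual dictionary (English -> French); a real system
# would rely on far richer translation resources.
EN_TO_FR = {
    "report": ["rapport"],
    "border": ["frontière"],
    "attack": ["attaque", "attentat"],
}

# Hypothetical document collection written in the foreign language.
DOCUMENTS = {
    "doc1": "rapport sur la frontière nord",
    "doc2": "attentat signalé près de la frontière",
    "doc3": "rapport annuel sur le commerce",
}


def build_index(documents):
    """Map each term to the documents containing it, with term counts."""
    index = defaultdict(Counter)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term][doc_id] += 1
    return index


def translate_query(query, dictionary):
    """Replace each query word with all of its dictionary translations."""
    translated = []
    for word in query.lower().split():
        # Words missing from the dictionary are simply dropped.
        translated.extend(dictionary.get(word, []))
    return translated


def search(query, index, dictionary):
    """Return document ids ranked by how often translated terms occur."""
    scores = Counter()
    for term in translate_query(query, dictionary):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Sort by descending score, then by id so ties are deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


INDEX = build_index(DOCUMENTS)
RESULTS = search("border attack report", INDEX, EN_TO_FR)
print(RESULTS)
```

Keeping every translation of an ambiguous word ("attack" maps to both "attaque" and "attentat") is one simple way to avoid missing documents when the right sense is unknown, at the cost of some precision.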

SU’s Cross-Language Retrieval Research

• Have produced systems for French, Spanish, Japanese

– DARPA, Intel, & corporate funding of $3.5 million

• Currently working in Dutch and Chinese

– 2nd demo is of cross-language English-Chinese on a patent database from China

• Today’s funding announcement will enable us to specialize our current cross-language retrieval capabilities for Arabic

• Future work on information extraction and visualization in Arabic is of keen interest

1. LIVIA – English/English IR System

• Accepts users’ natural language expressions of complex information needs

• Provides precise retrieval against government-compiled documents about terrorist activities

• Core technology funded by DARPA and Syracuse Research Corporation

• Demo’d by Ozgur Yilmazel

2. English-Chinese Retrieval Demo

• Cross-Language Retrieval of English queries against a Chinese patent database

– Development funded by Unilever Corp, a multinational corporation which owns 140 companies in more than 100 countries

• Jiang Ping Chen

– PhD student in School of Information Studies

Look to the Future

• Incorporate the next level of sophistication in Information Exploitation into Arabic

• Here seen in English

– Adds Information Extraction as next step to Information Retrieval

• Seek your ongoing support for its extension into Arabic

Thank You!

Questions?

Care to try a query on LIVIA?