Language technology in Africa: Prospects Arvi Hurskainen University of Helsinki.


Page 1

Language technology in Africa:

Prospects

Arvi Hurskainen

University of Helsinki

Page 2

Why LT for African languages?

• LT is currently considered a necessary field of development in most languages.

• Why should African languages be neglected?

Page 3

Current state

• Compared with other continents, LT in Africa is only taking its first steps.

Page 4

Current state

• The latest issue of MultiLingual, a periodical with 15,000 subscribers, was supposed to concentrate on LT in Africa.

• The only article discussing genuine LT was the one describing Swahili Language Manager (SALAMA)

• Another article on Africa was written by a freelancer on public domain localization in South Africa.

• That was all for Africa.

Page 5

• In LT the gap between well-resourced and poorly resourced languages is bigger than in any other field.

• My impression is that even today half of global investment in LT goes to English.

Page 6

• African languages are triply handicapped:
  – Commercial sector not interested
  – Local governments poor: little or no public support
  – African languages have features that need different approaches than those used in mainstream LT

Page 7

Language technology (LT)

• Labour-intensive
  – Trivial results quickly
  – Useful results require several man-years of work

• Although the development of LT is expensive, the results can be very rewarding

Page 8

Language technology (LT)

• LT built on a modular basis can result in several kinds of applications
  – An additional application can make use of earlier modules and thus costs can be reduced

• Once developed, LT applications can be widely distributed with minimal cost

Page 9

Language technology (LT)

• Experience of LT in other languages available
  – Wrong tracks can be avoided
  – Solutions applied in other languages can be tested in African languages

Page 10

Language technology (LT)

• LT of African languages NOT mere application from other languages

• African languages have special features
  – Very rich morphology
  – Noun classes
  – Complex verb formation
  – Serial verbs
  – Non-concatenative processes
  – Reduplication
  – Inflecting idioms and other multi-word expressions
  – Tones
    • Lexical
    • Grammatical

Page 11

Feasibility of LT in Africa

• Question: If African languages have several special features regarded as ‘problems’, is it feasible to develop language technology for those languages?

• Answer: Some ‘problems’ can be turned into advantages

Page 12

Rich morphology

• Requires efficient development environment to succeed, but

• Can be very useful in disambiguation (= choice of correct interpretation) and syntactic analysis.

Page 13

Poor morphology vs. rich morphology

• Poor morphology (e.g. English)
  – Easy to analyze morphologically, but
  – Difficult to disambiguate and analyze syntactically and semantically
• Rich morphology (e.g. Bantu languages)
  – Difficult to analyze morphologically, but
  – Less difficult to disambiguate and analyze syntactically

Page 14

LT applications

• Applications for end-users:
  – spelling correctors
  – hyphenators
  – grammar checkers
  – thesauri
  – electronic dictionaries
  – MT applications
  – multilingual speech applications

Page 15

LT applications

• Applications for developers:
  – dictionary compilers
  – dictionary evaluators
  – MT development environments
  – information retrieval and data mining

Page 16

Machine Translation (MT)

• Text-to-text MT
  – Official texts (government, AU, UN, SADC, business, manuals, teaching)
  – News texts
  – Communication through email in international organizations
• Speech-to-speech MT
  – Simultaneous interpretation
  – Multilingual phone calls

Page 17

Phases of speech-to-speech MT

1. Speech recognition
   • Transforming speech signal to text
2. Tokenization of text
   • Identifying ‘words’, punctuation marks, diacritics etc.
3. Morphological analysis
   • Analyzing each morphological unit and providing it with codes (tags)
4. Morphological disambiguation
   • Determining correct interpretation

Page 18

Phases of speech-to-speech MT

5. Syntactic mapping
   • Providing words with syntactic tags
6. Semantic disambiguation
   • Choosing the correct semantic meaning
7. Multi-word units
   • Isolating multi-word expressions and giving correct interpretation
     • Idioms
     • Proverbs
     • Adjectival expressions
     • Compound nouns
     • Serial verb constructions

Page 19

Phases of speech-to-speech MT

8. Managing word order
   • Re-ordering word sequences to meet the rules of the target language
   • Inclusion and exclusion of pronouns and articles
9. Producing surface forms of target language
10. Clean text in target language
11. Text-to-speech conversion

Page 20

1. Tokenization

*mtu aliyepata taarifa alipiga simu , kukaa na kungoja
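A minimal sketch of this step in Python, assuming that tokenization only has to split on whitespace, detach punctuation marks, and mark an upper-case initial with a leading asterisk. The CAP tag in the morphological analysis of `*mtu` suggests this reading of the asterisk, but it is an assumption. The `tokenize` function and its regular expression are illustrative; they are not SALAMA's tokenizer.

```python
import re
from typing import List

# A token is either a run of word characters or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def tokenize(text: str) -> List[str]:
    """Split text into tokens; mark a capitalised word with a leading '*'."""
    tokens = []
    for tok in TOKEN_RE.findall(text):
        if tok[0].isupper():
            # Assumed convention: '*' plus the lower-cased word.
            tok = "*" + tok[0].lower() + tok[1:]
        tokens.append(tok)
    return tokens

tokens = tokenize("Mtu aliyepata taarifa alipiga simu, kukaa na kungoja")
print(" ".join(tokens))
```

A production tokenizer would need more care, for example with word-internal apostrophes as in ng'ombe, which this regular expression would split.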

Page 21

2. Morphological analysis

*mtu
  "mtu" N CAP 1/2-SG { the } { man }
aliyepata
  "pata" V 1/2-SG3-SP VFIN { he/she } PAST 1/2-SG-REL { who } z [pata] { get } SVO
taarifa
  "taarifa" N 9/10-SG { the } { report } AR
  "taarifa" N 9/10-PL { the } { report } AR
alipiga
  "piga" V 1/2-SG3-SP VFIN { he/she } PAST z [piga] { hit } SVO ACT
  "piga" V 1/2-SG3-SP VFIN { he/she } PR:a 5/6-SG-OBJ OBJ { it } z [piga] { hit } SVO ACT
simu
  "simu" N 9/10-SG { the } { telephone }
  "simu" N 9/10-SG { the } { type of sardine or sprat } AN
  "simu" N 9/10-PL { the } { telephone }
  "simu" N 9/10-PL { the } { type of sardine or sprat } AN
,
  "," COMMA { , }
kukaa
  "kaa" V INF { to } z [kaa] { sit } SV SVO
  "kaa" V INF NO-TO z [kaa] { sit } SV SVO
na
  "na" CC { and }
  "na" AG-PART { by }
  "na" PREP { with }
  "na" NA-POSS { of }
  "na" ADV NOART { past }
kungoja
  "ngoja" V INF { to } z [ngoja] { wait } SV
  "ngoja" V INF NO-TO z [ngoja] { wait } SV

Page 22

3. Disambiguation, isolating MWE

*mtu
  "mtu" N 1/2-SG { the } { man } @SUBJ
aliyepata
  "pata" V 1/2-SG3-SP VFIN { he/she } PAST 1/2-SG-REL { who } z [pata] { get } SVO @FMAINVtr+OBJ>
taarifa
  "taarifa" N 9/10-SG { the } { report } AR @OBJ
alipiga
  "piga" V 1/2-SG3-SP VFIN { he/she } PAST z SVO ACT IDIOM-V> @FMAINVtr-OBJ>
simu
  "simu" <IDIOM { call }
,
  "," COMMA { , }
kukaa
  "kaa" V INF { to } z [kaa] { sit } SV SVO @-FMAINV-n
  "kaa" V INF NO-TO z [kaa] { sit } SV SVO @-FMAINV-n
na
  "na" CC { and } @CC
kungoja
  "ngoja" V INF { to } z [ngoja] { wait } SV SVO @-FMAINV-n
  "ngoja" V INF NO-TO z [ngoja] { wait } SV SVO @-FMAINV-n
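The effect of disambiguation on `na` (five readings after morphological analysis, one after disambiguation) can be illustrated with a toy context rule in Python: keep the conjunction reading when the word stands between two infinitives. The rule and the name `select_cc_between_infinitives` are hypothetical and only mimic the style of a constraint rule; they are not taken from SALAMA or from any actual rule file.

```python
from typing import List

Cohort = List[str]  # all candidate readings of one token

def has_tags(reading: str, tags: str) -> bool:
    """True if the space-separated tag sequence occurs in the reading."""
    return f" {tags} " in f" {reading} "

def select_cc_between_infinitives(cohorts: List[Cohort]) -> List[Cohort]:
    """Keep only CC readings of a token whose neighbours are all infinitives."""
    result = [list(c) for c in cohorts]
    for i in range(1, len(cohorts) - 1):
        prev_inf = all(has_tags(r, "V INF") for r in cohorts[i - 1])
        next_inf = all(has_tags(r, "V INF") for r in cohorts[i + 1])
        cc = [r for r in cohorts[i] if has_tags(r, "CC")]
        if prev_inf and next_inf and cc:
            result[i] = cc
    return result

cohorts = [
    ['"kaa" V INF { to } z [kaa] { sit } SV SVO',
     '"kaa" V INF NO-TO z [kaa] { sit } SV SVO'],
    ['"na" CC { and }', '"na" AG-PART { by }', '"na" PREP { with }',
     '"na" NA-POSS { of }', '"na" ADV NOART { past }'],
    ['"ngoja" V INF { to } z [ngoja] { wait } SV',
     '"ngoja" V INF NO-TO z [ngoja] { wait } SV'],
]
disambiguated = select_cc_between_infinitives(cohorts)
print(disambiguated[1])
```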

Page 23

4. Isolating MWE

• ( N 1/2-SG { the } { man } @SUBJ ) ( V 1/2-SG3-SP VFIN { he/she } PAST 1/2-SG-REL { who } z { get } SVO @FMAINVtr+OBJ> ) ( N 9/10-SG { the } { report } @OBJ ) ( V 1/2-SG3-SP VFIN { he/she } PAST z SVO ACT IDIOM-V> @FMAINVtr-OBJ> <IDIOM { call } ) ( COMMA { , } ) ( V INF { to } z { sit } SV SVO @-FMAINV-n ) ( CC { and } @CC ) ( V INF { to } z { wait } SV @-FMAINV-n )

Page 24

5. Word-per-line format

( N 1/2-SG { the } { man } @SUBJ )
( V 1/2-SG3-SP VFIN { he/she } PAST 1/2-SG-REL { who } z { get } SVO @FMAINVtr+OBJ> )
( N 9/10-SG { the } { report } @OBJ )
( V 1/2-SG3-SP VFIN { he/she } PAST z SVO ACT IDIOM-V> @FMAINVtr-OBJ> <IDIOM { call } )
( COMMA { , } )
( V INF { to } z { sit } SV SVO @-FMAINV-n )
( CC { and } @CC )
( V INF { to } z { wait } SV @-FMAINV-n )

Page 25

6. Copying info on serial verbs

( N 1/2-SG { the } { man } @SUBJ )
( V 1/2-SG3-SP VFIN PAST 1/2-SG-REL { who } z { get } SVO @FMAINVtr+OBJ> )
( N 9/10-SG { the } { report } @OBJ )
( V 1/2-SG3-SP VFIN PAST z SVO ACT IDIOM-V> @FMAINVtr-OBJ> <IDIOM { call } )
( COMMA { , } )
( V 1/2-SG3-SP VFIN PAST z { sit } SV SVO @FMAINV-n )
( CC { and } @CC )
( V 1/2-SG3-SP VFIN PAST z { wait } SV SVO @FMAINV-n )

Page 26

7. Construct word order

( N 1/2-SG { the } { man } @SUBJ )
( V 1/2-SG3-SP VFIN PAST 1/2-SG-REL { who } z { get } SVO @FMAINVtr+OBJ> )
( N 9/10-SG { the } { report } @OBJ )
( V 1/2-SG3-SP VFIN PAST z SVO ACT IDIOM-V> @FMAINVtr-OBJ> <IDIOM { call } )
( COMMA { , } )
( V 1/2-SG3-SP VFIN PAST z { sit } SV SVO @FMAINV-n )
( CC { and } @CC )
( V 1/2-SG3-SP VFIN PAST z { wait } SV SVO @FMAINV-n )

Page 27

8. Surface form in target language

( N 1/2-SG { the } { man } @SUBJ )
( V 1/2-SG3-SP VFIN PAST 1/2-SG-REL { who } z { :got } SVO @FMAINVtr+OBJ> )
( N 9/10-SG { the } { report } @OBJ )
( V 1/2-SG3-SP VFIN PAST z SVO ACT IDIOM-V> @FMAINVtr-OBJ> <IDIOM { :called } )
( COMMA { , } )
( V 1/2-SG3-SP VFIN PAST z { :sat } SV SVO @-FMAINV-n )
( CC { and } @CC )
( V 1/2-SG3-SP VFIN PAST z { :waited } SV @-FMAINV-n )

Translation: the man who got the report called, sat and waited
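The last step, going from tagged units to clean target text, can be sketched in Python as collecting the glosses in braces and joining them. This is only an illustration: the name `surface_text` is hypothetical, and the leading colon on glosses such as `{ :got }` is simply stripped, since the slides do not state what it marks.

```python
import re
from typing import List

UNITS: List[str] = [
    "( N 1/2-SG { the } { man } @SUBJ )",
    "( V 1/2-SG3-SP VFIN PAST 1/2-SG-REL { who } z { :got } SVO @FMAINVtr+OBJ> )",
    "( N 9/10-SG { the } { report } @OBJ )",
    "( V 1/2-SG3-SP VFIN PAST z SVO ACT IDIOM-V> @FMAINVtr-OBJ> <IDIOM { :called } )",
    "( COMMA { , } )",
    "( V 1/2-SG3-SP VFIN PAST z { :sat } SV SVO @-FMAINV-n )",
    "( CC { and } @CC )",
    "( V 1/2-SG3-SP VFIN PAST z { :waited } SV @-FMAINV-n )",
]

# Everything between '{' and '}' is a target-language gloss.
GLOSS_RE = re.compile(r"\{\s*(.*?)\s*\}")

def surface_text(units: List[str]) -> str:
    """Join the glosses of all units into one clean target-language string."""
    words = []
    for unit in units:
        for gloss in GLOSS_RE.findall(unit):
            words.append(gloss.lstrip(":"))  # assumed: ':' is only a marker
    text = " ".join(words)
    # Attach punctuation marks to the preceding word.
    return re.sub(r"\s+([,.;!?])", r"\1", text)

print(surface_text(UNITS))  # the man who got the report called, sat and waited
```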

Page 28

Organizing the work

• How should the work be organised on a continent of hundreds of languages?

• Prioritising languages
  – ‘Big’ languages first due to their strategic importance
  – Some minor languages may have special political or scientific importance

Page 29

Organizing the work

• Scientific infrastructure
  – Such as ELRA (European Language Resources Association) and
  – ELDA (European Language Resources Distribution Agency)
• Africa needs something similar
• An initiative was launched at the LREC 2006 conference in Genoa to establish such an infrastructure

Page 30

Organizing the work

• Networking extremely important
  – Geographical distances between actors are immense
  – Ensures efficient communication and distribution of ideas
  – Ensures that the best and tested approaches will become a standard in LT
  – Motivates in this tough work

Page 31

Networking

• A Wikipedia-type forum as an information and discussion centre for LT in Africa
  http://forums.csc.fi/kitwiki/pilot/view/KitWiki/Community/AfricanActivities

Page 32

KitWiki/Community/AfricanActivities

Organizations, networks and activities related to LT for African languages

Key Areas
– LT Policy
– LT Resources
  • Helsinki Corpus of Swahili
– LT Research and Development
  • SALAMA - Swahili Language Manager
  • Nordic Journal of African Studies (NJAS)
– LT Training and Education
– LT Legislation
– LT Business Activities
– Other Activities

History: r3 - 29 Jul 2006 - 09:03 - ArviHurskainen

Page 33

EDULINK initiative

• In 2006 the EU started to support networking between higher education institutions

• EDULINK-ACP-EU Cooperation Programme in Higher Education

Page 34

EDULINK initiative

• EDULINK is the first ACP-EU Cooperation Programme in Higher Education

• EDULINK is financed by the European Commission under the 9th EDF and is managed by the ACP Secretariat.

Page 35

EDULINK initiative

• EDULINK promotes networking of HEIs in ACP States and the eligible EU Member States through funding of joint projects.

Page 36

EDULINK initiative: Language technology for African languages

• Consortium of five universities
  – Dar-es-Salaam
  – Nairobi
  – Ghana
  – Hawassa (Ethiopia)
  – Helsinki
• Associates
  – UNISA, Stellenbosch, SA
  – Trondheim

Page 37

EDULINK initiative: Language technology for African languages

• Aims
  – Training in LT
    • Workshops
    • Training courses
    • Summer School in LT
    • Evaluation
  – Developing new LT
    • Language corpora
    • Morphological parsers
    • Speech technology
    • MT (further development of SALAMA)

Page 38

Development environments

• Environments with property rights
  – Can be obtained through licensing for development purposes
  – Can also be available for a nominal price, e.g. the xfst package of Xerox
  – Cannot be included in a product without a separate agreement with the property owner

Page 39

Development environments

• Open domain environments
  – Free for development
  – Free for inclusion into a product

Page 40

Availability of development environments

• In morphology
  – xfst package of Xerox using finite state methods is most popular
  – Free for development but not free for inclusion into a product

Page 41

Availability of development environments

• In disambiguation and syntactic mapping
  – CG-2 and Functional Dependency Grammar (FDG) of Connexor
    • Only through licensing
    • Not free for inclusion into a product

Page 42

Availability of development environments

• In disambiguation and syntactic mapping
  – CG-3 is an open source product
    • Free for developing
    • Free for inclusion into a product

• http://beta.visl.sdu.dk/constraint_grammar.html

Page 43

Developing open source technology

• Efforts to move SALAMA to open domain

Page 44

Two implementations of SALAMA

Comparison of two methods for morphological analysis
– Analysis using finite state method (PR)
and
– Analysis using two-phase method (OS)

Page 45

Two implementations of SALAMA

Finite state method
– Good
  • Very fast, 4500 w/s in SWATWOL
  • Facilitates description on more than one level
    – Two-level description most common

Page 46

Two implementations of SALAMA

Finite state method
– Good

• The use of two-level rules simplifies the structure of the dictionary

• The whole morphology can be described in one phase

• Can be used for simulating linguistic processes (good for research purposes)

Page 47

Two implementations of SALAMA

Finite state method
– Bad

• Difficult in handling non-concatenative processes (does not ‘see behind’)

• Writing a reliable rule system is difficult

• In constructing the lexicon, the influence of the rules must be anticipated

Page 48

Two implementations of SALAMA

Finite state method
– Bad

• Because the lexicon is a tree-structure, the whole language should be described with one single lexicon

• Difficulties in compiling very large lexicons

• No open source platform available

Page 49

Two implementations of SALAMA

Two-phase method - description
– In the first phase, the word is described using pattern matching rules
  • Produces meta-tags with two parts
  • Example:
    – unanifundisha
      "fundisha" [funda] V uSP naTAM niOBJ ishaVE
    – uSP
      » u = string in the word
      » SP = tag meaning subject prefix

Page 50

Two implementations of SALAMA

Two-phase method
– In the second phase, meta-tags are rewritten as final tags
– uSP >
  – 1/2-SG2-SP VFIN { you }
  – 3/4-SG-SP VFIN { it }
  – 11-SG-SP VFIN { it }
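The second phase can be sketched in Python as a table lookup in which one meta-tag may expand into several final readings. Only the three expansions of `uSP` come from the slide; the entries for `naTAM`, `niOBJ` and `ishaVE` are inferred from the analysis of unanifundisha and should be read as assumptions. The names `REWRITE` and `expand` are hypothetical, and the lexical gloss `{ teach }` is left out because the slides do not show at which point it is attached.

```python
from itertools import product
from typing import Dict, List

# Meta-tag -> possible final tag strings (an ambiguous meta-tag has several).
REWRITE: Dict[str, List[str]] = {
    "uSP": [
        "1/2-SG2-SP VFIN { you }",
        "3/4-SG-SP VFIN { it }",
        "11-SG-SP VFIN { it }",
    ],
    "naTAM": ["PR:na"],                # assumed mapping
    "niOBJ": ["1/2-SG1-OBJ { me }"],   # assumed mapping
    "ishaVE": ["CAUS"],                # assumed mapping
}

def expand(phase1_reading: str) -> List[str]:
    """Rewrite every meta-tag; return one final reading per combination."""
    options = [REWRITE.get(tok, [tok]) for tok in phase1_reading.split()]
    return [" ".join(choice) for choice in product(*options)]

readings = expand('"fundisha" [funda] V uSP naTAM niOBJ ishaVE')
for reading in readings:
    print(reading)
```

Because the expansion is a plain cross product over small lists, the order of the entries in `REWRITE` directly determines the order of the output readings.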

Page 51

Two implementations of SALAMA

Result after the first phase:
unanifundisha
  "fundisha" [funda] V uSP naTAM niOBJ ishaVE

Result after the second phase:
unanifundisha
  "fundisha" [funda] V 1/2-SG2-SP VFIN { you } PR:na 1/2-SG1-OBJ { me } { teach } CAUS
  "fundisha" [funda] V 3/4-SG-SP VFIN { it } PR:na 1/2-SG1-OBJ { me } { teach } CAUS
  "fundisha" [funda] V 11-SG-SP VFIN { it } PR:na 1/2-SG1-OBJ { me } { teach } CAUS

Page 52

Two implementations of SALAMA

Two-phase method
– Good

• No specific development platform needed

• Task divided into two phases – makes the description of each phase more manageable

• No compilation problems, because the system is composed of a number of separate rules, each performing a specific task

Page 53

Two implementations of SALAMA

Two-phase method
– Good
  • Optimal order of readings can be controlled, which helps in disambiguation
  • No ownership restrictions
  • The product free for distribution

Page 54

Two implementations of SALAMA

Two-phase method
– Bad

• Requires fairly good programming skills

• Because two-level rules cannot be used for simplifying the lexicon, the lexicon becomes complex

• The absence of state transition (found in fst methods) increases the need for copying word stems in complex word structures, e.g. verbs in Bantu languages

Page 55

Two implementations of SALAMA

The complexity of the lexicon can be reduced by allowing some overproduction, which will be removed afterwards with rules that check ungrammatical tag combinations

Page 56

Two implementations of SALAMA

Example: reciprocal and passive extensions block the object prefix
– wananifundishana (ungrammatical)
  "fundishana" [funda] V waSP naTAM niOBJ ishanaVE

In post-processing, the string will be removed with the rule that states that niOBJ and anaVE cannot co-occur
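Such a post-processing rule can be sketched in Python as a filter over first-phase readings. The sketch assumes that the rule applies to any object meta-tag (ending in `OBJ`) together with any verb-extension meta-tag ending in `anaVE`, which is how it would match `ishanaVE` in the example; the function names are hypothetical, and the passive case is omitted because the slides give no meta-tag for it.

```python
from typing import List

def violates_object_block(reading: str) -> bool:
    """True if an object prefix co-occurs with an extension ending in -ana."""
    tags = reading.split()
    has_object_prefix = any(tag.endswith("OBJ") for tag in tags)
    has_ana_extension = any(tag.endswith("anaVE") for tag in tags)
    return has_object_prefix and has_ana_extension

def remove_overproduction(readings: List[str]) -> List[str]:
    """Drop readings whose meta-tag combination is ungrammatical."""
    return [r for r in readings if not violates_object_block(r)]

candidates = [
    '"fundishana" [funda] V waSP naTAM niOBJ ishanaVE',  # overproduced
    '"fundishana" [funda] V waSP naTAM ishanaVE',        # no object prefix
]
kept = remove_overproduction(candidates)
print(kept)
```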

Page 57

Two implementations of SALAMA

Speed in morphological analysis
– Finite state method: 4500 w/s
– Two-phase method: 500 w/s

Speed in machine translation
– Finite state method: 650 w/s
– Two-phase method: 350 w/s
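The relative cost of the two-phase method follows directly from these figures. A small calculation in Python, using nothing beyond the four numbers on the slide (the variable names are illustrative):

```python
# (finite state w/s, two-phase w/s) per task, as given on the slide
speeds = {
    "morphological analysis": (4500, 500),
    "machine translation": (650, 350),
}

# How many times faster the finite state implementation is for each task.
ratios = {task: fs / tp for task, (fs, tp) in speeds.items()}

for task, ratio in ratios.items():
    print(f"{task}: finite state is {ratio:.2f}x faster")
```

The factor is 9 for morphological analysis alone but below 2 for the whole translation chain, so the speed difference between the analysers matters much less in the complete system.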

Page 58

Two implementations of SALAMA

• CG-3 in disambiguation and syntactic mapping is an open domain product

Page 59

Two implementations of SALAMA

• Rules for re-ordering the sentence structure in the target language can be written with any suitable programming language

Page 60

Two implementations of SALAMA

• Rules for producing the surface form in the target language can be written with any suitable programming language

Page 61

Where?

• The main responsibility for developing LT for African languages should be in those countries where the languages are mostly used
  – Work out and implement a plan
  – Provide resources (human and capital)
  – Make use of the know-how available globally
  – Networking

Page 62

Language resources

• Text corpora
  – Private collections of texts by researchers
  – Text and speech corpora of official languages of South Africa
  – Helsinki Corpus of Swahili (12 m) globally available
    • Texts corrected and edited
    • Morphologically annotated
  – Gikuyu annotated corpus (de Pauw et al.)

Page 63

Language resources

• Manuscripts
  – SOAS (School of Oriental and African Studies, London) holds a large collection of Swahili manuscripts
  – Background info on the Web, but not the manuscripts themselves

Page 64

Language resources

• Corpus compilation in cooperation with publishing houses – so far very little used

• More extensive use of the material available on the Web

Page 65

Dictionaries

• Internet Swahili Dictionary (Yale University)
  – Free and widely used (1 mil web visits monthly)
  – Compiled on voluntary basis
• NOTE! On Sep 5 2007 The Internet Living Swahili Dictionary was taken offline, at least temporarily

Page 66

Dictionaries

• TUKI (Taasisi ya Uchunguzi wa Kiswahili) has released a CD version of the
  – Swahili-English and
  – English-Swahili dictionaries
  – Can be edited and used in developing language tools

Page 67

Grammars

• Electronic grammars missing
• SALAMA (Swahili Language Manager) contains a comprehensive grammar of Swahili

• SALAMA-DC has a potential of compiling extensive dictionaries with translated use examples

Page 68

Tools

• Spell checkers based on word lists available for a number of languages
  – Kilinux (Kiswahili Linux) project
• Spell checkers based on linguistic analysis
  – Orthographix 2 for Swahili (Lingsoft)
  – Swahili speller integrated to MS Office 2007

Page 69

Tools

• SALAMA - A comprehensive environment for developing various kinds of tools

  • Spell checking
  • Information retrieval
  • Vocabulary compilation
  • Concordance compilation
  • Dictionary compilation
  • Machine translation

• So far based on the language in text form

Page 70

Projects in progress

• Localization to Swahili:
  – Windows 2000 and Windows XP (2006)
  – MS Office 2003 (2005)
• Open Swahili Localization Project (KILINUX)
  – Linux to Swahili
  – OpenOffice to Swahili, including a Swahili spell checker

• Ubuntu (a Linux distribution) to Swahili and many other languages

Page 71

Technology projects

• SALAMA
  – Based on linguistic knowledge
  – Maximal amount of linguistic information expressed overtly and systematically
  – Statistical probabilities used in semantic disambiguation
  – Developing environment rather than a task-specific tool
  – Modular structure
  – Extensible

Page 72

Technology projects

• SALAMA
  – Current status:
    • Machine translation from Swahili to English
    • Dictionary compilation
  – Future plans:
    • Machine translation from English to Swahili
    • Integration to speech-to-speech applications

Page 73

Open source platforms

• Need: open source platforms
  – Initiatives do exist for developing open source platforms for morphological analysis

Page 74

Government support in Africa

• Detailed plans on how to proceed and how to finance the work still missing

• South Africa better organized than other areas

• Initiatives for networking
  – Networking the Development of Language Resources for African Languages (LREC 2006, Genoa)
  – EDULINK Initiative (EU)

Page 75

Summary

• Atmosphere positive
• Towards open source solutions
• Special features of African languages to be taken into account
• Systems rather than ad hoc solutions for individual problems
• Networking extremely important