
Workshop: HLT Collaboration, 23-26 November 2008

HLT in South Africa: Yesterday, Today and Tomorrow

Justus Roux

Stellenbosch University Centre for Language and Speech Technology


AIM

• Brunfelsia latifolia

• Focus on official government policy development on HLT in South Africa

• Role players in policy making

• Wish list regarding future planning and policies


YESTERDAY

1999 / 2000:

• First initiative by Pan South African Language Board (PanSALB) and the Department of Arts, Culture, Science and Technology (DACST) towards setting up a “Human Language Technology Project”

• Joint Steering Committee: DACST, PanSALB, Universities: Stellenbosch, Pretoria, UNISA, Bloemfontein, ICOMTEK (CSIR), private translation company

• Task to develop a Strategic Plan for HLT development in South Africa


YESTERDAY

Thinking at that time was very much influenced by:

– European model for ‘Language Engineering’ and FP5 funding for HLT in Europe

– Recognition of particular realities in SA:

• Academic & technical realities – limited – training and reskilling programmes – technology transfer

• Financial realities – co-operation to be sought from Government, Academia, Private sector

• Political realities – official language situation > development of National Lexicographic Units (NLUs)


YESTERDAY

September 2000 – Report – The development of Human Language Technologies in South Africa – Strategic Planning.

Three steps

• Step 1 Create a SA model for HLT development and implementation

– Component 1: Applied research and capacity building (specialised courses at tertiary institutions, short informal courses)

– Component 2: Production of language resources – standards – “Regulatory forum”

– Component 3: Developing enabling technologies – support to innovative projects – funding from Innovation Fund of DACST

– Component 4: Conscious steps to develop HLT industry


YESTERDAY

Step 2 Creation of a legal framework to ensure systematic acquisition of government resources

– Amendment of the Legal Deposit Act (1997)

Step 3 Development of physical infrastructure to manage the implementation of the model (NB: role of the NLUs as an integral part)

• Virtual National Language and Speech Resource Centre

• Virtual National Electronic Language and Speech Data Network

• Regulatory Forum for Human Language Technologies


YESTERDAY

• Strategic plan was accepted (by DACST) and on 8 November 2001 a Ministerial Advisory Panel on HLT was inaugurated with the task to focus on the viability of the establishment of a “virtual national electronic language and speech network”

• 8 members – three of whom are at this meeting

• Report delivered to the Minister in September 2002


YESTERDAY

Recommendations

#1 A virtual HLT Centre to be established with a hub and spoke / nodes configuration (Accepted)


Structure of National Resource Centre for HLT (Virtual Centre: Hub and connected nodes)

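[Diagram: a Managerial Hub connected to nodes, each node with language experts (LE).]

• Managerial Hub: coordination of node activities; data acquisition; data enhancement; data management & backup; training

• Nodes and languages, as labelled in the diagram:
– Centre Y: SA Eng, Afrikaans
– Uni D: N Sotho, Sign Lang
– Uni B: Venda, Tsonga
– Uni A: Xhosa, Swati
– Centre X: Zulu, Ndebele
– Uni C: N Sotho, Tswana

• Also shown: NLU (?), Lang (?)

LE = Language experts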


YESTERDAY

Recommendation #2 (Not accepted)

Establishment of an interim Implementation Secretariat for a period of one year

Instead, an HLT Steering Committee was appointed to oversee implementation within a period of five years

Recommendation #3 (Accepted – not implemented)

HLT development should take place in co-operation with the Presidential National Commission on Information Society and Development

Recommendation #4 (Not accepted – not necessary)

Amendment of Legal Deposit Act (1997)


YESTERDAY

2002

Department of Science and Technology (DST) – National Research and Development Strategy – reference to ICT / HLT (Handout)

2003

• National Language Policy Framework (NLPF) approved by Cabinet (February) – specific reference to HLT in Section 3 (3.3)

• The development of an official HLT Strategy as one of the implementation mechanisms of the NLPF is suggested – Section 4 (4.8) (Refer “TODAY”)

• Establishment of an HLT Unit within the National Language Service

• HLT Steering Committee appointed to oversee implementation of an HLT Resource Centre within a period of five years in collaboration with the HLT Unit of the National Language Service (NLS) (2003-2007)


YESTERDAY

2004

Department of Trade and Industry Report

Benchmarking of Technology – Trends and Technology Developments

Emphasis on the important role of HLT within the economic sector in South Africa.


Summary of technologies with potential high impact on ICT sector

(SA Dept Trade and Industry Report 2004: 10)

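[Chart: technologies plotted by South Africa's ability to respond (horizontal axis, Low to High) against potential impact on industry (vertical axis, Limited to Pervasive). Technologies shown: Mobile, Wireless, HLT, OSS, Telemedicine, Grid computing, Geomatics, RFID, Manufacturing (CAD, Robotics).]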


YESTERDAY

2005

• Establishment of Meraka Institute with HLT Research Group – initiative of Department of Science and Technology (DST)

• National Workshop on HLT (May 2005 – CSIR Conference Centre) – Roadmapping – Main issues and recommendations are in handout.

• During this period several workshops and conference tracks were held:
– PRASA annual conferences
– ALASA SIG on Language and Speech Technology Development
– ALASA International Conferences (special track)
– Roadmapping workshop with State IT Agency (SITA) – Steven Krauwer (BLARKS)


TODAY

Progress of Steering Committee to set up Resource Centre in collaboration with NLS (HLT Unit) (1)

• Draft HLT National Strategy document developed and submitted (Detail Dr Jokweni)

• Great amount of work, but little progress

• The Steering Committee had a strained working relationship with previous Chief Director of NLS, hence two instances of disagreement:

– Unilateral call by DAC (NLS) (2005) for tenders as management agent for the envisaged National Resource Centre – failure – no funds available

– Unilateral call for development proposals by DAC (2006) – Steering Committee was not involved (amount distributed to successful applicants – outputs imminent)


TODAY

Progress of Steering Committee to set up Resource Centre in collaboration with NLS (HLT Unit) (2)

• The Steering Committee has a good working relationship with the new Chief Director and staff of the NLS

– Submissions for funding submitted


Research Role Players in South Africa: Universities

[Diagram: research role players grouped by area of activity.]

• Language resources: text corpora, spoken corpora, dictionaries, lexicons, grammars, terminology banks

• Enabling technologies: speech recognition, speech generation, morph analysis, POS tagging, syntactic analysis, semantic analysis

• Standardise formats & protocols: International Standards Organisation (ISO TC 37), SABS TC 37

• Research:
– Universities (Engineering, Computer Science), dedicated R&D centres, Meraka Institute – DST
– Universities (Languages, Linguistics), dedicated R&D centres, NLS, PanSALB – DAC


TOMORROW

Wish list – Planning and policy

• Restructuring of the HLT Steering Committee: Real role players are needed to contribute to the debate (Request to the Minister through NLS / DAC)

• Establishment of the HLT Resource Centre as a priority.
– Render support services to HLT community
– Source of job creation

• Co-ordinated academic training at national level
– Standard curricula over and above specialised curricula
– Staff exchange programme (national & international)
– Recognition of modules across accredited institutions

• Applied research conducted in accordance with national priorities set by, for example, a body of experts from user sectors. (Roadmaps, annually updated.)

• Blue sky research within HLT remains imperative, also from a funding perspective.


TOMORROW

• National funding procedures for HLT research and training should be transparent and equitable
– Task for a Select Committee of National and International Experts (?)

• Address the particular interest in HLT research and training within Africa: imminent projects – Algeria, Morocco, Kenya, Nigeria and Gabon.
– Possibility of international funding, e.g. Association of African Universities (AAU) staff & student exchange programme

• Hopefully more insights to be gained from this workshop, not only with respect to international co-operation, but also regarding the positioning of HLT activities in South Africa.


THANK YOU

JC Roux

[email protected]


FUNCTIONS OF HLT CENTRE


Importance of a National Resource Centre for HLT

• Acquiring, enhancing and managing text and speech data for HLT applications:
– Extremely costly
– Extremely time consuming
– Requires skilled language experts

• Therefore: Need to develop reusable resources

• General practice world wide:
– ELSNET (Europe), LDC (USA), (Japan)


Functions of a National Resource Centre for HLT

• Constitutes one of the integral components for effective HLT product development in all official languages of SA.

• Will interact with all other role players in the field to expedite service delivery in HLT applications.

• It will serve as a depository of raw and enhanced reusable text and speech resources of all SA languages for use by different communities / institutions for language related purposes, e.g. NLUs, terminology development sections, translation services, education etc.

• It will serve as a language archive to document language and speech phenomena of the official languages of SA over a period of time as part of cultural heritage. (SA lost its ‘Sound Archive’)


Tasks of a National Resource Centre for HLT

Data acquisition

• Text data
– Different types / genres
  • Official / Formal (announcements, legislation)
  • Informal (magazines etc)
  • Literary (novels, drama etc)

• Sources:
  • Printed media: News agencies, Publishers
  • Government services (all levels, including Hansard)


Tasks of a National Resource Centre for HLT

Data acquisition

• Speech data
– Different types
  • Read speech
  • Spontaneous speech
– Different domains & conditions
  • Sport, news, interviews / noisy environments
– Different transmission modes
  • Telephone speech: mobile, fixed lines
  • Recorded speech (microphone)
– Different subjects
  • Male, female, young, old, impaired

• Sources:
  • SABC archives
  • Own initiatives (!)


Tasks of a National Resource Centre for HLT

Data enhancement

Text

• Development and application of
– Tokenisers (word identification)
– Parts of speech taggers (nouns, verbs, adverbs etc)
– Morphological analysers (composition of words)
– Syntactic parsers (composition of phrases / sentences)
(With tools to be developed in collaboration with experts from Technology Component)

• Creation of machine readable lexicons (XML format)
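
As an illustration of the first tool in the list above, a tokeniser can be sketched in a few lines of Python. This is only a minimal, language-independent sketch: the regular expression and the function name `tokenise` are assumptions made for this example, and a production tokeniser for any of the official languages would need language-specific rules developed with language experts.

```python
import re

# Naive pattern (an assumption for this sketch): a token is either a run of
# word characters, optionally joined by internal apostrophes or hyphens,
# or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+(?:['’-]\w+)*|[^\w\s]")


def tokenise(text):
    """Split raw text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenise("Ngithi ukujabula manje."))
```

The output of such a step would typically feed the taggers, morphological analysers and parsers listed above.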


A partial XML entry for the noun -ntu, class 1-2, is as follows

<Entry>
  <Head>
    <Stem>ntu</Stem>
  </Head>
  <Body>
    <Tone>3.2.9</Tone>
    <MSI>
      <POS>
        <Noun>
          <Noun-features>
            <Class-pf-s>umu</Class-pf-s>
            <Class-pf-p>aba</Class-pf-p>
            <Class-no>1-2</Class-no>
            <Label>n</Label>
            <Dim>
              <Form>umntwana</Form>
              <Sense>baby, small child</Sense>
            </Dim>
            <Loc>
              <Form>kumuntu</Form>

Bosch SE, Pretorius L & Jones, J. Towards machine-readable lexicons for South African Bantu Languages. Nordic Journal of African Studies 16 (2): 131-145 (2007)
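
To show how such a machine-readable entry might be consumed by software, the sketch below reads the fragment with Python's standard `xml.etree.ElementTree` module. Several things here are assumptions and not part of the cited work: the closing tags (added only so that the partial fragment parses), the function name `summarise_entry`, and the simple prefix + stem concatenation used to build the singular and plural forms.

```python
import xml.etree.ElementTree as ET

# The closing tags after <Loc> are added here only to make the partial
# fragment well-formed; the remainder of the real entry is not reproduced.
ENTRY_XML = """
<Entry>
  <Head><Stem>ntu</Stem></Head>
  <Body>
    <Tone>3.2.9</Tone>
    <MSI><POS><Noun><Noun-features>
      <Class-pf-s>umu</Class-pf-s>
      <Class-pf-p>aba</Class-pf-p>
      <Class-no>1-2</Class-no>
      <Label>n</Label>
      <Dim>
        <Form>umntwana</Form>
        <Sense>baby, small child</Sense>
      </Dim>
      <Loc><Form>kumuntu</Form></Loc>
    </Noun-features></Noun></POS></MSI>
  </Body>
</Entry>
"""


def summarise_entry(xml_text):
    """Extract a few fields from a lexicon entry as a plain dictionary."""
    root = ET.fromstring(xml_text)
    stem = root.findtext("./Head/Stem")
    features = root.find("./Body/MSI/POS/Noun/Noun-features")
    return {
        "stem": stem,
        "noun_class": features.findtext("Class-no"),
        # Illustrative assumption: full forms are class prefix + stem.
        "singular": features.findtext("Class-pf-s") + stem,
        "plural": features.findtext("Class-pf-p") + stem,
        "diminutive": features.findtext("Dim/Form"),
        "locative": features.findtext("Loc/Form"),
    }


if __name__ == "__main__":
    print(summarise_entry(ENTRY_XML))
```

Keeping the lexicon in XML means the same entry can be read by different tools without each one needing its own file format.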


Tasks of a National Resource Centre for HLT

Data enhancement (2)

Speech

• Orthographic transcriptions of speech (S to T)
• Phonetic transcription and annotation of speech
– Sound like utterances
  • Fluent speech
  • Repetitions, false starts etc
– Non sound like utterances
  • Background noise
  • Lip smacks etc
• Supportive software programmes (e.g. Praat)
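
One possible way to hold such annotations in software is as a list of time-aligned segments, each marked as speech or non-speech. The sketch below is an assumption made for illustration only: the `Segment` class, its field names and the example timings are invented here and do not reflect Praat's file format or any actual recording.

```python
from dataclasses import dataclass

ALLOWED_KINDS = {"speech", "non-speech"}


@dataclass
class Segment:
    start: float  # start time in seconds
    end: float    # end time in seconds
    label: str    # transcription or event label
    kind: str     # "speech" or "non-speech"

    def __post_init__(self):
        if self.kind not in ALLOWED_KINDS:
            raise ValueError(f"unknown segment kind: {self.kind}")
        if self.end < self.start:
            raise ValueError("segment ends before it starts")


def speech_duration(segments):
    """Total duration (seconds) of segments annotated as speech."""
    return sum(s.end - s.start for s in segments if s.kind == "speech")


def non_speech_labels(segments):
    """Labels of non-speech events such as noise or lip smacks."""
    return [s.label for s in segments if s.kind == "non-speech"]


# Hypothetical annotation of one short recording (timings are invented).
EXAMPLE = [
    Segment(0.0, 1.5, "Ngithi ukujabula manje", "speech"),
    Segment(1.5, 2.0, "lip smack", "non-speech"),
    Segment(2.0, 3.0, "background noise", "non-speech"),
]

if __name__ == "__main__":
    print(speech_duration(EXAMPLE), non_speech_labels(EXAMPLE))
```

Separating speech from non-speech events in this way lets later tools train on clean stretches while still recording where noise occurred.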


[Figure: annotated speech sample for “Ukuja(bula)”. Speaker One: “Ngithi ukujabula manje”, with segment labels u, k, u.]


Tasks of a National Resource Centre for HLT

Data management & Software development

• Determine data needs in collaboration with HLT Unit in NLS for government applications

• Acquire the data with the assistance of language specialists at different nodes of the Centre

• Solicit development of appropriate software

• Manage, back-up, distribute data to users

• Commercialise resources: private sector developers


Tasks of a National Resource Centre for HLT

Training and Consultation

• Identify training needs and potential trainers

• Develop non-formal training curricula for the reskilling of interested language practitioners

• Organise HLT training workshops at different venues in the country encouraging language bodies to participate

• Create awareness of HLT potential in collaboration with the HLT Unit of NLS


Structure of National Resource Centre for HLT (Virtual Centre: Hub and connected nodes)

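[Diagram: a Managerial Hub connected to nodes, each node with language experts (LE).]

• Managerial Hub: coordination of node activities; data acquisition; data enhancement; data management & backup; training

• Nodes and languages, as labelled in the diagram:
– Centre Y: SA Eng, Afrikaans
– Uni D: N Sotho, Sign Lang
– Uni B: Venda, Tsonga
– Uni A: Xhosa, Swati
– Centre X: Zulu, Ndebele
– Uni C: N Sotho, Tswana

• Also shown: NLU (?), Lang (?)

LE = Language experts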


Relationships

Seatla se sengwe se tlhapiswa ke se sengwe

(The one hand washes the other)

• No infringements on current lexicographic or terminological activities - Different foci

• Complementary activities:
– Raw or enhanced data to be supplied to NLUs / PanSALB / NLS
– NLUs could contribute to National depository

• Win-win situation for the sake of technological development of our languages


Concluding remarks

• Attempt to speed up activities in the development of HLT applications to provide services in a language of choice.

• To provide new resources and tools for lexicographic and terminological development.

• To provide a new range of job opportunities for graduates in African languages

• Keep South Africa abreast with new developments in the Information Society and avoid the marginalisation of the indigenous languages.