TAUS Moses Industry Roundtable 2014, Changes in Moses, Hieu Hoang, University of Edinburgh

Changes in Moses Hieu Hoang TAUS October 2014


This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit. 

MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme. 
 

For the latest updates go to http://www.statmt.org/mosescore/ or follow us on Twitter - #MosesCore

Transcript of TAUS Moses Industry Roundtable 2014, Changes in Moses, Hieu Hoang, University of Edinburgh

Page 1: TAUS Moses Industry Roundtable 2014, Changes in Moses, Hieu Hoang, University of Edinburgh

Changes in Moses

Hieu Hoang

TAUS

October 2014

Page 2

MosesCore
Year 1 (2012)

• Easier installation
  – Binary releases
  – Pre-built models

• Testing and Releases
  – Linux, Mac OSX, Windows
  – 32 and 64-bit

• Faster training
  – Parallelism at all stages

Page 3

MosesCore
Year 2 (2013)

• Even Easier installation
  – Binary releases
  – Pre-built models
  – Virtual Machines
  – Amazon EC2

• Refactored Decoder

Page 5

Why did you Refactor?

• Feature Function Framework
  – easier to implement new features
  – use sparse features

• Simplify class structure
  – easier to develop with Moses

• Delete functionality
  – easier to refactor code
  – very little deletion

Page 8

Specify a Feature Function

Then…

• New Feature Function
  – New sections
    ● [feature-function-file]
    ● [weight-?]

• Custom code
  – Parse ini file
  – Initialize feature function

ini file:

    [lmodel-file]
    8 0 3 europarl.en.srilm.gz

    [weight-l]
    0.142

Page 9

Adding new Feature Function

Now…

• New Feature Function
  – No new section
    ● Line in [feature] section
    ● Line in [weight] section
  – Framework
    ● parse ini file
    ● initialize feature

ini file:

    [feature]
    KENLM file=path order=0

    [weight]
    KENLM0= 0.142

Pages 10-16

MosesCore
Year 3 (2014)

• Exploit new framework
  – Updatable phrase-table
    ● Dynamic suffix array
    ● Stores training data
      – Extract translation rule on-the-fly
  – Neural network language model
    ● Continuous space LM
  – Bilingual language models
    ● Replicate Devlin et al, 2014
    ● Large quality gains
  – Transliteration
    ● Character level translation
    ● Learns from parallel data
    ● Integrate into decoder

• Translation rule properties
  – Extra information for each rule
    ● Context, syntax, domain etc

• Syntax decoding
  – Faster, memory efficient decoding
  – More syntactic models
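
The "Dynamic suffix array" bullets under the updatable phrase-table can be illustrated with a toy example. This is only a sketch of the general idea (keep the training text, index it with a suffix array, and look phrases up at translation time); it is not the Moses implementation. The function names are hypothetical, only the source side is shown, and an update is assumed to simply rebuild the index.

```python
def build_suffix_array(tokens):
    """Return the start positions of all suffixes of `tokens`, in sorted order."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_occurrences(tokens, suffix_array, phrase):
    """Binary-search the suffix array for every position where `phrase` starts."""
    n = len(phrase)

    def prefix(i):
        # First n tokens of the i-th smallest suffix.
        start = suffix_array[i]
        return tokens[start:start + n]

    lo, hi = 0, len(suffix_array)
    while lo < hi:  # first suffix whose prefix is >= phrase
        mid = (lo + hi) // 2
        if prefix(mid) < phrase:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    hi = len(suffix_array)
    while lo < hi:  # first suffix whose prefix is > phrase
        mid = (lo + hi) // 2
        if prefix(mid) <= phrase:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])
```

In a real system each matched position would lead to the word-aligned target sentence, from which a translation rule is extracted on the fly. Because the training data itself is stored, adding new sentence pairs only requires updating the index rather than re-extracting a whole phrase-table.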

Pages 17-20

Technical Breakout

• Organization and Releases
  – Academic and commercial needs
  – Prevent forks
  – Development/Stable versions
  – Forwards/Backward compatibility
  – Upgradability

• Features

• Deployment
  – Platform/Clouds
  – Docker containers
  – Priorities
  – Interaction and data formats

• Future development
  – User-friendliness
  – End-to-end solution
  – Users

Page 22

Changes in Moses

Thanks for inviting me to come.

I'm here to tell you a little about the things I've been doing to Moses over the past 2 years. I'll mainly concentrate on the past year, but will quickly tell you about the things I did prior to that.

Page 23

MosesCore
Year 1 (2012)

In the 1st year, I picked off the low-hanging fruit: fixed many of the easy issues that just required time and effort.

Made installation easier. We run a lot of experiments anyway, so we gave some of them away, with all the scripts and configuration used to run them, so students can see how to replicate our results.

Lots of testing, on all major platforms.

Made obvious speed improvements by parallelising as much of the training as possible.

Page 24

MosesCore
Year 2 (2013)

In year 2, I made it even easier to install. If you can't be bothered to compile, or even to download the binaries, you can:
- download a virtual machine with Moses + friends installed, OR
- rent an Amazon server with Moses + friends installed

Page 25

MosesCore
Year 2 (2013)

However, the main reason I came here today is to talk about the major changes I made in the decoder and elsewhere. They make it easier for us coders to add and change things in Moses.

Page 26

Why did you Refactor?

What is a feature function? Something that gives a translation a score.

Over the last few years, people have gotten bored with existing features like language models and reordering models. The trend in MT is to create novel features which give a score to a translation. Like any feature, a new one tries to give bigger scores to better translations.

The new feature function framework is designed to make it easy to add new features.

This is not totally new to Moses:
- it has always had the ability to add new LM implementations
- and new phrase-table implementations
- now this is generalized to multiple implementations of arbitrary features that give a score to a translation
- it has always been possible to add new features; I just made it easier

Another trend: a FF shouldn't just have a fixed, limited number of scores. It can have an unknown number of scores that flicker on when a particularly good, or bad, translation is used. These are usually called sparse features.

The aim of the feature function framework is to give sparse features equal prominence to dense features, rather than have them as adjuncts that are easy to forget:
- all FFs can have sparse features, with no need to turn them on
- a FF can have dense AND sparse features; they are not mutually exclusive
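
To make the dense/sparse distinction concrete, here is a minimal sketch, not Moses code. The `Scores` type, the two example features and the naming scheme for sparse features are all hypothetical; the only point is that one feature function can return a fixed list of dense scores and an open-ended set of named sparse scores at the same time.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Scores:
    """Scores one feature function gives to one translation (hypothetical type)."""
    dense: List[float] = field(default_factory=list)        # fixed number of scores
    sparse: Dict[str, float] = field(default_factory=dict)  # open-ended, named scores


def word_penalty(target: List[str]) -> Scores:
    # A purely dense feature: always exactly one score.
    return Scores(dense=[-float(len(target))])


def phrase_pair_indicator(source: List[str], target: List[str]) -> Scores:
    # Dense AND sparse: one dense count, plus a named indicator that only
    # "flickers on" when this particular phrase pair is used.
    name = f"pp_{'_'.join(source)}~{'_'.join(target)}"
    return Scores(dense=[1.0], sparse={name: 1.0})


def weighted_total(scores: Scores, dense_w: List[float], sparse_w: Dict[str, float]) -> float:
    # Sparse features with no entry in the weight table contribute nothing.
    total = sum(w * s for w, s in zip(dense_w, scores.dense))
    total += sum(sparse_w.get(name, 0.0) * value for name, value in scores.sparse.items())
    return total
```

The dense weights are a fixed-length vector, while the sparse weights are a lookup table that can grow as tuning discovers useful indicators.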


Page 27

Why did you Refactor?

Simplify class structure, to make it easier for us to develop with Moses:
- Moses has been around for 8 years now
- everyone has the freedom to add what they want; no-one is in overall control
- this way of organising an open-source project is great: we've gotten lots of contributions, lots of features
- the downside is that it has grown organically, so things are not as well structured as they could be
- now I have the time, with the benefit of hindsight, to go back and put some structure on what we've done

Page 28

Why did you Refactor?

Why did I delete things?
- I deleted very little
- I'm not the gatekeeper of Moses; I don't control it
- if a functionality was deleted, it's not a comment on its usefulness; it's purely because it got in the way of the refactoring

Quickly go through the last 2 before telling you about feature functions.

Page 29

Specify a Feature Function

Back then, adding a feature function was completely bespoke:
- there was no framework to help you
- if you didn't do it right, it wouldn't work

Page 30

Adding new Feature Function

Now, you write a class that implements the feature function, and the framework does the rest. There is no need to:
- create a custom section in the ini file
- change the StaticData class
- change the Parameter class
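
What "the framework does the rest" means can be sketched in a few lines. This is an illustration only, not the Moses implementation (which is C++). The registry, the `register` decorator, the `load_ini` function and the stand-in class are hypothetical. The instance name `KENLM0` (class name plus a running index) is taken from the `[weight]` line on the slide.

```python
FEATURE_REGISTRY = {}  # maps the name used in the ini file to a class


def register(name):
    """Class decorator: make a feature function known to the framework."""
    def wrap(cls):
        FEATURE_REGISTRY[name] = cls
        return cls
    return wrap


@register("KENLM")
class ToyLanguageModel:
    # Stand-in only: it stores its arguments and loads no model.
    def __init__(self, **args):
        self.args = args
        self.weights = []


def load_ini(text):
    """Parse the [feature] and [weight] sections; build and weight each feature."""
    sections, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], [])
        elif current is not None:
            current.append(line)

    counts, features = {}, {}
    for line in sections.get("feature", []):
        name, *pairs = line.split()
        args = dict(pair.split("=", 1) for pair in pairs)
        index = counts.get(name, 0)  # KENLM0, KENLM1, ...
        counts[name] = index + 1
        features[f"{name}{index}"] = FEATURE_REGISTRY[name](**args)

    for line in sections.get("weight", []):
        key, values = line.split("=", 1)
        features[key.strip()].weights = [float(v) for v in values.split()]
    return features
```

With this shape, adding a new feature function means writing one class and registering it. The parsing code and the ini file layout stay untouched, which is the contrast with the old per-feature `[lmodel-file]` and `[weight-l]` sections.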

