Page 1: TAUS Moses Industry Roundtable 2014, Changes in Moses, Hieu Hoang, University of Edinburgh

Changes in Moses

Hieu Hoang

TAUS, October 2014

Thanks for inviting me.

I'm here to tell you a little about the things I've been doing to Moses over the past 2 years. I'll mainly concentrate on the past year, but will quickly tell you about the things I did prior to that.

MosesCore
Year 1 (2012)

• Easier installation
  – Binary releases
  – Pre-built models
• Testing and Releases
  – Linux, Mac OS X, Windows
  – 32 and 64-bit
• Faster training
  – Parallelism at all stages

In the 1st year I picked off the low-hanging fruit, fixing many of the easy issues that just required time and effort.

- Made installation easier.
- We run a lot of experiments anyway, so we gave some of them away, with all the scripts and configuration used to run them, so that students can see how to replicate our results.
- Lots of testing, on all major platforms.
- Made the obvious speed improvements, parallelising as much of the training as possible.

MosesCore
Year 2 (2013)

• Even Easier installation
  – Binary releases
  – Pre-built models
  – Virtual Machines
  – Amazon EC2
• Refactored Decoder

In year 2 I made it even easier to install. If you can't be bothered to compile, or even to download the binaries, you can:

- download a virtual machine with Moses and friends installed, or
- rent an Amazon server with Moses and friends installed.

However, the main reason I came here today is to talk about the major changes I made in the decoder and elsewhere. They make it easier for us coders to add and change things in Moses.

Why did you Refactor?

• Feature Function Framework
  – easier to implement new features
  – use sparse features
• Simplify class structure
  – easier to develop with Moses
• Delete functionality
  – easier to refactor code
  – very little deletion

What is a feature function? Something that gives a translation a score.

Over the last few years, people have gotten bored with existing features like language models and reordering models. The trend in MT is to create novel features which give a score to a translation. Like any feature, each one tries to give bigger scores to better translations.

The new feature function framework is designed to make it easy to add new features.

This is not totally new to Moses:
- it has always had the ability to add new LM implementations
- and to add new phrase-table implementations
- now this is generalized to multiple implementations of arbitrary features that give a score to a translation
- we have always been able to add new features; I just made it easier

Another trend: a feature function shouldn't just have a fixed, limited number of scores. It can have an unknown number of scores that flicker on when a particularly good, or bad, translation is used. These are usually called sparse features.

The aim of the feature function framework is to give sparse features equal prominence to dense features, rather than having them as adjuncts that are easy to forget:
- all feature functions can have sparse features; you don't need to turn them on
- a feature function can have dense AND sparse features; they are not mutually exclusive
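
The dense-plus-sparse idea in these notes can be illustrated with a minimal Python sketch. This is not the Moses decoder's code or API: the names `ScoreBreakdown`, `FeatureFunction`, `WordPairFeature` and `total_score`, and the position-based word pairing, are all assumptions made for illustration.

```python
from typing import Dict, List


class ScoreBreakdown:
    """Scores produced for one translation (hypothetical container)."""

    def __init__(self) -> None:
        self.dense: Dict[str, List[float]] = {}  # fixed number of scores per feature
        self.sparse: Dict[str, float] = {}       # open-ended; present only when they fire


class FeatureFunction:
    """Anything that gives a translation a score."""

    def __init__(self, name: str) -> None:
        self.name = name

    def evaluate(self, source: List[str], target: List[str],
                 scores: ScoreBreakdown) -> None:
        raise NotImplementedError


class WordPairFeature(FeatureFunction):
    """Toy feature with one dense score and any number of sparse scores."""

    def evaluate(self, source, target, scores):
        # Dense: always exactly one value (here, a simple length penalty).
        scores.dense[self.name] = [-float(len(target))]
        # Sparse: one indicator per word pair, paired by position for simplicity.
        for src, tgt in zip(source, target):
            key = f"{self.name}_{src}~{tgt}"
            scores.sparse[key] = scores.sparse.get(key, 0.0) + 1.0


def total_score(scores: ScoreBreakdown,
                dense_weights: Dict[str, List[float]],
                sparse_weights: Dict[str, float]) -> float:
    """Weighted sum; sparse features without a weight contribute nothing."""
    total = 0.0
    for name, values in scores.dense.items():
        total += sum(w * v for w, v in zip(dense_weights[name], values))
    for key, value in scores.sparse.items():
        total += sparse_weights.get(key, 0.0) * value
    return total


scores = ScoreBreakdown()
WordPairFeature("WordPair").evaluate(["das", "haus"], ["the", "house"], scores)
print(total_score(scores, {"WordPair": [0.5]}, {"WordPair_haus~house": 2.0}))
```

The same feature object fills both the dense and the sparse slots, which is the point made above: sparse scores are not a separate kind of feature, and a sparse score with no learned weight simply adds zero.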

Simplify class structure, to make it easier for us to develop with Moses.

- Moses has been around for 8 years now.
- Everyone has the freedom to add what they want; no one is in overall control.
- This way of organising an open-source project is great: we have gotten lots of contributions and lots of features.
- The downside is that it has grown organically, so things are not as well structured as they could be.
- Now that I have the time, and the benefit of hindsight, I can go back and put some structure on what we've done.

Why did I delete things?

- I deleted very little.
- I'm not the gatekeeper of Moses; I don't control it.
- If a piece of functionality was deleted, it's not a comment on its usefulness. It was purely because it got in the way of the refactoring.

I'll quickly go through the last 2 before telling you about feature functions.

Specify a Feature Function

Then…

• New Feature Function
  – New sections
    ● [feature-function-file]
    ● [weight-?]
• Custom code
  – Parse ini file
  – Initialize feature function

ini file:

    [lmodel-file]
    8 0 3 europarl.en.srilm.gz

    [weight-l]
    0.142

Completely bespoke: there was no framework to help you, and if you didn't do it right, it wouldn't work.

Adding new Feature Function

Now…

• New Feature Function
  – No new section
    ● Line in [feature] section
    ● Line in [weight] section
  – Framework
    ● parse ini file
    ● initialize feature

ini file:

    [feature]
    KENLM file=path order=0

    [weight]
    KENLM0= 0.142

Write a class that implements the feature function. The framework does the rest. There is no need to:

- create a custom section in the ini file
- change the StaticData class
- change the Parameter class
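
To show why a shared `[feature]`/`[weight]` layout removes the need for custom parsing, here is a rough Python sketch that reads the ini fragment from the slide generically. It is not the Moses parser. The helper names `read_sections` and `load_features` are hypothetical, and the rule that an instance is named by its feature type plus a running index is an assumption inferred from the `KENLM0` key on the slide.

```python
from typing import Dict, List, Optional, Tuple

INI = """
[feature]
KENLM file=path order=0

[weight]
KENLM0= 0.142
"""


def read_sections(text: str) -> Dict[str, List[str]]:
    """Group non-empty lines under the [section] header that precedes them."""
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


def load_features(text: str) -> List[Tuple[str, Dict[str, str], List[float]]]:
    """Return (instance name, key=value parameters, weights) for each feature line."""
    sections = read_sections(text)

    # [weight] lines look like "NAME= w1 w2 ...".
    weights: Dict[str, List[float]] = {}
    for line in sections.get("weight", []):
        name, values = line.split("=", 1)
        weights[name.strip()] = [float(v) for v in values.split()]

    # [feature] lines look like "TYPE key=value key=value ...".
    counts: Dict[str, int] = {}
    features = []
    for line in sections.get("feature", []):
        kind, *args = line.split()
        params = dict(arg.split("=", 1) for arg in args)
        index = counts.get(kind, 0)
        counts[kind] = index + 1
        instance = f"{kind}{index}"  # assumed naming: type plus running index
        features.append((instance, params, weights.get(instance, [])))
    return features


for feature in load_features(INI):
    print(feature)
```

Nothing in the sketch is specific to language models: a new feature type only needs its own line in each section, and the same two loops handle it.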

MosesCore
Year 3 (2014)

• Exploit new framework
  – Updatable phrase-table
    ● Dynamic suffix array
    ● Stores training data
      – Extract translation rule on-the-fly
  – Neural network language model
    ● Continuous space LM
  – Bilingual language models
    ● Replicate Devlin et al, 2014
    ● Large quality gains
  – Transliteration
    ● Character level translation
    ● Learns from parallel data
    ● Integrate into decoder
• Translation rule properties
  – Extra information for each rule
    ● Context, syntax, domain etc
• Syntax decoding
  – Faster, memory efficient decoding
  – More syntactic models
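
As background for the updatable phrase-table bullet, the sketch below shows the basic suffix-array lookup that lets a system keep the training data itself and find phrase occurrences on demand. It is a toy Python illustration, not the Moses implementation: the suffix array here is static and built naively, and the dynamic updates and the extraction of translation rules from word alignments are not shown. The function names are hypothetical.

```python
from typing import List


def build_suffix_array(tokens: List[str]) -> List[int]:
    """Start positions of all suffixes, sorted by the token sequence they begin."""
    # Naive construction: fine for a toy corpus, too slow for real training data.
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_occurrences(tokens: List[str], suffix_array: List[int],
                     phrase: List[str]) -> List[int]:
    """Return every start position of `phrase`, using two binary searches."""
    n = len(phrase)

    def prefix(rank: int) -> List[str]:
        start = suffix_array[rank]
        return tokens[start:start + n]

    # First suffix whose length-n prefix is >= phrase.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < phrase:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # First suffix whose length-n prefix is > phrase.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= phrase:
            lo = mid + 1
        else:
            hi = mid

    return sorted(suffix_array[first:lo])


corpus = "the house is small the house is big".split()
suffixes = build_suffix_array(corpus)
print(find_occurrences(corpus, suffixes, ["the", "house"]))
```

Because all suffixes that start with the same phrase sit next to each other in the sorted array, every occurrence is found without scanning the corpus, and no phrase table has to be precomputed.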

Technical Breakout

• Organization and Releases
  – Academic and commercial needs
  – Prevent forks
  – Development/Stable versions
  – Forwards/Backward compatibility
  – Upgradability
• Features
• Deployment
  – Platform/Clouds
  – Docker containers
  – Priorities
  – Interaction and data formats
• Future development
  – User-friendliness
  – End-to-end solution
  – Users