TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

TAUS MACHINE TRANSLATION SHOWCASE Moses Past, Present and Future 09:20 – 09:40 Wednesday, 12 June 2013 Hieu Hoang University of Edinburgh

description

This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit. MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme. For the latest updates, follow us on Twitter - #MosesCore

Transcript of TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

Page 1: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

TAUS MACHINE TRANSLATION SHOWCASE

Moses Past, Present and Future 09:20 – 09:40 Wednesday, 12 June 2013 Hieu Hoang University of Edinburgh

Page 2: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

Statistical Machine Translation with Moses

Hieu Hoang
Localization World 2013


Page 3: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

Agenda

• What is Statistical Machine Translation?
• What is Moses?
  – Common misconceptions
• Coming up
• What can we do for you?

Moses  by  Hieu  Hoang,  University  of  Edinburgh   3  


Page 5: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

What is Statistical Machine Translation?

It is very tempting to say that a book written in Chinese is simply a book written in English which was coded into the “Chinese code.” If we have useful methods for solving almost any cryptographic problem, may it not be that with proper interpretation we already have useful methods for translation?

Warren Weaver, 1949

Page 6: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

What is Statistical Machine Translation?

• NLP application
  – search engines, text mining, etc.
• Big data
  – bi-text from the Internet
    • e.g. multilingual websites, documents
  – large monolingual data
• Learn to translate
  – from previous translations
  – models of language

Page 7: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

What is Statistical Machine Translation?

[Diagram: Training. Training Data and Linguistic Tools (bi-text, monolingual data, dictionary) feed the SMT System (translation model, language model, lots of numbers…)]

[Diagram: Using. Source Text is passed to the SMT System (translation model, language model, lots of numbers…)]

Page 8: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

What is a model?

• Translation Model
• Language Model
  – (of the target language)

thanks to Precision Translation Tools

Page 9: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

What is a model?

• Translation model
  – source → translation
  – probability

source          target          probability
den Vorschlag   the proposal    0.6227
                's proposal     0.1068
                a proposal      0.0341
                the idea        0.0250
                this proposal   0.0227
                proposal        0.0205
….              ….
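The translation-model table above can be read as a lookup from a source phrase to candidate target phrases, each with a probability. The following is a minimal Python sketch of that idea, assuming a plain in-memory dictionary; it is not how Moses stores or queries its phrase tables. The names PHRASE_TABLE, candidates and best_translation are hypothetical, and attaching every row to "den Vorschlag" is an assumption read off the slide layout.

```python
# Toy phrase table built from the example rows on the slide.
# Assumption: every listed target phrase belongs to "den Vorschlag".
PHRASE_TABLE = {
    "den Vorschlag": [
        ("the proposal", 0.6227),
        ("'s proposal", 0.1068),
        ("a proposal", 0.0341),
        ("the idea", 0.0250),
        ("this proposal", 0.0227),
        ("proposal", 0.0205),
    ],
}


def candidates(source_phrase):
    """Return (target, probability) pairs, most probable first."""
    options = PHRASE_TABLE.get(source_phrase, [])
    return sorted(options, key=lambda pair: pair[1], reverse=True)


def best_translation(source_phrase):
    """Return the most probable target phrase, or None if unknown."""
    ranked = candidates(source_phrase)
    return ranked[0][0] if ranked else None


if __name__ == "__main__":
    print(best_translation("den Vorschlag"))
```

A real decoder would not simply pick the top entry for each phrase; it weighs these probabilities together with the language model and other scores.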

Page 10: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

What is a model?

• Language model
  – Likelihood of sentence
  – in target language

text                      probability
I would like              0.489
would like to             0.905
like to commend           0.002
to commend the            0.472
commend the rapporteur    0.147
….                        ….
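One way to see how such a table yields a sentence likelihood is to combine the entries for every three-word window in the sentence. The sketch below assumes the slide's numbers can be multiplied together as per-trigram probabilities (summed here as logarithms to avoid underflow). The names TRIGRAM_PROB and sentence_log_prob are hypothetical, and real language models also handle sentence boundaries, smoothing and back-off for unseen word sequences, which this sketch omits.

```python
import math

# Trigram entries copied from the slide.
# Assumption: each value can be treated as the probability of that window.
TRIGRAM_PROB = {
    ("I", "would", "like"): 0.489,
    ("would", "like", "to"): 0.905,
    ("like", "to", "commend"): 0.002,
    ("to", "commend", "the"): 0.472,
    ("commend", "the", "rapporteur"): 0.147,
}


def sentence_log_prob(words, table=TRIGRAM_PROB):
    """Sum log10 probabilities over all three-word windows.

    Returns None if any window is missing from the table
    (a real model would back off to shorter n-grams instead).
    """
    total = 0.0
    for i in range(len(words) - 2):
        trigram = tuple(words[i:i + 3])
        if trigram not in table:
            return None
        total += math.log10(table[trigram])
    return total


if __name__ == "__main__":
    sentence = "I would like to commend the rapporteur".split()
    print(sentence_log_prob(sentence))
```

The low value for "like to commend" dominates the total, which illustrates how a single unlikely word sequence pulls down the score of a whole sentence.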

Page 11: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

Agenda

• What is Statistical Machine Translation?
• What is Moses?
  – Common misconceptions
• Coming up
• What can we do for you?

Page 12: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

What is Moses?

• Replacement for Pharaoh
  – Academic software
  – Closed-source
• Open source
• Re-written, clean code
  – More features
• Large developer community
  – Initiated by Hieu Hoang
  – Developed at NLP Workshop

Page 13: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

Agenda

• What is Statistical Machine Translation?
• What is Moses?
  – Timeline
  – Common misconceptions
• Coming up
• What can we do for you?

Page 14: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

What is Moses?

Common Misconceptions

• Only for Linux
• Difficult to use
• Unreliable
• Only phrase-based
• Developed by one person
• Slow

Page 15: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

Only works on Linux

• Tested on
  – Windows 7 (32-bit) with Cygwin 6.1
  – Mac OSX 10.7 with MacPorts
  – Ubuntu 12.10, 32 and 64-bit
  – Debian 6.0, 32 and 64-bit
  – Fedora 17, 32 and 64-bit
  – openSUSE 12.2, 32 and 64-bit
• Project files for
  – Visual Studio
  – Eclipse on Linux and Mac OSX

Page 16: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

Difficult to use

• Easier compile and install
  – Boost bjam
  – No installation required
• Binaries available for
  – Linux
  – Mac
  – Windows/Cygwin
  – Moses + Friends
    • IRSTLM
    • GIZA++ and MGIZA
• Ready-made models trained on Europarl

Page 17: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

Unreliable

• Monitor check-ins
• Unit tests
• More regression tests
• Nightly tests
  – Run end-to-end training
  – http://www.statmt.org/moses/cruise/
• Tested on all major OSes
• Train Europarl models
  – Phrase-based, hierarchical, factored
  – 8 language-pairs
  – http://www.statmt.org/moses/RELEASE-1.0/models/

Page 18: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

Only phrase-based model

  – replacement for Pharaoh
  – extension of Pharaoh
• From the beginning
  – Factored models
  – Lattice and confusion network input
  – Multiple LMs, multiple phrase-tables
• Since 2009
  – Hierarchical model
  – Syntactic models

Page 19: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

Developed by one person

• ANYONE can contribute
  – 50 contributors

[Chart: 'git blame' of Moses repository, share per contributor; axis from 0% to 40%]

Page 20: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

Slow

Decoding

thanks to Ken!!

[Plot: model score (-101.7 to -101.4) against CPU seconds/sentence excluding loading (1 to 5), for Moses, cdec and Joshua]

Page 21: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

Slow

Training

• Multithreaded
• Reduced disk IO
  – compress intermediate files
• Reduce disk space requirement

Time (mins)     1-core   2-cores     4-cores     8-cores     Size (MB)
Phrase-based    60       47 (79%)    37 (63%)    33 (56%)    893
Hierarchical    1030     677 (65%)   473 (45%)   375 (36%)   8300
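The percentages in the training table appear to give the multi-core time as a fraction of the 1-core time. A short Python sketch of that arithmetic follows, assuming this reading of the table; the names TRAINING_MINUTES, relative_time and speedup are hypothetical. Because the minutes on the slide are rounded, recomputed percentages can differ from the slide's figures by about a point.

```python
# Training times in minutes, copied from the slide, keyed by core count.
TRAINING_MINUTES = {
    "Phrase-based": {1: 60, 2: 47, 4: 37, 8: 33},
    "Hierarchical": {1: 1030, 2: 677, 4: 473, 8: 375},
}


def relative_time(model, cores):
    """Time on `cores` cores as a fraction of the 1-core time."""
    times = TRAINING_MINUTES[model]
    return times[cores] / times[1]


def speedup(model, cores):
    """How many times faster training is than on 1 core."""
    times = TRAINING_MINUTES[model]
    return times[1] / times[cores]


if __name__ == "__main__":
    for model, times in TRAINING_MINUTES.items():
        for cores in sorted(times):
            print(model, cores,
                  f"{relative_time(model, cores):.0%}",
                  f"{speedup(model, cores):.2f}x")
```

On these numbers the hierarchical model gains more from extra cores than the phrase-based one, although neither scales linearly with the core count.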


Page 22: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

What is Moses?

Common Misconceptions

• Only for Linux
• Difficult to use
• Unreliable
• Only phrase-based
• Developed by one person
• Slow

Page 23: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

What is Moses?

Common Misconceptions

• Only for Linux → Windows, Linux, Mac
• Difficult to use → Easier compile and install
• Unreliable → Multi-stage testing
• Only phrase-based → Hierarchical, syntax model
• Developed by one person → everyone
• Slow → Fastest decoder, multithreaded training, less IO

Page 24: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

Agenda

• What is Statistical Machine Translation?
• What is Moses?
  – Common misconceptions
• Coming up
• What can we do for you?

Page 25: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

Coming up…

• Code cleanup
• Incremental Training
• Better translation
  – smaller model
  – bigger data
  – faster training and decoding
• Applications
  – CAT tools
  – Speech translation

Page 26: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

Applications

Computer-Aided Translation

• EU Project
  – CASMACAT
  – MATECAT

Page 27: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

Agenda

• What is Statistical Machine Translation?
• What is Moses?
  – Common misconceptions
• Coming up
• What can we do for you?

Page 28: TAUS MT SHOWCASE, Moses Past, Present and Future, Hieu Hoang, University of Edinburgh, 12 June 2013

What can we do for you?

  – simpler Moses
  – graphical interface
  – Windows compatibility
  – terminology and glossary
  – incremental training

• What can you do for us?
  – code
  – data
  – funding
