Moses


Moses is a machine translation tool.


Presented by NIKHIL.P, MCA S4, CHINTECH

Introduction
Translation: the communication of the meaning of a source-language text by means of an equivalent target-language text.
Transliteration: the conversion of a text from one script to another.

Why Translation?
Establishing links between two languages makes it possible to transfer resources from one language to another. For example, a book written in an unfamiliar foreign language can be read once its contents are translated into our own language.

[Diagram: fields of computer science, including Databases, Robotics, Artificial Intelligence, Algorithms, Natural Language Processing, Information Retrieval, Machine Translation, Networking and Search]

Natural Language Processing (NLP)
NLP is a field of computer science, artificial intelligence and linguistics concerned with the interactions between computers and human (natural) languages. Applications of NLP include machine translation, database access and information retrieval.

Machine Translation
Machine translation is the automatic translation of text, for example by a computer system, from a first language (the source language) into another language (the target language).

Background
Machine translation was one of the first natural language processing applications developed in computer science. Rule-based, example-based, knowledge-based and statistical approaches have all been explored, and Statistical Machine Translation (SMT) is the preferred approach in much industrial and academic research.
- Rule-based machine translation: a system of lexical, grammatical and reordering rules is created for a source/target language pair, and the rules are then applied to the source text to produce the output.
- Example-based machine translation: a bilingual text corpus is used directly for comparison against the source text, and case-based reasoning is applied to create the output.

What is Moses?
- Moses is an open-source toolkit for Statistical Machine Translation (SMT).
- It is released under the LGPL license.
- It uses standard external tools such as GIZA++ and SRILM.
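
To contrast with the statistical approach Moses takes, the rule-based approach described above can be sketched in a few lines of Python. This is a toy under stated assumptions: the Spanish-to-English lexicon, the word classes and the single noun-adjective reordering rule are hypothetical examples, not rules from any real system.

```python
# Toy rule-based MT: lexical rules (a dictionary) plus one reordering rule.
# LEXICON, NOUNS and ADJECTIVES are hypothetical illustration data.
LEXICON = {"la": "the", "casa": "house", "blanca": "white", "es": "is", "grande": "big"}
NOUNS = {"casa"}
ADJECTIVES = {"blanca", "grande"}


def reorder(words: list[str]) -> list[str]:
    """Reordering rule: source NOUN ADJECTIVE becomes target ADJECTIVE NOUN."""
    result: list[str] = []
    i = 0
    while i < len(words):
        if i + 1 < len(words) and words[i] in NOUNS and words[i + 1] in ADJECTIVES:
            result.extend([words[i + 1], words[i]])
            i += 2
        else:
            result.append(words[i])
            i += 1
    return result


def translate_rule_based(sentence: str) -> str:
    """Apply the reordering rule, then the lexical rules word by word.
    Words missing from the lexicon are passed through unchanged."""
    words = sentence.lower().split()
    return " ".join(LEXICON.get(word, word) for word in reorder(words))


print(translate_rule_based("la casa blanca es grande"))  # the white house is big
```

Each new construction needs another hand-written rule; an SMT system instead learns its translation and reordering behavior from parallel data.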

Statistical Machine Translation
The goal is to find, for a source sentence f, the target sentence e that maximizes the probability P(e|f); by Bayes' rule this is the e that maximizes P(e) × P(f|e). A statistical MT system is therefore modeled as three separate parts:
- Language model (LM): assigns a probability P(e) to any target string of words e. An LM is a probability distribution over strings S that attempts to reflect how frequently S occurs as a sentence.
- Translation model (TM): assigns a probability P(f|e) to any pair of target string e and source string f.
- Decoder: determines the translation based on the probabilities of the LM and the TM.

GIZA++
GIZA++ is used to create word alignments. It is an implementation of the original IBM Models that started statistical machine translation research.
First the language pair is aligned in both directions, for example English to German and German to English. This produces two word alignments, which are then combined:
- Intersection gives a high-precision alignment of high-confidence alignment points.
- Union gives a high-recall alignment with additional alignment points.

SRILM
SRILM is used for language modeling. It consists of the following components:
- a set of C++ class libraries implementing language models, supporting data structures and miscellaneous utility functions;
- a set of executable programs built on top of these libraries to perform standard tasks such as training LMs and testing them on data;
- a collection of miscellaneous scripts facilitating minor related tasks.

Moses Translation Process
Translation in Moses involves:
- segmenting the source sentence into source phrases;
- translating each source phrase into a target phrase;
- optionally reordering the target phrases into a target sentence.

Moses Toolkit
The toolkit contains all the components needed to preprocess data and to train the language models and the translation models. It also contains tools for tuning these models using minimum error rate training (MERT), and it relies on external tools such as GIZA++ and SRILM.
The decoder is the core component of Moses; it is a phrase-based decoder.
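
The GIZA++ intersection/union step described above can be sketched with Python sets. Representing an alignment as a set of (English index, German index) pairs is an assumption of this sketch, and the example alignments are hypothetical. Moses' training also offers heuristics such as grow-diag-final, which start from the intersection and add points from the union; those are not shown here.

```python
# Alignment points are (english_index, german_index) pairs (an assumption of this sketch).
Alignment = set[tuple[int, int]]


def symmetrize(en_to_de: Alignment, de_to_en: Alignment) -> tuple[Alignment, Alignment]:
    """Combine two directional word alignments.

    en_to_de holds (english_index, german_index) pairs; de_to_en holds
    (german_index, english_index) pairs and is flipped before combining.
    Returns (intersection, union).
    """
    flipped = {(en, de) for (de, en) in de_to_en}
    intersection = en_to_de & flipped  # high precision: both directions agree
    union = en_to_de | flipped         # high recall: either direction proposed it
    return intersection, union


# "the house is small" / "das Haus ist klein" with one disagreement (hypothetical).
en_to_de = {(0, 0), (1, 1), (2, 2), (3, 3)}
de_to_en = {(0, 0), (1, 1), (2, 2), (3, 3), (2, 3)}  # German "ist" also linked to "small"
inter, union = symmetrize(en_to_de, de_to_en)
print(sorted(inter))  # [(0, 0), (1, 1), (2, 2), (3, 3)]
print(sorted(union))  # [(0, 0), (1, 1), (2, 2), (3, 2), (3, 3)]
```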
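
How the language model, translation model and decoder divide the work can be illustrated with a toy noisy-channel decoder: from a list of candidate English sentences e, it picks the one that maximizes log P(e) + log P(f|e). The bigram LM is estimated by relative frequency from a three-sentence toy corpus; the word translation table and the candidates are hypothetical. Unlike Moses, this sketch uses no smoothing, assumes word-for-word monotone translation, and only scores the given candidates instead of searching.

```python
import math
from collections import Counter
from typing import Callable

NEG_INF = float("-inf")


def train_bigram_lm(corpus: list[str]) -> Callable[[str], float]:
    """Estimate an unsmoothed bigram LM by relative frequency; returns log P(e)."""
    history_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens, tokens[1:]))

    def log_prob(sentence: str) -> float:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        total = 0.0
        for prev, word in zip(tokens, tokens[1:]):
            count = bigram_counts[(prev, word)]
            if count == 0:
                return NEG_INF  # unseen bigram: a real LM would smooth here
            total += math.log(count / history_counts[prev])
        return total

    return log_prob


# Hypothetical word translation probabilities t(german | english).
T_TABLE = {
    ("das", "the"): 0.7, ("das", "that"): 0.3,
    ("haus", "house"): 0.8, ("haus", "home"): 0.2,
    ("ist", "is"): 1.0,
    ("klein", "small"): 0.6, ("klein", "little"): 0.4,
}


def tm_log_prob(foreign: str, english: str) -> float:
    """log P(f|e) under a word-for-word, monotone translation model."""
    f_words, e_words = foreign.split(), english.split()
    if len(f_words) != len(e_words):
        return NEG_INF
    total = 0.0
    for f, e in zip(f_words, e_words):
        p = T_TABLE.get((f, e), 0.0)
        if p == 0.0:
            return NEG_INF
        total += math.log(p)
    return total


def decode(foreign: str, candidates: list[str], lm: Callable[[str], float]) -> str:
    """Return the candidate e maximizing log P(e) + log P(f|e)."""
    return max(candidates, key=lambda e: lm(e) + tm_log_prob(foreign, e))


lm = train_bigram_lm(["the house is small", "the house is big", "my home is small"])
# The LM rejects "the home is small" (unseen bigram); the TM rejects "my home is small".
candidates = ["the house is small", "the home is small", "my home is small"]
print(decode("das haus ist klein", candidates, lm))  # the house is small
```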

The job of the decoder is to find the highest-scoring sentence in the target language for a given source sentence. It can also output a ranked list of translation candidates.

Principles Used When Developing the Moses Decoder
- Accessibility
- Ease of maintenance
- Flexibility
- Ease of distributed team development
- Portability
The decoder was developed in C++ for efficiency and follows a modular, object-oriented design.

Decoding Process
The decoding process can be varied in several ways:
- the input (which can be a plain sentence);
- the translation model;
- the decoding algorithm;
- the language model.

Contributed Tools
- Moses Server: provides an XML-RPC interface to the decoder.
- Web translation: a set of scripts to translate web pages.
- Analysis tools: scripts to enable the analysis and visualization of Moses output.

Moses Decoder
A simple translation model consists of two files:
- the phrase table (phrase translation table), with entries such as: der ||| the ||| 0.3 ||| |||
- moses.ini, the configuration file that controls the decoder.

Phrase Table
The phrase translation tables are the main knowledge source for the machine translation decoder. The entry above means that the probability of translating the English word "the" from the German "der" is 0.3.

Configuration File
The decoder is controlled by the Moses configuration file, moses.ini. The translation model files and the language model files are specified there.

Moses Decoder Trace
The trace option reveals which phrase translations were used in the best translation found by the decoder.

Moses Decoder Tuning for Quality
The probability cost assigned to a translation is determined by four models:
- Phrase translation table φ(f|e): ensures that the source and target phrases are good translations of each other.
- Language model LM(e): ensures that the output is fluent target language.
- Reordering model D(e,f): allows for reordering of the input sentence.
- Word penalty W(e): ensures that translations do not get too long or too short.
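
The phrase table format shown above, with fields separated by |||, can be read with a small parser. This sketch assumes one entry per line, with the source phrase, target phrase and space-separated scores in the first three fields; any further fields (such as word alignments or counts) are ignored here. Apart from the der/the entry, the sample lines are hypothetical.

```python
from collections import defaultdict

PhraseTable = dict[str, list[tuple[str, list[float]]]]


def parse_phrase_table(lines: list[str]) -> PhraseTable:
    """Map each source phrase to its (target phrase, scores) options.
    Assumes "source ||| target ||| scores ||| ..." with scores space-separated."""
    table: defaultdict[str, list[tuple[str, list[float]]]] = defaultdict(list)
    for line_number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        fields = [field.strip() for field in line.split("|||")]
        if len(fields) < 3:
            raise ValueError(f"line {line_number}: expected at least 3 fields")
        source, target, scores = fields[0], fields[1], fields[2]
        table[source].append((target, [float(s) for s in scores.split()]))
    return dict(table)


sample = [
    "der ||| the ||| 0.3 ||| |||",
    "das ||| the ||| 0.4 ||| |||",  # hypothetical
    "das ||| it ||| 0.1 ||| |||",   # hypothetical
]
table = parse_phrase_table(sample)
print(table["der"])  # [('the', [0.3])]
print(max(table["das"], key=lambda option: option[1][0]))  # ('the', [0.4])
```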
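
Moses combines such models in a weighted log-linear score, score = Σ_k λ_k · h_k, where each h_k is one model's (log) feature value and the weights λ_k are what tuning adjusts. The sketch below is an illustration, not Moses' implementation: the PhrasePair type, feature names, weights and the flat toy LM are hypothetical. The reordering feature is minus the number of source words skipped when phrases are taken out of order, and the word penalty is minus the number of output words, so the sign and size of its weight decide whether shorter or longer output is favored.

```python
import math
from dataclasses import dataclass
from typing import Callable


@dataclass
class PhrasePair:
    """One phrase used in a translation (hypothetical representation)."""
    src_start: int  # first source word covered (0-based)
    src_end: int    # last source word covered (inclusive)
    target: str     # target phrase
    prob: float     # phrase translation probability phi(f|e)


def skipped_words(pairs: list[PhrasePair]) -> int:
    """Distance-based reordering: source words jumped over, in target order."""
    total, prev_end = 0, -1
    for pair in pairs:
        total += abs(pair.src_start - prev_end - 1)
        prev_end = pair.src_end
    return total


def log_linear_score(pairs: list[PhrasePair], lm_log_prob: Callable[[str], float],
                     weights: dict[str, float]) -> float:
    """Weighted sum of the four feature values (higher is better)."""
    output = " ".join(pair.target for pair in pairs)
    features = {
        "phrase": sum(math.log(pair.prob) for pair in pairs),  # log phi(f|e)
        "lm": lm_log_prob(output),                             # log LM(e)
        "reordering": -skipped_words(pairs),                   # D(e,f)
        "word_penalty": -len(output.split()),                  # W(e)
    }
    return sum(weights[name] * value for name, value in features.items())


def toy_lm(sentence: str) -> float:
    """Hypothetical flat LM: log-probability -1 per word."""
    return -1.0 * len(sentence.split())


weights = {"phrase": 1.0, "lm": 1.0, "reordering": 0.5, "word_penalty": 0.2}
monotone = [PhrasePair(0, 1, "the house", 0.5), PhrasePair(2, 3, "is small", 0.5)]
swapped = [PhrasePair(2, 3, "is small", 0.5), PhrasePair(0, 1, "the house", 0.5)]
print(skipped_words(monotone), skipped_words(swapped))  # 0 6
print(log_linear_score(monotone, toy_lm, weights) > log_linear_score(swapped, toy_lm, weights))  # True
```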

Moses Decoder Tuning for Speed
Speed-ups are achieved by limiting the search space of the decoder, through:
- the translation table size;
- the hypothesis stack size.

Translation Table Size
One strategy is to reduce the number of translation options used for each input phrase, i.e. the number of phrase table entries that are retrieved. There are two ways to limit the table size:
- a fixed limit on the number of translation options retrieved;
- a threshold: the phrase translation probability has to be above some value.

Hypothesis Stack Size
Another way to reduce the search space is to reduce the size of the hypothesis stacks. For each number of foreign words translated, the decoder keeps a stack of the best translations.

Moses Decoder: Limit on Distortion
The reordering cost is measured by the number of words skipped when foreign phrases are picked out of order, and it contributes to the overall score used to find the most probable translation. Limiting distortion means that reordering beyond a maximum distance is not allowed.

Decoding Algorithm
The decoder uses a beam search algorithm. The output sentence is generated from left to right in the form of hypotheses. The final states in the search are hypotheses that cover all foreign words.

Beam Search
Beam search is an efficient search algorithm that quickly finds a high-probability translation among the exponential number of choices; because it prunes, it is not guaranteed to find the single best one. The search through the space of generated hypotheses keeps, in each node, a list of the top-scoring translations for that node.
The score of a translation is computed from the weights of the individual phrases that make up the translation and the overall LM probability of the combination. These scores are obtained by querying the standard Moses phrase table and the LM for the target language.

Language Models
The decoder works with the following language models:
- SRI language model (SRILM)
- IRST language model (IRSTLM)
- RandLM
- KenLM, which is included in Moses by default
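
The two ways of limiting the translation table described above can be combined into one filtering step per source phrase. This is a sketch: the function name, the default values and the use of an absolute (rather than, say, relative) probability threshold are assumptions, and the options are hypothetical.

```python
def prune_translation_options(options: list[tuple[str, float]],
                              limit: int = 20,
                              threshold: float = 0.0) -> list[tuple[str, float]]:
    """Keep only options whose probability is above `threshold`,
    then at most `limit` of them, best first."""
    above = [option for option in options if option[1] > threshold]
    above.sort(key=lambda option: option[1], reverse=True)
    return above[:limit]


options = [("the", 0.3), ("this", 0.05), ("it", 0.1), ("that", 0.2)]  # hypothetical
print(prune_translation_options(options, limit=2))         # [('the', 0.3), ('that', 0.2)]
print(prune_translation_options(options, threshold=0.08))  # [('the', 0.3), ('that', 0.2), ('it', 0.1)]
```

A smaller limit or a higher threshold makes decoding faster, at the risk of discarding an option the best translation needed.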
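
The decoding ideas above (hypotheses built left to right, one stack per number of foreign words translated, a stack size limit and a distortion limit) can be put together in a small stack-based beam search. This is a simplified sketch, not the Moses decoder: the phrase table, the bigram LM scores and all parameter names and defaults are hypothetical, and real Moses decoding also uses future-cost estimation and hypothesis recombination, which are omitted here.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical toy phrase table: source phrase -> [(target phrase, probability)].
TOY_PHRASES = {
    "das": [("the", 0.7), ("that", 0.3)],
    "haus": [("house", 0.8), ("home", 0.2)],
    "das haus": [("the house", 0.9)],
    "ist": [("is", 1.0)],
    "klein": [("small", 0.6), ("little", 0.4)],
}
TOY_BIGRAMS = {("<s>", "the"), ("the", "house"), ("house", "is"), ("is", "small"), ("small", "</s>")}


def toy_bigram_logprob(prev: str, word: str) -> float:
    """Hypothetical LM: listed bigrams get probability 0.5, all others 0.01."""
    return math.log(0.5 if (prev, word) in TOY_BIGRAMS else 0.01)


@dataclass(frozen=True)
class Hypothesis:
    covered: frozenset  # source positions translated so far
    last_end: int       # source end of the last phrase used (for reordering cost)
    last_word: str      # last target word (bigram LM state)
    output: tuple       # target words, generated left to right
    score: float        # log score so far (higher is better)


def beam_decode(source: str,
                phrases: dict[str, list[tuple[str, float]]],
                lm: Callable[[str, str], float],
                stack_size: int = 10,
                max_phrase_len: int = 3,
                distortion_limit: int = 4,
                distortion_weight: float = 0.5) -> Optional[str]:
    words = source.split()
    n = len(words)
    # stacks[k] holds hypotheses that have translated exactly k source words.
    stacks: list[list[Hypothesis]] = [[] for _ in range(n + 1)]
    stacks[0].append(Hypothesis(frozenset(), -1, "<s>", (), 0.0))
    for k in range(n):
        # Histogram pruning: keep only the best `stack_size` hypotheses.
        stacks[k] = sorted(stacks[k], key=lambda h: h.score, reverse=True)[:stack_size]
        for hyp in stacks[k]:
            for start in range(n):
                jump = abs(start - hyp.last_end - 1)  # source words skipped
                if start in hyp.covered or jump > distortion_limit:
                    continue
                for end in range(start, min(start + max_phrase_len, n)):
                    if end in hyp.covered:
                        break  # longer spans from this start overlap as well
                    phrase = " ".join(words[start:end + 1])
                    for target, prob in phrases.get(phrase, []):
                        score = hyp.score + math.log(prob) - distortion_weight * jump
                        prev = hyp.last_word
                        for word in target.split():
                            score += lm(prev, word)  # incremental bigram LM score
                            prev = word
                        covered = hyp.covered | frozenset(range(start, end + 1))
                        stacks[len(covered)].append(Hypothesis(
                            covered, end, prev, hyp.output + tuple(target.split()), score))
    # Final hypotheses cover all source words; add the end-of-sentence LM score.
    finals = [(h.score + lm(h.last_word, "</s>"), h) for h in stacks[n]]
    if not finals:
        return None  # e.g. a source word with no phrase table entry
    _, best = max(finals, key=lambda pair: pair[0])
    return " ".join(best.output)


print(beam_decode("das haus ist klein", TOY_PHRASES, toy_bigram_logprob))  # the house is small
print(beam_decode("das auto", TOY_PHRASES, toy_bigram_logprob))            # None
```

Lowering stack_size or distortion_limit shrinks the search space in exactly the two ways the speed-tuning notes describe, trading search quality for speed.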

Translating Webpages with Moses
Moses servers are installed on one or more computers. On each Moses server, a daemon (daemon.pl) accepts network connections on a given port and copies everything it receives from the connection to Moses. A separate web server runs Apache or any other web server software and serves CGI scripts (index.cgi, translate.cgi) to clients.
A client requests index.cgi via the web server, and a form containing a text box for entering a URL is served back. The form is submitted to translate.cgi, which does the actual work: it fetches the page from the web, extracts the plain text from it, sends the text to the Moses servers, inserts the translations back into the document and returns the document to the client.

Setting up a Moses Server
- Choose machines for the Moses servers. Running Moses is a slow and expensive process, so each machine should have a fast processor and as many GB of memory as possible.
- Install Moses. On each Moses server, install and configure the language pair you wish to use.
- Install daemon.pl. Open bin/daemon.pl and edit the $MOSES and $MOSES_INI paths to point to the location of the moses binary and the Moses configuration file.
- Choose a port number. Pick any port number between 1024 and 49151 for the daemon process to listen on.
- Start the daemon. To activate a Moses server, type the following in a shell on the server:
  ./daemon.pl <hostname> <port>
  where <hostname> is the name of the host where Moses is installed and <port> is the selected port.
- Configure the web server to connect to the Moses servers. The final step is to tell the front-end web server where to find the back-end Moses servers: in the translate.cgi script, set the @MOSES_ADDRESS array to the list of hostname:port strings identifying the Moses servers.

[Chart: Comparison with Pharaoh and Phramer for fr-en translation of 2,000 sentences]

Installing Moses
Install Boost:
sudo apt-get install libboost-all-dev
Get the source code:
git clone https://github.com/moses-smt/mosesdecoder.git
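
Returning to the web-translation setup, the client side of the daemon protocol (the part translate.cgi plays) can be sketched in Python. This rests on loud assumptions: that daemon.pl speaks a line-oriented protocol (one source sentence per line in, exactly one translated line back), and that addresses use the hostname:port form of @MOSES_ADDRESS with ports in the 1024 to 49151 range chosen above. The function names are hypothetical and not part of Moses.

```python
import socket


def parse_address(entry: str) -> tuple[str, int]:
    """Split a "hostname:port" string (the @MOSES_ADDRESS format) and check the port."""
    host, sep, port_text = entry.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected hostname:port, got {entry!r}")
    port = int(port_text)
    if not 1024 <= port <= 49151:
        raise ValueError(f"port {port} is outside 1024-49151")
    return host, port


def translate_over(sock: socket.socket, sentences: list[str]) -> list[str]:
    """Send each sentence as one line and read one translated line back.
    ASSUMPTION: the daemon answers each input line with exactly one output line."""
    reader = sock.makefile("rb")
    try:
        translations = []
        for sentence in sentences:
            sock.sendall(sentence.strip().encode("utf-8") + b"\n")
            reply = reader.readline()
            if not reply:
                raise ConnectionError("Moses server closed the connection early")
            translations.append(reply.decode("utf-8").rstrip("\r\n"))
        return translations
    finally:
        reader.close()


def translate(entry: str, sentences: list[str], timeout: float = 60.0) -> list[str]:
    """Connect to one Moses server given as "hostname:port" and translate sentences."""
    host, port = parse_address(entry)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return translate_over(sock, sentences)
```

For example, translate("mt-server.example.org:8080", ["das ist ein kleines haus"]) would return the server's translation, where the host name is hypothetical. Keeping the socket handling in translate_over makes it possible to exercise the protocol logic without a running Moses server.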

Installing GIZA++
wget http://giza-pp.googlecode.com/files/giza-pp-v1.0.7.tar.gz
tar xzvf giza-pp-v1.0.7.tar.gz
cd giza-pp
make
cd ~/mosesdecoder
mkdir tools
cp ~/giza-pp/GIZA++-v2/GIZA++ ~/giza-pp/GIZA++-v2/snt2cooc.out ~/giza-pp/mkcls-v2/mkcls tools

Installing IRSTLM
tar zxvf irstlm-5.80.01.tgz
cd irstlm-5.80.01
./regenerate-makefiles.sh
./configure --prefix=$HOME/irstlm
make install

Moses Platform
The primary development platform for Moses is Linux, and Linux is the recommended platform because it is easier to get support for it. However, Moses also works on other platforms.

Moses Releases
- Moses 1.0 (28 January 2013)
- Moses 0.91 (12 October 2012)

Importance of Moses
- Moses is installable software, unlike online-only translation systems.
- Online systems cannot be trained on your own data.
- Online systems also raise a privacy problem if you have to translate sensitive information.

Conclusion
Moses is an open-source toolkit, so users can modify and customize it according to their own needs and requirements.