FP6-IST-003768

METIS-II Statistical Machine Translation using Monolingual Corpora:

From Concept to Implementation

Specific Targeted Research Project or Innovation Projects Future and Emergent Technologies

D2.1 User requirements: Analysis and Comments

Due date of deliverable: 31.1.2005

Actual submission date: 25.4.2005

Start date of project: 1.10.2004

Duration: 3 years

IAI, Saarbrücken

Revision [3]

Project co-funded by the European Commission within the Sixth Framework Programme (2002-2006)

Dissemination Level

PU Public (X)

PP Restricted to other programme participants (including the Commission Services)

RE Restricted to a group specified by the consortium (including the Commission Services)

CO Confidential, only for members of the consortium (including the Commission Services)

1. Introduction

2. Aim of the Task and Methods

3. Translation Concept

4. Translation Work Flows

5. User Group and Setting

5.1 The Composition of Individual User Groups

5.1.1 ILSP

5.1.2 KU Leuven

5.1.3 IAI

5.1.4 UPF

5.1.5 Web-based Access

6. Parameters, Validation

7. The Results of the Investigation

7.1 ILSP

7.2 KU Leuven

7.3 IAI

7.4 UPF

8. Some Preliminary Conclusions

8.1 Greek

8.2 Spanish

9. Contributors

10. References

Appendix A: Questionnaire

1. Introduction

Deliverable 2.1 aims to identify user requirements concerning the translation procedure in a real environment; these requirements will be taken into account in the design of METIS. One of the tools for gathering this type of information is a questionnaire.

In addition, Task 2.1 relates to tasks in other work packages (WPs), such as WP5 on the ‘validation and evaluation’ of the METIS system. Task 2.1 can be seen as the first in a series of interactions between the developers and the (potential) users of the METIS product, thus realising a fundamental principle: user interaction from the very beginning of the project (design phase) to the very end (delivery of the product).

This requires a planned approach to the developer-user relation, organised in three main phases, where a. refers to Task 2.1 in WP2 and b. and c. to WP5:

a. Initial: Help to define the system

b. Intermediate evaluation

c. Final evaluation

2. Aim of the Task and Methods

Identifying user requirements and taking them into account in the design of METIS means that users define the quality and properties of METIS, under the guidance of the developers, so that a useful tool is delivered. It also requires finding out how the METIS system could be usefully integrated into a translator’s workplace and applied in a real-world scenario.

This differs from the evaluations foreseen under WP5, which assess the quality of the output of the METIS system and state whether METIS is useful or not. Those evaluations will be based on a working system, whereas for Task 2.1 no such system is available that could be given to the user. This is a fundamental difference.

For all the stages of user involvement the following issues are relevant:

∗ Both types of investigation foreseen in the project, the definition phase and the evaluation phase, share a basic requirement: the translation tools are to be an integral part of an overall translation software environment that supports the human translator in her work (Jeff Allen 2001).

∗ It has to be taken into account ‘what translation is’: a concept of translation has to be sketched and used both to determine the properties of the METIS system and to develop the questionnaire (which is meant to pin down these properties). The reason is that a human translator will evaluate the usefulness of a translation tool, and of its integration into a translation work flow, on the basis of this concept (or practice) of translation.

∗ Probably the most relevant issue for the success of a translation tool in the translation process is ‘post-editing’. METIS will have a chance of being accepted only if post-editing can be shaped in a way that allows useful and easy handling of the machine translation output. Whether this is possible depends heavily on the quality and the shape of the METIS output.

Involving the users in system design and development from the beginning is meant to avoid a (common) scenario known from the history of machine translation: computational linguists and computer scientists develop an MT system and hope that it fits the requirements of the users. To avoid that, METIS must be designed in a way that takes account of what users expect from a translation tool.

The post-editing process is crucial for the overall usefulness of a translation tool. The main actions taken at post-editing time are replacing, adding and deleting material.

An interesting topic of study is how and under which conditions these actions are performed, and which factors make post-editing efficient and successful. So:

∗ What are the (linguistically or otherwise defined) structures that could or will most easily or preferably be added?

∗ What are the (linguistically or otherwise defined) structures that could or will most easily or preferably be deleted?

∗ What are the (linguistically or otherwise defined) structures that could or will most easily or preferably be replaced?
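
As an illustration of how the three post-editing actions could be counted in practice, the following minimal Python sketch compares a raw MT output with its post-edited version at word level. The function name `count_post_edits`, the word-level granularity and the example sentences are assumptions made for this illustration; they are not part of the METIS design.

```python
from collections import Counter
from difflib import SequenceMatcher


def count_post_edits(mt_output: str, post_edited: str) -> Counter:
    """Count word-level replace/add/delete operations between two texts."""
    mt_words = mt_output.split()
    pe_words = post_edited.split()
    counts: Counter = Counter()
    matcher = SequenceMatcher(a=mt_words, b=pe_words, autojunk=False)
    for tag, _i1, _i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "replace":
            counts["replace"] += 1   # material swapped for other material
        elif tag == "insert":
            counts["add"] += 1       # material present only after post-editing
        elif tag == "delete":
            counts["delete"] += 1    # material removed by the post-editor
    return counts


# Invented example: one word replaced ("cat" -> "dog"), one word added ("the").
edits = count_post_edits("the cat sat on mat", "the dog sat on the mat")
print(edits["replace"], edits["add"], edits["delete"])
```

Each contiguous run of changed words counts as one operation here. A study of which linguistic structures are edited would need a linguistically informed segmentation instead of plain whitespace splitting.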

A special type of translation tool, namely Translation Memories (TMs), delivers a similarity rating with each translation. This allows the user to choose between high recall with low precision and the reverse. We consider this feature interesting and useful, and we will adopt a similar approach in order to deliver a ‘certainty’ rating for translations.
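
To make the idea of a similarity rating concrete, here is a minimal Python sketch of a TM-style lookup with a user-selectable threshold. It is only an illustration: the functions `similarity` and `best_match`, the word-overlap measure taken from Python's `difflib`, the default threshold of 0.7 and the example memory are all assumptions, and the sketch says nothing about how the METIS ‘certainty’ rating itself will be computed.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def similarity(segment_a: str, segment_b: str) -> float:
    """Word-level similarity between two segments, from 0.0 to 1.0."""
    words_a = segment_a.lower().split()
    words_b = segment_b.lower().split()
    return SequenceMatcher(a=words_a, b=words_b, autojunk=False).ratio()


def best_match(
    segment: str,
    memory: List[Tuple[str, str]],
    threshold: float = 0.7,
) -> Optional[Tuple[str, float]]:
    """Return (stored translation, rating) for the closest entry, or None."""
    best: Optional[Tuple[str, float]] = None
    for source, target in memory:
        rating = similarity(segment, source)
        if rating >= threshold and (best is None or rating > best[1]):
            best = (target, rating)
    return best


# Invented example memory (English source, Spanish target).
memory = [
    ("Press the red button", "Pulse el botón rojo"),
    ("Close the door", "Cierre la puerta"),
]

print(best_match("Press the green button", memory))       # rating 0.75, accepted
print(best_match("Press the green button", memory, 0.8))  # stricter threshold, rejected
```

Lowering the threshold returns more candidate translations (higher recall) that need more correction (lower precision); raising it has the opposite effect.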

3. Translation Concept

The investigation of the usefulness and shortcomings of the METIS system, its evaluation, as well as its design has to take into account a concept of translation, i.e. what the translator does when (s)he translates, how (s)he proceeds, which aspects (s)he takes into account when translating. In an ideal case one would take into account a concept of translation based on the investigation of the cognitive processes that define ‘translation’. However, a detailed translation theory is not necessary as we would like to stress only a number of points.

METIS is meant to become a translation tool that usefully supports the human translator in her work. If its design does not take into account what ‘human translation’ is, it will fall short of the requirements. The concept of translation that people usually apply is too simple: its main weakness is that translation is defined merely at the level of linguistic semantics.

This simplified approach considers translation to be a ‘recoding’ of semantic units. Semantic units are denoted by source-language-specific items. Translation consists in simply replacing these source-language-specific items by denotations provided by the target language. Translation is thus a meaning-related action that consists in replacing linguistic material. Modern translation theory holds that this is too poor a concept of translation, as it neglects major dimensions of translation. This kind of translation, which tries to ‘recode’ the source language in a target language, is usually called the ‘equivalence theory of translation’, a term we will use in the following.

In a nutshell, some of the major points of modern translation theory are:

a. In linguistic (Saussurean) terms one could say that translation is not ‘langue-based’ but ‘parole-based’. Translation is about texts, thus ‘historical incidents’, and not about the relations between two language systems. The interpretation and understanding of texts is situational. Thus, a linguistic recoding in a target language might not be sufficient, as these semantic units might induce different situational interpretations (due to a different historical situation) and thus be completely misleading.

b. Translation is about ‘communicative acts’ in the target language. As communication is ‘culturally determined’, translation has a lot to do with cultures. Some translation theories consider translation as ‘cultural mediation’, i.e. as a reconstruction of communicative acts according to the specific rules of the target culture.

c. Another decisive factor in modern translation theory is the so-called ‘translation purpose’. A translation has to take into account what it is made for.

d. The impact of the ‘translation purpose’ is the following: there is no such thing as ‘THE’ (appropriate) translation of a text. What counts as an appropriate translation depends on the purpose of the translation. Certainly, one of the translation purposes could be ‘be (linguistically) as equivalent as possible’. So equivalence translation could be one of the translation purposes, but even equivalence will never be without some ‘cultural adaptation’.

Of course, translation tools will not be able to deal with an advanced concept of translation, but there are some consequences for METIS resulting from these considerations:

a. METIS should not claim to be able to deliver anything else than ‘equivalence translation’ of texts.

b. Even on the basis of technical texts and with equivalence as the translation purpose some of the above mentioned constraints (as far as the relevance of cultural adaptation is concerned) apply.

The most important aspect, however, is that human translation is the basis against which post editing will have to be evaluated.

4. Translation Work Flows

Information technology plays an ever increasing role in the translator’s everyday work. In order to understand better the needs of the modern translator, a couple of important work flows have been described in brief, so that the METIS II system can be successfully placed in a future scenario.

ILSP has had a long cooperation in the past with the European Commission for the development of translation tools such as EC-Systran. The translation workflow used in the EC is described below:

The translation workflow in the European Commission could be described as ideal, as translators have a variety of tools at their disposal. A common scenario is that users work with a TM, which they often combine with machine translation and replacement tools. If the translation memories do not contain any suitable data, translators look for it in reference documents in order to import it.

Reference material plays a major role in the translation process, and there are many text databases of this kind, such as SdTvista and CELEX. Searching these databases tells translators whether parts of a text, or relevant related texts, have been translated in the past.

If so, the reference documents are downloaded and a sentence alignment request is launched. The alignment results are then corrected before being imported into the translation memory. At this point translators can use their personal translation memories if they wish; otherwise they can use Sdt’s.

Sdt has developed server translation memories for seven thematic domains; their design facilitates data sharing, for which translators have certain rights. Part of the document may then go through machine translation.

Another tool, used for highly repetitive documents, is TMan. It is usually used for a number of regular publications such as the bulletin. TMan replaces predefined strings ranging from single words up to whole paragraphs.

All in all, it is often the case that a TM retrieval is carried out and the remaining gaps are filled with MT results.
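
The combination of TM retrieval and MT described here can be sketched as follows. This is a simplified illustration under stated assumptions: the memory is a plain exact-match dictionary, `placeholder_mt` is a hypothetical stand-in for a real MT system, and the function name `fill_gaps_with_mt` is ours; none of this reflects the actual tools used at the European Commission.

```python
from typing import Callable, Dict, List, Tuple


def fill_gaps_with_mt(
    segments: List[str],
    memory: Dict[str, str],
    machine_translate: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Translate each segment from the TM if possible, otherwise by MT.

    Returns (source segment, translation, origin) with origin "TM" or "MT",
    so that the post-editor can see where each translation came from.
    """
    results = []
    for segment in segments:
        if segment in memory:
            results.append((segment, memory[segment], "TM"))
        else:
            results.append((segment, machine_translate(segment), "MT"))
    return results


def placeholder_mt(segment: str) -> str:
    """Hypothetical stand-in for an MT system: it only marks the segment."""
    return "[MT] " + segment


# Invented example memory (English source, Spanish target).
memory = {"Close the door.": "Cierre la puerta."}

for row in fill_gaps_with_mt(["Close the door.", "Open the window."], memory, placeholder_mt):
    print(row)
```

Keeping the origin of each segment visible matters for post-editing, since TM matches and MT output typically call for different amounts of checking.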

Other tools in use are Dragon NaturallySpeaking, a dictation system, and the Euramis on-line Concordance, which searches the Euramis translation memories for the text entered in the text box.

For the four major languages of the European Commission (English, French, German and Spanish) there are special terminology tools, which consult terminology databases in Eurodicautom, Euterpe and TIS.

FUPF has some knowledge of the workflow adopted by some companies located in Barcelona and elsewhere, where some of its students have done end-of-studies placements. In general, all these experiences concern companies that deal mainly with technical documentation (in fields as diverse as automation, insurance, finance…). All the cases we know of use translation-specific technologies; in some cases our students have witnessed the very process of migration to a technology-enhanced work flow. In the following paragraph we try to summarise the aspects common to these experiences.

Translation is conceived of as a separate activity (e.g., from document creation) and is performed by a specific department (or a specific group of individuals). Translations are prepared and distributed by the person(s) responsible for the translation department. They are carried out by individual translators who do not belong to that department: they may be members of the company’s staff (outside the translation department) or freelancers (in some cases the translation jobs are given to a translation agency). Translators are given the corporate translation memory and, very often, a database containing the terminology used by the company. Translations are usually validated by fellow translators. The terminology and translation memory databases are maintained by the central team. This means that new translations are entered into the TM database not by the individual translators but by the people responsible for the translation department, and that terminology lists are produced (or at least checked) by them as well.

5. User Group and Setting

To carry out the study, it is necessary to build a user group and to develop appropriate means of obtaining the relevant information.

Criteria for establishing the user groups are the following:

∗ The need to define the properties of the METIS system and the quality and form of output the translator needs in order to do her work efficiently. This focuses on a specific item in the translation work flow: post-editing.

∗ Availability: it is also necessary to have access to users and to be able to get the relevant information.

This is the reason why user groups were composed from the following sources:

∗ Members of university schools specialising in translation, advanced students, teachers

∗ Professional translators in different professional contexts

Other scenarios that should be established require a different setting.

5.1 The Composition of Individual User Groups

5.1.1 ILSP

In Greece, owing to the pressures of economic recession and the globalization of business activities, most translation work is outsourced rather than done in-house. As a result, there are no big translation companies, only small ones with a staff of 5-7 people, while the rest of the work is done on an external basis. It is worth noting that even the Ministry of Foreign Affairs and the Hellenic Organization for Standardization, bodies with continuous translation needs, employ very few translators and outsource most of their work. The same goes for the private sector: for example, big translation companies that undertake translations for the Commission have only a limited number of staff and mainly outsource the work, too.

The METIS User Group at ILSP: By ‘user’ we mean someone who actually or potentially makes a specific use of something. Users need not be customers; most users are probably not customers, given the centralization of purchasing authority in many organizations.

When trying to set up the local user group, our main aim was to try to spot users with the following profile:

∗ Many years of specialization in a particular field.

∗ Users of translation memories on a regular basis.

∗ Users willing to experiment with various translation tools.

∗ Experienced Users: a list was prepared with experienced translators, who were briefly interviewed before being asked to participate in the survey.

∗ Translators who have mostly worked with technical texts.

5.1.2 KU Leuven

The KUL user group consists mainly of teachers and (advanced) students of university colleges (Departments of translation and interpretation) in Brussels, Ghent and Antwerp. In addition, some professional translators are involved.

This user group is expected to be aware of the latest developments in (machine assisted) translation and should be able to give the project good advice with respect to the needs of translators.

5.1.3 IAI

IAI’s user group consists of three different subgroups. One is a set of technical authors who work for the automotive industry. Their work will be investigated in detail in the way described above.

A second group is a set of advanced students from the University of Saarbrücken. They will experiment with general texts such as newspaper texts.

A third group consists of members of a small translation company that works for insurance companies and also in the technical field. They will experiment with texts from their areas.

5.1.4 UPF

The UPF user group consists of professional translators, teachers of translation, students of translation and people working on the development and/or teaching of tools for translators. It thus covers the whole range of people related to translation who may provide some interesting ideas about what an adequate approach to MT should be.

At the time of analysing the results, the user group consists of 10 individuals, most of whom are either professional translators or translation teachers.

5.1.5 Web-based Access

For efficient collection of user responses, ILSP has created (and hosts) a server that is accessible worldwide through the internet. It allows questionnaires to be filled out directly on the server, where they are stored. Statistics on the answers are computed automatically, separated by language.

This is an extremely user-friendly infrastructure, which will be used in the future for all interactions with user groups. The server might very well also be used for other kinds of easy communication and data collection.
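
The kind of per-language statistics described here can be illustrated with a short Python sketch. The record layout (`language`, `skill`) and the function name `answer_statistics` are assumptions made for this example and do not describe the ILSP server; the sample counts are the ILSP self-assessment figures reported in Section 7.1 of this report (5 expert, 1 intermediate, 1 beginner).

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def answer_statistics(
    responses: List[Dict[str, str]], question: str
) -> Dict[str, Dict[str, Tuple[int, float]]]:
    """Return {language: {answer: (count, percent)}} for one question."""
    by_language: Dict[str, Counter] = defaultdict(Counter)
    for response in responses:
        by_language[response["language"]][response[question]] += 1

    stats = {}
    for language, counts in by_language.items():
        total = sum(counts.values())
        stats[language] = {
            answer: (n, round(100 * n / total, 2)) for answer, n in counts.items()
        }
    return stats


# Sample data reproducing the ILSP figures from Section 7.1.
responses = (
    [{"language": "Greek", "skill": "Expert/advanced user"}] * 5
    + [{"language": "Greek", "skill": "Intermediate/skilled user"}]
    + [{"language": "Greek", "skill": "Beginner"}]
)

stats = answer_statistics(responses, "skill")
print(stats["Greek"]["Expert/advanced user"])  # (5, 71.43)
```

With seven respondents, one answer corresponds to 1/7, i.e. 14.29%, which is why the percentages in Section 7.1 come in multiples of that value.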

6. Parameters, Validation

This section describes the parameters we are going to apply to assess the results of the first round of user involvement. The starting point is the questionnaire, whose sections provide information about different parameters.

The questionnaire has six large blocks, each of which is important for the evaluation of the results.

1. Technical skills of the user and the technologies the user is acquainted with: The use of (translation-related) technologies is a basic condition for a useful assessment of the validity and relevance of the statements of a given user. If the user has little experience with such technologies, his or her statements might be less valuable.

2. Translation scenario the user works in. This comprises text types, the types of translation, and the question whether the translation requires a lot of cultural adaptation or only textual recoding. It also includes the tools the translator uses in relation to the translation scenario (especially the type of translation). It determines in some detail how translation takes place and which tools are used with which level of expertise.

3. The third large section of the questionnaire is about MT and other translation tools such as translation memories. The user’s attitude towards these tools is determined in detail.

4. The fourth block of questions is about pre-editing, which may become relevant in the METIS scenario.

5. Post-editing: The fifth section is about post-editing. It tries to pin down in detail what the users’ post-editing consists of and what their attitude towards post-editing is. The purpose of this section is to know exactly what the process looks like, in order to adapt the METIS tool to it optimally.

Post-editing is the most decisive factor for a successful design of the METIS system. At a later stage in the project, ‘post-editing’ has to be further specified.

The following parameters for post-editing are suggested by Jeff Allen. He differentiates between different levels of post-editing:

∗ Minimal PE

∗ Rapid PE

∗ Partial PE

∗ Maximum PE

∗ Full PE

Which of these types of post-editing we are going to investigate will depend on the translation scenario. The most interesting approach is ‘full post-editing’, which is appropriate for high-quality translations such as ‘technical documentation’. This sort of post-editing has to be applied to produce ‘publishable’ texts, and it is the sort of post-editing we have in mind for the questionnaire.

The other sorts may be applied to texts that require a lower translation quality. Those sorts of post-editing that imply a rather quick-and-dirty approach are appropriate for translation scenarios such as ‘informative translation’.

Another question that has to be discussed and given attention is whether, and if so to what extent, the post-editing methodology is system-dependent, text-dependent or language-pair-dependent. This has to be extracted from the information we get through the questionnaire.

A final problem that has to be addressed, and that has to have consequences for the further development of METIS, is how the cycle of post-editing and system development could work in the METIS scenario. Allen regards such cycles as essential for MT system development.

6. A final section in the questionnaire tries to find out about the expectations of the users as far as newly designed translation tools are concerned. Again, a decisive question is the relation between accuracy and quality of the translation the tools should be able to deliver.

All the parameters have to be weighted. The core of the investigation, though, is to determine the way post-editing works best and derive consequences for the design of the METIS tool.

7. The Results of the Investigation

7.1 ILSP

Section I refers to the technical devices relevant to their translation needs that users are acquainted with. It aims at establishing the user’s profile.

In general, the majority of users considered themselves "Expert/advanced user" (5 users, 71.43%), and only a minority considered themselves "Intermediate/skilled user" (1 user, 14.29%) or "Beginner" (1 user, 14.29%). This is mostly because all partners had agreed that the majority of users should be professional translators.

All users are familiar with translation memories, Trados being the most popular one. Only a small percentage use a term extraction tool, but those who do use it frequently; Term Extract is the most popular such tool. Users are not familiar with machine translation systems, partly because not many MT systems include Greek among their language pairs. As for word processors, all users are familiar with Word.

Section II refers to the actual tools used and the type of documents translated by the users in their daily work. As regards document type, texts from the technical domain come first, with a percentage of 100%. Financial texts come second, followed by medical documents and software localization. The main aim of the translators for these texts is to achieve linguistic equivalence between the source and target text. Word and QuarkXPress are the main editors used, while Multiterm is the basic terminology database. Very few users use term extraction software, and those who do use Term Extract. No pre-editing tools or controlled language checkers are used. This is not surprising, as most translators in this survey translate into Greek, for which only a few language checkers have been developed. Most users use spellcheckers and grammar checkers for post-editing. To the question ‘What would you like to have in addition?’ the answers given were:

∗ Advanced building software for glossaries.

∗ Faster and more accurate grammar checkers with many agreement features.

∗ A tool that would check if the corrections are implemented in the texts and also if the right number or gender are inserted in the final text could be very useful.

SECTION III explores the users’ attitude towards machine translation and translation memory tools.

The majority of users (71.43%) have never used an MT system and so skipped the respective questions. The few who have think such systems are of little or no use. The reasons they mention are that the systems depend on an always up-to-date dictionary, which is difficult to maintain, and that too much post-editing is required, which is very time-consuming; moreover, it is often difficult to catch all the mistakes made by the MT system. Only one user made a suggestion for the improvement of MT systems, which was “An easier to handle dictionary and the ability of the system to easily adjust to different types of documents”. Generally, users avoided remarks concerning MT.

With respect to translation memories, all users regard them as powerful and useful tools. The positive points were as expected (consistency of terminology, time saving, etc.), while the most important negative points are listed below:

∗ Heavy software, boring interface.

∗ You need to be very careful when choosing the one that you should use. In documents where there are large sections of pre-translated text you might at some point get the feeling of losing the document's coherence.

∗ Sometimes, the draft translations are maintained in the memories and then reappear in some future text.

∗ Need of regular terminology management & revising, especially when used for multiple text types by many translators simultaneously.

∗ Sometimes they do not have the ability to detect minor differences in the text that may result in problems.

SECTION IV refers to pre-editing in the translation workflow.

In general the users do not use pre-editing in their translation workflow. An interesting comment came from a user who stated that: ‘I don't use such a tool. We are not normally allowed by the client to edit the source text at all. This causes inconsistencies in their translation memories.’

SECTION V refers to post-editing.

The most frequent actions in post-editing were moving words or phrases from one place in the sentence to another and replacing wrong words or phrases. The least frequent action is adding phrases. It is often the case that whole sentences have to be retranslated. Most of the respondents find the post-editing tasks interesting. Only occasionally does a sentence that was translated correctly at sentence level have to be retranslated for textual reasons. The average similarity threshold used in translation memories is 70%. There are no specific tools used for post-editing.
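To make the notion of a similarity threshold concrete, the following is a minimal Python sketch of how a fuzzy-match cut-off of 70% might filter translation memory entries. It is only an illustration: the character-based score from the standard library's difflib is an assumption standing in for the metrics of actual TM tools, and the function names (similarity, fuzzy_matches) and the sample segments are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-100 character-based similarity score for two segments."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def fuzzy_matches(segment, memory, threshold=70.0):
    """Return (score, source, target) tuples for memory entries whose
    source segment scores at or above the threshold, best match first."""
    scored = [(similarity(segment, src), src, tgt) for src, tgt in memory]
    return sorted((m for m in scored if m[0] >= threshold), reverse=True)


# Hypothetical translation memory: (source segment, stored translation).
memory = [
    ("Press the red button to stop the machine.",
     "Pulse el botón rojo para detener la máquina."),
    ("Store the device in a dry place.",
     "Guarde el dispositivo en un lugar seco."),
]

# A new segment that differs from the first entry by one word.
hits = fuzzy_matches("Press the green button to stop the machine.", memory)
for score, source, target in hits:
    print(f"{score:.0f}% match: {source} -> {target}")
```

Raising the threshold returns fewer but closer matches, so the translator post-edits less per match but translates more segments from scratch.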

SECTION VI refers to the expectations from a newly designed translation software. This is the most interesting part of the questionnaire as it gives us hints about the gap in the market that should be covered. Therefore the answers are cited:

“To offer all Trados features plus to be a light software, with good technical support at cost efficient price and offer the chance to evaluate it before buying it.

∗ Fuzzy matches recognition and handling improvement.

∗ Full TMX compliance

∗ Improvement in terms of accuracy, speed and reliability --Enhanced speed and accuracy.

∗ Fast and seamless in the translation process.

∗ The translator's software should not interfere with the translator's flow of thought.

∗ Integration in a single TM environment of those features most existing TM software already provides.

∗ Spelling and grammar checking tools for multilingual projects as a default feature (no need to add on such features).

∗ Preview tool with direct editing features in the preview mode and auto correction in the TM.

∗ Simplified term extraction tools.

∗ Detailed distinction of speech parts.

∗ I would expect increased syntactical and grammatical correctness.

∗ I think that this will be difficult regarding the complexity of the Greek language. Enhanced speed and accuracy.”

With respect to the main features, “reliability” seems to be the most important one.

With respect to the question about the features that users would most like to see included in the post-editing stage, lexical compliance, grammatical coherence and localisation features were considered necessary.

Which is the sort of ‘similarity’ with the ‘ideal translation’ that makes you consider post editing at all? For example, what kind of mistakes would you consider as trivial when post editing.

To the question regarding the sort of ‘similarity’ with the ‘ideal translation’ that makes the translator consider post editing at all, the following answers were given:

∗ spelling and punctuation errors

∗ A word missing here and there, a dot or a comma. “Post editing surely gives me the chance to have a global overview of the translation”.

∗ subject verb agreement

∗ spelling, grammatical errors

∗ verb tenses, singular/plural number distinction, adjectival vs adverbial usage (however, always depending on the situation)

∗ the most trivial error is the synonyms that are presented by the tools and the words with similar morphology. When the structure is incorrect, the text needs re-translation and therefore the whole procedure becomes very time-consuming.

Concerning what users consider to be the easiest task when post-editing, deleting words and phrases came first, while moving words and phrases from one place in the sentence to another was considered the most difficult. In addition, modification of noun phrases is considered to be easier than modification of verb phrases.

All in all, the answers given by the users were to a great extent expected and proved to be quite useful as they verified our initial thoughts.

7.2 KU Leuven

The process of filling out the questionnaires has started, and some completed questionnaires are already in the database. However, the set available is not yet sufficient to draw conclusions.

The analysis and evaluation of the data from the Belgian user group will be handed in shortly.

7.3 IAI

The same situation holds for IAI. The German contribution will be handed in by mid-May at the latest.

7.4 UPF

At the moment of writing this report the questionnaire has been answered by 10 people, mainly professional translators and translation teachers.

The results are summarised by sections in the questionnaire.

Section on software skills

The people in our user group are reasonably skilled:

∗ they all know of text processing and translation memory tools, all but one know of terminology processing and machine translation tools, though only 4 know of terminology extraction tools.

∗ these tools are used at different levels:

∗ text processing tools are frequently used (9) and only one person uses them only occasionally

∗ terminology processing and translation memories are used frequently by 4 people, and occasionally or rarely by the rest (only one person never uses terminology processing tools)

∗ terminology extraction and machine translation are used less frequently: 3 people use them frequently

∗ finally, 5 people consider themselves advanced users; 3, intermediate users; and only 2, beginners.

Section on the translation procedure

∗ Most of the people translate technical documentation (8), some translate brochures (3) and general texts (3), and a few translate legal texts (2) and arts and humanities texts (2).

∗ Most of the people adopt the linguistic equivalence strategy (7), but some try to follow the cultural adaptation one (4). One works mainly on software localisation, and one tries to balance between both strategies. Finally one acknowledges that every translation job is different.

∗ A few tools are mentioned: Microsoft Office/Word (7), WebBudget (3), TextPad (2), UltraEdit (2), Déjà vu (2), Trados (2), Adobe Acrobat (1), FrameMaker (1), Word CAT (1).

∗ A terminology database is used by 5 people (TermBase, TermStar, Trados and Déjà vu).

∗ Only 2 people use a terminology extraction tool (Trados and their own tool).

∗ Only 2 people use a pre-editor (Word corrector, and one has it done by a third party).

∗ 8 people use translation memories (Déjà vu (5), Trados (2), Transit (1), Foreign Desk (1)).

∗ 6 people use machine translation (internostrum (3), comprendium (3)).

∗ 4 people have access to the terminology management tool from the authoring tool, and 4 have it integrated with the translation memory.

∗ Nobody has specific facilities for postediting.

∗ One asks for a good grammar checker, and another one for a better integration of term databases with translation memories.

Section on MT and TM

∗ 8 have a positive attitude towards MT.

∗ MT helps to some extent (5), quite a lot (3).

∗ MT reduces the translation time to some extent (4), quite a lot (4).

∗ 8 people think MT has several deficiencies:

Linguistic errors (names, prepositions, agreement, premodification…) (4).

They work better between closely related languages (2).

They cannot be modified, and ‘silly’ errors are repeated all the time (2).

They may change the document formatting (1).

They are not easily integrated with TMs (1).

They are expensive (1).

∗ 8 people suggest some improvements to MT:

Easy integration with TM and/or user-made lists of equivalences (2).

Adaptation to specific domains (1), and more complete dictionaries (1).

Better user interaction (1).

Use of genuine target structures (1), and to be trained with data (1).

To add more rules to disambiguate premodification (1).

∗ 4 people indicate what they dislike most:

Translations are too close to the original text (2).

Postediting is not facilitated (1).

It only works well with controlled language (1).

∗ 5 people indicate that what they like most is the speed and the time it saves.

All have a positive attitude towards TM.

∗ 9 people list negative points of TMs:

they segment by sentences (2) and do not have context (1)

the maintenance is a problem (2), and errors may spread (1)

most tools do not allow working with very large memories (1)

they do not provide tentative solutions to ‘unknown’ segments (1)

they raise copyright issues (for freelancers) (1)

they are not easily integrated with MT (1)

∗ 9 people list positive aspects of TMs:

they speed up the translation process, save time (5)

they remember previous translations and prevent repeating work (4)

they enhance consistency (4)

∗ 9 people think TMs do not ‘devalue’ their work as translators

∗ 9 people think that translation memories help with the tedious work and leave the more creative part for the translator

∗ TMs save time: quite a lot (8), some (2).

Section on pre-editing

∗ Only one person has pre-editing in their work flow

∗ 4 people are prepared to include it

∗ 8 people think it would be useful

∗ nobody has a controlled language tool in their work flow

Section on post-editing

∗ The actions in post-editing are ordered as follows (starting with the most frequent ones):

Replace wrong phrases

Add words

Moving words/phrases from one place in the sentence to another

Fill in missing words

Replace wrong phrases

Add phrases

Fill in missing phrases

∗ 4 people think they have to retranslate the whole sentence occasionally; and 6 people think they have to do it rarely

∗ 7 people find post-editing interesting; 3 people find it boring

∗ textual considerations force a sentence that is correct at sentence level to be modified: rarely (3), occasionally (4), frequently (3).

∗ degree of similarity used: 70% (2), 70-75% (1), 70-90% (1), 75% (2), 80% (1), 85% (1), 95% (2)

∗ only one person has specific support for post editing

Section on expectations with respect to a new tool

∗ they would expect from a newly designed translation software:

∗ integration of tools (TM, MTs, databases, corpora…) (3)

∗ flexibility (2), and robustness (1) or user friendliness (1)

∗ obtaining translation of phrases when sentences are not in the memory (1)

∗ to get more genuine target language sentences (1)

∗ a rapid and efficient high quality translation tool (1)

∗ they would expect the following features from a translation software (listed by importance given):

accuracy

reliability

speed

∗ they would expect the following features in the post-editing facility (5 people answered):

grammar checker

degree of reliability (marked with different colours)

ease of copy-and-paste and drag-and-drop operations

facilities to easily check the original sentence

∗ the sort of similarity that makes them consider post-editing at all is (4 answers given):

register or style

use of formatting marks

85%

trivial errors

∗ the easiest tasks are (ranked by ease):

Add words or phrases

Delete words or phrases

Replace words or phrases

Moving words/phrases from one place in the sentence to another

∗ they prefer to handle (ranked by preference):

verbal phrases

nominal phrases

Summing up:

∗ The integration of MT with TMs is seen as an essential component (as well as with other tools: term databases, corpora access…)

∗ MT results should be editable in a similar way as TMs allow

∗ Offering partial solutions seems to be seen as adequate by some translators

∗ Target language ‘fluency’ seems to be desirable

8. Some Preliminary Conclusions

All in all, 20 questionnaires are available so far. The information that can be retrieved from the answers already allows us to draw conclusions which are very relevant for the project.

On a general level it is striking that there is a split in the attitude towards MT. The Greek users are totally negative about MT. They think that MT is basically useless. The Spanish user group is quite different: in contrast, they are rather positive. This may be due to the different quality of MT that is available for the two languages.

In somewhat more detail, the points relevant for general conclusions at this stage are summarised below:

8.1 Greek

The Greek users were chosen according to a profile that requires expertise in translation relevant to IT technologies. So, it is not surprising that only technologically competent persons have filled in the questionnaire.

Translation work flow: Technical documentation is the main scenario, and linguistic equivalence is the dominant translation strategy. Only ‘widespread’ technologies are used. For post editing, spell checkers and grammar checkers are used.

Wishes: Dictionary building should be better supported, and better grammar checkers for the target language should be available.

MT: The vast majority of people have no experience. The rest are completely negative, probably due to the Greek language situation. Reasons: The quality is too poor, the dictionary is too difficult to maintain, and too much post editing is required.

TMs are considered powerful and useful tools. The most positive points are consistency of terminology and time saving. Negative points: ‘heavy’ software and problems with simultaneously used TMs (rather administrative).

No pre editing tools are used.

Post editing: Word reshuffling is the most frequent action. Sentences often have to be retranslated. Post editing is nevertheless considered a very interesting task.

Wishes of the users as far as translation software is concerned: In summary, they want a ‘lighter’ Trados, with better TM functionality (without interference with the translator's flow of thought), simplified term extraction, etc.

As far as post editing is concerned, the errors that are considered easy to handle are spelling and simple grammar errors such as subject verb agreement, verb tenses, punctuation and the handling of synonymy.

8.2 Spanish

The user group is composed of experienced users. In general, they are well acquainted with the IT technologies available for translation. This applies to the ‘core’ technologies, less to technologies like ‘terminology mining’. As far as the translation work flow is concerned, the translators mostly work on technical documentation, making heavy use of translation technologies. The relevant translation strategy is mainly ‘equivalence’. Most people use TM, some MT, all of them terminology management, and none specific post editing tools.

The attitude towards MT is (unexpectedly) very positive. All of them say that MT reduces translation time (some even say ‘considerably’). They make concrete proposals for improvement, such as more ‘complete dictionaries’. Maintaining dictionaries seems to be a problem for them. They also complain about MT not being integrated with TM. What they like most is ‘speed’.

TMs are viewed very positively: They save time and avoid repetition of work. Translators do not think that their work is ‘devalued’ by often being ‘post editors’. On the contrary, they mainly think that TMs support them so that they can concentrate more on the creative work. TMs save them a lot of time.

Pre-editing is not widely used.

Post editing mainly consists of replacing words or phrases, filling in words, etc. Occasionally, some sentences have to be retranslated completely, but others say that this happens rather rarely.

The majority of people find post editing interesting, a minority finds it boring. This is a very interesting result.

Expectations for new software: integration of tools, better quality of pre-translation.

They wish to have a better post editing tool with better copy-paste-facilities, drag and drop and they want checking on the target side such as grammar checking etc.

The most interesting aspect of the answers of the Spanish user group is their general openness towards translation tools, including MT. Obviously, they are able to make positive use of the tools and appreciate that these tools speed up the process. The different attitude of the Greek users may have to do with the quality of MT available for Greek.

Another aspect is also very encouraging, namely the positive attitude towards post editing. Post editing will be a major aspect for a translation scenario that involves METIS as a translation tool.

A third point that is very relevant is that users’ interest is to have a ‘light’ overall software package that integrates all the different functions. ‘Light’ probably means ‘to have an easy to use interface that integrates functions in an intelligent way’. Such packages are not the topic of METIS at this phase of development of the particular system, which is a phase of high risk research. However, such issues will be the future for the translation software market. It would be advisable to design METIS in a way that will allow it to fit with such a scenario.

9. Contributors

A list of the people who have contributed to this report follows:

Paul Schmidt, GFAI

Olga Yannoutsou, ILSP

Ineke Schuurman, KUL

Gemma Boleda, FUPF

Stella Markantonatou, ILSP

Toni Badia, FUPF

10. References

Jeff Allen (2001): ‘An integrated part of a translation software program’ in: Language International Magazine pp. 26-29.

http://europa.eu.int/comm/dgs/translation/bookshelf/tools_and_workflow_en.pdf

http://europa.eu.int/comm/translation/reading/articles/pdf/1998_01_tt_blatt1.pdf

Appendix A: Questionnaire

Approximate time required for filling in this questionnaire:

Personal and Contact Information

Country:

First Name:

Last Name:

E-mail:

SECTION I: Software Skills

I. Which technical devices relevant for translation do you know about?

1. Text Processing Word DTP Other (Please specify)

To which extent have you been using this technology for your profession?

frequently occasionally rarely never

2. Terminology processing-Storing

Trados Termstar Other (Please specify) None

To which extent have you been using this technology?

frequently occasionally rarely never

3. Term Extraction Term-o-nizer Other (Please specify) None

To which extent have you been using this technology?

frequently occasionally rarely never

4. Translation Memories

Trados Star Déjà Vu Other (Please specify) None

To which extent have you been using this technology?

frequently occasionally rarely never

5. Machine translation (MT)

T 1 Personal Translator Prompt Systran Other (Please specify) None

To which extent have you been using this technology?

frequently occasionally rarely never

II. How would you rate your overall level of expertise in using the above-mentioned software?

Expert/advanced user Intermediate/skilled use Beginner

SECTION II. Translation procedure. Sorts/Types of texts – Software you use in your working environment:

1. Which text sort/type do you mainly work with?

Technical documentation Brochures / advertising material Financial texts Legal texts Other (Please specify)

2. Translation purposes: which is the main concept underlying the translation procedure you employ for the majority of the texts you translate:

Linguistic equivalence Cultural adaptation Other (Please specify) None

3. Software: Which software support do you have for your work? Which editor, authoring or DTP system do you mainly use in your work?

4. Do you use a terminology database? No Yes (Please specify)

5. Do you use a terminology extraction software?

No Yes (Please specify)

6. Do you do any pre-editing or use controlled language checkers?

No Yes (Please specify)

7. Do you use a translation memory? No Yes (Please specify)

8. Do you use a machine translation system?

No Yes (Please specify)

9. How are the tools integrated? Please tick the appropriate answer.

Access to terminology management from the authoring tool Update of terminology from the authoring tool Integration with TM The tools are not integrated

10. Do you have any specific facilities for post editing?

No Yes (Please specify)

11. What would you like to have in addition?

SECTION III: Machine Translation and Translation Memory tools

Dependent on what you use/have available, please answer the following questions:

A. Machine Translation If you use machine translation what is your attitude towards it:

Positive Negative I never use it *

*If you have ticked this box, you can skip the rest of SECTION III Part A

1. Do you think that machine translation does offer you help in your work?

Quite a lot Some Very little None

2. Measurement of saved time by using MT: How much time do you save by using machine translation?

Quite a lot Some Very little None

3. Which is the worst deficiency of MT systems?

4. Which improvement would have the biggest effect on efficiency?

5. What do you dislike most about MT?

6. What do you like most about MT?

7. Do you have any other remarks concerning MT?

B. Translation memories: What is your attitude towards translation memories?

Positive Negative Indifferent

1. What are the negative points of translation memories?

2. What are the positive points of translation memories?

3. Do you think that translation memories ‘devalue’ your work as a translator?

No Yes

4. Do you think that translation memories help with the tedious work and leave the more creative part for the translator?

No Yes

5. Measurement of saved time by using translation memories: Roughly, how much time do you save by using translation memory?

Quite a lot Some Very little None

SECTION IV: Pre-editing

1. Do you have pre-editing tasks in your workflow?

No Yes

2. Would you be prepared to ‘pre-edit’ your texts?

No Yes

3. Do you think it is useful? No Yes

4. Do you have a controlled language tool in your work flow in order to support your pre-editing?

No Yes

5. If so, what does it mainly check? Grammar Corporate style Appropriateness for machine translation?

6. Can you configure it yourself? No Yes

7. Do you like/dislike this tool? Please explain why.

SECTION V: Post editing

1. A translation memory provides you with raw text. For those items the translation memory has found a solution, your task is to ‘post edit’. In ‘post editing’ what is the most frequent action you have to do?

Please rank them from 1 to 7 (where 1 is the most frequent) Fill in missing words Fill in missing phrases Replace wrong words Replace wrong phrases Add words Add phrases Moving words/phrases from one place in the sentence to another Other(s)? Please specify

2. How often do you have to retranslate the whole sentence?

frequently occasionally rarely never

3. What are your feelings about ‘post editing’?

I find it interesting I find it boring but useful I find it boring. I’d rather do the entire job myself

4. How often is it that a translation retrieved from the database is correct on sentential level (or segment level) but has to be corrected later on the basis of textual considerations?

frequently occasionally rarely never

5. What is the degree of similarity you use as threshold in your translation memory?

6. Do you have specific support for post editing by using a special tool?

No Yes

SECTION VI: Expectations from a newly designed translation software

1. What would you expect from a newly designed translation software?

2. What features do you expect in general from a translation software?

Please rank them in terms of priority, where 1 is the most frequent. Accuracy Speed Reliability Other(s)? Please specify

3. Which features would you like most to be included in the post editing stage?

4. Which is the sort of ‘similarity’ with the ‘ideal translation’ that makes you consider post editing at all? For example, what kind of mistakes would you consider as trivial when post editing.

5. Which task is easier when post editing?

Please rank them from 1 to 4, where 1 is the easiest. Replace words or phrases Add words or phrases Delete words or phrases Moving words/phrases from one place in the sentence to another Other(s)? Please specify

6. Are there any preferences for replacing, adding, deleting, moving as far as the sort of phrases are concerned?

Please rank them from 1 to 2 in terms of preference, where 1 is the highest. Nominal Verbal Other(s)? Please specify