Bow Fis Comaitr

download Bow Fis Comaitr

of 11

Transcript of Bow Fis Comaitr

  • 7/31/2019 Bow Fis Comaitr

    1/11

    Computer-Aided Translation

    Lynne Bowker and Des Fisher

    1. Introduction

    Computer-aided translation (CAT) is the use of computer software to assist a human

    translator in the translation process. The term applies to translation that remains primarily

    the responsibility of a human, but involves software that can facilitate certain aspects of it.

    This contrasts withMACHINE TRANSLATION(MT), which refers to translation that is

    carried out principally by computer but may involve some human intervention, such as pre-

    or post-editing. Indeed, it is helpful to conceive of CAT as part of a continuum of

    translation possibilities, where various degrees of machine or human assistance are

    possible.

    After recognizing that fully automatic MT was a formidable challenge, researchers

    gradually turned their attention to CAT in the 1960s, starting with the creation of term

    banks, which used computers to store large volumes of structured information (see

    TRANSLATION AND TERMINOLOGY). Advances in computing and computational

    linguistics in the late 1970s and early 1980s spurred the development of modern CAT

    tools, which rely on computers not only for storing information, but also for actively

    searching and retrieving it. Visionaries such as Martin Kay (1980), among others,

    conceived of tools that became the backbone of CAT. However, it was not until the mid-

    1990s that these tools became widely commercially available. Since then, the rapid

    evolution of technology has made CAT accessible, affordable, popular and even necessary

    to help translators in this globalized information age tackle the enormous volumes of text

  • 7/31/2019 Bow Fis Comaitr

    2/11

    to translate in ever shorter turnaround times (Esselink 2000; Lagoudaki 2006;

    GLOBALIZATION AND TRANSLATION; LOCALIZATION).

    2. CAT Tools

    Since many translators are avid technology users, a wide range of tools could fall under the

    heading of CAT. However, this term is typically reserved for software designed

    specifically with the translation task proper in mind, rather than tools intended for general

    applications (e.g. word processors, spelling checkers, e-mail).

    The most popular and widely marketed CAT tool in use today is the Translation

    Environment Tool (TEnT)sometimes referred to as a translators workstation or

    workbenchwhich is in fact an integrated suite of tools. Individual components differ

    from product to product; however, the main module around which a TEnT is generally

    constructed is a translation memory (TM), which, in turn, most often functions in close

    association with a terminology management system (TMS).

    2.1. Translation Memory Tools

    A TM is a tool that allows users to store previously translated texts and then easily consult

    them for potential reuse. To permit this, the source and target texts are stored in a TM

    database as bitexts. An aligned bitext is created by first dividing the texts into segments

    which are usually sentencesand then linking each segment from the source text to its

    corresponding segment in the translation.

    When a translator has a new text to translate, the TM system first divides this new

    text into segments and then compares each with the contents of the TM database. Using

  • 7/31/2019 Bow Fis Comaitr

    3/11

    pattern-matching, the TM system tries to identify whether any portion of the new text has

    been previously translated as part of a text stored in the TM database. When the TM system

    finds a match for a given segment (see Table 1), it presents the translator with the match

    from the TM database, allowing him or her to see how that segment was translated last

    time and to decide whether that previous translation can be usefully integrated into the new

    translation (see Figure 1). In keeping with the CAT philosophy, where tools are designed to

    assist rather than replace the translator, he or she is never forced to accept the matches

    presented by the TM system; these are presented only for consideration and can be

    accepted, modified or rejected as the translator sees fit.

    Exact match A segment from the new text is identical in every way to a segment stored

    in the TM database.

    Full match A segment from the new text is identical to a segment stored in the TM

    database save for proper nouns, dates, figures, etc.

    Fuzzy match A segment from the new text has some degree of similarity to a segment

    stored in the TM database. Fuzzy matches can range from 1% to 99%, and

    the threshold can be set by the user. Typically, the higher the match

    percentage, the more useful the match; many systems have default

    thresholds between 60% and 70%.

    Sub-segment

    match

    A contiguous chunk of text within a segment of the new text is identical

    to a chunk stored in the TM database.

    Term match A term found in the new text corresponds to an entry in the termbase of a

    TM systems integrated TMS.

    No match No part of a segment from the new text matches the contents of the TM

    database or termbase. The translator must start from scratch; however,

    this new translation can itself be sent to the TM database for future reuse.

  • 7/31/2019 Bow Fis Comaitr

    4/11

    Table 1. Types of matches commonly displayed in TMs.

    New segment to translate Instructions for reconstitution and intramuscular

    administration.Fuzzy match andcorresponding translation

    retrieved from TM

    database

    EN: Instructions for reconstitution and subcutaneousadministration.

    FR: Directives pour la prparation du produit et

    ladministration sous-cutane.

    Figure 1. An example of a fuzzy match retrieved from a TM database.

    2.2. Terminology Tools

    While TM systems are at the heart of TEnTs, they are typically integrated with terminology

    tools, which can greatly enhance their functionality. A terminology management system

    (TMS) is a tool that is used to store terminological information in and retrieve it from a

    termbase. Translators can customize term records with various fields (e.g. term, equivalent,

    definition, context, source), and they can fill these in and consult them at will in a

    standalone fashion. Retrieval of terms is possible through various types of searches (e.g.

    exact, fuzzy, wildcard, context).

    However, termbases can also be integrated with TM systems and work in a more

    automated way. For instance, using a feature known as active terminology recognition, a

    TMS can scan a new text, compare its contents against a specified termbase, and

    automatically display records for any matches that are found. If desired, users can further

    automate this step by having the system automatically replace any terms in the text with

    their target-language equivalents from the termbase. Consistency and appropriateness are

    maintained by using termbases specific to certain clients or text types.

  • 7/31/2019 Bow Fis Comaitr

    5/11

    2.3. Other TEnT Components

    TEnTs include more than just TMs and TMSs. Table 2 summarizes the functions of other

    common TEnT components. For more detailed descriptions, see Bowker (2002), Somers

    (2003a/b) or Quah (2006), as well as the entries forCORPORA IN TRANSLATION

    STUDIES,LOCALIZATION,MACHINE TRANSLATIONand TRANSLATION AND

    TERMINOLOGY.

    Component Brief description

    ConcordancerSearches a (bi)text for all occurrences of a user-specified character

    string and displays these in context.

    Document

    analysis module

    Compares a new text to translate with the contents of a specified TM

    database or termbase to determine the number/type of matches,

    allowing users to make decisions about which TM databases to

    consult, pricing and deadlines.

    Machine

    translation system

    Generates a machine translation for a given segment when no match is

    found in the TM database.

    Project

    management

    module

    Helps users to track client information, manage deadlines, and

    maintain project files for each translation job.

    Quality control

    module

    May include spelling, grammar, completeness, or terminology-

    controlled language-compliance checkers.

    Term extractor Analyzes (bi)texts and identifies candidate terms.

    Table 2. Some common TEnT components.

    3. Impact on translation

    With much touted benefits of increased productivity and improved quality being hard to

    resist, the incorporation of CAT into the translation profession has been significant: CAT

    tools have been adopted by many translation agencies, governments, international

  • 7/31/2019 Bow Fis Comaitr

    6/11

    organizations, transnational corporations and freelance translators (e.g. Joscelyne 2000: 94;

    Jaekel 2000: 159-160; Lagoudaki 2006). However, organizations and individuals must take

    into account their unique needs and, in light of these, must evaluate the costs and benefits

    of CAT before adopting it. The introduction of a new tool will almost certainly impact the

    existing workflow, and the use of TMs in particular can affectboth positively and

    negativelythe translation process and product.

    Given the way TMs operate, any gain in efficiency depends on the TMs ability to

    return matches. Texts that are internally repetitive or that are similar to others that have

    already been translated (e.g. revisions, updates and texts from specialized fields) will tend

    to generate useful matches. Texts that are less predictable, such as literary works or

    marketing material, will not. Once matches are found, simply being able to automatically

    copy and paste desired items from the TM database or termbase directly into the target text

    can save translators typing time while reducing the potential for typographic errors.

    However, significant gains in productivity are usually realized in the medium to long term,

    rather than in the short term, because the introduction of CAT tools entails a learning curve

    during which productivity could decline. Moreover, a number of translators find these tools

    so challenging to use that they simply give up on them before realizing any such gains

    (Lagoudaki 2006).

    With regard to quality, CAT still depends on human translators; should a client

    have a TM that it wants used for its work, the translator has no control over its contents.

    Furthermore, the segment-by-segment processing approach underlying most TM tools

    means that the notion of text is sometimes lost (Bowker 2006). For example, translators

    may be tempted to stay close to the structure of the source text, neglecting to logically split

  • 7/31/2019 Bow Fis Comaitr

    7/11

    or join sentences in the translation. To maximize the potential for repetition, they may

    avoid using synonyms or pronouns. Moreover, in cases where multiple translators have

    contributed to a collective TM database, the individual segments may bear the differing

    styles of their authors, and when brought together in a single text, the result may be a

    stylistic hodgepodge. Although such strategies may increase the number of matches

    generated by a TM, they are likely to detract from the overall readability of the resulting

    target text.

    CAT also affects the professional status of translators, their remuneration and their

    intellectual property rights. For instance, some clients ascribe less value to the work of the

    translator who uses CAT tools. If CAT is faster and easier than human translation, clients

    may ask to pay less for it. When a TM offers exact matches or fuzzy matches, should the

    translator offer a discount? Clients may be even more demanding if they use their own TM

    to pre-translate a text before sending it to a translator. Yet, even exact matches do not

    equate to zero time spent; a translator must evaluate the suggested sentences and make

    adjustments depending on the communicative context. Traditional fee structures (billing by

    the word or page) may no longer be appropriate. In many instances, these are being

    replaced by hourly charges. In addition, the movement away from desktop TM systems

    towards Web- and server-based access to TM databases is giving more control to clients

    and less to translators, particularly freelancers. Under this model, translators are restricted

    to using the clients databases and do not have the option of adding matches to or retrieving

    them from their own TMs, which makes it difficult for translators to build up their own

    linguistic assets (Garcia 2008: 62).

  • 7/31/2019 Bow Fis Comaitr

    8/11

    Ethical, financial and legal questions surround the ownership and sharing of CAT

    data. The contents of a TM are source texts and translations, the ownership of which

    presumably remains with the client. However, when collected as a database, control and

    ownership are thrown in doubt. Translators may wish to exchange or sell a TM or

    termbase. However, confidentiality of the information may be demanded by the client or

    original owner (e.g. for copyright protection). Currently, translators often deal with this

    problem through contracts and nondisclosure agreements. Variations in laws across

    jurisdictions further complicate matters in a global translation market. In the future, it is

    hoped that legal remedies, clients licensing of databases to translators and good-faith

    negotiation of agreements that serve both translators and clients can establish best practices

    and put the industrys collective linguistic knowledge to good use (Topping 2000; Gow

    2007: 175, 188).

    4. Emerging possibilities

    CAT is here to stay, and translators will continue to adapt to its presence and its increasing

    importance. However, CAT will undoubtedly continue to develop at a rapid pace.

    Advances in TMs may include the introduction of linguistic analysis and the ability to

    recall the surrounding context of matching segments, as well as that of their corresponding

    translationsalready a feature of certain TMs. Moreover, in the case of fuzzy matches,

    current TM systems readily identify differences between the new sentence to be translated

    and the source text segment retrieved from the TM database; however, future

    enhancements may also indicate which elements of the corresponding target segment may

    need to be preserved or modified. Sharing across different products will become easier as

  • 7/31/2019 Bow Fis Comaitr

    9/11

    standards such as Translation Memory eXchange (TMX), TermBase eXchange (TBX) and

    XML Localization Interchange File Format (XLIFF) become more widely adopted

    (Savourel 2007: 37). In addition, the availability of open source products will make it

    easier for more translators to access CAT tools, while Web- and server-based access to TM

    databases is also becoming an increasingly common model (Garcia 2008).

    The Internet provides several atypical possibilities for CAT. For instance,

    crowdsourcing translations involves leveraging the knowledge and free labour of the crowd

    the general public. Collaborative translationoften undertaken as part of a volunteer

    effortis similar in that multiple translators participate, but it could be limited to selected

    or professional ones. Tools such as wikis and weblogs can serve these purposes (Bey et al.

    2006;NETWORKING AND VOLUNTEER TRANSLATORSand WEB AND

    TRANSLATION). There will be no shortage of innovation while millions of

    technologically savvy people, speakers of dozens of languages, contribute daily to the

    information sphere that is our world. However, none of this will obviate the need for

    professional translators, equipped with specialized skills and knowledgeand even CAT

    toolsto continue their indispensable work into the next decades.

    References

    Bey, Youcef, Boitet, Christian and Kageura, Kyo. 2006. BEYTrans: A Wiki-based

    environment for helping online volunteer translators. In Topics in Language

    Resources for Translation and Localisation, Elia Yuste Rodrigo (ed.), 135-150.

    Amsterdam/Philadelphia: John Benjamins.

  • 7/31/2019 Bow Fis Comaitr

    10/11

    Bowker, Lynne. 2002. Computer-Aided Translation Technology: A Practical Introduction.

    Ottawa: University of Ottawa Press.

    Bowker, Lynne. 2006. Translation Memory and Text. InLexicography, Terminology and

    Translation: Text-based Studies in Honour of Ingrid Meyer, Lynne Bowker (ed.), 175-187.

    Ottawa: University of Ottawa Press.

    Esselink, Bert. 2000.A Practical Guide to Localization. Amsterdam/Philadelphia: John

    Benjamins.

    Garcia, Ignacio. 2008. Power Shifts in Web-based Translation Memory.Machine

    Translation 21(1): 55-68.

    Gow, Francie. 2007. You Must Remember This: The Copyright Conundrum of

    Translation Memory Databases. In Canadian Journal of Law and Technology

    6(3): 175-192.

    Jaekel, Gary. 2000. Terminology Management at Ericsson. In Translating into Success:

    Cutting-edge strategies for going multilingual in a global age, Robert C. Sprung

    (ed.), 159-171. Amsterdam/Philadelphia: John Benjamins.

    Joscelyne, Andrew. 2000. The Role of Translation in an International Organization. In

    Translating into Success: Cutting-edge strategies for going multilingual in a global

    age, Robert C. Sprung (ed.), 81-95. Amsterdam/Philadelphia: John Benjamins.

    Kay, Martin. 1980. The Proper Place of Men and Machines in Language Translation.

    Research Report CSL-80-11, Xerox Palo Alto Research Center, Palo Alto, CA.

    Reprinted inReadings in Machine Translation, Sergei Nirenburg, Harold L.

    Somers and Yorick A. Wilks (eds.), 221-232. Cambridge, MA: The MIT Press.

  • 7/31/2019 Bow Fis Comaitr

    11/11

    Lagoudaki, Elina. 2006. Imperial College London Translation Memories Survey 2006.

    Translation Memory systems: Enlightening users perspective. Online at:

    http://www3.imperial.ac.uk/pls/portallive/docs/1/7307707.PDF

    Lange, Carmen Andrs and Bennett, Winfield Scott. 2000. Combining Machine

    Translation with Translation Memory at Baan. In Translating into Success:

    Cutting-edge strategies for going multilingual in a global age, Robert C. Sprung

    (ed.), 203-218. Amsterdam/Philadelphia: John Benjamins.

    Quah, Chiew Kin. 2006. Translation and Technology. Houndmills, UK/New York:

    Palgrave Macmillan.

    Savourel, Yves. 2007. CAT Tools and Standards: A Brief Summary. MultiLingual18(6):

    37.

    Somers, Harold. 2003a. The translators workstation. In Computers and Translation: A

    Translators Guide, Harold Somers (ed.), 13-30. Amsterdam/Philadelphia: John

    Benjamins.

    Somers, Harold. 2003b. Translation memory systems. In Computers and Translation: A

    Translators Guide, Harold Somers (ed.), 31-47. Amsterdam/Philadelphia: John

    Benjamins.

    Topping, Suzanne. 2000. Sharing Translation Database Information.Multilingual

    Computing and Technology 11 (5): 59-61.