TAUS Scotland Asia Online Technology Platform V1

TAUS Edinburgh Conference presentation


1. Translation Technology Platform
Kirti Vashee, VP Sales, Asia Online
Kirti.vashee@asiaonline.net

2. The Enterprise Market
Revolutionize the enterprise translation process with a comprehensive, continuously learning SMT platform: a SaaS environment that supports data cleaning and preparation, develops SMT engines on demand, and enables ongoing, comprehensive post-editing and correction to continuously improve the engines.

The Consumer Market
Revolutionize the Internet experience for non-English speakers in Asia: provide 1 billion+ local-language pages online, using mostly translated open-license content combined with compelling portal and social-networking-style services in Thailand, Indonesia, India, Malaysia, the Philippines, Vietnam, China, Japan and Korea.

Perspectives: Large Buyer & Publisher; Translation Tools Vendor

TM Copyright 2008, All Rights Reserved

3. The only SMT technology provider that is also a major user of ALT technology, on one of the largest translation projects in the world: translating English Wikipedia (1B+ words) into 11 Asian languages using SMT and crowdsourcing.
The translation tools and technology platform used to accomplish this are also being made available as a SaaS product for the enterprise translation market.

4. Battlefield of Words
  • Fusion with customer support
  • Continuous translation
  • Community translation
  • Industry-shared language data
  • Massive online collaboration
  • Translation automation

5. [Diagram labels: Email; Knowledge Base; Instant Messaging; Data; User Manuals; User-Generated Content; Voice Support; Blogs; Documentation; Interactive Support]
  • Web 2.0 is much more interactive and dynamic
  • Globalization will be further driven by Internet penetration into Asia
  • Word-of-mouth marketing is gaining prominence all over the world
  • Unstructured content in blogs and review sites is becoming critical
  • The dialogue with the global customer needs to be more interactive

6. Continuous Improvement
[Diagram labels: HDSMT Engines; The Global Customer; Sales/Marketing; Blogs; CRM; Product Management; Biz Intelligence; Human Resources; Content Management; ECM; BPM; Email; Customer Support; IM]
  • A highly adaptive, human-driven process for continuous improvement of output quality in SMT engines and translation automation
  • Intensive collaboration with human translators to raise the quality of SMT
  • Integration with content creation and content refinement tools to enhance speed and improve business process management
  • Continued evolution of standards to facilitate the sharing of linguistic assets

7. A comprehensive SaaS platform that facilitates the translation and continued refinement of any large, high-value translatable corpus using HDSMT (human-driven SMT)
Existing feature set:
  • Data cleaning & preparation tools
  • On-demand SMT engine development
  • Support for both user-created and online dictionaries and glossaries
  • Ability to pool data for greater leverage
  • Multi-level domain support
  • Seamless integration with a collaborative post-editing environment
  • Real-time updates of translated assets
  • Web-services-based APIs for integration
  • System and process foundation for managed online community collaboration

8. Data Management
  • Bilingual data preparation & cleaning
  • Bilingual data normalization & optimization
  • Source cleanup and preparation
  • Grammar and spelling validation
  • Monolingual data extraction & analysis
SMT Engine Development
  • SMT system training & development
  • Monolingual data training
  • Ongoing corpus refinement and tuning
  • Analysis and evaluation of n-grams
  • Error pattern identification & correction
  • Automated error correction tools
Output Proofing & Editing
  • Continuing cycle of exception identification and correction
  • Development of small sets of new data to correct errors

9. (No text content.)
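Slide 8 lists bilingual data preparation, cleaning and normalization as the first stage of the process, and slide 10 notes that data optimized for TM tools can be dirty data for SMT. The Python sketch below illustrates what such a cleaning pass might look like. It is not Asia Online's tooling: the specific rules (Unicode NFC normalization, stripping inline tags, dropping empty, untranslated, length-mismatched and duplicate pairs) and the `max_ratio` threshold are assumptions chosen for illustration.

```python
import re
import unicodedata
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (source segment, target segment)

# Assumed example of TM-specific markup: inline tags such as <b> or <ph id="1"/>
# are useful to TM tools but add noise to SMT training data.
_TAG_RE = re.compile(r"<[^>]+>")
_SPACE_RE = re.compile(r"\s+")


def normalize(text: str) -> str:
    """Unicode NFC normalization, inline-tag removal and whitespace collapsing."""
    text = unicodedata.normalize("NFC", text)
    text = _TAG_RE.sub(" ", text)
    return _SPACE_RE.sub(" ", text).strip()


def clean_pairs(pairs: Iterable[Pair], max_ratio: float = 3.0) -> List[Pair]:
    """Normalize bilingual pairs and drop ones unlikely to help SMT training.

    The filtering rules and the default max_ratio are illustrative assumptions;
    a real pipeline would tune them per language pair.
    """
    seen = set()
    cleaned: List[Pair] = []
    for src, tgt in pairs:
        src, tgt = normalize(src), normalize(tgt)
        if not src or not tgt:
            continue  # one side is empty
        if src == tgt:
            continue  # target copied from source rather than translated
        if max(len(src), len(tgt)) / min(len(src), len(tgt)) > max_ratio:
            continue  # lengths too different: probable misalignment
        if (src, tgt) in seen:
            continue  # exact duplicate adds no new evidence
        seen.add((src, tgt))
        cleaned.append((src, tgt))
    return cleaned


if __name__ == "__main__":
    raw = [
        ("Hello   <b>world</b>", "Halo dunia"),
        ("Hello world", "Halo dunia"),  # duplicate after normalization
        ("OK", "OK"),                   # untranslated copy
        ("", "kosong"),                 # empty source
    ]
    print(clean_pairs(raw))  # [('Hello world', 'Halo dunia')]
```

A character-length ratio is only a crude misalignment signal; for pairs such as English-Thai or English-Chinese the threshold would need adjusting. Likewise, the `src == tgt` rule would also drop legitimate identical segments such as product names or numbers.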

10. Data Cleaning
  • Utilities to normalize and standardize data prior to consolidation, to provide maximum leverage
  • A recent study for TAUS proves conclusively that sharing clean data provides leverage
  • A smaller amount of clean data can produce better results than datasets even 2X larger
  • Consistent terminology matters and provides real leverage
  • Data optimized for TM tools can be dirty data for SMT

11. Community
  • Initial system put into production
  • Trained internal experts begin the initial clean-up and correction process
  • Expert users are also allowed to make changes
  • All users are allowed to suggest changes, which go through a vetting process
  • Changes are collected and added to the initial corpus to drive continuous retraining

12. Targeted Corrections of Bad Learning
[Chart labels: Initial System; Spelling & Terminology; Correct Mistranslation; Syntax/Grammar; Terminology; Spelling; Punctuation]
Human feedback can raise the raw output to previously unseen quality levels.

13.-14. (No text content.)
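Slide 11 describes a staged community process: trained internal experts and expert users change the system directly, while other users' suggestions pass through vetting before being added to the corpus for retraining. A minimal Python model of that flow is sketched below. The role names, the `CorrectionQueue` class and the `approve` callback are hypothetical; the slide does not say how vetting is implemented.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Role(Enum):
    INTERNAL_EXPERT = "internal expert"
    EXPERT_USER = "expert user"
    USER = "user"


@dataclass
class Correction:
    segment_id: str
    new_text: str
    author_role: Role


# Roles allowed to change segments without review (per the slide's process).
TRUSTED_ROLES = {Role.INTERNAL_EXPERT, Role.EXPERT_USER}


@dataclass
class CorrectionQueue:
    pending: List[Correction] = field(default_factory=list)   # awaiting vetting
    accepted: List[Correction] = field(default_factory=list)  # ready for retraining

    def submit(self, correction: Correction) -> None:
        """Trusted roles' changes are accepted directly; others wait for vetting."""
        if correction.author_role in TRUSTED_ROLES:
            self.accepted.append(correction)
        else:
            self.pending.append(correction)

    def vet(self, approve: Callable[[Correction], bool]) -> None:
        """Accept pending suggestions that `approve` passes and discard the rest."""
        self.accepted.extend(c for c in self.pending if approve(c))
        self.pending.clear()

    def drain_for_retraining(self) -> List[Correction]:
        """Return the accepted batch for the next retraining run and start a new one."""
        batch, self.accepted = self.accepted, []
        return batch


if __name__ == "__main__":
    queue = CorrectionQueue()
    queue.submit(Correction("seg-1", "corrected term", Role.EXPERT_USER))
    queue.submit(Correction("seg-2", "community suggestion", Role.USER))
    queue.vet(lambda c: len(c.new_text) > 0)
    print([c.segment_id for c in queue.drain_for_retraining()])  # ['seg-1', 'seg-2']
```

Keeping vetting behind a callback leaves the review policy (paid reviewers, voting, spot checks) open, since the slide only states that suggestions "go through vetting".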

15. Information Requests: GetAccountInformation, GetAccountUsageHistory, GetAvailableDomainCombinationsForLanguagePair, GetAvailableDomainsForLanguagePair, GetAvailableLanguagePairs, GetCustomDomainsForLanguagePair
Data Training: CancelTrainingJob, GetTrainingJobList, GetTrainingJobStatus, SubmitDatasetForTraining
Data Preparation: CleanText, ExtractText, NormalizeText, OCRImage, ParagraphAlignLanguagePairText, SentenceAlignLanguagePairText, SentenceSegmentText, SpellCheckText, WordSegmentText
Data Storage: CreateDataset, DeleteDataset, DeleteDataFromDataset, DownloadDataset, DownloadDatasetItem, GetDatasetList, GetDatasetItemList, LinkDataToDataset, MergeDatasets, UploadData, UploadGlossary, UploadImage, UploadLanguageModel, UploadMonolingualText, UploadOCRPageLayout, UploadPhrasePairs, UploadTranslationMemory, UploadZIP
Translation: CancelTranslationJob, GetTranslationJobList, GetTranslationJobStatus, SubmitDatasetForTranslation, SubmitSinglePhraseForTranslation

  Parameter          Type     Description
  sUsername          String   The username of the person making the request.
  sPassword          String   The password of the person making the request.
  iAccountNo         Integer  The account number that this request is associated with.
  iDepartmentNo      Integer  The department number that this request is associated with.
  iLanguagePairCode  Integer  The code for the language pair that is being looked up.

16.-20. (No text content.)
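The method names and parameters on slide 15 suggest a request/response web service in which each call carries the caller's credentials and account context. The sketch below shows one way a client might assemble and validate such a call in Python. Only the method names and the `sUsername`, `sPassword`, `iAccountNo`, `iDepartmentNo` and `iLanguagePairCode` parameter names come from the slide. The `Credentials` class, the `build_request` helper and the assumption that every method takes the first four parameters are hypothetical. No transport (SOAP or plain HTTP) is shown because the slide does not specify one.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple

# Method names as listed on the slide, grouped by category.
API_METHODS = {
    "Information Requests": [
        "GetAccountInformation", "GetAccountUsageHistory",
        "GetAvailableDomainCombinationsForLanguagePair",
        "GetAvailableDomainsForLanguagePair", "GetAvailableLanguagePairs",
        "GetCustomDomainsForLanguagePair",
    ],
    "Data Training": [
        "CancelTrainingJob", "GetTrainingJobList", "GetTrainingJobStatus",
        "SubmitDatasetForTraining",
    ],
    "Data Preparation": [
        "CleanText", "ExtractText", "NormalizeText", "OCRImage",
        "ParagraphAlignLanguagePairText", "SentenceAlignLanguagePairText",
        "SentenceSegmentText", "SpellCheckText", "WordSegmentText",
    ],
    "Data Storage": [
        "CreateDataset", "DeleteDataset", "DeleteDataFromDataset",
        "DownloadDataset", "DownloadDatasetItem", "GetDatasetList",
        "GetDatasetItemList", "LinkDataToDataset", "MergeDatasets",
        "UploadData", "UploadGlossary", "UploadImage", "UploadLanguageModel",
        "UploadMonolingualText", "UploadOCRPageLayout", "UploadPhrasePairs",
        "UploadTranslationMemory", "UploadZIP",
    ],
    "Translation": [
        "CancelTranslationJob", "GetTranslationJobList", "GetTranslationJobStatus",
        "SubmitDatasetForTranslation", "SubmitSinglePhraseForTranslation",
    ],
}
KNOWN_METHODS = {m for methods in API_METHODS.values() for m in methods}


@dataclass(frozen=True)
class Credentials:
    """Hypothetical holder for the identity parameters shown on the slide."""
    username: str       # sUsername (String)
    password: str       # sPassword (String)
    account_no: int     # iAccountNo (Integer)
    department_no: int  # iDepartmentNo (Integer)


def build_request(method: str, creds: Credentials,
                  **params: Any) -> Tuple[str, Dict[str, Any]]:
    """Validate the method name and merge credentials with call-specific parameters."""
    if method not in KNOWN_METHODS:
        raise ValueError(f"unknown API method: {method}")
    payload: Dict[str, Any] = {
        "sUsername": creds.username,
        "sPassword": creds.password,
        "iAccountNo": creds.account_no,
        "iDepartmentNo": creds.department_no,
    }
    payload.update(params)  # e.g. iLanguagePairCode for language-pair lookups
    return method, payload


if __name__ == "__main__":
    creds = Credentials("alice", "secret", 1001, 7)
    print(build_request("GetAvailableDomainsForLanguagePair", creds,
                        iLanguagePairCode=12))
```

Checking method names on the client side catches typos before a network round trip. How the payload is actually serialized and sent would depend on the service's real interface description.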

21. [Diagram labels: Publishers; Users; Translation Systems; Social Networks / Community; Asia Online SaaS Portal; Human Proof Readers; Original Content; Translated Content; Constant Improvement]
  • Publishers provide existing human-translated content for training the language engines.
  • Publishers leverage the ASP translation service for the translation of new material: the original content is translated into the local language.
  • Translations are proofread via the ASP proofreading system, using community principles and paid proofreaders working in the Asia Online proofreading system. Proofreading is still required whether the translation is done by a human or a machine.
  • New translations are sent back to the publisher.
  • Translated content is made available to users, who access online content in their local language.

22.
  • Integrated data cleaning, data preparation, SMT systems development and post-editing environment
  • Comprehensive proofreading and post-editing environment, integrated with the core SMT engines to enable instant updates
  • Greater transparency of many key SMT building blocks, enabling users to see and modify what the system has learnt, resulting in greater control and better systems
  • A richer and deeper taxonomy for domains to ensure the best quality
  • Incremental additions of new training data to any existing system to enable rapid updates
  • Easy handling of terminology, glossaries and dictionaries
Benefits: greater control, better systems, faster updates

23. Kirti Vashee, VP Sales, Asia Online
kirti.vashee@asiaonline.net
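Slide 22 lists incremental additions of new training data to an existing system, and slide 8 lists analysis and evaluation of n-grams. As a toy illustration of why incremental additions can be cheap for count-based models, the Python sketch below keeps n-gram counts that absorb new sentences without recounting the existing corpus, and computes unsmoothed maximum-likelihood probabilities from them. This is a simplification built on assumptions: a production SMT system would also update phrase tables, apply smoothing and retune, none of which the slides describe.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

Ngram = Tuple[str, ...]


class NgramCounts:
    """Toy n-gram count store that can be extended with new sentences."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.counts: Counter = Counter()

    def _ngrams(self, tokens: List[str]) -> Iterator[Ngram]:
        # Pad with sentence boundary markers, then emit every n-gram of order 1..n.
        padded = ["<s>"] * (self.n - 1) + tokens + ["</s>"]
        for order in range(1, self.n + 1):
            for i in range(len(padded) - order + 1):
                yield tuple(padded[i:i + order])

    def add(self, sentences: Iterable[str]) -> None:
        """Add counts for new whitespace-tokenized sentences; old counts are kept."""
        for sentence in sentences:
            self.counts.update(self._ngrams(sentence.split()))

    def prob(self, history: Ngram, word: str) -> float:
        """Unsmoothed maximum-likelihood estimate of P(word | history)."""
        history_count = self.counts[history]
        if history_count == 0:
            return 0.0
        return self.counts[history + (word,)] / history_count


if __name__ == "__main__":
    model = NgramCounts(n=2)
    model.add(["the cat sat"])
    print(model.prob(("the",), "cat"))  # 1.0
    model.add(["the dog sat"])          # incremental update, no recount
    print(model.prob(("the",), "cat"))  # 0.5
```

Because counts are additive, merging a new batch costs time proportional to the batch, not to the whole corpus. That is the property the "incremental additions" bullet relies on, though real engines need further steps before the update is usable.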