TaaS Workshop 2014, Term Mining and Terminology Management in a Corporate Setting Perspective, Luigi...

Post on 17-Jan-2015

The time spent looking for and not finding information costs an organization a total of $6 million a year, not including opportunity costs or the costs of reworking existing information that could not be located. Only 41% of localization-mature organizations have some terminology management policy in place, almost solely translation-oriented. Today we will talk about how terminology management works and demonstrate its power through controlled languages, ontologies, search engine applications, content and knowledge management applications, and e-learning systems.

Wednesday, 4 June / 10:50 – 11:20

Term Mining and Terminology Management in a Corporate Setting Perspective

Luigi Muzii, sQuid

TaaS Workshop 2014, 4 June, Dublin (Ireland)

The research within the project TaaS leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013), grant agreement no 296312

 

Welcome  Ivan  Smolniov,  ABBYY  Language  Services    

 

Term  Mining  and  Terminology  Management  A  Corporate  Setting  Perspective  

Awareness  

Globally active organizations whose core business is not communications-related (translation, localization, information management, etc.) are generally unaware of the benefits of performing terminology management.

Kara  Warburton,  LISA,  2001  

Translation-oriented terminology

Only 41% of localization-mature organizations have some terminology management policy in place, almost solely translation-oriented

Scope  

• Technical documentation
  • Controlled languages
  • Translation and localization
  • Translation automation

• Content and Knowledge Management Systems
  • Knowledge organization
  • Taxonomies and ontologies

• Learning Management Systems
  • Knowledge nugget (knowledge representation)
  • Self-contained reusable educational entities (Learning Object Metadata, IEEE 1484.12.1)

• Marketing management
  • Customer service
  • SEM/SEO
  • Sentiment analysis

Integrations

[Diagram: Documentation, CMS, Website, Marketing, Service & Support, LMS, CVS]

Costs (IDC, 2004)

• Productivity of knowledge workers
  • 15% to 35% searching for information
  • Successfully completed 50% of the time or less
  • Only 21% found the information they needed 85% to 100% of the time

• $6 million a year looking for and not finding information

• 15% of time for duplicating existing information
  • Opportunity costs
  • Reworking existing information that could not be located

• $12 million a year

Terminology cost multiplier (Jörg Schütz/Rita Nübel)

• Product data: 0.1 - 0.2
• Documentation development: 0.5
• Authoring: 1.0
• Editing: 2.0
• Approval: 5.0
• Localization: 10.0
• Maintenance: 20.0
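The multiplier idea can be made concrete with a minimal Python sketch. It rests on two assumptions that the slide does not state explicitly: that the seven stages and the seven values pair up in the order listed, and that each value is the relative cost of handling a terminology problem at that stage compared with authoring (1.0). The base cost of 100 and the name `relative_cost` are purely hypothetical.

```python
# Stage -> (low, high) multiplier.
# ASSUMPTION: stages and values are paired in slide order; the transcript
# does not state the pairing explicitly.
COST_MULTIPLIERS = {
    "Product data": (0.1, 0.2),
    "Documentation development": (0.5, 0.5),
    "Authoring": (1.0, 1.0),
    "Editing": (2.0, 2.0),
    "Approval": (5.0, 5.0),
    "Localization": (10.0, 10.0),
    "Maintenance": (20.0, 20.0),
}


def relative_cost(stage, base_cost):
    """Return the (low, high) cost at `stage`, scaled from a hypothetical
    base cost measured at the authoring stage (multiplier 1.0)."""
    low, high = COST_MULTIPLIERS[stage]
    return (base_cost * low, base_cost * high)


if __name__ == "__main__":
    # Hypothetical base cost of 100 units per terminology problem.
    for stage in COST_MULTIPLIERS:
        low, high = relative_cost(stage, 100.0)
        print(f"{stage:<28}{low:>10.1f}{high:>10.1f}")
```

Under these assumptions, the same problem would cost 200 times more when handled in maintenance than at the low end of the product data stage (20.0 against 0.1).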

Costs/Benefits

• Huge costs in the short term
  • $150 per terminological entry (J.D. Edwards, 2001)

• The practical value does not match the technical value

Accuracy

Fundamental accuracy of statement is the one sole morality of writing.

Payback  

• Cost reduction
  • Authoring, localization, training, customer service
  • Overhead
  • Time reduction in the production cycle
  • Immediate 1% payback for larger businesses

• Productivity increase
  • Time-to-market

• Qualitative improvements
  • Branding
  • Safety

Controlled  languages  

The  most  valuable  of  all  talents  is  that  of  never  using  two  words  when  one  will  do.  

Fatal  errors  

• The Linate Airport disaster (Oct 8, 2001)
  • Deficiencies in the airport layout and procedures
  • Violations of ICAO regulations
  • Incorrect signs to runway
  • Incorrect, uncorrected readback
  • Non-standard phraseology
  • Irrelevant term (extension) leading to fatal misunderstanding

Keywords  advertising  

Rem  tene,  verba  sequentur  (Keep  to  the  subject,  words  will  follow)  

Marcus  Porcius  Cato  (Cato  the  Censor)  

The  long  tail  

Rerum  enim  copia  verborum  copiam  gignit  (All  this  gives  rise  to  a  plethora  of  words)  

Cicero  

Term mining

• Complex knowledge-intensive task
  • Different approach for different scope

• Hard to grasp in a corporate setting perspective
  • Business intelligence

Mining  terms  

• Linguistic approach
  • Based on rules and dictionaries
  • Collocations
  • One language at a time
  • Issues
    • Loans
    • Synonyms, variants, abbreviations
    • Ellipses
    • Improper usage
  • Bitext
    • Knowledge bases
    • Knowledge discovery

• Statistical approach
  • Language independent
  • Based on frequency
  • Repeated sequences of syntagmas
  • The frequency threshold must be specified
  • Frequency does not necessarily mean importance
  • Much “noise”
  • Monolingual corpus
    • Indices
    • Controlled languages
    • Keywords
    • TQA
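The statistical approach can be illustrated with a minimal Python sketch: count repeated word sequences and keep those that reach a frequency threshold. This is not the method of any tool named in this talk. The function name `candidate_terms`, its parameters, and the sample text are hypothetical. The optional stop-word set is a small language-dependent aid added only to show how much "noise" the purely frequency-based variant lets through.

```python
import re
from collections import Counter


def candidate_terms(text, max_len=3, min_freq=2, stopwords=frozenset()):
    """Return {candidate: frequency} for word sequences of 1..max_len words
    that occur at least min_freq times (the frequency threshold)."""
    # Lowercase and split into words; hyphenated compounds stay together.
    # Punctuation is dropped, so sequences may span sentence boundaries,
    # which is one source of noise.
    tokens = re.findall(r"\w+(?:-\w+)*", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Skip sequences that start or end with a stop word (optional).
            if gram[0] in stopwords or gram[-1] in stopwords:
                continue
            counts[" ".join(gram)] += 1
    return {term: freq for term, freq in counts.items() if freq >= min_freq}


if __name__ == "__main__":
    sample = (
        "Translation memory stores segments. The translation memory is "
        "reused. Term base and translation memory differ."
    )
    for term, freq in sorted(candidate_terms(sample).items()):
        print(freq, term)
```

Even on this tiny sample, the single words "translation" and "memory" pass the threshold alongside the actual term "translation memory", which shows why frequency alone does not indicate importance and why the threshold has to be tuned per corpus.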

TaaS  test  drive  

• Building a Localization Kit
  • 13688 words, 142 repetitions

• memoQ Term Extraction
  • Statistical analysis
  • 815 term entries from the English document
  • 647 term entries from translation memory

• Tilde Wrapper System for CollTerm (TWSC)
  • Linguistic analysis enriched by statistical features
  • 3046 term entries

• Kilgray Terminology extractor
  • Statistical analysis
  • 3218 term entries

Terminology  management  in  the  cloud  

Pros

• Zero TCO
• Availability and deployability
• Collaboration features

Cons

• Limited scalability
• Security issues
• Integration costs

ROI  

The  proof  of  performance,  i.e.  ROI  considerations,  of  terminology  management  within  the  corporate  setting  is  a  challenge  for  future  projects.  

Stefan  Kremer,  2005  

Thank  you  
