A Universal Approach to Building QA Models
(C) 2014 Logrus International
A UNIVERSAL APPROACH TO BUILDING QA MODELS
Leonid Glazychev, Logrus International Corporation
QA MODEL: GENERAL CONSIDERATIONS
Reflecting perception and priorities of the target audience
  Concentrating on factors producing the strongest impression
  Separating global and local factors/issues
Universal applicability
  Covering the whole spectrum of materials
    From slightly post-edited MT to ultra-polished manual translations
    Same approach for knowledge bases and marketing leaflets
  Common approach
    Only adjusting acceptance criteria/thresholds based on expectations
Viability
  Clear, not overly complicated
  Process-oriented, i.e. applicable in the real world
Flexibility
  Concentrating on methodology
  Particular criteria/issue classification can be taken from elsewhere, for instance:
    Based on MQM or other public source
    Based on legacy client-sourced criteria…
REAL-LIFE SCENARIO: NO REFERENCE TRANSLATIONS AVAILABLE
Two major criteria for any translation are always Adequacy (correctly conveys the meaning) and Fluency (readability)
  NEITHER of these depends on translation origin, target audience, brand impact, etc.
  No need to delve into technical details or error counts if the text is unreadable (incomprehensible) or inadequate (inaccurate)
Acceptance thresholds depend on a number of parameters
  Goals
  Target audience
  Speed
  Expected longevity and brand impact, etc.
Assessment is relatively quick
  Often scanning through the text is sufficient
    Especially so when quality is really low
  One needs to be bilingual or have a bilingual expert ready just in case
MAKING REAL-LIFE LQA AS OBJECTIVE AS POSSIBLE
NEITHER of the two major criteria is completely objective
  An expert panel would produce a normal opinion curve around the average value
  In real life there is no expert panel, but a single evaluator!
    The grade assigned by this particular person will NOT be arbitrary, but…
    It might fall anywhere within the standard ±2σ range
    It depends on the individual’s taste, background, etc.
  That is why both criteria can be called SEMI-OBJECTIVE or EXPERT OPINION-BASED
  Both criteria are NOT too accurate by design!
Consequences
  EACH of these two major criteria should be evaluated SEPARATELY
    Accurate but incomprehensible texts are as useless as fluent but inadequate ones
    Two independent “coordinates” that can’t be combined mechanically
  EACH should be evaluated on a threshold-based PASS/FAIL basis
    Acceptance range needs to accommodate the whole spectrum of potential expert opinions
      Marketing text: between 8 and 10 (10-point scale)
      Knowledge base: between 5 and 8 (10-point scale)
    The minimal scale to be used is a 10-point one, to accommodate the normal curve properly
      Smaller scales just do not provide sufficient granularity
    Acceptance threshold is defined by the area, visibility of materials, time constraints, target audience, etc.
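A minimal sketch of the threshold-based PASS/FAIL check described on this slide, in Python. Only the two acceptance ranges (8 to 10 for marketing text, 5 to 8 for a knowledge base) come from the slide. The profile names, the `ACCEPTANCE_RANGES` table and the rule that a grade passes once it reaches the lower bound of the range are illustrative assumptions.

```python
# Acceptance ranges on a 10-point scale, as quoted on the slide.
# The dictionary keys are hypothetical profile names.
ACCEPTANCE_RANGES = {
    "marketing": (8, 10),
    "knowledge_base": (5, 8),
}


def passes(grade: float, profile: str) -> bool:
    """Return True if a single expert grade is acceptable for the profile.

    Assumption: any grade at or above the lower bound of the range passes.
    """
    if not 0 <= grade <= 10:
        raise ValueError("grade must be on the 0-10 scale")
    low, _high = ACCEPTANCE_RANGES[profile]
    return grade >= low


def evaluate(adequacy: float, fluency: float, profile: str) -> dict:
    """Evaluate the two criteria separately; they are never averaged."""
    return {
        "adequacy": "PASS" if passes(adequacy, profile) else "FAIL",
        "fluency": "PASS" if passes(fluency, profile) else "FAIL",
    }


# An accurate but poorly readable marketing text fails on fluency alone.
result = evaluate(adequacy=9, fluency=6, profile="marketing")
```

Keeping the two verdicts as separate fields, rather than one number, mirrors the point that the two "coordinates" are independent.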
THE TECHNICAL FACTOR
Only content that passes on both accounts is further analyzed for technical imperfections
  Terminology inconsistency or deviations
  Style guides, country standards
  Tags, placeholders
  Formatting
Technical issues are OBJECTIVE
  Grades expected to be similar irrespective of the reviewer’s personality
    A typo is still a typo
    An error in country standards is still an error anyway
Issue categories can be based on
  MQM or other public source
  Legacy client-sourced criteria
Error weights and acceptance thresholds depend on multiple factors
  Expectations, target audience, time, brand impact, etc.
  Each “quality vector” contains error weights for each category and acceptance levels
  A limited number of “quality vectors” cover the whole spectrum
The resulting technical (objective) quality grade is the third apex of the quality triangle
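One way to picture a "quality vector" is as a small data structure holding per-category error weights and an acceptance level. The sketch below is hypothetical throughout: the class name, the category names, the weights, the limits and the penalty-per-1000-words formula are assumptions chosen for illustration, not values from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityVector:
    """Error weights per issue category plus an acceptance level.

    All names and numbers used with this class are hypothetical.
    """
    weights: dict                       # issue category -> penalty per error
    max_penalty_per_1000_words: float   # acceptance level


def technical_penalty(error_counts: dict, vector: QualityVector,
                      word_count: int) -> float:
    """Weighted error penalty normalized per 1000 words (assumed formula)."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    total = sum(vector.weights[cat] * n for cat, n in error_counts.items())
    return total * 1000 / word_count


def technical_pass(error_counts: dict, vector: QualityVector,
                   word_count: int) -> bool:
    """PASS when the normalized penalty stays within the acceptance level."""
    penalty = technical_penalty(error_counts, vector, word_count)
    return penalty <= vector.max_penalty_per_1000_words


# Two preset vectors sharing weights but differing in strictness.
WEIGHTS = {"terminology": 2.0, "standards": 1.0, "tags": 3.0, "formatting": 0.5}
KNOWLEDGE_BASE = QualityVector(WEIGHTS, max_penalty_per_1000_words=10.0)
MARKETING = QualityVector(WEIGHTS, max_penalty_per_1000_words=3.0)

errors = {"terminology": 2, "tags": 1, "formatting": 4}
penalty = technical_penalty(errors, KNOWLEDGE_BASE, word_count=2000)
```

The same error counts can then pass under one vector and fail under another, which is how a limited set of presets could cover different kinds of content.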
THE QUALITY TRIANGLE (OR SQUARE)
[Figure: diagram with the labels ADEQUACY, FLUENCY, TECHNICAL, MAJOR ERRORS and Acceptance Range Filters]
CASE-STUDY: US ACA SPANISH WEBSITE REVIEW
Organized by GALA (Globalization and Localization Association, www.gala-global.org)
  Logrus developed and provided methodology
  Logrus organized the review and provided analytics
Volunteer effort, crowdsourcing-based approach
  Complicated special rules, strict definitions, lengthy training, etc. out of the question
  Contributors chosen among language professionals only
Simplified “quality square” methodology applied
  Major errors (10 = None, 0 = More than 2)
  Readability (fluency, 0 - 10)
  Adequacy (accuracy, 0 - 10)
  Technical (0 - 10)
18 language pros reviewing the website: www.CuidadoDeSalud.gov
CASE-STUDY: US ACA SPANISH WEBSITE REVIEW (II)
Major errors: None (11), More than 2 (7), 1 grade ignored
Takeaways
  Not too objective!
  YOUR reviewer could contribute to ANY of the bars
  Only threshold-based criteria really work
[Chart: Readability / Intelligibility. Rating popularity vs. rating (0 - 10); actual results against a normal distribution. Mean value: 6.2, Std. deviation: 2.1]
[Chart: Adequacy / Accuracy. Rating popularity vs. rating (0 - 10); actual results against a normal distribution. Mean value: 6.6, Std. deviation: 1.9]
CASE-STUDY: US ACA SPANISH WEBSITE REVIEW (III)
Biggest opinion spread for technical errors
  Illustrates the gap between professional and crowdsourced work
  No detailed criteria or training applied
  Should be the most objective factor
[Chart: Technical Issues. Rating popularity vs. rating (0 - 10); actual results against a normal distribution. Mean value: 4.8, Std. deviation: 2.3]
Overall review results still quite reliable/convincing
  Not a big surprise given the website’s initial quality…
  “Obamacare’s poorly translated Spanish website frustrates users”, AP, January 12, 2014
[Chart: Rating popularity vs. rating (0 - 10); actual results against a normal distribution]
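To make the earlier ±2σ remark concrete, the short Python sketch below computes the range a single reviewer's grade might fall into for each criterion, using the mean values and standard deviations reported in this case study. Clipping the range to the 0 - 10 scale and rounding to one decimal are assumptions made for presentation.

```python
SCALE_MIN, SCALE_MAX = 0.0, 10.0

# (mean, standard deviation) as reported on the case-study slides.
RESULTS = {
    "Readability": (6.2, 2.1),
    "Adequacy": (6.6, 1.9),
    "Technical": (4.8, 2.3),
}


def two_sigma_range(mean: float, std: float) -> tuple:
    """Return the mean ± 2σ interval, clipped to the rating scale."""
    low = max(SCALE_MIN, mean - 2 * std)
    high = min(SCALE_MAX, mean + 2 * std)
    return round(low, 1), round(high, 1)


ranges = {name: two_sigma_range(m, s) for name, (m, s) in RESULTS.items()}

for name, (low, high) in ranges.items():
    print(f"{name}: a single grade may plausibly fall between {low} and {high}")
```

With these figures the plausible range for each criterion spans most of the scale, which is consistent with the takeaway that only threshold-based criteria really work.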
WHY SEMI-OBJECTIVE AND OBJECTIVE FACTORS SHOULD NOT BE COMBINED
Scope and nature
  Objective factors are “local”, each applies to a particular small segment (sentence)
  Semi-objective factors typically apply to the text as a whole or its large chunks
  Semi-objective evaluations imprecise by definition, can’t be used in formulas
    Natural variation might affect the summary score dramatically
Importance/weight
  Adequacy and fluency issues are way more important than most others
    Their relative weight will exceed everything else by orders of magnitude
  Combined summary result too dependent on adequacy/fluency
    Almost no sensitivity to other factors
Cost, Time, Viability
  No reason to waste time on counting/grading technical errors for an incomprehensible or incorrect text
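The sensitivity argument can be illustrated with a toy calculation in Python. The weighted-average formula and the 100:1 weight ratio below are assumptions picked only to stand in for "orders of magnitude"; the slides do not prescribe any combined formula, and in fact argue against one.

```python
def combined_score(adequacy: float, fluency: float, technical: float,
                   w_semi: float = 100.0, w_tech: float = 1.0) -> float:
    """Weighted average on the 0-10 scale (hypothetical formula)."""
    total_weight = 2 * w_semi + w_tech
    return (w_semi * (adequacy + fluency) + w_tech * technical) / total_weight


# The technical grade swings across the whole scale while the
# semi-objective grades stay fixed.
technical_swing = combined_score(7, 7, 10) - combined_score(7, 7, 0)

# Two reviewers differ by 2 points on adequacy; nothing else changes.
reviewer_swing = combined_score(8, 7, 5) - combined_score(6, 7, 5)

print(f"effect of technical grade 0 -> 10: {technical_swing:.3f}")
print(f"effect of a 2-point reviewer disagreement: {reviewer_swing:.3f}")
```

Under these assumed weights, ordinary disagreement between reviewers moves the combined score far more than the entire range of technical grades does, so the summary number says almost nothing about technical quality.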
THE “QUALITY TRIANGLE/SQUARE” APPROACH RECIPE
Preparation
  Select/build the appropriate issue classification for objective errors
  Select/set the acceptance thresholds and error weights vector
  Define show-stoppers
Process
  Apply expert opinion-based (semi-objective) criteria with a PASS/FAIL result
    Adequacy (Accuracy)
    Fluency (Readability)
  Apply objective criteria based on error classification/typology (acceptable docs only)
    Language (spelling & grammar)
    References, lack of (over-/under-)translations
    Country and other standards
    Terminology, Style Guide and explicit client’s guidelines
    Tags, placeholders, formatting, etc.
  Ignore Subjective Complaints
Obtain 3 or 4 resulting ratings for each reasonably translated document
  Adequacy (Accuracy)
  Fluency (Readability)
  Objective (Technical) error rating
  [Major problems]
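The order of operations in this recipe can be sketched as a small gated pipeline in Python. The function name, the single shared threshold for adequacy and fluency, and the callback used for the technical rating are hypothetical; the point is only that the technical review runs for acceptable documents and is skipped otherwise.

```python
from typing import Callable, Optional


def review_document(adequacy: float, fluency: float, threshold: float,
                    rate_technical: Callable[[], float],
                    major_errors: Optional[int] = None) -> dict:
    """Return the 3 or 4 ratings for one document.

    rate_technical is called only if both semi-objective criteria pass,
    so no effort is spent counting errors in an unacceptable text.
    Assumption: adequacy and fluency share one acceptance threshold.
    """
    report = {"adequacy": adequacy, "fluency": fluency}
    acceptable = adequacy >= threshold and fluency >= threshold
    report["technical"] = rate_technical() if acceptable else None
    if major_errors is not None:
        report["major_errors"] = major_errors  # optional fourth rating
    return report


calls = []


def fake_technical_rating() -> float:
    """Stand-in for a real error-typology review; records each call."""
    calls.append("rated")
    return 7.5


good = review_document(8, 7, threshold=6,
                       rate_technical=fake_technical_rating, major_errors=0)
bad = review_document(8, 3, threshold=6,
                      rate_technical=fake_technical_rating)
```

Here `good` receives a technical rating, while `bad` fails the fluency gate and the technical review is never invoked.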
SUMMARY
QA approach equally applicable to almost all real-life translations (without an existing reference)
  Works for MT post-editing or even raw MT output
  Complements the MQM back-end, providing the methodology for quality assurance
The only things that need to be chosen or fine-tuned are
  Issue catalogue (for objective issues/errors)
  The vector comprising all acceptance thresholds and error weights
    Can be chosen from a limited number of preset templates (content profiles)
See concept details in tcworld, February 2012, “Of power adapters and language quality assurance”:
  http://www.tcworld.info/tcworld/translation-and-localization/article/of-power-adapters-and-language-quality-assurance/
SEPARATE CASE: REFERENCE TRANSLATIONS AVAILABLE
There are plenty of time/money-saving, automated methods to get a ballpark quality evaluation
Applicability area is narrowed dramatically:
  Comparing different MTs or different versions of the same MT
  Evaluating test translations
Results might be quick and cheap, but
  Not directly related to quality of the translation
  Rather illustrating translation’s closeness to the benchmark one
Can be used for developing/improving MTs or quickly evaluating new translators/students
Very limited usability for real-life translation scenarios
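The difference between "closeness to the benchmark" and actual quality can be shown with a toy reference-based measure. The slides do not name a specific automated method; the sketch below swaps in Python's standard `difflib.SequenceMatcher` over word tokens purely as a stand-in, and the example sentences are invented.

```python
import difflib


def closeness(reference: str, candidate: str) -> float:
    """Word-level similarity to the reference translation, from 0 to 1.

    This measures overlap with the benchmark, not adequacy or fluency.
    """
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    return difflib.SequenceMatcher(None, ref_tokens, cand_tokens).ratio()


reference = "Press the red button to stop the machine"

# Same meaning, different wording.
paraphrase = "To halt the device, push the red button"

# Nearly identical wording, opposite meaning.
wrong = "Press the red button to start the machine"

score_paraphrase = closeness(reference, paraphrase)
score_wrong = closeness(reference, wrong)
```

In this example the inaccurate candidate scores closer to the reference than the correct paraphrase, which is why such scores suit comparing MT versions better than judging real-life translations.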