RE: [ISSUE 34] preliminary list of top-level error categories

Hi Arle,

I think this is a very good start.

The only note I have so far is that we could perhaps merge the 'locale-specific-content' and 'locale-violation' categories. Each seems a bit too specific for a top-level category, and they could be joined into a single 'locale-specific' type. But this is minor.


From: Arle Lommel
Sent: Monday, July 30, 2012 11:12 AM
To: Multilingual Web LT Public List
Subject: [ISSUE 34] preliminary list of top-level error categories

Hi all,

As part of some work going on with respect to the quality data categories, I have been asked to come up with a list of high-level error categories that reflect the top-level organization of quality issues checked by various tools and/or quality processes. After examining eight systems in current use, I have identified 26 categories that are either (a) used very commonly or (b) unify a variety of common categories.

Note that this list is not intended to cover all detailed errors, but rather to provide a high level of interoperability between different quality systems. For example, Okapi identifies three different categories related to whitespace while QA Distiller identifies two, one of which corresponds to all three of the Okapi categories, and one of which has no match in Okapi. In this case both would identify that an error corresponds to whitespace and then specify their own category. We are not yet sure of what syntax we will use for this, but it could be something like this:

<p its-quality-issue="whitespace;okapi:MISSING_LEADING_WS">Some content that was supposed to start with a tab</p>
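To make the idea concrete, here is a minimal sketch of how a consumer might split such a value into the shared top-level category and the tool-specific one. It assumes the tentative "category;tool:TOOL_CATEGORY" form shown above, which is only a possibility and not a settled syntax. The names QualityIssue and parse_quality_issue are made up for illustration.

```python
from typing import NamedTuple, Optional


class QualityIssue(NamedTuple):
    """One parsed quality-issue value (hypothetical structure)."""
    category: str                  # shared top-level category, e.g. "whitespace"
    tool: Optional[str]            # tool prefix, e.g. "okapi", if present
    tool_category: Optional[str]   # tool-specific category, if present


def parse_quality_issue(value: str) -> QualityIssue:
    """Split 'category;tool:TOOL_CATEGORY' into its parts.

    Assumes the tentative syntax from the message above; the
    tool-specific part after ';' is treated as optional.
    """
    top, _, specific = value.partition(";")
    top = top.strip()
    if not top:
        raise ValueError("missing top-level category")
    specific = specific.strip()
    if not specific:
        # Only the shared top-level category was given.
        return QualityIssue(top, None, None)
    tool, sep, tool_category = specific.partition(":")
    if not sep:
        # No tool prefix: keep the detail but leave the tool unknown.
        return QualityIssue(top, None, specific)
    return QualityIssue(top, tool.strip(), tool_category.strip())


issue = parse_quality_issue("whitespace;okapi:MISSING_LEADING_WS")
print(issue.category, issue.tool, issue.tool_category)
```

A tool that does not know the Okapi categories could still act on the top-level "whitespace" value and ignore the rest, which is the kind of interoperability the list is meant to give.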

At this point we need some feedback as to whether there are any obvious oversights in the categories or if they are too specific, etc. Note that these should cover all of the errors from at least the following:

• LISA QA Model
• SAE J2450
• SDL TMS Classic
• ISO DX 14080 (now withdrawn)
• ATA Certification
• CheckMate (Okapi)
• QA Distiller
• XLIFF:doc's error categories

This does not mean that all of the details of those categories should be reflected here, but rather that their categories could be fit into these as big boxes.

One final note: the ordering in this list is significant in some cases. If two categories could apply to an error, the first one that applies is to be used. For example, if an error could be considered a terminology error or a mistranslation, it should be considered a terminology error.
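The first-match rule could be sketched roughly as follows. The category identifiers and the pick_category helper are placeholders for illustration only (the real list is longer and its names are not fixed here); the point is just that the list order decides ties.

```python
# Illustrative subset only; these identifiers are assumed, not final names.
# Order matters: earlier entries win when several categories could apply.
ORDERED_CATEGORIES = ["terminology", "mistranslation", "omission"]


def pick_category(candidates):
    """Return the first category in list order that is among the candidates.

    Falls back to "other" (assumed name for the catch-all) when none match.
    """
    for category in ORDERED_CATEGORIES:
        if category in candidates:
            return category
    return "other"


# An error that could be either a mistranslation or a terminology error
# is reported as a terminology error, because that category comes first.
print(pick_category({"mistranslation", "terminology"}))
```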

So please provide your feedback and thoughts. These categories can be modified and added to, but we want to keep the number down and the granularity quite coarse.



An incorrect term or a term from the wrong domain was used or terms are used inconsistently
The content of the target mistranslates the content of the source
Necessary text has been omitted from the translation or source
Content that should have been translated was left untranslated
The translated text contains inappropriate additions
Content has been duplicated improperly
The text is inconsistent with itself (NB: not for use with terminology inconsistency)
The text contains a grammatical error (including errors of syntax and morphology)
The text is legally problematic (e.g., it is specific to the wrong legal system)
The text is written in the wrong linguistic register or uses slang or other language variants inappropriate to the text
The translation contains content that does not apply to the locale for which it was translated (e.g., a service manual for Japan contains a U.S. call center number)
Text violates norms for the intended locale (e.g., it uses the wrong form for names, addresses, sorting, measurement units, etc.)
The text contains stylistic errors
The text contains characters that are garbled or incorrect (e.g., the text should have a • but instead has a ¥ sign) or that are not used in the language in which the content appears
The text contains a misspelling
The text has typographical errors such as omitted/incorrect punctuation, incorrect capitalization, etc.
The text is formatted incorrectly
The source and target text contain different named entities (dates, times, place names, individual names, etc.)
Numbers are inconsistent between source and target
The text contains a language error not identified above (NB: This allows for the pass-through of data from Language Tool or other external tools that evaluate the text.)
There is an error related to markup or a mismatch in markup between source and target
The text fails to match a pattern that defines allowable content (or matches one that defines non-allowable content)
There is a mismatch in whitespace between source and target content
There is an error related to the internationalization of content
There is a significant difference in source and target length
Any issue not covered above

Received on Monday, 30 July 2012 12:24:20 UTC