
[ISSUE 34] preliminary list of top-level error categories

From: Arle Lommel <arle.lommel@dfki.de>
Date: Mon, 30 Jul 2012 11:12:04 +0200
Message-Id: <F8E274BE-17BE-4847-96DD-8AC5A570DCCA@dfki.de>
To: Multilingual Web LT Public List <public-multilingualweb-lt@w3.org>
Hi all,

As part of some work going on with respect to the quality data categories, I have been asked to come up with a list of high-level error categories that reflect the top-level organization of quality issues checked by various tools and/or quality processes. After examining eight systems in current use, I have identified 26 categories that are either (a) used very commonly or (b) unify a variety of common categories.

Note that this list is not intended to cover all detailed errors, but rather to provide a high level of interoperability between different quality systems. For example, Okapi identifies three different categories related to whitespace while QA Distiller identifies two, one of which corresponds to all three of the Okapi categories, and one of which has no match in Okapi. In this case both tools would report the error under the shared whitespace category and then specify their own, more detailed category. We are not yet sure what syntax we will use for this, but it could be something like this:

<p its-quality-issue="whitespace;okapi:MISSING_LEADING_WS">Some content that was supposed to start with a tab</p>
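
As a rough illustration of how a consumer might read such a value, here is a minimal Python sketch. It assumes the tentative "category;tool:CODE" form shown above, which is not settled; the names QualityIssue and parse_quality_issue are hypothetical and exist only for this example.

```python
from typing import NamedTuple, Optional


class QualityIssue(NamedTuple):
    category: str             # shared top-level category, e.g. "whitespace"
    tool: Optional[str]       # tool prefix, e.g. "okapi" (None if absent)
    tool_code: Optional[str]  # tool-specific code (None if absent)


def parse_quality_issue(value: str) -> QualityIssue:
    """Split 'category;tool:CODE' into its parts (assumed, tentative syntax)."""
    category, _, detail = value.partition(";")
    category = category.strip()
    if not category:
        raise ValueError("missing top-level category")
    detail = detail.strip()
    if not detail:
        # Only the shared top-level category was given.
        return QualityIssue(category, None, None)
    tool, sep, code = detail.partition(":")
    if not sep or not tool.strip() or not code.strip():
        raise ValueError(f"expected 'tool:CODE', got {detail!r}")
    return QualityIssue(category, tool.strip(), code.strip())


issue = parse_quality_issue("whitespace;okapi:MISSING_LEADING_WS")
print(issue.category, issue.tool, issue.tool_code)
# prints: whitespace okapi MISSING_LEADING_WS
```

A tool that does not know the "okapi" prefix could still act on the shared category alone, which is the interoperability this list is meant to provide.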

At this point we need some feedback as to whether there are any obvious oversights in the categories or if they are too specific, etc. Note that these should cover all of the errors from at least the following:

LISA QA Model
SAE J2450
SDL TMS Classic
ISO DX 14080 (now withdrawn)
ATA Certification
CheckMate (Okapi)
QA Distiller
XLIFF:doc's error categories

This does not mean that all of the details of those categories should be reflected here, but rather that their categories could be fit into these as big boxes.

One final note: the ordering in this list is significant in some cases. If two categories could apply to an error, the first one that applies is to be used. For example, if an error could be considered a terminology error or a mistranslation, it should be considered a terminology error.
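
The "first one that applies" rule can be sketched in Python as follows. This is only an illustration: the category names are those listed at the end of this message, the helper pick_category is hypothetical, and the sketch applies the ordering uniformly even though it is significant only in some cases.

```python
# Top-level categories in the order given at the end of this message.
CATEGORIES = [
    "terminology", "mistranslation", "omission", "untranslated", "addition",
    "duplication", "inconsistency", "grammar", "legal", "register",
    "locale-specific-content", "locale-violation", "style", "characters",
    "misspelling", "typographical", "formatting", "inconsistent-entities",
    "numbers", "language-error", "markup", "pattern-problem", "whitespace",
    "internationalization", "length", "other",
]

# Position of each category in the list; a lower rank takes precedence.
RANK = {name: position for position, name in enumerate(CATEGORIES)}


def pick_category(candidates):
    """Return the candidate that appears earliest in the list.

    Unknown names are ignored; if nothing is left, fall back to "other".
    """
    known = [c for c in candidates if c in RANK]
    if not known:
        return "other"
    return min(known, key=RANK.__getitem__)


print(pick_category(["mistranslation", "terminology"]))
# prints: terminology
```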

So please provide your feedback and thoughts. These categories can be modified and added to, but we want to keep the number down and the granularity quite coarse.

Best,

Arle

terminology
An incorrect term or a term from the wrong domain was used or terms are used inconsistently
mistranslation
The content of the target mistranslates the content of the source
omission
Necessary text has been omitted from the translation or source
untranslated
Content that should have been translated was left untranslated
addition
The translated text contains inappropriate additions
duplication
Content has been duplicated improperly
inconsistency
The text is inconsistent with itself (NB: not for use with terminology inconsistency)
grammar
The text contains a grammatical error (including errors of syntax and morphology)
legal
The text is legally problematic (e.g., it is specific to the wrong legal system)
register
The text is written in the wrong linguistic register or uses slang or other language variants inappropriate to the text
locale-specific-content
The translation contains content that does not apply to the locale for which it was translated (e.g., a service manual for Japan contains a U.S. call center number)
locale-violation
Text violates norms for the intended locale (e.g., it uses the wrong form for names, addresses, sorting, measurement units, etc.)
style
The text contains stylistic errors
characters
The text contains characters that are garbled or incorrect (e.g., the text should have one symbol but instead has a different sign) or that are not used in the language in which the content appears
misspelling
The text contains a misspelling
typographical
The text has typographical errors such as omitted/incorrect punctuation, incorrect capitalization, etc.
formatting
The text is formatted incorrectly
inconsistent-entities
The source and target text contain different named entities (dates, times, place names, individual names, etc.)
numbers
Numbers are inconsistent between source and target
language-error
The text contains a language error not identified above (NB: This allows for the pass-through of data from Language Tool or other external tools that evaluate the text.)
markup
There is an error related to markup or a mismatch in markup between source and target
pattern-problem
The text fails to match a pattern that defines allowable content (or matches one that defines non-allowable content)
whitespace
There is a mismatch in whitespace between source and target content
internationalization
There is an error related to the internationalization of content
length
There is a significant difference in source and target length
other
Any issue not covered above


Received on Monday, 30 July 2012 09:12:33 UTC
