
Re: [ISSUE 34] preliminary list of top-level error categories

From: Phil Ritchie <philr@vistatec.ie>
Date: Tue, 31 Jul 2012 07:27:44 +0100
To: "Arle Lommel" <arle.lommel@dfki.de>
Message-ID: <478BC232-3462-4A69-8454-07BBA58D7B0E@vistatec.ie>
Cc: "Multilingual Web LT Public List" <public-multilingualweb-lt@w3.org>

This is a great compilation, Arle.

You don't mention the GALE categories but I suspect that they will fall under the ones you've listed below. I'll try to check.

Phil



On 30 Jul 2012, at 10:12, "Arle Lommel" <arle.lommel@dfki.de> wrote:

> Hi all,
> 
> As part of some work going on with respect to the quality data categories, I have been asked to come up with a list of high-level error categories that reflect the top-level organization of quality issues checked by various tools and/or quality processes. After examining eight systems in current use, I have identified 26 categories that are either (a) used very commonly or (b) unify a variety of common categories.
> 
> Note that this list is not intended to cover all detailed errors, but rather to provide a high level of interoperability between different quality systems. For example, Okapi identifies three different categories related to whitespace while QA Distiller identifies two, one of which corresponds to all three of the Okapi categories, and one of which has no match in Okapi. In this case both would identify that an error corresponds to whitespace and then specify their own category. We are not yet sure of what syntax we will use for this, but it could be something like this:
> 
> <p its-quality-issue="whitespace;okapi:MISSING_LEADING_WS">Some content that was supposed to start with a tab</p>
> 
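To make the idea concrete, here is a minimal Python sketch of how a consumer might split such an attribute value into the shared top-level category and the tool-specific one. It assumes the tentative `category;tool:CODE` form shown in the example above, which the message itself says is not settled; the names `QualityIssue` and `parse_quality_issue` are hypothetical and not part of any specification.

```python
from typing import NamedTuple, Optional


class QualityIssue(NamedTuple):
    category: str             # shared top-level category, e.g. "whitespace"
    tool: Optional[str]       # tool prefix, e.g. "okapi" (None if absent)
    tool_code: Optional[str]  # tool-specific category (None if absent)


def parse_quality_issue(value: str) -> QualityIssue:
    """Split a value of the assumed form 'category;tool:CODE'."""
    category, _, detail = value.partition(";")
    category = category.strip()
    detail = detail.strip()
    if not detail:
        # Only the top-level category was given.
        return QualityIssue(category, None, None)
    tool, sep, code = detail.partition(":")
    if not sep:
        # A detail with no tool prefix is kept as the code alone.
        return QualityIssue(category, None, detail)
    return QualityIssue(category, tool.strip(), code.strip())


issue = parse_quality_issue("whitespace;okapi:MISSING_LEADING_WS")
print(issue.category, issue.tool, issue.tool_code)
```

A tool that does not recognise the `okapi` prefix could still act on `issue.category`, which is the interoperability the shared list is meant to provide.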
> At this point we need some feedback as to whether there are any obvious oversights in the categories or if they are too specific, etc. Note that these should cover all of the errors from at least the following:
> 
> LISA QA Model
> SAE J2450
> SDL TMS Classic
> ISO DX 14080 (now withdrawn)
> ATA Certification
> CheckMate (Okapi)
> QA Distiller
> XLIFF:doc's error categories
> 
> This does not mean that all of the details of those categories should be reflected here, but rather that their categories could be fit into these as big boxes.
> 
> One final note: the ordering in this list is significant in some cases. If two categories could apply to an error, the first one that applies is to be used. For example, if an error could be considered a terminology error or a mistranslation, it should be considered a terminology error.
> 
> So please provide your feedback and thoughts. These categories can be modified and added to, but we want to keep the number down and the granularity quite coarse.
> 
> Best,
> 
> Arle
> 
> terminology
> An incorrect term or a term from the wrong domain was used or terms are used inconsistently
> mistranslation
> The content of the target mistranslates the content of the source
> omission
> Necessary text has been omitted from the translation or source
> untranslated
> Content that should have been translated was left untranslated
> addition
> The translated text contains inappropriate additions
> duplication
> Content has been duplicated improperly
> inconsistency
> The text is inconsistent with itself (NB: not for use with terminology inconsistency)
> grammar
> The text contains a grammatical error (including errors of syntax and morphology)
> legal
> The text is legally problematic (e.g., it is specific to the wrong legal system)
> register
> The text is written in the wrong linguistic register or uses slang or other language variants inappropriate to the text
> locale-specific-content
> The translation contains content that does not apply to the locale for which it was translated (e.g., a service manual for Japan contains a U.S. call center number)
> locale-violation
> Text violates norms for the intended locale (e.g., it uses the wrong form for names, addresses, sorting, measurement units, etc.)
> style
> The text contains stylistic errors
> characters
> The text contains characters that are garbled or incorrect (e.g., the text should have a • but instead has a ¥ sign) or that are not used in the language in which the content appears
> misspelling
> The text contains a misspelling
> typographical
> The text has typographical errors such as omitted/incorrect punctuation, incorrect capitalization, etc.
> formatting
> The text is formatted incorrectly
> inconsistent-entities
> The source and target text contain different named entities (dates, times, place names, individual names, etc.)
> numbers
> Numbers are inconsistent between source and target
> language-error
> The text contains a language error not identified above (NB: This allows for the pass-through of data from Language Tool or other external tools that evaluate the text.)
> markup
> There is an error related to markup or a mismatch in markup between source and target
> pattern-problem
> The text fails to match a pattern that defines allowable content (or matches one that defines non-allowable content)
> whitespace
> There is a mismatch in whitespace between source and target content
> internationalization
> There is an error related to the internationalization of content
> length
> There is a significant difference in source and target length
> other
> Any issue not covered above
> 
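The precedence rule stated before the list (when two categories could apply, use the one listed first) can be sketched in Python as follows. This is an illustration only: the message says ordering matters "in some cases", whereas this sketch applies it uniformly to every pair, and falling back to `other` for unknown or empty input is an assumption. The names `CATEGORIES` and `pick_category` are hypothetical.

```python
# The 26 proposed categories, in the order given in the message.
CATEGORIES = [
    "terminology", "mistranslation", "omission", "untranslated",
    "addition", "duplication", "inconsistency", "grammar", "legal",
    "register", "locale-specific-content", "locale-violation", "style",
    "characters", "misspelling", "typographical", "formatting",
    "inconsistent-entities", "numbers", "language-error", "markup",
    "pattern-problem", "whitespace", "internationalization", "length",
    "other",
]

# Position in the list doubles as precedence: lower index wins.
_RANK = {name: index for index, name in enumerate(CATEGORIES)}


def pick_category(candidates):
    """Return the earliest-listed category among the candidates."""
    known = [name for name in candidates if name in _RANK]
    if not known:
        # Assumption: anything unrecognised is reported as "other".
        return "other"
    return min(known, key=_RANK.__getitem__)


print(pick_category(["mistranslation", "terminology"]))  # terminology
```

Keeping the order in a single list means a later reordering of the categories changes the precedence without touching the selection logic.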

Received on Tuesday, 31 July 2012 06:28:22 UTC
