Re: [ISSUE 34] preliminary list of top-level error categories

Hi Arle, all,

Arle, that looks like a great job indeed. A few comments below.

2012/7/30 Arle Lommel <arle.lommel@dfki.de>

> Hi all,
>
> As part of some work going on with respect to the quality data categories,
> I have been asked to come up with a list of high-level error categories
> that reflect the top-level organization of quality issues checked by
> various tools and/or quality processes. After examining eight systems in
> current use, I have identified 26 categories that are either (a) used very
> commonly or (b) unify a variety of common categories.
>
> Note that this list is not intended to cover all detailed errors, but
> rather to provide a high level of interoperability between different
> quality systems. For example, Okapi identifies three different categories
> related to whitespace while QA Distiller identifies two, one of which
> corresponds to all three of the Okapi categories, and one of which has no
> match in Okapi. In this case both would identify that an error corresponds
> to *whitespace* and then specify their own category. We are not yet sure
> of what syntax we will use for this, but it could be something like this:
>
> <p its-quality-issue="whitespace;okapi:MISSING_LEADING_WS">Some content
> that was supposed to start with a tab</p>
>
> At this point we need some feedback as to whether there are any obvious
> oversights in the categories or if they are too specific, etc.
>

It would be very helpful to get this feedback from the people working on
the specifications / tools you list below. With Yves, I think we covered
CheckMate. Can you take an action to gather feedback from the other groups
and make it publicly available?

The analysis covers a lot of tools and specifications. Especially for readers
from outside the localization area, it would be helpful to have examples of
how to represent the output of these tools in ITS markup. I guess you have
written that up anyway as part of your detailed analysis. Do you think you
could write a non-normative appendix with that material? As with the standoff
version of provenance, such material might eventually end up in the best
practices document, but it would be good to have it available now for people
inside and outside the group to review.
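For illustration, here is a minimal sketch of the kind of example I mean,
reusing the draft syntax from above (the QA Distiller category name is only a
placeholder I made up, not the tool's actual identifier):

  <p its-quality-issue="whitespace;qadistiller:DOUBLE_SPACE">Some  content
  with an extra space</p>

That way each tool keeps its own detailed category while still exposing the
shared top-level value to other consumers.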

Best,

Felix



> Note that these should cover all of the errors from at least the following:
>
>
>    - LISA QA Model
>    - SAE J2450
>    - SDL TMS Classic
>    - ISO DX 14080 (now withdrawn)
>    - ATA Certification
>    - CheckMate (Okapi)
>    - QA Distiller
>    - XLIFF:doc's error categories
>
>
> This does not mean that all of the *details* of those categories should
> be reflected here, but rather that their categories could be fit into these
> as big boxes.
>
> One final note: the ordering in this list is significant in some cases.
> If two categories could apply to an error, the first one that applies is to
> be used. For example, if an error could be considered a terminology error
> or a mistranslation, it should be considered a terminology error.
>
> So please provide your feedback and thoughts. These categories can be
> modified and added to, but we want to keep the number down and the
> granularity quite coarse.
>
> Best,
>
> Arle
>
> terminology: An incorrect term or a term from the wrong domain was used, or
> terms are used inconsistently
> mistranslation: The content of the target mistranslates the content of the
> source
> omission: Necessary text has been omitted from the translation or source
> untranslated: Content that should have been translated was left untranslated
> addition: The translated text contains inappropriate additions
> duplication: Content has been duplicated improperly
> inconsistency: The text is inconsistent with itself (NB: not for use with
> terminology inconsistency)
> grammar: The text contains a grammatical error (including errors of syntax
> and morphology)
> legal: The text is legally problematic (e.g., it is specific to the wrong
> legal system)
> register: The text is written in the wrong linguistic register or uses slang
> or other language variants inappropriate to the text
> locale-specific-content: The translation contains content that does not
> apply to the locale for which it was translated (e.g., a service manual for
> Japan contains a U.S. call center number)
> locale-violation: Text violates norms for the intended locale (e.g., it uses
> the wrong form for names, addresses, sorting, measurement units, etc.)
> style: The text contains stylistic errors
> characters: The text contains characters that are garbled or incorrect
> (e.g., the text should have a • but instead has a ¥ sign) or that are not
> used in the language in which the content appears
> misspelling: The text contains a misspelling
> typographical: The text has typographical errors such as omitted/incorrect
> punctuation, incorrect capitalization, etc.
> formatting: The text is formatted incorrectly
> inconsistent-entities: The source and target text contain different named
> entities (dates, times, place names, individual names, etc.)
> numbers: Numbers are inconsistent between source and target
> language-error: The text contains a language error not identified above
> (NB: This allows for the pass-through of data from Language Tool or other
> external tools that evaluate the text.)
> markup: There is an error related to markup or a mismatch in markup between
> source and target
> pattern-problem: The text fails to match a pattern that defines allowable
> content (or matches one that defines non-allowable content)
> whitespace: There is a mismatch in whitespace between source and target
> content
> internationalization: There is an error related to the internationalization
> of content
> length: There is a significant difference in source and target length
> other: Any issue not covered above
>
>


-- 
Felix Sasaki
DFKI / W3C Fellow

Received on Tuesday, 31 July 2012 06:39:08 UTC