Re: Error categories from Okapi

Phil,

Would you have data that would let you check that? If not, how hard would it be to gather some data going forward? I really have no idea if there would be any correlation or if they would be entirely independent variables.
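If we did gather such data, the check could be as simple as a correlation over per-file counts. A minimal sketch (all counts here are hypothetical, purely to show the shape of the calculation):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-file counts: issues flagged by the automated QA checks
# vs. errors found later in manual review of the same files.
auto_flags    = [0, 2, 5, 1, 8, 3]
manual_errors = [1, 1, 4, 0, 6, 2]

r = pearson(auto_flags, manual_errors)
# r near +1 would suggest automated flags are a useful predictor of
# files needing manual review; r near 0 would suggest independence.
```

This says nothing about causation, of course, but it would be a cheap first look at whether the two data sets move together at all.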

Also, can you send me a list of the automated checks you run? I would like to compare them to the other lists I am looking at.

-Arle



On Jul 5, 2012, at 14:45 , Phil Ritchie wrote:

> In general we run automated checks prior to human review. The logic is to reduce human effort (in discovering or rediscovering errors) and/or to reduce repeated review cycles. We've never thought of trying to correlate the two data sets. 
> 
> Phil.
> 
> 
> 
> 
> 
> From:        Arle Lommel <arle.lommel@dfki.de> 
> To:        Yves Savourel <ysavourel@enlaso.com>, 
> Cc:        Felix Sasaki <felix.sasaki@dfki.de>, Phil Ritchie <philr@vistatec.ie> 
> Date:        05/07/2012 13:39 
> Subject:        Re: Error categories from Okapi 
> 
> 
> 
> Hi Yves,
> 
> This is actually very useful since these are categories that, in many cases, are not covered by the old LISA QA Model. It also shows why we need to allow for declaring different kinds of QA categories and referring to them.
> 
> What intrigues me is that these are, for the most part, automatically checkable. I wonder to what extent finding these errors would correlate with finding other errors that require manual checking. Obviously the overlap would not be exact, but I do wonder if they correlate at all, such that you could use automatic error checking to flag files that might ALSO need manual checking.
> 
> One note: I'm not sure what “The definition of those errors goes along with the item in the note” means in what you wrote.
> 
> Best,
> 
> -Arle
> 
> On Jul 5, 2012, at 14:28 , Yves Savourel wrote:
> 
> > Hi Arle, (CCing Phil in case he has input)
> > 
> >> Do you have a flat list of the error categories 
> >> checked in Okapi? I need to start building a list 
> >> of categories (and definitions) that could be 
> >> placed on a website so there are URLs to point 
> >> to the categories.
> > 
> > Here is the current list of QA item types that we use. That list evolves though.
> > 
> > MISSING_TARGETTU,
> > MISSING_TARGETSEG,
> > EXTRA_TARGETSEG,
> > EMPTY_TARGETSEG,
> > EMPTY_SOURCESEG,
> > MISSING_LEADINGWS,
> > MISSINGORDIFF_LEADINGWS,
> > EXTRA_LEADINGWS,
> > EXTRAORDIFF_LEADINGWS,
> > MISSING_TRAILINGWS,
> > MISSINGORDIFF_TRAILINGWS,
> > EXTRA_TRAILINGWS,
> > EXTRAORDIFF_TRAILINGWS,
> > TARGET_SAME_AS_SOURCE,
> > MISSING_CODE,
> > EXTRA_CODE,
> > SUSPECT_CODE,
> > UNEXPECTED_PATTERN,
> > SUSPECT_PATTERN,
> > TARGET_LENGTH,
> > ALLOWED_CHARACTERS,
> > TERMINOLOGY,
> > LANGUAGETOOL_ERROR
> > 
> > I'm not sure how useful this would be: The definition of those errors goes along with the item in the note.
> > 
> > -ys
> > 
> 
> 
> 
> ************************************************************
> This email and any files transmitted with it are confidential and
> intended solely for the use of the individual or entity to whom they
> are addressed. If you have received this email in error please notify
> the sender immediately by e-mail.
> 
> www.vistatec.com
> ************************************************************
> 

Received on Thursday, 5 July 2012 13:40:44 UTC