Re: Error categories from Okapi

I'll talk to the QA Team and find out.

Phil.





From:    Arle Lommel <arle.lommel@dfki.de>
To:      Multilingual Web LT Public List <public-multilingualweb-lt@w3.org>
Cc:      Felix Sasaki <felix.sasaki@dfki.de>, Yves Savourel <ysavourel@enlaso.com>, Phil Ritchie <philr@vistatec.ie>
Date:    05/07/2012 14:40
Subject: Re: Error categories from Okapi



Phil,

Would you have data that would let you check that? If not, how hard would 
it be to gather some data going forward? I really have no idea if there 
would be any correlation or if they would be entirely independent 
variables.

Also, can you send me a list of the automated checks you run? I would like 
to compare them to the other lists I am looking at.

-Arle



On Jul 5, 2012, at 14:45 , Phil Ritchie wrote:

In general we run automated checks prior to human review. The rationale is 
to reduce human effort (avoiding discovery, or repeated discovery, of the 
same issues) and to cut down on repeated review cycles. We've never thought 
of trying to correlate the two data sets. 

Phil.





From:    Arle Lommel <arle.lommel@dfki.de>
To:      Yves Savourel <ysavourel@enlaso.com>
Cc:      Felix Sasaki <felix.sasaki@dfki.de>, Phil Ritchie <philr@vistatec.ie>
Date:    05/07/2012 13:39
Subject: Re: Error categories from Okapi



Hi Yves,

This is very useful, since these are categories that, in many cases, are 
not covered by the old LISA QA Model. It shows why we need to allow for 
declaring different kinds of QA categories and referring to them.

What intrigues me is that these are, for the most part, automatically 
checkable. I wonder to what extent finding these errors would correlate 
with finding other errors that require manual checking. Obviously the 
overlap would not be exact, but if they correlate at all, automatic error 
checking could be used to flag files that are also likely to need manual 
checking.

One note: I'm not sure what “The definition of those errors goes along 
with the item in the note” means in what you wrote.

Best,

-Arle

On Jul 5, 2012, at 14:28 , Yves Savourel wrote:

> Hi Arle, (CCing Phil in case he has input)
> 
>> Do you have a flat list of the error categories 
>> checked in Okapi? I need to start building a list 
>> of categories (and definitions) that could be 
>> placed on a website so there are URLs to point 
>> to the categories.
> 
> Here is the current list of QA item types that we use. That list evolves
> though.
> 
> MISSING_TARGETTU,
> MISSING_TARGETSEG,
> EXTRA_TARGETSEG,
> EMPTY_TARGETSEG,
> EMPTY_SOURCESEG,
> MISSING_LEADINGWS,
> MISSINGORDIFF_LEADINGWS,
> EXTRA_LEADINGWS,
> EXTRAORDIFF_LEADINGWS,
> MISSING_TRAILINGWS,
> MISSINGORDIFF_TRAILINGWS,
> EXTRA_TRAILINGWS,
> EXTRAORDIFF_TRAILINGWS,
> TARGET_SAME_AS_SOURCE,
> MISSING_CODE,
> EXTRA_CODE,
> SUSPECT_CODE,
> UNEXPECTED_PATTERN,
> SUSPECT_PATTERN,
> TARGET_LENGTH,
> ALLOWED_CHARACTERS,
> TERMINOLOGY,
> LANGUAGETOOL_ERROR
> 
> I'm not sure how useful this would be: The definition of those errors
> goes along with the item in the note.
> 
> -ys
> 
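(Arle's aim of a website with URLs to point to the categories could be sketched as a simple mapping from item type to a stable URL. The base URL below is a placeholder, not a real registry:)

```python
# Hypothetical sketch: give each QA item type a stable, linkable URL so that
# tools and metadata can reference categories. BASE is a placeholder domain.
BASE = "https://example.org/qa-categories/"

# A subset of the Okapi QA item types listed above.
ISSUE_TYPES = [
    "MISSING_TARGETTU", "MISSING_TARGETSEG", "EXTRA_TARGETSEG",
    "EMPTY_TARGETSEG", "TARGET_SAME_AS_SOURCE", "MISSING_CODE",
    "EXTRA_CODE", "TARGET_LENGTH", "TERMINOLOGY", "LANGUAGETOOL_ERROR",
]

def category_url(issue_type: str) -> str:
    """Return a stable URL slug for an Okapi QA issue type."""
    return BASE + issue_type.lower().replace("_", "-")

print(category_url("TARGET_SAME_AS_SOURCE"))
# https://example.org/qa-categories/target-same-as-source
```

Publishing definitions at such URLs would let the per-item notes Yves mentions link back to a shared, citable category list.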



************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the sender immediately by e-mail.
www.vistatec.com
************************************************************




Received on Thursday, 5 July 2012 13:45:53 UTC