[ISSUE-34] - Forming of quality data category

Sure, sorry about that.

Here we go: it’s posted.

Cheers,

-ys

 

From: felix.sasaki@googlemail.com [mailto:felix.sasaki@googlemail.com] On Behalf Of Felix Sasaki
Sent: Thursday, July 05, 2012 3:09 PM
To: Yves Savourel
Cc: Arle Lommel; Phil Ritchie
Subject: Re: Error categories from Okapi

 

Guys, a side note: would it be OK for you to have this discussion on the public list? I think it's a very useful thread, and it would be great if others could see it.

 

Best,

 

Felix

2012/7/5 Yves Savourel <ysavourel@enlaso.com>

> I'll review and may have some questions for you.
> For instance, I already want to know how you define
> what gets flagged in ALLOWED_CHARACTERS.
> Does it work with a user-defined set or something else?

It flags the entry if it contains one or more characters that are not part of a user-defined set.
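For readers following along, here is a minimal sketch of that rule. It assumes the user-defined set is a plain set of characters. The function names are hypothetical, and the actual CheckMate option and QualityChecker code may express the set differently, for example as a pattern.

```python
from __future__ import annotations


def find_disallowed_characters(target: str, allowed: set[str]) -> list[tuple[int, str]]:
    """Return (position, character) pairs for every character of `target`
    that is not in the user-defined `allowed` set (hypothetical helper)."""
    return [(i, ch) for i, ch in enumerate(target) if ch not in allowed]


def flag_allowed_characters(target: str, allowed: set[str]) -> bool:
    # The entry is flagged as soon as it contains one or more disallowed characters.
    return bool(find_disallowed_characters(target, allowed))
```

For example, with `allowed = set("abcdefghijklmnopqrstuvwxyz ")`, the call `find_disallowed_characters("hello wörld", allowed)` returns `[(7, 'ö')]`, so that entry would be flagged.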

One link that may help:
http://www.opentag.com/okapi/wiki/index.php?title=CheckMate_-_Quality_Check_Configuration

That is the help for the configuration file. The issue types are not listed (they are not visible to end-users), but you can probably learn a lot from the options.

Also:
http://code.google.com/p/okapi/source/browse/okapi/libraries/lib-verification/src/main/java/net/sf/okapi/lib/verification/QualityChecker.java

That is the code where the checks are done. If you search for a type name (e.g., "ALLOWED_"), you should find the error messages that go with it.

Questions are welcome too, obviously.

-ys



-----Original Message-----
From: Arle Lommel [mailto:arle.lommel@dfki.de]
Sent: Thursday, July 05, 2012 2:58 PM
To: Yves Savourel
Cc: 'Felix Sasaki'; 'Phil Ritchie'
Subject: Re: Error categories from Okapi

Thanks Yves.

You are correct. There are two things. I have wanted to address both, and I think that our system can be agnostic as to how the errors are found. So I am working on a meta-framework that could, in principle, be used to describe ANY loc/translation-related error type. That is external to MLW-LT, but closely related.

What I was interested in for definitions was an abstract description, something like MISSING_TRAILINGWS = "white space characters present in the source are not found in the target". If you don't have something like that, don't worry, since I think your categories are pretty self-explanatory.
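To show how such an abstract definition maps onto an automatic check, here is a minimal sketch. It assumes that "trailing white space" means whatever `str.rstrip()` would remove, and that the issue fires only when the source has trailing white space and the target has none. The function names and the note wording are hypothetical, not Okapi's. Related types such as MISSINGORDIFF_TRAILINGWS presumably cover the "present but different" case, which this sketch does not attempt.

```python
from __future__ import annotations


def trailing_ws(text: str) -> str:
    # Everything that rstrip() removes from the end of the string.
    return text[len(text.rstrip()):]


def check_missing_trailing_ws(source: str, target: str) -> str | None:
    """Return a note if the source's trailing white space is absent from the
    target (hypothetical reading of MISSING_TRAILINGWS), else None."""
    src_ws = trailing_ws(source)
    if src_ws and not trailing_ws(target):
        return f"The trailing white space {src_ws!r} of the source is missing in the target"
    return None
```

For example, `check_missing_trailing_ws("Save ", "Enregistrer")` returns `"The trailing white space ' ' of the source is missing in the target"`, while `check_missing_trailing_ws("Save ", "Enregistrer ")` returns `None`.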

I'll review and may have some questions for you. For instance, I already want to know how you define what gets flagged in ALLOWED_CHARACTERS. Does it work with a user-defined set or something else?

Best,

Arle

On Jul 5, 2012, at 14:52 , Yves Savourel wrote:

>> What intrigues me is that these are, for the most part, automatically
>> checkable. I wonder to what extent finding these errors would
>> correlate with finding other errors that require manual checking.
>
> They are all found automatically.
> This system is not trying to record translation errors found "manually".
>
> Your comment actually helps me better understand the current details of those data categories. It seems we are talking about different but overlapping things.
>
> I'm talking about a common way to record issues found with automated checking, the way CheckMate, XBench, Trados, WF, and many other tools work.
>
> You seem to be talking about a way for proofers/testers to record the errors they found "manually".
>
>
>> I'm not sure what “The definition of those errors goes along with the
>> item in the note” means in what you wrote.
>
> I meant to say that the definition of the issue is recorded along with the issue itself, so there is no real need to 'point to a definition'. Something similar to:
>
> <its:qaError type='MISSING_TRAILINGWS'>
>   <note>The character ' ' is missing at position 34</note>
> </its:qaError>
>
> Cheers,
> -ys
>
>
> -----Original Message-----
> From: Arle Lommel [mailto:arle.lommel@dfki.de]
> Sent: Thursday, July 05, 2012 2:39 PM
> To: Yves Savourel
> Cc: Felix Sasaki; Phil Ritchie
> Subject: Re: Error categories from Okapi
>
> Hi Yves,
>
> This is actually very useful, since these are categories that, in many cases, are not covered by the old LISA QA Model. It also shows why we need to allow for declaring different kinds of QA categories and referring to them.
>
> What intrigues me is that these are, for the most part, automatically checkable. I wonder to what extent finding these errors would correlate with finding other errors that require manual checking. Obviously the overlap would not be exact, but I do wonder whether they correlate at all, such that you could use automatic error checking to flag files that might ALSO need manual checking.
>
> One note: I'm not sure what “The definition of those errors goes along with the item in the note” means in what you wrote.
>
> Best,
>
> -Arle
>
> On Jul 5, 2012, at 14:28 , Yves Savourel wrote:
>
>> Hi Arle, (CCing Phil in case he has input)
>>
>>> Do you have a flat list of the error categories checked in Okapi? I
>>> need to start building a list of categories (and definitions) that
>>> could be placed on a website so there are URLs to point to the
>>> categories.
>>
>> Here is the current list of QA item types that we use. That list evolves, though.
>>
>> MISSING_TARGETTU,
>> MISSING_TARGETSEG,
>> EXTRA_TARGETSEG,
>> EMPTY_TARGETSEG,
>> EMPTY_SOURCESEG,
>> MISSING_LEADINGWS,
>> MISSINGORDIFF_LEADINGWS,
>> EXTRA_LEADINGWS,
>> EXTRAORDIFF_LEADINGWS,
>> MISSING_TRAILINGWS,
>> MISSINGORDIFF_TRAILINGWS,
>> EXTRA_TRAILINGWS,
>> EXTRAORDIFF_TRAILINGWS,
>> TARGET_SAME_AS_SOURCE,
>> MISSING_CODE,
>> EXTRA_CODE,
>> SUSPECT_CODE,
>> UNEXPECTED_PATTERN,
>> SUSPECT_PATTERN,
>> TARGET_LENGTH,
>> ALLOWED_CHARACTERS,
>> TERMINOLOGY,
>> LANGUAGETOOL_ERROR
>>
>> I'm not sure how useful this would be: The definition of those errors goes along with the item in the note.
>>
>> -ys
>>
>







 

-- 
Prof. Dr. Felix Sasaki

Senior Researcher, Language Technology Lab

Manager W3C Germany/Austrian Office
DFKI GmbH, Alt-Moabit 91c, 10559 Berlin, Germany, http://www.dfki.de
phone: +49-30-23895-1807 (fax: -1810) 
------------------------------------------------
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Registered office: Trippstadter Strasse 122, D-67663 Kaiserslautern
Management: Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster
(Chairman), Dr. Walter Olthoff
Chairman of the Supervisory Board: Prof. Dr. h.c. Hans A. Aukes
Amtsgericht (Local Court) Kaiserslautern, HRB 2313

 

Received on Thursday, 5 July 2012 13:27:26 UTC