- From: Yves Savourel <ysavourel@enlaso.com>
- Date: Thu, 5 Jul 2012 15:26:48 +0200
- To: <public-multilingualweb-lt@w3.org>
- Cc: "'Arle Lommel'" <arle.lommel@dfki.de>, "'Phil Ritchie'" <philr@vistatec.ie>, "'Felix Sasaki'" <felix.sasaki@dfki.de>
- Message-ID: <assp.05339e0857.assp.05334d3ec9.008d01cd5ab1$d56b9080$8042b180$@com>
Sure, sorry about that. Here we go: it’s posted.

Cheers,
-ys

From: felix.sasaki@googlemail.com [mailto:felix.sasaki@googlemail.com] On Behalf Of Felix Sasaki
Sent: Thursday, July 05, 2012 3:09 PM
To: Yves Savourel
Cc: Arle Lommel; Phil Ritchie
Subject: Re: Error categories from Okapi

Guys, a side note: would it be OK for you to have this discussion on the public list? I think it's a very useful thread and it would be great if others could see it.

Best,

Felix

2012/7/5 Yves Savourel <ysavourel@enlaso.com>

> I'll review and may have some questions for you.
> For instance, I already want to know how you define
> what gets flagged in ALLOWED_CHARACTERS.
> Does it work with a user-defined set or something else.

It flags the entry if it contains one or more characters that are not part of a user-defined set.

One link that may help:
http://www.opentag.com/okapi/wiki/index.php?title=CheckMate_-_Quality_Check_Configuration
That is the help for the configuration file. The issue types are not listed (they are not visible to end-users), but you can probably get a lot from the options.

Also:
http://code.google.com/p/okapi/source/browse/okapi/libraries/lib-verification/src/main/java/net/sf/okapi/lib/verification/QualityChecker.java
That is the code where the checks are done. If you look for the type (e.g. "ALLOWED_") you should see some error messages that go with it.

Questions are welcome too, obviously.
-ys

-----Original Message-----
From: Arle Lommel [mailto:arle.lommel@dfki.de]
Sent: Thursday, July 05, 2012 2:58 PM
To: Yves Savourel
Cc: 'Felix Sasaki'; 'Phil Ritchie'
Subject: Re: Error categories from Okapi

Thanks Yves. You are correct. There are two things. I have wanted to address both and I think that our system can be agnostic as to how the errors are found. So I am working on a meta-framework that could, in principle, be used to describe ANY loc/translation-related error type. That is external to MLW-LT, but closely related.

What I was interested in for definitions was an abstract description, something like MISSING_TRAILINGWS = "white space characters present in the source are not found in the target". If you don't have something like that, don't worry since I think your categories are pretty self-explanatory.

I'll review and may have some questions for you. For instance, I already want to know how you define what gets flagged in ALLOWED_CHARACTERS. Does it work with a user-defined set or something else.

Best,

Arle

On Jul 5, 2012, at 14:52 , Yves Savourel wrote:

>> What intrigues me is that these are, for the most part, automatically
>> checkable. I wonder to what extent finding these errors would
>> correlate with finding other errors that require manual checking.
>
> They are all automatically found.
> This system is not trying to record translation errors found "manually".
>
> Your comment actually helps me to understand better the current details of those data categories. It seems we are talking about different but overlapping things.
>
> I'm talking about a common way to record issues found with automated checking, the way CheckMate, XBench, Trados, WF, and many other tools work.
>
> You seem to be talking about a way for proofers/testers to record the errors they found "manually".
>
>
>> I'm not sure what “The definition of those errors goes along with the
>> item in the note” means in what you wrote.
>
> I meant to say that the definition of what the issue is gets recorded along with the issue. So there is no real need to 'point to a definition'.
> Something similar to:
>
> <its:qaError type='MISSING_TRAILINGWS'>
>   <note>The character ' ' is missing at position 34</note>
> </its:qaError>
>
> Cheers,
> -ys
>
>
> -----Original Message-----
> From: Arle Lommel [mailto:arle.lommel@dfki.de]
> Sent: Thursday, July 05, 2012 2:39 PM
> To: Yves Savourel
> Cc: Felix Sasaki; Phil Ritchie
> Subject: Re: Error categories from Okapi
>
> Hi Yves,
>
> This is actually very useful since these are categories that, in many cases, are not covered by the old LISA QA Model. This actually shows why we need to allow for declaring different kinds of QA categories and referring to them.
>
> What intrigues me is that these are, for the most part, automatically checkable. I wonder to what extent finding these errors would correlate with finding other errors that require manual checking. Obviously the overlap would not be exact, but I do wonder if they correlate at all such that you could use automatic error checking to flag files that might ALSO need manual checking.
>
> One note: I'm not sure what “The definition of those errors goes along with the item in the note” means in what you wrote.
>
> Best,
>
> -Arle
>
> On Jul 5, 2012, at 14:28 , Yves Savourel wrote:
>
>> Hi Arle, (CCing Phil in case he has input)
>>
>>> Do you have a flat list of the error categories checked in Okapi? I
>>> need to start building a list of categories (and definitions) that
>>> could be placed on a website so there are URLs to point to the
>>> categories.
>>
>> Here is the current list of QA item types that we use. That list evolves though.
>>
>> MISSING_TARGETTU,
>> MISSING_TARGETSEG,
>> EXTRA_TARGETSEG,
>> EMPTY_TARGETSEG,
>> EMPTY_SOURCESEG,
>> MISSING_LEADINGWS,
>> MISSINGORDIFF_LEADINGWS,
>> EXTRA_LEADINGWS,
>> EXTRAORDIFF_LEADINGWS,
>> MISSING_TRAILINGWS,
>> MISSINGORDIFF_TRAILINGWS,
>> EXTRA_TRAILINGWS,
>> EXTRAORDIFF_TRAILINGWS,
>> TARGET_SAME_AS_SOURCE,
>> MISSING_CODE,
>> EXTRA_CODE,
>> SUSPECT_CODE,
>> UNEXPECTED_PATTERN,
>> SUSPECT_PATTERN,
>> TARGET_LENGTH,
>> ALLOWED_CHARACTERS,
>> TERMINOLOGY,
>> LANGUAGETOOL_ERROR
>>
>> I'm not sure how useful this would be: The definition of those errors goes along with the item in the note.
>>
>> -ys
>>
>

--
Prof. Dr. Felix Sasaki
Senior Researcher, Language Technology Lab
Manager W3C Germany/Austrian Office
DFKI GmbH, Alt-Moabit 91c, 10559 Berlin, Germany
http://www.dfki.de
phone: +49-30-23895-1807 (fax: -1810)
------------------------------------------------
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Registered office: Trippstadter Strasse 122, D-67663 Kaiserslautern
Management: Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Chairman), Dr. Walter Olthoff
Chairman of the Supervisory Board: Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
Received on Thursday, 5 July 2012 13:27:26 UTC
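
For readers who want to see how two of the checks discussed in this thread could work, here is a minimal Python sketch. It is not the Okapi code (that lives in QualityChecker.java, linked above). It only assumes what the thread states: ALLOWED_CHARACTERS flags an entry containing one or more characters outside a user-defined set, MISSING_TRAILINGWS means white space present at the end of the source is not found in the target, and the description of the issue is recorded in a note alongside the issue type. The names Issue, check_allowed_characters, check_missing_trailing_ws and to_xml are hypothetical, and the its:qaError element simply mirrors the illustration in Yves's message rather than any finalized markup.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Issue:
    """One automatically detected issue (hypothetical record type)."""
    type: str  # issue type name, taken from the list in the thread
    note: str  # description recorded together with the issue


def check_allowed_characters(target, allowed):
    """Flag the entry if it has one or more characters outside `allowed`."""
    bad = sorted({ch for ch in target if ch not in allowed})
    if not bad:
        return []
    listing = " ".join(repr(ch) for ch in bad)
    return [Issue("ALLOWED_CHARACTERS",
                  f"Characters not in the allowed set: {listing}")]


def trailing_ws(text):
    """Return the run of white space at the end of `text` (may be empty)."""
    return text[len(text.rstrip()):]


def check_missing_trailing_ws(source, target):
    """Report trailing white space of the source that the target lacks.

    Assumption: only the number of trailing white space characters is
    compared, not their identity (that would be closer to the
    MISSINGORDIFF_TRAILINGWS type).
    """
    src_ws = trailing_ws(source)
    trg_ws = trailing_ws(target)
    base = len(target) - len(trg_ws)  # index where target white space starts
    issues = []
    for i in range(len(trg_ws), len(src_ws)):
        issues.append(Issue(
            "MISSING_TRAILINGWS",
            f"The character {src_ws[i]!r} is missing at position {base + i}"))
    return issues


def to_xml(issue):
    """Serialize an issue in the shape of the its:qaError illustration."""
    return (f"<its:qaError type={quoteattr(issue.type)}>"
            f"<note>{escape(issue.note)}</note></its:qaError>")


if __name__ == "__main__":
    for issue in check_missing_trailing_ws("Hello ", "Bonjour"):
        print(to_xml(issue))
```

Because each check returns plain records and the serialization is separate, the same records could be written in whatever markup is eventually agreed on. That matches the point made in the thread that the recording format can be agnostic as to how the errors are found.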