Re: [ISSUE 34] Potential problem with high-level quality issues

Hi all,

I think Felix's mail gets at part of the confusion here. So let me try to state the problem as I see it:

The tools that generate and receive the quality data (at least what we are talking about here in terms of issue types) don't actually do anything themselves. Their function is to preserve the data and to present it to the user in an intelligible format. Part of the reason is that, at least in the case of something like CheckMate, the tool doesn't know whether the issue is an error or not; all it is doing is flagging something that may be an issue. For example, if the source starts with six spaces and the target has only three non-breaking spaces instead, it will flag that there is a difference in whitespace, but it has no way to know whether that was an actual error or something done deliberately by a knowledgeable translator. All it can do is raise a hand and politely say "guys, you might want to take a look at this because it looks off."

So the problem with looking for interoperable workflow actions based on the values of these attributes is that they really are just informational in nature. In most cases the tools simply present an issue to the user and allow him/her to take appropriate actions. I think those instances where the tools take action on their own are pretty limited and trivial (something like fixing the order of trailing punctuation).

I think Phil's original example of the HTML preview that highlights segments with problems is actually typical of what one would normally expect from quality categories.

Would something like highlighting purported errors for the user and color-coding them by category be sufficient? Since these categories are ultimately about presenting information to the user for further action, that is actually the correct interpretation of them, and anything more would be problematic in most cases.
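
For example, color coding by category needs nothing more than one CSS rule per top-level value. A minimal sketch, using the same non-prefixed qualitytype attribute as in the larger example below and a few of the category names we have discussed:

<style type="text/css">
   /* Purely presentational: one color per top-level category. */
   [qualitytype=terminology] { background-color: yellow; }
   [qualitytype=grammar]     { background-color: orange; }
   [qualitytype=omission]    { background-color: pink; }
</style>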


To think of this another way, consider the issues flagged by Language Tool. What is the action that a tool should take based on them? The answer really is "it depends". In the case of a tool like CheckMate, simply passing them on and presenting them to the user is the appropriate action. In any case, they will require human intervention and they are, at most, a flag for attention that the human reviewer may or may not decide to act on.
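
Just as a sketch of what "passing them on" could look like in an HTML preview: the tool wraps the flagged span and carries the checker's message along. (qualitytype as in the example below; the qualitycomment attribute is a name I am inventing here for whatever we end up calling the human-readable note.)

<span class="segment" id="s0003" qualitytype="grammar"
      qualitycomment="Possible agreement error: 'were' with a singular subject">The
   telharmonium were an early electromechanical instrument.</span>

The receiving side needs no understanding of grammar at all; it just surfaces the flag and the note for the reviewer to act on.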

In other cases a tool may present quality data to the user, who could decide to do any number of things based on it:

- Fix the error himself (e.g., the translator clearly mistyped a number and it can be fixed)
- Send it back to the translator for fixing
- Send it to another translator for fixing
- Do nothing (the "error" isn't an error, or is so minor it doesn't justify the cost of fixing)
- Reject the entire translation because it is so riddled with errors that it needs to be started over from scratch

What is done is not determined by the tool (although the tool may make suggestions), but rather by the user, so the goal in all of this is to pass the information on to the user in a way that is intelligible. The category mapping is needed to help the tool decide how to present it to the user and what to require of the user. For example, in one scenario a tool may ignore whitespace errors entirely (e.g., they don't matter for the format in question) but insist on having each and every terminology issue checked and either fixed or explicitly acknowledged as being OK.
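
In an HTML preview, even that presentation policy can largely live in the styling layer. A sketch of the scenario just described (the "explicitly acknowledged" bookkeeping would of course need real tool logic, not just CSS):

<style type="text/css">
   /* No rule for qualitytype=whitespace: those issues are
      deliberately not called out in this format. */
   /* Every terminology issue must be reviewed, so make it loud: */
   [qualitytype=terminology] {
      border: 3px solid red;
      background-color: yellow;
   }
</style>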

One possible workflow scenario corresponds roughly to Phil’s example: the tool generates an HTML preview with embedded ITS quality metadata and then uses a styling mechanism to control how that is visualized for the user. For example, you might have something like this:

<!DOCTYPE html>
<html lang="en">
   <head>
      <title>Telharmonium 1897</title>
      <style type="text/css">
         [qualitytype=untranslated] {
            border:5px solid green;
            background-color: red;
         }
      </style>
   </head>
   <body>
      <h1 id="h0001" qualitytype="untranslated">Telharmonium (1897)</h1>
      <p id="p0001">
         <span class="segment" id="s0001">Thaddeus Cahill (1867–1934) conceived of
             an instrument that could transmit its sound from a power plant for
             hundreds of miles to listeners over telegraph wiring.</span>
         <span class="segment" id="s0002">Beginning in 1889 the sound quality of
             regular telephone concerts was very poor on account of the buzzing
             generated by carbon-granule microphones. As a result Cahill decided to
             set a new standard in perfection of sound quality with his instrument,
             a standard that would not only satisfy listeners but that would overcome
             all the flaws of traditional instruments.</span>
      </p>
   </body>
</html>

Which renders nicely, with the flagged heading set off by the green border and red background.

(Note that I used the attribute name without the "its-" prefix, per Felix's email yesterday.)

This example also suggests to me that, per one of Yves' recent emails (which I can't find at the moment), we need to split the attributes used for the general type and the tool-specific type. If we glom them together in one attribute with an internal syntax, we lose the ability to do this sort of CSS-based highlighting and end up needing things like this:

<style type="text/css">
   [qualitytype=whitespace\;okapi\:MISSING_LEADINGWS],
   [qualitytype=whitespace\;okapi\:MISSINGORDIFF_LEADINGWS],
   [qualitytype=whitespace\;okapi\:EXTRA_LEADINGWS],
   [qualitytype=whitespace\;okapi\:EXTRAORDIFF_LEADINGWS],
   [qualitytype=whitespace\;okapi\:MISSING_TRAILINGWS],
   [qualitytype=whitespace\;okapi\:MISSINGORDIFF_TRAILINGWS],
   [qualitytype=whitespace\;okapi\:EXTRAORDIFF_TRAILINGWS]
   {
      border:5px solid green;
      background-color: red;
   }
</style>

This kind of syntax would break the ability to apply styling automatically based only on the top-level categories (which might be desirable if you are using a browser to render ITS 2.0 quality-tagged data without knowledge of the specific tool that produced it). Much better, then, to split the attributes, which would make selection based on the top-level categories much easier.
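
With split attributes, say qualitytype for the top-level category plus a second attribute for the tool-specific code (I'll call it qualitytooltype here purely for illustration), both the markup and the selector become trivial:

<h1 id="h0001" qualitytype="whitespace"
    qualitytooltype="okapi:MISSING_LEADINGWS">Telharmonium (1897)</h1>

<style type="text/css">
   /* Matches all seven Okapi whitespace codes (and any future
      ones) without knowing anything about the source tool. */
   [qualitytype=whitespace] {
      border:5px solid green;
      background-color: red;
   }
</style>

A tool that does care about the tool-specific code can still select on qualitytooltype directly.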

Felix, based on my explanation, would it be enough for an implementation to say that a browser or tool simply displays the different issues to the user in a visually distinct manner? If so, the bar for implementation isn't so high and it would meet typical user needs quite well.

Best,

-Arle

On Aug 2, 2012, at 7:23, Felix Sasaki <fsasaki@w3.org> wrote:

> 2012/8/1 Yves Savourel <ysavourel@enlaso.com>
> Ok, sorry I missed the distinction in Arle’s note and read your email too fast.
> 
> So this is a requirement that we put upon ourselves.
> 
> Yes.
>  
> 
> > The test cases must be more robust than simply seeing
> > that a tool identifies an issue and passes it on:
> > we also need to see that they do this consistently with
> > each other, which is hard since the sets of issues
> > from the various tools only partially overlap.
> 
> I’m not sure I get "we also need to see that they do this consistently with each other". Each tool has its own set of issues. The only exchange between tools is when tool A generates a list of QA notes and those are then read into tool B, which does something with them.
> 
> My point is just: what useful thing can a tool do when all it knows is that something is e.g. a grammar error? See the workflow I tried to explain at
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0032.html
>  
> 
> The interoperability I can see is that, for example, when tool A and B filter the same list of qa notes on the 'omission' type we get the same sub-list.
> 
> If you mean that we must make sure that tool A maps the issues that we see as omissions to the 'omission' top-level type, that seems to be out of our purview. Or am I missing something?
> 
> I am probably asking for mapping in the sense of
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0032.html
> 
> For other data categories, we have a small set of allowed values like "yes" or "no". So even if we don't test that tools do the same stuff with these values, the value set is so small that the interpretation becomes very clear. I just don't understand what useful and testable thing (one or two) tools can do with high-level information like "this is a grammar error". Maybe you or others can draft an example, filling 1-4 at
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0032.html
> in? That would help me a lot.
> 
> Best, 
> 
> Felix 
>  
> 
> Cheers,
> -ys
> 
> 
> 
> 
> From: Felix Sasaki [mailto:fsasaki@w3.org]
> Sent: Wednesday, August 01, 2012 7:07 PM
> To: Yves Savourel
> Cc: Arle Lommel; Multilingual Web LT Public List
> Subject: Re: [ISSUE 34] Potential problem with high-level quality issues
> 
> 
> 2012/8/1 Yves Savourel <ysavourel@enlaso.com>
> I’m not sure I completely understand the requirement. For each value we need two applications that use it?
> 
> Did we have such requirement for 1.0?
> 
> No, we didn't, since - see below - the number of values was very small and easy to understand.
> 
> With the need (more on that later) to convince people working on the content production side of the usefulness of our metadata, I think we have a higher bar than for locNoteType.
> 
> Best,
> 
> Felix
> 
> 
> For example we have a locNoteType with ‘alert’ or ‘description’. Do we have two applications that generate those two values?
> 
> Just wondering.
> -ys
> 
> From: Felix Sasaki [mailto:fsasaki@w3.org]
> Sent: Wednesday, August 01, 2012 5:22 PM
> To: Arle Lommel
> Cc: Multilingual Web LT Public List
> 
> Subject: Re: [ISSUE 34] Potential problem with high-level quality issues
> 
> Hi Arle, all,
> 
> let me just add that for other data categories, we have only a small set of predefined values - e.g. for "Translate" only "yes" or "no", or for localization note type "alert" or "description". Also, these values are distinct - you have either "yes" or "no", so there is no danger of doing the wrong thing when an application produces or consumes the values. Finally, the categorization of an error seems to be difficult, with so many categories being proposed.
> 
> This situation led me to the thinking that we should set a high bar for the normative values - otherwise there won't be any interoperability of what implementations produce or consume, as Arle described. I don't see a clear way out, and I'm looking very much forward to feedback from implementors - Yves, Phil etc.
> 
> Best,
> 
> Felix
> 
> 2012/8/1 Arle Lommel <arle.lommel@dfki.de>
> Hello all,
> 
> I was discussing the high-level quality issues with Felix this morning and we have an issue. If they are to be normative, then we will need to find at least two interoperable implementations for each value, not just for the mechanism as a whole, and to test those implementations against test cases. While that would not be hard for some like terminology, it would be difficult for others like legal, because, while they are used in metrics, they are not particularly embedded in tools that would produce or consume ITS 2.0 markup.
> 
> One solution is to put the issue names in an informative annex and very strongly recommend that they be used. That approach is, I realize, unlikely to satisfy Yves, for good reason: if we cannot know what values are allowed in that slot, then we cannot reliably expect interoperability. At the same time, if we only go with those values for which we can find two or more interoperable implementations, that list of 26 issues will probably become something like six or eight, thereby leaving future tools that might address the other issues out in the cold.
> 
> I have to confess that I do not see a solution to this issue right now since we really need the values to be normative, but if we cannot test them in fairly short order they cannot be normative. The test cases must be more robust than simply seeing that a tool identifies an issue and passes it on: we also need to see that they do this consistently with each other, which is hard since the sets of issues from the various tools only partially overlap.
> 
> If anyone has any brilliant ideas on how to solve the issue, please feel free to chime in. We're still working on this and hope to find a way to move forward with normative values.
> 
> Best,
> 
> Arle
> 
> 
> 
> 
> --
> Felix Sasaki
> DFKI / W3C Fellow
> 
> 
> 
> 
> 
