- From: Phil Ritchie <philr@vistatec.ie>
- Date: Thu, 2 Aug 2012 10:46:56 +0100
- To: Arle Lommel <arle.lommel@dfki.de>
- Cc: Multilingual Web LT Public List <public-multilingualweb-lt@w3.org>
- Message-ID: <OF67EA5670.60EC5586-ON80257A4E.003552B8-80257A4E.0035BC63@vistatec.ie>
Phil.
From: Arle Lommel <arle.lommel@dfki.de>
To: Multilingual Web LT Public List
<public-multilingualweb-lt@w3.org>,
Date: 02/08/2012 09:09
Subject: Re: [ISSUE 34] Potential problem with high-level quality
issues
Hi all,
I think Felix's mail gets at part of the confusion here. So let me try to
state the problem as I see it:
The tools that generate and receive the quality data (at least what we are
talking about here in terms of issue types) don't actually do anything
themselves. Their function is to preserve the data and to present it to
the user in an intelligible format. <pr>Exactly!</pr> Part of the reason
is that, at least in the case of something like CheckMate, the tool
doesn't know whether the issue is an error or not; all it is doing is
flagging something that may be an issue. For example, if the source starts
with six spaces and the target has only three non-breaking spaces instead,
it will flag that there is a difference in white-space, but it doesn't
have a way to know whether that was an actual error or something done
deliberately by a knowledgeable translator. All it can do is raise a hand
and politely say "guys, you might want to take a look at this because it
looks off."
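A minimal Python sketch of this flag-but-don't-fix behaviour. The helper names and the shape of the issue record are assumptions made for illustration; this is not CheckMate's actual API. The check only reports that the leading whitespace differs and leaves the judgement to the reviewer.

```python
from typing import Optional


def leading_ws(text: str) -> str:
    """Return the run of whitespace at the start of text (NBSP counts)."""
    return text[: len(text) - len(text.lstrip())]


def check_leading_whitespace(source: str, target: str) -> Optional[dict]:
    """Flag a difference in leading whitespace; never alter the target."""
    src_ws, tgt_ws = leading_ws(source), leading_ws(target)
    if src_ws == tgt_ws:
        return None
    return {
        "type": "whitespace",  # assumed top-level category name
        "message": f"leading whitespace differs ({len(src_ws)} vs {len(tgt_ws)} characters)",
        "needs_review": True,  # a human decides whether this is an error
    }


# Six spaces in the source, three non-breaking spaces in the target.
issue = check_leading_whitespace("      Hello", "\u00a0\u00a0\u00a0Hallo")
print(issue)
```

The function returns a record rather than a corrected string, which mirrors the point above: the tool cannot know whether the difference was deliberate.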
So the problem with looking for interoperable workflow actions based on
the values of these attributes is that they really are just informational
in nature. In most cases the tools simply present an issue to the user and
allow him/her to take appropriate actions. I think those instances where
the tools take action on their own are pretty limited and trivial
(something like fixing the order of trailing punctuation).
I think Phil's original example of the HTML preview that highlights
segments with problems is actually typical of what one would normally
expect from quality categories.
Would something like highlighting purported errors for the user and color
coding them be sufficient? Since ultimately these categories are about
presenting the information to the user for further action, that actually
is the correct interpretation of the categories and anything more would be
problematic in most cases. <pr>See my previous comment about metrics. We
have automation which takes actions upon rolled up data, not each specific
error instance.</pr>
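One way to picture automation that acts on rolled-up data rather than on individual error instances is sketched below in Python. The threshold, the action names and the record layout are hypothetical and are not taken from any tool discussed in this thread.

```python
from collections import Counter


def roll_up(issues):
    """Count flagged issues per top-level type."""
    return Counter(issue["type"] for issue in issues)


def decide(counts, word_count, max_per_1000_words=5.0):
    """Pick a workflow action from the aggregate issue rate (hypothetical rule)."""
    rate = sum(counts.values()) * 1000 / word_count
    return "send_for_review" if rate > max_per_1000_words else "accept"


issues = [{"type": "terminology"}, {"type": "whitespace"}, {"type": "terminology"}]
counts = roll_up(issues)
print(counts["terminology"], decide(counts, word_count=200))
```

No single issue triggers the action here; only the aggregate rate does.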
To think of this another way, consider the issues flagged by Language
Tool. What is the action that a tool should take based on them? The answer
really is "it depends". In the case of a tool like CheckMate, simply
passing them on and presenting them to the user is the appropriate action.
In any case, they will require human intervention and they are, at most, a
flag for attention that the human reviewer may or may not decide to act
on.
In other cases a tool may present quality data to the user who could
decide to do any number of things based on it:
- Fix the error himself (e.g., the translator clearly mistyped a number and
  it can be fixed).
- Send it back to the translator for fixing.
- Send it to another translator for fixing.
- Do nothing (the "error" isn't an error or is so minor it doesn't justify
  the cost of fixing).
- Reject the entire translation because it is so ridden with errors that
  it needs to be started over from scratch.
What is done is not determined by the tool (although the tool may make
suggestions), but rather by the user, so the goal in all of this is to
pass the information on to the user in a way that is intelligible. The
category mapping is needed to help the tool decide how to present it to
the user and what to require of the user. For example, in one scenario a
tool may ignore whitespace errors entirely (e.g., they don't matter for
the format in question) but insist on having each and every terminology
issue checked and either fixed or explicitly acknowledged as being OK.
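The scenario just described could be sketched as a small policy table keyed on the top-level category. This is only an illustration: the policy names (`ignore`, `require_acknowledgement`, `display`) and the issue records are assumptions, not part of ITS 2.0 or of any tool mentioned here.

```python
# Hypothetical per-category presentation policy for one workflow.
POLICY = {
    "whitespace": "ignore",                    # irrelevant for this format
    "terminology": "require_acknowledgement",  # must be fixed or signed off
}
DEFAULT_ACTION = "display"  # anything else is simply shown to the user


def plan_presentation(issues, policy=POLICY):
    """Return (issue id, action) pairs for every issue the user should see."""
    plan = []
    for issue in issues:
        action = policy.get(issue["type"], DEFAULT_ACTION)
        if action != "ignore":
            plan.append((issue["id"], action))
    return plan


issues = [
    {"id": "s0001", "type": "whitespace"},
    {"id": "s0002", "type": "terminology"},
    {"id": "s0003", "type": "grammar"},
]
plan = plan_presentation(issues)
print(plan)
```

The tool only decides how to present each issue and what to require of the user; the decision about the issue itself stays with the user.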
One possible workflow scenario corresponds roughly to Phil’s example: the
tool generates an HTML preview with embedded ITS quality metadata and then
uses a styling mechanism to control how that is visualized for the user.
For example, you might have something like this:
<!DOCTYPE html>
<html lang="en">
<head>
  <title>Telharmonium 1897</title>
  <style type="text/css">
    [qualitytype=untranslated] {
      border: 5px solid green;
      background-color: red;
    }
  </style>
</head>
<body>
  <h1 id="h0001" qualitytype="untranslated">Telharmonium (1897)</h1>
  <p id="p0001">
    <span class="segment" id="s0001">Thaddeus Cahill (1867–1934) conceived of
      an instrument that could transmit its sound from a power plant for
      hundreds of miles to listeners over telegraph wiring.</span>
    <span class="segment" id="s0002">Beginning in 1889 the sound quality of
      regular telephone concerts was very poor on account of the buzzing
      generated by carbon-granule microphones. As a result Cahill decided to
      set a new standard in perfection of sound quality with his instrument,
      a standard that would not only satisfy listeners but that would overcome
      all the flaws of traditional instruments.</span>
  </p>
</body>
</html>
Which renders nicely as:
[rendered preview: see the image/png attachment]
(Note that I used the non "its-" prefixed attribute name per Felix's email
yesterday).
This example also suggests to me that, per one of Yves' recent emails
(which I don't find at the moment), we need to split the attributes used
for the general type and the tool-specific type. If we glom them together
in one attribute with an internal syntax, we lose the ability to do this
sort of CSS-based highlighting and then need things like this:
<style type="text/css">
[qualitytype=whitespace\;okapi\:MISSING_LEADINGWS],
[qualitytype=whitespace\;okapi\:MISSINGORDIFF_LEADINGWS],
[qualitytype=whitespace\;okapi\:EXTRA_LEADINGWS],
[qualitytype=whitespace\;okapi\:EXTRAORDIFF_LEADINGWS],
[qualitytype=whitespace\;okapi\:MISSING_TRAILINGWS],
[qualitytype=whitespace\;okapi\:MISSINGORDIFF_TRAILINGWS],
[qualitytype=whitespace\;okapi\:EXTRAORDIFF_TRAILINGWS]
{
border:5px solid green;
background-color: red;
}
</style>
This kind of syntax would break the ability to automatically apply styling
based only on the top-level categories (which might be desirable if you
are using a browser to render ITS 2.0 quality-tagged data without
knowledge of the specific tool that is the source). Much better then to
split the attributes, which would make selection based on the top-level
categories much easier.
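The difference can be sketched in Python by modelling elements as attribute dictionaries and applying the same exact-match test that a selector like [qualitytype=whitespace] performs. The second attribute name, `qualitytooltype`, is purely hypothetical and stands in for whatever name a split design would use.

```python
def exact_match(elements, attr, value):
    """Select elements whose attribute equals value exactly."""
    return [e["id"] for e in elements if e.get(attr) == value]


# Combined form: top-level type and tool-specific code share one attribute.
combined = [
    {"id": "s1", "qualitytype": "whitespace;okapi:MISSING_LEADINGWS"},
    {"id": "s2", "qualitytype": "whitespace;okapi:EXTRA_LEADINGWS"},
]

# Split form: the tool-specific code lives in its own (hypothetical) attribute.
split = [
    {"id": "s1", "qualitytype": "whitespace", "qualitytooltype": "okapi:MISSING_LEADINGWS"},
    {"id": "s2", "qualitytype": "whitespace", "qualitytooltype": "okapi:EXTRA_LEADINGWS"},
]

print(exact_match(combined, "qualitytype", "whitespace"))  # nothing matches
print(exact_match(split, "qualitytype", "whitespace"))     # both segments match
```

With the combined form, a consumer has to enumerate every tool-specific value or understand the internal syntax; with the split form, a match on the top-level category alone is enough.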
Felix, based on my explanation, would it be enough for an implementation
to say that a browser or tool simply displays the different issues to the
user in a visually distinct manner? If so, the bar for implementation
isn't so high and it would meet typical user needs quite well.
<pr>Great mail Arle. All the things I wanted to say but done more
eloquently!</pr>
Best,
-Arle
On Aug 2, 2012, at 7:23, Felix Sasaki <fsasaki@w3.org> wrote:
2012/8/1 Yves Savourel <ysavourel@enlaso.com>
Ok, sorry I missed the distinction in Arle’s note and read your email too
fast.
So this is a requirement that we put upon ourselves.
Yes.
> The test cases must be more robust than simply seeing
> that a tool identifies an issue and passes it on:
> we also need to see that they do this consistently with
> each other, which is hard since the set of issues
> from the various tools only partially overlap.
I’m not sure I get "we also need to see that they do this consistently
with each other". Each tool has its own set of issues. The only exchange
part between tools is when a tool A generates a list of qa notes and those
are then read into a tool B which does something with them.
My point is just: what useful thing can a tool do when all it knows is
that something is e.g. a grammar error? See the workflow I tried to
explain at
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0032.html
The interoperability I can see is that, for example, when tool A and B
filter the same list of qa notes on the 'omission' type we get the same
sub-list.
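As a rough Python illustration of that kind of interoperability check, the two functions below stand in for independently written tools A and B; both the functions and the note records are made up for the example.

```python
def tool_a_filter(notes, issue_type):
    """Tool A: list comprehension over the qa notes."""
    return [n for n in notes if n["type"] == issue_type]


def tool_b_filter(notes, issue_type):
    """Tool B: an independent implementation using filter()."""
    return list(filter(lambda n: n["type"] == issue_type, notes))


notes = [
    {"id": 1, "type": "omission"},
    {"id": 2, "type": "grammar"},
    {"id": 3, "type": "omission"},
]

# Interoperable in the sense described: same input, same sub-list.
same = tool_a_filter(notes, "omission") == tool_b_filter(notes, "omission")
print(same)
```

This tests only that the two tools agree on the type values, not on how each tool maps its own checks onto them.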
If you mean that we must make sure that tool A maps the issues that we see
as omissions to the 'omission' top-level type, that seems to be out of
our purview. Or am I missing something?
I am probably asking for mapping in the sense of
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0032.html
For other data categories, we have a small set of allowed values like
"yes" or "no". So even if we don't test that tools do the same stuff with
these values, the value set is so small that the interpretation becomes
very clear. I just don't understand what useful and testable thing (one or
two) tools can do with high-level information like "this is a grammar
error". Maybe you or others can draft an example, filling 1-4 at
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0032.html
in? That would help me a lot.
Best,
Felix
Cheers,
-ys
From: Felix Sasaki [mailto:fsasaki@w3.org]
Sent: Wednesday, August 01, 2012 7:07 PM
To: Yves Savourel
Cc: Arle Lommel; Multilingual Web LT Public List
Subject: Re: [ISSUE 34] Potential problem with high-level quality issues
2012/8/1 Yves Savourel <ysavourel@enlaso.com>
I’m not sure I completely understand the requirement. For each value we
need two applications that use it?
Did we have such a requirement for 1.0?
No, we didn't, since - see below - the number of values was very small and
easy to understand.
With the need (more on that later) to convince people working on the
content production side of the usefulness of our metadata, I think we have
a higher bar than for locNoteType.
Best,
Felix
For example we have a locNoteType with ‘alert’ or ‘description’. Do we
have two applications that generate those two values?
Just wondering.
-ys
From: Felix Sasaki [mailto:fsasaki@w3.org]
Sent: Wednesday, August 01, 2012 5:22 PM
To: Arle Lommel
Cc: Multilingual Web LT Public List
Subject: Re: [ISSUE 34] Potential problem with high-level quality issues
Hi Arle, all,
let me just add that for other data categories, we have only a small set of
predefined values - e.g. for "Translate" only "yes" or "no", or for
localization note type "alert" or "description". Also, these values are
distinct - you have either "yes" or "no", so there is no danger of doing
the wrong thing when an application produces or consumes the values.
Finally, the categorization of an error seems to be difficult, with so
many categories being proposed.
This situation led me to think that we should set a high bar for
the normative values - otherwise there won't be any interoperability of
what implementations produce or consume, as Arle described. I don't see a
clear way out, and I'm looking very much forward to feedback from
implementors - Yves, Phil etc.
Best,
Felix
2012/8/1 Arle Lommel <arle.lommel@dfki.de>
Hello all,
I was discussing the high-level quality issues with Felix this morning and
we have an issue. If they are to be normative, then we will need to find
at least two interoperable implementations for each value, not just for
the mechanism as a whole, and to test those implementations against test
cases. While that would not be hard for some like terminology, it would be
difficult for others like legal, because, while they are used in metrics,
they are not particularly embedded in tools that would produce or consume
ITS 2.0 markup.
One solution is to put the issue names in an informative annex and very
strongly recommend that they be used. That approach is, I realize,
unlikely to satisfy Yves, for good reason: if we cannot know what values
are allowed in that slot, then we cannot reliably expect interoperability.
At the same time, if we only go with those values for which we can find
two or more interoperable implementations, that list of 26 issues will
probably become something like six or eight, thereby leaving future tools
that might address the other issues out in the cold.
I have to confess that I do not see a solution to this issue right now
since we really need the values to be normative but if we cannot test them
in fairly short order they cannot be normative. The test cases must be
more robust than simply seeing that a tool identifies an issue and passes
it on: we also need to see that they do this consistently with each other,
which is hard since the set of issues from the various tools only
partially overlap.
If anyone has any brilliant ideas on how to solve the issue, please feel
free to chime in. We're still working on this and hope to find a way to
move forward with normative values.
Best,
Arle
--
Felix Sasaki
DFKI / W3C Fellow
Attachments
- image/png attachment: 01-part
Received on Thursday, 2 August 2012 09:47:32 UTC