RE: Tool info specification (Re: action-221 summary of overriding discussion) from Yves Savourel on 2012-09-21 (public-multilingualweb-lt@w3.org from September 2012)

From: Yves Savourel <ysavourel@enlaso.com>
Date: Fri, 21 Sep 2012 07:11:09 -0600
To: "'Felix Sasaki'" <fsasaki@w3.org>, <public-multilingualweb-lt@w3.org>
Message-ID: <assp.0611c95844.assp.06119bc8a5.003f01cd97fa$92cd5200$b867f600$@com>
Thanks for the example Felix,

> ... All tool specifications allow for identifying the relevant 
> data categories. In that way it becomes explicit that e.g. a 
> certain MT tool is relevant for mt-confidence.
>
> the tool specifications have "id" attributes, e.g. "t-2" for "bing" translator.
> Yves' requirement of referring to tool info from a piece of XLIFF could be 
> realized by referring to the ID attribute.

How exactly is the relationship between the local data category markup and the tool expressed?

It seems you are saying: the ITS way is to look at the itsDataCategoryIdentifer element in the tool info.
That's clumsy IMO, but it does indeed keep any tool-specific data off the data category side.

But the case for several tools used for the same data category is not really catered for.
When you say "referring to tool info from a piece of XLIFF could be realized by referring to the ID attribute" who is defining the attribute that does the referring? ITS or XLIFF?

If it's XLIFF, then I disagree: I think the ITS mechanism must have provision for both cases. (Actually I even think the MT case would tend to favor that multi-tool case: knowing which tool produced a given MT is probably more relevant when you have several candidates).

Having such a provision probably means some kind of tool-ref attribute in each data category that uses the tool information.
Which means it probably needs to be specified for each local occurrence, over and over again.
We're back to square one, admittedly now with only one attribute referring to the tool info rather than with all the tool info... I suppose that's progress :)
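
A minimal sketch of that idea, assuming a per-occurrence reference to a shared tool block. The element names and the tool-ref attribute are hypothetical and un-namespaced for brevity; only the "t-2" id for the Bing translator comes from Felix's example. Real ITS or XLIFF markup would be namespaced and may look quite different.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: all names are illustrative assumptions,
# except the id "t-2" for "bing", which is taken from Felix's example.
DOC = """
<doc>
  <tools>
    <tool id="t-1" name="other-mt"/>
    <tool id="t-2" name="bing"/>
  </tools>
  <entry>
    <match mtConfidenceScore="0.82" tool-ref="t-2">candidate A</match>
    <match mtConfidenceScore="0.67" tool-ref="t-1">candidate B</match>
  </entry>
</doc>
"""


def resolve_tools(xml_text):
    """Return (text, score, tool name) for each match, resolving tool-ref by id."""
    root = ET.fromstring(xml_text)
    # Index the shared tool block once, keyed by its id attribute.
    tools = {tool.get("id"): tool.get("name") for tool in root.iter("tool")}
    results = []
    for match in root.iter("match"):
        score = float(match.get("mtConfidenceScore"))
        # A missing or unknown reference resolves to None rather than failing.
        tool_name = tools.get(match.get("tool-ref"))
        results.append((match.text, score, tool_name))
    return results


if __name__ == "__main__":
    for text, score, tool in resolve_tools(DOC):
        print(text, score, tool)
```

Each match carries one short reference, and the full tool description lives in one place, which is the trade-off described above.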

-ys








From: Felix Sasaki [mailto:fsasaki@w3.org] 
Sent: Friday, September 21, 2012 2:28 AM
To: public-multilingualweb-lt@w3.org
Subject: Tool info specification (Re: action-221 summary of overriding discussion)

Hi all again,

to give a concrete example for the envisaged solution with a separate tool block, I have created examples.

The attached zip file contains
- tool-info.xsd: OLIF XML Schema fragment for specifying tool information
- tool-info-its.xsd: a modification of tool-info.xsd. To be able to make use of an existing tool description, I renamed elements so that they are not specific to terminology extraction, left out various elements, and moved the "WorkflowProcessInfo" to be part of the tool info.
- xml.xsd: needed for XML attributes
- an example file tool-its-info-example.xml which contains tool specifications for Enrycher, two MT tools and Okapi. All tool specifications allow for identifying the relevant data categories. In that way it becomes explicit that e.g. a certain MT tool is relevant for mt-confidence.
- the tool specifications have "id" attributes, e.g. "t-2" for "bing" translator. Yves' requirement of referring to tool info from a piece of XLIFF could be realized by referring to the ID attribute.
- If we want to refer to this from HTML inline, we could have again an its-* block with span element, e.g. <span its-tool-name="enrycher">...</span>.

Comments welcome. Above can also serve as a way to do action-194 "Work on issue-42, provide examples and template for various data categories".

Best,

Felix 
2012/9/21 Felix Sasaki <fsasaki@w3.org>
One correction to the below: 


"The question is maybe: do you need XPath to specify that in XLIFF, and would it be OK for this information to be orthogonal? If not, there is no need to influence selection precedence."

This should have been "If this is the case, there is no need to influence selection precedence."

Also, apologies for repeating the "orthogonal proposal" so often :)

Best,

Felix

2012/9/21 Felix Sasaki <fsasaki@w3.org>
Hi Yves, Dave,

thanks for your feedback. We are probably stuck, since I still won't agree with partial overriding and also not with combining data categories. A few comments below. 
2012/9/21 Yves Savourel <ysavourel@enlaso.com>
Hi Felix, all,

The solution of some tool information being declared outside the usual selector mechanism may work for one set of instances of a given data category, but not several. For example:

In an XLIFF document I'd like to use the its:mtConfidenceScore attribute for each <m:match> element that holds a translation candidate for an entry (this is a likely real-life case, not just a random example). You can have multiple matches per entry, and they are very likely to be from different engines. Having document-level tool information does not work. In this case we do need to have the tool information per entry.

Tool information is really orthogonal to data categories IMO - for the content itself (like in MT) and for ITS annotations you may want to express: who produced this? For ITS annotations, this is urgently needed for disambiguation. But actually each data category can be produced by a tool, and it would be useful to capture that information in an orthogonal manner IMO.

The question is maybe: do you need XPath to specify that in XLIFF, and would it be OK for this information to be orthogonal? If not, there is no need to influence selection precedence.
 


Looking again at the "partial override" solution, I don't think pointers are a problem. They just tell where to get the information to apply to the node; as far as overriding goes, it's no different from setting the information directly.


We can continue the discussion on partial override, but I would suggest stopping it. I can "promise" - as said before - that I would (formally) object to this, and this is very unlikely to change. The backwards compatibility, the ambiguity wrt the intention of the data category author (e.g. "is the 'alert' type of a locnote intended or not?"), and esp. the constraints about pieces of information are an issue. Such constraints are a different beast than pointers or standoff markup. By constraints I mean what we say e.g. with loc quality issues: "exactly one of the following, none or one of the following", or in other areas we say "optionally". With these mutually exclusive and other options the complexity rises: if there are two mutually exclusive items, one at a node and one inherited, which one takes precedence? Sure there can be answers, but these are data category specific and much more complex than "if a value doesn't exist on a node take the inherited one".
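
For reference, the simple rule quoted at the end of that paragraph ("if a value doesn't exist on a node take the inherited one") can be sketched as below. This is only an illustration under stated assumptions: the Node class and its loc_note field are hypothetical, and the data category is treated as one indivisible value, so no partial override is involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical document node carrying at most one value for a data category."""
    name: str
    parent: Optional["Node"] = None
    loc_note: Optional[str] = None  # assumed single, indivisible value


def effective_value(node: Optional[Node]) -> Optional[str]:
    """Return the node's own value, or the nearest ancestor's value if it has none."""
    current = node
    while current is not None:
        if current.loc_note is not None:
            return current.loc_note
        current = current.parent
    return None  # nothing set on the node or on any ancestor


if __name__ == "__main__":
    root = Node("doc", loc_note="general note")
    para = Node("p", parent=root)
    span = Node("span", parent=para, loc_note="specific note")
    print(effective_value(para))
    print(effective_value(span))
```

The lookup never has to merge fields from different levels, which is what keeps it independent of any particular data category.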

With partial inheritance I would need to re-engineer my implementation, and very likely I wouldn't do that but rather drop the implementation completely. Even if that is no theoretical issue as you had mentioned in a mail before, I think it is a valid concern.

So my proposal to move forward would be to re-iterate the orthogonal character of tool information: it seems tool information is really the only case where partial overriding or the data category combination (see comment below) has a strong case.

Wrt to Dave's proposal from 
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Sep/0124.html

[
However, if we relax this assumption in a controlled way we can simply avoid partial override by designing certain data categories to be used in _combination_ (the subtle difference to a single data category being 'compound'). In this event we can then either just live with the fact that there may be one or two data categories that may impart conformance individually even though they are not useful by themselves, or we add them as a specific exclusion to the single-data-category-for-conformance rules.
]
 
In my experience, if you introduce a feature once (combination of data categories), people will develop use cases over time to have it in other areas too. So that will influence conformance a lot and in essence destroy a basic design principle from ITS 1.0: "It adopts the use of data categories to define *discrete* units of functionality".

So we are probably stuck.

My proposal to move forward would be to see if we agree on the orthogonal character of the tool information and define it for all data categories, as a separate piece of information and with the effect of a separate conformance clause, but with no effect on selection mechanisms. If we have that agreement we can explore how to accommodate Yves' requirement to attach the information not to a whole document, but to parts of it. Maybe even XPath is not needed for that. At least that is what Declan and Tadej said for mtConfidence and disambiguation on the call.

Best,

Felix



The case of the stand-off markup is specific (so far) to Localization Quality Issue. I haven't thought yet about what that implies for the "partial override" but it's likely that there are ways to specify what is done in those cases.
The bottom line is that all those local/global/standoff attributes specify information and are applied in a given order: we've got to be able to know whether the information ABC exists or not when we apply the next rule, and therefore be able to keep the current value or override it depending on whether the next rule redefines that information or not.
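
The ordered application described here could be sketched as follows. This only illustrates the "partial override" position under discussion, not agreed ITS behaviour; the field names locNote and locNoteType are assumptions chosen to mirror the 'alert' locnote example raised in the thread.

```python
def apply_rules(rules):
    """Apply rules in order; a later rule overrides only the fields it defines."""
    info = {}
    for rule in rules:
        # Fields the rule does not mention keep their current value.
        info.update(rule)
    return info


if __name__ == "__main__":
    # Hypothetical rules: a global one, then a local one that redefines
    # the note text but says nothing about the note type.
    global_rule = {"locNote": "Check term", "locNoteType": "alert"}
    local_rule = {"locNote": "Keep as is"}
    print(apply_rules([global_rule, local_rule]))
```

In this sketch the local note ends up with the type "alert" carried over from the global rule, which is exactly the situation where the author's intention becomes ambiguous.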

Cheers,
-ys







-- 
Felix Sasaki
DFKI / W3C Fellow





Received on Friday, 21 September 2012 13:11:49 UTC