Tool info specification (Re: action-221 summary of overriding discussion)

Hi all again,

to give a concrete example of the envisaged solution with a separate tool
block, I have created some examples.

The attached zip file contains
- tool-info.xsd: OLIF XML Schema fragment for specifying tool information
- tool-info-its.xsd: a modification of tool-info.xsd. To be able to make
use of existing tool descriptions, I renamed elements so that they are not
specific to terminology extraction, left out various elements, and moved
"WorkflowProcessInfo" into the tool info.
- xml.xsd: needed for XML attributes
- an example file tool-its-info-example.xml which contains tool
specifications for Enrycher, two MT tools and Okapi. All tool
specifications allow for identifying the relevant data categories. In that
way it becomes explicit that e.g. a certain MT tool is relevant for
mt-confidence.
- the tool specifications have "id" attributes, e.g. "t-2" for the "bing"
translator. Yves' requirement of referring to tool info from a piece of
XLIFF could be realized by referring to the ID attribute.
- If we want to refer to this from HTML inline, we could again have an
its-* attribute on a span element, e.g. <span
its-tool-name="enrycher">...</span>.
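
As a rough illustration of the ID-based referencing described above, here is
a minimal Python sketch. The element and attribute names (toolInfoList,
toolInfo, dataCategories), the id "t-1" and the data category given for
Enrycher are assumptions made for this example only; they are not taken from
tool-info-its.xsd or from the example file. Only "t-2" for "bing" and its
relevance for mt-confidence come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical tool block. Element and attribute names are assumptions,
# not the names defined in tool-info-its.xsd. Only id "t-2" for "bing"
# and its link to mt-confidence come from the mail itself.
TOOL_BLOCK = """
<toolInfoList>
  <toolInfo id="t-1" name="enrycher" dataCategories="disambiguation"/>
  <toolInfo id="t-2" name="bing" dataCategories="mt-confidence"/>
</toolInfoList>
"""


def index_tools(xml_text):
    """Map each tool id to its name and the data categories it is relevant for."""
    root = ET.fromstring(xml_text)
    tools = {}
    for tool in root.iter("toolInfo"):
        tools[tool.get("id")] = {
            "name": tool.get("name"),
            "data_categories": tool.get("dataCategories", "").split(),
        }
    return tools


def resolve(tools, tool_ref):
    """Return the tool record for a reference, or None if the id is unknown."""
    return tools.get(tool_ref)


tools = index_tools(TOOL_BLOCK)
# A piece of XLIFF (or any other content) would only carry the reference "t-2".
match_tool = resolve(tools, "t-2")
```

The point of the sketch is that the lookup is independent of any selection
mechanism: a consumer only needs the ID to find out which tool produced a
piece of content and for which data categories that tool is relevant.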

Comments welcome. The above can also serve as a way to do action-194 "Work on
issue-42, provide examples and template for various data categories".

Best,

Felix

2012/9/21 Felix Sasaki <fsasaki@w3.org>

> One correction to the below:
>
>
> "The question is maybe: do you need XPath to specify that in XLIFF, and
> would it be OK for this information to be orthogonal? If not, there is no
> need to influence selection precedence."
>
> This should have been "If this is the case, there is no need to influence
> selection precedence."
>
> Also, apologies for repeating the "orthogonal proposal" so often :)
>
> Best,
>
> Felix
>
>
> 2012/9/21 Felix Sasaki <fsasaki@w3.org>
>
>> Hi Yves, Dave,
>>
>> thanks for your feedback. We are probably stuck, since I still won't
>> agree with partial overriding and also not with combining data categories.
>> A few comments below.
>>
>>  2012/9/21 Yves Savourel <ysavourel@enlaso.com>
>>
>>> Hi Felix, all,
>>>
>>> The solution of some tool information being declared outside the usual
>>> selector mechanism may work for one set of instances of a given data
>>> category, but not several. For example:
>>>
>>> In an XLIFF document I'd like to use the its:mtConfidenceScore attribute
>>> for each <m:match> element that holds a translation candidate for an entry
>>> (This is a likely real-life case, not just a random example). You can have
>>> multiple matches per entry, and they are very likely to be from different
>>> engines. Having document-level tool information does not work. In this
>>> case we do need to have the tool information per entry.
>>>
>>
>> Tool information is really orthogonal to data categories IMO - for the
>> content itself (like in MT) and for ITS annotations you may want to
>> express: who produced this? For ITS annotations, this is urgently needed
>> for disambiguation. But actually each data category can be produced by a
>> tool, and it would be useful to capture that information in an orthogonal
>> manner IMO.
>>
>> The question is maybe: do you need XPath to specify that in XLIFF, and
>> would it be OK for this information to be orthogonal? If not, there is no
>> need to influence selection precedence.
>>
>>
>>>
>>>
>>> Looking again at the "partial override" solution, I don't think pointers
>>> are a problem. They just tell where to get the information to apply to the
>>> node; as far as overriding goes, it's no different from setting the
>>> information directly.
>>>
>>
>>
>> We can continue the discussion on partial override, but I would suggest
>> stopping it. I can "promise" - as said before - that I would (formally)
>> object to this, and this is very unlikely to change. The backwards
>> compatibility, the ambiguity wrt the intention of the data category author
>> (e.g. "is the 'alert' type of a locnote intended or not?"), and esp. the
>> constraints about pieces of information are an issue. Such constraints are
>> a different beast than pointers or standoff markup. By constraints I mean
>> what we say e.g. with loc quality issues: "exactly one of the following,
>> none or one of the following" or in other areas we say "optionally". With
>> these mutually exclusive and other options the complexity rises: if there
>> are two mutually exclusive items, one at a node and one inherited, which
>> one takes precedence? Sure there can be answers, but these are data
>> category specific and much more complex than "if a value doesn't exist on a
>> node take the inherited one".
>>
>> With partial inheritance I would need to re-engineer my implementation,
>> and very likely I wouldn't do that but rather drop the implementation
>> completely. Even if that is no theoretical issue as you had mentioned in a
>> mail before, I think it is a valid concern.
>>
>> So my proposal to move forward would be to re-iterate the orthogonal
>> character of tool information: it seems tool information is really the only
>> case where the partial overriding or the data category combination (see
>> comment below) have a strong case.
>>
>> Wrt Dave's proposal from
>>
>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Sep/0124.html
>>
>> [
>>
>> However, if we relax this assumption in a controlled way we can simply
>> avoid partial override by designing certain data categories to be used in
>> _combination_ (the subtle difference to a single data category
>> being 'compound'). In this event we can then either just live with the fact
>> that there may be one or two data categories that may impart conformance
>> individually even though they are not useful by themselves, or we add them
>> as a specific exclusion to the single-data-category-for-conformance rules.
>> ]
>>
>> In my experience, if you introduce a feature once (combination of data
>> categories), people will develop use cases over time to have it in other
>> areas too. So that will influence conformance a lot and in essence destroy a
>> basic design principle from ITS 1.0: "It adopts the use of data categories
>> to define *discrete* units of functionality".
>>
>> So we are probably stuck.
>>
>> My proposal to move forward would be to see if we agree on the orthogonal
>> character of the tool information and define it for all data categories, as
>> a separate piece of information and with the effect of a separate
>> conformance clause, but with no effect on selection mechanisms. If we have
>> that agreement we can explore how to accommodate Yves' requirement to attach
>> the information not to a whole document, but to parts of it. Maybe even
>> XPath is not needed for that. At least that is what Declan and Tadej said
>> for mtConfidence and disambiguation on the call.
>>
>> Best,
>>
>> Felix
>>
>>
>>
>>> The case of the stand-off markup is specific (so far) to Localization
>>> Quality Issue. I haven't thought yet about what that implies for the
>>> "partial override" but it's likely that there are ways to specify what is
>>> done in those cases.
>>> The bottom line is that all those local/global/standoff attributes
>>> specify information and are applied in a given order: we've got to be able
>>> to know if the information ABC exists or not when we apply the next rule,
>>> and therefore be able to keep the current value or override it depending on
>>> whether the next rule re-defines that information or not.
>>>
>>> Cheers,
>>> -ys
>>>
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Felix Sasaki
>> DFKI / W3C Fellow
>>
>>


-- 
Felix Sasaki
DFKI / W3C Fellow

Received on Friday, 21 September 2012 08:28:41 UTC