Re: Tool info specification (Re: action-221 summary of overriding discussion) from Felix Sasaki on 2012-09-21 (public-multilingualweb-lt@w3.org from September 2012)

From: Felix Sasaki <fsasaki@w3.org>
Date: Fri, 21 Sep 2012 16:13:01 +0200
To: Yves Savourel <ysavourel@enlaso.com>
Cc: public-multilingualweb-lt@w3.org
Message-ID: <CAL58czqi5qf_EBYLm5NdToFcJdMdsPpQKNXWTV4+cetnw2eReg@mail.gmail.com>

Hi Yves,

2012/9/21 Yves Savourel <ysavourel@enlaso.com>

> Thanks for the example Felix,
>
> > ... All tool specifications allow for identifying the relevant
> > data categories. In that way it becomes explicit that e.g. a
> > certain MT tool is relevant for mt-confidence.
> >
> > the tool specifications have "id" attributes, e.g. "t-2" for "bing" translator.
> > Yves' requirement of referring to tool info from a piece of XLIFF could be
> > realized by referring to the ID attribute.
>
> How exactly is the relationship between the local data category markup and
> the tool expressed?
>

Currently not at all.


>
> It seems you are saying: the ITS way is to look at the
> itsDataCategoryIdentifer element in the tool info.
> That's clumsy IMO, but it is indeed preventing any tool-specific data on
> the data category side.
>

Correct, that's a huge benefit IMO: to separate the metadata itself from
information about the production of metadata - or, in this case, the
production of content+metadata.


>
> But the case for several tools used for the same data category is not
> really catered for.
>

Correct.


> When you say "referring to tool info from a piece of XLIFF could be
> realized by referring to the ID attribute" who is defining the attribute
> that does the referring? ITS or XLIFF?
>

Good question :) In my mind it was XLIFF, but obviously you are pushing for
a mechanism on the ITS side.


>
> If it's XLIFF, then I disagree: I think the ITS mechanism must have
> provision for both cases. (Actually I even think the MT case would tend to
> favor that multi-tool case: knowing which tool produced a given MT is
> probably more relevant when you have several candidates).
>
> Having such provision probably means some kind of tool-ref attribute in
> each data category using the tool information.
> Which means it probably needs to be specified for each local occurrence over
> and over again.
> We're back to square one, admittedly now with only one attribute referring
> to the tool info rather than with all the tool info... I suppose that's
> progress :)
>

Yes, that's progress :)

I think the issues you mention can be resolved, but first we'd need to
agree on the following:
- Partial inheritance is out of scope
- Information about tools used for producing metadata (+content) is
orthogonal to data categories


Now, if we agree on that, I think it would be OK to have a data category
"ITS Tool information" which is available both locally and globally.
Locally, it would have the tool references you mention, e.g.

<span its:tool-ref="#t1" ...> (in tool-ref there might be a comma-separated
list of "ref" values)
meaning Enrycher and the "disambiguation" data category have been used
to create metadata for the content of "span". We could also have a global
rule like

<its:toolInfoRule selector="trans-unit/target" tool-ref="#t-2"/>
meaning that the "Bing" translator has been used to create the translated
content and the mt-confidence score information.
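
To make the local case concrete, here is a minimal Python sketch of how an
application might resolve a comma-separated tool-ref value against a list of
tool specifications. Only the ITS namespace URI, the "tool-ref" attribute and
the ids "t1" and "t-2" come from this thread; the "its:toolInfo" and
"its:tool" elements and their "name" and "dataCategory" attributes are
placeholders I made up for illustration, not proposed markup.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical document: the toolInfo/tool element names and their
# "name"/"dataCategory" attributes are assumptions for this sketch.
DOC = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <its:toolInfo>
    <its:tool id="t1" name="Enrycher" dataCategory="disambiguation"/>
    <its:tool id="t-2" name="Bing Translator" dataCategory="mt-confidence"/>
  </its:toolInfo>
  <p>Visit <span its:tool-ref="#t1">Dublin</span> and
     <span its:tool-ref="#t1, #t-2">Galway</span>.</p>
</doc>"""


def load_tools(root):
    """Map each tool id to its (name, data category) pair."""
    tools = {}
    for tool in root.iter(f"{{{ITS_NS}}}tool"):
        tools[tool.get("id")] = (tool.get("name"), tool.get("dataCategory"))
    return tools


def resolve_tool_refs(element, tools):
    """Resolve a comma-separated its:tool-ref value into tool entries."""
    raw = element.get(f"{{{ITS_NS}}}tool-ref", "")
    ids = [ref.strip().lstrip("#") for ref in raw.split(",") if ref.strip()]
    # Unknown ids are skipped silently; a real processor might report them.
    return [tools[i] for i in ids if i in tools]


root = ET.fromstring(DOC)
tools = load_tools(root)
for span in root.iter("span"):
    print(span.text, resolve_tool_refs(span, tools))
```

Note that nothing here looks at disambiguation or mt-confidence annotations
themselves: the lookup works whether or not they are present.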

What is the difference from previous approaches? With the above we don't
change selection at all and actually don't say anything about the relation
between data categories. E.g. there might be no disambiguation or
mt-confidence annotation at all. The "toolInfo" data category allows
applications to interrelate the annotations, if they are available - but we
don't require testing that and don't create new conformance claims. That's
a huge benefit IMO.

If in the above approach there is a "local" tool-ref attribute, it would
inherit in the document. So since Declan and Tadej need a "document only"
solution without XPath, that global approach would accommodate that.
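
As a rough illustration of that inheritance, the Python sketch below looks up
the tool-ref that applies to an element by walking up to the nearest ancestor
that carries one, with no XPath involved. The sample document is made up, and
the "nearest value wins as a whole" behaviour is my assumption, chosen to be
in line with keeping partial inheritance out of scope.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TOOL_REF = f"{{{ITS_NS}}}tool-ref"

# Made-up sample: a tool-ref on the root and an override on the second <p>.
DOC = """<doc xmlns:its="http://www.w3.org/2005/11/its" its:tool-ref="#t-2">
  <p>First <b>bold</b> paragraph.</p>
  <p its:tool-ref="#t1">Second <b>bold</b> paragraph.</p>
</doc>"""


def effective_tool_ref(element, parents):
    """Return the tool-ref of the element or of its nearest ancestor."""
    node = element
    while node is not None:
        value = node.get(TOOL_REF)
        if value is not None:
            # Assumption: the nearest value replaces inherited ones entirely.
            return value
        node = parents.get(node)
    return None


root = ET.fromstring(DOC)
# ElementTree has no parent pointers, so build a child-to-parent map once.
parents = {child: parent for parent in root.iter() for child in parent}

for bold in root.iter("b"):
    print(bold.text, effective_tool_ref(bold, parents))
```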

The "new" ITS mechanism of referencing is actually not new: we already do
that with standoff markup in localization quality issue. And it seems that in
the new draft of Provenance, standoff would also be much more appropriate
than heavy use of pointer attributes.

Best,

Felix

Received on Friday, 21 September 2012 14:13:35 UTC