Re: Tool info specification (Re: action-221 summary of overriding discussion) from Felix Sasaki on 2012-09-23 (public-multilingualweb-lt@w3.org from September 2012)

From: Felix Sasaki <fsasaki@w3.org>
Date: Sun, 23 Sep 2012 23:41:15 +0200
To: public-multilingualweb-lt@w3.org
Message-ID: <CAL58czpPMxEpqKGPan9d5PknK8SoGaJxiezg-X7EzvOaCSSafQ@mail.gmail.com>
Hi Phil, all,

good point. The proposal is mostly targeted at mtConfidence and
textAnalysisAnnotation, but could also help with locQualityPrécis. The main
issue is probably what Yves said in a different mail: the tool reference is
somewhat different than data categories - hence my initial proposal at the
bottom of
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Sep/0119.html
to have them completely separated.

So it seems the main use cases for tool info are:

1) just one piece of tool information per document (Declan, Tadej).
2) attaching potentially different pieces of tool information to one or
several elements like <alt-trans match-quality="100">.

Would it be OK for 2) to require that these pieces of tool information have
an ID? e.g.

<alt-trans match-quality="100"><target xml:lang="fr-fr"
its:mtConfidenceScore="98.67" xml:id="at1">Ceci est le texte du
message.</target></alt-trans>
<alt-trans match-quality="100" its:mtConfidenceScore="0.9876"><target
xml:lang="fr-fr" xml:id="at2">C'est le texte du message.</target></alt-trans>

We could then realize 1) with the following
doc: list of tools + data categories (e.g. with the XML format I proposed
based on OLIF)
and 2) like this
list of idrefs with tools + data categories.
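
To make the two cases more concrete, here is a rough sketch of how they could
look side by side. This is only an illustration: every element and attribute
name in it (toolInfoList, toolInfo, dataCategory, toolInfoRefs, toolInfoRef,
idrefs, tool) is made up for this mail and is not the OLIF-based format or any
agreed ITS or XLIFF markup. The tool IDs follow the "t-2" style mentioned
earlier in the thread, and "at1" / "at2" are the xml:id values from the
alt-trans example above.

```xml
<!-- Sketch only. All element and attribute names are placeholder
     assumptions, not agreed ITS or XLIFF markup. -->
<toolInfoList>

  <!-- Case 1: one list of tools per document, each naming the data
       categories it is relevant for. -->
  <toolInfo xml:id="t-1" name="Enrycher">
    <dataCategory>disambiguation</dataCategory>
  </toolInfo>
  <toolInfo xml:id="t-2" name="Bing Translator">
    <dataCategory>mt-confidence</dataCategory>
  </toolInfo>

  <!-- Case 2: standoff list pairing element IDs with a tool and a data
       category. Both entries point at t-2 here; each could just as well
       point at a different tool. -->
  <toolInfoRefs>
    <toolInfoRef idrefs="at1" tool="t-2" dataCategory="mt-confidence"/>
    <toolInfoRef idrefs="at2" tool="t-2" dataCategory="mt-confidence"/>
  </toolInfoRefs>

</toolInfoList>
```

With something along these lines, a document that only needs case 1 would
simply omit the list of references, and the annotated content itself would
carry nothing tool-specific beyond an ID.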

Best,

Felix

2012/9/23 Phil Ritchie <philr@vistatec.ie>

> Lovely definition. I had always understood the geometric explanation but
> this usage is nice.
>
> I'm liking Felix's proposal and wondering (allowing for the fact I've not
> had caffeine yet this morning) if it could harmonise the locQualityPrécis
> category.
>
> Phil
>
>
>
> On 21 Sep 2012, at 19:39, "Arle Lommel" <arle.lommel@dfki.de> wrote:
>
> Orthogonal usually means that one issue intersects with another but is not
> otherwise determined by it. The term comes from the idea of a two-axis
> system in which the axes are at right angles (orthogonal).
>
> So they are not unrelated—their combination may be extremely important—but
> they can vary independently and the value of one does not entail any
> particular value for the other.
>
> Orthogonal is different from parallel, where there is a correspondence of
> value, and also from totally independent, where there is no meaningful
> intersection or relationship between two things.
>
> Hope that helps explain the academic mumbo-jumbo.
>
> Arle
>
> --
> *Arle Lommel*
> Berlin, Germany
> Skype: arle_lommel
> Phone (US): +1 707 709 8650
>
> *Sent from a mobile device. Please excuse any typos.*
>
> On Sep 21, 2012, at 18:22, Yves Savourel <ysavourel@enlaso.com> wrote:
>
> > I think the issues you mention can be resolved, but first we'd need
> > to agree on the following:
> > ...
> > Information about tools used for producing metadata (+content)
> > is orthogonal to data categories
>
>
> Shockingly some of us don't have PhDs and, not being completely familiar
> with the academic lingo, may need a specific definition of what
> 'orthogonal' exactly means in this context :)
>
> For me, I agree that the information about the tool that was used to
> annotate the document is unrelated to the information of the data category
> itself.
> With one exception: somewhere in the data category information there
> should be a way to point to the tool information.
>
> -yves
>
>
>
> From: Felix Sasaki [mailto:fsasaki@w3.org]
> Sent: Friday, September 21, 2012 8:13 AM
> To: Yves Savourel
> Cc: public-multilingualweb-lt@w3.org
> Subject: Re: Tool info specification (Re: action-221 summary of overriding
> discussion)
>
> Hi Yves,
> 2012/9/21 Yves Savourel <ysavourel@enlaso.com>
> Thanks for the example Felix,
>
> > ... All tool specifications allow for identifying the relevant
> > data categories. In that way it becomes explicit that e.g. a
> > certain MT tool is relevant for mt-confidence.
>
>
> the tool specifications have "id" attributes, e.g. "t-2" for "bing"
> translator.
>
> > Yves' requirement of referring to tool info from a piece of XLIFF could be
> > realized by referring to the ID attribute.
>
> How exactly is the relationship between the local data category markup and
> the tool expressed?
>
> Currently not at all.
>
>
> It seems you are saying: the ITS way is to look at the
> itsDataCategoryIdentifer element in the tool info.
> That's clumsy IMO, but it is indeed preventing any tool-specific data on
> the data category side.
>
> Correct, that's a huge benefit IMO: to separate the metadata itself from
> information about production of metadata - or in the case production of
> content+metadata.
>
>
> But the case for several tools used for the same data category is not
> really catered for.
>
> Correct.
>
> When you say "referring to tool info from a piece of XLIFF could be
> realized by referring to the ID attribute" who is defining the attribute
> that does the referring? ITS or XLIFF?
>
> Good question :) In my mind it was XLIFF, but obviously you are pushing
> for a mechanism on the ITS side.
>
>
> If it's XLIFF, then I disagree: I think the ITS mechanism must have
> provision for both cases. (Actually I even think the MT case would tend to
> favor that multi-tool case: knowing which tool produced a given MT is
> probably more relevant when you have several candidates).
>
> Having such provision probably means some kind of tool-ref attribute in
> each data category using the tool information.
> Which means it probably needs to be specified for each local occurrence over
> and over again.
> We're back to square one, admittedly now with only one attribute referring
> to the tool info rather than with all the tool info... I suppose that's
> progress :)
>
> Yes, that's progress :)
>
> I think the issues you mention can be resolved, but first we'd need to
> agree on the following:
> - Partial inheritance is out of scope
> - Information about tools used for producing metadata (+content) is
> orthogonal to data categories
>
>
> Now, if we agree on that, I think it would be OK to have a data category
> "ITS Tool information" which is available both locally and globally.
> Locally, it would have the tool references you mention, e.g.
>
> <span its:tool-ref="#t1" ...> (in tool-ref there might be a
> comma-separated list of "ref" values)
> meaning Enrycher and the "disambiguation" data category have been
> used to create metadata for the content of "span". We could also have a
> global rule like
>
> <its:toolInfoRule selector="trans-unit/target" tool-ref="#t-2"/>
> meaning that "Bing" translate has been used to create translated content
> and the mt confidence score information.
>
> What is the difference from previous approaches? With the above we don't
> change selection at all and actually don't see anything about the relation
> between data categories. E.g. there might be no disambiguation or
> mt-confidence annotation at all. The "toolInfo" data category allows
> applications to interrelate the annotations, if they are available - but we
> don't require testing that and don't create new conformance claims. That's
> a huge benefit IMO.
>
> If in the above approach there is a "local" tool-ref attribute, it would
> be inherited in the document. So since Declan and Tadej need a "document only"
> solution without XPath, that global approach would accommodate that.
>
> The "new" ITS mechanism of referencing is actually not new: we do that
> with standoff in localization quality issue already. And it seems that in
> the new draft of Provenance, standoff also would be much more appropriate,
> instead of too much usage of pointer attributes.
>
> Best,
>
> Felix
>
>
>
>



-- 
Felix Sasaki
DFKI / W3C Fellow
Received on Sunday, 23 September 2012 21:41:41 UTC