Re: Tool info specification (Re: action-221 summary of overriding discussion)

Hi Phil, all,

Good point. The proposal is mostly targeted at mtConfidence and
textAnalysisAnnotation, but could also help with locQualityPrécis. The main
issue is probably what Yves said in a different mail: the tool reference is
somewhat different from data categories, hence my initial proposal at the
bottom of
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Sep/0119.html
to keep them completely separate.

So it seems the main use cases for tool info are:

1) just one piece of tool information per document (Declan, Tadej).
2) attaching potentially different pieces of tool information to one or
several elements, like <alt-trans match-quality="100">.

Would it be OK for 2) to require that these pieces of tool information have
an ID? e.g.

<alt-trans match-quality="100">
  <target xml:lang="fr-fr" its:mtConfidenceScore="98.67" xml:id="at1">Ceci est le texte du message.</target>
</alt-trans>
<alt-trans match-quality="100" its:mtConfidenceScore="0.9876">
  <target xml:lang="fr-fr" xml:id="at2">C'est le texte du message.</target>
</alt-trans>

We could then realize 1) with a document-level list of tools + data
categories (e.g. using the OLIF-based XML format I proposed), and 2) with a
list of IDREFs, each associated with tools + data categories.
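To make the two use cases concrete, here is a minimal Python sketch. It
assumes the document-level tool list (1) and the IDREF list (2) have already
been loaded as plain dictionaries; their XML serialization (e.g. the
OLIF-based format) is not shown. The tool id "t-3", the "Example MT engine"
name, and the helper tools_for_targets are hypothetical, and the XLIFF
namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is predefined, so xml:id appears under this expanded name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Use case 1 (hypothetical content): document-level list of tools.
TOOLS = {"t-2": "Bing Translator", "t-3": "Example MT engine"}

# Use case 2 (hypothetical content): IDREF -> (tool id, data categories).
TOOL_INFO_BY_IDREF = {
    "at1": ("t-2", ["mtConfidence"]),
    "at2": ("t-3", ["mtConfidence"]),
}

# XLIFF-like fragment; the XLIFF namespace is omitted to keep the sketch short.
DOC = """<trans-unit id="1">
  <alt-trans match-quality="100">
    <target xml:lang="fr-fr" xml:id="at1">Ceci est le texte du message.</target>
  </alt-trans>
  <alt-trans match-quality="100">
    <target xml:lang="fr-fr" xml:id="at2">C'est le texte du message.</target>
  </alt-trans>
</trans-unit>"""


def tools_for_targets(xml_text):
    """Return (xml:id, tool name, data categories) for each element with tool info."""
    root = ET.fromstring(xml_text)
    results = []
    for elem in root.iter():
        elem_id = elem.get(XML_ID)
        if elem_id not in TOOL_INFO_BY_IDREF:
            continue  # no xml:id, or no tool information attached to it
        tool_id, categories = TOOL_INFO_BY_IDREF[elem_id]
        results.append((elem_id, TOOLS[tool_id], categories))
    return results


for row in tools_for_targets(DOC):
    print(row)
```

The data category markup itself stays free of tool details; only the xml:id
connects an element to its entry in the separate list.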

Best,

Felix

2012/9/23 Phil Ritchie <philr@vistatec.ie>

> Lovely definition. I had always understood the geometric explanation but
> this usage is nice.
>
> I'm liking Felix's proposal and wondering (allowing for the fact I've not
> had caffeine yet this morning) if it could harmonise the locQualityPrécis
> category.
>
> Phil
>
>
>
> On 21 Sep 2012, at 19:39, "Arle Lommel" <arle.lommel@dfki.de> wrote:
>
> Orthogonal usually means that one issue intersects with another but is not
> otherwise determined by it. The term comes from the idea of a two-axis
> system in which the axes are at right angles (orthogonal).
>
> So they are not unrelated—their combination may be extremely important—but
> they can vary independently and the value of one does not entail any
> particular value for the other.
>
> Orthogonal is different from parallel, where there is a correspondence of
> value, and also from totally independent, where there is no meaningful
> intersection or relationship between two things.
>
> Hope that helps explain the academic mumbo-jumbo.
>
> Arle
>
> --
> *Arle Lommel*
> Berlin, Germany
> Skype: arle_lommel
> Phone (US): +1 707 709 8650
>
> *Sent from a mobile device. Please excuse any typos.*
>
> On Sep 21, 2012, at 18:22, Yves Savourel <ysavourel@enlaso.com> wrote:
>
> I think the issues you mention can be resolved, but first we'd need
> to agree on the following:
>
> ...
>
> Information about tools used for producing metadata (+content)
> is orthogonal to data categories
>
>
> Shockingly, some of us don't have PhDs and, not being completely familiar
> with the academic lingo, may need a specific definition of what
> 'orthogonal' exactly means in this context :)
>
> For me, I agree that the information about the tool that was used to
> annotate the document is unrelated to the information of the data category
> itself.
> With one exception: somewhere in the data category information there
> should be a way to point to the tool information.
>
> -yves
>
>
>
> From: Felix Sasaki [mailto:fsasaki@w3.org]
> Sent: Friday, September 21, 2012 8:13 AM
> To: Yves Savourel
> Cc: public-multilingualweb-lt@w3.org
> Subject: Re: Tool info specification (Re: action-221 summary of overriding discussion)
>
> Hi Yves,
> 2012/9/21 Yves Savourel <ysavourel@enlaso.com>
> Thanks for the example, Felix,
>
> ... All tool specifications allow for identifying the relevant
> data categories. In that way it becomes explicit that e.g. a
> certain MT tool is relevant for mt-confidence.
>
>
> The tool specifications have "id" attributes, e.g. "t-2" for the "bing"
> translator.
>
> Yves' requirement of referring to tool info from a piece of XLIFF could be
> realized by referring to the ID attribute.
>
> How exactly is the relationship between the local data category markup
> and the tool expressed?
>
> Currently not at all.
>
>
> It seems you are saying: the ITS way is to look at the
> itsDataCategoryIdentifer element in the tool info.
> That's clumsy IMO, but it does indeed prevent any tool-specific data on
> the data category side.
>
> Correct, that's a huge benefit IMO: it separates the metadata itself from
> information about how the metadata was produced, or in some cases how
> content + metadata was produced.
>
>
> But the case for several tools used for the same data category is not
> really catered for.
>
> Correct.
>
> When you say "referring to tool info from a piece of XLIFF could be
> realized by referring to the ID attribute" who is defining the attribute
> that does the referring? ITS or XLIFF?
>
> Good question :) In my mind it was XLIFF, but obviously you are pushing
> for a mechanism on the ITS side.
>
>
> If it's XLIFF, then I disagree: I think the ITS mechanism must have
> provision for both cases. (Actually I even think the MT case would tend to
> favor that multi-tool case: knowing which tool produced a given MT is
> probably more relevant when you have several candidates).
>
> Having such provision probably means some kind of tool-ref attribute in
> each data category using the tool information.
> Which means it probably needs to be specified for each local occurrence
> over and over again.
> We're back to square one, admittedly now with only one attribute referring
> to the tool info rather than with all the tool info... I suppose that's
> progress :)
>
> Yes, that's progress :)
>
> I think the issues you mention can be resolved, but first we'd need to
> agree on the following:
> - Partial inheritance is out of scope
> - Information about tools used for producing metadata (+content) is
> orthogonal to data categories
>
>
> Now, if we agree on that, I think it would be OK to have a data category
> "ITS Tool information" which is available both locally and globally.
> Locally, it would have the tool references you mention, e.g.
>
> <span its:tool-ref="#t1" ...> (in tool-ref there might be a
> comma-separated list of "ref" values)
> meaning Enrycher and the "disambiguation" data category have been
> used to create metadata for the content of "span". We could also have a
> global rule like
>
> <its:toolInfoRule selector="//trans-unit/target" tool-ref="#t-2"/>
> meaning that Bing Translator has been used to create the translated
> content and the MT confidence score information.
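A rough Python sketch of how an application might resolve both the local
and the global form. Assumptions, none of them settled in this thread:
tool-ref lives in the ITS namespace, its value is a comma-separated list of
"#id" references, and a local tool-ref overrides a global rule (mirroring
the usual ITS precedence of local markup over global rules). The tool table,
the <doc> wrapper, and the function names are hypothetical. The toolInfoRule
selector is rewritten as an ElementTree path because ElementTree supports
only a subset of XPath.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TOOL_REF = f"{{{ITS_NS}}}tool-ref"  # hypothetical attribute from the proposal

# Hypothetical tool table: id -> (tool name, data categories it was used for).
TOOLS = {
    "t1": ("Enrycher", ["disambiguation"]),
    "t-2": ("Bing Translator", ["mtConfidence"]),
}

DOC = f"""<doc xmlns:its="{ITS_NS}">
  <trans-unit id="1"><target>C'est le texte du message.</target></trans-unit>
  <span its:tool-ref="#t1">Berlin</span>
</doc>"""


def parse_tool_ref(value):
    """Split a tool-ref value such as "#t1, #t-2" into bare ids."""
    return [part.strip().lstrip("#") for part in value.split(",") if part.strip()]


def apply_tool_info(root, global_rules):
    """Return (element, tool entries) pairs; a local tool-ref overrides global rules.

    global_rules holds (ElementTree path, tool-ref value) pairs standing in for
    toolInfoRule elements.
    """
    assigned = {}
    for path, ref in global_rules:
        for elem in root.findall(path):
            assigned[id(elem)] = (elem, ref)
    for elem in root.iter():
        local = elem.get(TOOL_REF)
        if local is not None:
            assigned[id(elem)] = (elem, local)  # local markup wins
    return [(elem, [TOOLS[t] for t in parse_tool_ref(ref)])
            for elem, ref in assigned.values()]


root = ET.fromstring(DOC)
for elem, tools in apply_tool_info(root, [(".//trans-unit/target", "#t-2")]):
    print(elem.tag, tools)
```

Selection is untouched: the rule only attaches a reference, and the tool
table says which data categories each tool was used for.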
>
> What is the difference from previous approaches? With the above we don't
> change selection at all and don't actually specify anything about the
> relation between data categories. E.g. there might be no disambiguation or
> mt-confidence annotation at all. The "toolInfo" data category allows
> applications to interrelate the annotations if they are available, but we
> don't require testing that and don't create new conformance claims. That's
> a huge benefit IMO.
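The point that tool info only relates annotations that happen to be present
can be shown in a few lines of Python. This is a sketch under assumptions:
its:tool-ref is the proposed (hypothetical) attribute, its:mtConfidenceScore
is the attribute name used earlier in this thread, and a missing score is
reported as None rather than raised as an error. The function name is
hypothetical.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TOOL_REF = f"{{{ITS_NS}}}tool-ref"           # hypothetical proposed attribute
MT_SCORE = f"{{{ITS_NS}}}mtConfidenceScore"  # attribute name used in this thread

DOC = f"""<doc xmlns:its="{ITS_NS}">
  <target its:tool-ref="#t-2" its:mtConfidenceScore="0.9876">C'est le texte du message.</target>
  <target its:tool-ref="#t-2">Ceci est le texte du message.</target>
</doc>"""


def relate_tool_and_score(root):
    """Pair each tool-ref with the element's MT confidence score, if any.

    A missing score yields None instead of an error: tool info only lets an
    application connect annotations that are actually present.
    """
    pairs = []
    for elem in root.iter():
        ref = elem.get(TOOL_REF)
        if ref is None:
            continue
        score = elem.get(MT_SCORE)
        pairs.append((ref, float(score) if score is not None else None))
    return pairs


print(relate_tool_and_score(ET.fromstring(DOC)))
```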
>
> In the above approach, a "local" tool-ref attribute would be inherited by
> descendant elements in the document. Since Declan and Tadej need a
> "document only" solution without XPath, this approach would accommodate
> that: a single tool-ref on the root element covers the whole document.
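A minimal Python sketch of that inheritance, assuming tool-ref behaves like
other inherited ITS local attributes: an element without its own value takes
the value of its nearest ancestor that has one. The attribute name and the
helper effective_tool_refs are hypothetical; this is one possible reading of
the proposal, not a specified algorithm.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TOOL_REF = f"{{{ITS_NS}}}tool-ref"  # hypothetical attribute from the proposal

# "Document only" case: one tool-ref on the root, no XPath needed.
DOC = f"""<html xmlns:its="{ITS_NS}" its:tool-ref="#t1">
  <body>
    <p>Covered by the document-level tool reference.</p>
    <p its:tool-ref="#t-2">Overridden by a more specific reference.</p>
  </body>
</html>"""


def effective_tool_refs(root):
    """List (element, tool-ref in effect) in document order.

    None means no tool information applies to that element.
    """
    result = []

    def walk(elem, inherited):
        current = elem.get(TOOL_REF, inherited)  # own value, else inherited one
        result.append((elem, current))
        for child in elem:
            walk(child, current)

    walk(root, None)
    return result


for elem, ref in effective_tool_refs(ET.fromstring(DOC)):
    print(elem.tag, ref)
```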
>
> The "new" ITS mechanism of referencing is actually not new: we already do
> that with standoff markup for localization quality issues. And it seems
> that in the new draft of Provenance, standoff markup would also be much more
> appropriate than heavy use of pointer attributes.
>
> Best,
>
> Felix
>



-- 
Felix Sasaki
DFKI / W3C Fellow

Received on Sunday, 23 September 2012 21:41:41 UTC