- From: Felix Sasaki <fsasaki@w3.org>
- Date: Sun, 23 Sep 2012 23:41:15 +0200
- To: public-multilingualweb-lt@w3.org
- Message-ID: <CAL58czpPMxEpqKGPan9d5PknK8SoGaJxiezg-X7EzvOaCSSafQ@mail.gmail.com>
Hi Phil, all,

good point. The proposal is mostly targeted at mtConfidence and
textAnalysisAnnotation, but could also help with locQualityPrécis. The
main issue is probably what Yves said in a different mail: the tool
reference is somewhat different from data categories - hence my initial
proposal at the bottom of
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Sep/0119.html
to have them completely separated.

So it seems the main use cases for tool info are:

1) just one piece of tool information per document (Declan, Tadej).
2) attaching potentially different pieces of tool information to one or
several elements like <alt-trans match-quality="100">.

Would it be OK for 2) to require that these pieces of tool information
have an ID? E.g.

<alt-trans match-quality="100"><target xml:lang="fr-fr" its:mtConfidenceScore="98.67" xml:id="at1">Ceci est le texte du message.</target></alt-trans>
<alt-trans match-quality="100" its:mtConfidenceScore="0.9876"><target xml:lang="fr-fr" xml:id="at2">C'est le texte du message.</target></alt-trans>

We could then realize 1) with the following doc: list of tools + data
categories (e.g. with the XML format I proposed based on OLIF), and 2)
like this: list of idrefs with tools + data categories.

Best,

Felix

2012/9/23 Phil Ritchie <philr@vistatec.ie>

> Lovely definition. I had always understood the geometric explanation but
> this usage is nice.
>
> I'm liking Felix's proposal and wondering (allowing for the fact I've not
> had caffeine yet this morning) if it could harmonise the locQualityPrécis
> category.
>
> Phil
>
> On 21 Sep 2012, at 19:39, "Arle Lommel" <arle.lommel@dfki.de> wrote:
>
> Orthogonal usually means that one issue intersects with another but is not
> otherwise determined by it. The term comes from the idea of a two-axis
> system in which the axes are at right angles (orthogonal).
>
> So they are not unrelated - their combination may be extremely important - but
> they can vary independently and the value of one does not entail any
> particular value for the other.
>
> Orthogonal is different from parallel, where there is a correspondence of
> value, and also from totally independent, where there is no meaningful
> intersection or relationship between two things.
>
> Hope that helps explain the academic mumbo-jumbo
>
> Arle
>
> --
> *Arle Lommel*
> Berlin, Germany
> Skype: arle_lommel
> Phone (US): +1 707 709 8650
>
> *Sent from a mobile device. Please excuse any typos.*
>
> On Sep 21, 2012, at 18:22, Yves Savourel <ysavourel@enlaso.com> wrote:
> > I think the issues you mention can be resolved, but first we'd need
> > to agree on the following:
> > ...
> > Information about tools used for producing metadata (+content)
> > is orthogonal to data categories
>
> Shockingly some of us don't have PhDs and, not being completely familiar
> with the academic lingo, may need a specific definition of what
> 'orthogonal' exactly means in this context :)
>
> For me, I agree that the information about the tool that was used to
> annotate the document is unrelated to the information of the data category
> itself.
> With one exception: somewhere in the data category information there
> should be a way to point to the tool information.
>
> -yves
>
> From: Felix Sasaki [mailto:fsasaki@w3.org]
> Sent: Friday, September 21, 2012 8:13 AM
> To: Yves Savourel
> Cc: public-multilingualweb-lt@w3.org
> Subject: Re: Tool info specification (Re: action-221 summary of overriding
> discussion)
>
> Hi Yves,
> 2012/9/21 Yves Savourel <ysavourel@enlaso.com>
> Thanks for the example Felix,
> > ... All tool specifications allow for identifying the relevant
> > data categories. In that way it becomes explicit that e.g. a
> > certain MT tool is relevant for mt-confidence.
> >
> > the tool specifications have "id" attributes, e.g. "t-2" for "bing"
> > translator.
> > Yves' requirement of referring to tool info from a piece of XLIFF could be
> > realized by referring to the ID attribute.
>
> How exactly is the relationship between the local data category markup and
> the tool expressed?
>
> Currently not at all.
>
> It seems you are saying: the ITS way is to look at the
> itsDataCategoryIdentifer element in the tool info.
> That's clumsy IMO, but it is indeed preventing any tool-specific data on
> the data category side.
>
> Correct, that's a huge benefit IMO: to separate the metadata itself from
> information about production of metadata - or in the case production of
> content+metadata.
>
> But the case for several tools used for the same data category is not
> really catered for.
>
> Correct.
>
> When you say "referring to tool info from a piece of XLIFF could be
> realized by referring to the ID attribute" who is defining the attribute
> that does the referring? ITS or XLIFF?
>
> Good question :) In my mind it was XLIFF, but obviously you are pushing
> for a mechanism on the ITS side.
>
> If it's XLIFF, then I disagree: I think the ITS mechanism must have
> provision for both cases. (Actually I even think the MT case would tend to
> favor that multi-tool case: knowing which tool produced a given MT is
> probably more relevant when you have several candidates).
>
> Having such provision probably means some kind of tool-ref attribute in
> each data category using the tool information.
> Which means it probably needs to be specified for each local occurrence over
> and over again.
> We're back to square one, admittedly now with only one attribute referring
> to the tool info rather than with all the tool info... I suppose that's
> progress :)
>
> Yes, that's progress :)
>
> I think the issues you mention can be resolved, but first we'd need to
> agree on the following:
> - Partial inheritance is out of scope
> - Information about tools used for producing metadata (+content) is
> orthogonal to data categories
>
> Now, if we agree on that, I think it would be OK to have a data category
> "ITS Tool information" which is available both locally and globally.
> Locally, it would have the tool references you mention, e.g.
>
> <span its:tool-ref="#t1" ...> (in tool-ref there might be a
> comma-separated list of "ref" values)
> meaning Enrycher and the "disambiguation" data category have been
> used to create metadata for the content of "span". We could also have a
> global rule like
>
> <its:toolInfoRule selector="trans-unit/target" tool-ref="#t-2"/>
> meaning that "Bing" translate has been used to create translated content
> and the mt confidence score information.
>
> What is the difference to previous approaches? With the above we don't
> change selection at all and actually don't see anything about the relation
> between data categories. E.g. there might be no disambiguation or
> mt-confidence annotation at all. The "toolInfo" data category allows
> applications to interrelate the annotations, if they are available - but we
> don't require testing that and don't create new conformance claims. That's
> a huge benefit IMO.
>
> If in the above approach there is a "local" tool-ref attribute, that would
> inherit in the document. So since Declan and Tadej need a "document only"
> solution without XPath, that global approach would accommodate that.
>
> The "new" ITS mechanism of referencing is actually not new: we do that
> with standoff in localization quality issue already. And it seems that in
> the new draft of Provenance, standoff also would be much more appropriate,
> instead of too much usage of pointer attributes.
>
> Best,
>
> Felix

--
Felix Sasaki
DFKI / W3C Fellow
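
A minimal sketch of how the tool-ref idea from this thread might be resolved by a consumer, offered only as an illustration. The its:tool-ref attribute, the ids "t1" and "t-2", and the tool names Enrycher and Bing come from the discussion; the <tools>/<tool> elements, the "name" attribute, and the resolve_tool_refs helper are hypothetical, since the thread does not fix a format for the tool list.

```python
import xml.etree.ElementTree as ET

# ITS namespace URI, used here to look up the proposed its:tool-ref attribute.
ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical document: <tools>, <tool> and the "name" attribute are
# assumptions made for this sketch, not part of any agreed format.
DOC = """
<doc xmlns:its="http://www.w3.org/2005/11/its">
  <tools>
    <tool id="t1" name="Enrycher"/>
    <tool id="t-2" name="Bing"/>
  </tools>
  <p><span its:tool-ref="#t1">Dublin</span>
     <span its:tool-ref="#t1, #t-2">text</span></p>
</doc>
"""


def resolve_tool_refs(root):
    """Return (element text, [tool names]) for each element with its:tool-ref."""
    # Index the (hypothetical) tool list by id.
    tools = {t.get("id"): t.get("name") for t in root.iter("tool")}
    resolved = []
    for elem in root.iter():
        refs = elem.get(f"{{{ITS_NS}}}tool-ref")
        if refs is None:
            continue
        # tool-ref may hold a comma-separated list of "#id" references.
        ids = [r.strip().lstrip("#") for r in refs.split(",") if r.strip()]
        # Unknown ids resolve to None rather than raising.
        resolved.append((elem.text, [tools.get(i) for i in ids]))
    return resolved


root = ET.fromstring(DOC)
result = resolve_tool_refs(root)
for text, names in result:
    print(text, "->", names)
```

Because the tool list and the references are kept apart, the data category markup itself carries no tool-specific data; a consumer that does not care about tools can ignore its:tool-ref entirely.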
Received on Sunday, 23 September 2012 21:41:41 UTC