RE: [ISSUE-42] Wording for the tool information markup

Dear Felix, Yves, Dear All,


With regard to the ongoing discussion on toolInfo and mtConfidence, I have in mind the attributes Tilde proposed for the terminology use case, namely its-termInfoRef, its-termCandidate and its-termConfidence, together with their values. These are not represented in the current draft, and if we go this way we will have to discuss them and probably add them. I remember that Tadej raised this question in Prague, but unfortunately we did not discuss it. On the other hand, once the project starts we will have the opportunity and time to do so, and my colleagues will also join the discussion.



With best wishes,

Tatiana

From: Felix Sasaki [mailto:fsasaki@w3.org]
Sent: Wednesday, October 03, 2012 12:29 AM
To: Yves Savourel
Cc: public-multilingualweb-lt@w3.org
Subject: Re: [ISSUE-42] Wording for the tool information markup

Hi Yves, all,

no opinion on my side on the delimiter topic, sorry for bringing it up. A comment on the tool specific aspect below.
2012/10/2 Yves Savourel <ysavourel@enlaso.com>
> <doc its:toolRefs="mtConfidence/file:///tools.xml#T1"
> xmlns:its="http://www.w3.org/2005/11/its">
>
> Would it make sense to use a different delimiter? "/" may conflict with "/" in paths.
Hmm... almost any ASCII delimiter may also occur in the path. Only the first occurrence is the delimiter.
But I suppose '|' could be used instead. It just doesn't look as graceful for some reason.
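A minimal Python sketch of the "first occurrence is the delimiter" rule, for illustration only. The helper name parse_tool_refs is hypothetical, and so is the assumption that its:toolRefs may hold several whitespace-separated entries (the attribute name is plural, but the proposal shows only one). Because only the first delimiter splits an entry, a "/" or "|" inside the IRI does no harm, as long as data category names never contain the delimiter.

```python
from __future__ import annotations


def parse_tool_refs(value: str, delimiter: str = "/") -> dict[str, str]:
    """Map data category names to tool-information IRIs (hypothetical helper).

    Assumes whitespace-separated entries of the form
    "<dataCategory><delimiter><IRI>". Only the first delimiter in an entry
    splits it, so the IRI may itself contain the delimiter.
    """
    refs: dict[str, str] = {}
    for entry in value.split():
        # partition() splits at the first occurrence only
        category, sep, iri = entry.partition(delimiter)
        if not sep or not category or not iri:
            raise ValueError(f"malformed tool reference: {entry!r}")
        refs[category] = iri
    return refs


print(parse_tool_refs("mtConfidence/file:///tools.xml#T1"))
# {'mtConfidence': 'file:///tools.xml#T1'}
```

With delimiter="|" the same entry would read mtConfidence|file:///tools.xml#T1; the parsing logic does not change, so the choice is mostly about readability.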


> Do you need the "dataCategory" attribute? It seems the
> data category is made explicit via the reference mechanism in "its:toolRefs".
> Also, dropping the "dataCategory" attribute then allows referring to
> the same tools from various data categories, e.g. OKAPI used for quality
> issues versus for creating translation metadata, etc.
I'm not sure we can go from many data category instances to one tool-information entry. And this is where I'm having trouble with tool information:

The mtConfidence data category needs a defined way to specify the engine used

Is there really a defined way? The current version of the draft at
http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#mtconfidence-implementation
says:

"Some examples of values are:
A BCP 47 language tag with t-extension, e.g. ja-t-it for an Italian to Japanese MT engine
A Domain as per the Section 6.9: Domain
A privately structured string, eg. Domain:IT-Pair:IT-JA, IT-JA:Medical, etc."

To me that is the same as saying: you can use anything. Of course we can wrap the "anything" in a field saying "here is MT engine information". Is that what you mean?
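For comparison, the first of the example values quoted from the draft does have a standardized internal structure: the "t" extension of BCP 47 (RFC 6497) marks ja-t-it as Japanese transformed from Italian. Below is a simplified Python sketch of pulling the target and source languages out of such a tag. split_t_extension is a hypothetical helper; it lowercases its input, ignores other extensions and private-use subtags, and does not validate anything against the IANA registry.

```python
from __future__ import annotations

import re

# RFC 6497 field keys are one letter followed by one digit, e.g. "m0" or "h0".
_TKEY = re.compile(r"^[a-z][0-9]$")


def split_t_extension(tag: str) -> tuple[str, str | None]:
    """Return (target, source) for a BCP 47 tag with a "t" extension.

    Simplified sketch: output is lowercased, other extensions and private-use
    subtags are not handled, and nothing is checked against the IANA registry.
    """
    subtags = tag.lower().split("-")
    if "t" not in subtags[1:]:
        return tag.lower(), None  # no "t" extension, so no source language
    i = subtags.index("t", 1)
    target = "-".join(subtags[:i])
    source: list[str] = []
    for sub in subtags[i + 1:]:
        if len(sub) == 1 or _TKEY.match(sub):
            break  # another singleton or the first field ends the source tag
        source.append(sub)
    return target, "-".join(source) or None


print(split_t_extension("ja-t-it"))  # ('ja', 'it')
```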


...the Text Analysis data category may need something else

I actually doubt that the "anything" for Text Analysis will be more specific. My prediction is that there will be no more interoperability than saying "this field contains data-category-specific information: ...".

So you could achieve that by changing your proposal like this:

<its:processInfo xmlns:its="http://www.w3.org/2005/11/its">
 <its:toolInfo xml:id="T1">
  <its:toolName>Bing Translator</its:toolName>
  <its:toolVersion>123</its:toolVersion>
  <its:toolAddInfo datacategory="mtconfidence">ja-t-it</its:toolAddInfo>
 </its:toolInfo>
 <its:toolInfo xml:id="T2">
  <its:toolName>myMT</its:toolName>
  <its:toolVersion>456</its:toolVersion>
  <its:toolAddInfo datacategory="mtconfidence">Domain:IT-Pair:IT-JA</its:toolAddInfo>
 </its:toolInfo>
</its:processInfo>

and allow several toolAddInfo elements in one toolInfo. You won't gain a lot from these, but no less than with "FR-to-EN-General" inside "toolValue" at
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Oct/0000.html
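A sketch of how a consumer might read the markup proposed above with Python's xml.etree.ElementTree. It assumes the element and attribute names from this proposal, which are not part of a published ITS schema. The second toolAddInfo on T2, with datacategory="textanalysis", is a hypothetical value added only to show several entries in one toolInfo. Comparing data category names case-insensitively is also an assumption, since the thread writes both "mtConfidence" and "mtconfidence".

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

ITS = "{http://www.w3.org/2005/11/its}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # how ElementTree names xml:id

PROCESS_INFO = """\
<its:processInfo xmlns:its="http://www.w3.org/2005/11/its">
 <its:toolInfo xml:id="T1">
  <its:toolName>Bing Translator</its:toolName>
  <its:toolVersion>123</its:toolVersion>
  <its:toolAddInfo datacategory="mtconfidence">ja-t-it</its:toolAddInfo>
 </its:toolInfo>
 <its:toolInfo xml:id="T2">
  <its:toolName>myMT</its:toolName>
  <its:toolVersion>456</its:toolVersion>
  <its:toolAddInfo datacategory="mtconfidence">Domain:IT-Pair:IT-JA</its:toolAddInfo>
  <!-- hypothetical second entry, for illustration only -->
  <its:toolAddInfo datacategory="textanalysis">hypothetical-value</its:toolAddInfo>
 </its:toolInfo>
</its:processInfo>
"""


def add_info(root: ET.Element, tool_id: str, data_category: str) -> list[str]:
    """Return every toolAddInfo value of one toolInfo for one data category."""
    for tool in root.iter(ITS + "toolInfo"):
        if tool.get(XML_ID) == tool_id:
            return [
                (info.text or "").strip()
                for info in tool.findall(ITS + "toolAddInfo")
                if info.get("datacategory", "").lower() == data_category.lower()
            ]
    raise KeyError(tool_id)


root = ET.fromstring(PROCESS_INFO)
print(add_info(root, "T2", "mtConfidence"))  # ['Domain:IT-Pair:IT-JA']
```

Paired with a toolRefs entry such as mtConfidence/file:///tools.xml#T1, a consumer could fetch the referenced document, look up the fragment identifier T1 and call add_info with the data category name.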

Best,

Felix


...etc. It seems each data category will need one or two entries that mean different things depending on the data category. We can use a common element for this, but then we need one tool-information entry per data category.

Maybe the examples people are working on (action items 239 to 243 for Arle, Phil, Declan and Tadej) will help in defining this.

Cheers
-yves





--
Felix Sasaki
DFKI / W3C Fellow

Received on Wednesday, 3 October 2012 13:16:00 UTC