Tool info only URI? (Re: [ISSUE-42] Wording for the tool information markup)

From: Felix Sasaki <fsasaki@w3.org>
Date: Sat, 6 Oct 2012 06:47:23 +0200
Message-ID: <CAL58czp1vgr1XqchWcJQrXXna77oBsfagduwDzpac+Ygz3UC9w@mail.gmail.com>
To: public-multilingualweb-lt@w3.org
Hi Tadej, all,

thanks, Tadej. So I think that tool name and tool version would probably be
enough for text analytics annotation.

I am now wondering whether that information could be "packed" into a URI -
Yves made a proposal at

http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Oct/0035.html

the "text-analytics-agent-ref" attribute at
http://www.w3.org/International/multilingualweb/lt/wiki/images/b/b9/Disambiguation-prague-2012.pdf
(slide "output example")
is just a URI.

What do people think about this for "their" data categories? Declan, Phil? See
action-240, action-241 and action-242.

It also seems that asking implementations to create a URI is less
error-prone and less invasive than requiring separate fields.
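
To make the "packing" idea concrete, here is a minimal sketch of one way tool name and version could be folded into a single URI and recovered again. The base URI, the path layout and the function names are assumptions for illustration only; nothing in this thread or in the draft prescribes them.

```python
from typing import Tuple
from urllib.parse import quote, unquote, urlsplit

# Hypothetical base URI; the thread does not define one.
BASE = "http://example.org/tools/"


def pack_tool_uri(name: str, version: str, base: str = BASE) -> str:
    """Pack tool name and version into one URI (illustrative layout only)."""
    # Percent-encode both parts so spaces and "/" inside a name stay unambiguous.
    return f"{base}{quote(name, safe='')}/{quote(version, safe='')}"


def unpack_tool_uri(uri: str) -> Tuple[str, str]:
    """Recover (name, version) from a URI produced by pack_tool_uri."""
    # The last two path segments are the encoded name and version.
    name, version = urlsplit(uri).path.split("/")[-2:]
    return unquote(name), unquote(version)


uri = pack_tool_uri("Bing Translator", "123")
print(uri)
print(unpack_tool_uri(uri))
```

A consumer that only needs to know "same tool or not" can compare the URIs as opaque strings and never unpack them.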

Best,

Felix

2012/10/5 Tadej Štajner <tadej.stajner@ijs.si>

>  Hi, Yves, Felix,
> from the TA tool standpoint, the information present in toolName, toolVersion
> and toolAddInfo covers all use cases I've encountered so far with regard to
> figuring out what was generated where.
>
> I have a feeling that toolAddInfo risks becoming a 'kitchen
> sink' attribute, and I'd pre-empt that by prescribing what kind of things
> 'should' be there (for instance, for MT: language pairs, engine; for TA:
> inner engine, model parameters). Structuring the toolInfo data model even
> more is likely overkill, and this looks like a good sweet spot.
>
> -- Tadej
>
>
>
> On 02. 10. 2012 23:28, Felix Sasaki wrote:
>
> Hi Yves, all,
>
>  no opinion on my side on the delimiter topic, sorry for bringing it up.
> A comment on the tool specific aspect below.
>
> 2012/10/2 Yves Savourel <ysavourel@enlaso.com>
>
>> > <doc its:toolRefs="mtConfidence/file:///tools.xml#T1"
>> > xmlns:its="http://www.w3.org/2005/11/its">
>> >
>>  > Would it make sense to use a different delimiter? "/" may conflict
>> with "/" in paths.
>>
>>  Hmm... almost any ASCII delimiter may also be in the path. The first
>> occurrence is the delimiter.
>> But I suppose '|' could be used instead. It just doesn't look as graceful
>> for some reason.
>>
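
The "first occurrence is the delimiter" rule quoted above can be sketched as follows. This is only an illustration of the splitting behaviour under discussion; the function name is hypothetical, and the '|' variant is shown just because it was floated as an alternative.

```python
def split_tool_ref(value: str, delimiter: str = "/"):
    """Split 'dataCategory<delimiter>URI' at the first delimiter only."""
    # partition() cuts at the first occurrence, so later "/" characters
    # in the URI path are left untouched.
    category, sep, uri = value.partition(delimiter)
    if not sep or not category or not uri:
        raise ValueError(f"expected 'category{delimiter}uri', got {value!r}")
    return category, uri


print(split_tool_ref("mtConfidence/file:///tools.xml#T1"))
print(split_tool_ref("mtConfidence|file:///tools.xml#T1", delimiter="|"))
```

Both calls yield the same pair, which suggests the choice between "/" and "|" is about readability rather than parseability, as long as data category names never contain the delimiter.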
>>
>> > Do you need the "dataCategory" attribute? It seems the
>> > data category is made explicit via the reference mechanism in
>> "its:toolRefs".
>> > Also, dropping the "dataCategory" attribute allows then to refer to
>> > the same tools from various data categories - e.g. OKAPI used for
>> quality
>> > issue versus for creating translation metadata etc.
>>
>>  I'm not sure we can go from many data category instances to one tool
>> information. And this is where I'm having trouble with tool information:
>>
>> The mtConfidence needs to have a defined way to specify the engine used
>
>
>  Is there really a defined way? The current version of the draft at
>
> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#mtconfidence-implementation
> says:
>
>  "Some examples of values are:
> A BCP 47 language tag with t-extension, e.g. ja-t-it for an Italian to
> Japanese MT engine
> A Domain as per the Section 6.9: Domain
> A privately structured string, eg. Domain:IT-Pair:IT-JA, IT-JA:Medical,
> etc."
>
>  To me that is the same as saying: you can use anything. Of course we can
> wrap the "anything" in a field saying "here is MT engine information". Is
> that what you mean?
>
>
>
>> , the Text analysis may need something else
>
>
>  I actually doubt that the text analysis "anything" will be more
> specific. My prediction is that there will be no more interop than saying
> "in this field there is data category specific information: ...".
>
>  So you could achieve that by changing your proposal like this
>
> <its:processInfo>
>  <its:toolInfo xml:id="T1">
>   <its:toolName>Bing Translator</its:toolName>
>   <its:toolVersion>123</its:toolVersion>
>   <its:toolAddInfo datacategory="mtconfidence">ja-t-it</its:toolAddInfo>
>  </its:toolInfo>
>  <its:toolInfo xml:id="T2">
>   <its:toolName>myMT</its:toolName>
>   <its:toolVersion>456</its:toolVersion>
>   <its:toolAddInfo datacategory="mtconfidence">Domain:IT-Pair:IT-JA</its:toolAddInfo>
>  </its:toolInfo>
> </its:processInfo>
>
>
>  and allow for several toolAddInfo elements in one "toolInfo". You won't gain
> a lot from these, but no less than with "FR-to-EN-General" inside
> "toolValue" at
>
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Oct/0000.html
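
As a rough sketch of how a consumer might read such a structure, the snippet below looks up the per-data-category entries of one tool. It assumes a well-formed version of the proposed markup with the ITS namespace declared; the second toolAddInfo on T1 (data category "textanalysis" and its value) is invented here purely to show several entries on one tool, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

ITS = "http://www.w3.org/2005/11/its"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of xml:id

# Well-formed variant of the proposal; the "textanalysis" entry is hypothetical.
doc = """<its:processInfo xmlns:its="http://www.w3.org/2005/11/its">
 <its:toolInfo xml:id="T1">
  <its:toolName>Bing Translator</its:toolName>
  <its:toolVersion>123</its:toolVersion>
  <its:toolAddInfo datacategory="mtconfidence">ja-t-it</its:toolAddInfo>
  <its:toolAddInfo datacategory="textanalysis">hypothetical-model</its:toolAddInfo>
 </its:toolInfo>
 <its:toolInfo xml:id="T2">
  <its:toolName>myMT</its:toolName>
  <its:toolVersion>456</its:toolVersion>
  <its:toolAddInfo datacategory="mtconfidence">Domain:IT-Pair:IT-JA</its:toolAddInfo>
 </its:toolInfo>
</its:processInfo>"""

root = ET.fromstring(doc)


def tool_add_info(root, tool_id, category):
    """Return the toolAddInfo values of one tool for one data category."""
    for tool in root.findall(f"{{{ITS}}}toolInfo"):
        if tool.get(f"{{{XML_NS}}}id") == tool_id:
            return [
                info.text
                for info in tool.findall(f"{{{ITS}}}toolAddInfo")
                if info.get("datacategory") == category
            ]
    return []  # unknown tool id


print(tool_add_info(root, "T1", "mtconfidence"))
print(tool_add_info(root, "T2", "mtconfidence"))
```

The returned strings stay opaque to the consumer, which is the point made above: the wrapper says which data category the value belongs to, but not how to interpret it.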
>
>  Best,
>
>  Felix
>
>
>
>> , etc. It seems each data category will need one or two entries that mean
>> different things depending on the data category. We can use a common
>> element for this, but then we need to have one tool information per data
>> category.
>>
>> Maybe the examples people are working on (action items 239 to 243 for
>> Arle, Phil, Declan and Tadej) will help in defining this.
>>
>> Cheers
>> -yves
>>
>>
>>
>>
>
>
>  --
> Felix Sasaki
> DFKI / W3C Fellow
>
>
>


-- 
Felix Sasaki
DFKI / W3C Fellow
Received on Saturday, 6 October 2012 04:47:48 UTC
