- From: Felix Sasaki <fsasaki@w3.org>
- Date: Sat, 6 Oct 2012 06:47:23 +0200
- To: public-multilingualweb-lt@w3.org
- Message-ID: <CAL58czp1vgr1XqchWcJQrXXna77oBsfagduwDzpac+Ygz3UC9w@mail.gmail.com>
Hi Tadej, all,

thanks, Tadej. So I think that tool name and tool version would probably be enough for text analytics annotation. I am now wondering whether that information could be "packed" into a URI - Yves made a proposal at
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Oct/0035.html
and the "text-analytics-agent-ref" attribute at
http://www.w3.org/International/multilingualweb/lt/wiki/images/b/b9/Disambiguation-prague-2012.pdf
(slide "output example") is just a URI.

What do people think about this for "their" data categories? Declan, Phil? See action-240, action-241 and action-242.

It also seems that asking implementations to create a URI is less error-prone and less invasive than requiring separate fields.

Best,

Felix

2012/10/5 Tadej Štajner <tadej.stajner@ijs.si>

> Hi, Yves, Felix,
> from the TA tool standpoint, the information present in toolName, toolVersion
> and toolAddInfo covers all use cases I've encountered so far with regard to
> figuring out what was generated where.
>
> I have a feeling that toolAddInfo has the risk of becoming a 'kitchen
> sink' attribute, and I'd pre-empt that by prescribing what kind of things
> 'should' be there (for instance, for MT: language pairs, engine; for TA:
> inner engine, model parameters). Structuring the toolInfo data model even
> more is likely overkill, and this looks like a good sweet spot.
>
> -- Tadej
>
> On 02. 10. 2012 23:28, Felix Sasaki wrote:
>
> Hi Yves, all,
>
> no opinion on my side on the delimiter topic, sorry for bringing it up.
> A comment on the tool specific aspect below.
>
> 2012/10/2 Yves Savourel <ysavourel@enlaso.com>
>
>> > <doc its:toolRefs="mtConfidence/file:///tools.xml#T1"
>> > xmlns:its="http://www.w3.org/2005/11/its">
>> >
>> > Would it make sense to use a different delimiter? "/" may conflict
>> > with "/" in paths.
>>
>> Hmm... almost any ASCII delimiter may also be in the path. The first
>> occurrence is the delimiter.
>> But I suppose '|' could be used instead. It just doesn't look as graceful
>> for some reason.
>>
>> > Do you need the "dataCategory" attribute? It seems the
>> > data category is made explicit via the reference mechanism in
>> > "its:toolRefs".
>> > Also, dropping the "dataCategory" attribute then allows referring to
>> > the same tools from various data categories - e.g. OKAPI used for
>> > quality issues versus for creating translation metadata etc.
>>
>> I'm not sure we can go from many data category instances to one tool
>> information. And this is where I'm having trouble with tool information:
>>
>> The mtConfidence needs to have a defined way to specify the engine used
>
> Is there really a defined way? The current version of the draft at
> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#mtconfidence-implementation
> says:
>
> "Some examples of values are:
> A BCP 47 language tag with t-extension, e.g. ja-t-it for an Italian to
> Japanese MT engine
> A Domain as per the Section 6.9: Domain
> A privately structured string, eg. Domain:IT-Pair:IT-JA, IT-JA:Medical,
> etc."
>
> To me that is the same as saying: you can use anything. Of course we can
> wrap the "anything" in a field saying "here is MT engine information". Is
> that what you mean?
>
>> , the Text analysis may need something else
>
> I actually doubt that the text analysis "anything" will be more
> specific. My prediction is that there will be no more interop than saying
> "in this field there is data category specific information: ...".
>
> So you could achieve that by changing your proposal like this
>
> <its:processInfo>
>   <its:toolInfo xml:id="T1">
>     <its:toolName>Bing Translator</its:toolName>
>     <its:toolVersion>123</its:toolVersion>
>     <its:toolAddInfo datacategory="mtconfidence">ja-t-it</its:toolAddInfo>
>   </its:toolInfo>
>   <its:toolInfo xml:id="T2">
>     <its:toolName>myMT</its:toolName>
>     <its:toolVersion>456</its:toolVersion>
>     <its:toolAddInfo datacategory="mtconfidence">Domain:IT-Pair:IT-JA</its:toolAddInfo>
>   </its:toolInfo>
> </its:processInfo>
>
> and allow for several addInfo elements in one "toolInfo". You won't gain
> a lot from these, but no less than with "FR-to-EN-General" inside
> "toolValue" at
> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Oct/0000.html
>
> Best,
>
> Felix
>
>> , etc. It seems each data category will need one or two entries that mean
>> different things depending on the data category. We can use a common
>> element for this, but then we need to have one tool information per data
>> category.
>>
>> Maybe the examples people are working on (action items 239 to 243 for
>> Arle, Phil, Declan and Tadej) will help in defining this.
>>
>> Cheers
>> -yves
>
> --
> Felix Sasaki
> DFKI / W3C Fellow

--
Felix Sasaki
DFKI / W3C Fellow
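
P.S. To make the delimiter discussion in the quoted thread concrete, here is a rough Python sketch of the "first occurrence is the delimiter" rule for an its:toolRefs value. The function name split_tool_ref and the error handling are only assumptions for illustration and are not part of any draft; the sample value is the one from Yves's example.

```python
def split_tool_ref(value, delimiter="/"):
    """Split one toolRefs entry into (data_category, tool_uri).

    Only the FIRST occurrence of the delimiter is treated as the
    separator, so the URI part may itself contain the delimiter.
    """
    data_category, sep, tool_uri = value.partition(delimiter)
    if not sep or not data_category or not tool_uri:
        raise ValueError(
            f"expected '<dataCategory>{delimiter}<URI>', got {value!r}"
        )
    return data_category, tool_uri


# Value taken from the example in the thread.
category, uri = split_tool_ref("mtConfidence/file:///tools.xml#T1")
print(category)  # mtConfidence
print(uri)       # file:///tools.xml#T1
```

The same function works unchanged with '|' as the delimiter, which suggests the choice between "/" and "|" is mostly a matter of readability rather than parsing.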
Received on Saturday, 6 October 2012 04:47:48 UTC