Re: [All] domain data category section proposal, please review

Hi Declan, Felix,
I'm not sure we are talking about 'precedence' in the same way here.

What Felix is referring to is the precedence rule for processing the 
rule elements within an its:rules element, namely that rules defined in 
later child elements completely override earlier ones.

So if we modify the example in the draft to include multiple domainRule elements:

<its:rules
   xmlns:its="http://www.w3.org/2005/11/its"  version="2.0">
  <its:domainRule selector="/html/body" domainPointer="/html/head/meta[@name='DC.subject']/@content"
    domainMapping="automotive auto, medical medicine, 'criminal law' law, 'property law' law"/>
  <its:domainRule selector="/html/body" domainPointer="/html/head/meta[@name='DC.subject']/@content"
    domainMapping="automotive auto, medical medicine"/>
</its:rules>

This means that the semantics of the first rule, mapping the local 
document meta tags to the 'law' domain tag used by the consumer tool, 
would not apply at all, i.e. the second rule completely replaces the 
first, even if it is less specific, as in this case.
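
To make the complete-override behaviour concrete, here is a rough Python 
sketch of how a consumer tool might resolve the two-rule example above. 
It is only an illustration under my own assumptions: the function names 
are made up, rules with the same selector string are treated as selecting 
the same nodes, and the mapping is split naively on commas (so quoted 
values containing commas are not handled).

```python
import shlex

def parse_domain_mapping(mapping):
    """Parse a domainMapping string into {local value: consumer value}."""
    pairs = {}
    for item in mapping.split(","):
        parts = shlex.split(item)  # honours single- or double-quoted values
        if len(parts) == 2:
            pairs[parts[0]] = parts[1]
    return pairs

def effective_mapping(rules):
    """rules: list of (selector, domainMapping) pairs in document order.

    The last rule for a selector replaces earlier ones outright;
    nothing from the earlier rule is merged in.
    """
    effective = {}
    for selector, mapping in rules:
        effective[selector] = parse_domain_mapping(mapping)  # replace, not merge
    return effective

rules = [
    ("/html/body",
     "automotive auto, medical medicine, 'criminal law' law, 'property law' law"),
    ("/html/body", "automotive auto, medical medicine"),
]
result = effective_mapping(rules)
print(result["/html/body"])
```

With these two rules the 'criminal law' and 'property law' entries are 
gone from the effective mapping, which is the point made above.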

However, my understanding from Declan is that he's referring to how 
domain annotations usually have some precedence in importance, i.e. the 
text is usually assumed to be largely in one domain (hence parallel text 
and language model training data is drawn mostly from that domain), and 
other domains are added to indicate further sources from which training 
data might be drawn to get better coverage of this particular input. 
Have I got that right, Declan?

So I don't think the ITS rule precedence helps with Declan's 
domain precedence issue, and the data category as currently defined 
doesn't support communication of domain precedence to the consumer tool.

Now, we already indicate in the note for the domain data category that 
the consumer tool may choose to ignore, or make its own decision about, 
the importance of the domains specified, for instance based on the 
relative volume of content annotated with a domain tag. So we could just 
rely on that to handle relative domain significance, though that doesn't 
help in cases where the meta-data tags pointed to are all in the header. 
But I'm not sure how common a use case that is.

However, if we _do_ want to have a means of communicating the relative 
significance of domains, a couple of options for doing this might be:
A)
Add a new optional attribute 'domainPrecedence' which contains one or 
more of the local tags that match the selector and that are considered 
the 'primary' domain of the document, but without the order of those 
provided being significant (essentially providing a two-tier domain 
precedence annotation):

<its:rules
   xmlns:its="http://www.w3.org/2005/11/its"  version="2.0">
  <its:domainRule selector="/html/body" domainPointer="/html/head/meta[@name='DC.subject']/@content"
    domainPrecedence="criminal law, medical"
    domainMapping="automotive auto, medical medicine, 'criminal law' law, 'property law' law"/>
</its:rules>
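
As a rough idea of what a consumer tool would do with option A, here is 
a small Python sketch. Everything in it is an assumption for discussion: 
'domainPrecedence' is only the attribute proposed here, its value is 
taken to be a plain comma-separated list of local tags, and the function 
name and sample document tags are invented.

```python
def tier_domains(local_values, mapping, precedence):
    """Split a document's domains into (primary, secondary) consumer domains.

    local_values: domain tags found in the document via domainPointer
    mapping:      {local tag: consumer tag}, as given by domainMapping
    precedence:   comma-separated local tags (hypothetical domainPrecedence)
    """
    primary_local = {v.strip() for v in precedence.split(",") if v.strip()}
    primary, secondary = [], []
    # First pass: anything listed as primary, mapped to the consumer tag.
    for value in local_values:
        if value in primary_local:
            target = mapping.get(value, value)
            if target not in primary:
                primary.append(target)
    # Second pass: everything else, unless it already counts as primary.
    for value in local_values:
        target = mapping.get(value, value)
        if target not in primary and target not in secondary:
            secondary.append(target)
    return primary, secondary

mapping = {
    "automotive": "auto",
    "medical": "medicine",
    "criminal law": "law",
    "property law": "law",
}
document_tags = ["automotive", "criminal law", "medical", "property law"]
primary, secondary = tier_domains(document_tags, mapping, "criminal law, medical")
print(primary, secondary)
```

Note that 'property law' ends up in the primary tier here because it 
maps to the same consumer tag as 'criminal law'; whether that is the 
desired behaviour is one of the things we would need to decide.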

B)
Overload the domainMapping so that the order also represents the 
significance of the domain. This is a bit messy, however, since we would 
need to accommodate local annotations that may be more significant but 
don't require a mapping (e.g. by omitting the RHS of the pair or 
repeating the LHS).
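
For comparison, a Python sketch of how option B might be read by a 
consumer tool, again purely illustrative. It assumes the 'omit the RHS' 
variant, i.e. a lone value maps to itself, and that earlier entries are 
more significant; the function name and the sample mapping string are 
made up.

```python
import shlex

def ranked_domains(mapping):
    """Return consumer domains in order of significance (first = highest).

    A pair 'local consumer' contributes the consumer tag; a lone value
    with no RHS is assumed to need no mapping and contributes itself.
    """
    ranked = []
    for item in mapping.split(","):
        parts = shlex.split(item)  # honours quoted values with spaces
        if not parts:
            continue
        target = parts[1] if len(parts) > 1 else parts[0]
        if target not in ranked:
            ranked.append(target)
    return ranked

order = ranked_domains(
    "'criminal law' law, medical medicine, cardiology, automotive auto"
)
print(order)
```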

I'm just laying out some options. I think we need a steer from the MT 
guys on whether the need to convey (rather than calculate based on 
volume) the relative significance of domains is a common enough use 
case to be accommodated by such a change and, of course, implemented.

cheers,
Dave


On 23/07/2012 15:24, Declan Groves wrote:
> Hi Dave, Felix,
>
> Good solution to the issue of precedence, which is extremely important 
> to specify.
>
> For MT this is an important issue and it is worth noting that for most 
> MT providers, the higher-level domain category would generally be the 
> most preferable (i.e. it is far more likely for an MT provider to have 
> a system available for the "medical" domain, than for the sub-domain 
> "cardiology" for example), but it is important to provide the flexibility.
>
> Declan
>
>
> On 23 July 2012 12:08, Felix Sasaki <fsasaki@w3.org> wrote:
>
>     Hi Dave,
>
>     2012/7/19 Dave Lewis <dave.lewis@cs.tcd.ie>
>
>         Felix,
>         thanks for the explanation, that's clear now. But yes, perhaps
>         we could make the override semantics a bit clearer using your
>         wording, as the question of partial application of a
>         data category attribute to an element may be raised with any
>         such data category with this sort of 'set valued' attribute.
>
>         So after: "In case of conflicts between global selections via
>         multiple rule
>         <http://www.w3.org/TR/2012/WD-its20-20120626/#selection-global> elements,
>         the last selector has higher precedence."
>         include:
>         "Override semantics are always complete, that is all
>         information that is specified in one rule element is
>         overridden by the next one."
>
>
>
>     Very good idea. I just would add above sentence to the note
>
>     "The precedence order fulfills the same purpose as the built-in
>     template rules of [XSLT 1.0]."
>
>     So that we avoid changing normative text. I just talked to Arle,
>     he is just doing some editing and will have a look.
>
>     Best,
>
>     Felix
>
>
>         cheers,
>         Dave
>
>
>         On 11/07/2012 08:06, Felix Sasaki wrote:
>>         Hi Dave,
>>
>>         the override semantics are always complete, that is: all
>>         information that is specified in one "rule" element is
>>         overridden by the next one. See
>>         http://www.w3.org/TR/2012/WD-its20-20120626/#selection-precedence
>>         "In case of conflicts between global selections via multiple
>>         rule
>>         <http://www.w3.org/TR/2012/WD-its20-20120626/#selection-global> elements,
>>         the last selector has higher precedence."
>>
>>         So there are no "rule type" specific semantics of overriding:
>>         the metadata of the previous rule is just not taken into
>>         account.
>>
>>         Do we think we should make this clearer at
>>         http://www.w3.org/TR/2012/WD-its20-20120626/#selection-precedence
>>         ?
>>
>>         2012/7/11 Dave Lewis <dave.lewis@cs.tcd.ie>
>>
>>             Yves,
>>             This sounds sensible, but it got me thinking: what are
>>             the override semantics between a domain rule that just
>>             specifies a source meta-data selector and one that
>>             subsequently specifies a mapping? Should we specify that
>>             the consumer tool takes the RHS value of the mapping
>>             rather than the LHS?
>>
>>             Also, this got me thinking. The selector in the example
>>             would select all meta data, regardless of whether it's
>>             useful for translation domains or not. Can we specify
>>             that more specific meta-data selectors should NOT be used
>>             as a domain indicator?
>>
>>
>>         We can say "select only the first 'meta' element", or select
>>         only the ones which do not have a specific "scheme" attribute, e.g.
>>
>>         <its:domainRule selector="/html/body"
>>         domainPointer="/html/head/meta[@name='DC.subject' and
>>         not(starts-with(@scheme,'DC'))]/@content"/>
>>         Best,
>>
>>         Felix
>>
>>
>>             Regards,
>>             Dave
>>
>>
>>             On 10/07/2012 13:44, Yves Savourel wrote:
>>
>>                 Hi Felix, Dave, all,
>>
>>                 One more question on Domain:
>>
>>                 There is no "Default, inheritance, overriding of data
>>                 category" for Domain:
>>                 http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#datacategories-defaults-etc
>>
>>                 I assume:
>>
>>                 - Default is none
>>                 - Inheritance is "Textual content of element,
>>                 including attributes and child elements"
>>                 - Overriding is Yes
>>
>>                 Just like locNote. Is that correct?
>>
>>                 Thanks,
>>                 -yves
>>
>>
>>
>>
>>                 -----Original Message-----
>>                 From: Yves Savourel [mailto:ysavourel@enlaso.com]
>>                 Sent: Tuesday, July 10, 2012 10:48 AM
>>                 To: public-multilingualweb-lt@w3.org
>>                 Subject: Re: [All] domain data category section
>>                 proposal, please review
>>
>>                 Hi Felix, Dave, all,
>>
>>                 I'm working on implementing the Domain data category.
>>                 And I have a clarification question:
>>
>>                 My understanding is that for the domainMapping
>>                 attribute, the left part of the pair is unique within
>>                 the mapping. And several left parts can map to a
>>                 single right part. Is that correct?
>>
>>                 That is, we could have:
>>
>>                 domainMapping="automotive auto, medical medicine,
>>                 'criminal law' law, 'property law' law"
>>
>>
>>                 Note for the editors:
>>
>>                 By the way, the current definition in the draft is
>>                 not very specific on which part is in the document
>>                 and which part is not. I know it's rather logical,
>>                 but it may be more clear to say so explicitly, rather
>>                 than just in the example?
>>
>>                 Also in "The values may contain spaces; in that case
>>                 they MUST be delimited by quotation marks." Maybe
>>                 stating explicitly which characters can be used as
>>                 quotation marks would be more clear? The example uses
>>                 single quotes, but I assume double ones are also OK
>>                 (any other?).
>>
>>                 thanks,
>>                 -yves
>>
>>
>>
>>
>>
>>
>>
>>
>>         -- 
>>         Felix Sasaki
>>         DFKI / W3C Fellow
>>
>
>
>
>
>     -- 
>     Felix Sasaki
>     DFKI / W3C Fellow
>
>
>
>
> -- 
> Dr. Declan Groves
> Research Integration Officer
> Centre for Next Generation Localisation (CNGL)
> Dublin City University
>
> email: dgroves@computing.dcu.ie
> phone: +353 (0)1 700 6906

Received on Tuesday, 24 July 2012 11:12:51 UTC