Re: [All] domain data category section proposal, please review

Hi Dave,

Thanks for the clarification re. precedence!

> However, my understanding from Declan is that he's referring to how domain
> annotations usually have some precedence in importance, i.e. the text is
> usually assumed to be largely in one domain (hence parallel text and
> language model training data are drawn mostly from that domain), and other
> domains are added to indicate further sources from which training data might
> be drawn to get better coverage of this particular input. Have I got that
> right, Declan?

Yes - that's what I'm referring to.

> So I don't think the ITS rule precedence helps with Declan's domain
> precedence issue, and the data category as currently defined doesn't
> support communication of domain precedence to the consumer tool.
> Now, we already indicate in the note for the domain data category that the
> consumer tool may choose to ignore, or make its own decision about, the
> importance of the domains specified, for instance based on the relative
> volume of content annotated with a domain tag. So we could just rely on
> that to handle relative domain significance, though that doesn't help in
> cases where the metadata the pointers refer to is all in the header, but I'm
> not sure how common that use case is.

I would have assumed (although I must note my relative lack of experience
with processing this type of metadata!) that a particular document to be
translated would more than likely be of a single domain, which is why I had
assumed that the metadata tags would be primarily in the header. If smaller
pieces of content were annotated with domain information, then yes, allowing
the tool to make the decision using frequency/volume statistics, as you
suggested, would be the best way to go. Domain precedence could also be based
on ordering within an existing ontology.
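
As a rough illustration of the volume-based approach, a consumer tool could
rank domains by how much content is annotated with each. This is a minimal
Python sketch, assuming the tool has already resolved every annotated node to
a (domain, text) pair and uses word count as its measure of volume; the
function name is hypothetical and not part of ITS.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank_domains_by_volume(annotated: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Rank domains by the number of words of content annotated with each.

    `annotated` holds one (domain, text) pair per annotated node (an assumed
    input shape). Ties are broken by domain name so the order is deterministic.
    """
    counts: Counter = Counter()
    for domain, text in annotated:
        counts[domain] += len(text.split())  # word count as a proxy for volume
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    nodes = [
        ("medical", "The patient was admitted with acute symptoms."),
        ("criminal law", "The defendant was charged."),
        ("medical", "Dosage was adjusted after review."),
    ]
    for domain, words in rank_domains_by_volume(nodes):
        print(domain, words)
```

A real tool might weight by characters or segments instead of words; the
point is only that relative significance can be derived rather than declared.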

> However, if we _do_ want to have a means of communicating the relative
> significance of domains, a couple of options might be:
> A)
> add a new optional attribute 'domainPrecedence' which contains one or more
> of the local domain values that apply to the selected nodes and are
> considered the 'primary' domains of the document, without the order in
> which they are provided being significant (essentially providing a two-tier
> domain precedence annotation):
> <its:rules version="2.0" xmlns:its="http://www.w3.org/2005/11/its">
> <its:domainRule selector="/html/body" domainPointer="/html/head/meta[@name='DC.subject']/@content"
>    domainPrecedence="criminal law, medical"
>    domainMapping="automotive auto, medical medicine, 'criminal law' law, 'property law' law"/>
> </its:rules>
> B)
> Overload the domainMapping so that the order also represents the
> significance of the domains. This is a bit messy, however, since we would
> need to accommodate local annotations that may be more significant but don't
> require a mapping (e.g. by omitting the RHS of the pair or repeating the
> LHS).
> I'm just laying out some options. I think we need a steer from the MT guys
> on whether the need to convey (rather than calculate based on volume) the
> relative significance of domains is a common enough use case to be
> accommodated by such a change and, of course, implemented.

Option A) would be preferable, I think, and is a great suggestion, as it
would allow local tools to augment the domainPrecedence fairly easily where
necessary.
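
To make option A) a little more concrete, here is a rough Python sketch of how
a consumer tool might combine domainMapping with the proposed domainPrecedence
attribute to get the two tiers. It assumes, hypothetically, that
domainPrecedence uses the same comma-separated syntax as domainMapping (with
optional '...' or "..." quoting) and that the tool has already collected the
domain values via domainPointer. The function names are illustrative only,
and domainPrecedence is just the proposal above, not part of the data
category as currently defined.

```python
from typing import Dict, List, Optional, Tuple

QUOTES = "'\""


def split_items(attr: str) -> List[str]:
    """Split on commas that are not inside '...' or "..." quotes."""
    items: List[str] = []
    current: List[str] = []
    quote: Optional[str] = None
    for ch in attr:
        if quote:
            if ch == quote:
                quote = None
            current.append(ch)
        elif ch in QUOTES:
            quote = ch
            current.append(ch)
        elif ch == ",":
            items.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    items.append("".join(current).strip())
    return [item for item in items if item]


def tokenize(item: str) -> List[str]:
    """Split on whitespace, keeping quoted runs (quotes removed) as one token."""
    tokens: List[str] = []
    i = 0
    while i < len(item):
        ch = item[i]
        if ch.isspace():
            i += 1
        elif ch in QUOTES:
            end = item.index(ch, i + 1)  # ValueError if the quote is unterminated
            tokens.append(item[i + 1:end])
            i = end + 1
        else:
            start = i
            while i < len(item) and not item[i].isspace():
                i += 1
            tokens.append(item[start:i])
    return tokens


def parse_mapping(attr: str) -> Dict[str, str]:
    """Parse domainMapping into {source value: consumer tool value}."""
    mapping: Dict[str, str] = {}
    for item in split_items(attr):
        tokens = tokenize(item)
        if len(tokens) != 2:
            raise ValueError(f"expected a 'source target' pair, got {item!r}")
        mapping[tokens[0]] = tokens[1]
    return mapping


def parse_precedence(attr: str) -> List[str]:
    """Parse the hypothetical domainPrecedence list; quoting is optional."""
    values = []
    for item in split_items(attr):
        if len(item) >= 2 and item[0] == item[-1] and item[0] in QUOTES:
            item = item[1:-1]
        values.append(item)
    return values


def two_tier(domains: List[str], mapping_attr: str, precedence_attr: str) -> Tuple[List[str], List[str]]:
    """Return (primary, secondary) consumer tool domains, in first-seen order."""
    mapping = parse_mapping(mapping_attr)
    primary_src = set(parse_precedence(precedence_attr))
    primary: List[str] = []
    secondary: List[str] = []
    for value in domains:
        target = mapping.get(value, value)  # unmapped values pass through as-is
        if value in primary_src:
            if target in secondary:  # promote a target already seen as secondary
                secondary.remove(target)
            if target not in primary:
                primary.append(target)
        elif target not in primary and target not in secondary:
            secondary.append(target)
    return primary, secondary
```

With the values from the example rule, two_tier(["automotive", "medical",
"criminal law", "property law"], ...) would put "medicine" and "law" in the
primary tier and "auto" in the secondary one. A mapped value such as "law" is
treated as primary if any source value that maps to it is primary.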

I'd be interested to hear if any of the other MT users/providers have a
view on this.


Dr. Declan Groves
Research Integration Officer
Centre for Next Generation Localisation (CNGL)
Dublin City University

email: <>
phone: +353 (0)1 700 6906

Received on Thursday, 26 July 2012 11:38:27 UTC