RE: [All] domain data category section proposal, please review

From: Yves Savourel <ysavourel@enlaso.com>
Date: Mon, 30 Jul 2012 13:14:40 +0200
To: 'Yves Savourel' <ysavourel@enlaso.com>, 'Felix Sasaki' <fsasaki@w3.org>
CC: <public-multilingualweb-lt@w3.org>, "'Lieske, Christian'" <christian.lieske@sap.com>
Message-ID: <assp.05586169c3.assp.05581120a7.009b01cd6e44$84ded540$8e9c7fc0$@com>
Hi Felix, (CCing Christian to keep him in the loop),

Looking further into this:

I can create a custom element that would use the DC subject property to store the ITS information extracted with the Domain rules.

But I don't think I should use a single attribute that would allow a list of domain values, like myNS:domain="epub, CompLang", because then I couldn't use the Domain rule to map that output to the Domain data category (since domainPointer would point to an item with several values rather than a single one).
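
To illustrate the concern, here is a rough sketch of the single-attribute variant together with a rule one might write to read it back. The myNS namespace URI is only a placeholder, and the xlf prefix is simply bound to the XLIFF 1.2 namespace:

```xml
<trans-unit id="1" xmlns:myNS="urn:example:myNS" myNS:domain="epub, CompLang">
 <source xml:lang="en-us">Example of subjectset</source>
</trans-unit>

<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0"
 xmlns:xlf="urn:oasis:names:tc:xliff:document:1.2"
 xmlns:myNS="urn:example:myNS">
 <!-- The pointer selects one attribute node whose string value is "epub, CompLang" -->
 <its:domainRule selector="//xlf:trans-unit" domainPointer="@myNS:domain"/>
</its:rules>
```

With domainPointer read as pointing to a single value, the result would be the one string "epub, CompLang" rather than two separate domains.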

And I think changing the definition of domainPointer to allow a list rather than a single value in the pointed node would not work, because the local representations may contain things that are incompatible with a list.

So, I suppose using something like the following would be ok:

<trans-unit id="1">
 <source xml:lang="en-us">Example of subjectset</source>
 <okp:itsDomains xmlns:dc="http://purl.org/dc/elements/1.1/">
  <okp:item dc:subject="epub"/>
  <okp:item dc:subject="CompLang"/>
 </okp:itsDomains>
</trans-unit>

- it allows several values
- it can be mapped back to Domain using domainRule
- I think (but you should correct me) that it uses DC subject in a way that works.
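
For illustration, a rule along the following lines could map the elements above back to the Domain data category. This is only a sketch: the okp namespace URI is a placeholder (the example above does not declare one), and the xlf prefix is simply bound to the XLIFF 1.2 namespace.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0"
 xmlns:xlf="urn:oasis:names:tc:xliff:document:1.2"
 xmlns:okp="urn:example:okp"
 xmlns:dc="http://purl.org/dc/elements/1.1/">
 <!-- Each okp:item carries one dc:subject attribute, so every pointed node holds a single value -->
 <its:domainRule selector="//xlf:trans-unit"
  domainPointer="okp:itsDomains/okp:item/@dc:subject"/>
</its:rules>
```

The pointer returns one attribute node per okp:item, which matches the way the DocBook example returns one node per subjectterm.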

-ys


-----Original Message-----
From: Yves Savourel [mailto:ysavourel@enlaso.com] 
Sent: Monday, July 30, 2012 6:28 AM
To: 'Felix Sasaki'
Cc: 'public-multilingualweb-lt@w3.org'
Subject: RE: [All] domain data category section proposal, please review

Hi Felix,

> would it help to resolve this by saying "if there are several values 
> for 'Domain' available in the source content, applications SHOULD 
> concatenate their string values by comma &#x2C;".

Actually the problem I mentioned doesn't exist: I didn't read the specification well enough. We do not have a local representation for Domain, so I cannot use its:domain because it doesn't exist.

This means I'll have to come up with some other notation for XLIFF that can be mapped using its:domainPointer.

-ys



BTW, if we go that way, we would have to change 


From: Felix Sasaki [mailto:fsasaki@w3.org]
Sent: Monday, July 30, 2012 5:42 AM
To: Yves Savourel
Cc: public-multilingualweb-lt@w3.org
Subject: Re: [All] domain data category section proposal, please review

Hi Yves, all,

would it help to resolve this by saying "if there are several values for 'Domain' available in the source content, applications SHOULD concatenate their string values by comma &#x2C;".

I know you said we shouldn't do anything about this, but having authored documents in DocBook and a few other technical documentation formats, I have the impression that the order of the "Domain" values in such formats is not always significant. So if you just take the first one, or leave it to the applications to decide which one to take, some useful information might be lost.

Felix
2012/7/29 Yves Savourel <ysavourel@enlaso.com>

Hi Felix, Dave, Declan, all,

Just a minor note:

While implementing Domain I ran into the interesting case of an its:domain limitation. I don't think we should do anything about it, but it may be worth noting.

When looking for an XML example I turned to DocBook, and looked for the elements equivalent to the DC:subject property. The first example I found used <subjectterm> for the domain value, and you could have several of them. So we end up with a possible case like this:

<article xmlns='http://docbook.org/ns/docbook'>
 <info>
  <its:rules xmlns:its="http://www.w3.org/2005/11/its" version="2.0"
   xmlns:d="http://docbook.org/ns/docbook">
   <its:translateRule selector="//d:subjectset" translate="no"/>
   <its:domainRule selector="/d:article"
    domainPointer="d:info/d:subjectset/d:subject/d:subjectterm"
    domainMapping="'Electronic Publishing' epub, 'SGML (Computer program language)' CompLang"
   />
  </its:rules>
  <title>Example of subjectset</title>
  <subjectset scheme="libraryofcongress">
   <subject>
    <subjectterm>Electronic Publishing</subjectterm>
   </subject>
   <subject>
    <subjectterm>SGML (Computer program language)</subjectterm>
   </subject>
  </subjectset>
 </info>
 <para>Text of the document</para>
</article>

No problem, we just implement the ITS processor so that it handles the result as a list of domain values rather than a single value per selected node. So I've done that.

Then I tried to implement Domain in XLIFF 1.2. We can use non-XLIFF attributes in the <trans-unit> element, so it is logical to just use its:domain to store the domain extracted from the original document there.

The problem: its:domain holds a single value and we have several values to pass on. So either we just pass on a single value, or we have to use a non-ITS element or attribute to hold a list. I chose to just pass on the first value.

<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
 <file original="/EX-domain-docbook.xml" source-language="en-us" target-language="fr-fr" datatype="xml">
  <body>
   <trans-unit id="1" xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0" its:domain="epub">
    <source xml:lang="en-us">Example of subjectset</source>
    <target xml:lang="fr-fr">Example of subjectset</target>
   </trans-unit>
   <trans-unit id="2" xmlns:its="http://www.w3.org/2005/11/its" its:version="2.0" its:domain="epub">
    <source xml:lang="en-us">Text of the document</source>
    <target xml:lang="fr-fr">Text of the document</target>
   </trans-unit>
  </body>
 </file>
</xliff>

My guess is that multiple domain values for a single selected node will not be frequent enough to outweigh the benefit of using the ITS attribute directly. Hopefully that will do.

Cheers,
-yves






--
Felix Sasaki
DFKI / W3C Fellow
Received on Monday, 30 July 2012 11:15:24 UTC
