Re: [All] domain data category section proposal, please review

Hi Yves, all,

Thanks for your comments. I tried to implement the comments at
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jul/0112.html
via edits at
http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#domain
The CVS diff is below:

--- WWW/International/multilingualweb/lt/drafts/its20/its20.odd 2012/07/07 11:52:03       1.25

+++ WWW/International/multilingualweb/lt/drafts/its20/its20.odd 2012/07/10 15:03:51       1.26

@@ -2718,7 +2718,7 @@ spotted coats.</code></p>

             <list type="unordered">

               <item>A required <att>selector</att> attribute. It contains
an XPath expression which selects the nodes to which this rule
applies.</item>

               <item>A required <att>domainPointer</att> attribute that
contains a relative XPath expression pointing to a node that contains the
domain information.</item>

-              <item>An optional <att>domainMapping</att> attribute that
contains a comma separated list of mappings between values in the content
and workflow specific values. The values may contain spaces; in that case
they <ref target="#rfc-keywords">MUST</ref> be delimited by quotation
marks.</item>

+              <item>An optional <att>domainMapping</att> attribute that
contains a comma separated list of mappings between values in the content
and workflow specific values. The left part of the pair is part of the
source content and unique within the mapping. The right part of the mapping
belongs to the workflow. Several left parts can map to a single right part.
The values in the left or the right part of the mapping may contain spaces;
in that case they <ref target="#rfc-keywords">MUST</ref> be delimited by
quotation marks, that is pairs of APOSTROPHE (Unicode code point U+0027) or
QUOTATION MARK (U+0022).</item>

             </list>

             <note>

               <p>Although the <att>domainMapping</att> attribute is
optional, its usage is recommended. Many commercial machine translation
systems use their own domain definitions; the <att>domainMapping</att>
attribute will foster interoperability between these definitions and
metadata items like <code>DC.subject</code> in Web pages or other types of
content.</p>

@@ -3135,4 +3135,4 @@ documents with ITS markup.</p>

       </div>

     </back>

   </text>

-</TEI><!-- timestamp $Id: its20.odd,v 1.25 2012/07/07 11:52:03 fsasaki Exp
$ -->

+</TEI>
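
To illustrate how I read the new wording, here is a minimal Python sketch of
a parser for a domainMapping value. It is only an illustration, not part of
the draft. The function name parse_domain_mapping is hypothetical, and I am
assuming that the left and right part of each pair are separated by
whitespace, which the text above does not state explicitly.

```python
import re

# One token: a value in APOSTROPHEs, a value in QUOTATION MARKs,
# a bare value without spaces, or the comma that separates pairs.
_TOKEN = re.compile(r"""\s*(?:'([^']*)'|"([^"]*)"|([^\s,'"]+)|(,))""")


def parse_domain_mapping(value):
    """Return {content value: workflow value} for a domainMapping string.

    Assumption: left and right part of a pair are separated by whitespace.
    """
    mapping = {}
    pair = []

    def close_pair():
        if len(pair) != 2:
            raise ValueError(f"expected a left and a right part, got {pair!r}")
        left, right = pair
        # The left part must be unique; several left parts may
        # share the same right part.
        if left in mapping:
            raise ValueError(f"left part {left!r} is not unique")
        mapping[left] = right
        pair.clear()

    text = value.strip()
    pos = 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"cannot parse mapping at offset {pos}")
        pos = match.end()
        if match.group(4):  # a comma ends the current pair
            close_pair()
        else:
            # Exactly one of the three value groups has matched.
            pair.append(next(g for g in match.groups()[:3] if g is not None))
    if pair:
        close_pair()
    return mapping
```

For example, parse_domain_mapping("automotive auto, 'criminal law' law,
'property law' law") maps both "criminal law" and "property law" to "law",
while a value that repeats a left part is rejected.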

Does that address your comments?

Regarding your questions at
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jul/0115.html
Yes, you are right, and I have updated the table at
http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#datacategories-defaults-etc

Regarding your question below:

2012/7/10 Yves Savourel <ysavourel@enlaso.com>

> Hi Felix, Dave, all,
>
> Sorry, one more question related to the implementation of Domain:
>
> I was looking for examples and ran into this DocBook one:
>
> <article xmlns='http://docbook.org/ns/docbook'>
>  <info>
>   <title>Example of subjectset</title>
>   <subjectset scheme="libraryofcongress">
>    <subject>
>     <subjectterm>Electronic Publishing</subjectterm>
>    </subject>
>    <subject>
>     <subjectterm>SGML (Computer program language)</subjectterm>
>    </subject>
>   </subjectset>
>  </info>
>  <para>Text of the document</para>
> </article>
>
> Where they explain that the //subjectset/subjectterm element indicates the
> DC subject (so it falls into our domain data category). See
> http://www.docbook.org/tdg5/publishers/5.1b3/en/html/ch02.html#ch-gsxml.3.8
>
> As you can see, there are actually two entries in the example, so two
> domains.
> The question is: Can we have more than one domain associated with a
> content?
>


I don't know.



>
> Just wondering what the implications are for the tools downstream like MT.
>

I don't know either. We very likely need feedback from Thomas and Declan
on this. My feeling is that whatever solution we take, it might lead
to a best practice about how to make domain information in source content
"digestible" for MT or other downstream tools.
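
To make the question concrete, here is a small Python sketch (my own
illustration, not anything from the draft) that pulls the subject terms out
of your DocBook example with the standard xml.etree.ElementTree module. The
function name domain_values and the path expression are assumptions chosen
for this example. It shows that a pointer to the subject terms yields two
values for one document, so a downstream tool has to decide what to do with
more than one.

```python
import xml.etree.ElementTree as ET

# The DocBook example from the quoted mail.
DOCBOOK = """<article xmlns='http://docbook.org/ns/docbook'>
 <info>
  <title>Example of subjectset</title>
  <subjectset scheme="libraryofcongress">
   <subject>
    <subjectterm>Electronic Publishing</subjectterm>
   </subject>
   <subject>
    <subjectterm>SGML (Computer program language)</subjectterm>
   </subject>
  </subjectset>
 </info>
 <para>Text of the document</para>
</article>"""

# Prefix-to-namespace map for the path expression below.
NS = {"db": "http://docbook.org/ns/docbook"}


def domain_values(xml_text):
    """Return the text of every subjectterm in document order."""
    root = ET.fromstring(xml_text)
    terms = root.findall(".//db:subjectset/db:subject/db:subjectterm", NS)
    return [term.text.strip() for term in terms]


if __name__ == "__main__":
    for value in domain_values(DOCBOOK):
        print(value)
```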

Best,

Felix


>
> If the answer is 'no', then how do we know which one to use? Do we just
> leave that decision to the author (i.e. s/he is responsible for providing
> only a mapping to a single entry per document)?
>
> Or do we provide some kind of default behavior, like: the first or last
> one wins?
>
> Thanks,
> -yves
>
>
>
>


-- 
Felix Sasaki
DFKI / W3C Fellow

Received on Tuesday, 10 July 2012 15:16:24 UTC