- From: Yves Savourel via cvs-syncmail <cvsmail@w3.org>
- Date: Wed, 10 Oct 2012 18:22:19 +0000
- To: public-multilingualweb-lt-commits@w3.org
Update of /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20
In directory hutz:/tmp/cvs-serv11585
Modified Files:
its20.html its20.odd
Log Message:
Added algorithm for Domain value.
Added wording about the dot-all assumption for Allowed Characters.
Index: its20.odd
===================================================================
RCS file: /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20/its20.odd,v
retrieving revision 1.176
retrieving revision 1.177
diff -u -d -r1.176 -r1.177
--- its20.odd 10 Oct 2012 12:03:10 -0000 1.176
+++ its20.odd 10 Oct 2012 18:22:17 -0000 1.177
@@ -2812,20 +2812,23 @@
<head>Domain</head>
<div xml:id="domain-definition">
<head>Definition</head>
- <p>The <ref target="#domain">Domain</ref> data category is used to identify
- the domain of content.</p>
+ <p>The <ref target="#domain">Domain</ref> data category is used to identify the topic or subject of the content.
+ Such information allows more relevant linguistic choices to be made during various processes.</p>
+ <p>Examples of usage include:</p>
+ <list type="unordered">
+ <item>Allowing machine translation systems to select the most appropriate engine and rules to translate the content.</item>
+ <item>Providing a general indication of what terminology collection should be used by a translator.</item>
+ </list>
<p>This data category addresses various challenges:</p>
<list type="unordered">
- <item>Often domain related information in content does exist, e.g.
- keywords in the HTML <code>meta</code> element. The <ref
- target="#domain">Domain</ref> data category addresses this by
- providing a mechanism to point to this information.</item>
+ <item>Often domain-related information already exists in the document (e.g.
+ keywords in the HTML <code>meta</code> element). The <ref
+ target="#domain">Domain</ref> data category provides a mechanism to point to this information.</item>
<item>There are many flat or structured lists of domain related values,
- keywords, key phrases, classification codes, ontologies. The <ref
- target="#domain">Domain</ref> data category does not propose a
- given list; rather it provides a mapping mechanism to associate
- values in content with consumer tool specific values needed for
- processing domain information.</item>
+ keywords, key phrases, classification codes, ontologies, etc. The <ref
+ target="#domain">Domain</ref> data category does not propose its own
+ list. Instead, it provides a mapping mechanism to associate
+ the values in the document with the values used by the consumer tool.</item>
</list>
</div>
<div xml:id="domain-implementation">
@@ -2835,6 +2838,60 @@
target="#def-inheritance">inherits</ref> to the textual content of
the element, <emph>including</emph> child elements and attributes. There
is no default.</p>
+
+ <p>The information provided by this data category is a comma-separated list of one or more values which is obtained by applying the following algorithm:</p>
+ <list type="ordered">
+ <item>Set the initial value of the resulting string to an empty string.</item>
+ <item>Get the list of nodes resulting from the evaluation of the <att>domainPointer</att> attribute.</item>
+ <item>For each node:
+ <list type="ordered">
+ <item>If the node value contains a COMMA (U+002C):
+ <list type="ordered">
+ <item>Split the node value into separate strings using the COMMA (U+002C) as separator.</item>
+ <item>For each string:
+ <list type="ordered">
+ <item>Trim the leading and trailing white spaces of the string.</item>
+ <item>Check if there is a mapping for the string:
+ <list type="ordered">
+ <item>If one is found:
+ <list type="ordered" >
+ <item>Add the corresponding value to the result string.</item>
+ </list>
+ </item>
+ <item>Otherwise (if no mapping is found):
+ <list type="ordered" >
+ <item>Add the string to the result string.</item>
+ </list>
+ </item>
+ </list>
+ </item>
+ </list>
+ </item>
+ </list>
+ </item>
+ <item>If the node value does not contain a COMMA (U+002C):
+ <list type="ordered">
+ <item>Trim the leading and trailing white spaces of the string.</item>
+ <item>Check if there is a mapping for the string:
+ <list type="ordered">
+ <item>If one is found:
+ <list type="ordered" >
+ <item>Add the corresponding value to the result string.</item>
+ </list>
+ </item>
+ <item>Otherwise (if no mapping is found):
+ <list type="ordered" >
+ <item>Add the string to the result string.</item>
+ </list>
+ </item>
+ </list>
+ </item>
+ </list>
+ </item>
+ </list>
+ </item>
+ <item>Return the resulting string.</item>
+ </list>
<p xml:id="domain-global">GLOBAL: The <gi>domainRule</gi> element contains
the following:</p>
@@ -4163,7 +4220,8 @@
<p>The regular expression is a character class construct as defined in the
section <ref target="http://www.w3.org/TR/xmlschema-2/#charcter-classes"
>Character Classes</ref> of XML Schema <ptr target="#xmlschema2"
- type="bibref"/>.</p>
+ type="bibref"/>, with the assumption that the <code>.</code> metacharacter also matches CARRIAGE RETURN (U+000D) and LINE FEED (U+000A).
+ That is, with the <emph>dot-all</emph> option set.</p>
<p>Example of expressions (shown as XML source):</p>
<list type="unordered">
<item><code>[abc]</code> : allows the characters 'a', 'b' and
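
Note on the dot-all wording in the last hunk above: the added sentence means that "." is expected to match line breaks as well. A minimal sketch of that behaviour follows, using Python's re module as a stand-in. Python's regular expression syntax is not the XML Schema one, and the helper name is_allowed is hypothetical, so this only illustrates the dot-all assumption and is not a conformant Allowed Characters check.

```python
import re


def is_allowed(text: str, char_class: str) -> bool:
    """Return True if every character of text matches char_class.

    char_class is a single-character expression such as "[a-c]" or ".".
    Assumption: Python regex syntax is close enough to the XML Schema
    character class syntax for simple expressions like these.
    """
    # re.DOTALL makes "." match line breaks too, which mirrors the
    # dot-all assumption added to the draft.
    pattern = re.compile(f"(?:{char_class})*", re.DOTALL)
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    print(is_allowed("ab\r\ncd", "."))   # line breaks accepted by "."
    print(is_allowed("abd", "[a-c]"))    # 'd' is outside the class
```

Without the dot-all option, a value containing a line break would be rejected by an expression as permissive as ".", which is presumably what the added wording is meant to prevent.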
Index: its20.html
===================================================================
RCS file: /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20/its20.html,v
retrieving revision 1.179
retrieving revision 1.180
diff -u -d -r1.179 -r1.180
--- its20.html 10 Oct 2012 12:03:10 -0000 1.179
+++ its20.html 10 Oct 2012 18:22:17 -0000 1.180
@@ -125,7 +125,7 @@
</div>
</div>
<div class="toc1">7 <a href="#html5-markup" shape="rect">Using ITS Markup in HTML5</a><div class="toc2">7.1 <a href="#html5-local-attributes" shape="rect">Mapping of Local Data Categories to HTML5</a></div>
-<div class="toc2">7.2 <a href="#d3e7499" shape="rect">Inline Global Rules in HTML5</a></div>
+<div class="toc2">7.2 <a href="#d3e7590" shape="rect">Inline Global Rules in HTML5</a></div>
</div>
<div class="toc1">8 <a href="#xhtml5-markup" shape="rect">Using ITS Markup in XHTML</a></div>
</div>
@@ -1929,18 +1929,26 @@
</body>
</html></pre></div><p>[Source file: <a href="examples/html5/EX-within-text-local-html5-1.html" shape="rect">examples/html5/EX-within-text-local-html5-1.html</a>]</p></div></div></div><div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="domain" id="domain" shape="rect"/>6.9 Domain</h3><div class="div3">
-<h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="domain-definition" id="domain-definition" shape="rect"/>6.9.1 Definition</h4><p>The <a href="#domain" shape="rect">Domain</a> data category is used to identify
- the domain of content.</p><p>This data category addresses various challenges:</p><ul><li><p>Often domain related information in content does exist, e.g.
- keywords in the HTML <code>meta</code> element. The <a href="#domain" shape="rect">Domain</a> data category addresses this by
- providing a mechanism to point to this information.</p></li><li><p>There are many flat or structured lists of domain related values,
- keywords, key phrases, classification codes, ontologies. The <a href="#domain" shape="rect">Domain</a> data category does not propose a
- given list; rather it provides a mapping mechanism to associate
- values in content with consumer tool specific values needed for
- processing domain information.</p></li></ul></div><div class="div3">
+<h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="domain-definition" id="domain-definition" shape="rect"/>6.9.1 Definition</h4><p>The <a href="#domain" shape="rect">Domain</a> data category is used to identify the topic or subject of the content.
+ Such information allows more relevant linguistic choices to be made during various processes.</p><p>Examples of usage include:</p><ul><li><p>Allowing machine translation systems to select the most appropriate engine and rules to translate the content.</p></li><li><p>Providing a general indication of what terminology collection should be used by a translator.</p></li></ul><p>This data category addresses various challenges:</p><ul><li><p>Often domain-related information already exists in the document (e.g.
+ keywords in the HTML <code>meta</code> element). The <a href="#domain" shape="rect">Domain</a> data category provides a mechanism to point to this information.</p></li><li><p>There are many flat or structured lists of domain related values,
+ keywords, key phrases, classification codes, ontologies, etc. The <a href="#domain" shape="rect">Domain</a> data category does not propose its own
+ list. Instead, it provides a mapping mechanism to associate
+ the values in the document with the values used by the consumer tool.</p></li></ul></div><div class="div3">
<h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="domain-implementation" id="domain-implementation" shape="rect"/>6.9.2 Implementation</h4><p>The <a href="#domain" shape="rect">Domain</a> data category can be expressed
only with global rules. For elements, the data category information <a href="#def-inheritance" shape="rect">inherits</a> to the textual content of
the element, <em>including</em> child elements and attributes. There
- is no default.</p><p id="domain-global">GLOBAL: The <code>domainRule</code> element contains
+ is no default.</p><p>The information provided by this data category is a comma-separated list of one or more values which is obtained by applying the following algorithm:</p><ol class="depth1"><li><p>Set the initial value of the resulting string to an empty string.</p></li><li><p>Get the list of nodes resulting from the evaluation of the <code>domainPointer</code> attribute.</p></li><li><p>For each node:
+ </p><ol class="depth2"><li><p>If the node value contains a COMMA (U+002C):
+ </p><ol class="depth3"><li><p>Split the node value into separate strings using the COMMA (U+002C) as separator.</p></li><li><p>For each string:
+ </p><ol class="depth4"><li><p>Trim the leading and trailing white spaces of the string.</p></li><li><p>Check if there is a mapping for the string:
+ </p><ol class="depth5"><li><p>If one is found:
+ </p><ol class="depth1"><li><p>Add the corresponding value to the result string.</p></li></ol><p/></li><li><p>Otherwise (if no mapping is found):
+ </p><ol class="depth1"><li><p>Add the string to the result string.</p></li></ol><p/></li></ol><p/></li></ol><p/></li></ol><p/></li><li><p>If the node value does not contain a COMMA (U+002C):
+ </p><ol class="depth3"><li><p>Trim the leading and trailing white spaces of the string.</p></li><li><p>Check if there is a mapping for the string:
+ </p><ol class="depth4"><li><p>If one is found:
+ </p><ol class="depth5"><li><p>Add the corresponding value to the result string.</p></li></ol><p/></li><li><p>Otherwise (if no mapping is found):
+ </p><ol class="depth5"><li><p>Add the string to the result string.</p></li></ol><p/></li></ol><p/></li></ol><p/></li></ol><p/></li><li><p>Return the resulting string.</p></li></ol><p id="domain-global">GLOBAL: The <code>domainRule</code> element contains
the following:</p><ul><li><p>A required <code>selector</code> attribute. It contains an <a href="#selectors" shape="rect">absolute selector</a> which selects the
nodes to which this rule applies.</p></li><li><p>A required <code>domainPointer</code> attribute that contains a <a href="#selectors" shape="rect">relative selector</a> pointing to a node
that contains the domain information.</p></li><li><p>An optional <code>domainMapping</code> attribute that contains a
@@ -2026,8 +2034,7 @@
levels. For instance, the level of lexical concepts disambiguates
individual word surface forms, the level of ontology concepts
disambiguates into deeper semantics, and the entity disambiguation
- works on the level of concrete instances. For instance, the word
- "<span class="quote">City</span>" in "<span class="quote">I am going to the City</span>" may
+ works on the level of concrete instances. For instance, the word "<span class="quote">City</span>" in "<span class="quote">I am going to the City</span>" may
be disambiguated in one of the WordNet synsets that can be
represented by "<span class="quote">city</span>", an RDF ontology concept of a
City that could represent a subclass of a PopulatedPlace, or the
@@ -3047,7 +3054,8 @@
login name in a content.</p></li></ul><p>The set of characters that are allowed is specified using a regular
expression. That is, each character in the selected content <a href="#rfc-keywords" shape="rect">MUST</a> be included in the set specified
by the regular expression.</p><p>The regular expression is a character class construct as defined in the
- section <a href="http://www.w3.org/TR/xmlschema-2/#charcter-classes" shape="rect">Character Classes</a> of XML Schema <a title="XML
								Schema Part 2: Datatypes Second Edition" href="#xmlschema2" shape="rect">[XML Schema Part 2]</a>.</p><p>Example of expressions (shown as XML source):</p><ul><li><p><code>[abc]</code> : allows the characters 'a', 'b' and
+ section <a href="http://www.w3.org/TR/xmlschema-2/#charcter-classes" shape="rect">Character Classes</a> of XML Schema <a title="XML
								Schema Part 2: Datatypes Second Edition" href="#xmlschema2" shape="rect">[XML Schema Part 2]</a>, with the assumption that the <code>.</code> metacharacter also matches CARRIAGE RETURN (U+000D) and LINE FEED (U+000A).
+ That is, with the <em>dot-all</em> option set.</p><p>Example of expressions (shown as XML source):</p><ul><li><p><code>[abc]</code> : allows the characters 'a', 'b' and
'c'.</p></li><li><p><code>[a-c]</code> : allows the characters 'a', 'b' and
'c'.</p></li><li><p><code>[a-zA-Z]</code> : allows the characters from 'a' to 'z' and
from 'A' to 'Z'.</p></li><li><p><code>[^[abc]</code> : allows any characters except 'a', 'b', and
@@ -3235,7 +3243,7 @@
</html></pre></div><p>[Source file: <a href="examples/html5/EX-storageSize-html5-local-1.html" shape="rect">examples/html5/EX-storageSize-html5-local-1.html</a>]</p></div></div></div></div><div class="div1">
<h2><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="html5-markup" id="html5-markup" shape="rect"/>7 Using ITS Markup in HTML5</h2><div class="div2">
<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="html5-local-attributes" id="html5-local-attributes" shape="rect"/>7.1 Mapping of Local Data Categories to HTML5</h3><span class="editor-note">[Ed. note: camelCase -> its-*; special mapping of @lang, @translate and @dir]</span><span class="editor-note">[Ed. note: Case sensitivity]</span></div><div class="div2">
-<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="d3e7499" id="d3e7499" shape="rect"/>7.2 Inline Global Rules in HTML5</h3><span class="editor-note">[Ed. note: Constraints on using rules inside script]</span></div></div><div class="div1">
+<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="d3e7590" id="d3e7590" shape="rect"/>7.2 Inline Global Rules in HTML5</h3><span class="editor-note">[Ed. note: Constraints on using rules inside script]</span></div></div><div class="div1">
<h2><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="xhtml5-markup" id="xhtml5-markup" shape="rect"/>8 Using ITS Markup in XHTML</h2><span class="editor-note">[Ed. note: Guidance about using camelCase/its-camel-case w/respect to DOM representation and consistency with HTML parsing]</span><span class="editor-note">[Ed. note: Guidance about inline global rules]</span></div></div><div class="back"><div class="div1">
<h2><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="normative-references" id="normative-references" shape="rect"/>A References</h2><dl><dt class="label"><a name="bcp47" id="bcp47" shape="rect"/>BCP47</dt><dd>Addison Phillips, Mark Davis. <a href="http://www.rfc-editor.org/rfc/bcp/bcp47.txt" shape="rect"><cite>Tags for
Identifying Languages</cite></a>, September 2009. Available at <a href="http://www.rfc-editor.org/rfc/bcp/bcp47.txt" shape="rect">
@@ -3571,7 +3579,7 @@
<em>This section is informative.</em>
</p><p>Several constraints of ITS markup cannot be validated with ITS schemas. The
following <a title="Rule-based validation
							-- Schematron" href="#schematron" shape="rect">[Schematron]</a> document allows for
- validating some of these constraints.</p><div class="exampleOuter"><div class="exampleHeader"><a name="d3e8492" id="d3e8492" shape="rect"/>Example 94: Testing constraints in ITS markup</div><div class="exampleInner"><pre xml:space="preserve">
+ validating some of these constraints.</p><div class="exampleOuter"><div class="exampleHeader"><a name="d3e8583" id="d3e8583" shape="rect"/>Example 94: Testing constraints in ITS markup</div><div class="exampleInner"><pre xml:space="preserve">
<sch:schema
xmlns:sch="http://www.ascc.net/xml/schematron" >
<!-- Schematron document to test constraints for global and local ITS markup.
@@ -3639,7 +3647,7 @@
</p><p>The following <a title="Namespace-based Validation
							Dispatching Language (NVDL)" href="#nvdl" shape="rect">[NVDL]</a> document allows validation of
ITS markup which has been added to a host vocabulary. Only ITS elements and
attributes are checked. Elements and attributes of host language are ignored
- during validation against this NVDL document/schema.</p><div class="exampleOuter"><div class="exampleHeader"><a name="d3e8514" id="d3e8514" shape="rect"/>Example 95: NVDL schema for ITS</div><div class="exampleInner"><pre xml:space="preserve">
+ during validation against this NVDL document/schema.</p><div class="exampleOuter"><div class="exampleHeader"><a name="d3e8605" id="d3e8605" shape="rect"/>Example 95: NVDL schema for ITS</div><div class="exampleInner"><pre xml:space="preserve">
<nvdl:rules
xmlns:nvdl="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" >
<nvdl:namespace ns="http://www.w3.org/2005/11/its">
Received on Wednesday, 10 October 2012 18:22:24 UTC
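
Note on the Domain value algorithm added in this commit: the steps can be sketched roughly as below. This is a minimal illustration and not part of the draft. The function name domain_value, the use of a plain dict for the mapping, and the ", " separator between values are assumptions; the draft only says that values are added to the result string. Because splitting a string that contains no comma yields a single piece, the two branches of step 3 collapse into one loop here.

```python
from typing import Iterable, Mapping, Optional


def domain_value(node_values: Iterable[str],
                 mapping: Optional[Mapping[str, str]] = None) -> str:
    """Build the comma-separated Domain value from selected node values.

    node_values: string values of the nodes selected by domainPointer.
    mapping: hypothetical dict form of the domainMapping pairs.
    """
    mapping = mapping or {}
    parts = []
    for value in node_values:
        # Split on COMMA (U+002C); a value without a comma gives one piece.
        for piece in value.split(","):
            # Trim leading and trailing whitespace.
            piece = piece.strip()
            # Use the mapped value if a mapping exists, else the string itself.
            parts.append(mapping.get(piece, piece))
    # Assumption: values in the result are joined with ", ".
    return ", ".join(parts)


if __name__ == "__main__":
    keywords = ["automotive, Medical ", " law "]
    print(domain_value(keywords, {"automotive": "auto", "law": "legal"}))
```

The sketch follows the steps literally, so an empty piece (for example from a trailing comma) is kept as an empty value; the draft text does not say whether such pieces should be dropped.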