- From: Yves Savourel via cvs-syncmail <cvsmail@w3.org>
- Date: Wed, 10 Oct 2012 18:22:19 +0000
- To: public-multilingualweb-lt-commits@w3.org
Update of /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20
In directory hutz:/tmp/cvs-serv11585

Modified Files:
	its20.html its20.odd
Log Message:
Added algorithm for Domain value.
Added wording about the dot-all assumption for Allowed Characters.

Index: its20.odd
===================================================================
RCS file: /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20/its20.odd,v
retrieving revision 1.176
retrieving revision 1.177
diff -u -d -r1.176 -r1.177
--- its20.odd 10 Oct 2012 12:03:10 -0000 1.176
+++ its20.odd 10 Oct 2012 18:22:17 -0000 1.177
@@ -2812,20 +2812,23 @@
 <head>Domain</head>
 <div xml:id="domain-definition">
 <head>Definition</head>
- <p>The <ref target="#domain">Domain</ref> data category is used to identify
- the domain of content.</p>
+ <p>The <ref target="#domain">Domain</ref> data category is used to identify the topic or subject of a given content.
+ Such information allows to make more relevant lingusitic choices during various processes.</p>
+ <p>Examples of usage include:</p>
+ <list type="unordered">
+ <item>Allowing machine translation systems to select the most appropriate engine and rules to translate the content.</item>
+ <item>Providing a general indication of what terminology collection should be used by a translator.</item>
+ </list>
 <p>This data category addresses various challenges:</p>
 <list type="unordered">
- <item>Often domain related information in content does exist, e.g.
- keywords in the HTML <code>meta</code> element. The <ref
- target="#domain">Domain</ref> data category addresses this by
- providing a mechanism to point to this information.</item>
+ <item>Often domain-related information already exist in the document (e.g.
+ keywords in the HTML <code>meta</code> element). The <ref
+ target="#domain">Domain</ref> data category provides a mechanism to point to this information.</item>
 <item>There are many flat or structured lists of domain related values,
- keywords, key phrases, classification codes, ontologies. The <ref
- target="#domain">Domain</ref> data category does not propose a
- given list; rather it provides a mapping mechanism to associate
- values in content with consumer tool specific values needed for
- processing domain information.</item>
+ keywords, key phrases, classification codes, ontologies, etc. The <ref
+ target="#domain">Domain</ref> data category does not propose its own
+ given list. Instead it provides a mapping mechanism to associate
+ the values in the document with the values used by the consumer tool.</item>
 </list>
 </div>
 <div xml:id="domain-implementation">
@@ -2835,6 +2838,60 @@
 target="#def-inheritance">inherits</ref> to the textual content of the
 element, <emph>including</emph> child elements and attributes. There
 is no default.</p>
+
+ <p>The information provided by this data category is a comma-separated list of one or more values which is obtained by applying the following algorithm:</p>
+ <list type="ordered">
+ <item>Set the initial value of the resulting string as a empty string.</item>
+ <item>Get the list of nodes resulting of the evaluation of the <att>domainPointer</att> attribute.</item>
+ <item>For each node:
+ <list type="ordered">
+ <item>If the node value contains a COMMA (U+002C):
+ <list type="ordered">
+ <item>Split the node value into separate strings using the COMMA (U+002C) as separator.</item>
+ <item>For each string:
+ <list type="ordered">
+ <item>Trim the leading and trailing white spaces of the string.</item>
+ <item>Check if there is a mapping for the string:
+ <list type="ordered">
+ <item>If one if found:
+ <list type="ordered" >
+ <item>Add the corresponding value to the result string.</item>
+ </list>
+ </item>
+ <item>Otherwise (if no mapping is found):
+ <list type="ordered" >
+ <item>Add the string to the result string.</item>
+ </list>
+ </item>
+ </list>
+ </item>
+ </list>
+ </item>
+ </list>
+ </item>
+ <item>If the node value does not contain a COMMA (U+002C)):
+ <list type="ordered">
+ <item>Trim the leading and trailing white spaces of the string.</item>
+ <item>Check if there is a mapping for the string:
+ <list type="ordered">
+ <item>If one if found:
+ <list type="ordered" >
+ <item>Add the corresponding value to the result string.</item>
+ </list>
+ </item>
+ <item>Otherwise (if no mapping is found):
+ <list type="ordered" >
+ <item>Add the string to the result string.</item>
+ </list>
+ </item>
+ </list>
+ </item>
+ </list>
+ </item>
+ </list>
+ </item>
+ <item>Return the resulting string.</item>
+ </list>
 <p xml:id="domain-global">GLOBAL: The <gi>domainRule</gi> element contains
 the following:</p>
@@ -4163,7 +4220,8 @@
 <p>The regular expression is a character class construct as defined in the
 section <ref target="http://www.w3.org/TR/xmlschema-2/#charcter-classes"
 >Character Classes</ref> of XML Schema <ptr target="#xmlschema2"
- type="bibref"/>.</p>
+ type="bibref"/>, with the assumption that the <code>.</code> metacharacter matches also CARRIAGE RETURN (U+000D) and LINE FEED (U+000F).
+ That is with the <emph>dot-all</emph> option set.</p>
 <p>Example of expressions (shown as XML source):</p>
 <list type="unordered">
 <item><code>[abc]</code> : allows the characters 'a', 'b' and

Index: its20.html
===================================================================
RCS file: /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20/its20.html,v
retrieving revision 1.179
retrieving revision 1.180
diff -u -d -r1.179 -r1.180
--- its20.html 10 Oct 2012 12:03:10 -0000 1.179
+++ its20.html 10 Oct 2012 18:22:17 -0000 1.180
@@ -125,7 +125,7 @@
 </div>
 </div>
 <div class="toc1">7 <a href="#html5-markup" shape="rect">Using ITS Markup in HTML5</a><div class="toc2">7.1 <a href="#html5-local-attributes" shape="rect">Mapping of Local Data Categories to HTML5</a></div>
-<div class="toc2">7.2 <a href="#d3e7499" shape="rect">Inline Global Rules in HTML5</a></div>
+<div class="toc2">7.2 <a href="#d3e7590" shape="rect">Inline Global Rules in HTML5</a></div>
 </div>
 <div class="toc1">8 <a href="#xhtml5-markup" shape="rect">Using ITS Markup in XHTML</a></div>
 </div>
@@ -1929,18 +1929,26 @@
 </body>
 </html></pre></div><p>[Source file: <a href="examples/html5/EX-within-text-local-html5-1.html" shape="rect">examples/html5/EX-within-text-local-html5-1.html</a>]</p></div></div></div><div class="div2">
 <h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="domain" id="domain" shape="rect"/>6.9 Domain</h3><div class="div3">
-<h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents."
alt="Go to the table of contents."/></a><a name="domain-definition" id="domain-definition" shape="rect"/>6.9.1 Definition</h4><p>The <a href="#domain" shape="rect">Domain</a> data category is used to identify - the domain of content.</p><p>This data category addresses various challenges:</p><ul><li><p>Often domain related information in content does exist, e.g. - keywords in the HTML <code>meta</code> element. The <a href="#domain" shape="rect">Domain</a> data category addresses this by - providing a mechanism to point to this information.</p></li><li><p>There are many flat or structured lists of domain related values, - keywords, key phrases, classification codes, ontologies. The <a href="#domain" shape="rect">Domain</a> data category does not propose a - given list; rather it provides a mapping mechanism to associate - values in content with consumer tool specific values needed for - processing domain information.</p></li></ul></div><div class="div3"> +<h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="domain-definition" id="domain-definition" shape="rect"/>6.9.1 Definition</h4><p>The <a href="#domain" shape="rect">Domain</a> data category is used to identify the topic or subject of a given content. + Such information allows to make more relevant lingusitic choices during various processes.</p><p>Examples of usage include:</p><ul><li><p>Allowing machine translation systems to select the most appropriate engine and rules to translate the content.</p></li><li><p>Providing a general indication of what terminology collection should be used by a translator.</p></li></ul><p>This data category addresses various challenges:</p><ul><li><p>Often domain-related information already exist in the document (e.g. + keywords in the HTML <code>meta</code> element). 
The <a href="#domain" shape="rect">Domain</a> data category provides a mechanism to point to this information.</p></li><li><p>There are many flat or structured lists of domain related values, + keywords, key phrases, classification codes, ontologies, etc. The <a href="#domain" shape="rect">Domain</a> data category does not propose its own + given list. Instead it provides a mapping mechanism to associate + the values in the document with the values used by the consumer tool.</p></li></ul></div><div class="div3"> <h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="domain-implementation" id="domain-implementation" shape="rect"/>6.9.2 Implementation</h4><p>The <a href="#domain" shape="rect">Domain</a> data category can be expressed only with global rules. For elements, the data category information <a href="#def-inheritance" shape="rect">inherits</a> to the textual content of the element, <em>including</em> child elements and attributes. 
There - is no default.</p><p id="domain-global">GLOBAL: The <code>domainRule</code> element contains + is no default.</p><p>The information provided by this data category is a comma-separated list of one or more values which is obtained by applying the following algorithm:</p><ol class="depth1"><li><p>Set the initial value of the resulting string as a empty string.</p></li><li><p>Get the list of nodes resulting of the evaluation of the <code>domainPointer</code> attribute.</p></li><li><p>For each node: + </p><ol class="depth2"><li><p>If the node value contains a COMMA (U+002C): + </p><ol class="depth3"><li><p>Split the node value into separate strings using the COMMA (U+002C) as separator.</p></li><li><p>For each string: + </p><ol class="depth4"><li><p>Trim the leading and trailing white spaces of the string.</p></li><li><p>Check if there is a mapping for the string: + </p><ol class="depth5"><li><p>If one if found: + </p><ol class="depth1"><li><p>Add the corresponding value to the result string.</p></li></ol><p/></li><li><p>Otherwise (if no mapping is found): + </p><ol class="depth1"><li><p>Add the string to the result string.</p></li></ol><p/></li></ol><p/></li></ol><p/></li></ol><p/></li><li><p>If the node value does not contain a COMMA (U+002C)): + </p><ol class="depth3"><li><p>Trim the leading and trailing white spaces of the string.</p></li><li><p>Check if there is a mapping for the string: + </p><ol class="depth4"><li><p>If one if found: + </p><ol class="depth5"><li><p>Add the corresponding value to the result string.</p></li></ol><p/></li><li><p>Otherwise (if no mapping is found): + </p><ol class="depth5"><li><p>Add the string to the result string.</p></li></ol><p/></li></ol><p/></li></ol><p/></li></ol><p/></li><li><p>Return the resulting string.</p></li></ol><p id="domain-global">GLOBAL: The <code>domainRule</code> element contains the following:</p><ul><li><p>A required <code>selector</code> attribute. 
It contains an <a href="#selectors" shape="rect">absolute selector</a> which selects the nodes to which this rule applies.</p></li><li><p>A required <code>domainPointer</code> attribute that contains a <a href="#selectors" shape="rect">relative selector</a> pointing to a node that contains the domain information.</p></li><li><p>An optional <code>domainMapping</code> attribute that contains a @@ -2026,8 +2034,7 @@ levels. For instance, the level of lexical concepts disambiguates individual word surface forms, the level of ontology concepts disambiguates into deeper semantics, and the entity disambiguation - works on the level of concrete instances. For instance, the word - "<span class="quote">City</span>" in "<span class="quote">I am going to the City</span>" may + works on the level of concrete instances. For instance, the word"<span class="quote">City</span>" in "<span class="quote">I am going to the City</span>" may be disambiguated in one of the WordNet synsets that can be represented by "<span class="quote">city</span>", an RDF ontology concept of a City that could represent a subclass of a PopulatedPlace, or the @@ -3047,7 +3054,8 @@ login name in a content.</p></li></ul><p>The set of characters that are allowed is specified using a regular expression. That is, each character in the selected content <a href="#rfc-keywords" shape="rect">MUST</a> be included in the set specified by the regular expression.</p><p>The regular expression is a character class construct as defined in the - section <a href="http://www.w3.org/TR/xmlschema-2/#charcter-classes" shape="rect">Character Classes</a> of XML Schema <a title="XML
								Schema Part 2: Datatypes Second Edition" href="#xmlschema2" shape="rect">[XML Schema Part 2]</a>.</p><p>Example of expressions (shown as XML source):</p><ul><li><p><code>[abc]</code> : allows the characters 'a', 'b' and + section <a href="http://www.w3.org/TR/xmlschema-2/#charcter-classes" shape="rect">Character Classes</a> of XML Schema <a title="XML
								Schema Part 2: Datatypes Second Edition" href="#xmlschema2" shape="rect">[XML Schema Part 2]</a>, with the assumption that the <code>.</code> metacharacter matches also CARRIAGE RETURN (U+000D) and LINE FEED (U+000F). + That is with the <em>dot-all</em> option set.</p><p>Example of expressions (shown as XML source):</p><ul><li><p><code>[abc]</code> : allows the characters 'a', 'b' and 'c'.</p></li><li><p><code>[a-c]</code> : allows the characters 'a', 'b' and 'c'.</p></li><li><p><code>[a-zA-Z]</code> : allows the characters from 'a' to 'z' and from 'A' to 'Z'.</p></li><li><p><code>[^[abc]</code> : allows any characters except 'a', 'b', and @@ -3235,7 +3243,7 @@ </html></pre></div><p>[Source file: <a href="examples/html5/EX-storageSize-html5-local-1.html" shape="rect">examples/html5/EX-storageSize-html5-local-1.html</a>]</p></div></div></div></div><div class="div1"> <h2><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="html5-markup" id="html5-markup" shape="rect"/>7 Using ITS Markup in HTML5</h2><div class="div2"> <h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="html5-local-attributes" id="html5-local-attributes" shape="rect"/>7.1 Mapping of Local Data Categories to HTML5</h3><span class="editor-note">[Ed. note: camelCase -> its-*; special mapping of @lang, @translate and @dir]</span><span class="editor-note">[Ed. note: Case sensitivity]</span></div><div class="div2"> -<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="d3e7499" id="d3e7499" shape="rect"/>7.2 Inline Global Rules in HTML5</h3><span class="editor-note">[Ed. 
note: Constraints on using rules inside script]</span></div></div><div class="div1"> +<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="d3e7590" id="d3e7590" shape="rect"/>7.2 Inline Global Rules in HTML5</h3><span class="editor-note">[Ed. note: Constraints on using rules inside script]</span></div></div><div class="div1"> <h2><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="xhtml5-markup" id="xhtml5-markup" shape="rect"/>8 Using ITS Markup in XHTML</h2><span class="editor-note">[Ed. note: Guidance about using camelCase/its-camel-case w/respect to DOM representation and consistency with HTML parsing]</span><span class="editor-note">[Ed. note: Guidance about inline global rules]</span></div></div><div class="back"><div class="div1"> <h2><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="normative-references" id="normative-references" shape="rect"/>A References</h2><dl><dt class="label"><a name="bcp47" id="bcp47" shape="rect"/>BCP47</dt><dd>Addison Phillips, Mark Davis. <a href="http://www.rfc-editor.org/rfc/bcp/bcp47.txt" shape="rect"><cite>Tags for Identifying Languages</cite></a>, September 2009. Available at <a href="http://www.rfc-editor.org/rfc/bcp/bcp47.txt" shape="rect"> @@ -3571,7 +3579,7 @@ <em>This section is informative.</em> </p><p>Several constraints of ITS markup cannot be validated with ITS schemas. The following <a title="Rule-based validation
							-- Schematron" href="#schematron" shape="rect">[Schematron]</a> document allows for - validating some of these constraints.</p><div class="exampleOuter"><div class="exampleHeader"><a name="d3e8492" id="d3e8492" shape="rect"/>Example 94: Testing constraints in ITS markup</div><div class="exampleInner"><pre xml:space="preserve"> + validating some of these constraints.</p><div class="exampleOuter"><div class="exampleHeader"><a name="d3e8583" id="d3e8583" shape="rect"/>Example 94: Testing constraints in ITS markup</div><div class="exampleInner"><pre xml:space="preserve"> <sch:schema xmlns:sch="http://www.ascc.net/xml/schematron" > <!-- Schematron document to test constraints for global and local ITS markup. @@ -3639,7 +3647,7 @@ </p><p>The following <a title="Namespace-based Validation
							Dispatching Language (NVDL)" href="#nvdl" shape="rect">[NVDL]</a> document allows validation of ITS markup which has been added to a host vocabulary. Only ITS elements and attributes are checked. Elements and attributes of host language are ignored - during validation against this NVDL document/schema.</p><div class="exampleOuter"><div class="exampleHeader"><a name="d3e8514" id="d3e8514" shape="rect"/>Example 95: NVDL schema for ITS</div><div class="exampleInner"><pre xml:space="preserve"> + during validation against this NVDL document/schema.</p><div class="exampleOuter"><div class="exampleHeader"><a name="d3e8605" id="d3e8605" shape="rect"/>Example 95: NVDL schema for ITS</div><div class="exampleInner"><pre xml:space="preserve"> <nvdl:rules xmlns:nvdl="http://purl.oclc.org/dsdl/nvdl/ns/structure/1.0" > <nvdl:namespace ns="http://www.w3.org/2005/11/its">
Received on Wednesday, 10 October 2012 18:22:24 UTC
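
For readers who want to see the Domain value algorithm from this commit in executable form, here is a minimal Python sketch. It is not part of the commit and not a reference implementation. The function name `resolve_domains`, the representation of the selected nodes as plain strings, the mapping as a dictionary with exact-match lookup, and the ", " separator used to join the result are all assumptions made for illustration; evaluating the `domainPointer` selector and parsing `domainMapping` are left out. The sketch folds the two branches of the algorithm into one, since splitting a value that contains no COMMA yields a single string that is then trimmed and mapped the same way.

```python
from typing import Iterable, Mapping, Optional


def resolve_domains(node_values: Iterable[str],
                    mapping: Optional[Mapping[str, str]] = None) -> str:
    """Build the comma-separated Domain value from selected node values.

    node_values: string values of the nodes selected by domainPointer
                 (hypothetical input; selector evaluation is not shown).
    mapping:     source value -> consumer-tool value (hypothetical form of
                 the domainMapping information; exact-match lookup assumed).
    """
    mapping = mapping or {}
    result = []
    for value in node_values:
        # Split on COMMA (U+002C). A value without a comma gives a
        # one-item list, which covers the "no COMMA" branch as well.
        for part in value.split(","):
            # Trim leading and trailing white space.
            part = part.strip()
            # Use the mapped value if there is one, else the string itself.
            result.append(mapping.get(part, part))
    # Joining with ", " is an assumption; the text only says
    # "comma-separated list".
    return ", ".join(result)


if __name__ == "__main__":
    print(resolve_domains(["automotive, law", " medical "], {"law": "legal"}))
```

With the sample input above, "law" is replaced through the mapping while "automotive" and "medical" pass through unchanged.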