- From: CVS User ysavoure <cvsmail@w3.org>
- Date: Wed, 28 Nov 2012 14:48:43 +0000
- To: public-multilingualweb-lt-commits@w3.org
Update of /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20
In directory gil:/tmp/cvs-serv24583
Modified Files: its20.html its20.odd
Log Message: First set of changes from editing call Nov-28
--- /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20/its20.html 2012/11/27 18:26:03 1.269
+++ /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20/its20.html 2012/11/28 14:48:43 1.270
@@ -1202,7 +1202,7 @@
 applications, creating for example named entity annotations. A non-normative algorithm to integrate these annotations into the original input document is given in <a class="section-ref" href="#nif-backconversion" shape="rect">Appendix H: Conversion NIF2ITS</a>. The algorithm in that appendix is non-normative since many choices depend on the actual NLP application.</p></div></div><div class="div2">
-<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="its-tool-annotation" id="its-tool-annotation" shape="rect"/>5.8 ITS Tools Annotation</h3><p>In some cases, it may be important for instances of data categories to be associated with information about the processor that generated them. For example, the score of the <a href="#mtconfidence" shape="rect">MT Confidence</a> data category (provided via the <code class="its-attr-markup">mtConfidence</code> attribute) is meaningful only when the consumer of the information also knows what MT engine produced it, because the score provides the relative confidence of translations from the same MT engine but does not provide a score that can be reliably compared between MT engines.
The same is true for confidence provided for the <a href="#Disambiguation" shape="rect">Disambiguation</a> data category, providing confidence informaton via the <code class="its-attr-markup">disambigConfidence</code> attribute, or the <a href="#terminology" shape="rect">Terminology</a> data category, providing confidence information via the <code class="its-attr-markup">termConfidence</code> attribute.</p><p>ITS 2.0 provides a mechanism to associate such processor information with the use of individual data categories in a document, independently from data category annotations themselves.</p><p>The attribute <code class="its-attr-markup">toolsRef</code> provides a way to associate all the annotations of a given data category within the element with information about the processor that generated those data category annotations.</p><p>The value of <code class="its-attr-markup">toolsRef</code> is a space-separated list of references where each reference is composed of two parts: a data category identifier and an IRI. These two parts are separated by a character <code>|</code> VERTICAL LINE (U+007C).</p><ul><li><p>The data category identifier <a href="#rfc219" shape="rect">MUST</a> be one of the following identifiers: <code>allowed-characters</code>, <code>directionality</code>, <code>disambiguation</code>, <code>domain</code>, <code>elements-within-text</code>, <code>external-resource</code>, <code>id-value</code>, <code>language-information</code>, <code>locale-filter</code>, <code>localization-note</code>, <code>lq-issue</code>, <code>lq-precis</code>, <code>mt-confidence</code>, <code>provenance</code>, <code>ruby</code>, <code>storage-size</code>, <code>target-pointer</code>, <code>terminology</code>, <code>translate</code>.</p></li><li><p>The IRI indicates information about the processor used to generate the data category annotation. No single means is specified for how this IRI should be used to indicate processor information. 
Possible mechanisms are: to encode information directly in the IRI, e.g. as parameters; to reference an external resource that provides such information, e.g. an XML file or an RDF declaration; or to reference another part of the ocument that provides such information.</p></li></ul><p>In HTML5 documents, the mechanism is implemented with the <code class="its-attr-markup">its-tools-ref</code> attribute.</p><p>The attribute applies to the content of the element where it is declared (including its children elements) and to the attributes of that element.</p><p>On any given node, the information provided by this mechanism is a space-separated list of the accumulated references found it the <code class="its-attr-markup">toolsRef</code> attributes +<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="its-tool-annotation" id="its-tool-annotation" shape="rect"/>5.8 ITS Tools Annotation</h3><p>In some cases, it may be important for instances of data categories to be associated with information about the processor that generated them. For example, the score of the <a href="#mtconfidence" shape="rect">MT Confidence</a> data category (provided via the <code class="its-attr-markup">mtConfidence</code> attribute) is meaningful only when the consumer of the information also knows what MT engine produced it, because the score provides the relative confidence of translations from the same MT engine but does not provide a score that can be reliably compared between MT engines. 
The same is true for confidence provided for the <a href="#Disambiguation" shape="rect">Disambiguation</a> data category, providing confidence informaton via the <code class="its-attr-markup">disambigConfidence</code> attribute, or the <a href="#terminology" shape="rect">Terminology</a> data category, providing confidence information via the <code class="its-attr-markup">termConfidence</code> attribute.</p><p>ITS 2.0 provides a mechanism to associate such processor information with the use of individual data categories in a document, independently from data category annotations themselves.</p><p>The attribute <code class="its-attr-markup">toolsRef</code> provides a way to associate all the annotations of a given data category within the element with information about the processor that generated those data category annotations.</p><p>The value of <code class="its-attr-markup">toolsRef</code> is a space-separated list of references where each reference is composed of two parts: a data category identifier and an IRI. These two parts are separated by a character <code>|</code> VERTICAL LINE (U+007C).</p><ul><li><p>The data category identifier <a href="#rfc219" shape="rect">MUST</a> be one of the following identifiers: <code>allowed-characters</code>, <code>directionality</code>, <code>disambiguation</code>, <code>domain</code>, <code>elements-within-text</code>, <code>external-resource</code>, <code>id-value</code>, <code>language-information</code>, <code>locale-filter</code>, <code>localization-note</code>, <code>lq-issue</code>, <code>lq-rating</code>, <code>mt-confidence</code>, <code>provenance</code>, <code>ruby</code>, <code>storage-size</code>, <code>target-pointer</code>, <code>terminology</code>, <code>translate</code>.</p></li><li><p>The IRI indicates information about the processor used to generate the data category annotation. No single means is specified for how this IRI should be used to indicate processor information. 
Possible mechanisms are: to encode information directly in the IRI, e.g. as parameters; to reference an external resource that provides such information, e.g. an XML file or an RDF declaration; or to reference another part of the ocument that provides such information.</p></li></ul><p>In HTML5 documents, the mechanism is implemented with the <code class="its-attr-markup">its-tools-ref</code> attribute.</p><p>The attribute applies to the content of the element where it is declared (including its children elements) and to the attributes of that element.</p><p>On any given node, the information provided by this mechanism is a space-separated list of the accumulated references found it the <code class="its-attr-markup">toolsRef</code> attributes declared in the enclosing elements and sorted by data category identifiers. For each data category, the IRI part is the one of the inner-most declarartion.</p><div class="exampleOuter"><div class="exampleHeader"><a name="EX-its-tool-annotation-1" id="EX-its-tool-annotation-1" shape="rect"/>Example 25: Accumulation and Overriding of the <code class="its-attr-markup">toolsRef</code> Values</div><p>In this example, the text shows the computed tools reference information for the given node. Note that the references are ordered alphabetically and that the IRI values are always the ones of the inner-most declaration.</p><div class="exampleInner"><pre xml:space="preserve"><strong class="hl-tag" style="color: #000096"><doc</strong> <span class="hl-attribute" style="color: #F5844C">its:version</span>=<span class="hl-value" style="color: #993300">"2.0"</span> <span class="hl-attribute" style="color: #F5844C">xmlns:its</span>=<span class="hl-value" style="color: #993300">"http://www.w3.org/2005/11/its"</span>
 <span class="hl-attribute" style="color: #F5844C">its:toolsRef</span>=<span class="hl-value" style="color: #993300">"mt-confidence|MT1"</span><strong class="hl-tag" style="color: #000096">
 @@ -1433,7 +1433,7 @@ <strong class="hl-tag" style="color: #000096"><prolog</strong> <span class="hl-attribute" style="color: #F5844C">its:translate</span>=<span class="hl-value" style="color: #993300">"no"</span><strong class="hl-tag" style="color: #000096">></strong>
 <strong class="hl-tag" style="color: #000096"><revision></strong>Sep-07-2006<strong class="hl-tag" style="color: #000096"></revision></strong>
 <strong class="hl-tag" style="color: #000096"><its:rules</strong> <span class="hl-attribute" style="color: #F5844C">version</span>=<span class="hl-value" style="color: #993300">"2.0"</span><strong class="hl-tag" style="color: #000096">></strong>
 - <strong class="hl-tag" style="color: #000096"><its:translateRule</strong> <span class="hl-attribute" style="color: #F5844C">selector</span>=<span class="hl-value" style="color: #993300">"//msg/notes"</span> <span class="hl-attribute" style="color: #F5844C">translate</span>=<span class="hl-value" style="color: #993300">"no"</span><strong class="hl-tag" style="color: #000096">/></strong>
 + <strong class="hl-tag" style="color: #000096"><its:translateRule</strong> <span class="hl-attribute" style="color: #F5844C">selector</span>=<span class="hl-value" style="color: #993300">"//msg/type"</span> <span class="hl-attribute" style="color: #F5844C">translate</span>=<span class="hl-value" style="color: #993300">"no"</span><strong class="hl-tag" style="color: #000096">/></strong>
 <strong class="hl-tag" style="color: #000096"><its:locNoteRule</strong> <span class="hl-attribute" style="color: #F5844C">locNoteType</span>=<span class="hl-value" style="color: #993300">"description"</span> <span class="hl-attribute" style="color: #F5844C">selector</span>=<span class="hl-value" style="color: #993300">"//msg/data"</span><strong class="hl-tag" style="color: #000096">></strong>
 <strong class="hl-tag" style="color: #000096"><its:locNote></strong>The variable {0} is the name of the host.<strong class="hl-tag" style="color: #000096"></its:locNote></strong>
 <strong class="hl-tag" style="color: #000096"></its:locNoteRule></strong>
 @@ -1441,12 +1441,15 @@ <strong class="hl-tag" style="color: #000096"></prolog></strong>
 <strong class="hl-tag" style="color: #000096"><body></strong>
 <strong class="hl-tag" style="color: #000096"><msg</strong> <span class="hl-attribute" style="color: #F5844C">id</span>=<span class="hl-value" style="color: #993300">"HostNotFound"</span><strong class="hl-tag" style="color: #000096">></strong>
 + <strong class="hl-tag" style="color: #000096"><type></strong>Error<strong class="hl-tag" style="color: #000096"></type></strong>
 <strong class="hl-tag" style="color: #000096"><data></strong>Host {0} cannot be found.<strong class="hl-tag" style="color: #000096"></data></strong>
 <strong class="hl-tag" style="color: #000096"></msg></strong>
 <strong class="hl-tag" style="color: #000096"><msg</strong> <span class="hl-attribute" style="color: #F5844C">id</span>=<span class="hl-value" style="color: #993300">"HostDisconnected"</span><strong class="hl-tag" style="color: #000096">></strong>
 + <strong class="hl-tag" style="color: #000096"><type></strong>Error<strong class="hl-tag" style="color: #000096"></type></strong>
 <strong class="hl-tag" style="color: #000096"><data></strong>The connection with {0} has been lost.<strong class="hl-tag" style="color: #000096"></data></strong>
 <strong class="hl-tag" style="color: #000096"></msg></strong>
 <strong class="hl-tag" style="color: #000096"><msg</strong> <span class="hl-attribute" style="color: #F5844C">id</span>=<span class="hl-value" style="color: #993300">"FileNotFound"</span><strong class="hl-tag" style="color: #000096">></strong>
 + <strong class="hl-tag" style="color: #000096"><type></strong>Error<strong class="hl-tag" style="color: #000096"></type></strong>
 <strong class="hl-tag" style="color: #000096"><data</strong> <span class="hl-attribute" style="color: #F5844C">its:locNote</span>=<span class="hl-value" style="color: #993300">"{0} is a filename"</span><strong class="hl-tag" style="color: #000096">></strong>{0} not found.<strong class="hl-tag" style="color: #000096"></data></strong>
 <strong class="hl-tag" style="color: #000096"></msg></strong>
 <strong class="hl-tag" style="color: #000096"></body></strong>
 @@ -1502,7 +1505,8 @@ communicate notes to localizers about a particular item of content.</p><p>This data category can be used for several purposes, including, but not limited to:</p><ul><li><p>Tell the translator how to translate parts of the content</p></li><li><p>Expand on the meaning or contextual usage of a specific element, such as what a variable refers to or how a string will be used in the user interface</p></li><li><p>Clarify ambiguity and show relationships between items sufficiently to allow - correct translation (e.g., in many languages it is impossible to translate the word"<span class="quote">enabled</span>" in isolation without knowing the gender, number and case of + correct translation (e.g., in many languages it is impossible to translate the word "<span class="quote">enabled</span>" + in isolation without knowing the gender, number and case of the thing it refers to.)</p></li><li><p>Indicate why a piece of text is emphasized (important, sarcastic, etc.)</p></li></ul><p>Two types of informative notes are needed:</p><ul><li><p>An alert contains information that the translator must read before translating a piece of text. Example: an instruction to the translator to leave parts of the text in the source language.</p></li><li><p>A description provides useful background information that the translator will @@ -1516,8 +1520,7 @@ content of the element, <em>including</em> child elements, but <em>excluding</em> attributes.</p><p id="localizationnote-global">GLOBAL: The <code class="its-elem-markup">locNoteRule</code> element contains the following:</p><ul><li><p>A required <code class="its-attr-markup">selector</code> attribute. 
It contains an <a href="#selectors" shape="rect">absolute selector</a> which selects the nodes to which this - rule applies.</p></li><li><p>A required <code class="its-attr-markup">locNoteType</code> attribute with the value - "description" or "alert".</p></li><li><p>Exactly one of the following:</p><ul><li><p>A <code class="its-elem-markup">locNote</code> element that contains the note itself and allows for <a href="#selection-local" shape="rect">local ITS markup</a>.</p></li><li><p>A <code class="its-attr-markup">locNotePointer</code> attribute that contains a <a href="#selectors" shape="rect">relative selector</a> pointing to a node that holds the + rule applies.</p></li><li><p>A required <code class="its-attr-markup">locNoteType</code> attribute with the value "description" or "alert".</p></li><li><p>Exactly one of the following:</p><ul><li><p>A <code class="its-elem-markup">locNote</code> element that contains the note itself and allows for <a href="#selection-local" shape="rect">local ITS markup</a>.</p></li><li><p>A <code class="its-attr-markup">locNotePointer</code> attribute that contains a <a href="#selectors" shape="rect">relative selector</a> pointing to a node that holds the localization note.</p></li><li><p>A <code class="its-attr-markup">locNoteRef</code> attribute that contains an IRI referring to the location of the localization note.</p></li><li><p>A <code class="its-attr-markup">locNoteRefPointer</code> attribute that contains a <a href="#selectors" shape="rect">relative selector</a> pointing to a node that holds the IRI referring to the location of the localization note.</p></li></ul></li></ul><div class="exampleOuter"><div class="exampleHeader"><a name="EX-locNote-element-1" id="EX-locNote-element-1" shape="rect"/>Example 32: The <code class="its-elem-markup">locNote</code> element</div><p>The <code class="its-elem-markup">locNoteRule</code> element associates the content of the <code class="its-elem-markup">locNote</code> @@ -1814,9 +1817,8 @@ 
<strong class="hl-tag" style="color: #000096"></its:rules></strong>
 </pre></div><p>[Source file: <a href="examples/xml/EX-lang-definition-1.xml" shape="rect">examples/xml/EX-lang-definition-1.xml</a>]</p></div><div class="note"><p class="prefix"><b>Note:</b></p><p>The <a href="#language-information" shape="rect">Language Information</a> data category only provides for rules to be expressed at a global level. Locally users are able to - use <code>xml:lang</code> (which is defined by XML) or an attribute specific to the - format in question (as in <a href="#EX-lang-definition-1" shape="rect">Example 49</a>).</p><p> - <code>xml:lang</code> is the preferable means of language identification. To ease the + use <code>xml:lang</code> (which is defined by XML), or <code>lang</code> in HTML, or an attribute specific to the + format in question (as in <a href="#EX-lang-definition-1" shape="rect">Example 49</a>).</p><p>In XML <code>xml:lang</code> is the preferable means of language identification. To ease the usage of <code>xml:lang</code>, a declaration for this attribute is part of the non-normative XML DTD and XML Schema document for ITS markup declarations. There is no declaration of <code>xml:lang</code> in the non-normative RELAX NG document for @@ -1826,7 +1828,7 @@ since <code>xml:lang</code> is the standard way to specify language information in XML. <code>xml:lang</code> is defined in terms of <a href="http://www.w3.org/TR/2006/REC-xml-20060816/#sec-lang-tag" shape="rect">RFC 3066 or its successor</a> (<a title="Tags for Identifying
 Languages" href="#bcp47" shape="rect">[BCP47]</a> is the "Best Common - Practice" for language identification and encompasses <a title="Tags for the Identification of
 Languages" href="#rfc3066" shape="rect">[RFC 3066]</a> and its successors.)</p></div><span class="editor-note">[Ed. note: Add example for HTML5 and lang.]</span></div><div class="div3"> + Practice" for language identification and encompasses <a title="Tags for the Identification of
 Languages" href="#rfc3066" shape="rect">[RFC 3066]</a> and its successors.)</p><p>In HTML <code>lang</code> is the mandated means of language identification.</p></div></div><div class="div3"> <h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="langinfo-implementation" id="langinfo-implementation" shape="rect"/>8.7.2 Implementation</h4><p>The <a href="#language-information" shape="rect">Language Information</a> data category can be expressed only with global rules. For elements, the data category information <a href="#def-inheritance" shape="rect">inherits</a> to the textual content of the element, <em>including</em> child elements and attributes. There is no default.</p><p id="languageinformation-global">GLOBAL: The <code class="its-elem-markup">langRule</code> element contains @@ -1938,21 +1940,20 @@ <code>body</code> element is in the domain expressed by associated values. The <code class="its-attr-markup">domainPointer</code> attribute points to the values in the source content. In this case it points to the <code>meta</code> elements with the <code>name</code> - attributes set to <code>keywords</code> or <code>dcterms.subject</code> hold the + attribute set to "keywords" or to "dcterms.subject". These elements hold the values in their <code>content</code> attributes. The <code class="its-attr-markup">domainMapping</code> - attribute contains the comma separated list of mappings. In the example, - <code>automotive</code> is available in the source content, and <code>auto</code> - is used within the consumer tool, e.g. 
a machine translation system.</p><div class="exampleInner"><pre xml:space="preserve"><strong class="hl-tag" style="color: #000096"><its:rules</strong> <span class="hl-attribute" style="color: #F5844C">xmlns:its</span>=<span class="hl-value" style="color: #993300">"http://www.w3.org/2005/11/its"</span> <span class="hl-attribute" style="color: #F5844C">version</span>=<span class="hl-value" style="color: #993300">"2.0"</span>
 + attribute contains the comma separated list of mappings. In the example, "automotive" is + available in the source content, and "auto" is used within the consumer tool, e.g. a machine translation system.</p><div class="exampleInner"><pre xml:space="preserve"><strong class="hl-tag" style="color: #000096"><its:rules</strong> <span class="hl-attribute" style="color: #F5844C">xmlns:its</span>=<span class="hl-value" style="color: #993300">"http://www.w3.org/2005/11/its"</span> <span class="hl-attribute" style="color: #F5844C">version</span>=<span class="hl-value" style="color: #993300">"2.0"</span>
 <span class="hl-attribute" style="color: #F5844C">xmlns:h</span>=<span class="hl-value" style="color: #993300">"http://www.w3.org/1999/xhtml"</span><strong class="hl-tag" style="color: #000096">></strong>
 <strong class="hl-tag" style="color: #000096"><its:domainRule</strong> <span class="hl-attribute" style="color: #F5844C">selector</span>=<span class="hl-value" style="color: #993300">"/h:html/h:body"</span>
 <span class="hl-attribute" style="color: #F5844C">domainPointer</span>=<span class="hl-value" style="color: #993300">"/h:html/h:head/h:meta[@name='dcterms.subject' or @name='keywords']/@content"</span>
 <span class="hl-attribute" style="color: #F5844C">domainMapping</span>=<span class="hl-value" style="color: #993300">"automotive auto, medical medicine, 'criminal law' law, 'property law' law"</span><strong class="hl-tag" style="color: #000096">/></strong>
 <strong class="hl-tag" style="color: #000096"></its:rules></strong>
 </pre></div><p>[Source file: <a href="examples/xml/EX-domain-2.xml" shape="rect">examples/xml/EX-domain-2.xml</a>]</p></div><div class="note"><p class="prefix"><b>Note:</b></p><p>In HTML5 the preferred way to express domain information is a <code>meta</code> - element with the <code>name</code> attribute set to <code>keywords</code>, see <a href="http://www.w3.org/TR/html5/single-page.html#standard-metadata-names" shape="rect">standard metadata names in HTML5</a>. Alternatively, following the process for + element with the <code>name</code> attribute set to "keywords", see <a href="http://www.w3.org/TR/html5/single-page.html#standard-metadata-names" shape="rect">standard metadata names in HTML5</a>. Alternatively, following the process for <a href="http://www.w3.org/TR/html5/single-page.html#other-metadata-names" shape="rect">other metadata names</a> the <a href="http://wiki.whatwg.org/wiki/MetaExtensions" shape="rect">extension value</a> of - <code>dcterms.subject</code> can be used. The usage of both <code>keywords</code> - and <code>dcterms.subject</code> is shown in example <a href="#EX-domain-2" shape="rect">Example 54</a>.</p><p>In the area of machine translation (e.g. machine translation systems or systems + "dcterms.subject" can be used. The usage of both "keywords" + and "dcterms.subject" is shown in example <a href="#EX-domain-2" shape="rect">Example 54</a>.</p><p>In the area of machine translation (e.g. machine translation systems or systems harvesting content for machine translation training), there is no agreed upon set of value sets for domain. Nevertheless it is recommended to use a small set of values both in source content and within consumer tools, to foster interoperability. If @@ -1966,10 +1967,10 @@ machine translation engine. </p><p>The consumer machine translation engine might choose to ignore the domain and take a one size fits all approach, or may be selective in which domains to use, based on the range of content marked with domain. 
For example, if the content has hundreds of - sentences marked with domain 'automotive' and 'medical', but only a couple of - sentences marked with additional domains 'criminal law' and 'property law', the - consumer tool may opt to include its domains 'auto' and 'medicine', but not 'law', - since the extra training resources does not justify the improvement in the + sentences marked with domain "automotive" and "medical", but only a couple of + sentences marked with additional domains "criminal law" and "property law", the + consumer tool may opt to include its domains "auto" and "medicine", but not + "law", since the extra training resources does not justify the improvement in the output.</p></div></div></div><div class="div2"> <h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="Disambiguation" id="Disambiguation" shape="rect"/>8.10 Disambiguation</h3><span class="editor-note">[Ed. note: This data category is not completely stable yet.]</span><div class="div3"> <h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." 
alt="Go to the table of contents."/></a><a name="Disambiguation-definition" id="Disambiguation-definition" shape="rect"/>8.10.1 Definition</h4><p>The <a href="#Disambiguation" shape="rect">Disambiguation</a> data category is used to @@ -1994,14 +1995,14 @@ accessing the intended meaning or lexical choice of the fragment, and thereby contributing to its correct translation.</p><p>A fragment of text is disambiguated at different granularities: (1) lexical type, (2) ontological concept, or (3) named entity.</p><p>In the case of lexical type, the external resource may provide appropriate synonyms - and example usage, such as, for example, the WordNet services do.</p><p>In the case of ontological concept, the external resource may provide a formalized + and example usage, such as what WordNet services do.</p><p>In the case of ontological concept, the external resource may provide a formalized conceptual definition arranged in a hierarchical framework of related concepts.</p><p>In the case of a named entity, the external resource may provide a fully fledged description of the associated real world entity. For instance, the word 'City' in the fragment 'I am going to the City' may be disambiguated on the basis of one of WordNet's synsets that can be represented by 'city', an ontological concept of 'City' that could represent a subclass of 'Populated Place' at the conceptual granularity level, or the central area of a particular city, e.g. 
'City of London', as interpreted - at the entity granularity level.</p><p>Emerging linked data networks, such as DBpedia, further increase the interlinking of + at the entity granularity level.</p><p>Linked data networks, such as DBpedia, further increase the interlinking of ontological concepts and named entity definitions for same things and in different languages, thereby offering the possibility to directly facilitate translation through a source language description.</p><p>Two types of disambiguation are possible:</p><ul><li><p>Disambiguation for target type class, which explicitly describes the type class @@ -2012,7 +2013,7 @@ this information, or employ it to index their content. Machine translation services may use this information for optimizing their language and translation models.</p></div><div class="div3"> <h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="Disambiguation-implementation" id="Disambiguation-implementation" shape="rect"/>8.10.2 Implementation</h4><p>The <a href="#Disambiguation" shape="rect">Disambiguation</a> data category can be expressed - with global rules, or locally on an individual element. There is no inheritance.</p><span class="editor-note">[Ed. note: Below will need a test case in the test suite.]</span><p id="disambiguation-use-cases">When using disambiguation specifying the target + with global rules, or locally on an individual element. 
There is no inheritance.</p><p id="disambiguation-use-cases">When using disambiguation specifying the target identity, the user <a href="#rfc2119" shape="rect">MUST</a> use only one of the two addressing modes:</p><ol class="depth1"><li><p>Using <code class="its-attr-markup">disambigSource</code> and one of <code class="its-attr-markup">disambigIdent</code> or <code class="its-attr-markup">disambigIdentPointer</code> (at a global rule) to specify the collection and @@ -2021,7 +2022,7 @@ contains the following:</p><ul><li><p>A required <code class="its-attr-markup">selector</code> attribute that contains an <a href="#selectors" shape="rect">absolute selector</a> which selects the nodes to which this rule applies.</p></li><li><p>An optional <code class="its-attr-markup">disambigGranularity</code> attribute that contains a string, specifying the granularity level of the disambiguation. The value <a href="#rfc2119" shape="rect">MUST</a> be one of the following identifiers: - <code>lexicalConcept</code>, <code>ontologyConcept</code>, or <code>entity</code>. + "lexicalConcept", "ontologyConcept", or "entity". 
The default value is <code>entity</code>.</p></li><li><p>At least one of the following: </p><ul><li><p>To specify the target type class, exactly one of the following: </p><ul><li><p>A <code class="its-attr-markup">disambigClassPointer</code> attribute that contains a <a href="#selectors" shape="rect">relative selector</a> pointing to a node specifying the type of entity or concept class behind the selector.</p></li><li><p>A <code class="its-attr-markup">disambigClassRefPointer</code> attribute that contains a <a href="#selectors" shape="rect">relative selector</a> pointing to a node that @@ -2032,7 +2033,7 @@ target.</p></li></ul></li></ul></li></ul><p>For an example, see <a href="#EX-disambiguation-html5-rdfa-companion-document" shape="rect">Example 57</a>.</p><p id="disambiguation-local">LOCAL: The following local markup is available for the <a href="#Disambiguation" shape="rect">Disambiguation</a> data category:</p><ul><li><p>An optional <code class="its-attr-markup">disambigConfidence</code> attribute with the value of a rational number in the interval 0 to 1 (inclusive). The value follows the <a href="http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/#decimal" shape="rect">XML Schema decimal data type</a> with the constraining facets <a href="http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/#rf-minInclusive" shape="rect">minInclusive</a> set to 0 and <a href="http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/#rf-maxInclusive" shape="rect">maxInclusive</a> set to 1. <code class="its-attr-markup">disambigConfidence</code> represents the confidence of the agents producing the annotation that the union of the values for the other disambiguation attributes in this instance are accurate. 1 represents the highest level of confidence.</p></li><li><p>An optional <code class="its-attr-markup">disambigGranularity</code> attribute tat contains a string, specifying the granularity level of the disambiguation. 
The value <a href="#rfc2119" shape="rect">MUST</a> be one of the following identifiers:
- <code>lexicalConcept</code>, <code>ontologyConcept</code>, or <code>entity</code>.
+ "lexicalConcept", "ontologyConcept", or "entity".
The default value is <code>entity</code>.</p></li><li><p>At least one of the following: </p><ul><li><p>To specify the target type class: </p><ul><li><p>A <code class="its-attr-markup">disambigClassRef</code> attribute that contains an IRI, specifying the type of entity or concept class behind the selector.</p></li></ul></li><li><p>To specify the target identity, exactly one of the following: </p><ul><li><p>When using the addressing <a href="#disambiguation-use-cases" shape="rect">mode 1</a>:</p><ul><li><p>A <code class="its-attr-markup">disambigSource</code> attribute that contains a string representing the disambiguation identifier collection source.</p></li><li><p>A <code class="its-attr-markup">disambigIdent</code> attribute that contains a string,
--- /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20/its20.odd 2012/11/27 18:26:03 1.266
+++ /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20/its20.odd 2012/11/28 14:48:43 1.267
@@ -1715,7 +1715,7 @@
<p>The value of <att>toolsRef</att> is a space-separated list of references where each reference is composed of two parts: a data category identifier and an IRI.
These two parts are separated by a character <code>|</code> VERTICAL LINE (U+007C).</p> <list>
- <item><p>The data category identifier <ref target="#rfc2119">MUST</ref> be one of the following identifiers: <code>allowed-characters</code>, <code>directionality</code>, <code>disambiguation</code>, <code>domain</code>, <code>elements-within-text</code>, <code>external-resource</code>, <code>id-value</code>, <code>language-information</code>, <code>locale-filter</code>, <code>localization-note</code>, <code>lq-issue</code>, <code>lq-precis</code>, <code>mt-confidence</code>, <code>provenance</code>, <code>ruby</code>, <code>storage-size</code>, <code>target-pointer</code>, <code>terminology</code>, <code>translate</code>.</p></item>
+ <item><p>The data category identifier <ref target="#rfc2119">MUST</ref> be one of the following identifiers: <code>allowed-characters</code>, <code>directionality</code>, <code>disambiguation</code>, <code>domain</code>, <code>elements-within-text</code>, <code>external-resource</code>, <code>id-value</code>, <code>language-information</code>, <code>locale-filter</code>, <code>localization-note</code>, <code>lq-issue</code>, <code>lq-rating</code>, <code>mt-confidence</code>, <code>provenance</code>, <code>ruby</code>, <code>storage-size</code>, <code>target-pointer</code>, <code>terminology</code>, <code>translate</code>.</p></item>
<item><p>The IRI indicates information about the processor used to generate the data category annotation. No single means is specified for how this IRI should be used to indicate processor information. Possible mechanisms are: to encode information directly in the IRI, e.g. as parameters; to reference an external resource that provides such information, e.g.
an XML file or an RDF declaration; or to reference another part of the document that provides such information.</p></item> </list>
@@ -2322,8 +2322,8 @@
<item>Expand on the meaning or contextual usage of a specific element, such as what a variable refers to or how a string will be used in the user interface</item> <item>Clarify ambiguity and show relationships between items sufficiently to allow
- correct translation (e.g., in many languages it is impossible to translate the word
- <quote>enabled</quote> in isolation without knowing the gender, number and case of
+ correct translation (e.g., in many languages it is impossible to translate the word <quote>enabled</quote>
+ in isolation without knowing the gender, number and case of
the thing it refers to.)</item> <item>Indicate why a piece of text is emphasized (important, sarcastic, etc.)</item> </list>
@@ -2355,8 +2355,7 @@
<item>A required <att>selector</att> attribute. It contains an <ref target="#selectors">absolute selector</ref> which selects the nodes to which this rule applies.</item>
- <item>A required <att type="element">locNoteType</att> attribute with the value
- <val>description</val> or <val>alert</val>.</item>
+ <item>A required <att type="element">locNoteType</att> attribute with the value <val>description</val> or <val>alert</val>.</item>
<item><p>Exactly one of the following:</p> <list type="unordered"> <item>A <gi>locNote</gi> element that contains the note itself and allows for <ref
@@ -2735,9 +2734,9 @@
<note> <p>The <ref target="#language-information">Language Information</ref> data category only provides for rules to be expressed at a global level.
Locally users are able to
- use <code>xml:lang</code> (which is defined by XML) or an attribute specific to the
+ use <code>xml:lang</code> (which is defined by XML), or <code>lang</code> in HTML, or an attribute specific to the
format in question (as in <ptr target="#EX-lang-definition-1" type="exref"/>).</p>
- <p><code>xml:lang</code> is the preferable means of language identification. To ease the
+ <p>In XML <code>xml:lang</code> is the preferable means of language identification. To ease the
usage of <code>xml:lang</code>, a declaration for this attribute is part of the non-normative XML DTD and XML Schema document for ITS markup declarations. There is no declaration of <code>xml:lang</code> in the non-normative RELAX NG document for
@@ -2751,8 +2750,8 @@
successor</ref> (<ptr target="#bcp47" type="bibref"/> is the "Best Common Practice" for language identification and encompasses <ptr type="bibref" target="#rfc3066"/> and its successors.)</p>
+ <p>In HTML <code>lang</code> is the mandated means of language identification.</p>
</note>
- <note type="ed">Add example for HTML5 and lang.</note>
</div> <div xml:id="langinfo-implementation"> <head>Implementation</head>
@@ -2983,24 +2982,23 @@
<code>body</code> element is in the domain expressed by associated values. The <att>domainPointer</att> attribute points to the values in the source content. In this case it points to the <code>meta</code> elements with the <code>name</code>
- attributes set to <code>keywords</code> or <code>dcterms.subject</code> hold the
+ attribute set to <val>keywords</val> or to <val>dcterms.subject</val>. These elements hold the
values in their <code>content</code> attributes. The <att>domainMapping</att>
- attribute contains the comma separated list of mappings. In the example,
- <code>automotive</code> is available in the source content, and <code>auto</code>
- is used within the consumer tool, e.g.
a machine translation system.</p>
+ attribute contains the comma separated list of mappings. In the example, <val>automotive</val> is
+ available in the source content, and <val>auto</val> is used within the consumer tool, e.g. a machine translation system.</p>
<egXML xmlns="http://www.tei-c.org/ns/Examples" target="examples/xml/EX-domain-2.xml" /> </exemplum> <note> <p>In HTML5 the preferred way to express domain information is a <code>meta</code>
- element with the <code>name</code> attribute set to <code>keywords</code>, see <ref
+ element with the <code>name</code> attribute set to <val>keywords</val>, see <ref
target="http://www.w3.org/TR/html5/single-page.html#standard-metadata-names" >standard metadata names in HTML5</ref>. Alternatively, following the process for <ref target="http://www.w3.org/TR/html5/single-page.html#other-metadata-names" >other metadata names</ref> the <ref target="http://wiki.whatwg.org/wiki/MetaExtensions">extension value</ref> of
- <code>dcterms.subject</code> can be used. The usage of both <code>keywords</code>
- and <code>dcterms.subject</code> is shown in example <ptr target="#EX-domain-2"
+ <val>dcterms.subject</val> can be used. The usage of both <val>keywords</val>
+ and <val>dcterms.subject</val> is shown in example <ptr target="#EX-domain-2"
type="exref"/>.</p> <p>In the area of machine translation (e.g. machine translation systems or systems harvesting content for machine translation training), there is no agreed upon set of
@@ -3018,10 +3016,10 @@
<p>The consumer machine translation engine might choose to ignore the domain and take a one size fits all approach, or may be selective in which domains to use, based on the range of content marked with domain.
For example, if the content has hundreds of
- sentences marked with domain 'automotive' and 'medical', but only a couple of
- sentences marked with additional domains 'criminal law' and 'property law', the
- consumer tool may opt to include its domains 'auto' and 'medicine', but not 'law',
- since the extra training resources does not justify the improvement in the
+ sentences marked with domain <val>automotive</val> and <val>medical</val>, but only a couple of
+ sentences marked with additional domains <val>criminal law</val> and <val>property law</val>, the
+ consumer tool may opt to include its domains <val>auto</val> and <val>medicine</val>, but not
+ <val>law</val>, since the extra training resources does not justify the improvement in the
output.</p></note> </div> </div>
@@ -3062,7 +3060,7 @@
<p>A fragment of text is disambiguated at different granularities: (1) lexical type, (2) ontological concept, or (3) named entity.</p> <p>In the case of lexical type, the external resource may provide appropriate synonyms
- and example usage, such as, for example, the WordNet services do.</p>
+ and example usage, such as what WordNet services do.</p>
<p>In the case of ontological concept, the external resource may provide a formalized conceptual definition arranged in a hierarchical framework of related concepts.</p> <p>In the case of a named entity, the external resource may provide a fully fledged
@@ -3072,7 +3070,7 @@
that could represent a subclass of 'Populated Place' at the conceptual granularity level, or the central area of a particular city, e.g.
'City of London', as interpreted at the entity granularity level.</p>
- <p>Emerging linked data networks, such as DBpedia, further increase the interlinking of
+ <p>Linked data networks, such as DBpedia, further increase the interlinking of
ontological concepts and named entity definitions for same things and in different languages, thereby offering the possibility to directly facilitate translation through a source language description.</p>
@@ -3094,7 +3092,6 @@
<head>Implementation</head> <p>The <ref target="#Disambiguation">Disambiguation</ref> data category can be expressed with global rules, or locally on an individual element. There is no inheritance.</p>
- <note type="ed">Below will need a test case in the test suite.</note>
<p xml:id="disambiguation-use-cases">When using disambiguation specifying the target identity, the user <ref target="#rfc2119">MUST</ref> use only one of the two addressing modes:</p>
@@ -3114,7 +3111,7 @@
<item>An optional <att>disambigGranularity</att> attribute that contains a string, specifying the granularity level of the disambiguation. The value <ref target="#rfc2119">MUST</ref> be one of the following identifiers:
- <code>lexicalConcept</code>, <code>ontologyConcept</code>, or <code>entity</code>.
+ <val>lexicalConcept</val>, <val>ontologyConcept</val>, or <val>entity</val>.
The default value is <code>entity</code>.</item> <item><p>At least one of the following: </p><list> <item><p>To specify the target type class, exactly one of the following: </p><list>
@@ -3150,7 +3147,7 @@
<item><p>An optional <att>disambigGranularity</att> attribute that contains a string, specifying the granularity level of the disambiguation. The value <ref target="#rfc2119">MUST</ref> be one of the following identifiers:
- <code>lexicalConcept</code>, <code>ontologyConcept</code>, or <code>entity</code>.
+ <val>lexicalConcept</val>, <val>ontologyConcept</val>, or <val>entity</val>.
The default value is <code>entity</code>.</p></item> <item><p>At least one of the following: </p><list>
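
Reader's note on the toolsRef hunk (its20.odd, @@ -1715,7 +1715,7 @@): the value format described there, a space-separated list of references where each reference is a data category identifier and an IRI joined by U+007C, can be sketched in Python as follows. This is only an illustration and is not part of the commit or of the specification text. The function name parse_tools_ref and the example IRIs are hypothetical, the identifier set is copied from the added (+) line of the hunk, and splitting at the first "|" is an assumption about how a consumer might separate the two parts.

```python
# Identifiers copied from the added (+) line of the its20.odd hunk
# (lq-rating replaces lq-precis in revision 1.267).
DATA_CATEGORY_IDS = frozenset({
    "allowed-characters", "directionality", "disambiguation", "domain",
    "elements-within-text", "external-resource", "id-value",
    "language-information", "locale-filter", "localization-note",
    "lq-issue", "lq-rating", "mt-confidence", "provenance", "ruby",
    "storage-size", "target-pointer", "terminology", "translate",
})


def parse_tools_ref(value):
    """Split a toolsRef value into (identifier, IRI) pairs.

    Hypothetical helper: raises ValueError when a reference lacks the
    VERTICAL LINE separator, has an empty IRI, or uses an identifier
    that is not in the list above.
    """
    pairs = []
    for reference in value.split():  # references are space-separated
        # Assumption: the first "|" separates identifier from IRI.
        identifier, separator, iri = reference.partition("|")
        if not separator or not iri:
            raise ValueError("malformed reference: %r" % reference)
        if identifier not in DATA_CATEGORY_IDS:
            raise ValueError("unknown data category: %r" % identifier)
        pairs.append((identifier, iri))
    return pairs
```

With this sketch, a value carrying the old identifier lq-precis would be rejected, while the same reference written with lq-rating would be accepted, which mirrors the one-word change made in the hunk.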
Received on Wednesday, 28 November 2012 14:48:50 UTC