- From: CVS User fsasaki <cvsmail@w3.org>
- Date: Fri, 16 Aug 2013 15:57:46 +0000
- To: public-multilingualweb-lt-commits@w3.org
Update of /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20 In directory gil:/tmp/cvs-serv16081 Modified Files: its20.html its20.odd Log Message: changes related to NIF announced at http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Aug/0038.html --- /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20/its20.html 2013/07/23 10:44:43 1.485 +++ /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20/its20.html 2013/08/16 15:57:45 1.486 @@ -59,16 +59,15 @@ <div class="toc3">5.2.2 <a href="#selection-local" shape="rect">Local Selection in an XML Document</a></div> </div> <div class="toc2">5.3 <a href="#selectors" shape="rect">Query Language of Selectors</a><div class="toc3">5.3.1 <a href="#queryLanguage" shape="rect">Choosing Query Language</a></div> -<div class="toc3">5.3.2 <a href="#d0e2542" shape="rect">XPath 1.0</a></div> +<div class="toc3">5.3.2 <a href="#d0e2530" shape="rect">XPath 1.0</a></div> <div class="toc3">5.3.3 <a href="#css-selectors" shape="rect">CSS Selectors</a></div> -<div class="toc3">5.3.4 <a href="#d0e2790" shape="rect">Additional query languages</a></div> +<div class="toc3">5.3.4 <a href="#d0e2778" shape="rect">Additional query languages</a></div> <div class="toc3">5.3.5 <a href="#its-param" shape="rect">Variables in selectors</a></div> </div> <div class="toc2">5.4 <a href="#link-external-rules" shape="rect">Link to External Rules</a></div> <div class="toc2">5.5 <a href="#selection-precedence" shape="rect">Precedence between Selections</a></div> <div class="toc2">5.6 <a href="#associating-its-with-existing-markup" shape="rect">Associating ITS Data Categories with Existing Markup</a></div> -<div class="toc2">5.7 <a href="#conversion-to-nif" shape="rect">Conversion to NIF</a></div> -<div class="toc2">5.8 <a href="#its-tool-annotation" shape="rect">ITS Tools Annotation</a></div> +<div class="toc2">5.7 <a href="#its-tool-annotation" shape="rect">ITS Tools Annotation</a></div> </div> <div class="toc1">6 <a 
href="#html5-markup" shape="rect">Using ITS Markup in HTML</a><div class="toc2">6.1 <a href="#html5-local-attributes" shape="rect">Mapping of Local Data Categories to HTML</a></div> <div class="toc2">6.2 <a href="#html5-global-rules" shape="rect">Global rules</a></div> @@ -141,10 +140,11 @@ <div class="toc1">C <a href="#lqissue-typevalues" shape="rect">Values for the Localization Quality Issue Type</a></div> <div class="toc1">D <a href="#its-schemas" shape="rect">Schemas for ITS</a></div> <div class="toc1">E <a href="#informative-references" shape="rect">References</a> (Non-Normative)</div> -<div class="toc1">F <a href="#nif-backconversion" shape="rect">Conversion NIF2ITS</a> (Non-Normative)</div> -<div class="toc1">G <a href="#list-of-elements-and-attributes" shape="rect">List of ITS 2.0 Global Elements and Local Attributes</a> (Non-Normative)</div> -<div class="toc1">H <a href="#revisionlog" shape="rect">Revision Log</a> (Non-Normative)</div> -<div class="toc1">I <a href="#acknowledgements" shape="rect">Acknowledgements</a> (Non-Normative)</div> +<div class="toc1">F <a href="#conversion-to-nif" shape="rect">Conversion to NIF</a> (Non-Normative)</div> +<div class="toc1">G <a href="#nif-backconversion" shape="rect">Conversion NIF2ITS</a> (Non-Normative)</div> +<div class="toc1">H <a href="#list-of-elements-and-attributes" shape="rect">List of ITS 2.0 Global Elements and Local Attributes</a> (Non-Normative)</div> +<div class="toc1">I <a href="#revisionlog" shape="rect">Revision Log</a> (Non-Normative)</div> +<div class="toc1">J <a href="#acknowledgements" shape="rect">Acknowledgements</a> (Non-Normative)</div> </div><hr/><div class="body"><div class="div1"> <h2><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." 
alt="Go to the table of contents."/></a><a name="introduction" id="introduction" shape="rect"/>1 Introduction</h2><p> <em>This section is informative.</em> @@ -291,7 +291,7 @@ metadata in localization workflows</p></li></ul><p>One example outcome of the resulting synergies is the <a href="#its-tool-annotation" shape="rect">ITS Tool Annotation</a> mechanism. It addresses the provenance-related requirement by allowing ITS processors to leave a trace: ITS processors can basically say “It is me that generated this bit of - information”. Another example are the <a title="" href="#nif-reference" shape="rect">[NIF]</a> related details of ITS 2.0, which help to couple Natural Language + information”. Another example are the <a title="" href="#nif-reference" shape="rect">[NIF]</a> related details of ITS 2.0, which provide a non-normative approach to couple Natural Language Processing with concepts of the Semantic Web.</p></div><div class="div2"> <h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="usage-scenarios" id="usage-scenarios" shape="rect"/>1.3 Usage Scenarios</h3><p>The <a title="
Internationalization Tag Set (ITS) Version 1.0
" href="#its10" shape="rect">[ITS 1.0]</a> <a href="http://www.w3.org/TR/2007/REC-its-20070403/#introduction" shape="rect">introduction</a> states: “ITS is a technology to easily create XML, which is internationalized and can be localized effectively”. In order to make this tangible, ITS 1.0 provided examples for <a href="http://www.w3.org/TR/2007/REC-its-20070403/#users-usage" shape="rect">users and usages</a>. Implicitly, these examples carried the information that ITS covers two areas: one that is related to the static dimension of mono-lingual content, and one that is related to the dynamic dimension of multilingual production.</p><ul><li><p>Static mono-lingual (for example, the area of content authors): This part of the @@ -322,8 +322,8 @@ settled, the Ruby data category possibly will be reintroduced, in a subsequent version of ITS.</p></li><li><p>The <a href="#directionality" shape="rect">Directionality</a> data category reflects directionality markup in <a title="HTML 4.01" href="#html4" shape="rect">[HTML 4.01]</a>. The reason is that enhancements are being discussed in the context of HTML5 that are expected to change the approach to marking up directionality, in particular to support content whose directionality needs to be isolated from that of surrounding content. However, these enhancements are not finalized yet. They will be reflected in a future revision of ITS.</p></li></ul><p> <em>Additional or modified mechanisms:</em> The following mechanisms from ITS 1.0 have been modified or added to ITS 2.0:</p><ul><li><p id="query-language-on-rules-element">ITS 1.0 used only XPath as the mechanism for selecting nodes in <a href="#basic-concepts-selection-global" shape="rect">global rules</a>. ITS 2.0 allows for choosing the <a href="#selectors" shape="rect">query language of selectors</a>. The default is XPath 1.0. 
An ITS 2.0 processor is free to support other selection mechanisms, like CSS selectors or other versions of XPath.</p></li><li><p id="parameters-in-selector">In global rules it is now possible to set <a href="#its-param" shape="rect">variables for the selectors</a> (XPath expression). The <code class="its-elem-markup">param</code> element serves this purpose.</p></li><li><p>ITS 2.0 has an <a href="#its-tool-annotation" shape="rect">ITS Tools Annotation</a> mechanism to associate processor information with the use of individual data categories. See <a class="section-ref" href="#traceability" shape="rect">Section 2.6: Traceability</a> for details.</p></li></ul><p> - <em>Mappings:</em> ITS 2.0 provides a normative algorithm to convert ITS 2.0 information into <a title="" href="#nif-reference" shape="rect">[NIF]</a> and links to guidance about how to relate ITS 2.0 to XLIFF. See <a class="section-ref" href="#mapping-conversion" shape="rect">Section 2.7: Mapping and conversion</a> for details.</p><p> - <em>Changes to the conformance section</em>: The <a class="section-ref" href="#conformance" shape="rect">Section 4: Conformance</a> tells implementers how to implement ITS. For ITS 2.0, the conformance statements related to Ruby have been removed, and a conformance clause related to processing <a title="" href="#nif-reference" shape="rect">[NIF]</a> has been added. For <a title="HTML5" href="#html5" shape="rect">[HTML5]</a>, a dedicated conformance section has been created. Finally, a conformance clause related to Non-ITS elements and attributes has been added.</p></div><div class="div2"> + <em>Mappings:</em> ITS 2.0 provides a non-normative algorithm to convert ITS 2.0 information into <a title="" href="#nif-reference" shape="rect">[NIF]</a> and links to guidance about how to relate ITS 2.0 to XLIFF. 
See <a class="section-ref" href="#mapping-conversion" shape="rect">Section 2.7: Mapping and conversion</a> for details.</p><p> + <em>Changes to the conformance section</em>: The <a class="section-ref" href="#conformance" shape="rect">Section 4: Conformance</a> tells implementers how to implement ITS. For ITS 2.0, the conformance statements related to Ruby have been removed. For <a title="HTML5" href="#html5" shape="rect">[HTML5]</a>, a dedicated conformance section has been created. Finally, a conformance clause related to Non-ITS elements and attributes has been added.</p></div><div class="div2"> <h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="extended-implementation-hints" id="extended-implementation-hints" shape="rect"/>1.5 Extended implementation hints</h3><p id="unicode-normalization">As a general guidance, implementations of ITS 2.0 are encouraged to use a <a href="http://www.w3.org/TR/2012/WD-charmod-norm-20120501/#sec-NormalizingTranscoder" shape="rect">normalizing transcoder</a>. It converts from a legacy encoding to a Unicode encoding form and ensures that the result is in Unicode Normalization Form C. Further information on the topic of Unicode normalization is provided in <a title="Character Model for the World Wide Web 1.0: Normalization" href="#charmod-norm" shape="rect">[Charmod Norm]</a>.</p></div></div><div class="div1"> <h2><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." 
alt="Go to the table of contents."/></a><a name="basic-concepts" id="basic-concepts" shape="rect"/>2 Basic Concepts</h2><p> <em>This section is informative.</em> @@ -495,7 +495,7 @@ <strong class="hl-tag" style="color: #000096"></html></strong></pre></div><p>[Source file: <a href="examples/html5/EX-translate-html5-inline-global-1.html" shape="rect">examples/html5/EX-translate-html5-inline-global-1.html</a>]</p></div></div><div class="div3"> <h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="html5-its-local-markup" id="html5-its-local-markup" shape="rect"/>2.5.2 Local approach</h4><p>In HTML, an ITS 2.0 local data category is realized with the prefix <code>its-</code>. The general mapping of the XML based ITS 2.0 attributes to their HTML counterparts is defined in - <a class="section-ref" href="#html5-local-attributes" shape="rect">Section 6.1: Mapping of Local Data Categories to HTML</a>. An informative table in <a class="section-ref" href="#list-of-elements-and-attributes" shape="rect">Appendix G: List of ITS 2.0 Global Elements and Local Attributes</a> + <a class="section-ref" href="#html5-local-attributes" shape="rect">Section 6.1: Mapping of Local Data Categories to HTML</a>. An informative table in <a class="section-ref" href="#list-of-elements-and-attributes" shape="rect">Appendix H: List of ITS 2.0 Global Elements and Local Attributes</a> provides an overview of the mapping for all data categories.</p></div><div class="div3"> <h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." 
alt="Go to the table of contents."/></a><a name="html5-existing-markup-versus-its" id="html5-existing-markup-versus-its" shape="rect"/>2.5.3 HTML markup with ITS 2.0 counterparts</h4><p>There are four ITS 2.0 data categories, which have counterparts in HTML markup. In these cases, native HTML markup provides some information in terms of ITS 2.0 data categories. For these data categories, ITS 2.0 defines the following:</p><ul><li><p>The <a href="#language-information" shape="rect">Language Information</a> data category has the HTML <code>lang</code> @@ -529,18 +529,18 @@ <strong class="hl-tag" style="color: #000096"><img</strong> <span class="hl-attribute" style="color: #F5844C">src</span>=<span class="hl-value" style="color: #993300">"http://example.com/myimg.png"</span> <span class="hl-attribute" style="color: #F5844C">alt</span>=<span class="hl-value" style="color: #993300">"My image"</span><strong class="hl-tag" style="color: #000096">/></strong>.<strong class="hl-tag" style="color: #000096"></p></strong> <strong class="hl-tag" style="color: #000096"></body></strong> <strong class="hl-tag" style="color: #000096"></html></strong></pre></div><p>[Source file: <a href="examples/html5/EX-its-and-existing-HTML5-markup.html" shape="rect">examples/html5/EX-its-and-existing-HTML5-markup.html</a>]</p></div><p>There are also some HTML markup elements that have or can have similar, but not necessarily identical, roles and behaviors as certain ITS 2.0 data categories. For example, the HTML <code>dfn</code> element could be used to identify a term in the sense of the <a href="#terminology" shape="rect">Terminology</a> data category. However, this is not always the case and it depends on the intentions of the HTML content author. To accommodate this situation, users of ITS 2.0 are encouraged to specify the semantics of existing HTML markup in an ITS 2.0 context with a dedicated global rules file. 
For example, a rule can be used to define that the HTML <code>dfn</code> has the semantics of ITS <code>term="yes"</code>. For additional examples, see the <a href="http://www.w3.org/TR/2008/NOTE-xml-i18n-bp-20080213/#relating-its-plus-xhtml" shape="rect">XML I18N Best Practices</a> document.</p></div><div class="div3"> -<h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="html5-standoff-markup-explanation" id="html5-standoff-markup-explanation" shape="rect"/>2.5.4 Standoff markup in HTML5</h4><p>The <a href="#provenance" shape="rect">Provenance</a> and the <a href="#lqissue" shape="rect">Localization Quality Issue</a> data categories allow for using so-called standoff markup, see the XML <a href="#EX-provenance-global-1" shape="rect">Example 59</a>. In HTML such standoff markup is placed into a <code>script</code> element. If this is done, the constraints for <a href="#provenance-records-in-html5-constraint" shape="rect">Provenance standoff</a> markup in HTML and <a href="#loc-quality-issues-in-html5-constraint" shape="rect">Localization quality issue</a> markup in HTML need to be taken into account. Examples of standoff markup in HTML for the two data categories are <a href="#EX-provenance-html5-local-2" shape="rect">Example 62</a> and <a href="#EX-locQualityIssue-html5-local-2" shape="rect">Example 77</a>.</p></div><div class="div3"> +<h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." 
alt="Go to the table of contents."/></a><a name="html5-standoff-markup-explanation" id="html5-standoff-markup-explanation" shape="rect"/>2.5.4 Standoff markup in HTML5</h4><p>The <a href="#provenance" shape="rect">Provenance</a> and the <a href="#lqissue" shape="rect">Localization Quality Issue</a> data categories allow for using so-called standoff markup, see the XML <a href="#EX-provenance-global-1" shape="rect">Example 58</a>. In HTML such standoff markup is placed into a <code>script</code> element. If this is done, the constraints for <a href="#provenance-records-in-html5-constraint" shape="rect">Provenance standoff</a> markup in HTML and <a href="#loc-quality-issues-in-html5-constraint" shape="rect">Localization quality issue</a> markup in HTML need to be taken into account. Examples of standoff markup in HTML for the two data categories are <a href="#EX-provenance-html5-local-2" shape="rect">Example 61</a> and <a href="#EX-locQualityIssue-html5-local-2" shape="rect">Example 76</a>.</p></div><div class="div3"> <h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="usage-in-legacy-html" id="usage-in-legacy-html" shape="rect"/>2.5.5 Version of HTML</h4><p>ITS 2.0 does not define how to use ITS in HTML versions prior to version 5. Users are thus encouraged to migrate their content to <a title="HTML5" href="#html5" shape="rect">[HTML5]</a> or XHTML. 
While it is possible to use <code>its-*</code> attributes introduced for <a title="HTML5" href="#html5" shape="rect">[HTML5]</a> in older versions of HTML (such as 3.2 or 4.01) and pages using these attributes will work without any problems, <code>its-*</code> attributes will be marked as invalid by validators.</p></div></div><div class="div2"> -<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="traceability" id="traceability" shape="rect"/>2.6 Traceability</h3><p>The <a href="#its-tool-annotation" shape="rect">ITS Tools Annotation</a> mechanism allows processor information to be associated with individual data categories in a document, independently from data category annotations themselves (e.g. the Entity Type related to Text Analysis). The mechanism associates identifiers for tools with data categories via the <code class="its-attr-markup">annotatorsRef</code> attribute (or <a href="" shape="rect">annotators-ref</a> in <a title="HTML5" href="#html5" shape="rect">[HTML5]</a>) and is mandatory for the <a href="#mtconfidence" shape="rect">MT Confidence</a> data category. For the <a href="#terminology" shape="rect">Terminology</a> and <a href="#textanalysis" shape="rect">Text Analysis</a> data categories the ITS Tools Annotation is mandatory if the data categories provide confidence information. Nevertheless, <a href="#its-tool-annotation" shape="rect">ITS Tools Annotation</a> can be used for all data categories. <a href="#EX-its-tool-annotation-2" shape="rect">Example 24</a> demonstrates the usage in the context of several data categories. +<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." 
alt="Go to the table of contents."/></a><a name="traceability" id="traceability" shape="rect"/>2.6 Traceability</h3><p>The <a href="#its-tool-annotation" shape="rect">ITS Tools Annotation</a> mechanism allows processor information to be associated with individual data categories in a document, independently from data category annotations themselves (e.g. the Entity Type related to Text Analysis). The mechanism associates identifiers for tools with data categories via the <code class="its-attr-markup">annotatorsRef</code> attribute (or <a href="" shape="rect">annotators-ref</a> in <a title="HTML5" href="#html5" shape="rect">[HTML5]</a>) and is mandatory for the <a href="#mtconfidence" shape="rect">MT Confidence</a> data category. For the <a href="#terminology" shape="rect">Terminology</a> and <a href="#textanalysis" shape="rect">Text Analysis</a> data categories the ITS Tools Annotation is mandatory if the data categories provide confidence information. Nevertheless, <a href="#its-tool-annotation" shape="rect">ITS Tools Annotation</a> can be used for all data categories. <a href="#EX-its-tool-annotation-2" shape="rect">Example 23</a> demonstrates the usage in the context of several data categories. </p></div><div class="div2"> <h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="mapping-conversion" id="mapping-conversion" shape="rect"/>2.7 Mapping and conversion</h3><div class="div3"> -<h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." 
alt="Go to the table of contents."/></a><a name="mapping-NIF" id="mapping-NIF" shape="rect"/>2.7.1 ITS and RDF/NIF</h4><p>ITS 2.0 defines an algorithm to convert XML or HTML documents (or their DOM - representations) that contain ITS metadata to the RDF format based on <a title="" href="#nif-reference" shape="rect">[NIF]</a>. NIF is an RDF/OWL-based format that aims at interoperability between Natural Language Processing (NLP) tools, language resources and annotations.</p><p>The conversion from <a href="#conversion-to-nif" shape="rect">ITS 2.0 to NIF</a> results in RDF triples. These triples represent the textual content of the original document as RDF typed information. The ITS annotation is represented as properties of content-related triples and relies on an <a href="http://www.w3.org/2005/11/its/rdf#" shape="rect">ITS RDF vocabulary</a>.</p><p>The back conversion from <a href="#nif-backconversion" shape="rect">NIF to ITS 2.0</a> is defined informatively. One motivation for the back conversion is a roundtrip workflow like: 1) conversion to NIF 2) in NIF representation detection of named entities using NLP tools 3) back conversion to HTML and generation of <a href="#textanalysis" shape="rect">Text Analysis</a> markup. The outcome are HTML documents with linked information, see <a href="#EX-text-analysis-html5-local-1" shape="rect">Example 53</a>.</p></div><div class="div3"> -<h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="mapping-XLIFF" id="mapping-XLIFF" shape="rect"/>2.7.2 ITS and XLIFF</h4><p>The XML Localization Interchange File Format <a title="XLIFF Version 1.2" href="#xliff1.2" shape="rect">[XLIFF 1.2]</a> is an OASIS standard that enables translatable source text and its translation to be passed between different tools within localization and translation workflows. 
<a title="XLIFF Version 2.0" href="#xliff2.0" shape="rect">[XLIFF 2.0]</a> is the successor of <a title="XLIFF Version 1.2" href="#xliff1.2" shape="rect">[XLIFF 1.2]</a> and under development. XLIFF has been widely implemented in various translation management systems, computer aided translation tools and in utilities for extracting translatable content from source documents and merging back the content in the target language..</p><p>The mapping between ITS and XLIFF therefore unpins several important ITS 2.0 usage scenarios <a title="Metadata for the Multilingual Web - Usage Scenarios and Implementations " href="#mlw-metadata-us-impl" shape="rect">[MLW US IMPL]</a>. These usage scenarios involve:</p><ul><li><p>the extraction of ITS metadata from a source language file into XLIFF</p></li><li><p>the addition of ITS metadata into an XLIFF file by translation tools</p></li><li><p>the mapping of ITS metadata in an XLIFF file into ITS metadata in the resulting target language files.</p></li></ul><p>ITS 2.0 has no normative dependency on XLIFF, however a <a href="http://www.w3.org/International/its/wiki/XLIFF_Mapping" shape="rect">non-normative definition of how to represent ITS 2.0 data categories in XLIFF 1.2 or XLIFF 2.0</a> is being defined within the <a href="http://www.w3.org/International/its/ig/" shape="rect">Internationalization Tag Set Interest Group</a>.</p></div></div><div class="div2"> +<h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="mapping-NIF" id="mapping-NIF" shape="rect"/>2.7.1 ITS and RDF/NIF</h4><p>ITS 2.0 provides a non-normative algorithm to convert XML or HTML documents (or their DOM + representations) that contain ITS metadata to the RDF format based on <a title="" href="#nif-reference" shape="rect">[NIF]</a>. 
NIF is an RDF/OWL-based format that aims at interoperability between Natural Language Processing (NLP) tools, language resources and annotations.</p><p>The conversion from <a href="#conversion-to-nif" shape="rect">ITS 2.0 to NIF</a> results in RDF triples. These triples represent the textual content of the original document as RDF typed information. The ITS annotation is represented as properties of content-related triples and relies on an <a href="http://www.w3.org/2005/11/its/rdf#" shape="rect">ITS RDF vocabulary</a>.</p><p>The back conversion from <a href="#nif-backconversion" shape="rect">NIF to ITS 2.0</a> is defined informatively as well. One motivation for the back conversion is a roundtrip workflow like: 1) conversion to NIF 2) in NIF representation detection of named entities using NLP tools 3) back conversion to HTML and generation of <a href="#textanalysis" shape="rect">Text Analysis</a> markup. The outcome are HTML documents with linked information, see <a href="#EX-text-analysis-html5-local-1" shape="rect">Example 52</a>.</p></div><div class="div3"> +<h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="mapping-XLIFF" id="mapping-XLIFF" shape="rect"/>2.7.2 ITS and XLIFF</h4><p>The XML Localization Interchange File Format <a title="XLIFF Version 1.2" href="#xliff1.2" shape="rect">[XLIFF 1.2]</a> is an OASIS standard that enables translatable source text and its translation to be passed between different tools within localization and translation workflows. <a title="XLIFF Version 2.0" href="#xliff2.0" shape="rect">[XLIFF 2.0]</a> is the successor of <a title="XLIFF Version 1.2" href="#xliff1.2" shape="rect">[XLIFF 1.2]</a> and under development. 
XLIFF has been widely implemented in various translation management systems, computer aided translation tools and in utilities for extracting translatable content from source documents and merging back the content in the target language.</p><p>The mapping between ITS and XLIFF therefore unpins several important ITS 2.0 usage scenarios <a title="Metadata for the Multilingual Web - Usage Scenarios and Implementations " href="#mlw-metadata-us-impl" shape="rect">[MLW US IMPL]</a>. These usage scenarios involve:</p><ul><li><p>the extraction of ITS metadata from a source language file into XLIFF</p></li><li><p>the addition of ITS metadata into an XLIFF file by translation tools</p></li><li><p>the mapping of ITS metadata in an XLIFF file into ITS metadata in the resulting target language files.</p></li></ul><p>ITS 2.0 has no normative dependency on XLIFF, however a <a href="http://www.w3.org/International/its/wiki/XLIFF_Mapping" shape="rect">non-normative definition of how to represent ITS 2.0 data categories in XLIFF 1.2 or XLIFF 2.0</a> is being defined within the <a href="http://www.w3.org/International/its/ig/" shape="rect">Internationalization Tag Set Interest Group</a>.</p></div></div><div class="div2"> <h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="implementing-its20" id="implementing-its20" shape="rect"/>2.8 ITS 2.0 Implementations and Conformance</h3><p>What does it mean to implement ITS 2.0? This specification provides several conformance clauses as the normative answer (see <a class="section-ref" href="#conformance" shape="rect">Section 4: Conformance</a>). 
The clauses target different types of implementers:</p><ul><li><p>Conformance clauses in <a class="section-ref" href="#conformance-product-schema" shape="rect">Section 4.1: Conformance Type 1: ITS Markup Declarations</a> tell markup vocabulary developers how to add ITS 2.0 markup declarations to their schemas.</p></li><li><p>Conformance clauses in <a class="section-ref" href="#conformance-product-processing-expectations" shape="rect">Section 4.2: Conformance Type 2: The Processing Expectations for ITS Markup</a> tell implementers how to process XML content according to ITS 2.0 data categories.</p></li><li><p>Conformance clauses in <a class="section-ref" href="#conformance-product-html-processing-expectations" shape="rect">Section 4.3: Conformance Type 3: Processing Expectations for ITS Markup in HTML</a> tell implementers how to process <a title="HTML5" href="#html5" shape="rect">[HTML5]</a> content.</p></li><li><p>Conformance clauses in <a class="section-ref" href="#conformance-product-html5-its" shape="rect">Section 4.4: Conformance Type 4: Markup conformance for HTML5+ITS documents</a> tell implementers how ITS 2.0 markup is integrated into <a title="HTML5" href="#html5" shape="rect">[HTML5]</a>.</p></li></ul><p>The conformance clauses in <a class="section-ref" href="#conformance-product-processing-expectations" shape="rect">Section 4.2: Conformance Type 2: The Processing Expectations for ITS Markup</a> and <a class="section-ref" href="#conformance-product-html-processing-expectations" shape="rect">Section 4.3: Conformance Type 3: Processing Expectations for ITS Markup in HTML</a> clarify how information needs to be made available for given pieces of markup when processing a dedicated ITS 2.0 data category. To allow for flexibility, an implementation can choose whether it wants to support only ITS 2.0 global or local information, or XML or HTML content. 
These choices are reflected in separate conformance clauses and also in the <a href="https://github.com/finnle/ITS-2.0-Testsuite/" shape="rect">ITS 2.0 test suite</a>.</p><p>ITS 2.0 processing expectations only define which information needs to be made available. They do not define how that information actually is to be used. This is due to the fact that there is a wide variety of usage scenarios for ITS 2.0, and a wide variety of tools for working with ITS 2.0 is possible. Each of these tools may have its own way of using ITS 2.0 data categories (see <a title="Metadata for the Multilingual Web - Usage Scenarios and Implementations " href="#mlw-metadata-us-impl" shape="rect">[MLW US IMPL]</a> for more information).</p></div></div><div class="div1"> <h2><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="notation-terminology" id="notation-terminology" shape="rect"/>3 Notation and Terminology</h2><p> <em>This section is normative.</em> @@ -563,7 +563,7 @@ and localization of XML schemas and documents.] 
The concept of a data category is independent of its implementation in an XML and HTML environment (e.g., using an element or attribute).</p><p>For each data category, ITS distinguishes between the following:</p><ul><li><p>the prose description, see <a class="section-ref" href="#datacategory-description" shape="rect">Section 8: Description of Data Categories</a></p></li><li><p>schema language-independent formalization, see the "implementation" subsections in - <a class="section-ref" href="#datacategory-description" shape="rect">Section 8: Description of Data Categories</a></p></li><li><p>schema language-specific implementations, see <a class="section-ref" href="#its-schemas" shape="rect">Appendix D: Schemas for ITS</a></p></li></ul><div class="exampleOuter"><div class="exampleHeader"><a name="d0e1576" id="d0e1576" shape="rect"/>Example 10: A data category and its implementation</div><p>The <a href="#trans-datacat" shape="rect">Translate</a> data category conveys information as + <a class="section-ref" href="#datacategory-description" shape="rect">Section 8: Description of Data Categories</a></p></li><li><p>schema language-specific implementations, see <a class="section-ref" href="#its-schemas" shape="rect">Appendix D: Schemas for ITS</a></p></li></ul><div class="exampleOuter"><div class="exampleHeader"><a name="d0e1628" id="d0e1628" shape="rect"/>Example 10: A data category and its implementation</div><p>The <a href="#trans-datacat" shape="rect">Translate</a> data category conveys information as to whether a piece of content is intended for translation or not.</p><p>The simplest formalization of this prose description on a schema language-independent level is a <code class="its-attr-markup">translate</code> attribute with two possible values: "yes" and "no". 
An implementation on a schema language-specific @@ -691,26 +691,17 @@ <em>2-3:</em> If an application claims to process ITS markup implementing the conformance clauses 2-2 and 2-3, it <a href="#rfc-keywords" shape="rect">MUST</a> process that markup with XML documents.</p></li><li><p id="its-conformance-2-4"> - <em>2-4:</em> After processing ITS information - on the basis of conformance clauses <a href="#its-conformance-2-1" shape="rect">2-1</a>, - <a href="#its-conformance-2-2" shape="rect">2-2</a> and <a href="#its-conformance-2-3" shape="rect">2-3</a>, an application <a href="#rfc-keywords" shape="rect">MAY</a> convert an XML document to <a title="" href="#nif-reference" shape="rect">[NIF]</a>, using the - algorithm described in <a class="section-ref" href="#conversion-to-nif" shape="rect">Section 5.7: Conversion to NIF</a>.</p></li><li><p id="its-conformance-2-5"> - <em>2-5:</em> Non-ITS elements and attributes found in ITS elements <a href="#rfc2119" shape="rect">MAY</a> be ignored.</p></li></ul><div class="note"><p class="prefix"><b>Note:</b></p><p id="nif-optional-feature">The conformance clause <a href="#its-conformance-2-4" shape="rect">2-4</a> essentially - means that the conversion to NIF is an optional feature of ITS 2.0, and that the - conversion is independent of whether ITS information has been made available via the - global or local selection mechanisms, see conformance clause <a href="#its-conformance-2-1-1" shape="rect">2-1-1</a>.</p></div><p id="its-processing-conformance-claims">Statements related to this conformance type + <em>2-4:</em> Non-ITS elements and attributes found in ITS elements <a href="#rfc2119" shape="rect">MAY</a> be ignored.</p></li></ul><p id="its-processing-conformance-claims">Statements related to this conformance type <a href="#rfc-keywords" shape="rect">MUST</a> list all <a href="#def-datacat" shape="rect">data categories</a> they implement, and for each <a href="#def-datacat" shape="rect">data category</a>, which type of 
selection they support, whether they support processing - of XML. If the implementation provides the conversion to NIF (see conformance clause - <a href="#its-conformance-2-4" shape="rect">2-4</a>), this <a href="#rfc-keywords" shape="rect">MUST</a> be stated.</p><div class="note"><p class="prefix"><b>Note:</b></p><p>The above conformance clauses are directly reflected in the <a href="https://github.com/finnle/ITS-2.0-Testsuite/" shape="rect">ITS 2.0 test suite</a>. All + of XML.</p><div class="note"><p class="prefix"><b>Note:</b></p><p>The above conformance clauses are directly reflected in the <a href="https://github.com/finnle/ITS-2.0-Testsuite/" shape="rect">ITS 2.0 test suite</a>. All tests specify which data category is processed (clause <a href="#its-conformance-2-1" shape="rect">2-1</a>); they are relevant for (clause <a href="#its-conformance-2-1-1" shape="rect">2-1-1</a>) global or local selection, or both; they require the processing of defaults and precedence of selections (clauses <a href="#its-conformance-2-1-2" shape="rect">2-1-2</a> and <a href="#its-conformance-2-1-3" shape="rect">2-1-3</a>); for each data category there are tests with linked rules (<a href="#its-conformance-2-2" shape="rect">2-2</a>); and all types of tests are given for - XML (clause <a href="#its-conformance-2-3" shape="rect">2-3</a>). In addition, there are test cases for conversion to NIF (clause - <a href="#its-conformance-2-4" shape="rect">2-4</a>). Implementers are encouraged to organize their documentation in a similar way, so + XML (clause <a href="#its-conformance-2-3" shape="rect">2-3</a>). Implementers are encouraged to organize their documentation in a similar way, so that users of ITS 2.0 easily can understand the processing capabilities available.</p></div></div><div class="div2"> <h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." 
alt="Go to the table of contents."/></a><a name="conformance-product-html-processing-expectations" id="conformance-product-html-processing-expectations" shape="rect"/>4.3 Conformance Type 3: Processing Expectations for ITS Markup in HTML</h3><p> <em>Description:</em> Processors need to compute the ITS information that pertains @@ -743,13 +734,7 @@ <code>rel</code> attribute with the value <code>its-rules</code>.</p></li><li><p id="its-conformance-3-3"> <em>3-3:</em> If an application claims to process ITS markup implementing the conformance clauses 3-1 and 3-2, it <a href="#rfc-keywords" shape="rect">MUST</a> process that markup within HTML - documents.</p></li><li><p id="its-conformance-3-4"> - <em>3-4:</em> After processing ITS information - on the basis of conformance clauses <a href="#its-conformance-3-1" shape="rect">3-1</a>, - <a href="#its-conformance-3-2" shape="rect">3-2</a> and - <a href="#its-conformance-3-3" shape="rect">3-3</a>, an application <a href="#rfc-keywords" shape="rect">MAY</a> convert an <a title="HTML5" href="#html5" shape="rect">[HTML5]</a> document to - <a title="" href="#nif-reference" shape="rect">[NIF]</a>, using the - algorithm described in <a class="section-ref" href="#conversion-to-nif" shape="rect">Section 5.7: Conversion to NIF</a>.</p></li></ul><p id="its-html-processing-conformance-claims">Statements related to this conformance + documents.</p></li></ul><p id="its-html-processing-conformance-claims">Statements related to this conformance type <a href="#rfc-keywords" shape="rect">MUST</a> list all <a href="#def-datacat" shape="rect">data categories</a> they implement and, for each <a href="#def-datacat" shape="rect">data category</a>, which type of selection they support.</p></div><div class="div2"> @@ -835,9 +820,9 @@ actual query language. The query language is set by <code class="its-attr-markup">queryLanguage</code> attribute on <code class="its-elem-markup">rules</code> element. 
If <code class="its-attr-markup">queryLanguage</code> is not specified, XPath 1.0 is used as a default query language.</p></div><div class="div3"> -<h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="d0e2542" id="d0e2542" shape="rect"/>5.3.2 XPath 1.0</h4><p>XPath 1.0 is identified by <code>xpath</code> value in <code class="its-attr-markup">queryLanguage</code> +<h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="d0e2530" id="d0e2530" shape="rect"/>5.3.2 XPath 1.0</h4><p>XPath 1.0 is identified by <code>xpath</code> value in <code class="its-attr-markup">queryLanguage</code> attribute.</p><div class="div4"> -<h5><a name="d0e2553" id="d0e2553" shape="rect"/>5.3.2.1 Absolute selector</h5><p>The absolute selector <a href="#rfc-keywords" shape="rect">MUST</a> be an XPath expression +<h5><a name="d0e2541" id="d0e2541" shape="rect"/>5.3.2.1 Absolute selector</h5><p>The absolute selector <a href="#rfc-keywords" shape="rect">MUST</a> be an XPath expression that starts with "<code>/</code>". That is, it <a href="#rfc-keywords" shape="rect">MUST</a> be an <a href="http://www.w3.org/TR/xpath/#NT-AbsoluteLocationPath" shape="rect"> AbsoluteLocationPath</a> or union of <a href="http://www.w3.org/TR/xpath/#NT-AbsoluteLocationPath" shape="rect"> AbsoluteLocationPath</a>s as described in <a href="#xpath" shape="rect">XPath 1.0</a>. 
@@ -882,14 +867,14 @@ be used.</p></div><div class="note"><p class="prefix"><b>Note:</b></p><p id="css-selectors-and-attributes">CSS selectors have no ability to point to attributes.</p></div><p>CSS Selectors are identified by the value <code>css</code> in the <code class="its-attr-markup">queryLanguage</code> attribute.</p><div class="div4"> -<h5><a name="d0e2767" id="d0e2767" shape="rect"/>5.3.3.1 Absolute selector</h5><p>An absolute selector <a href="#rfc-keywords" shape="rect">MUST</a> be interpreted as a +<h5><a name="d0e2755" id="d0e2755" shape="rect"/>5.3.3.1 Absolute selector</h5><p>An absolute selector <a href="#rfc-keywords" shape="rect">MUST</a> be interpreted as a selector as defined in <a title="Selectors Level
 3" href="#css3-selectors" shape="rect">[Selectors Level 3]</a>. Both simple selectors and groups of selectors can be used.</p></div><div class="div4"> -<h5><a name="d0e2777" id="d0e2777" shape="rect"/>5.3.3.2 Relative selector</h5><p>A relative selector <a href="#rfc-keywords" shape="rect">MUST</a> be interpreted as a +<h5><a name="d0e2765" id="d0e2765" shape="rect"/>5.3.3.2 Relative selector</h5><p>A relative selector <a href="#rfc-keywords" shape="rect">MUST</a> be interpreted as a selector as defined in <a title="Selectors Level
 3" href="#css3-selectors" shape="rect">[Selectors Level 3]</a>. A selector is not evaluated against the complete document tree but only against subtrees rooted at nodes selected by the selector in the <code class="its-attr-markup">selector</code> attribute.</p></div></div><div class="div3"> -<h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="d0e2790" id="d0e2790" shape="rect"/>5.3.4 Additional query languages</h4><p>ITS processors <a href="#rfc-keywords" shape="rect">MAY</a> support additional query +<h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="d0e2778" id="d0e2778" shape="rect"/>5.3.4 Additional query languages</h4><p>ITS processors <a href="#rfc-keywords" shape="rect">MAY</a> support additional query languages. For each additional query language the processor <a href="#rfc-keywords" shape="rect">MUST</a> define:</p><ul><li><p>the identifier of the query language used in <code class="its-attr-markup">queryLanguage</code>;</p></li><li><p>rules for evaluating an absolute selector to a collection of nodes;</p></li><li><p>rules for evaluating a relative selector to a collection of nodes.</p></li></ul><p>Because future versions of this specification are likely to define additional query languages, the following query language identifiers are reserved: <code>xpath</code>, <code>css</code>, <code>xpath2</code>, <code>xpath3</code>, <code>xquery</code>, @@ -1062,114 +1047,7 @@ attribute, as shown in <a href="#EX-link-external-rules-1" shape="rect">Example 16</a></p></li></ul></li><li><p>By associating the rules and the document through a tool-specific mechanism. 
For example, in the case of a command-line tool by providing the paths of both the XML document to process and its corresponding external rules file.</p></li></ul></div><div class="div2"> -<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="conversion-to-nif" id="conversion-to-nif" shape="rect"/>5.7 Conversion to NIF</h3><p>This section defines an algorithm to convert XML or HTML documents (or their DOM - representations) that contain ITS metadata to the RDF format based on <a title="" href="#nif-reference" shape="rect">[NIF]</a>. The conversion results in RDF triples.</p><div class="note"><p class="prefix"><b>Note:</b></p><p>The algorithm is intended to extract the text from the XML/HTML/DOM for an NLP tool. It can - produce a lot of "<span class="quote">phantom</span>" predicates from excessive whitespace, which 1) - increases the size of the intermediate mapping and 2) extracts this whitespace as - text, and therefore might decrease NLP performance. It is strongly recommended to - normalize whitespace in the input XML/HTML/DOM in order to minimize such phantom - predicates. A normalized example is given below. 
Since the whitespace normalization - algorithm itself is format dependent (for example, it differs for HTML compared to - general XML), no normative algorithm for whitespace normalization is given as part of - this specification.</p></div><div class="note"><p class="prefix"><b>Note:</b></p><p id="its-rdf-ontology-status">The output of the algorithm shown below uses the ITS RDF ontology <a title="ITS RDF Ontology" href="#its-rdf-ontology" shape="rect">[ITS RDF]</a> and its namespace<br clear="none"/><a href="http://www.w3.org/2005/11/its/rdf#" shape="rect">http://www.w3.org/2005/11/its/rdf#</a> - <br clear="none"/>This ontology is not a normative part of the ITS 2.0 specification and is being discussed in the <a href="http://www.w3.org/International/its/wiki/ITS-RDF_mapping" shape="rect">ITS Interest Group</a>.</p></div><div class="exampleOuter"><div class="exampleHeader"><a name="EX-HTML-whitespace-normalization" id="EX-HTML-whitespace-normalization" shape="rect"/>Example 22: Example (see <a href="examples/html5/EX-HTML-whitespace-normalization.html" shape="rect">source code</a>) of an HTML document with whitespace character normalization as preparation for the conversion to NIF</div><div class="exampleInner"><pre xml:space="preserve"><strong class="hl-tag" style="color: #000096"><html></strong><strong class="hl-tag" style="color: #000096"><body></strong><strong class="hl-tag" style="color: #000096"><h2</strong> <span class="hl-attribute" style="color: #F5844C">translate</span>=<span class="hl-value" style="color: #993300">"yes"</span><strong class="hl-tag" style="color: #000096">></strong>Welcome to <strong class="hl-tag" style="color: #000096"><span</strong> - <span class="hl-attribute" style="color: #F5844C">its-ta-ident-ref</span>=<span class="hl-value" style="color: #993300">"http://dbpedia.org/resource/Dublin"</span> <span class="hl-attribute" style="color: #F5844C">its-within-text</span>=<span class="hl-value" style="color: #993300">"yes"</span> - <span 
class="hl-attribute" style="color: #F5844C">translate</span>=<span class="hl-value" style="color: #993300">"no"</span><strong class="hl-tag" style="color: #000096">></strong>Dublin<strong class="hl-tag" style="color: #000096"></span></strong> in <strong class="hl-tag" style="color: #000096"><b</strong> <span class="hl-attribute" style="color: #F5844C">translate</span>=<span class="hl-value" style="color: #993300">"no"</span> <span class="hl-attribute" style="color: #F5844C">its-within-text</span>=<span class="hl-value" style="color: #993300">"yes"</span><strong class="hl-tag" style="color: #000096">></strong>Ireland<strong class="hl-tag" style="color: #000096"></b></strong>!<strong class="hl-tag" style="color: #000096"></h2></strong><strong class="hl-tag" style="color: #000096"></body></strong><strong class="hl-tag" style="color: #000096"></html></strong></pre></div></div><p id="its2nif-algorithm">The conversion algorithm to generate NIF consists of seven - steps:</p><ul><li><p id="its2nif-algorithm-step1">STEP 1: Get an ordered list of all text nodes - of the document.</p></li><li><p id="its2nif-algorithm-step2">STEP 2: Generate an XPath expression for each non-empty text node of all leaf elements and memorize them.</p></li><li><p id="its2nif-algorithm-step3">STEP 3: Get the text for each text node and make a tuple with the corresponding XPath expression (X,T). Since the text nodes have a certain order we - now have a list of ordered tuples ((x0,t0), (x1,t1), ..., (xn,tn)).</p></li><li><p id="its2nif-algorithm-step4">STEP 4 (optional): Serialize as XML or as RDF. - The list with the XPath-to-text mapping can also be kept in memory. Part of a - serialization example is given below. 
The upper part is in RDF Turtle Syntax while the lower part - is in XML (the <code>mappings</code> element).</p></li></ul><div class="exampleInner"><div class="exampleOuter"><pre xml:space="preserve"># Turtle example: -@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> . -@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> . -<http://example.com/exampledoc.html#char=b0,e0> - nif:wasConvertedFrom <http://example.com/exampledoc.html#xpath(x0)> . -<http://example.com/exampledoc.html#char=b1,e1> - nif:wasConvertedFrom <http://example.com/exampledoc.html#xpath(x1)> . -# ... -<http://example.com/exampledoc.html#char=bn,en> - nif:wasConvertedFrom <http://example.com/exampledoc.html#xpath(xn)> . -<!-- XML Example --> -<mappings> - <mapping x="xpath(x0)" b="b0" e="e0" /> - <mapping x="xpath(x1)" b="b1" e="e1" /> - <!-- ... --> - <mapping x="xpath(xn)" b="bn" e="en" /> -</mappings> -</pre></div></div><p>where</p><div class="exampleInner"><div class="exampleOuter"><pre xml:space="preserve">b0 = 0 -e0 = b0 + (Number of characters of t0) -b1 = e0 -e1 = b1 + (Number of characters of t1) -... -bn = e(n-1) -en = bn + (Number of characters of tn) -</pre></div></div><p>Example (continued)</p><div class="exampleInner"><div class="exampleOuter"><pre xml:space="preserve"># Turtle example: -@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> . -@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> . -# "Welcome to " -<http://example.com/exampledoc.html#char=0,11> - nif:wasConvertedFrom - <http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1]/text()[1])>. -# "Dublin" -<http://example.com/exampledoc.html#char=11,17> - nif:wasConvertedFrom - <http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1]/span[1]/text()[1])>. -# " in " -<http://example.com/exampledoc.html#char=17,21> - nif:wasConvertedFrom - <http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1]/text()[2])> . 
-# "Ireland" -<http://example.com/exampledoc.html#char=21,28> - nif:wasConvertedFrom - <http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1]/b[1]/text()[1])> . -# "!" -<http://example.com/exampledoc.html#char=28,29> - nif:wasConvertedFrom - <http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1]/text()[3])> . -# "Welcome to Dublin Ireland!" -<http://example.com/exampledoc.html#char=0,29> - nif:wasConvertedFrom - <http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1]/text())> . -<!-- XML Example --> -<mappings> - <mapping x="xpath(/html/body[1]/h2[1]/text()[1])" b="0" e="11" /> - <mapping x="xpath(/html/body[1]/h2[1]/span[1]/text()[1])" b="11" e="17" /> - <mapping x="xpath(/html/body[1]/h2[1]/text()[2])" b="17" e="21" /> - <mapping x="xpath(/html/body[1]/h2[1]/b[1]/text()[1])" b="21" e="28" /> - <mapping x="xpath(/html/body[1]/h2[1]/text()[3])" b="28" e="29" /> - <mapping x="xpath(/html/body[1]/h2[1])" b="0" e="29" /> -</mappings></pre></div></div><ul><li><p id="its2nif-algorithm-step5">STEP 5: Create a context URI and attach the - whole concatenated text <code>$(t0+t1+t2+...+tn)</code> of the document as reference.</p></li><li><p id="its2nif-algorithm-step6">STEP 6: Attach any ITS metadata annotations from the XML/HTML/DOM input to the respective NIF URIs.</p></li><li><p id="its2nif-algorithm-step7">STEP 7: Omit all URIs that do not carry annotations (to avoid - bloating the data).</p></li></ul><div class="exampleInner"><div class="exampleOuter"><pre xml:space="preserve">@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> . -@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> -<http://example.com/exampledoc.html#char=0,29> - rdf:type nif:Context ; - rdf:type nif:RFC5147String ; -# concatenate the whole text - nif:isString "$(t0+t1+t2+...+tn)" ; - nif:beginIndex "0" ; - nif:endIndex "29" ; - itsrdf:translate "yes"; - nif:sourceUrl <http://example.com/exampledoc.html> . 
-<http://example.com/exampledoc.html#char=11,17> - rdf:type nif:RFC5147String ; - nif:beginIndex "11" ; - nif:endIndex "17" ; - itsrdf:translate "no"; - itsrdf:taIdentRef <http://dbpedia.org/resource/Dublin> ; - nif:referenceContext <http://example.com/exampledoc.html#char=0,29> . -<http://example.com/exampledoc.html#char=21,28> - rdf:type nif:RFC5147String ; - nif:beginIndex "21" ; - nif:endIndex "28" ; - itsrdf:translate "no"; - nif:referenceContext <http://example.com/exampledoc.html#char=0,29> . -</pre></div></div><p>A complete sample output in RDF/XML format after step 7, given the input document <a href="#EX-HTML-whitespace-normalization" shape="rect">Example 22</a>, is available at <a href="examples/nif/EX-nif-conversion-output.xml" shape="rect">examples/nif/EX-nif-conversion-output.xml</a>.</p><div class="note"><p class="prefix"><b>Note:</b></p><p>The conversion to NIF is a possible basis for a natural language processing (NLP) application - that creates, for example, named entity annotations. A non-normative algorithm to - integrate these annotations into the original input document is given in <a class="section-ref" href="#nif-backconversion" shape="rect">Appendix F: Conversion NIF2ITS</a>. This algorithm is non-normative - because many decisions depend on the particular NLP application being used.</p></div></div><div class="div2"> -<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="its-tool-annotation" id="its-tool-annotation" shape="rect"/>5.8 ITS Tools Annotation</h3><p>In some cases, it may be important for instances of data categories to be associated +<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." 
alt="Go to the table of contents."/></a><a name="its-tool-annotation" id="its-tool-annotation" shape="rect"/>5.7 ITS Tools Annotation</h3><p>In some cases, it may be important for instances of data categories to be associated with information about the processor that generated them. For example, the score of the <a href="#mtconfidence" shape="rect">MT Confidence</a> data category (provided via the <code class="its-attr-markup">mtConfidence</code> attribute) is meaningful only when the consumer of the @@ -1185,7 +1063,7 @@ generated those data category annotations.</p><div class="note"><p class="prefix"><b>Note:</b></p><ul><li><p id="annotators-ref-usage-scenarios">Three cases of providing tool information can be expected:</p><ol class="depth1"><li><p>information about tools used for creating or modifying the textual content;</p></li><li><p>information about tools that do 1), but also create ITS annotations, see - <a class="section-ref" href="#list-of-elements-and-attributes" shape="rect">Appendix G: List of ITS 2.0 Global Elements and Local Attributes</a>; </p></li><li><p>information about tools that don’t modify or create content, but just + <a class="section-ref" href="#list-of-elements-and-attributes" shape="rect">Appendix H: List of ITS 2.0 Global Elements and Local Attributes</a>; </p></li><li><p>information about tools that don’t modify or create content, but just create ITS annotations.</p></li></ol><p> <code class="its-attr-markup">annotatorsRef</code> is only meant to be used when actual ITS annotation is involved, that is for 2) and 3). To express tool information related @@ -1211,7 +1089,7 @@ children elements) and to the attributes of that element.</p><p>On any given node, the information provided by this mechanism is a space-separated list of the accumulated references found in the <code class="its-attr-markup">annotatorsRef</code> attributes declared in the enclosing elements and sorted by data category identifiers. 
For each data - category, the IRI part is the one of the inner-most declaration.</p><div class="exampleOuter"><div class="exampleHeader"><a name="EX-its-tool-annotation-1" id="EX-its-tool-annotation-1" shape="rect"/>Example 23: Accumulation and Overriding of the <code class="its-attr-markup">annotatorsRef</code> Values</div><p>In this example, the text shows the computed tools reference information for the + category, the IRI part is the one of the inner-most declaration.</p><div class="exampleOuter"><div class="exampleHeader"><a name="EX-its-tool-annotation-1" id="EX-its-tool-annotation-1" shape="rect"/>Example 22: Accumulation and Overriding of the <code class="its-attr-markup">annotatorsRef</code> Values</div><p>In this example, the text shows the computed tools reference information for the given node. Note that the references are ordered alphabetically and that the IRI values are always the ones of the inner-most declaration.</p><div class="exampleInner"><pre xml:space="preserve"><strong class="hl-tag" style="color: #000096"><doc</strong> <span class="hl-attribute" style="color: #F5844C">its:version</span>=<span class="hl-value" style="color: #993300">"2.0"</span> <span class="hl-attribute" style="color: #F5844C">xmlns:its</span>=<span class="hl-value" style="color: #993300">"http://www.w3.org/2005/11/its"</span> <span class="hl-attribute" style="color: #F5844C">its:annotatorsRef</span>=<span class="hl-value" style="color: #993300">"mt-confidence|MT1"</span><strong class="hl-tag" style="color: #000096"> @@ -1229,7 +1107,7 @@ <strong class="hl-tag" style="color: #000096"><p</strong> <span class="hl-attribute" style="color: #F5844C">its:annotatorsRef</span>=<span class="hl-value" style="color: #993300">"text-analysis|XYZ"</span><strong class="hl-tag" style="color: #000096"> ></strong>This p node: "text-analysis|XYZ mt-confidence|MT1"<strong class="hl-tag" style="color: #000096"></p></strong> <strong class="hl-tag" style="color: #000096"></doc></strong> 
-</pre></div><p>[Source file: <a href="examples/xml/EX-its-tool-annotation-1.xml" shape="rect">examples/xml/EX-its-tool-annotation-1.xml</a>]</p></div><div class="exampleOuter"><div class="exampleHeader"><a name="EX-its-tool-annotation-2" id="EX-its-tool-annotation-2" shape="rect"/>Example 24: Example of ITS Tools Annotation</div><p>The <code class="its-attr-markup">annotatorsRef</code> attribute is used in this XML document to indicate that +</pre></div><p>[Source file: <a href="examples/xml/EX-its-tool-annotation-1.xml" shape="rect">examples/xml/EX-its-tool-annotation-1.xml</a>]</p></div><div class="exampleOuter"><div class="exampleHeader"><a name="EX-its-tool-annotation-2" id="EX-its-tool-annotation-2" shape="rect"/>Example 23: Example of ITS Tools Annotation</div><p>The <code class="its-attr-markup">annotatorsRef</code> attribute is used in this XML document to indicate that information about the processor that generated the <code class="its-attr-markup">mtConfidence</code> values for the first two <code>p</code> elements are found in element with <code>id="T1"</code> in the external document tools.xml, while that information for the third @@ -1248,7 +1126,7 @@ <span class="hl-attribute" style="color: #F5844C">its:annotatorsRef</span>=<span class="hl-value" style="color: #993300">"mt-confidence|file:///tools.xml#T2"</span><strong class="hl-tag" style="color: #000096">></strong> Text translated with tool T2<strong class="hl-tag" style="color: #000096"></p></strong> <strong class="hl-tag" style="color: #000096"></doc></strong> -</pre></div><p>[Source file: <a href="examples/xml/EX-its-tool-annotation-2.xml" shape="rect">examples/xml/EX-its-tool-annotation-2.xml</a>]</p></div><div class="exampleOuter"><div class="exampleHeader"><a name="EX-its-tool-annotation-html5-1" id="EX-its-tool-annotation-html5-1" shape="rect"/>Example 25: Example of ITS Tool Annotation</div><p>The <code class="its-attr-markup">its-annotators-ref</code> attributes are used in this HTML 
document to +</pre></div><p>[Source file: <a href="examples/xml/EX-its-tool-annotation-2.xml" shape="rect">examples/xml/EX-its-tool-annotation-2.xml</a>]</p></div><div class="exampleOuter"><div class="exampleHeader"><a name="EX-its-tool-annotation-html5-1" id="EX-its-tool-annotation-html5-1" shape="rect"/>Example 24: Example of ITS Tool Annotation</div><p>The <code class="its-attr-markup">its-annotators-ref</code> attributes are used in this HTML document to indicate that the <a href="#mtconfidence" shape="rect">MT Confidence</a> annotation on the first two <code>span</code> elements come from one MT (French to English) engine, while the annotation on the third comes from another (Italian to English) engine. Both @@ -1284,8 +1162,8 @@ the following rules:</p><ol class="depth1"><li><p>The attribute name is prefixed with <code>its-</code></p></li><li><p>Each uppercase letter in the attribute name is replaced by <code>-</code> (U+002D) followed by a lowercase variant of the letter.</p></li></ol><p> </p><p> - <a href="#EX-within-text-local-1" shape="rect">Example 49</a> demonstrates the <a href="#elements-within-text" shape="rect">Elements Within Text</a> data category with the local - XML attribute <code class="its-attr-markup">withinText</code>. <a href="#EX-within-text-local-html5-1" shape="rect">Example 50</a> demonstrates the counterpart in HTML, i.e., + <a href="#EX-within-text-local-1" shape="rect">Example 48</a> demonstrates the <a href="#elements-within-text" shape="rect">Elements Within Text</a> data category with the local + XML attribute <code class="its-attr-markup">withinText</code>. <a href="#EX-within-text-local-html5-1" shape="rect">Example 49</a> demonstrates the counterpart in HTML, i.e., the local attribute <code class="its-attr-markup">its-within-text</code>.</p><p>Values of attributes, which corresponds to data categories with a predefined set of values, <a href="#rfc2119" shape="rect">MUST</a> be matched ASCII-case-insensitively. 
</p><div class="note"><p class="prefix"><b>Note:</b></p><p>Case of attribute names is also irrelevant given the nature of HTML syntax. So in HTML the <a href="#terminology" shape="rect">terminology data category</a> can be stored as <code class="its-attr-markup">its-term</code>, <code>ITS-TERM</code>, <code>its-Term</code> etc. All of those @@ -1334,7 +1212,7 @@ <em>This section is normative.</em> </p><p>XHTML documents aimed at public consumption by Web browsers, including HTML5 documents in XHTML syntax, <a href="#rfc2119" shape="rect">SHOULD</a> use the syntax described in <a class="section-ref" href="#html5-markup" shape="rect">Section 6: Using ITS Markup in HTML</a> in order to adhere to <a href="http://www.w3.org/TR/html-design-principles/#dom-consistency" shape="rect">DOM Consistency - HTML Design Principle</a>.</p><div class="exampleOuter"><div class="exampleHeader"><a name="EX-xhtml-markup-1" id="EX-xhtml-markup-1" shape="rect"/>Example 26: Using ITS 2.0 markup in XHTML</div><p>This example illustrates the use of ITS 2.0 local markup in XHTML.</p><div class="exampleInner"><pre xml:space="preserve"><strong class="hl-tag" style="color: blue"><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" + HTML Design Principle</a>.</p><div class="exampleOuter"><div class="exampleHeader"><a name="EX-xhtml-markup-1" id="EX-xhtml-markup-1" shape="rect"/>Example 25: Using ITS 2.0 markup in XHTML</div><p>This example illustrates the use of ITS 2.0 local markup in XHTML.</p><div class="exampleInner"><pre xml:space="preserve"><strong class="hl-tag" style="color: blue"><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"></strong> <strong class="hl-tag" style="color: #000096"><html</strong> <span class="hl-attribute" style="color: #F5844C">xmlns</span>=<span class="hl-value" style="color: #993300">"http://www.w3.org/1999/xhtml"</span> <span class="hl-attribute" style="color: #F5844C">xml:lang</span>=<span 
class="hl-value" style="color: #993300">"en"</span><strong class="hl-tag" style="color: #000096">></strong> <strong class="hl-tag" style="color: #000096"><head></strong> @@ -1358,7 +1236,7 @@ </p><div class="div2"> <h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="datacategories-defaults-etc" id="datacategories-defaults-etc" shape="rect"/>8.1 Position, Defaults, Inheritance, and Overriding of Data Categories</h3><p>The following table summarizes for each data category which selection, default value, and inheritance and overriding behavior apply. It also provides data category - identifiers used in <a class="section-ref" href="#its-tool-annotation" shape="rect">Section 5.8: ITS Tools Annotation</a>:</p><ul><li><p id="def-default-values"> + identifiers used in <a class="section-ref" href="#its-tool-annotation" shape="rect">Section 5.7: ITS Tools Annotation</a>:</p><ul><li><p id="def-default-values"> <em>Default values</em> apply if both local and global selection are absent. The default value for the <a href="#trans-datacat" shape="rect">Translate</a> data category, for example, mandates that elements are translatable, and attributes @@ -1382,7 +1260,7 @@ via <a href="#idvalue" shape="rect">ID Value</a> pertains only to the <code>p</code> element. It cannot be used to identify nested elements or attributes.</p></li><li><p>Using <a href="#target-pointer" shape="rect">target pointer</a>, selected <code>source</code> elements have the ITS information that their translation is - available in a <code>target</code> element; see <a href="#EX-target-pointer-global-1" shape="rect">Example 66</a>. This information does not + available in a <code>target</code> element; see <a href="#EX-target-pointer-global-1" shape="rect">Example 65</a>. This information does not inherit to child elements of <code>target pointer</code>. 
E.g., the translation of a <code>span</code> element nested in <code>source</code> is not available in a specific <code>target</code> element. Nevertheless, an application is free to use @@ -1474,7 +1352,7 @@ </td></tr><tr><td rowspan="1" colspan="1"> <a href="#storagesize" shape="rect">Storage Size</a> (<code>storage-size</code>) </td><td rowspan="1" colspan="1">Yes</td><td rowspan="1" colspan="1">Yes</td><td rowspan="1" colspan="1">Yes</td><td rowspan="1" colspan="1">Yes</td><td rowspan="1" colspan="1">None</td><td rowspan="1" colspan="1">None</td><td rowspan="1" colspan="1"> <a href="#EX-storageSize-local-1" shape="rect">local</a>, <a href="#EX-storageSize-global-1" shape="rect">global</a> - </td></tr></tbody></table><div class="exampleOuter"><div class="exampleHeader"><a name="EX-datacat-behavior-1" id="EX-datacat-behavior-1" shape="rect"/>Example 27: Defaults, inheritance and overriding behavior of data categories</div><p>In this example, the content of all the <code>data</code> elements is translatable and none of the attributes are translatable, because the default for the <a href="#trans-datacat" shape="rect">Translate</a> data category in elements is "yes" and in attributes is "no", and neither of their values are overridden at all. The first <code class="its-elem-markup">translateRule</code> is overridden by the local <code>its:translate="no"</code> attribute. The content of <code>revision</code>, <code>profile</code>, <code>reviser</code> and <code>locNote</code> elements are not translatable. This is because the default is overridden by the same <code>its:translate="no"</code> that these elements inherit from the local ITS markup in the <code>prolog</code> elemen. The exception is the <code>field</code> element where the second <code class="its-elem-markup">translateRule</code> takes precedence over the inherited value. 
The last <code class="its-elem-markup">translateRule</code> indicates that the content of <code>type</code> is not translatable because the global rule takes precedence over the default value.</p><p>The localization note for the two first <code>data</code> elements is the text defined globally with the <code class="its-elem-markup">locNoteRule</code> element. This note is overridden for the last <code>data</code> element by the local <code class="its-attr-markup">locNote</code> attribute.</p><div class="exampleInner"><pre xml:space="preserve"><strong class="hl-tag" style="color: #000096"><Res</strong> <span class="hl-attribute" style="color: #F5844C">xmlns:its</span>=<span class="hl-value" style="color: #993300">"http://www.w3.org/2005/11/its"</span> <span class="hl-attribute" style="color: #F5844C">its:version</span>=<span class="hl-value" styl="color: #993300">"2.0"</span><strong class="hl-tag" style="color: #000096">></strong> + </td></tr></tbody></table><div class="exampleOuter"><div class="exampleHeader"><a name="EX-datacat-behavior-1" id="EX-datacat-behavior-1" shape="rect"/>Example 26: Defaults, inheritance and overriding behavior of data categories</div><p>In this example, the content of all the <code>data</code> elements is translatable and none of the attributes are translatable, because the default for the <a href="#trans-datacat" shape="rect">Translate</a> data category in elements is "yes" and in attributes is "no", and neither of their values are overridden at all. The first <code class="its-elem-markup">translateRule</code> is overridden by the local <code>its:translate="no"</code> attribute. The content of <code>revision</code>, <code>profile</code>, <code>reviser</code> and <code>locNote</code> elements are not translatable. This is because the default is overridden by the same <code>its:translate="no"</code> that these elements inherit from the local ITS markup in the <code>prolog</code> elemen. 
The exception is the <code>field</code> element where the second <code class="its-elem-markup">translateRule</code> takes precedence over the inherited value. The last <code class="its-elem-markup">translateRule</code> indicates that the content of <code>type</code> is not translatable because the global rule takes precedence over the default value.</p><p>The localization note for the two first <code>data</code> elements is the text defined globally with the <code class="its-elem-markup">locNoteRule</code> element. This note is overridden for the last <code>data</code> element by the local <code class="its-attr-markup">locNote</code> attribute.</p><div class="exampleInner"><pre xml:space="preserve"><strong class="hl-tag" style="color: #000096"><Res</strong> <span class="hl-attribute" style="color: #F5844C">xmlns:its</span>=<span class="hl-value" style="color: #993300">"http://www.w3.org/2005/11/its"</span> <span class="hl-attribute" style="color: #F5844C">its:version</span>=<span class="hl-value" styl="color: #993300">"2.0"</span><strong class="hl-tag" style="color: #000096">></strong> <strong class="hl-tag" style="color: #000096"><prolog</strong> <span class="hl-attribute" style="color: #F5844C">its:translate</span>=<span class="hl-value" style="color: #993300">"no"</span><strong class="hl-tag" style="color: #000096">></strong> <strong class="hl-tag" style="color: #000096"><revision></strong>Sep-07-2006<strong class="hl-tag" style="color: #000096"></revision></strong> <strong class="hl-tag" style="color: #000096"><profile></strong> @@ -1523,7 +1401,7 @@ <code><its:translateRule selector=""//h:img" translate="yes"/></code> will set the <code>img</code> element and its translatable attributes like <code>alt</code> to "yes".</p></div><p id="translate-global">GLOBAL: The <code class="its-elem-markup">translateRule</code> element contains the following:</p><ul><li><p>A required <code class="its-attr-markup">selector</code> attribute. 
It contains an <a href="#selectors" shape="rect">absolute selector</a> that selects the nodes to which this rule applies.</p></li><li><p>A required <code class="its-attr-markup">translate</code> attribute with the value - "yes" or "no".</p></li></ul><div class="exampleOuter"><div class="exampleHeader"><a name="EX-translate-selector-1" id="EX-translate-selector-1" shape="rect"/>Example 28: The <a href="#trans-datacat" shape="rect">Translate</a> data category expressed + "yes" or "no".</p></li></ul><div class="exampleOuter"><div class="exampleHeader"><a name="EX-translate-selector-1" id="EX-translate-selector-1" shape="rect"/>Example 27: The <a href="#trans-datacat" shape="rect">Translate</a> data category expressed globally</div><p>The <code class="its-elem-markup">translateRule</code> element specifies that the elements <code>code</code> is not to be translated.</p><div class="exampleInner"><pre xml:space="preserve"><strong class="hl-tag" style="color: #000096"><its:rules</strong> <span class="hl-attribute" style="color: #F5844C">version</span>=<span class="hl-value" style="color: #993300">"2.0"</span> <span class="hl-attribute" style="color: #F5844C">xmlns:its</span>=<span class="hl-value" style="color: #993300">"http://www.w3.org/2005/11/its"</span><strong class="hl-tag" style="color: #000096">></strong> <strong class="hl-tag" style="color: #000096"><its:translateRule</strong> <span class="hl-attribute" style="color: #F5844C">translate</span>=<span class="hl-value" style="color: #993300">"no"</span> <span class="hl-attribute" style="color: #F5844C">selector</span>=<span class="hl-value" style="color: #993300">"//code"</span><strong class="hl-tag" style="color: #000096">/></strong> @@ -1535,13 +1413,13 @@ data category settings of attributes using local markup. This limitation is consistent with the advised practice of not using translatable attributes. If attributes need to be translatable, then - this has to be declared globally. 
Note that this restriction does not apply to <a href="#html5-translate-handling" shape="rect">HTML5</a>.</p></div><div class="exampleOuter"><div class="exampleHeader"><a name="EX-translate-selector-2" id="EX-translate-selector-2" shape="rect"/>Example 29: The <a href="#trans-datacat" shape="rect">Translate</a> data category expressed + this has to be declared globally. Note that this restriction does not apply to <a href="#html5-translate-handling" shape="rect">HTML5</a>.</p></div><div class="exampleOuter"><div class="exampleHeader"><a name="EX-translate-selector-2" id="EX-translate-selector-2" shape="rect"/>Example 28: The <a href="#trans-datacat" shape="rect">Translate</a> data category expressed locally</div><p>The local <code>its:translate="no"</code> specifies that the content of <code>panelmsg</code> is not to be translated.</p><div class="exampleInner"><pre xml:space="preserve"><strong class="hl-tag" style="color: #000096"><messages</strong> <span class="hl-attribute" style="color: #F5844C">its:version</span>=<span class="hl-value" style="color: #993300">"2.0"</span> <span class="hl-attribute" style="color: #F5844C">xmlns:its</span>=<span class="hl-value" style="color: #993300">"http://www.w3.org/2005/11/its"</span><strong class="hl-tag" style="color: #000096">></strong> <strong class="hl-tag" style="color: #000096"><msg</strong> <span class="hl-attribute" style="color: #F5844C">num</span>=<span class="hl-value" style="color: #993300">"123"</span><strong class="hl-tag" style="color: #000096">></strong>Click Resume Button on Status Display or <strong class="hl-tag" style="color: #000096"><panelmsg</strong> <span class="hl-attribute" style="color: #F5844C">its:translate</span>=<span class="hl-value" style="color: #993300">"no"</span><strong class="hl-tag" style="color: #000096"> ></strong>CONTINUE<strong class="hl-tag" style="color: #000096"></panelmsg></strong> Button on printer panel<strong class="hl-tag" style="color: #000096"></msg></strong> <strong 
class="hl-tag" style="color: #000096"></messages></strong> -</pre></div><p>[Source file: <a href="examples/xml/EX-translate-selector-2.xml" shape="rect">examples/xml/EX-translate-selector-2.xml</a>]</p></div><div class="exampleOuter"><div class="exampleHeader"><a name="EX-translate-html5" id="EX-translate-html5" shape="rect"/>Example 30: The <a href="#trans-datacat" shape="rect">Translate</a> data category expressed locally +</pre></div><p>[Source file: <a href="examples/xml/EX-translate-selector-2.xml" shape="rect">examples/xml/EX-translate-selector-2.xml</a>]</p></div><div class="exampleOuter"><div class="exampleHeader"><a name="EX-translate-html5" id="EX-translate-html5" shape="rect"/>Example 29: The <a href="#trans-datacat" shape="rect">Translate</a> data category expressed locally in HTML</div><p>The local <code>translate="no"</code> attribute specifies that the content of <code>span</code> is not to be translated.</p><div class="exampleInner"><pre xml:space="preserve"><strong class="hl-tag" style="color: blue"><!DOCTYPE html></strong> <strong class="hl-tag" style="color: #000096"><html></strong> @@ -1578,7 +1456,7 @@ [895 lines skipped] --- /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20/its20.odd 2013/07/30 02:42:24 1.498 +++ /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20/its20.odd 2013/08/16 15:57:45 1.499 @@ -6,20 +6,20 @@ <header xmlns="http://example.com/xmlspec"> <title>Internationalization Tag Set (ITS) Version 2.0</title> <w3c-designation>ITS20</w3c-designation> - <w3c-doctype>W3C Working Draft</w3c-doctype> + <w3c-doctype>W3C Last Call Working Draft</w3c-doctype> <pubdate> - <day>@@</day> - <month>@@</month> + <day>20</day> + <month>August</month> <year>2013</year> </pubdate> <publoc> - <loc href="http://www.w3.org/TR/2013/WD-its20-2013@@@@/"> - http://www.w3.org/TR/2013/WD-its20-2013@@@@/</loc> + <loc href="http://www.w3.org/TR/2013/WD-its20-20130820/"> + http://www.w3.org/TR/2013/WD-its20-20130820/</loc> </publoc> <altlocs> 
<loc href="its20.odd">ODD/XML document</loc> <loc href="itstagset20.zip">self-contained zipped archive</loc> - <loc href="diffs/diff-wd20130730-wd20130521.html">XHTML Diff markup to previous publication + <loc href="diffs/diff-wd20130820-wd20130521.html">XHTML Diff markup to previous publication 2013-05-21</loc> </altlocs> <prevlocs> @@ -92,6 +92,18 @@ Web content. ITS 2.0 focuses on HTML, XML-based formats in general, and can leverage processing based on the XML Localization Interchange File Format (XLIFF), as well as the Natural Language Processing Interchange Format (NIF).</p> + + <p>This document was published by the <loc href="http://www.w3.org/International/multilingualweb/lt/">MultilingualWeb-LT Working Group</loc> as a Last Call Working Draft. The Last Call period ends 10 September 2013. The publication reflects changes made since the previous Last Call publication 21 May 2013. The Working Group expects to advance this document to Recommendation status (see <loc href="http://www.w3.org/2004/02/Process-20040205/tr.html#maturity-levels">W3C document maturity levels</loc>).</p> + + <p>All <loc href="http://www.w3.org/International/multilingualweb/lt/drafts/its20/disposition-of-comments-2nd-last-call.html">last call issues</loc> in the normative sections (from <loc href="#notation-terminology">Section 3: Notation and Terminology</loc> to <loc href="#datacategory-description">Section 8: Description of Data Categories</loc> and <loc href="#normative-references">Appendix A: References</loc> to <loc href="#its-schemas">Appendix D: Schemas for ITS</loc>) have been resolved. As announced in the <loc href="http://www.w3.org/TR/2013/WD-its20-20130521/#status">previous draft</loc>, the other, non-normative sections have been updated with explanatory material. 
The Working Group encourages feedback until 10 September 2013.</p> + + <p>One substantive change was made that requires a third last call draft: the <loc href="#conversion-to-nif">conversion to NIF</loc> was categorized as a non-normative feature (this was a <loc href="http://www.w3.org/TR/2013/WD-its20-20130521/#conversion-to-nif">normative feature in the previous draft</loc>). The working group encourages especially feedback on this change from the RDF community.</p> + + <p>Since the ITS 2.0 test suite already has a high coverage for normative features of this specification, the Working Group expects to advance the specification directly to Proposed Recommendation status.</p> + + <p>To give feedback send your comments to <loc href="mailto:public-multilingualweb-lt-comments@w3.org">public-multilingualweb-lt-comments@w3.org</loc>. Use "Comment on ITS 2.0 specification WD" in the subject line of your email. The <loc href="http://lists.w3.org/Archives/Public/public-multilingualweb-lt-comments/">archives for this list</loc> are publicly available. See also <loc href="https://www.w3.org/International/multilingualweb/lt/track/issues/">issues discussed within the Working Group</loc> and the <loc href="#changelog-since-20130521">list of changes</loc> since the previous publication.</p> + + <p>Publication as a Last Call Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.</p> <p>This document was produced by a group operating under the <loc href="http://www.w3.org/Consortium/Patent-Policy-20040205/">5 February 2004 W3C Patent Policy</loc>. W3C maintains a <loc href="http://www.w3.org/2004/01/pp-impl/53116/status">public list of any patent disclosures</loc> made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. 
An individual who has actual knowledge of a patent which the individual believes contains <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#def-essential">Essential Claim(s)</a> must disclose the information in accordance with <a href="http://www.w3.org/Consortium/Patent-Policy-20040205/#sec-Disclosure">section 6 of the W3C Patent Policy</a>. </p> </status> @@ -299,7 +311,7 @@ provenance-related requirement by allowing ITS processors to leave a trace: ITS processors can basically say <q>It is me that generated this bit of information</q>. Another example are the <ptr target="#nif-reference" - type="bibref"/> related details of ITS 2.0, which help to couple Natural Language + type="bibref"/> related details of ITS 2.0, which provide a non-normative approach to couple Natural Language Processing with concepts of the Semantic Web.</p> </div> @@ -398,9 +410,9 @@ <item>ITS 2.0 has an <ref target="#its-tool-annotation">ITS Tools Annotation</ref> mechanism to associate processor information with the use of individual data categories. See <ptr target="#traceability" type="specref"/> for details.</item> </list> - <p><emph>Mappings:</emph> ITS 2.0 provides a normative algorithm to convert ITS 2.0 information into <ptr target="#nif-reference" type="bibref"/> and links to guidance about how to relate ITS 2.0 to XLIFF. See <ptr target="#mapping-conversion" type="specref"/> for details.</p> + <p><emph>Mappings:</emph> ITS 2.0 provides a non-normative algorithm to convert ITS 2.0 information into <ptr target="#nif-reference" type="bibref"/> and links to guidance about how to relate ITS 2.0 to XLIFF. See <ptr target="#mapping-conversion" type="specref"/> for details.</p> - <p><emph>Changes to the conformance section</emph>: The <ptr target="#conformance" type="specref"/> tells implementers how to implement ITS. 
For ITS 2.0, the conformance statements related to Ruby have been removed, and a conformance clause related to processing <ptr type="bibref" target="#nif-reference"/> has been added. For <ptr target="#html5" type="bibref"/>, a dedicated conformance section has been created. Finally, a conformance clause related to Non-ITS elements and attributes has been added.</p> + <p><emph>Changes to the conformance section</emph>: The <ptr target="#conformance" type="specref"/> tells implementers how to implement ITS. For ITS 2.0, the conformance statements related to Ruby have been removed. For <ptr target="#html5" type="bibref"/>, a dedicated conformance section has been created. Finally, a conformance clause related to Non-ITS elements and attributes has been added.</p> </div> @@ -751,15 +763,15 @@ <head>Mapping and conversion</head> <div xml:id="mapping-NIF"><head>ITS and RDF/NIF</head> - <p>ITS 2.0 defines an algorithm to convert XML or HTML documents (or their DOM + <p>ITS 2.0 provides a non-normative algorithm to convert XML or HTML documents (or their DOM representations) that contain ITS metadata to the RDF format based on <ptr target="#nif-reference" type="bibref"/>. NIF is an RDF/OWL-based format that aims at interoperability between Natural Language Processing (NLP) tools, language resources and annotations.</p> <p>The conversion from <ref target="#conversion-to-nif">ITS 2.0 to NIF</ref> results in RDF triples. These triples represent the textual content of the original document as RDF typed information. The ITS annotation is represented as properties of content-related triples and relies on an <ref target="http://www.w3.org/2005/11/its/rdf#">ITS RDF vocabulary</ref>.</p> - <p>The back conversion from <ref target="#nif-backconversion">NIF to ITS 2.0</ref> is defined informatively. 
One motivation for the back conversion is a roundtrip workflow like: 1) conversion to NIF 2) in NIF representation detection of named entities using NLP tools 3) back conversion to HTML and generation of <ref target="#textanalysis">Text Analysis</ref> markup. The outcome are HTML documents with linked information, see <ptr target="#EX-text-analysis-html5-local-1" type="exref"/>.</p></div> + <p>The back conversion from <ref target="#nif-backconversion">NIF to ITS 2.0</ref> is defined informatively as well. One motivation for the back conversion is a roundtrip workflow like: 1) conversion to NIF 2) in NIF representation detection of named entities using NLP tools 3) back conversion to HTML and generation of <ref target="#textanalysis">Text Analysis</ref> markup. The outcome are HTML documents with linked information, see <ptr target="#EX-text-analysis-html5-local-1" type="exref"/>.</p></div> <div xml:id="mapping-XLIFF"><head>ITS and XLIFF</head> - <p>The XML Localization Interchange File Format <ptr target="#xliff1.2" type="bibref"/> is an OASIS standard that enables translatable source text and its translation to be passed between different tools within localization and translation workflows. <ptr target="#xliff2.0" type="bibref"/> is the successor of <ptr target="#xliff1.2" type="bibref"/> and under development. XLIFF has been widely implemented in various translation management systems, computer aided translation tools and in utilities for extracting translatable content from source documents and merging back the content in the target language..</p> + <p>The XML Localization Interchange File Format <ptr target="#xliff1.2" type="bibref"/> is an OASIS standard that enables translatable source text and its translation to be passed between different tools within localization and translation workflows. <ptr target="#xliff2.0" type="bibref"/> is the successor of <ptr target="#xliff1.2" type="bibref"/> and under development. 
XLIFF has been widely implemented in various translation management systems, computer aided translation tools and in utilities for extracting translatable content from source documents and merging back the content in the target language.</p> <p>The mapping between ITS and XLIFF therefore unpins several important ITS 2.0 usage scenarios <ptr target="#mlw-metadata-us-impl" type="bibref"/>. These usage scenarios involve:</p> @@ -1046,25 +1058,13 @@ process ITS markup implementing the conformance clauses 2-2 and 2-3, it <ref target="#rfc-keywords">MUST</ref> process that markup with XML documents.</p></item> - <item><p xml:id="its-conformance-2-4"><emph>2-4:</emph> After processing ITS information - on the basis of conformance clauses <ref target="#its-conformance-2-1">2-1</ref>, - <ref target="#its-conformance-2-2">2-2</ref> and <ref target="#its-conformance-2-3">2-3</ref>, an application <ref - target="#rfc-keywords">MAY</ref> convert an XML document to <ptr target="#nif-reference" type="bibref"/>, using the - algorithm described in <ptr target="#conversion-to-nif" type="specref"/>.</p></item> - <item><p xml:id="its-conformance-2-5"><emph>2-5:</emph> Non-ITS elements and attributes found in ITS elements <ref target="#rfc2119">MAY</ref> be ignored.</p></item> + <item><p xml:id="its-conformance-2-4"><emph>2-4:</emph> Non-ITS elements and attributes found in ITS elements <ref target="#rfc2119">MAY</ref> be ignored.</p></item> </list> - <note><p xml:id="nif-optional-feature">The conformance clause <ref target="#its-conformance-2-4">2-4</ref> essentially - means that the conversion to NIF is an optional feature of ITS 2.0, and that the - conversion is independent of whether ITS information has been made available via the - global or local selection mechanisms, see conformance clause <ref - target="#its-conformance-2-1-1">2-1-1</ref>.</p></note> <p xml:id="its-processing-conformance-claims">Statements related to this conformance type <ref target="#rfc-keywords">MUST</ref> 
list all <ref target="#def-datacat">data categories</ref> they implement, and for each <ref target="#def-datacat">data category</ref>, which type of selection they support, whether they support processing - of XML. If the implementation provides the conversion to NIF (see conformance clause - <ref target="#its-conformance-2-4">2-4</ref>), this <ref target="#rfc-keywords" - >MUST</ref> be stated.</p> + of XML.</p> <note><p>The above conformance clauses are directly reflected in the <ref target="https://github.com/finnle/ITS-2.0-Testsuite/">ITS 2.0 test suite</ref>. All @@ -1073,8 +1073,7 @@ defaults and precedence of selections (clauses <ref target="#its-conformance-2-1-2">2-1-2</ref> and <ref target="#its-conformance-2-1-3">2-1-3</ref>); for each data category there are tests with linked rules (<ref target="#its-conformance-2-2">2-2</ref>); and all types of tests are given for - XML (clause <ref target="#its-conformance-2-3">2-3</ref>). In addition, there are test cases for conversion to NIF (clause - <ref target="#its-conformance-2-4">2-4</ref>). Implementers are encouraged to organize their documentation in a similar way, so + XML (clause <ref target="#its-conformance-2-3">2-3</ref>). 
Implementers are encouraged to organize their documentation in a similar way, so that users of ITS 2.0 easily can understand the processing capabilities available.</p></note> </div> <div xml:id="conformance-product-html-processing-expectations"> @@ -1134,13 +1133,6 @@ process ITS markup implementing the conformance clauses 3-1 and 3-2, it <ref target="#rfc-keywords">MUST</ref> process that markup within HTML documents.</p></item> - <item><p xml:id="its-conformance-3-4"><emph>3-4:</emph> After processing ITS information - on the basis of conformance clauses <ref target="#its-conformance-3-1">3-1</ref>, - <ref target="#its-conformance-3-2">3-2</ref> and - <ref target="#its-conformance-3-3">3-3</ref>, an application <ref - target="#rfc-keywords">MAY</ref> convert an <ptr target="#html5" type="bibref"/> document to - <ptr target="#nif-reference" type="bibref"/>, using the - algorithm described in <ptr target="#conversion-to-nif" type="specref"/>.</p></item> </list> <p xml:id="its-html-processing-conformance-claims">Statements related to this conformance type <ref target="#rfc-keywords">MUST</ref> list all <ref target="#def-datacat">data @@ -1616,146 +1608,6 @@ </list> </div> - <div xml:id="conversion-to-nif"> - <head>Conversion to NIF</head> - <p>This section defines an algorithm to convert XML or HTML documents (or their DOM - representations) that contain ITS metadata to the RDF format based on <ptr - target="#nif-reference" type="bibref"/>. The conversion results in RDF triples.</p> - <note><p>The algorithm is intended to extract the text from the XML/HTML/DOM for an NLP tool. It can - produce a lot of <quote>phantom</quote> predicates from excessive whitespace, which 1) - increases the size of the intermediate mapping and 2) extracts this whitespace as - text, and therefore might decrease NLP performance. It is strongly recommended to - normalize whitespace in the input XML/HTML/DOM in order to minimize such phantom - predicates. 
A normalized example is given below. Since the whitespace normalization - algorithm itself is format dependent (for example, it differs for HTML compared to - general XML), no normative algorithm for whitespace normalization is given as part of - this specification.</p></note> - <note><p xml:id="its-rdf-ontology-status">The output of the algorithm shown below uses the ITS RDF ontology <ptr target="#its-rdf-ontology" type="bibref"/> and its namespace<?br?><ref target="http://www.w3.org/2005/11/its/rdf#">http://www.w3.org/2005/11/its/rdf#</ref><?br?>This ontology is not a normative part of the ITS 2.0 specification and is being discussed in the <ref target="http://www.w3.org/International/its/wiki/ITS-RDF_mapping">ITS Interest Group</ref>.</p></note> - <exemplum xml:id="EX-HTML-whitespace-normalization"> - <head>Example (see <ref target="examples/html5/EX-HTML-whitespace-normalization.html">source code</ref>) of an HTML document with whitespace character normalization as preparation for the conversion to NIF</head> - <eg><![CDATA[<html><body><h2 translate="yes">Welcome to <span - its-ta-ident-ref="http://dbpedia.org/resource/Dublin" its-within-text="yes" - translate="no">Dublin</span> in <b translate="no" its-within-text="yes">Ireland</b>!</h2></body></html>]]></eg> - </exemplum> - <p xml:id="its2nif-algorithm">The conversion algorithm to generate NIF consists of seven - steps:</p> - <list type="unordered"> - <item><p xml:id="its2nif-algorithm-step1">STEP 1: Get an ordered list of all text nodes - of the document.</p></item> - <item><p xml:id="its2nif-algorithm-step2">STEP 2: Generate an XPath expression for each non-empty text node of all leaf elements and memorize them.</p></item> - <item><p xml:id="its2nif-algorithm-step3">STEP 3: Get the text for each text node and make a tuple with the corresponding XPath expression (X,T). 
Since the text nodes have a certain order we - now have a list of ordered tuples ((x0,t0), (x1,t1), ..., (xn,tn)).</p></item> - <item><p xml:id="its2nif-algorithm-step4">STEP 4 (optional): Serialize as XML or as RDF. - The list with the XPath-to-text mapping can also be kept in memory. Part of a - serialization example is given below. The upper part is in RDF Turtle Syntax while the lower part - is in XML (the <code>mappings</code> element).</p></item> - </list> - <eg rend="text"><![CDATA[# Turtle example: -@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> . -@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> . -<http://example.com/exampledoc.html#char=b0,e0> - nif:wasConvertedFrom <http://example.com/exampledoc.html#xpath(x0)> . -<http://example.com/exampledoc.html#char=b1,e1> - nif:wasConvertedFrom <http://example.com/exampledoc.html#xpath(x1)> . -# ... -<http://example.com/exampledoc.html#char=bn,en> - nif:wasConvertedFrom <http://example.com/exampledoc.html#xpath(xn)> . -<!-- XML Example --> -<mappings> - <mapping x="xpath(x0)" b="b0" e="e0" /> - <mapping x="xpath(x1)" b="b1" e="e1" /> - <!-- ... --> - <mapping x="xpath(xn)" b="bn" e="en" /> -</mappings> -]]></eg> - <p>where</p> - <eg rend="text"><![CDATA[b0 = 0 -e0 = b0 + (Number of characters of t0) -b1 = e0 -e1 = b1 + (Number of characters of t1) -... -bn = e(n-1) -en = bn + (Number of characters of tn) -]]></eg> - <p>Example (continued)</p> - <eg rend="text"><![CDATA[# Turtle example: -@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> . -@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> . -# "Welcome to " -<http://example.com/exampledoc.html#char=0,11> - nif:wasConvertedFrom - <http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1]/text()[1])>. -# "Dublin" -<http://example.com/exampledoc.html#char=11,17> - nif:wasConvertedFrom - <http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1]/span[1]/text()[1])>. 
-# " in " -<http://example.com/exampledoc.html#char=17,21> - nif:wasConvertedFrom - <http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1]/text()[2])> . -# "Ireland" -<http://example.com/exampledoc.html#char=21,28> - nif:wasConvertedFrom - <http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1]/b[1]/text()[1])> . -# "!" -<http://example.com/exampledoc.html#char=28,29> - nif:wasConvertedFrom - <http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1]/text()[3])> . -# "Welcome to Dublin Ireland!" -<http://example.com/exampledoc.html#char=0,29> - nif:wasConvertedFrom - <http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1]/text())> . -<!-- XML Example --> -<mappings> - <mapping x="xpath(/html/body[1]/h2[1]/text()[1])" b="0" e="11" /> - <mapping x="xpath(/html/body[1]/h2[1]/span[1]/text()[1])" b="11" e="17" /> - <mapping x="xpath(/html/body[1]/h2[1]/text()[2])" b="17" e="21" /> - <mapping x="xpath(/html/body[1]/h2[1]/b[1]/text()[1])" b="21" e="28" /> - <mapping x="xpath(/html/body[1]/h2[1]/text()[3])" b="28" e="29" /> - <mapping x="xpath(/html/body[1]/h2[1])" b="0" e="29" /> -</mappings>]]></eg> - <list type="unordered"> - <item><p xml:id="its2nif-algorithm-step5">STEP 5: Create a context URI and attach the - whole concatenated text <code>$(t0+t1+t2+...+tn)</code> of the document as reference.</p></item> - <item><p xml:id="its2nif-algorithm-step6">STEP 6: Attach any ITS metadata annotations from the XML/HTML/DOM input to the respective NIF URIs.</p></item> - <item><p xml:id="its2nif-algorithm-step7">STEP 7: Omit all URIs that do not carry annotations (to avoid - bloating the data).</p></item> - </list> - <eg rend="text"><![CDATA[@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> . 
-@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> -<http://example.com/exampledoc.html#char=0,29> - rdf:type nif:Context ; - rdf:type nif:RFC5147String ; -# concatenate the whole text - nif:isString "$(t0+t1+t2+...+tn)" ; - nif:beginIndex "0" ; - nif:endIndex "29" ; - itsrdf:translate "yes"; - nif:sourceUrl <http://example.com/exampledoc.html> . -<http://example.com/exampledoc.html#char=11,17> - rdf:type nif:RFC5147String ; - nif:beginIndex "11" ; - nif:endIndex "17" ; - itsrdf:translate "no"; - itsrdf:taIdentRef <http://dbpedia.org/resource/Dublin> ; - nif:referenceContext <http://example.com/exampledoc.html#char=0,29> . -<http://example.com/exampledoc.html#char=21,28> - rdf:type nif:RFC5147String ; - nif:beginIndex "21" ; - nif:endIndex "28" ; - itsrdf:translate "no"; - nif:referenceContext <http://example.com/exampledoc.html#char=0,29> . -]]></eg> - <p>A complete sample output in RDF/XML format after step 7, given the input document <ptr - target="#EX-HTML-whitespace-normalization" type="exref"/>, is available at <ref - target="examples/nif/EX-nif-conversion-output.xml" - >examples/nif/EX-nif-conversion-output.xml</ref>.</p> - <note><p>The conversion to NIF is a possible basis for a natural language processing (NLP) application - that creates, for example, named entity annotations. A non-normative algorithm to - integrate these annotations into the original input document is given in <ptr - target="#nif-backconversion" type="specref"/>. 
This algorithm is non-normative - because many decisions depend on the particular NLP application being used.</p></note> - </div> <div xml:id="its-tool-annotation"> <head>ITS Tools Annotation</head> <p>In some cases, it may be important for instances of data categories to be associated @@ -4864,7 +4716,6 @@ <ref target="http://www.iana.org/assignments/character-sets">Character Sets</ref> </title> Available at <ref target="http://www.iana.org/assignments/character-sets" >http://www.iana.org/assignments/character-sets</ref>.</bibl> - <bibl xml:id="nif-reference" n="NIF">Hellmann, S. et al. (ed.). <ref target="http://persistence.uni-leipzig.org/nlp2rdf/">NIF Core Ontology Version 1.0</ref>, as of May 2013. Available at http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core# under CC-BY 3.0 license. </bibl> <bibl xml:id="qa-framework" n="QAFRAMEWORK">Karl Dubost, Lynne Rosental, Dominique Hazaël-Massieux, Lofton Henderson. <title> <ref target="http://www.w3.org/TR/2005/REC-qaframe-spec-20050817/">QA Framework: @@ -5597,6 +5448,7 @@ Standardization). <title>Translation projects – General guidance</title>. [Geneva]: International Organization for Standardization, 2012.</bibl> <bibl xml:id="its10" n="ITS 1.0">Christian Lieske and Felix Sasaki. <title> <ref target="http://www.w3.org/TR/2007/REC-its-20070403/">Internationalization Tag Set (ITS) Version 1.0</ref></title>. W3C Recommendation 03 April 2007. Available at <ref target="http://www.w3.org/TR/2007/REC-its-20070403/">http://www.w3.org/TR/2007/REC-its-20070403/</ref>. The latest version of <ref target="http://www.w3.org/TR/its/">ITS 1.0</ref> is available at http://www.w3.org/TR/its/.</bibl> + <bibl xml:id="its-rdf-ontology" n="ITS RDF"><title><ref target="http://www.w3.org/2005/11/its/rdf#">ITS RDF Ontology</ref></title>, version May 2013. Available at http://www.w3.org/2005/11/its/rdf# .</bibl> <bibl xml:id="itsreq" n="ITS REQ">Yves Savourel. 
<title> <ref target="http://www.w3.org/TR/2006/WD-itsreq-20060518/">Internationalization and Localization Markup Requirements</ref> @@ -5620,7 +5472,7 @@ <bibl xml:id="nerd" n="NERD">Named Entity Recognition and Disambiguation ontology (NERD) available at: <ref target="http://nerd.eurecom.fr/ontology" >http://nerd.eurecom.fr/ontology</ref></bibl> - <bibl xml:id="its-rdf-ontology" n="ITS RDF"><title><ref target="http://www.w3.org/2005/11/its/rdf#">ITS RDF Ontology</ref></title>, version May 2013. Available at http://www.w3.org/2005/11/its/rdf# .</bibl> + <bibl xml:id="nif-reference" n="NIF">Hellmann, S. et al. (ed.). <ref target="http://persistence.uni-leipzig.org/nlp2rdf/">NIF Core Ontology Version 1.0</ref>, as of May 2013. Available at http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core# under CC-BY 3.0 license. </bibl> <bibl xml:id="nvdl" n="NVDL">Information technology – Document Schema Definition Languages (DSDL) – Part 4: <title>Namespace-based Validation Dispatching Language (NVDL)</title>. International Organization for Standardization (ISO) ISO/IEC @@ -5703,6 +5555,145 @@ http://www.xulplanet.com/</ref>.</bibl> </listBibl> </div> + <div xml:id="conversion-to-nif" type="inform"> + <head>Conversion to NIF</head> + <p>This section provides an informative algorithm to convert XML or HTML documents (or their DOM + representations) that contain ITS metadata to the RDF format based on <ptr + target="#nif-reference" type="bibref"/>. The conversion results in RDF triples.</p> + <note><p>The algorithm is intended to extract the text from the XML/HTML/DOM for an NLP tool. It can + produce a lot of <quote>phantom</quote> predicates from excessive whitespace, which 1) + increases the size of the intermediate mapping and 2) extracts this whitespace as + text, and therefore might decrease NLP performance. It is strongly recommended to + normalize whitespace in the input XML/HTML/DOM in order to minimize such phantom + predicates. 
A normalized example is given below. The whitespace normalization + algorithm itself is format dependent (for example, it differs for HTML compared to + general XML).</p></note> + <note><p xml:id="its-rdf-ontology-status">The output of the algorithm shown below uses the ITS RDF ontology <ptr target="#its-rdf-ontology" type="bibref"/> and its namespace<?br?><ref target="http://www.w3.org/2005/11/its/rdf#">http://www.w3.org/2005/11/its/rdf#</ref><?br?>Like the algorithm, this ontology is not a normative part of the ITS 2.0 specification and is being discussed in the <ref target="http://www.w3.org/International/its/wiki/ITS-RDF_mapping">ITS Interest Group</ref>.</p></note> + <exemplum xml:id="EX-HTML-whitespace-normalization"> + <head>Example (see <ref target="examples/html5/EX-HTML-whitespace-normalization.html">source code</ref>) of an HTML document with whitespace character normalization as preparation for the conversion to NIF</head> + <eg><![CDATA[<html><body><h2 translate="yes">Welcome to <span + its-ta-ident-ref="http://dbpedia.org/resource/Dublin" its-within-text="yes" + translate="no">Dublin</span> in <b translate="no" its-within-text="yes">Ireland</b>!</h2></body></html>]]></eg> + </exemplum> + <p xml:id="its2nif-algorithm">The conversion algorithm to generate NIF consists of seven + steps:</p> + <list type="unordered"> + <item><p xml:id="its2nif-algorithm-step1">STEP 1: Get an ordered list of all text nodes + of the document.</p></item> + <item><p xml:id="its2nif-algorithm-step2">STEP 2: Generate an XPath expression for each non-empty text node of all leaf elements and memorize them.</p></item> + <item><p xml:id="its2nif-algorithm-step3">STEP 3: Get the text for each text node and make a tuple with the corresponding XPath expression (X,T). Since the text nodes have a certain order we + now have a list of ordered tuples ((x0,t0), (x1,t1), ..., (xn,tn)).</p></item> + <item><p xml:id="its2nif-algorithm-step4">STEP 4 (optional): Serialize as XML or as RDF. 
+ The list with the XPath-to-text mapping can also be kept in memory. Part of a + serialization example is given below. The upper part is in RDF Turtle Syntax while the lower part + is in XML (the <code>mappings</code> element).</p></item> + </list> + <eg rend="text"><![CDATA[# Turtle example: +@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> . +@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> . +<http://example.com/exampledoc.html#char=b0,e0> + nif:wasConvertedFrom <http://example.com/exampledoc.html#xpath(x0)> . +<http://example.com/exampledoc.html#char=b1,e1> + nif:wasConvertedFrom <http://example.com/exampledoc.html#xpath(x1)> . +# ... +<http://example.com/exampledoc.html#char=bn,en> + nif:wasConvertedFrom <http://example.com/exampledoc.html#xpath(xn)> . +<!-- XML Example --> +<mappings> + <mapping x="xpath(x0)" b="b0" e="e0" /> + <mapping x="xpath(x1)" b="b1" e="e1" /> + <!-- ... --> + <mapping x="xpath(xn)" b="bn" e="en" /> +</mappings> +]]></eg> + <p>where</p> + <eg rend="text"><![CDATA[b0 = 0 +e0 = b0 + (Number of characters of t0) +b1 = e0 +e1 = b1 + (Number of characters of t1) +... +bn = e(n-1) +en = bn + (Number of characters of tn) +]]></eg> + <p>Example (continued)</p> + <eg rend="text"><![CDATA[# Turtle example: +@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> . +@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> . +# "Welcome to " +<http://example.com/exampledoc.html#char=0,11> + nif:wasConvertedFrom + <http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1]/text()[1])>. +# "Dublin" +<http://example.com/exampledoc.html#char=11,17> + nif:wasConvertedFrom + <http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1]/span[1]/text()[1])>. +# " in " +<http://example.com/exampledoc.html#char=17,21> + nif:wasConvertedFrom + <http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1]/text()[2])> . 
+# "Ireland" +<http://example.com/exampledoc.html#char=21,28> + nif:wasConvertedFrom + <http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1]/b[1]/text()[1])> . +# "!" +<http://example.com/exampledoc.html#char=28,29> + nif:wasConvertedFrom + <http://example.com/exampledoc.html#xpath(/html/body[1]/h2[1]/text()[3])> . [84 lines skipped]
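Note on the offset rule in the added section (b0 = 0, e_i = b_i + number of characters of t_i, b_(i+1) = e_i): the following Python sketch illustrates it with the "Welcome to Dublin in Ireland!" example from the diff. It is not part of the commit. The names char_offsets, to_turtle and EXAMPLE_TUPLES are hypothetical, the (XPath, text) tuples from steps 1 to 3 are assumed to be available already, and "number of characters" is assumed to mean Unicode code points as counted by Python's len().

```python
from typing import List, Tuple

# Ordered (XPath, text) tuples as produced by steps 1-3 for the example
# document in the diff. Hard-coded here; extracting them from a DOM is
# outside the scope of this sketch.
EXAMPLE_TUPLES: List[Tuple[str, str]] = [
    ("/html/body[1]/h2[1]/text()[1]", "Welcome to "),
    ("/html/body[1]/h2[1]/span[1]/text()[1]", "Dublin"),
    ("/html/body[1]/h2[1]/text()[2]", " in "),
    ("/html/body[1]/h2[1]/b[1]/text()[1]", "Ireland"),
    ("/html/body[1]/h2[1]/text()[3]", "!"),
]


def char_offsets(tuples: List[Tuple[str, str]]) -> List[Tuple[str, int, int]]:
    """Return (xpath, begin, end) triples using b0 = 0, e_i = b_i + len(t_i)."""
    mappings = []
    begin = 0
    for xpath, text in tuples:
        end = begin + len(text)  # assumption: characters == code points
        mappings.append((xpath, begin, end))
        begin = end  # b_(i+1) = e_i
    return mappings


def to_turtle(doc_uri: str, mappings: List[Tuple[str, int, int]]) -> str:
    """Serialize the mapping as nif:wasConvertedFrom statements (step 4)."""
    lines = []
    for xpath, begin, end in mappings:
        lines.append(
            f"<{doc_uri}#char={begin},{end}> "
            f"nif:wasConvertedFrom <{doc_uri}#xpath({xpath})> ."
        )
    return "\n".join(lines)


if __name__ == "__main__":
    offsets = char_offsets(EXAMPLE_TUPLES)
    print(to_turtle("http://example.com/exampledoc.html", offsets))
```

With the example tuples, the computed ranges are 0-11, 11-17, 17-21, 21-28 and 28-29, matching the char= fragments shown in the diff. The prefix declarations are omitted from the output for brevity.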
Received on Friday, 16 August 2013 15:57:51 UTC