its20 CVS commit

Update of /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20
In directory hutz:/tmp/cvs-serv30579

Modified Files:
	its20.html its20.odd 
Log Message:
NIF related edits, see action-284, and change log update
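
Illustration of the NIF edit (a minimal sketch, not part of the commit): the offset-based URIs in the patched example, offset_0_29, offset_11_17 and offset_21_28, can be reproduced from the ordered text nodes of the whitespace-normalized HTML example. The function name offset_uris is hypothetical, and counting offsets in Python string characters is an assumption; the diff itself does not state the unit.

```python
from typing import List, Tuple


def offset_uris(base: str, text_nodes: List[str]) -> Tuple[str, List[str]]:
    """Return (context URI, one offset-based URI per text node).

    Hypothetical helper: offsets are counted in Python string characters,
    which is an assumption rather than something the commit specifies.
    """
    uris = []
    pos = 0
    for text in text_nodes:
        end = pos + len(text)
        # Each node covers the half-open range [pos, end) of the full text.
        uris.append(f"{base}#offset_{pos}_{end}")
        pos = end
    # The context URI spans the whole concatenated text.
    context = f"{base}#offset_0_{pos}"
    return context, uris


# Text nodes of the normalized HTML example, in document order.
nodes = ["Welcome to ", "Dublin", " in ", "Ireland", "!"]
context, uris = offset_uris("http://example.com/exampledoc.html", nodes)
print(context)
print(uris[1])  # the "Dublin" node
print(uris[3])  # the "Ireland" node
```

Only the nodes that carry annotations ("Dublin" and "Ireland") plus the context appear in the patched example, which matches step 7 of the algorithm (omit URIs without annotations).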

Index: its20.odd
===================================================================
RCS file: /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20/its20.odd,v
retrieving revision 1.216
retrieving revision 1.217
diff -u -d -r1.216 -r1.217
--- its20.odd	11 Nov 2012 16:43:45 -0000	1.216
+++ its20.odd	12 Nov 2012 21:17:08 -0000	1.217
@@ -1727,9 +1727,9 @@
 					<p>This section defines an algorithm to convert XML or HTML documents (or their DOM representations) that contain ITS metadata to the RDF-based format <ref
 						target="http://nlp2rdf.org/nif-1-0">NIF</ref>. The conversion results in RDF triples that rely on the ITS 2.0 ontology, see tbd.</p>
 					<note type="ed">Add link to ontology once it is done; assure that the examples use the correct base URIs for the ontology.</note>
-					<note><p>The algorithm is intended to extract the text from the XML/HTML/DOM for an NLP tool and can produce a lot of <quote>phantom</quote> predicates from excessive whitespace, which 1) increases the size of the intermediate mapping and 2) extracts this whitespace as text. This might decrease NLP performance. It is recommended to normalize whitespace in the input XML/HTML/DOM in order to minimize such phantom predicates. A normalized example is given below. The whitespace normalization algorithm itself is format dependend, e.g. it differs for HTML compared to general XML. Hence no normative algorithm for whitespace normalization is given as part of this specification.</p></note>
+					<note><p>The algorithm is intended to extract the text from the XML/HTML/DOM for an NLP tool and can produce a lot of <quote>phantom</quote> predicates from excessive whitespace, which 1) increases the size of the intermediate mapping and 2) extracts this whitespace as text. This might decrease NLP performance. It is recommended to normalize whitespace in the input XML/HTML/DOM in order to minimize such phantom predicates. A normalized example is given below. The whitespace normalization algorithm itself is format dependent, e.g. it differs for HTML compared to general XML. Hence no normative algorithm for whitespace normalization is given as part of this specification.</p></note>
 					<exemplum xml:id="EX-HTML-whitespace-normalization">
-						<head>Example of an HTML document with whitespace nornalized, as a preparation for conversion to NIF</head>
+						<head>Example of an HTML document with whitespace normalized as preparation for conversion to NIF</head>
 <eg><![CDATA[<html><body><h2 translate="yes">Welcome to <span 
    its-disambig-ident-ref="http://dbpedia.org/resource/Dublin" 
    translate="no">Dublin</span> in <b translate="no">Ireland</b>!</h2></body></html>]]></eg>
@@ -1801,17 +1801,20 @@
 					<eg><![CDATA[@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
 <http://example.com/exampledoc.html#offset_0_29>
     rdf:type             str:Context ;
+    rdf:type             str:OffsetBasedString ;
 # concatenate the whole text
     str:isString         "$(t0+t1+t2+...+tn)" ; 
     itsrdf:translate     "yes"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo> ;
     str:occursIn      <http://example.com/exampledoc.html> .
 <http://example.com/exampledoc.html#offset_11_17> 
     rdf:type              str:String ;
+    rdf:type              str:OffsetBasedString ;
     itsrdf:translate     "no"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo> ;
     itsrdf:disambigIdentRef  <http://dbpedia.org/resource/Dublin> ;
     str:referenceContext <http://example.com/exampledoc.html#offset_0_29> .
 <http://example.com/exampledoc.html#offset_21_28> 
     rdf:type              str:String ;
+    rdf:type              str:OffsetBasedString ;
     itsrdf:translate     "no"^^<http://www.w3.org/TR/its-2.0/its.xsd#yesOrNo> ;
     str:referenceContext <http://example.com/exampledoc.html#offset_0_29> .
 ]]></eg>
@@ -4297,7 +4300,7 @@
 								target="examples/html5/EX-locQualityIssue-html5-local-2.html"
 								type="html5"/>
 						</exemplum>
-						<note type="ed">TODO for above: Finalize how HTML should work: use its-* attributes for standoff markup or markup inside the script element.</note>
+						<note type="ed">TODO for above (not only as an example, in general): document that for using standoff in HTML5, there needs to be the same id in HTML "script" and in the embedded XML. This is needed also for the other standoff case (provenance).</note>
 					</div>					
 				</div>
 
@@ -5756,11 +5759,13 @@
 					have been made to this document since the <ref
 						target="http://www.w3.org/TR/2012/WD-its20-20121023/">ITS 2.0 Working Draft
 						23 October 2012</ref>.</p>
-				<list type="unordered">
+				<list type="ordered">
 					<item>Clarified usage of <ref target="#domain">Domain</ref> data category in HTML5 in response to <ref target="https://www.w3.org/International/multilingualweb/lt/track/issues/56">issue-56</ref>.</item>
 				  <item>Added the <ref target="#lqissueDefs">enabled information</ref> in <ptr type="specref" target="#lqissue"/>.</item>
 				  <item>Updated the <ref target="#Disambiguation">Disambiguation</ref> data category.</item>
 				  <item>Fine tuned the algorithm to compute the result values of the <ref target="#domain">Domain</ref> data category.</item>
+					<item>Fix on <ptr target="#EX-locQualityIssue-html5-local-2" type="specref"/>: <code>id</code> attribute of <code>script</code> element now the same as of containing XML.</item>
+					<item>NIF example fix - see <ref target="https://www.w3.org/International/multilingualweb/lt/track/actions/284">action-284</ref>.</item>
 				</list> 
 				<p xml:id="changelog-since-20120829">The following log records major changes that
 					have been made to this document since the <ref

Index: its20.html
===================================================================
RCS file: /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20/its20.html,v
retrieving revision 1.219
retrieving revision 1.220
diff -u -d -r1.219 -r1.220
--- its20.html	12 Nov 2012 20:56:15 -0000	1.219
+++ its20.html	12 Nov 2012 21:17:08 -0000	1.220
@@ -1223,7 +1223,7 @@
 							mechanism. For example, for a command-line tool: providing the paths of
 							both the XML document to process and its corresponding external rules
 							file.</p></li></ul></div><div class="div2">
-<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="conversion-to-nif" id="conversion-to-nif" shape="rect"/>5.7 Conversion to NIF</h3><p>This section defines an algorithm to convert XML or HTML documents (or their DOM representations) that contain ITS metadata to the RDF-based format <a href="http://nlp2rdf.org/nif-1-0" shape="rect">NIF</a>. The conversion results in RDF triples that rely on the ITS 2.0 ontology, see tbd.</p><span class="editor-note">[Ed. note: Add link to ontology once it is done; assure that the examples use the correct base URIs for the ontology.]</span><div class="note"><p class="prefix"><b>Note:</b></p><p>The algorithm is intended to extract the text from the XML/HTML/DOM for an NLP tool and can produce a lot of "<span class="quote">phantom</span>" predicates from excessive whitespace, which 1) increases the size of the intermediate mapping and 2)extracts this whitespace as text. This might decrease NLP performance. It is recommended to normalize whitespace in the input XML/HTML/DOM in order to minimize such phantom predicates. A normalized example is given below. The whitespace normalization algorithm itself is format dependend, e.g. it differs for HTML compared to general XML. 
Hence no normative algorithm for whitespace normalization is given as part of this specification.</p></div><div class="exampleOuter"><div class="exampleHeader"><a name="EX-HTML-whitespace-normalization" id="EX-HTML-whitespace-normalization" shape="rect"/>Example 24: Example of an HTML document with whitespace nornalized, as a preparation for conversion to NIF</div><div class="exampleInner"><pre xml:space="preserve"><strong class="hl-tag" style="color: #000096">&lt;html&gt;</strong><strong class="hl-tag" style="color: #000096">&lt;body&gt;</strong><strong class="hl-tag" style="color: #000096">&lt;h2</strong> <span class="hl-attribute" style="color: #F5844C">translate</span>=<span class="hl-value" style="color: #993300">"yes"</span><strong class="hl-tag" style="color: #000096">&gt;</strong>Welcome to <strong class="hl-tag" style="color: #000096">&lt;span</strong> 
+<h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="conversion-to-nif" id="conversion-to-nif" shape="rect"/>5.7 Conversion to NIF</h3><p>This section defines an algorithm to convert XML or HTML documents (or their DOM representations) that contain ITS metadata to the RDF-based format <a href="http://nlp2rdf.org/nif-1-0" shape="rect">NIF</a>. The conversion results in RDF triples that rely on the ITS 2.0 ontology, see tbd.</p><span class="editor-note">[Ed. note: Add link to ontology once it is done; assure that the examples use the correct base URIs for the ontology.]</span><div class="note"><p class="prefix"><b>Note:</b></p><p>The algorithm is intended to extract the text from the XML/HTML/DOM for an NLP tool and can produce a lot of "<span class="quote">phantom</span>" predicates from excessive whitespace, which 1) increases the size of the intermediate mapping and 2)extracts this whitespace as text. This might decrease NLP performance. It is recommended to normalize whitespace in the input XML/HTML/DOM in order to minimize such phantom predicates. A normalized example is given below. The whitespace normalization algorithm itself is format dependent, e.g. it differs for HTML compared to general XML. 
Hence no normative algorithm for whitespace normalization is given as part of this specification.</p></div><div class="exampleOuter"><div class="exampleHeader"><a name="EX-HTML-whitespace-normalization" id="EX-HTML-whitespace-normalization" shape="rect"/>Example 24: Example of an HTML document with whitespace normalized as preparation for conversion to NIF</div><div class="exampleInner"><pre xml:space="preserve"><strong class="hl-tag" style="color: #000096">&lt;html&gt;</strong><strong class="hl-tag" style="color: #000096">&lt;body&gt;</strong><strong class="hl-tag" style="color: #000096">&lt;h2</strong> <span class="hl-attribute" style="color: #F5844C">translate</span>=<span class="hl-value" style="color: #993300">"yes"</span><strong class="hl-tag" style="color: #000096">&gt;</strong>Welcome to <strong class="hl-tag" style="color: #000096">&lt;span</strong> 
    <span class="hl-attribute" style="color: #F5844C">its-disambig-ident-ref</span>=<span class="hl-value" style="color: #993300">"http://dbpedia.org/resource/Dublin"</span> 
    <span class="hl-attribute" style="color: #F5844C">translate</span>=<span class="hl-value" style="color: #993300">"no"</span><strong class="hl-tag" style="color: #000096">&gt;</strong>Dublin<strong class="hl-tag" style="color: #000096">&lt;/span&gt;</strong> in <strong class="hl-tag" style="color: #000096">&lt;b</strong> <span class="hl-attribute" style="color: #F5844C">translate</span>=<span class="hl-value" style="color: #993300">"no"</span><strong class="hl-tag" style="color: #000096">&gt;</strong>Ireland<strong class="hl-tag" style="color: #000096">&lt;/b&gt;</strong>!<strong class="hl-tag" style="color: #000096">&lt;/h2&gt;</strong><strong class="hl-tag" style="color: #000096">&lt;/body&gt;</strong><strong class="hl-tag" style="color: #000096">&lt;/html&gt;</strong></pre></div></div><p id="its2nif-algorithm">The conversion algorithm to generate NIF consists of seven steps.</p><ul><li><p id="its2nif-algorithm-step1">STEP 1: Get an ordered list of all text nodes of the document.</p></li><li><p id="itsnif-algorithm-step2">STEP 2: Generate an XPath expression for each non-empty text node of all leaf elements and remember them.</p></li><li><p id="its2nif-algorithm-step3">STEP 3: Get the text for each node and make a tuple with the XPath expressions (X,T). Since the text nodes have a certain order we now have a list of ordered tuples ((x0,t0), (x1,t1), ..., (xn,tn)).</p></li><li><p id="its2nif-algorithm-step4">STEP 4 (optional): Serialize as XML or as RDF. The list with the XPath-to-text mapping can also be kept in memory. Part of a serialization example is given below.</p></li></ul><div class="exampleInner"><div class="exampleOuter"><pre xml:space="preserve">@prefix itsrdf: <strong class="hl-tag" style="color: #000096">&lt;http:</strong>//www.w3.org/2005/11/its/rdf#&gt; .
 <strong class="hl-tag" style="color: #000096">&lt;http:</strong>//example.com/exampledoc.html#xpath(x0)&gt; 
@@ -1274,17 +1274,20 @@
 <strong class="hl-tag" style="color: #000096">&lt;/mappings&gt;</strong></pre></div></div><span class="editor-note">[Ed. note: Below needs a reference to the ITS ontology, once available.]</span><ul><li><p id="its2nif-algorithm-step5">STEP 5: Create a context URI and attach the whole concatenated text of the document as reference.</p></li><li><p id="its2nif-algorithm-step6">STEP 6: Now attach any ITS metadata items from the XML/HTML/DOM input to respective NIF URIs using the ITS/RDF ontology (TODO Name).</p></li><li><p id="its2nif-algorithm-step7">STEP 7: Omit all irrelevant URIs (those that do not carry annotations, they will just bloat the data).</p></li></ul><div class="exampleInner"><div class="exampleOuter"><pre xml:space="preserve">@prefix itsrdf: <strong class="hl-tag" style="color: #000096">&lt;http:</strong>//www.w3.org/2005/11/its/rdf#&gt; .
 <strong class="hl-tag" style="color: #000096">&lt;http:</strong>//example.com/exampledoc.html#offset_0_29&gt;
     rdf:type             str:Context ;
+    rdf:type             str:OffsetBasedString ;
 # concatenate the whole text
     str:isString         "$(t0+t1+t2+...+tn)" ; 
     itsrdf:translate     "yes"^^<strong class="hl-tag" style="color: #000096">&lt;http:</strong>//www.w3.org/TR/its-2.0/its.xsd#yesOrNo&gt; ;
     str:occursIn      <strong class="hl-tag" style="color: #000096">&lt;http:</strong>//example.com/exampledoc.html&gt; .
 <strong class="hl-tag" style="color: #000096">&lt;http:</strong>//example.com/exampledoc.html#offset_11_17&gt; 
     rdf:type              str:String ;
+    rdf:type              str:OffsetBasedString ;
     itsrdf:translate     "no"^^<strong class="hl-tag" style="color: #000096">&lt;http:</strong>//www.w3.org/TR/its-2.0/its.xsd#yesOrNo&gt; ;
     itsrdf:disambigIdentRef  <strong class="hl-tag" style="color: #000096">&lt;http:</strong>//dbpedia.org/resource/Dublin&gt; ;
     str:referenceContext <strong class="hl-tag" style="color: #000096">&lt;http:</strong>//example.com/exampledoc.html#offset_0_29&gt; .
 <strong class="hl-tag" style="color: #000096">&lt;http:</strong>//example.com/exampledoc.html#offset_21_28&gt; 
     rdf:type              str:String ;
+    rdf:type              str:OffsetBasedString ;
     itsrdf:translate     "no"^^<strong class="hl-tag" style="color: #000096">&lt;http:</strong>//www.w3.org/TR/its-2.0/its.xsd#yesOrNo&gt; ;
     str:referenceContext <strong class="hl-tag" style="color: #000096">&lt;http:</strong>//example.com/exampledoc.html#offset_0_29&gt; .
 </pre></div></div><p>A complete sample output in RDF/XML format after step 7, given the input document <a href="#EX-HTML-whitespace-normalization" shape="rect">Example 24</a>, is available at <a href="examples/nif/EX-nif-conversion-output.xml" shape="rect">examples/nif/EX-nif-conversion-output.xml</a>.</p><div class="note"><p class="prefix"><b>Note:</b></p><p>The conversion to NIF is the basis for natural language processing (NLP) applications, creating for example named entity annotations. A non-normative algorithm to integrate these annotations into the original input document is given in <a class="section-ref" href="#nif-backconversion" shape="rect">Appendix G: Conversion NIF2ITS</a>. The algorithm in that appendix is non-normative since many choices depend on the actual NLP application.</p></div></div></div><div class="div1">
@@ -2053,7 +2056,8 @@
 								nodes to which this rule applies.</p></li><li><p>An optional <code>disambigGranularity</code> attribute that contains a string, specifying the granularity 
 						    level of the disambiguation. The value can be one of the following identifiers: 
 						    <code>lexicalConcept</code>, <code>ontologyConcept</code>, or <code>entity</code>.</p></li><li><p>At least one of the following:
-						    <ul><li><p>To specify the target type class, exactly one of the following:<ul><li><p>A <code>disambigClassPointer</code> attribute that contains a <a href="#selectors" shape="rect">relative selector</a>
+						    <ul><li><p>To specify the target type class, exactly one of the following:
+						        <ul><li><p>A <code>disambigClassPointer</code> attribute that contains a <a href="#selectors" shape="rect">relative selector</a>
 				                pointing to a node specifying the entity type class behind the selector.</p></li><li><p>A <code>disambigClassRef</code> attribute that contains a IRI, specifying the type class of the concept
 				                or entity behind the selector.</p></li><li><p>A <code>disambigClassRefPointer</code> attribute that contains a <a href="#selectors" shape="rect">relative selector</a>
 				                pointing to a node that holds a IRI that specifies the entity type class behind the selector.</p></li></ul></p></li><li><p>To specify the target identity, exactly one of the following:
@@ -3020,7 +3024,7 @@
     <strong class="hl-tag" style="color: #000096">&lt;p&gt;</strong>
       <strong class="hl-tag" style="color: #000096">&lt;span</strong> <span class="hl-attribute" style="color: #F5844C">its-loc-quality-issues-ref</span>=<span class="hl-value" style="color: #993300">#lq1</span><strong class="hl-tag" style="color: #000096">&gt;</strong>c'es<strong class="hl-tag" style="color: #000096">&lt;/span&gt;</strong> le contenu<strong class="hl-tag" style="color: #000096">&lt;/p&gt;</strong>
   <strong class="hl-tag" style="color: #000096">&lt;/body&gt;</strong>
-<strong class="hl-tag" style="color: #000096">&lt;/html&gt;</strong></pre></div><p>[Source file: <a href="examples/html5/EX-locQualityIssue-html5-local-2.html" shape="rect">examples/html5/EX-locQualityIssue-html5-local-2.html</a>]</p></div><span class="editor-note">[Ed. note: TODO for above: Finalize how HTML should work: use its-* attributes for standoff markup or markup inside the script element.]</span></div></div><div class="div2">
+<strong class="hl-tag" style="color: #000096">&lt;/html&gt;</strong></pre></div><p>[Source file: <a href="examples/html5/EX-locQualityIssue-html5-local-2.html" shape="rect">examples/html5/EX-locQualityIssue-html5-local-2.html</a>]</p></div><span class="editor-note">[Ed. note: TODO for above (not only as an example, in general): document that for using standoff in HTML5, there needs to be the same id in HTML "script" and in the embedded XML. This is needed also for the other standoff case (provenance).]</span></div></div><div class="div2">
 <h3><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="lqprecis" id="lqprecis" shape="rect"/>6.19 Localization Quality Précis</h3><div class="div3">
 <h4><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="lqprecis-definition" id="lqprecis-definition" shape="rect"/>6.19.1 Definition</h4><p>The <a href="#lqprecis" shape="rect">Localization Quality Précis</a> data
 							category is used to express an overall measurement of the localization
@@ -3725,8 +3729,7 @@
 									included in the previously listed values. This value <a href="#rfc-keywords" shape="rect">MUST NOT</a> be used for any tool-
 									or model-specific issues that can be mapped to the values listed
 									above.</p></li><li><p>In addition, this value is not synonymous with
-										<code>uncategorized</code> in that
-										<code>uncategorized</code> issues may be assigned to another
+										<code>uncategorized</code> in that <code>uncategorized</code> issues may be assigned to another
 									precise value, while other issues cannot.</p></li><li><p>If a system has an "miscellaneous" or "other" category, it
 										<a href="#rfc-keywords" shape="rect">MUST</a> be mapped to this
 									value even if the specific instance of the issue might be mapped
@@ -3924,7 +3927,7 @@
 &lt;/html&gt;</span></pre></div></div><p>Case 3: The NLP annotation created in NIF starts in one region and ends in another. Solution: No straight mapping is possible; a mapping can be created if both regions have the same parent.</p></div><div class="div1">
 <h2><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="revisionlog" id="revisionlog" shape="rect"/>H Revision Log (Non-Normative)</h2><p id="changelog-since-20121023">The following log records major changes that
 					have been made to this document since the <a href="http://www.w3.org/TR/2012/WD-its20-20121023/" shape="rect">ITS 2.0 Working Draft
-						23 October 2012</a>.</p><ul><li><p>Clarified usage of <a href="#domain" shape="rect">Domain</a> data category in HTML5 in response to <a href="https://www.w3.org/International/multilingualweb/lt/track/issues/56" shape="rect">issue-56</a>.</p></li><li><p>Added the <a href="#lqissueDefs" shape="rect">enabled information</a> in <a class="section-ref" href="#lqissue" shape="rect">Section 6.18: Localization Quality Issue</a>.</p></li><li><p>Updated the <a href="#Disambiguation" shape="rect">Disambiguation</a> data category.</p></li><li><p>Fine tuned the algorithm to compute the result values of the <a href="#domain" shape="rect">Domain</a> data category.</p></li></ul><p id="changelog-since-20120829">The following log records major changes that
+						23 October 2012</a>.</p><ol class="depth1"><li><p>Clarified usage of <a href="#domain" shape="rect">Domain</a> data category in HTML5 in response to <a href="https://www.w3.org/International/multilingualweb/lt/track/issues/56" shape="rect">issue-56</a>.</p></li><li><p>Added the <a href="#lqissueDefs" shape="rect">enabled information</a> in <a class="section-ref" href="#lqissue" shape="rect">Section 6.18: Localization Quality Issue</a>.</p></li><li><p>Updated the <a href="#Disambiguation" shape="rect">Disambiguation</a> data category.</p></li><li><p>Fine tuned the algorithm to compute the result values of the <a href="#domain" shape="rect">Domain</a> data category.</p></li><li><p>Fix on <a class="example-ref" href="#EX-locQualityIssue-html5-local-2" shape="rect">86</a>: <code>id</code> attribute of <code>script</code> element now the same as of containing XML.</p></li><li><p>NIF example fix - see <a href="https://www.w3.org/International/multilingualweb/lt/track/actions/284" shape="rect">action-284</a>.</p></li></ol><p id="changelog-since-20120829">The following log records major changes that
 					have been made to this document since the <a href="http://www.w3.org/TR/2012/WD-its20-20120829/" shape="rect">ITS 2.0 Working Draft
 						29 August 2012</a>.</p><ol class="depth1"><li><p>Added a first draft of <a class="section-ref" href="#translation-agent-provenance" shape="rect">Section 6.12: Translation Agent Provenance</a></p></li><li><p>Added <a class="section-ref" href="#html5-markup" shape="rect">Section 7: Using ITS Markup in HTML5</a>.</p></li><li><p>Removed inline markup declarations.</p></li><li><p>Addition of a <code>locQualityPrecisVote</code> attribute and a
 							<code>locQualityPrecisVotePointer</code> attribute to <a class="section-ref" href="#lqprecis" shape="rect">Section 6.19: Localization Quality Précis</a>.</p></li><li><p>A <a href="#its-information_versus_content" shape="rect">clarification</a> of ITS

Received on Monday, 12 November 2012 21:17:12 UTC