- From: CVS User fsasaki <cvsmail@w3.org>
- Date: Fri, 06 Sep 2013 11:09:51 +0000
- To: public-multilingualweb-lt-commits@w3.org
Update of /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20
In directory gil:/tmp/cvs-serv5804

Modified Files:
    its20.html its20.odd
Log Message:
implemented further nif section fixes, see http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Sep/0027.html

--- /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20/its20.html 2013/09/06 10:11:40 1.494
+++ /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20/its20.html 2013/09/06 11:09:50 1.495
@@ -5594,7 +5594,7 @@
 <a href="http://www.xulplanet.com/" shape="rect"><cite>exTensible User Interface Language</cite></a>. Available at <a href="http://www.xulplanet.com/" shape="rect"> http://www.xulplanet.com/</a>.</dd></dl></div><div class="div1"> <h2><a href="#contents" shape="rect"><img src="images/topOfPage.gif" align="right" height="26" width="26" title="Go to the table of contents." alt="Go to the table of contents."/></a><a name="conversion-to-nif" id="conversion-to-nif" shape="rect"/>F Conversion to NIF (Non-Normative)</h2><p>This section provides an informative algorithm to convert XML or HTML documents (or their DOM
- representations) that contain ITS metadata to the RDF format based on <a title="" href="#nif-reference" shape="rect">[NIF]</a>. The conversion results in RDF triples.</p><div class="note"><p class="prefix"><b>Note:</b></p><p>The algorithm is intended to extract the text from the XML/HTML/DOM for an NLP tool. It can
+ representations) that contain ITS metadata to the RDF format based on <a title="" href="#nif-reference" shape="rect">[NIF]</a>. The conversion results in RDF triples.</p><div class="note"><p class="prefix"><b>Note:</b></p><p>The algorithm creates URIs that in the query part contain the characters "[" and "]", as part of XPath expressions. In the conversion output (see an <a href="examples/nif/EX-nif-conversion-output.xml" shape="rect">example</a>), The URIs are escaped as "%5B" and "%5D". 
For readability the URIs shown in this section do not escape these characters.</p></div><div class="note"><p class="prefix"><b>Note:</b></p><p>The algorithm is intended to extract the text from the XML/HTML/DOM for an NLP tool. It can
 produce a lot of "<span class="quote">phantom</span>" predicates from excessive whitespace, which 1) increases the size of the intermediate mapping and 2) extracts this whitespace as text, and therefore might decrease NLP performance. It is strongly recommended to
@@ -5681,7 +5681,7 @@
 nif:beginIndex "0" ;
 nif:endIndex "29" ;
 itsrdf:translate "yes";
- nif:sourceUrl <http://example.com/exampledoc.html> .
+ nif:sourceUrl <http://example.com/doc.html> .
 <http://example.com/myitsservice?informat=html&intype=url&input=http://example.com/doc.html&char=11,17>
 rdf:type nif:RFC5147String ;
 nif:beginIndex "11" ;
@@ -5698,14 +5698,12 @@
 </pre></div></div><p>A complete sample output in RDF/XML format after step 7, given the input document <a href="#EX-HTML-whitespace-normalization" shape="rect">Example 97</a>, is available at <a href="examples/nif/EX-nif-conversion-output.xml" shape="rect">examples/nif/EX-nif-conversion-output.xml</a>.</p><div class="note"><p class="prefix"><b>Note:</b></p><p>The conversion to NIF is a possible basis for a natural language processing (NLP) application that creates, for example, named entity annotations. A non-normative algorithm to integrate these annotations into the original input document is given in <a class="section-ref" href="#nif-backconversion" shape="rect">Appendix G: Conversion NIF2ITS</a>. 
 Many decisions to be made in this algorithm
- depend on the particular NLP application being used.</p></div><div class="note"><p class="prefix"><b>Note:</b></p><p>NIF allows URL for a String resource to be referenced as URIs
+ depend on the particular NLP application being used.</p></div><div class="note"><p class="prefix"><b>Note:</b></p><p>NIF allows an URL for a String resource to be referenced as URIs
 that are fragments of the original document in the form:<br clear="none"/><code>http://example.com/myitsservice?informat=html&intype=url&input=http://example.com/doc.html#char=0,11</code>
- <br clear="none"/>or
- <br clear="none"/><code>http://example.com/myitsservice?informat=html&intype=url&input=http://example.com/doc.html&xpath=/html/body[1]/h2[1]/text()[1]</code>
+ <br clear="none"/>or<br clear="none"/><code>http://example.com/myitsservice?informat=html&intype=url&input=http://example.com/doc.html&xpath=/html/body[1]/h2[1]/text()[1]</code>
 <br clear="none"/>
- This offers a convenient mechanism for linking NIF resources in RDF back
- to the original document. RDF treats URIs as opaque and does not impose
+ to the original document. The <a href="http://persistence.uni-leipzig.org/nlp2rdf/specification/api.html" shape="rect">NIF Web Service Access Specification</a> defines the parameters for NIF web services.</p><p>RDF treats URIs as opaque and does not impose
 any semantic constraints on the used fragment identifiers, thus enabling their usage in RDF in a consistent manner. However, fragment identifiers get interpreted according to the retrieved mime type, if a retrieval
@@ -5743,7 +5741,7 @@
 <http://example.com/myitsservice?informat=html&intype=url&input=http://example.com/doc.html&char=21,28>
 itsrdf:taIdentRef <http://dbpedia.org/resource/Ireland> . 
 # we can attach the metadata to the parent node:
-<b its-ta-ident-ref="http://dbpedia.org/resource/Dublin"
+<b its-ta-ident-ref="http://dbpedia.org/resource/Ireland"
 translate="no">Ireland</b>
 </pre></div></div><p>CASE 2: The NLP annotation created in NIF is a substring of the text node. Solution: Create a new element, e.g., for HTML "span". A different input example is given below as
--- /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20/its20.odd 2013/09/06 10:11:40 1.509
+++ /w3ccvs/WWW/International/multilingualweb/lt/drafts/its20/its20.odd 2013/09/06 11:09:51 1.510
@@ -5536,6 +5536,7 @@
 <p>This section provides an informative algorithm to convert XML or HTML documents (or their DOM representations) that contain ITS metadata to the RDF format based on <ptr target="#nif-reference" type="bibref"/>. The conversion results in RDF triples.</p>
+ <note> <p>The algorithm creates URIs that in the query part contain the characters "[" and "]", as part of XPath expressions. In the conversion output (see an <ref target="examples/nif/EX-nif-conversion-output.xml">example</ref>), The URIs are escaped as "%5B" and "%5D". For readability the URIs shown in this section do not escape these characters.</p></note>
 <note><p>The algorithm is intended to extract the text from the XML/HTML/DOM for an NLP tool. It can produce a lot of <quote>phantom</quote> predicates from excessive whitespace, which 1) increases the size of the intermediate mapping and 2) extracts this whitespace as
@@ -5645,7 +5646,7 @@
 nif:beginIndex "0" ;
 nif:endIndex "29" ;
 itsrdf:translate "yes";
- nif:sourceUrl <http://example.com/exampledoc.html> .
+ nif:sourceUrl <http://example.com/doc.html> .
 <http://example.com/myitsservice?informat=html&intype=url&input=http://example.com/doc.html&char=11,17>
 rdf:type nif:RFC5147String ;
 nif:beginIndex "11" ;
@@ -5670,14 +5671,12 @@
 target="#nif-backconversion" type="specref"/>. 
 Many decisions to be made in this algorithm depend on the particular NLP application being used.</p></note> <note>
- <p>NIF allows URL for a String resource to be referenced as URIs
+ <p>NIF allows an URL for a String resource to be referenced as URIs
 that are fragments of the original document in the form:<?br?>
- <code>http://example.com/myitsservice?informat=html&intype=url&input=http://example.com/doc.html#char=0,11</code>
- <?br?>or
- <?br?><code>http://example.com/myitsservice?informat=html&intype=url&input=http://example.com/doc.html&xpath=/html/body[1]/h2[1]/text()[1]</code><?br?>
-
+ <code>http://example.com/myitsservice?informat=html&intype=url&input=http://example.com/doc.html#char=0,11</code> <?br?>or<?br?><code>http://example.com/myitsservice?informat=html&intype=url&input=http://example.com/doc.html&xpath=/html/body[1]/h2[1]/text()[1]</code><?br?>
 This offers a convenient mechanism for linking NIF resources in RDF back
- to the original document. RDF treats URIs as opaque and does not impose
+ to the original document. The <ref target="http://persistence.uni-leipzig.org/nlp2rdf/specification/api.html">NIF Web Service Access Specification</ref> defines the parameters for NIF web services.</p>
+ <p>RDF treats URIs as opaque and does not impose
 any semantic constraints on the used fragment identifiers, thus enabling their usage in RDF in a consistent manner. However, fragment identifiers get interpreted according to the retrieved mime type, if a retrieval
@@ -5738,7 +5737,7 @@
 <http://example.com/myitsservice?informat=html&intype=url&input=http://example.com/doc.html&char=21,28>
 itsrdf:taIdentRef <http://dbpedia.org/resource/Ireland> .
 # we can attach the metadata to the parent node:
-<b its-ta-ident-ref="http://dbpedia.org/resource/Dublin"
+<b its-ta-ident-ref="http://dbpedia.org/resource/Ireland"
 translate="no">Ireland</b>
 ]]></eg>
 <p>CASE 2: The NLP annotation created in NIF is a substring of the text node. Solution:
Received on Friday, 6 September 2013 11:09:52 UTC
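
The note added in this commit says that "[" and "]" in the XPath-based URIs appear as "%5B" and "%5D" in the conversion output. A minimal Python sketch of that escaping follows. The helper names `escape_brackets` and `unescape_brackets` are hypothetical and are not part of ITS 2.0 or NIF; the sketch assumes that only the two bracket characters need to be replaced, which is all the note states.

```python
def escape_brackets(uri: str) -> str:
    """Percent-encode '[' and ']' only, as described in the note.

    Hypothetical helper for illustration; other characters are left as-is.
    """
    return uri.replace("[", "%5B").replace("]", "%5D")


def unescape_brackets(uri: str) -> str:
    """Reverse escape_brackets, restoring the readable form used in the prose."""
    return uri.replace("%5B", "[").replace("%5D", "]")


# Readable form, as shown in the specification text.
readable = (
    "http://example.com/myitsservice?informat=html&intype=url"
    "&input=http://example.com/doc.html&xpath=/html/body[1]/h2[1]/text()[1]"
)

escaped = escape_brackets(readable)
print(escaped)
```

Plain string replacement is used here rather than a general percent-encoding routine so that the rest of the URI (including the nested `input=` URL) is left untouched.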