NIF ITS roundtripping (Re: How to put an annotation in HTML?) from Felix Sasaki on 2013-05-16 (semantic-web@w3.org from May 2013)

From: Felix Sasaki <fsasaki@w3.org>
Date: Thu, 16 May 2013 16:53:43 +0200
To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
CC: Denny Vrandečić <denny.vrandecic@wikimedia.de>, John Flynn <jflynn12@verizon.net>, semantic-web at W3C <semantic-web@w3c.org>
Message-ID: <5194F2F7.9050606@w3.org>
Hi Sebastian, all,

coming back to an old thread.

On 26.04.13 20:57, Felix Sasaki wrote:
> On 26.04.13 17:15, Sebastian Hellmann wrote:
>> Hi Denny,
>> they are just several months away from becoming a Recommendation, so it 
>> will happen soon. They will start implementation within a few weeks.
>> For exact details you would have to ask the mailing list or just wait 
>> for a while ;)
>>
>> There should be an XSLT stylesheet somewhere that retrieves NIF RDF 
>> from ITS markup within HTML.
>
> Thanks for the ping, Sebastian - you encouraged me to finally put that 
> online. See
> http://www.w3.org/People/fsasaki/its20-general-processor/tools/its-ta-2-nif.xsl

The stylesheet above has now been updated to handle white space better. 
There is now also a stylesheet that goes back from NIF to an HTML 
document and generates its-ta-ident-ref etc.

How to use this

1) Sample input doc
http://www.w3.org/People/fsasaki/its20-general-processor/sample/nif-conversion/inputfile-without-ta-annotations.html
2) Output of generating NIF from 1), plus the entity annotations 
added to the NIF wrapper (done manually here)
http://www.w3.org/People/fsasaki/its20-general-processor/sample/nif-conversion/its-ta-2-nif-output.rdf
3) XSLT stylesheet to go back from 2) to 1) and add the entity 
annotations to the HTML
http://www.w3.org/People/fsasaki/its20-general-processor/tools/nif-2-its-ta.xsl
4) Output of 3)
http://www.w3.org/People/fsasaki/its20-general-processor/sample/nif-conversion/nit-2-its-ta-output.html
with some JavaScript to show the annotations.
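The stylesheets above are the real tools. Purely to illustrate what step 3) involves, here is a minimal Python sketch, not the logic of nif-2-its-ta.xsl: it reads RFC 5147 `#char=start,end` offsets from NIF string URIs and wraps the matching substrings of the plain text in span elements carrying its-ta-ident-ref. The function names and the input shape (a list of string-URI/entity-URI pairs) are hypothetical, annotations are assumed not to overlap, and rebuilding the rest of the original HTML markup is out of scope.

```python
import re
from html import escape

# Sketch only: hypothetical helpers, not the nif-2-its-ta.xsl stylesheet.
CHAR_FRAGMENT = re.compile(r"#char=(\d+),(\d+)$")


def parse_char_fragment(uri):
    """Return (start, end) from a URI ending in an RFC 5147 '#char=start,end'."""
    match = CHAR_FRAGMENT.search(uri)
    if match is None:
        raise ValueError(f"no #char=start,end fragment in {uri!r}")
    return int(match.group(1)), int(match.group(2))


def nif_to_its_html(text, nif_annotations):
    """Wrap each annotated substring of `text` in a <span its-ta-ident-ref>.

    `nif_annotations` is a hypothetical input shape: a list of
    (nif_string_uri, entity_uri) pairs. Overlapping spans are rejected.
    """
    spans = sorted((*parse_char_fragment(uri), ref) for uri, ref in nif_annotations)
    out, pos = [], 0
    for start, end, ref in spans:
        if start < pos or start > end or end > len(text):
            raise ValueError(f"invalid or overlapping span {start},{end}")
        out.append(escape(text[pos:start], quote=False))   # unannotated text
        out.append(f'<span its-ta-ident-ref="{escape(ref)}">')
        out.append(escape(text[start:end], quote=False))   # annotated text
        out.append("</span>")
        pos = end
    out.append(escape(text[pos:], quote=False))             # trailing text
    return "".join(out)


text = "It is well known, that Springfield has mild summers and short, but hard winters."
print(nif_to_its_html(text, [("http://example.com/doc.html#char=23,34",
                              "http://sws.geonames.org/4951788")]))
```

Working on flat text keeps the offset arithmetic trivial; a real round trip also has to map offsets back into the original element tree, which is where most of the work in a stylesheet like nif-2-its-ta.xsl would lie.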

Comments welcome. @Sebastian: the NIF RDF/XML is not yet up to date 
with the comments you gave during the MLW-LT f2f call on 8 May; I'll do 
that later.

Felix

> with some mini documentation in the stylesheet and a sample 
> transformation of an HTML document
> http://www.w3.org/People/fsasaki/its20-general-processor/sample/nif-conversion/inputfile.html
> here:
> http://tinyurl.com/clwd64n
> I think it provides the right triples http://tinyurl.com/btkvkvy
>
> Let me know if you need more. I saw that in this thread there was also 
> discussion about "term annotation" - this table
> http://www.w3.org/TR/its20/#textAnalysis-info-pieces
> and the note below the table might be helpful for you as well.
>
>
> Felix
>
>>
>> All the best,
>> Sebastian
>>
>>
>> On 26.04.2013 16:05, Denny Vrandečić wrote:
>>> Sebastian,
>>>
>>> thanks! its-ta-ident-ref is perfect! That's exactly what I have been 
>>> looking for.
>>>
>>> The only drawbacks are that it is not a Recommendation yet (what's the 
>>> timeline here?), which is not so terrible, and that this is possibly 
>>> the worst attribute name I have seen so far in HTML.
>>>
>>> Still, that's what I am going to use! Thanks,
>>> Cheers,
>>> Denny
>>>
>>>
>>>
>>>
>>>
>>> 2013/4/26 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
>>>
>>>     Hi John and Denny,
>>>     the problem is well known and RDFa has its limits. Please see
>>>     the new ITS 2.0 spec [1], which provides a solution for this.
>>>     ITS 2.0 will likely be widely adopted by CMSs and the translation
>>>     industry, and it has a conversion to RDF using NIF [2].
>>>
>>>     @Denny: For your request, RDFa should be fine if you just want
>>>     to include:
>>>     <http://sws.geonames.org/4951788> a owl:Thing .
>>>
>>>     Note that the resulting RDF does not contain any provenance
>>>     information, so I am unsure whether calling it an "annotation"
>>>     is appropriate. It is rather an inclusion of extra triples in HTML.
>>>     You lose any reference to "Springfield", as RDFa parsers
>>>     don't support this.
>>>     Turtle in HTML would also be an easy option:
>>>     http://www.w3.org/TR/turtle/#xhtml
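The Turtle specification linked above describes embedding Turtle in HTML inside a `<script type="text/turtle">` element. As a rough illustration of the consuming side, such blocks can be pulled out with Python's standard html.parser and then handed to any Turtle parser (not shown). The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TurtleBlockExtractor(HTMLParser):
    """Hypothetical helper: collects the contents of <script type="text/turtle">."""

    def __init__(self):
        super().__init__()
        self.inside = False   # currently inside a text/turtle script?
        self.blocks = []      # one string per script element

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "text/turtle":
            self.inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self.inside = False

    def handle_data(self, data):
        # HTMLParser treats <script> content as raw text, so Turtle's
        # <IRI> syntax arrives here unparsed.
        if self.inside:
            self.blocks[-1] += data


def turtle_blocks(html):
    parser = TurtleBlockExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


page = ('<html><head><script type="text/turtle">\n'
        '<http://sws.geonames.org/4951788> a <http://www.w3.org/2002/07/owl#Thing> .\n'
        '</script></head><body><p>It is well known, that Springfield has mild summers.</p>'
        '</body></html>')
for block in turtle_blocks(page):
    print(block.strip())
```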
>>>
>>>     ITS 2.0 example:
>>>     <p>It is well known, that <span
>>>     its-ta-ident-ref="http://sws.geonames.org/4951788">Springfield</span>
>>>     has mild summers and short, but hard winters.</p>
>>>     NIF:
>>>     ...
>>>     <http://example.com/doc.html#xpath(/p[1]/span[1]/text()[1])>
>>>         itsrdf:xpath2nif <http://example.com/doc.html#char=23,34> .
>>>     <http://example.com/doc.html#char=23,34>
>>>         rdf:type              nif:RFC5147String ;
>>>         itsrdf:taIdentRef <http://sws.geonames.org/4951788> ;
>>>     ...
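A hedged sketch of the HTML-to-NIF step for a single annotation, in Python: the annotated substring is identified by appending an RFC 5147 fragment (#char=start,end) to the document URI, typed nif:RFC5147String and linked to the entity with itsrdf:taIdentRef, as in the snippet above. The prefix IRIs (NIF 2.0 Core and the ITS RDF namespace) and the extra nif:anchorOf triple are assumptions not shown in the snippet, the itsrdf:xpath2nif link is omitted, and the function names are hypothetical.

```python
# Assumed namespace IRIs: NIF 2.0 Core and the ITS 2.0 RDF vocabulary.
PREFIXES = (
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .\n"
    "@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .\n"
    "@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .\n"
)


def turtle_string(value):
    """Escape a Python string as a Turtle short string literal."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def nif_annotation(doc_uri, text, start, end, ident_ref):
    """Hypothetical helper: Turtle for text[start:end], identified by an
    RFC 5147 fragment (#char=start,end) appended to the document URI."""
    if not 0 <= start < end <= len(text):
        raise ValueError(f"invalid span {start},{end}")
    return (
        f"<{doc_uri}#char={start},{end}>\n"
        "    rdf:type nif:RFC5147String ;\n"
        f"    nif:anchorOf {turtle_string(text[start:end])} ;\n"  # assumed extra triple
        f"    itsrdf:taIdentRef <{ident_ref}> .\n"
    )


text = "It is well known, that Springfield has mild summers and short, but hard winters."
start = text.index("Springfield")
print(PREFIXES + nif_annotation("http://example.com/doc.html", text,
                                start, start + len("Springfield"),
                                "http://sws.geonames.org/4951788"))
```

For the Springfield example, `text.index("Springfield")` gives 23 and the end offset is 23 + 11 = 34, matching the char=23,34 fragment above.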
>>>
>>>     Well, NIF is meant more for natural language processing tools and
>>>     middleware, so it's overkill for just including the occasional
>>>     triple ...
>>>
>>>     All the best,
>>>     Sebastian
>>>
>>>
>>>
>>>     [1] http://www.w3.org/TR/its20/
>>>     [2] http://www.w3.org/TR/its20/#conversion-to-nif
>>>
>>>     On 24.04.2013 22:08, John Flynn wrote:
>>>>
>>>>     I have long thought that a clean and simple method for
>>>>     identifying terms in HTML that are instances of a specific
>>>>     ontology would be a very valuable adjunct to the growth of the
>>>>     Semantic Web. A number of years ago I proposed an approach to a
>>>>     solution I called Instance Markup Language (1) which gained no
>>>>     traction. The consensus at the time was that RDFa would provide
>>>>     the solution for this need and also that it wasn't really
>>>>     important because the great bulk of instance data would come
>>>>     from large data bases and not from HTML. I don't think RDFa has
>>>>     in fact provided a "clean and simple" way to identify specific
>>>>     terms in HTML text and link those terms to classes or
>>>>     properties in a specific ontology. I never thought my proposed
>>>>     approach was exactly right, but I did have hope it would
>>>>     inspire someone to come forward with a similar, but cleaner, way
>>>>     to do this. Even though the subject still occasionally comes up,
>>>>     after all these years it's pretty clear I was wrong about this
>>>>     being an important component of Semantic Web technology.
>>>>
>>>>     (1) http://mysite.verizon.net/jflynn12/IML.htm
>>>>
>>>>     From: Denny Vrandečić [mailto:denny.vrandecic@wikimedia.de]
>>>>     Sent: Wednesday, April 24, 2013 1:59 PM
>>>>     To: semantic-web at W3C
>>>>     Subject: How to put an annotation in HTML?
>>>>
>>>>     Sorry, probably a stupid question:
>>>>
>>>>     Let us say, I have some HTML like this...
>>>>
>>>>     <p>It is well known, that Springfield has mild summers and
>>>>     short, but hard winters.</p>
>>>>
>>>>     And now, for example in order to simplify extraction, I want to
>>>>     annotate Springfield with a URI, maybe like this, to make sure
>>>>     that the computer understands I mean the Springfield
>>>>     in Massachusetts:
>>>>
>>>>     <p>It is well known, that <span
>>>>     about="http://sws.geonames.org/4951788/">Springfield</span> has
>>>>     mild summers and short, but hard winters.</p>
>>>>
>>>>     How do I actually do that?
>>>>
>>>>     Mind you, I don't want to add whole triples, but just annotate
>>>>     the HTML and say "this element refers to the following URI".
>>>>
>>>>     Cheers,
>>>>
>>>>     Denny
>>>>
>>>>     -- 
>>>>     Project director Wikidata
>>>>     Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
>>>>     Tel. +49-30-219 158 26-0 |
>>>>     http://wikimedia.de
>>>>
>>>>     Wikimedia Deutschland - Gesellschaft zur Förderung Freien
>>>>     Wissens e.V. Registered in the register of associations of the
>>>>     Amtsgericht (local court) Berlin-Charlottenburg under number
>>>>     23855 B. Recognized as a non-profit organization by the
>>>>     Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
>>>>
>>>
>>>
>>>     -- 
>>>     Dipl. Inf. Sebastian Hellmann
>>>     Department of Computer Science, University of Leipzig
>>>     Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
>>>     http://dbpedia.org/Wiktionary , http://dbpedia.org
>>>     Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>>     Research Group: http://aksw.org
>>>
>>>
>>>
>>>
>>
>>
>> -- 
>> Dipl. Inf. Sebastian Hellmann
>> Department of Computer Science, University of Leipzig
>> Events: NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org, 
>> Deadline: *July 8th*)
>> Projects: http://nlp2rdf.org , http://linguistics.okfn.org , 
>> http://dbpedia.org/Wiktionary , http://dbpedia.org
>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>> Research Group: http://aksw.org
>
Received on Thursday, 16 May 2013 14:54:15 UTC