Re: How to put an annotation in HTML? from Felix Sasaki on 2013-04-26 (semantic-web@w3.org from April 2013)

From: Felix Sasaki <fsasaki@w3.org>
Date: Fri, 26 Apr 2013 20:57:05 +0200
To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
CC: Denny Vrandečić <denny.vrandecic@wikimedia.de>, John Flynn <jflynn12@verizon.net>, semantic-web at W3C <semantic-web@w3c.org>
Message-ID: <517ACE01.6000601@w3.org>
On 26.04.13 17:15, Sebastian Hellmann wrote:
> Hi Denny,
> they are just several months away from becoming a Recommendation, so it
> will happen soon. They are starting implementation within a few weeks.
> For exact details you would have to ask the mailing list or just wait 
> for a while ;)
>
> There should be an XSLT stylesheet somewhere that extracts NIF RDF
> from ITS markup within HTML.

Thanks for the ping, Sebastian - you encouraged me to finally put that 
online. See
http://www.w3.org/People/fsasaki/its20-general-processor/tools/its-ta-2-nif.xsl
with some mini documentation in the stylesheet and a sample 
transformation of an HTML document
http://www.w3.org/People/fsasaki/its20-general-processor/sample/nif-conversion/inputfile.html
here:
http://tinyurl.com/clwd64n
I think it provides the right triples http://tinyurl.com/btkvkvy

Let me know if you need more. I saw that in this thread there was also 
discussion about "term annotation" - this table
http://www.w3.org/TR/its20/#textAnalysis-info-pieces
and the note below the table might be helpful for you as well.


Felix

>
> All the best,
> Sebastian
>
>
> On 26.04.2013 16:05, Denny Vrandečić wrote:
>> Sebastian,
>>
>> thanks! its-ta-ident-ref is perfect! That's exactly what I have been 
>> looking for.
>>
>> The only drawbacks are that it is not a Recommendation yet (what's the
>> timeline here?), though that's not so terrible, and that this is
>> possibly the worst attribute name I have seen so far in HTML.
>>
>> Still, that's what I am going to use! Thanks,
>> Cheers,
>> Denny
>>
>>
>>
>>
>>
>> 2013/4/26 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
>>
>>     Hi John and Denny,
>>     the problem is well known and RDFa has its limits. Please see the
>>     new ITS 2.0 spec [1], which provides a solution for this. ITS 2.0
>>     will likely be widely adopted by the CMS and translation industries,
>>     and it has an RDF conversion using NIF [2].
>>
>>     @Denny: For your request RDFa should be fine, if you just want to
>>     include:
>>     <http://sws.geonames.org/4951788> a owl:Thing .
>>
>>     Note that the resulting RDF does not contain any provenance
>>     information, so I am unsure whether calling it an "annotation"
>>     is appropriate. It is rather an inclusion of extra triples in HTML.
>>     You are losing any reference to "Springfield", as RDFa parsers
>>     don't support this.
>>     Turtle in HTML would also be an easy option:
>>     http://www.w3.org/TR/turtle/#xhtml
>>
>>     ITS 2.0 example:
>>     <p>It is well known, that <span
>>     its-ta-ident-ref="http://sws.geonames.org/4951788">Springfield</span>
>>     has mild summers and short, but hard winters.</p>
>>     NIF:
>>     ...
>>     <http://example.com/doc.html#xpath(/p[1]/span[1]/text()[1])>
>>         itsrdf:xpath2nif <http://example.com/doc.html#char=23,34> .
>>     <http://example.com/doc.html#char=23,34>
>>         rdf:type              nif:RFC5147String ;
>>         itsrdf:taIdentRef <http://sws.geonames.org/4951788> ;
>>     ...
>>
>>     Well, NIF is more for natural language processing tools and
>>     middleware, so it's overkill for just including the occasional
>>     triple now and then ...
>>
>>     All the best,
>>     Sebastian
>>
>>
>>
>>     [1] http://www.w3.org/TR/its20/
>>     [2] http://www.w3.org/TR/its20/#conversion-to-nif
>>
>>     On 24.04.2013 22:08, John Flynn wrote:
>>>
>>>     I have long thought that a clean and simple method for
>>>     identifying terms in HTML that are instances of a specific
>>>     ontology would be a very valuable adjunct to the growth of the
>>>     Semantic Web. A number of years ago I proposed an approach to a
>>>     solution I called Instance Markup Language (1) which gained no
>>>     traction. The consensus at the time was that RDFa would provide
>>>     the solution for this need and also that it wasn't really
>>>     important because the great bulk of instance data would come
>>>     from large data bases and not from HTML. I don't think RDFa has
>>>     in fact provided a "clean and simple" way to identify specific
>>>     terms in HTML text and link those terms to classes or properties
>>>     in a specific ontology. I never thought my proposed approach was
>>>     exactly right, but I did have hope it would inspire someone to come
>>>     forward with a similar, but cleaner, way to do this. Even though
>>>     the subject still occasionally comes up, after all these years
>>>     it's pretty clear I was wrong about this being an important
>>>     component of Semantic Web technology.
>>>
>>>     (1) http://mysite.verizon.net/jflynn12/IML.htm
>>>
>>>     *From:*Denny Vrandečić [mailto:denny.vrandecic@wikimedia.de]
>>>     *Sent:* Wednesday, April 24, 2013 1:59 PM
>>>     *To:* semantic-web at W3C
>>>     *Subject:* How to put an annotation in HTML?
>>>
>>>     Sorry, probably a stupid question:
>>>
>>>     Let us say, I have some HTML like this...
>>>
>>>     <p>It is well known, that Springfield has mild summers and
>>>     short, but hard winters.</p>
>>>
>>>     And now, for example in order to simplify extraction, I want to
>>>     annotate Springfield with a URI, maybe like this, to make sure
>>>     that the computer understands I mean the Springfield
>>>     in Massachusetts:
>>>
>>>     <p>It is well known, that <span
>>>     about="http://sws.geonames.org/4951788/">Springfield</span> has
>>>     mild summers and short, but hard winters.</p>
>>>
>>>     How do I actually do that?
>>>
>>>     Mind you, I don't want to add whole triples, but just annotate
>>>     the HTML and say "this element refers to the following URI".
>>>
>>>     Cheers,
>>>
>>>     Denny
>>>
>>>     -- 
>>>     Project director Wikidata
>>>     Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
>>>     Tel. +49-30-219 158 26-0 |
>>>     http://wikimedia.de
>>>
>>>     Wikimedia Deutschland - Gesellschaft zur Förderung Freien
>>>     Wissens e.V. Registered in the register of associations of the
>>>     Amtsgericht Berlin-Charlottenburg under number 23855 B. Recognized
>>>     as a charitable organization by the Finanzamt für Körperschaften I
>>>     Berlin, tax number 27/681/51985.
>>>
>>
>>
>>     -- 
>>     Dipl. Inf. Sebastian Hellmann
>>     Department of Computer Science, University of Leipzig
>>     Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
>>     http://dbpedia.org/Wiktionary , http://dbpedia.org
>>     Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>     Research Group: http://aksw.org
>>
>>
>>
>>
>
>
> -- 
> Dipl. Inf. Sebastian Hellmann
> Department of Computer Science, University of Leipzig
> Events: NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org, 
> Deadline: *July 8th*)
> Projects: http://nlp2rdf.org , http://linguistics.okfn.org , 
> http://dbpedia.org/Wiktionary , http://dbpedia.org
> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
> Research Group: http://aksw.org
Received on Friday, 26 April 2013 18:57:37 UTC