- From: Felix Sasaki <fsasaki@w3.org>
- Date: Thu, 16 May 2013 16:53:43 +0200
- To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
- CC: Denny Vrandečić <denny.vrandecic@wikimedia.de>, John Flynn <jflynn12@verizon.net>, semantic-web at W3C <semantic-web@w3c.org>
- Message-ID: <5194F2F7.9050606@w3.org>
Hi Sebastian, all,

coming back to an old thread.

On 26.04.13 20:57, Felix Sasaki wrote:
> On 26.04.13 17:15, Sebastian Hellmann wrote:
>> Hi Denny,
>> they are just several months away from becoming a recommendation, so it
>> will happen soon. They are starting implementation within some weeks.
>> For exact details you would have to ask the mailing list or just wait
>> for a while ;)
>>
>> There should be an XSLT stylesheet somewhere that retrieves NIF RDF
>> from ITS within HTML.
>
> Thanks for the ping, Sebastian - you encouraged me to finally put that
> online. See
> http://www.w3.org/People/fsasaki/its20-general-processor/tools/its-ta-2-nif.xsl

The stylesheet above is now updated to handle white space better. There is
now also a stylesheet to go back from NIF to an HTML document and generate
its-ta-ident-ref etc.

How to use this:

1) Sample input doc
http://www.w3.org/People/fsasaki/its20-general-processor/sample/nif-conversion/inputfile-without-ta-annotations.html

2) Output of generating NIF from 1), and of generating entity annotations
in the NIF wrapper (here done manually)
http://www.w3.org/People/fsasaki/its20-general-processor/sample/nif-conversion/its-ta-2-nif-output.rdf

3) XSLT stylesheet to go back from 2) to 1) and to add the entity
annotations to the HTML
http://www.w3.org/People/fsasaki/its20-general-processor/tools/nif-2-its-ta.xsl

4) Output of 3)
http://www.w3.org/People/fsasaki/its20-general-processor/sample/nif-conversion/nit-2-its-ta-output.html
with some JavaScript to show the annotations.

Comments welcome. @Sebastian: the NIF RDF/XML is not yet up to date with
respect to the comments you gave during the MLW-LT f2f call on 8 May; I'll
do that later.

Felix

> with some mini documentation in the stylesheet and a sample
> transformation of an HTML document
> http://www.w3.org/People/fsasaki/its20-general-processor/sample/nif-conversion/inputfile.html
> here:
> http://tinyurl.com/clwd64n
> I think it provides the right triples http://tinyurl.com/btkvkvy
>
> Let me know if you need more. I saw that in this thread there was also
> discussion about "term annotation" - this table
> http://www.w3.org/TR/its20/#textAnalysis-info-pieces
> and the note below the table might be helpful for you as well.
>
> Felix
>
>>
>> All the best,
>> Sebastian
>>
>> On 26.04.2013 16:05, Denny Vrandečić wrote:
>>> Sebastian,
>>>
>>> thanks! its-ta-ident-ref is perfect! That's exactly what I have been
>>> looking for.
>>>
>>> The only drawbacks are that it is not a Recommendation yet (what's the
>>> timeline here?), but that's not so terrible, and that this is
>>> possibly the worst attribute name I have seen so far in HTML.
>>>
>>> Still, that's what I am going to use! Thanks,
>>> Cheers,
>>> Denny
>>>
>>> 2013/4/26 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
>>>
>>> Hi John and Denny,
>>> the problem is well known and RDFa has its limits. Please see
>>> the new ITS 2.0 spec [1], which provides a solution for this.
>>> ITS 2.0 will likely be widely adopted by CMS and translation
>>> industry and it has an RDF transition using NIF [2].
>>>
>>> @Denny: For your request RDFa should be fine, if you just want
>>> to include:
>>> <http://sws.geonames.org/4951788> a owl:Thing .
>>>
>>> Note that the resulting RDF does not contain any provenance
>>> information, so I am unsure whether calling it an "annotation"
>>> is appropriate. It is rather an inclusion of extra triples in HTML.
>>> You are losing any reference to "Springfield" as RDFa parsers
>>> don't support this.
>>> Turtle in HTML would also be an easy option:
>>> http://www.w3.org/TR/turtle/#xhtml
>>>
>>> ITS 2.0 example:
>>> <p>It is well known, that <span
>>> its-ta-ident-ref="http://sws.geonames.org/4951788">Springfield</span>
>>> has mild summers and short, but hard winters.</p>
>>> NIF:
>>> ...
>>> <http://example.com/doc.html#xpath(/p[1]/span[1]/text()[1])>
>>>     itsrdf:xpath2nif <http://example.com/doc.html#char=23,34> .
>>> <http://example.com/doc.html#char=23,34>
>>>     rdf:type nif:RFC5147String ;
>>>     itsrdf:taIdentRef <http://sws.geonames.org/4951788> ;
>>> ...
>>>
>>> Well, NIF is more for natural language processing tools and
>>> middleware, so it's overkill for just including the occasional
>>> triple now and then ...
>>>
>>> All the best,
>>> Sebastian
>>>
>>> [1] http://www.w3.org/TR/its20/
>>> [2] http://www.w3.org/TR/its20/#conversion-to-nif
>>>
>>> On 24.04.2013 22:08, John Flynn wrote:
>>>>
>>>> I have long thought that a clean and simple method for
>>>> identifying terms in HTML that are instances of a specific
>>>> ontology would be a very valuable adjunct to the growth of the
>>>> Semantic Web. A number of years ago I proposed an approach to a
>>>> solution I called Instance Markup Language (1) which gained no
>>>> traction. The consensus at the time was that RDFa would provide
>>>> the solution for this need and also that it wasn't really
>>>> important because the great bulk of instance data would come
>>>> from large databases and not from HTML. I don't think RDFa has
>>>> in fact provided a "clean and simple" way to identify specific
>>>> terms in HTML text and link those terms to classes or
>>>> properties in a specific ontology.
>>>> I never thought my proposed
>>>> approach was exactly right, but I did have hope it would
>>>> inspire someone to come forward with a similar, but cleaner, way
>>>> to do this. Even though the subject still occasionally comes up,
>>>> after all these years it's pretty clear I was wrong about this
>>>> being an important component of Semantic Web technology.
>>>>
>>>> (1) http://mysite.verizon.net/jflynn12/IML.htm
>>>>
>>>> *From:* Denny Vrandečić [mailto:denny.vrandecic@wikimedia.de]
>>>> *Sent:* Wednesday, April 24, 2013 1:59 PM
>>>> *To:* semantic-web at W3C
>>>> *Subject:* How to put an annotation in HTML?
>>>>
>>>> Sorry, probably a stupid question:
>>>>
>>>> Let us say, I have some HTML like this...
>>>>
>>>> <p>It is well known, that Springfield has mild summers and
>>>> short, but hard winters.</p>
>>>>
>>>> And now, for example in order to simplify extraction, I want to
>>>> annotate Springfield with a URI, maybe like this, to make sure
>>>> that the computer understands I mean the Springfield
>>>> in Massachusetts:
>>>>
>>>> <p>It is well known, that <span
>>>> about="http://sws.geonames.org/4951788/">Springfield</span> has
>>>> mild summers and short, but hard winters.</p>
>>>>
>>>> How do I actually do that?
>>>>
>>>> Mind you, I don't want to add whole triples, but just annotate
>>>> the HTML and say "this element refers to the following URI".
>>>>
>>>> Cheers,
>>>>
>>>> Denny
>>>>
>>>> --
>>>> Project director Wikidata
>>>> Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
>>>> Tel. +49-30-219 158 26-0 | http://wikimedia.de
>>>>
>>>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien
>>>> Wissens e.V. Registered in the register of associations of the
>>>> Amtsgericht Berlin-Charlottenburg under number 23855 B.
>>>> Recognized as a non-profit organization by the Finanzamt für
>>>> Körperschaften I Berlin, tax number 27/681/51985.
>>>>
>>>
>>>
>>> --
>>> Dipl. Inf. Sebastian Hellmann
>>> Department of Computer Science, University of Leipzig
>>> Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
>>> http://dbpedia.org/Wiktionary , http://dbpedia.org
>>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>> Research Group: http://aksw.org
>>>
>>>
>>> --
>>> Project director Wikidata
>>> Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
>>> Tel. +49-30-219 158 26-0 | http://wikimedia.de
>>>
>>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens
>>> e.V. Registered in the register of associations of the Amtsgericht
>>> Berlin-Charlottenburg under number 23855 B. Recognized as a
>>> non-profit organization by the Finanzamt für Körperschaften I Berlin,
>>> tax number 27/681/51985.
>>
>>
>> --
>> Dipl. Inf. Sebastian Hellmann
>> Department of Computer Science, University of Leipzig
>> Events: NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org,
>> Deadline: *July 8th*)
>> Projects: http://nlp2rdf.org , http://linguistics.okfn.org ,
>> http://dbpedia.org/Wiktionary , http://dbpedia.org
>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>> Research Group: http://aksw.org
>
Received on Thursday, 16 May 2013 14:54:15 UTC