Re: How to put an annotation in HTML? from Hugh Glaser on 2013-04-27 (semantic-web@w3.org from April 2013)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Sat, 27 Apr 2013 16:47:07 +0000
To: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
CC: Hugh Glaser <hg@ecs.soton.ac.uk>, Denny Vrandečić <denny.vrandecic@wikimedia.de>, semantic-web at W3C <semantic-web@w3c.org>, Tadej Stajner <tadej.stajner@ijs.si>, Felix Sasaki <fsasaki@w3.org>, "David Lewis" <dave.lewis@cs.tcd.ie>
Message-ID: <41EB6C3E-D021-4239-AA23-7E5CB1EDEA80@soton.ac.uk>
Hi.
On 27 Apr 2013, at 17:32, Sebastian Hellmann <hellmann@informatik.uni-leipzig.de> wrote:

> Hi Hugh,
> The vocabulary used by ITS will be well-defined and properly documented. You can find a description of all attributes starting with "its-ta" here:
> http://www.w3.org/TR/its20/#textanalysis

> 
> If you find anything missing you can still provide feedback to their mailing list ( public-multilingualweb-lt@w3.org ). We have discussed the textanalysis attributes quite thoroughly during the last months, but it always helps to get another perspective.
> 
> Your example seems to work quite well; it should be correct, and it is what the standard was intended for.
> 
> Actually, your example <span its-ta-ident-ref="http://usefulinc.com/ns/doap#developer">someone who works on</span>
> is quite interesting, as #developer is an rdf:Property. This might actually be problematic later in RDF, as using a property as an object pushes the data into OWL Full.
Ah - I think that is why I put it in - to see what happened :-)
I was thinking of putting a Class in as well, but I guess that makes less difference.
> 
> <http://example.com/doc.html#char=x,y>
>   rdf:type              nif:RFC5147String ;
>   itsrdf:taIdentRef <http://usefulinc.com/ns/doap#developer> ;
> 
> One solution would be to make http://www.w3.org/2005/11/its/rdf#taIdentRef an rdf:Property instead of an ObjectProperty, leaving it underspecified.
> Clearly this is not ideal for reasoners, but quite acceptable.
> 
> The other solution would be to add one more attribute to ITS for properties.
> 
> Any other ideas?
Sorry, I wouldn't dare - I am more of a parasite, feeding on the hard work of people who do the standards service.
And my apologies to people who are having severe indigestion at the example I used, as opposed to encoding it properly in RDF.
Best
> 
> All the best,
> Sebastian
> 
> 
> On 27.04.2013 14:22, Hugh Glaser wrote:
>> Great discussion, finding out about ITS in this context - thanks Sebastian.
>> I would never have found section 5.4 otherwise, or even thought that ITS had much direct relevance to RDF - the abstract certainly doesn't fire me with RDF enthusiasm :-)
>> 
>> That's some serious algorithm to do for its-ta-ident-ref (is that a Eurovision Song Contest entry?).
>> But it starts from a seriously simple annotation (which Denny asked for), which is exactly what we should be providing.
>> I can imagine (in some universe!) lots of documents getting things like
>> <span its-ta-ident-ref="http://dbpedia.org/resource/Dublin">Dublin</span>
>> all over the place, to great advantage for us consumers.
>> 
>> By hand, or with really simple tools.
>> I am guessing there are such tools of which I am ignorant?
>> 
>> Am I right in thinking I can have things like this?
>> Am <span its-ta-ident-ref="http://id.ecs.soton.ac.uk/person/21">I</span> right in thinking that <span its-ta-ident-ref="http://www.w3.org/People/Berners-Lee/card#i">Tim</span> is <span its-ta-ident-ref="http://usefulinc.com/ns/doap#developer">someone who works on</span> the <span its-ta-ident-ref="http://dig.csail.mit.edu/2005/ajar/ajaw/data#Tabulator">Tabulator</span>? <span its-ta-ident-ref="http://www.w3.org/People/Berners-Lee/card#i">He</span> says so in his <span its-ta-ident-ref="http://www.w3.org/People/Berners-Lee/card">personal profile</span>.
>> 
>> Cheers
>> On 26 Apr 2013, at 15:05, Denny Vrandečić <denny.vrandecic@wikimedia.de> wrote:
>> 
>>> Sebastian,
>>> 
>>> thanks! its-ta-ident-ref is perfect! That's exactly what I have been looking for.
>>> 
>>> The only drawbacks are that it is not a Recommendation yet (what's the timeline here?), which is not so terrible, and that this is possibly the worst attribute name I have seen so far in HTML.
>>> 
>>> Still, that's what I am going to use! Thanks,
>>> Cheers,
>>> Denny
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 2013/4/26 Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
>>> Hi John and Denny,
>>> the problem is well known and RDFa has its limits. Please see the new ITS 2.0 spec [1], which provides a solution for this. ITS 2.0 will likely be widely adopted by the CMS and translation industries, and it has an RDF conversion using NIF [2].
>>> 
>>> @Denny: For your request RDFa should be fine, if you just want to include:
>>> <http://sws.geonames.org/4951788>  a owl:Thing .
>>> 
>>> Note that the resulting RDF does not contain any provenance information, so I am unsure whether calling it an "annotation" is appropriate. It is rather an inclusion of extra triples in HTML.
>>> You are losing any reference to "Springfield", as RDFa parsers don't support this.
>>> Turtle in HTML would also be an easy option: http://www.w3.org/TR/turtle/#xhtml

>>> 
>>> ITS 2.0 example:
>>> <p>It is well known, that <span its-ta-ident-ref="http://sws.geonames.org/4951788">Springfield</span> has mild summers and short, but hard winters.</p>
>>> NIF:
>>> ...
>>> <http://example.com/doc.html#xpath(/p[1]/span[1]/text()[1])>
>>>    itsrdf:xpath2nif <http://example.com/doc.html#char=23,34> .
>>> <http://example.com/doc.html#char=23,34>
>>>    rdf:type              nif:RFC5147String ;
>>>    itsrdf:taIdentRef <http://sws.geonames.org/4951788> ;
>>> ...
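
The ITS-to-NIF mapping quoted above can be illustrated with a minimal sketch, assuming Python and its standard html.parser module. It walks an HTML fragment, records the character offsets of each element carrying its-ta-ident-ref, and emits NIF-style Turtle lines. The names TaIdentRefParser and to_nif are hypothetical, offsets are counted over the text content of the fragment only, and prefix declarations are omitted, so this is an illustration and not the conversion algorithm defined in the ITS 2.0 spec.

```python
from html.parser import HTMLParser


class TaIdentRefParser(HTMLParser):
    """Collect (start, end, text, uri) for elements with its-ta-ident-ref."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.text = ""          # text content seen so far
        self.stack = []         # open elements: (tag, uri or None, start offset)
        self.annotations = []   # results, in closing order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names and strips the quotes.
        uri = dict(attrs).get("its-ta-ident-ref")
        self.stack.append((tag, uri, len(self.text)))

    def handle_data(self, data):
        self.text += data

    def handle_endtag(self, tag):
        # Ignore stray end tags that match nothing on the stack.
        if tag not in [open_tag for open_tag, _, _ in self.stack]:
            return
        # Pop until the matching element; this also discards unclosed
        # void elements such as <br> that were opened inside it.
        while self.stack:
            open_tag, uri, start = self.stack.pop()
            if open_tag == tag:
                if uri is not None:
                    end = len(self.text)
                    self.annotations.append((start, end, self.text[start:end], uri))
                break


def to_nif(doc_uri, html):
    """Return NIF-style Turtle lines (without prefixes) for each annotation."""
    parser = TaIdentRefParser()
    parser.feed(html)
    parser.close()
    lines = []
    for start, end, _text, uri in parser.annotations:
        lines.append(f"<{doc_uri}#char={start},{end}>")
        lines.append("  rdf:type nif:RFC5147String ;")
        lines.append(f"  itsrdf:taIdentRef <{uri}> .")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = (
        '<p>It is well known, that <span its-ta-ident-ref='
        '"http://sws.geonames.org/4951788">Springfield</span> '
        'has mild summers and short, but hard winters.</p>'
    )
    print(to_nif("http://example.com/doc.html", sample))
```

For the Springfield paragraph, "Springfield" starts after 23 characters of text and ends at 34, which matches the #char=23,34 fragment in the NIF snippet above.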
>>> 
>>> Well, NIF is more for natural language processing tools and middleware, so it's overkill for just including the occasional triple now and then ...
>>> 
>>> All the best,
>>> Sebastian
>>> 
>>> 
>>> 
>>> [1] http://www.w3.org/TR/its20/
>>> [2] http://www.w3.org/TR/its20/#conversion-to-nif

>>> 
>>> On 24.04.2013 22:08, John Flynn wrote:
>>>> I have long thought that a clean and simple method for identifying terms in HTML that are instances of a specific ontology would be a very valuable adjunct to the growth of the Semantic Web. A number of years ago I proposed an approach to a solution I called Instance Markup Language (1), which gained no traction. The consensus at the time was that RDFa would provide the solution for this need and also that it wasn't really important because the great bulk of instance data would come from large databases and not from HTML. I don't think RDFa has in fact provided a "clean and simple" way to identify specific terms in HTML text and link those terms to classes or properties in a specific ontology. I never thought my proposed approach was exactly right, but I did have hope it would inspire someone to come forward with a similar, but cleaner, way to do this. Even though the subject still occasionally comes up, after all these years it's pretty clear I was wrong about this being an important component of Semantic Web technology.
>>>> 
>>>> 
>>>> 
>>>> (1) http://mysite.verizon.net/jflynn12/IML.htm

>>>> 
>>>> 
>>>> 
>>>> From: Denny Vrandečić [mailto:denny.vrandecic@wikimedia.de]
>>>> Sent: Wednesday, April 24, 2013 1:59 PM
>>>> To: semantic-web at W3C
>>>> Subject: How to put an annotation in HTML?
>>>> 
>>>> 
>>>> 
>>>> Sorry, probably a stupid question:
>>>> 
>>>> 
>>>> 
>>>> Let us say, I have some HTML like this...
>>>> 
>>>> 
>>>> 
>>>> <p>It is well known, that Springfield has mild summers and short, but hard winters.</p>
>>>> 
>>>> 
>>>> 
>>>> And now, for example in order to simplify extraction, I want to annotate Springfield with a URI, maybe like this, to make sure that the computer understands I mean the Springfield in Massachusetts:
>>>> 
>>>> 
>>>> 
>>>> <p>It is well known, that <span about="http://sws.geonames.org/4951788/">Springfield</span> has mild summers and short, but hard winters.</p>
>>>> 
>>>> 
>>>> 
>>>> How do I actually do that?
>>>> 
>>>> 
>>>> 
>>>> Mind you, I don't want to add whole triples, but just annotate the HTML and say "this element refers to the following URI".
>>>> 
>>>> 
>>>> 
>>>> Cheers,
>>>> 
>>>> Denny
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> -- 
>>>> Project director Wikidata
>>>> Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
>>>> Tel. +49-30-219 158 26-0 | http://wikimedia.de
>>>> 
>>>> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V. Registered in the register of associations of the Amtsgericht Berlin-Charlottenburg under number 23855 B. Recognized as a charitable organization by the Finanzamt für Körperschaften I Berlin, tax number 27/681/51985.
>>>> 
>>> -- 
>>> Dipl. Inf. Sebastian Hellmann
>>> Department of Computer Science, University of Leipzig
>>> Projects: http://nlp2rdf.org , http://linguistics.okfn.org , http://dbpedia.org/Wiktionary , http://dbpedia.org
>>> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
>>> Research Group: http://aksw.org

>>> 
>>> 
>>> 
> 
> 
> -- 
> Dipl. Inf. Sebastian Hellmann
> Department of Computer Science, University of Leipzig
> Events: NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org, Deadline: *July 8th*)
> Come to Germany as a PhD: http://bis.informatik.uni-leipzig.de/csf
> Projects: http://nlp2rdf.org , http://linguistics.okfn.org , http://dbpedia.org/Wiktionary , http://dbpedia.org

> Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann

> Research Group: http://aksw.org

>
Received on Saturday, 27 April 2013 16:48:22 UTC