[issue 3] How to specify the entity type with RDFa ? from Maxime Lefrançois on 2012-04-27 (public-multilingualweb-lt@w3.org from April 2012)

From: Maxime Lefrançois <maxime.lefrancois@inria.fr>
Date: Fri, 27 Apr 2012 16:30:54 +0200 (CEST)
To: public-multilingualweb-lt@w3.org
Message-ID: <1757693516.223601.1335537054696.JavaMail.root@zmbs3.inria.fr>
Hi, 



When we say that a fragment of text is identified as a named entity, how should we model that with Semantic Web formalisms? 




In the mail http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Apr/0130.html , I commented on the pros and cons of three cases using only HTML. In the following, I comment on the pros and cons of two cases using RDFa (I definitely vote for RDFa-B, or similar). 




//case RDFa-A: typeof 
We represent the named entity <span typeof="http://dbpedia.org/resource/Barcelona" property="its:value" lang="en">Barcelona</span> using RDFa: 
- meaning: there is a resource in the document (no id here, so it is a blank node) that is an instance of the resource <http://dbpedia.org/resource/Barcelona>, and that has as its:value the English string literal "Barcelona". 
- turtle translation: two triples: [ _:a rdf:type <http://dbpedia.org/resource/Barcelona> ; its:value "Barcelona"@en . ] 
- Bad solution because it has a wrong meaning: the resource identified here is not the named entity but an instance of it. 


//case RDFa-B: 
We represent the named entity <span property="its:value" lang="en"> <meta rel="its:lexicalizes" resource="http://dbpedia.org/resource/Barcelona"/>Barcelona</span> using RDFa: 
- meaning: very precise: there is a resource in the document (no id here, so it is a blank node) that its:lexicalizes the resource <http://dbpedia.org/resource/Barcelona>, and that has as its:value the English string literal "Barcelona". 
- turtle translation: two triples: [ _:a its:lexicalizes <http://dbpedia.org/resource/Barcelona> ; its:value "Barcelona"@en . ] 
- shift of meaning: the resource identified here is not the named entity. Let class X be the domain of the property its:lexicalizes; the resource is then an instance of class X. What should class X be? 
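
To make the RDFa-B reading concrete, here is a minimal sketch of how a consumer could pull the two triples out of that markup. It is hand-rolled for this exact shape (a span carrying property="its:value" with a nested meta) and is not a conformant RDFa processor; the names LexicalizationReader and extract_triples are made up for illustration, and its:value / its:lexicalizes are kept as unexpanded prefixed names.

```python
from html.parser import HTMLParser


class LexicalizationReader(HTMLParser):
    """Reads only the RDFa-B shape; NOT a conformant RDFa processor."""

    def __init__(self):
        super().__init__()
        self.triples = []      # collected (subject, predicate, object) tuples
        self._subject = None   # blank-node label of the span being read
        self._lang = None
        self._text = []
        self._count = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("property") == "its:value":
            # No id on the span, so mint a fresh blank node for it.
            self._count += 1
            self._subject = "_:b%d" % self._count
            self._lang = attrs.get("lang")
            self._text = []
        elif tag == "meta" and self._subject and attrs.get("resource"):
            # rel gives the predicate, resource gives the object IRI.
            obj = "<%s>" % attrs["resource"].strip()
            self.triples.append((self._subject, attrs.get("rel"), obj))

    def handle_data(self, data):
        if self._subject:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._subject:
            # The span's text content becomes the its:value literal.
            literal = '"%s"' % "".join(self._text).strip()
            if self._lang:
                literal += "@" + self._lang
            self.triples.append((self._subject, "its:value", literal))
            self._subject = None


def extract_triples(html):
    """Return the triples found in an HTML fragment using the RDFa-B shape."""
    reader = LexicalizationReader()
    reader.feed(html)
    reader.close()
    return reader.triples


if __name__ == "__main__":
    fragment = (
        '<span property="its:value" lang="en"> '
        '<meta rel="its:lexicalizes" '
        'resource="http://dbpedia.org/resource/Barcelona"/>Barcelona</span>'
    )
    for triple in extract_triples(fragment):
        print(" ".join(triple), ".")
```

Nested spans are not handled; a real implementation would rely on an RDFa library rather than on this kind of pattern matching.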


// RDFa cases in general 

+1: browsers will support it; CSS can't be applied directly, but JavaScript extensions can detect these elements, and a web API could be designed along with the recommendation to deal with this 
+1: we can refer to the same entity in different places of the same document, and in different documents. 
+1: RDFa permits a standard prefix mechanism (its:value will be resolved as a full URI) 
-1: We embrace the capabilities of all the Semantic Web applications, especially decoupled content management http://nemein.com/en/blog/decoupled_content_management_on_tour/ , and leverage the interoperability between existing/future W3C standards (correct validation, easy data consumption by browser extensions, enhanced search engine results goo.gl/aCb2P , etc.) 
+1: if the resource that we link to is dereferenceable (it has a URI that, when you do an HTTP GET on it, returns a description of the resource, cf. linked data), we can use that description to get its type. 
+1: if the resource that we link to is not dereferenceable, we can insert another triple in the document to specify its type. 
There are very precise meanings because we use semantic web formalisms 
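
The two ways of getting the entity type mentioned above (from the dereferenced description, or from an extra triple in the document) can be sketched as a simple fallback. This is only an illustration under assumptions: triples are plain Python tuples of strings, types_of is a made-up name, and fetch_description is a hypothetical callable standing in for whatever would do the HTTP GET and parse the result.

```python
RDF_TYPE = "rdf:type"


def types_of(uri, document_triples, fetch_description=None):
    """Return the set of types known for `uri`.

    fetch_description is a hypothetical callable (uri -> iterable of
    (subject, predicate, object) tuples). If it is absent, fails, or yields
    no type, fall back to the rdf:type triples stated in the document.
    """
    if fetch_description is not None:
        try:
            remote = list(fetch_description(uri))
        except OSError:
            remote = []  # treat a network failure as "not dereferenceable"
        found = {o for s, p, o in remote if s == uri and p == RDF_TYPE}
        if found:
            return found
    return {o for s, p, o in document_triples if s == uri and p == RDF_TYPE}
```

For a resource that cannot be dereferenced, the document would then carry one more triple whose subject is the entity URI and whose predicate is rdf:type.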


// three interesting use-cases derived from the fact we use RDFa: 
If the named entity <http://dbpedia.org/resource/Barcelona> is described elsewhere on the web, three use cases arise from markup such as: <span typeof="http://dbpedia.org/resource/Barcelona" property="its:value" lang="en"> xyz </span> , or 
<span property="its:value" lang="en"> <meta rel="its:lexicalizes" resource="http://dbpedia.org/resource/Barcelona"/> xyz </span> 
- description: we can consider that in this document the text fragment "xyz" is intentionally identified as the named entity <http://dbpedia.org/resource/Barcelona>, so it is a good lexicalization of the named entity in a certain context (a valuable piece of information) 
- validation: we can design validators that will look for the resource <http://dbpedia.org/resource/Barcelona> somewhere on the web; in this case one would find the triple [ <http://dbpedia.org/resource/Barcelona> rdfs:label "Barcelona"@en . ] and throw something like a "mis-spell exception" (if a fragment of an ITS 1.0 annotated XML document, TBX, TMX, XLIFF or whatever, is retrieved, I guess it could lead to a similar result) 
- editing: we can design semi-automatic systems that won't throw an exception, but that will suggest replacing "xyz" with one of the correct values found on the web, using widgets similar to spelling correctors, for instance. 
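
The validation and editing use cases could look roughly like the sketch below. Everything in it is an assumption made for illustration: the labels come from a small in-memory dictionary (KNOWN_LABELS) instead of being fetched from the web, and the names LexicalizationMismatch, validate and suggest are made up. The suggestion step uses difflib from the Python standard library as a stand-in for a spelling-corrector widget.

```python
import difflib

# Hypothetical local cache of labels per entity URI and language.
# A real validator would obtain these by dereferencing the URI.
KNOWN_LABELS = {
    "http://dbpedia.org/resource/Barcelona": {"en": ["Barcelona"]},
}


class LexicalizationMismatch(Exception):
    """Raised when annotated text matches none of the known labels."""


def known_labels(uri, lang, labels=KNOWN_LABELS):
    return labels.get(uri, {}).get(lang, [])


def validate(uri, text, lang, labels=KNOWN_LABELS):
    """Validation use case: raise if `text` is not a known label of `uri`."""
    candidates = known_labels(uri, lang, labels)
    if candidates and text.strip() not in candidates:
        raise LexicalizationMismatch(
            "%r is not a known %s label of <%s>" % (text.strip(), lang, uri)
        )


def suggest(uri, text, lang, labels=KNOWN_LABELS, n=3):
    """Editing use case: propose replacements instead of raising."""
    candidates = known_labels(uri, lang, labels)
    close = difflib.get_close_matches(text.strip(), candidates, n=n, cutoff=0.6)
    # If nothing is textually close, fall back to offering the known labels.
    return close or candidates[:n]
```

When no label is known for the entity, validate stays silent, since the absence of a description on the web says nothing about whether the fragment is wrong.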


Best, 
Maxime Lefrançois 
Ph.D. Student, INRIA - WIMMICS Team 
http://maxime-lefrancois.info 
@Max_Lefrancois
Received on Friday, 27 April 2012 14:31:29 UTC