[issue 3] How to specify the entity type

Hi, 



When one says that a fragment of text is identified as a named entity, we need to be very careful about how we represent the URI of the entity. 


In the following, I comment on the pros and cons of three cases using only HTML. (I definitely vote for case C for HTML5.) 


//case A: id 
We represent the named entity <span id="Barcelona" lang="en">Barcelona</span> using an id in the document < http://www.example.org/en/doc.html >. 
meaning: the URI of the span element is < http://www.example.org/en/doc.html#Barcelona >. 
-1: we can't refer to an entity described elsewhere on the web (we can't refer to the entity < http://dbpedia.org/resource/Barcelona >, for instance). 
-1: we can't refer to this very same entity elsewhere in the same document, since an id must be unique within a document. 
-1: if we want to refer to this very same entity in another document, for instance the translated version < http://www.example.org/ru/doc.html >, we can't do it using the same mechanism (the URI of the named entity would be < http://www.example.org/ru/doc.html#Barcelona >, which is different). 
-1: semantic problem: we give a URI to the HTML element, and not to the named entity that the text contained in this element refers to. 
conclusion: This mechanism may thus be usable when one describes a language-independent XML document, but not for HTML5 documents. It was valid for the internationalization of XML documents in ITS 1.0, but it is not applicable on the web using HTML5. 
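
To make the translated-document problem of case A concrete, here is a minimal Python sketch. It assumes only the two example document URLs used above; the helper name fragment_uri is mine, not part of any specification. It resolves the same id against both documents and shows that the two resulting URIs differ.

```python
from urllib.parse import urljoin


def fragment_uri(document_url: str, element_id: str) -> str:
    """Return the URI that identifies the element with this id in the document."""
    # A fragment-only reference is resolved against the document URL.
    return urljoin(document_url, "#" + element_id)


# Same id, two language versions of the same document.
en_uri = fragment_uri("http://www.example.org/en/doc.html", "Barcelona")
ru_uri = fragment_uri("http://www.example.org/ru/doc.html", "Barcelona")

print(en_uri)
print(ru_uri)
print(en_uri == ru_uri)  # the two spans do not share an identifier
```

The id only ever names an element inside one document, so nothing in the markup says that both spans are about the same entity.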



//case B: class (not a good idea) 
We represent the named entity <span class=" http://dbpedia.org/resource/Barcelona " lang="en">Barcelona</span> using the HTML5 class attribute in accordance with http://dev.w3.org/html5/spec/infrastructure.html#extensibility : 
meaning: we create an element of type < http://dbpedia.org/resource/Barcelona >, using the most applicable existing "real" HTML element, which is <span>. 
reformulation: we have the resource < http://dbpedia.org/resource/Barcelona > that is labeled "Barcelona" in English, and we make this English label visible in the HTML5 document. 
+1: browsers will support it, CSS can be applied to elements of this type, JavaScript extensions can detect these elements, and a web API could be designed along with the recommendation to deal with this. 
+1: we can refer to the same entity in different places of the same document, and in different documents. 
-1: there is no standard prefix mechanism, so this case will lead to a very verbose document. 
-1: the semantics of this case are very ambiguous: should anyone, somewhere on the web, put a URI in the class attribute, would we interpret it as a named entity? 
conclusion: This mechanism is not desirable. 
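
The ambiguity of case B can be illustrated with a small Python sketch using the standard html.parser module. The second element and its class URI (http://www.example.org/styles/highlight) are hypothetical, invented only for this illustration, and the class name ClassUriCollector is mine. A consumer that looks for URIs in class attributes finds both elements and has no way to tell which one was meant as a named entity.

```python
from html.parser import HTMLParser


class ClassUriCollector(HTMLParser):
    """Collect every class token that looks like an http(s) URI."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag name, URI-like class token)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # The class attribute is a whitespace-separated token list.
                for token in value.split():
                    if token.startswith(("http://", "https://")):
                        self.found.append((tag, token))


markup = (
    '<span class=" http://dbpedia.org/resource/Barcelona " lang="en">Barcelona</span>'
    # Hypothetical element: a URI used as a class for some unrelated purpose.
    '<div class="http://www.example.org/styles/highlight">unrelated</div>'
)

collector = ClassUriCollector()
collector.feed(markup)
for tag, uri in collector.found:
    print(tag, uri)
```

Both elements are reported in the same way, which is exactly the interpretation problem described above.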




//case C: its-* attributes (would be the best for HTML5) 
We represent the named entity <span its-lexicalizes=" http://dbpedia.org/resource/Barcelona " lang="en">Barcelona</span> using attributes in accordance with http://dev.w3.org/html5/spec/infrastructure.html#extensibility: 
meaning: no special meaning, the text content of this span lexicalizes the named entity < http://dbpedia.org/resource/Barcelona > in English. 
+1: browsers will support it, CSS can't be applied directly but JavaScript extensions can detect these elements, and a web API could be designed along with the recommendation to deal with this. 
+1: we can refer to the same entity in different places of the same document, and in different documents. 


-1: without using RDFa, there is no standard prefix mechanism, so this mechanism will lead to a very verbose document. 
-1: this could map exactly to case RDFa-B, but without using RDFa, thus losing the benefits of the Semantic Web in general. For instance, we can't reason on the triples and ask simple SPARQL queries such as "give me all the lexicalizations of named entity < http://dbpedia.org/resource/Barcelona >" using Semantic Web technologies. 
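
As an illustration of the last point, here is a rough Python sketch of what a consumer would have to write by hand for case C, in place of a SPARQL query. It relies only on the standard html.parser module. The its-lexicalizes attribute is the one proposed in this message, not an existing standard attribute, and the Russian span in the sample markup is an invented example. For simplicity the sketch reads lang only from the span itself and ignores inherited language.

```python
from html.parser import HTMLParser


class LexicalizationCollector(HTMLParser):
    """Collect (entity URI, text, lang) for spans carrying its-lexicalizes."""

    def __init__(self):
        super().__init__()
        self._open = []    # one entry per currently open <span>
        self.results = []  # list of (entity URI, text content, lang)

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            attributes = dict(attrs)
            # Strip the surrounding spaces used in the examples of this message.
            entity = (attributes.get("its-lexicalizes") or "").strip() or None
            self._open.append(
                {"entity": entity, "lang": attributes.get("lang"), "text": []}
            )

    def handle_data(self, data):
        # Text belongs to every span that is currently open (handles nesting).
        for entry in self._open:
            entry["text"].append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._open:
            entry = self._open.pop()
            if entry["entity"]:
                self.results.append(
                    (entry["entity"], "".join(entry["text"]), entry["lang"])
                )


def lexicalizations(markup: str, entity: str):
    """Return the (text, lang) pairs that lexicalize the given entity URI."""
    collector = LexicalizationCollector()
    collector.feed(markup)
    collector.close()
    return [(text, lang) for found, text, lang in collector.results if found == entity]


markup = (
    '<p>Visit <span its-lexicalizes=" http://dbpedia.org/resource/Barcelona " '
    'lang="en">Barcelona</span> or '
    '<span its-lexicalizes="http://dbpedia.org/resource/Barcelona" '
    'lang="ru">Барселона</span>.</p>'
)

print(lexicalizations(markup, "http://dbpedia.org/resource/Barcelona"))
```

This works for one document at a time, but every consumer has to re-implement the extraction, and there is no shared triple model to reason on.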


// question: 
Does anyone see mistakes? Pros and cons that I haven't mentioned? Other possible cases? 

Best, 
Maxime Lefrançois 
Ph.D. Student, INRIA - WIMMICS Team 
http://maxime-lefrancois.info 
@Max_Lefrancois

Received on Friday, 27 April 2012 14:06:05 UTC