- From: Dave Lewis <dave.lewis@cs.tcd.ie>
- Date: Mon, 30 Apr 2012 11:36:32 +0100
- To: public-multilingualweb-lt@w3.org
- Message-ID: <4F9E6B30.8060407@cs.tcd.ie>
Hi Maxime,

Thanks for this detailed technical input. It is important that we understand the limitations of the various markup options; however, it is also important that we have a clear understanding of the requirements we are trying to meet here.

For example, I think you are right that we are dealing with a specific 'lexicalizes' type of attribute, but perhaps you could provide a definition of the semantics of 'lexicalizes' as you understand it. I assume it means that the segment we are marking up is taken to be a lexical representation of a specific concept, but it would be good if you could clarify this.

Also, this is more specific than the current terminology data category in ITS 1.0 (which just points to 'some information'), but it does raise the question of whether it addresses all the use cases covered by the set of related data category suggestions currently being consolidated, i.e. is the information these data categories point to always a 'concept' that is being 'lexicalized'?

Could you contribute your views under the ACTION-80 thread: http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Apr/0145.html

Thanks,
Dave

On 27/04/2012 15:30, Maxime Lefrançois wrote:
> Hi,
>
> When one says that a fragment of text is identified as a named entity, how should we model that if we were to model it with Semantic Web formalisms?
>
> In the mail http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Apr/0130.html, I commented on the pros and cons of three cases using only HTML. In the following, I comment on the pros and cons of two cases using RDFa (I definitely vote for RDFa-B, or similar).
>
> //case RDFa-A: typeof
> We represent the named entity <span typeof="http://dbpedia.org/resource/Barcelona" property="its:value" lang="en">Barcelona</span> using RDFa:
> - meaning: there is a resource in the document (no id here, so it is a blank node) that is an instance of the resource <http://dbpedia.org/resource/Barcelona>, and that has as its:value in English the string literal "Barcelona".
> - turtle translation: two triples: [ _:a rdf:type <http://dbpedia.org/resource/Barcelona> ; its:value "Barcelona"@en . ] .
> - Bad solution because it has the wrong meaning: the resource identified here is not the named entity but an instance of it.
>
> //case RDFa-B:
> We represent the named entity <span property="its:value" lang="en"><meta rel="its:lexicalizes" resource="http://dbpedia.org/resource/Barcelona"/>Barcelona</span> using RDFa:
> - meaning: very precise meanings: there is a resource in the document (no id here, so it is a blank node) that its:lexicalizes the resource <http://dbpedia.org/resource/Barcelona>, and that has as its:value in English the string literal "Barcelona".
> - turtle translation: two triples: [ _:a its:lexicalizes <http://dbpedia.org/resource/Barcelona> ; its:value "Barcelona"@en . ] .
> - shift of meaning: the resource identified here is not the named entity. Let class X be the domain of the property its:lexicalizes; the resource is an instance of class X. What should class X be?
>
> // RDFa cases in general
> +1: browsers will support it; CSS can't be applied directly, but JavaScript extensions can detect these elements, and a web API could be designed along with the recommendation to deal with this
> +1: we can refer to the same entity in different places of the same document, and in different documents.
> +1: RDFa permits a standard prefix mechanism (its:value will be resolved as a full URI)
> -1: We embrace the capabilities of all the Semantic Web applications, especially decoupled content management http://nemein.com/en/blog/decoupled_content_management_on_tour/ , and leverage the interoperability between existing/future W3C standards (correct validation, easy data consumability by browser extensions, enhanced search engine results goo.gl/aCb2P , etc.)
> +1: if the resource that we link to is dereferenceable (it has a URI that, when you do an HTTP GET on it, returns a description of the resource, cf. linked data), we can use its description to get its type.
> +1: if the resource that we link to is not dereferenceable, we can insert another triple in the document to specify its type.
> There are very precise meanings because we use Semantic Web formalisms
>
> // three interesting use-cases derived from the fact we use RDFa:
> If the named entity <http://dbpedia.org/resource/Barcelona> is described elsewhere on the web, three use cases arise with the description:
> <span typeof="http://dbpedia.org/resource/Barcelona" property="its:value" lang="en">*xyz*</span> , or
> <span property="its:value" lang="en"><meta rel="its:lexicalizes" resource="http://dbpedia.org/resource/Barcelona"/>*xyz*</span>
> - *description*: we can consider that in this document, the text fragment "xyz" is deliberately identified with the named entity <http://dbpedia.org/resource/Barcelona>, so it is a good lexicalization of the named entity in a certain context (a valuable piece of information)
> - *validation*: we can design validators that will look for the resource <http://dbpedia.org/resource/Barcelona> somewhere on the web; in this case a validator would find that there is a triple [ <http://dbpedia.org/resource/Barcelona> rdfs:label "Barcelona"@en . ] and it would throw something like a "mis-spell exception" (if a fragment of an ITS 1.0 annotated XML document, whether TBX, TMX, XLIFF or whatever, is retrieved, I guess it could lead to a similar result)
> - *edition*: we can design semi-automatic systems that won't throw an exception, but that will suggest replacing "xyz" with one of the correct values found on the web, using widgets similar to spelling correctors, for instance.
>
> Best,
> Maxime Lefrançois
> Ph.D. Student, INRIA - WIMMICS Team
> http://maxime-lefrancois.info
> @Max_Lefrancois <http://twitter.com/Max_Lefrancois>
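
For illustration only, the *validation* and *edition* use cases in the quoted message could be sketched roughly as below. This is a minimal sketch under stated assumptions, not a proposed implementation: the function names and the `known_labels` table are hypothetical, the table is a local stand-in for labels that a real validator would obtain by dereferencing the resource, and the parser only recognizes the exact RDFa-B pattern from the message (a non-nested span carrying property="its:value" with a child meta carrying rel="its:lexicalizes").

```python
from html.parser import HTMLParser


class LexicalizationParser(HTMLParser):
    """Collect text, language and target resource from RDFa-B style spans."""

    def __init__(self):
        super().__init__()
        self.annotations = []
        self._current = None  # annotation being built while inside a span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("property") == "its:value":
            self._current = {"lang": attrs.get("lang"), "resource": None, "text": ""}
        elif (tag == "meta" and self._current is not None
              and attrs.get("rel") == "its:lexicalizes"):
            self._current["resource"] = attrs.get("resource")

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        # Nested spans are not handled in this sketch.
        if tag == "span" and self._current is not None:
            self.annotations.append(self._current)
            self._current = None


def extract_annotations(html):
    """Return the list of annotations found in an HTML fragment."""
    parser = LexicalizationParser()
    parser.feed(html)
    parser.close()
    return parser.annotations


def check_annotation(annotation, known_labels):
    """Compare the marked-up text with the labels known for the resource.

    known_labels is a hypothetical mapping {(resource, lang): set of labels}.
    Returns ("ok", []) on a match, else ("mismatch", suggested labels).
    """
    labels = known_labels.get((annotation["resource"], annotation["lang"]), set())
    if annotation["text"] in labels:
        return "ok", []
    return "mismatch", sorted(labels)


# Assumed label data, standing in for a lookup of the linked resource.
known_labels = {("http://dbpedia.org/resource/Barcelona", "en"): {"Barcelona"}}

fragment = ('<span property="its:value" lang="en">'
            '<meta rel="its:lexicalizes" '
            'resource="http://dbpedia.org/resource/Barcelona"/>xyz</span>')

for annotation in extract_annotations(fragment):
    status, suggestions = check_annotation(annotation, known_labels)
    print(annotation["text"], status, suggestions)
```

A validator would treat the "mismatch" status as the error case, while an editing tool would instead offer the returned suggestions as replacements for the marked-up text.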
Received on Monday, 30 April 2012 10:29:34 UTC