Re: [issue 3] How to specify the entity type with RDFa ? from Maxime Lefrançois on 2012-05-02 (public-multilingualweb-lt@w3.org from May 2012)

From: Maxime Lefrançois <maxime.lefrancois@inria.fr>
Date: Wed, 2 May 2012 11:17:44 +0200 (CEST)
To: Tadej Stajner <tadej.stajner@ijs.si>
Cc: public-multilingualweb-lt@w3.org
Message-ID: <346534283.250046.1335950264880.JavaMail.root@zmbs3.inria.fr>
Hi Tadej, Dave, all,

<span property="its:value" lang="en"> <meta rel="its:lexicalizes" resource="http://dbpedia.org/resource/Barcelona"/>Barcelona</span>

The two triples of case RDFa-B mean:
- there is a resource _:a (a blank node) that is a mention of the concept <http://dbpedia.org/resource/Barcelona>, and that has for value in English the string literal "Barcelona".

_:a can be the subject (in the same document) of other its:value properties, for instance to add translations of this concept; it can be the subject of other triples that would add metadata categories; and it can be the object of other triples.

Isn't 'mentions' a little bit vague? I could say that the following sentence mentions the concept [toy]: "Peter bought a toy for his daughter". I chose 'lexicalizes' because it has a stronger meaning (exclusivity of mention): no concept other than [toy] should be mentioned in the text fragment.

"to lexicalize" has an interesting entry in The Free Dictionary:
Verb 1. lexicalize - make or coin into a word or accept a new word into the lexicon of a language; "The concept expressed by German `Gemuetlichkeit' is not lexicalized in English"

Is "mentions" really clearer to you?

Best,
Maxime Lefrançois
Ph.D. Student, INRIA - WIMMICS Team
http://maxime-lefrancois.info
@Max_Lefrancois

----- Original Message -----
> From: "Tadej Stajner" <tadej.stajner@ijs.si>
> To: public-multilingualweb-lt@w3.org
> Sent: Monday, 30 April 2012 14:18:00
> Subject: Re: [issue 3] How to specify the entity type with RDFa ?
> Hi, Maxime,
> the RDFa-B case is the one we're aiming for, since our markup is
> mostly describing the mentions of concepts, and not necessarily
> concepts themselves. So, it's natural that it will transform into two
> triples, mostly for the reasons you mentioned in RDFa-A.
> Also, I understand the 'lexicalizes' predicate to be equivalent to
> saying 'mentions', which I think sounds easier to communicate.
> -- Tadej
> On 4/30/2012 12:36 PM, Dave Lewis wrote:
> > Hi Maxime,
> > Thanks for this detailed technical input. It is important that we
> > understand the limitations of the various markup options; however, it
> > is also important that we have a clear understanding of the
> > requirements we are trying to meet here.
> > For example, I think you are right that we are dealing with a specific
> > 'lexicalizes' type of attribute, but perhaps you could provide a
> > definition of the semantics of 'lexicalizes' as you understand it. I
> > assume it means that the segment we are marking up is taken to be a
> > lexical representation of a specific concept, but it would be good if
> > you could clarify this.
> > Also, this is more specific than the current terminology data category
> > in ITS 1.0 (which just points to 'some information'), BUT it does raise
> > the question of whether it addresses all the use cases covered by the
> > consolidating set of related data category suggestions currently on the
> > table, i.e. is the information these data categories point to always a
> > 'concept' that is being 'lexicalized'? Could you contribute your views
> > under the ACTION-80 thread:
> > http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Apr/0145.html
> > thanks,
> > Dave
> > On 27/04/2012 15:30, Maxime Lefrançois wrote:
> > > Hi,
> > > When one says that a fragment of text is identified as a named entity,
> > > how should we model that if we were to use Semantic Web formalisms?
> > > In an earlier mail, I commented on the pros and cons of three cases
> > > using only HTML:
> > > http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Apr/0130.html
> > > In the following, I comment on the pros and cons of two cases using
> > > RDFa (I definitely vote for RDFa-B, or similar).
> > > // case RDFa-A: typeof
> > > We represent the named entity using RDFa:
> > > <span typeof="http://dbpedia.org/resource/Barcelona" property="its:value" lang="en">Barcelona</span>
> > > - meaning: there is a resource in the document (no id here, so it is a
> > > blank node) that is an instance of the resource
> > > <http://dbpedia.org/resource/Barcelona>, and that has for its:value in
> > > English the string literal "Barcelona".
> > > - Turtle translation: two triples:
> > > _:a rdf:type <http://dbpedia.org/resource/Barcelona> ; its:value "Barcelona"@en .
> > > - Bad solution because it has the wrong meaning: the resource identified
> > > here is not the named entity but an instance of it.
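To make the problem with RDFa-A concrete, here is a minimal Python sketch that holds the two triples as plain (subject, predicate, object) tuples, without any RDF library. The `Literal` helper and the function names are hypothetical, chosen only for illustration. It shows that the DBpedia resource lands in the class position of `rdf:type`, so the mention node `_:a` becomes an instance of Barcelona, which is the wrong meaning described in the RDFa-A case.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A language-tagged string literal, e.g. "Barcelona"@en (hypothetical helper)."""
    text: str
    lang: str


RDF_TYPE = "rdf:type"
BARCELONA = "http://dbpedia.org/resource/Barcelona"

# Case RDFa-A: the two triples read off
# <span typeof="http://dbpedia.org/resource/Barcelona" property="its:value" lang="en">Barcelona</span>
case_a = {
    ("_:a", RDF_TYPE, BARCELONA),
    ("_:a", "its:value", Literal("Barcelona", "en")),
}


def used_as_classes(triples):
    """Every resource that appears as the object of rdf:type, i.e. is used as a class."""
    return {o for (s, p, o) in triples if p == RDF_TYPE}


def instances_of(triples, cls):
    """Every subject declared to be an instance of `cls`."""
    return {s for (s, p, o) in triples if p == RDF_TYPE and o == cls}


print(used_as_classes(case_a))          # {'http://dbpedia.org/resource/Barcelona'}
print(instances_of(case_a, BARCELONA))  # {'_:a'}
```

The city itself is treated as a class here, and the text fragment is modelled as one of its members rather than as a mention of it.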
> > > // case RDFa-B:
> > > We represent the named entity using RDFa:
> > > <span property="its:value" lang="en"> <meta rel="its:lexicalizes" resource="http://dbpedia.org/resource/Barcelona"/>Barcelona</span>
> > > - meaning: very precise: there is a resource in the document (no id
> > > here, so it is a blank node) that its:lexicalizes the resource
> > > <http://dbpedia.org/resource/Barcelona>, and that has for its:value in
> > > English the string literal "Barcelona".
> > > - Turtle translation: two triples:
> > > _:a its:lexicalizes <http://dbpedia.org/resource/Barcelona> ; its:value "Barcelona"@en .
> > > - shift of meaning: the resource identified here is not the named
> > > entity. Let class X be the domain of the property its:lexicalizes; the
> > > resource is an instance of class X. What should class X be?
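For comparison, a minimal sketch of the RDFa-B triples in the same plain-tuple style (the `Literal` helper, function names and the French value are hypothetical, for illustration only). As the reply at the top of this message notes, the blank node `_:a` can carry further its:value triples, for example a translation; the sketch adds one and shows that the link to DBpedia is now `its:lexicalizes`, not `rdf:type`.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A language-tagged string literal, e.g. "Barcelona"@en (hypothetical helper)."""
    text: str
    lang: str


BARCELONA = "http://dbpedia.org/resource/Barcelona"

# Case RDFa-B: the two triples read off the <span>/<meta rel="its:lexicalizes"> markup.
case_b = {
    ("_:a", "its:lexicalizes", BARCELONA),
    ("_:a", "its:value", Literal("Barcelona", "en")),
}

# Hypothetical extra triple: the same mention node carries a French value.
case_b.add(("_:a", "its:value", Literal("Barcelone", "fr")))


def lexicalized_concepts(triples, mention):
    """Concepts that a mention node lexicalizes."""
    return {o for (s, p, o) in triples if s == mention and p == "its:lexicalizes"}


def values_by_lang(triples, mention):
    """Map each language tag to the its:value strings of a mention node."""
    out = {}
    for s, p, o in triples:
        if s == mention and p == "its:value":
            out.setdefault(o.lang, set()).add(o.text)
    return out


print(lexicalized_concepts(case_b, "_:a"))  # {'http://dbpedia.org/resource/Barcelona'}
print(values_by_lang(case_b, "_:a"))
```

The sketch does not answer the open question of what class X should be; it only makes visible that `_:a` is now a separate node pointing at the concept.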
> > > // RDFa cases in general
> > > +1: browsers will support it; CSS can't be applied directly, but
> > > JavaScript extensions can detect these elements, and a web API could be
> > > designed along with the recommendation to deal with this.
> > > +1: we can refer to the same entity in different places of the same
> > > document, and in different documents.
> > > +1: RDFa provides a standard prefix mechanism (its:value will be
> > > resolved as a full URI).
> > > +1: we embrace the capabilities of all Semantic Web applications,
> > > especially decoupled content management
> > > http://nemein.com/en/blog/decoupled_content_management_on_tour/ , and
> > > leverage the interoperability between existing and future W3C standards
> > > (correct validation, easy data consumption by browser extensions,
> > > enhanced search engine results goo.gl/aCb2P , etc.).
> > > +1: if the resource that we link to is dereferenceable (it has a URI
> > > that returns a description of the resource when you do an HTTP GET on
> > > it, cf. Linked Data), we can use that description to get its type.
> > > +1: if the resource that we link to is not dereferenceable, we can
> > > insert another triple in the document to specify its type.
> > > The meanings are very precise because we use Semantic Web formalisms.
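The two type-lookup points can be sketched as a single function, assuming a hypothetical `dereference` callback that returns the remote description as a set of triples (or `None` when the URI is not dereferenceable). No HTTP request is made: a dictionary stands in for the web, and its contents (including `dbo:City` as Barcelona's type and the `urn:example:` / `example.org` URIs) are illustrative stub data, not retrieved results.

```python
from typing import Callable, Dict, Iterable, Optional, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "rdf:type"


def types_of(resource: str,
             local_triples: Iterable[Triple],
             dereference: Callable[[str], Optional[Set[Triple]]]) -> Set[str]:
    """Collect the rdf:type values of `resource`.

    Types come from the remote description when the URI is dereferenceable,
    plus any type triples inserted in the document itself (the fallback for
    non-dereferenceable URIs).
    """
    remote = dereference(resource) or set()
    return {o for (s, p, o) in [*remote, *local_triples]
            if s == resource and p == RDF_TYPE}


# Hypothetical stand-in for an HTTP GET on the URI; a real client would
# fetch and parse the Linked Data description instead.
REMOTE: Dict[str, Set[Triple]] = {
    "http://dbpedia.org/resource/Barcelona": {
        ("http://dbpedia.org/resource/Barcelona", RDF_TYPE,
         "http://dbpedia.org/ontology/City"),
    },
}


def fake_dereference(uri: str) -> Optional[Set[Triple]]:
    return REMOTE.get(uri)  # None means "not dereferenceable"


# A type triple written into the document for a non-dereferenceable resource
# (both URIs are made up for the example).
LOCAL = {
    ("urn:example:local-entity", RDF_TYPE, "http://example.org/ontology/Organisation"),
}

print(types_of("http://dbpedia.org/resource/Barcelona", LOCAL, fake_dereference))
print(types_of("urn:example:local-entity", LOCAL, fake_dereference))
```

Taking the union of remote and local types is a design choice of this sketch: it covers both cases without having to decide up front whether a URI is dereferenceable.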
> > > // three interesting use cases derived from the fact that we use RDFa:
> > > If the named entity <http://dbpedia.org/resource/Barcelona> is described
> > > elsewhere on the web, three use cases arise with either of the following
> > > descriptions:
> > > <span typeof="http://dbpedia.org/resource/Barcelona" property="its:value" lang="en"> xyz </span>
> > > or
> > > <span property="its:value" lang="en"> <meta rel="its:lexicalizes" resource="http://dbpedia.org/resource/Barcelona"/> xyz </span>
> > > - description: we can consider that, in this document, the text fragment
> > > "xyz" is intentionally identified with the named entity
> > > <http://dbpedia.org/resource/Barcelona>, so it is a good lexicalization
> > > of the named entity in a certain context (a valuable piece of
> > > information).
> > > - validation: we can design validators that look for the resource
> > > <http://dbpedia.org/resource/Barcelona> somewhere on the web. In this
> > > case they would find the triple
> > > <http://dbpedia.org/resource/Barcelona> rdfs:label "Barcelona"@en .
> > > and, since "xyz" matches none of the labels, they would throw something
> > > like a "misspelling exception" (if a fragment of an ITS 1.0-annotated
> > > XML, TBX, TMX or XLIFF document, or whatever, is retrieved, I guess it
> > > could lead to a similar result).
> > > - editing: we can design semi-automatic systems that do not throw an
> > > exception but instead suggest replacing "xyz" with one of the correct
> > > values found on the web, using widgets similar to spelling correctors,
> > > for instance.
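The validation and editing use cases can be sketched together in Python. This assumes the validator has already retrieved the resource's description as (subject, predicate, (text, language)) triples; the `WEB_DESCRIPTION` data, `MisspellingError` and the function names are hypothetical. Labels use `rdfs:label`, the RDF Schema labelling property. The editing step swaps in the standard library's `difflib.get_close_matches` as a simple stand-in for a spelling-corrector widget, not as a prescribed matching method.

```python
import difflib
from typing import List, Set, Tuple

RDFS_LABEL = "rdfs:label"
BARCELONA = "http://dbpedia.org/resource/Barcelona"

# Hypothetical stub for a description retrieved from the web. Illustrative data only.
WEB_DESCRIPTION: Set[Tuple[str, str, Tuple[str, str]]] = {
    (BARCELONA, RDFS_LABEL, ("Barcelona", "en")),
    (BARCELONA, RDFS_LABEL, ("Barcelone", "fr")),
}


class MisspellingError(ValueError):
    """Hypothetical 'misspelling exception' raised by the validator."""


def labels(triples, concept: str, lang: str) -> Set[str]:
    """rdfs:label strings of `concept` in language `lang`."""
    return {o[0] for (s, p, o) in triples
            if s == concept and p == RDFS_LABEL and o[1] == lang}


def validate(fragment: str, concept: str, lang: str, triples) -> None:
    """Validation: reject a fragment that matches no label of the concept."""
    text = fragment.strip()
    if text not in labels(triples, concept, lang):
        raise MisspellingError(f"{text!r} is not a known {lang} label of {concept}")


def suggest(fragment: str, concept: str, lang: str, triples) -> List[str]:
    """Editing: propose close labels instead of raising."""
    candidates = sorted(labels(triples, concept, lang))
    return difflib.get_close_matches(fragment.strip(), candidates, n=3, cutoff=0.6)


validate("Barcelona", BARCELONA, "en", WEB_DESCRIPTION)           # passes silently
print(suggest(" Barcelonna ", BARCELONA, "en", WEB_DESCRIPTION))  # ['Barcelona']
print(suggest(" xyz ", BARCELONA, "en", WEB_DESCRIPTION))         # []
```

A fragment as far off as "xyz" gets no suggestion at this cutoff, so a real editing tool would presumably fall back to listing all labels of the concept.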
> > > Best,
> > > Maxime Lefrançois
> > > Ph.D. Student, INRIA - WIMMICS Team
> > > http://maxime-lefrancois.info
> > > @Max_Lefrancois
Received on Wednesday, 2 May 2012 09:18:19 UTC