Re: language-tagged literal datatypes from Antoine Zimmermann on 2011-08-26 (public-rdf-wg@w3.org from August 2011)

From: Antoine Zimmermann <antoine.zimmermann@insa-lyon.fr>
Date: Fri, 26 Aug 2011 17:22:32 +0200
To: public-rdf-wg@w3.org
Message-ID: <4E57BA38.4050405@insa-lyon.fr>
Richard, Pat, Pierre-Antoine (and all),


Let me re-examine the alternatives. We apparently all agree that there 
must be a new class rdf:LangString, but there's some disagreement on 
what exactly it should be.



A. What if it is not a datatype?

  - DATATYPE("foo"@en) should not return rdf:LangString, since it would 
not be a datatype, unless the SPARQL WG agrees to make an exception.
  - The interpretation of language-tagged literals would have to be 
defined separately, as is already the case in RDF 2004.
  - A datatype like rdf:PlainLiteral would still be needed to be able to 
build the kind of custom datatypes that can be defined in OWL (e.g., 
lower case British English strings).

I think that this option (rdf:LangString not a datatype) would be 
acceptable if we additionally introduce language-specific datatypes. 
Otherwise, it seems the improvement is too limited.


B. What if it is a datatype?

If it's a datatype, rdf:LangString must have exactly all the 
characteristics that all datatypes have. It must follow the definition. 
Now, the definition can either be kept as is (as in RDF 2004) or modified.
  B1: If we keep the RDF 2004 definition, I think we can't really do any 
better than what the rdf:PlainLiteral spec is doing. The difference is 
that we do not need to deal with untagged plain literals, so the lexical 
form is quite straightforward: "foo@en"^^rdf:LangString would be the 
abstract syntax of "foo"@en.
  B2: If we change the definition of datatypes, I can see two ways:
   * B2x: do as Pat suggested, that is, allow the lexical space to 
contain tuples; then everything else follows the standard definitions 
(DATATYPE("foo"@en) is as per the normal rule, the semantics is as per 
the normal semantics of typed literals, etc.)
   * B2y: include in the definition of datatypes an exception, that is, 
say "A datatype is either: (i) a combination of 3 parts: lexical space, 
value space and L2V; or (ii) rdf:LangString". Then you have to define the 
semantics of rdf:LangString literals differently from other types, since 
there is no L2V. DATATYPE("foo"@en) would follow the normal rule.
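
To make the difference between B1 and B2x concrete, here is a minimal 
Python sketch of the two lexical-to-value mappings. It is illustrative 
only: the function names, the modelling of a value as a (text, tag) 
pair, and the lowercasing of the tag are assumptions made for this 
example, not something taken from a spec. For B1 it also assumes that 
the lexical form is split at the last "@", as rdf:PlainLiteral does.

```python
from typing import Tuple

LangValue = Tuple[str, str]  # (text, language tag): an assumed value model


def l2v_b1(lexical: str) -> LangValue:
    """B1: the lexical form is a single string such as 'foo@en'."""
    # Split at the LAST '@' so that text containing '@' still parses.
    text, sep, tag = lexical.rpartition("@")
    if not sep or not tag:
        raise ValueError(f"not a language-tagged lexical form: {lexical!r}")
    return (text, tag.lower())


def l2v_b2x(lexical: Tuple[str, str]) -> LangValue:
    """B2x: the lexical space itself contains (text, tag) tuples."""
    text, tag = lexical
    if not tag:
        raise ValueError("a language tag is required")
    return (text, tag.lower())


# Both routes denote the same value for "foo"@en.
assert l2v_b1("foo@en") == l2v_b2x(("foo", "en")) == ("foo", "en")
# An '@' inside the text is handled by splitting at the last one.
assert l2v_b1("user@example.org@en") == ("user@example.org", "en")
```

Under B2y there would be no such function at all for rdf:LangString, 
which is why its semantics would need a separate definition.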

Whichever path we choose, we could also introduce language-specific 
datatypes.

Personally, I have no problem with B1: it's straightforward, has the 
advantages of rdf:PlainLiteral without the awkward "@" for 
non-lang-tagged strings (since it does not deal with non-lang-tagged 
strings at all), and has a better name.
After that, I prefer B2x over B2y, but could live with either solution.
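
The awkward "@" mentioned above can be seen by putting the two lexical 
spaces side by side. This is a rough Python sketch under assumptions: 
the function names are hypothetical, the tag lowercasing is a choice 
made for the example, and only the trailing-"@" convention of 
rdf:PlainLiteral is meant to be faithful.

```python
from typing import Tuple, Union


def plain_literal_l2v(lexical: str) -> Union[str, Tuple[str, str]]:
    """rdf:PlainLiteral style: 'foo@' is the plain string, 'foo@en' is tagged."""
    text, sep, tag = lexical.rpartition("@")
    if not sep:
        raise ValueError("an '@' is always required in the lexical form")
    # An empty tag means an untagged string, hence the trailing '@'.
    return text if tag == "" else (text, tag.lower())


def lang_string_l2v(lexical: str) -> Tuple[str, str]:
    """B1-style rdf:LangString: only tagged forms are in the lexical space."""
    text, sep, tag = lexical.rpartition("@")
    if not sep or not tag:
        raise ValueError("not in the lexical space of rdf:LangString")
    return (text, tag.lower())


# rdf:PlainLiteral needs 'foo@' to express the untagged string "foo".
assert plain_literal_l2v("foo@") == "foo"
# Both agree on tagged strings.
assert plain_literal_l2v("foo@en") == lang_string_l2v("foo@en") == ("foo", "en")
```

With B1, "foo@" is simply not a lexical form of rdf:LangString, so the 
untagged case never has to be encoded.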


Hope this helps,
-- 
Antoine Zimmermann
Researcher at:
Laboratoire d'InfoRmatique en Image et Systèmes d'information
Database Group
7 Avenue Jean Capelle
69621 Villeurbanne Cedex
France
Tel: +33(0)4 72 43 61 74 - Fax: +33(0)4 72 43 87 13
Lecturer at:
Institut National des Sciences Appliquées de Lyon
20 Avenue Albert Einstein
69621 Villeurbanne Cedex
France
antoine.zimmermann@insa-lyon.fr
http://zimmer.aprilfoolsreview.com/
Received on Friday, 26 August 2011 15:23:44 UTC