Re: language-tagged literal datatypes from Pierre-Antoine Champin on 2011-08-22 (public-rdf-wg@w3.org from August 2011)

From: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
Date: Mon, 22 Aug 2011 10:05:45 +0200
To: Pat Hayes <phayes@ihmc.us>
CC: "public-rdf-wg@w3.org Group WG" <public-rdf-wg@w3.org>
Message-ID: <4E520DD9.7050003@liris.cnrs.fr>
Thanks Pat for this very nice synthesis,

may I propose a variant of Option 3, whose main advantage is that it
keeps the datatype model intact, but which seems to bug most people
because of the infinite number of URIs it generates.

Option 3a. All literals have a type.  Each language tag defines a
datatype which is unique to that tag, and whose L2V mapping takes a
string and produces a language-tagged string tagged with that
particular tag. These datatypes are conventional but are anonymous:
they do not have any standard URI. These would all be subclasses of
rdf:LangString. DATATYPE("foo"@en) returns rdf:LangString, as it is
the most specific URI for describing its actual datatype;
LANG("foo"@en) returns "en".
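
To make Option 3a concrete, here is a minimal Python sketch of how it
could behave. It is an illustration under stated assumptions, not a
proposed implementation: the names LangTaggedString,
AnonymousLangDatatype, Literal, lang_literal, datatype and lang are all
hypothetical, and the value of a language-tagged string is modelled
simply as a (string, tag) pair.

```python
from dataclasses import dataclass

# The only named URI involved; the per-tag datatypes themselves stay anonymous.
RDF_LANGSTRING = "rdf:LangString"


@dataclass(frozen=True)
class LangTaggedString:
    """Assumed value-space element: a string paired with a language tag."""
    text: str
    tag: str


@dataclass(frozen=True)
class AnonymousLangDatatype:
    """Hypothetical per-tag datatype that has no URI of its own."""
    tag: str

    def l2v(self, lexical: str) -> LangTaggedString:
        # L2V mapping: takes a plain string, returns it tagged with this tag.
        return LangTaggedString(lexical, self.tag)

    def most_specific_uri(self) -> str:
        # No URI names this datatype, so the closest named class is used.
        return RDF_LANGSTRING


@dataclass(frozen=True)
class Literal:
    lexical: str
    dtype: AnonymousLangDatatype


def lang_literal(lexical: str, tag: str) -> Literal:
    """Build the literal written "lexical"@tag."""
    return Literal(lexical, AnonymousLangDatatype(tag))


def datatype(lit: Literal) -> str:
    """Stand-in for SPARQL DATATYPE() under Option 3a."""
    return lit.dtype.most_specific_uri()


def lang(lit: Literal) -> str:
    """Stand-in for SPARQL LANG()."""
    return lit.dtype.tag


foo_en = lang_literal("foo", "en")
print(datatype(foo_en))  # rdf:LangString
print(lang(foo_en))      # en
```

In this sketch "foo"@en and "foo"@fr get distinct datatypes, yet
datatype() reports rdf:LangString for both, so no per-tag URI ever has
to be minted.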

Note that I removed the part stating that "rdf:LangString is not itself
a datatype". If we keep it, we have the oddity of DATATYPE() returning a
non-datatype. But can't we arrange to keep it as a datatype anyway? I
must say I am not sure what DT1 subclassof DT2 implies if DT2 is a
datatype...

  pa


On 08/19/2011 01:11 AM, Pat Hayes wrote:
> As promised (http://www.w3.org/2011/rdf-wg/track/actions/76) a summary of various options for how to handle language-tagged literals. This builds on and uses the terminology of [1].
> 
> Option 1 (minimalist). Language-tagged literals have no datatype and hence are distinct from all other literals, which are typed.  rdf:LangString is a class name but not a datatype. DATATYPE("foo"@en) returns an error message.
> 
> Option 1a. Just as option 1, except that DATATYPE("foo"@en) returns rdf:LangString, even though it is not called a datatype and does not have a defined L2V mapping. 
> 
> Option 2. All literals have a type. rdf:LangString is a special datatype whose L2V mapping takes a pair of strings as input and returns a language-tagged pair as output. This mapping is the identity mapping on pairs <string, tag>, just as xsd:string is the identity mapping on single strings. DATATYPE("foo"@en) returns rdf:LangString, following the normal rules for datatyping.
> 
> Option 3. All literals have a type.  Each language tag defines a datatype which is unique to that tag, and whose L2V mapping takes a string and produces a language-tagged string tagged with that particular tag. These datatypes are conventional but we would need to invent some kind of naming convention for them, perhaps rdf:LangString/en, rdf:LangString/fr, etc. These would all be subclasses of rdf:LangString, which would not itself be a datatype. DATATYPE("foo"@en) returns rdf:LangString/en, following the normal rules for datatyping.
> 
> The pros and cons of these, as far as I can see them:
> 
> option 1: + minimal change -- does not resolve the muddle -- causes needless SPARQL errors
> option 1a: + almost-minimal change + removes SPARQL errors -- introduces a confusing exception with no rationale
> option 2: + simplifies literal syntax + removes SPARQL errors + theoretically clean -- requires change to the datatyping model
> option 3: + simplifies literal syntax + removes SPARQL errors + gives access to tag information + theoretically clean -- requires an 'open extendable' rdf vocabulary. 
> 
> ----------
> 
> A few other thoughts which might be worth taking into consideration. 
> 
> = The semantic change needed for option 2 really is semantically trivial, and it might have other uses. If we say that the L2V mapping takes as input all the syntactic 'components' of a literal, rather than forcing these to be all inside one string, then we allow such things as literals with latitude and longitude denoting positions, complex numbers with real and imaginary parts, etc., without forcing people to invent coding tricks (like the trailing '^' in rdf:PlainLiteral) to artificially map these into a single string. This might be a genuinely useful extension, in other words. We can also quietly deprecate rdf:PlainLiteral along with 8-track tape players.
> 
> = If a SPARQL querier wants to determine the actual language tag in use, option 2 requires them to look inside the returned value, while option 3 requires looking inside the datatype URI, and can be determined from a DATATYPE query. I have no idea which of these is hardest to handle, but it might be worth thinking about the difference if it matters to anyone. 
> 
> Pat
> 
> PS. FWIW, I vote for either 2 or 3, and against 1 or 1a. I prefer 2., for the reason mentioned above, and because it seems to me to be the most elegant solution. 
> 
> [1] http://lists.w3.org/Archives/Public/public-rdf-wg/2011Jul/0048.html
> 
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973   
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Monday, 22 August 2011 08:06:37 UTC