Re: varieties of datatyped tagged literals from Pierre-Antoine Champin on 2011-09-07 (public-rdf-wg@w3.org from September 2011)

From: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
Date: Wed, 07 Sep 2011 18:42:41 +0200
To: Pat Hayes <phayes@ihmc.us>
CC: RDF Working Group WG <public-rdf-wg@w3.org>, Ivan Herman <ivan@w3.org>
Message-ID: <4E679F01.1080504@liris.cnrs.fr>
Following today's discussion, let me rephrase the rationale of each
"family" of solutions:

1. Don't change anything: literals will have *either* a datatype or a
language tag.

In the following options, we unify literals by ensuring that every
literal has a datatype.

2. The language tag is still "outside" the (lexical/value) mechanism of
the datatype; the various sub-options differ in how this
extra information is introduced in the system.

In the following options, we unify literals even more by making
language-tagged literals a special case of datatyped literal.

3. The language tag is carried by the datatype (one datatype per tag).

4. The language tag is attached to the lexical form.
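
To make option 4 concrete, here is a minimal sketch (in Python) of what
the lexical-to-value mapping could look like. The function name is
hypothetical; the only rules taken from the summary quoted below are that
the tag is non-empty and contains no '@'. Splitting on the last '@' is an
assumption that follows from those rules, since it lets the string part
still contain '@'.

```python
from typing import Tuple


def parse_tagged_lexical_form(lexical: str) -> Tuple[str, str]:
    """Map a full lexical form such as 'chat@fr' to the pair (string, tag).

    Hypothetical helper, for illustration only.
    """
    # Split on the LAST '@': the tag cannot contain '@', the string can.
    text, sep, tag = lexical.rpartition("@")
    if not sep:
        raise ValueError("no '@' separator in lexical form: %r" % lexical)
    if not tag:
        raise ValueError("language tag must be non-empty: %r" % lexical)
    return text, tag


print(parse_tagged_lexical_form("chat@fr"))        # ('chat', 'fr')
print(parse_tagged_lexical_form("pat@ihmc@en"))    # ('pat@ihmc', 'en')
```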


I agree with Pat: the longer I think about it, the better 4 looks after all.

I know that the pain is in the ugly lexical form "chat@fr", but I would
expect the following arrangements to make that bearable:

* SPARQL would have to define a special case for the str() function, so
that it does not return the *full* lexical form (e.g. "chat@fr") but the
*stripped* one (e.g. "chat").

* APIs could arrange similarly. Jena, for example, could return "chat"
for Literal.toString(), but "chat@fr" for Literal.getLexicalForm(),
though I suspect this may cause some backward compatibility problems.

Another option would be to let Literal.getLexicalForm() return "chat" as
before (documenting the fact that, in that case, this is not the "real"
lexical form) and introduce a new method Literal.getFullLexicalForm()
that returns "chat@fr", for the sake of completeness.
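
The second API arrangement, together with the special-cased str(), can be
sketched as follows. This is only an illustration in Python: the class
and function names are hypothetical and are not Jena's or SPARQL's actual
API.

```python
class TaggedLiteral:
    """Hypothetical literal whose 'real' lexical form includes the tag."""

    def __init__(self, text: str, tag: str) -> None:
        if not tag or "@" in tag:
            raise ValueError("tag must be non-empty and must not contain '@'")
        self.text = text
        self.tag = tag

    def get_lexical_form(self) -> str:
        # Backward-compatible accessor: the stripped form, e.g. "chat".
        return self.text

    def get_full_lexical_form(self) -> str:
        # The full lexical form under option 4, e.g. "chat@fr".
        return f"{self.text}@{self.tag}"


def sparql_str(literal: TaggedLiteral) -> str:
    # Special-cased str(): return the stripped form, not the full one.
    return literal.get_lexical_form()


chat = TaggedLiteral("chat", "fr")
print(chat.get_lexical_form())       # chat
print(chat.get_full_lexical_form())  # chat@fr
print(sparql_str(chat))              # chat
```

Existing callers of the old accessor keep seeing "chat"; only code that
explicitly asks for the full form sees "chat@fr".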

But those are minor pains compared to the implications of any other
solution, I think.

  pa

PS: of course, the WG will not tell the API implementors what to do, but
it should probably provide guidelines about how to handle the changes in
RDF 1.1.


On 09/07/2011 06:10 AM, Pat Hayes wrote:
> OK, sorry this is late, but here is my best attempt to summarize the various options for how to handle datatyping of tagged literals. I have tried to be objective and up to date, but feel free to correct any mistakes y'all might still find here. Thanks to Pierre-Antoine and Richard for recent corrections. 
> 
> Throughout, I will illustrate with the literal "foo"@tag. In some cases it is necessary to distinguish this surface syntax from the abstract "real" syntax form. As SPARQL refers to the 'lexical form' of a literal, which has to be a string, to be returned by STR(), I will list what this is in each case. 
> 
> In all cases, the value is the pair <"foo", tag>.
> 
> 1. Current state: tagged literals have no type.   
> 
> 2. Lexical form is "foo", datatype is rdf:TaggedLiteral. There are various ways to "fix" the spec to make this possible:
> 
> 2a. Abstract syntax is a pair  <"foo", str>, and we modify the RDF datatype definitions to allow an L2V mapping from pairs to pairs. (Pain: major change to specs, possible clash with OWL and XSD specs.) 
> 2b. There is no L2V mapping, and this datatype is anomalous but specified by the RDF semantics directly, and is a datatype by fiat. (Pain: this datatype is anomalous and must not be used with the ^^ syntax.) 
> 2c. The abstract syntax has no lexical form, the datatype is empty and the L2V is the empty mapping. Nevertheless, the value is linked to the present syntax by the RDF semantics directly and this is a datatype by fiat. (Pain: overly elaborate; the idea of an empty datatype is confusing, and having an L2V map which does not specify the actual value is even more confusing :-).) (Positive: the illegality of literals of the form "string"^^rdf:TaggedLiteral falls out automatically.) 
> 
> 3. Lexical form is "foo", datatype is unique to the tag, i.e. there is one datatype per tag. These are conventional datatypes with a well-defined L2V mapping. Again there are several (well, two) options based on this idea.
> 
> 3a. We invent an IRI naming convention for these datatypes, eg rdf:taggedLiteral/tag. Then this is the type of the literal. (Pain: inventing this open-ended naming convention.) 
> 3b. These per-tag datatypes are all anonymous and have no IRI, but are sub-datatypes of rdf:TaggedLiteral, which is returned as the type for them all. (Pain: overly elaborate; potentially confusing; need to define a new notion of sub-datatype.) 
> 
> 4. Lexical form is "foo@tag", where tag is required to be nonempty and not contain '@' (just as in the rdf:PlainLiteral spec). This is a conventional datatype (it is rdf:PlainLiteral restricted to nonempty tags) with a conventional L2V mapping. (Pain: might be considered to be the wrong lexical form (??)) (Positive: conforms closely to existing specs; simple; extra tag information might be useful?)
> 
> ------
> 
> On balance, my own vote is for either 2b or 4, and the longer I think about it, the better 4 looks after all. If we choose one of the 2 family, I would plead editorial discretion to be allowed to choose among them depending on which one fits best with the semantics, when we get down to details. They differ only in theoretical issues. Well, OK, I give up on 2a.
> 
> Pat
> 
> ------------------------------------------------------------
> IHMC                                     (850)434 8903 or (650)494 3973   
> 40 South Alcaniz St.           (850)202 4416   office
> Pensacola                            (850)202 4440   fax
> FL 32502                              (850)291 0667   mobile
> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Wednesday, 7 September 2011 16:43:17 UTC