varieties of datatyped tagged literals from Pat Hayes on 2011-09-07 (public-rdf-wg@w3.org from September 2011)

From: Pat Hayes <phayes@ihmc.us>
Date: Tue, 6 Sep 2011 23:10:36 -0500
To: RDF Working Group WG <public-rdf-wg@w3.org>
Cc: Ivan Herman <ivan@w3.org>
Message-Id: <A3254E3B-72F4-41C1-8A2A-27DF0EC28E2B@ihmc.us>

OK, sorry this is late, but here is my best attempt to summarize the various options for how to handle datatyping of tagged literals. I have tried to be objective and up to date, but feel free to correct any mistakes y'all might still find here. Thanks to Pierre-Antoine and Richard for recent corrections.

Throughout, I will illustrate with the literal "foo"@tag. In some cases it is necessary to distinguish this surface syntax from the abstract "real" syntax form. Since SPARQL's STR() returns the 'lexical form' of a literal, which has to be a string, I will list what the lexical form is in each case.

In all cases, the value is the pair <"foo", tag>.

1. Current state: tagged literals have no type.

2. Lexical form is "foo", datatype is rdf:TaggedLiteral. There are various ways to "fix" the spec to make this possible:

2a. Abstract syntax is a pair <"foo", str>, and we modify the RDF datatype definitions to allow an L2V mapping from pairs to pairs. (Pain: major change to specs, possible clash with OWL and XSD specs.)
2b. There is no L2V mapping, and this datatype is anomalous but specified by the RDF semantics directly, and is a datatype by fiat. (Pain: this datatype is anomalous and must not be used with the ^^ syntax.)
2c. The abstract syntax has no lexical form, the datatype is empty and the L2V is the empty mapping. Nevertheless, the value is linked to the present syntax by the RDF semantics directly, and this is a datatype by fiat. (Pain: overly elaborate; the idea of an empty datatype is confusing, and having an L2V map which does not specify the actual value is even more confusing :-).) (Positive: the illegality of literals of the form "string"^^rdf:TaggedLiteral falls out automatically.)
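A minimal Python sketch of where the value comes from in each of the 2 variants. The full IRI used for rdf:TaggedLiteral and all the function names are hypothetical illustrations, not taken from any spec; "no value under L2V" is modelled as None.

```python
from typing import Dict, Optional, Tuple

# Hypothetical full IRI for the proposed datatype; no current spec defines it.
RDF_TAGGED = "http://www.w3.org/1999/02/22-rdf-syntax-ns#TaggedLiteral"

Pair = Tuple[str, str]  # <string, tag>

# 2a: the abstract syntax is itself a pair, and L2V maps pairs to pairs.
def l2v_2a(lexical_pair: Pair) -> Pair:
    return lexical_pair  # identity: the value is the pair itself

# 2b: there is no L2V mapping at all, so the ^^ form must be rejected outright.
def check_typed_2b(datatype_iri: str) -> None:
    if datatype_iri == RDF_TAGGED:
        raise ValueError("rdf:TaggedLiteral must not be used with ^^")

# 2c: the lexical space is empty, so L2V is the empty mapping and any
# "string"^^rdf:TaggedLiteral gets no value without needing a special rule.
L2V_2C: Dict[str, Pair] = {}

def value_typed_2c(lexical_form: str) -> Optional[Pair]:
    return L2V_2C.get(lexical_form)  # always None

# 2b and 2c: the semantics assigns "foo"@tag its value directly, by fiat.
def value_tagged(text: str, tag: str) -> Pair:
    return (text, tag)

print(l2v_2a(("foo", "tag")))      # ('foo', 'tag')
print(value_typed_2c("foo"))       # None
print(value_tagged("foo", "tag"))  # ('foo', 'tag')
```

The sketch makes the 2c "positive" visible: nothing special is needed to reject "foo"^^rdf:TaggedLiteral, whereas 2b needs an explicit rule such as check_typed_2b.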

3. Lexical form is "foo", datatype is unique to the tag, i.e. there is one datatype per tag. These are conventional datatypes with a well-defined L2V mapping. Again there are several (well, two) options based on this idea.

3a. We invent an IRI naming convention for these datatypes, e.g. rdf:taggedLiteral/tag. Then this is the type of the literal. (Pain: inventing this open-ended naming convention.)
3b. These per-tag datatypes are all anonymous and have no IRI, but are sub-datatypes of rdf:TaggedLiteral, which is returned as the type for them all. (Pain: overly elaborate; potentially confusing; need to define a new notion of sub-datatype.)
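For comparison, a sketch of the 3 idea, assuming a hypothetical PerTagDatatype class: each tag gets its own datatype whose L2V mapping pairs any string with that tag. The IRI built by iri_3a just expands the example convention rdf:taggedLiteral/tag and is not a real vocabulary term.

```python
from dataclasses import dataclass
from typing import Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

@dataclass(frozen=True)
class PerTagDatatype:
    """One datatype per tag, with a conventional L2V mapping (option 3)."""
    tag: str

    def l2v(self, lexical_form: str) -> Tuple[str, str]:
        # Every string is a legal lexical form; its value pairs it with the tag.
        return (lexical_form, self.tag)

    def iri_3a(self) -> str:
        # 3a: hypothetical naming convention, one IRI per tag.
        return RDF_NS + "taggedLiteral/" + self.tag

    def reported_type_3b(self) -> str:
        # 3b: the per-tag datatype is anonymous; the supertype is reported.
        return RDF_NS + "TaggedLiteral"

dt = PerTagDatatype("tag")
print(dt.l2v("foo"))          # ('foo', 'tag')
print(dt.iri_3a())            # http://www.w3.org/1999/02/22-rdf-syntax-ns#taggedLiteral/tag
print(dt.reported_type_3b())  # http://www.w3.org/1999/02/22-rdf-syntax-ns#TaggedLiteral
```

Under 3b, PerTagDatatype("en") and PerTagDatatype("fr") are still distinct datatypes but report the same type, which is where the new notion of sub-datatype would have to come in.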

4. Lexical form is "foo@tag", where tag is required to be nonempty and not contain '@' (just as in the rdf:PlainLiteral spec). This is a conventional datatype (it is rdf:PlainLiteral restricted to nonempty tags) with a conventional L2V mapping. (Pain: might be considered to be the wrong lexical form (??)) (Positive: conforms closely to existing specs; simple; extra tag information might be useful?)
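A sketch of the 4 L2V mapping, assuming the rdf:PlainLiteral convention that the tag is everything after the last '@', so the string part may itself contain '@'. The function names are hypothetical, and no check is made that the tag is a well-formed language tag.

```python
from typing import Tuple

def l2v_option4(lexical_form: str) -> Tuple[str, str]:
    """Sketch of L2V for option 4: "foo@tag" -> ("foo", "tag").

    The tag may not contain '@', so the split is at the LAST '@'.
    Empty tags are excluded, which is what restricts rdf:PlainLiteral
    to the tagged case.
    """
    string, sep, tag = lexical_form.rpartition("@")
    if not sep or not tag:
        raise ValueError(f"not in the lexical space: {lexical_form!r}")
    return (string, tag)

def lexical_form_option4(string: str, tag: str) -> str:
    """Inverse direction: the lexical form STR() would return."""
    if not tag or "@" in tag:
        raise ValueError("tag must be nonempty and must not contain '@'")
    return f"{string}@{tag}"

print(l2v_option4("foo@tag"))              # ('foo', 'tag')
print(l2v_option4("a@b@tag"))              # ('a@b', 'tag')
print(lexical_form_option4("foo", "tag"))  # foo@tag
```

Because the two functions invert each other on legal inputs, this is an ordinary datatype: every lexical form has exactly one value, and "foo" or "foo@" are simply outside the lexical space.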

------

On balance, my own vote is for either 2b or 4, and the longer I think about it, the better 4 looks after all. If we choose one of the 2 family, I would plead editorial discretion to be allowed to choose among them depending on which one fits best with the semantics, when we get down to details. They differ only in theoretical issues. Well, OK, I give up on 2a.

Pat

------------------------------------------------------------
IHMC (850)434 8903 or (650)494 3973
40 South Alcaniz St. (850)202 4416 office
Pensacola (850)202 4440 fax
FL 32502 (850)291 0667 mobile
phayesAT-SIGNihmc.us http://www.ihmc.us/users/phayes

Received on Wednesday, 7 September 2011 04:12:13 UTC