Datatyped tagged literals: a case for option 4 vs option 2d


I'd like to discuss here 2 options for lang tagged literals, viz., 
option 2d (Richard's proposal) and option 4 (one datatype with non-empty 
lexical space).

Richard's solution consists in adding a datatype with an empty lexical 
space, and defining the abstract syntax and semantics of these literals 
in an ad hoc fashion.
I don't have a problem with an empty lexical space as such, but I do 
have issues with the proposal: it does not make things more uniform 
than before, except that tagged literals would now have a datatype.

  - First, syntactically, tagged literals are an exception to the 
standard typed literals, since it would be inconsistent to write 
anything of the form "xxx"^^rdf:LangString.
  - Second, semantically, tagged literals are an exception too, since 
standard typed literals are normally interpreted according to the L2V 
mapping, which is empty in this case.
  - Third, in SPARQL, the DATATYPE keyword would have to be redefined 
with an exception, since currently SPARQL says nothing about typed 
literals which cannot be written in the form "xxx"^^dt. I don't like it 
when the RDF working group imposes a change on another WG's 
specification, especially at such a late stage.

In contrast, I find that making rdf:LangString a "normal" datatype with 
a non-empty lexical space makes everything more uniform (option 4).

  - Syntactically, xxx@lll would simply be a shortcut for the abstract 
syntax "xxx@lll"^^rdf:LangString. The only exceptional feature would be 
that we recommend the concrete syntax xxx@lll, but we already made such 
an exception for "xxx"^^xsd:string, which we recommend to write "xxx".
  - Semantically, tagged literals would be interpreted as standard typed 
literals through the L2V mapping.
  - In terms of SPARQL specs, DATATYPE(xxx@lll) would be rdf:LangString, 
as required already by SPARQL without any change (since it is in fact 
the same as DATATYPE("xxx@lll"^^rdf:LangString)).
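
To make the option 4 reading concrete, here is a minimal Python sketch of 
what an L2V mapping for rdf:LangString could look like. The function name 
and the rule "the tag is whatever follows the last '@'" are assumptions 
made for illustration; no proposal text fixes them.

```python
from typing import Tuple


def lang_string_l2v(lexical_form: str) -> Tuple[str, str]:
    """Hypothetical L2V mapping for rdf:LangString under option 4.

    Maps a lexical form "xxx@lll" to the pair (string, language tag).
    Assumption: the tag is the part after the LAST '@', so the string
    part may itself contain '@'.
    """
    text, sep, tag = lexical_form.rpartition("@")
    if not sep or not tag:
        # No '@' at all, or nothing after it: outside the lexical space.
        raise ValueError(f"not in the lexical space: {lexical_form!r}")
    # Language tags are case-insensitive; lower-casing is one possible
    # normalisation, chosen here only for the sketch.
    return text, tag.lower()
```

With such a mapping, "chat@fr"^^rdf:LangString is interpreted like any 
other typed literal: look up the datatype, apply its L2V mapping to the 
lexical form.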

Now, responding to a concern raised by Andy, who said that no matter the 
option chosen, tagged literals are made "special" in some way [1]: I do 
not think so. Option 4 makes lang tags part of the lexical form, such 
that language must be accessed by parsing the literal. That's how 
information from a literal should always be accessed. For instance, time 
zone, year, hour, date in xsd:dateTimeStamp are obtained by parsing the 
lexical form. Same for the exponent in xsd:float, same for any component 
of any typed literal. Why should it be different for lang tagged strings?
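
The analogy can be sketched in a few lines of Python. Both helper names 
are hypothetical, and datetime.fromisoformat only approximates the 
xsd:dateTimeStamp lexical space, so treat this as an illustration rather 
than a conforming parser.

```python
from datetime import datetime, timedelta


def timezone_offset(lexical_form: str) -> timedelta:
    """Get the time zone offset by parsing a dateTimeStamp-like form."""
    parsed = datetime.fromisoformat(lexical_form)
    offset = parsed.utcoffset()
    if offset is None:
        # xsd:dateTimeStamp requires an explicit time zone.
        raise ValueError(f"no time zone in {lexical_form!r}")
    return offset


def language_of(lexical_form: str) -> str:
    """Get the language tag by parsing an option 4 form "xxx@lll"."""
    _, sep, tag = lexical_form.rpartition("@")
    if not sep or not tag:
        raise ValueError(f"no language tag in {lexical_form!r}")
    return tag
```

In both cases the component is recovered from the lexical form alone, 
with no extra slot in the abstract syntax.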

In terms of pure specs, I think option 4 is more elegant and simpler. 
However, I understand that there are practical issues with option 4: in 
SPARQL, STR(xxx@lll) would return "xxx@lll" instead of "xxx", unless 
an exception is added to SPARQL. I would not mind this behaviour, but I 
understand that it can be unpleasant to some people. Other problems 
exist wrt APIs, and this may have consequences for existing 
implementations. I am not sure how much trouble this would cause.
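
The STR issue can be illustrated with a small Python sketch. The Literal 
type, the two function names and the plain-string datatype names are all 
hypothetical; they model the two possible behaviours, not any SPARQL 
implementation.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    lexical_form: str
    datatype: str  # prefixed name kept as a plain string for the sketch


def str_uniform(lit: Literal) -> str:
    """STR with no special case: always return the lexical form."""
    return lit.lexical_form


def str_with_exception(lit: Literal) -> str:
    """STR with an exception that strips the tag from rdf:LangString."""
    if lit.datatype == "rdf:LangString":
        text, sep, _ = lit.lexical_form.rpartition("@")
        if sep:
            return text
    return lit.lexical_form


chat = Literal("chat@fr", "rdf:LangString")
print(str_uniform(chat))         # the full lexical form, tag included
print(str_with_exception(chat))  # the string part only
```

The first function is what option 4 gives without touching SPARQL; the 
second is the exception that would be needed to preserve the current 
result of STR.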

All in all, I would not be against Richard's "minimal" proposal since it 
does not imply dramatic changes and could be integrated quite smoothly, 
but I still have a strong preference for option 4.

Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
Tél:+33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66

Received on Monday, 26 September 2011 08:50:59 UTC