Re: Datatyped tagged literals: a case for option 4 vs option 2d

Thanks Andy and Richard for your answers. You're providing arguments 
against option 4 that, I think, were not presented before. That's 
certainly of interest to all those who voted in favour of option 4.

On 26/09/2011 12:18, Andy Seaborne wrote:
> On 26/09/11 09:50, Antoine Zimmermann wrote:
>> All,
>> I'd like to discuss here 2 options for lang tagged literals, viz.,
>> option 2d (Richard's proposal) and option 4 (one datatype with non-empty
>> lexical space).
>> Richard's solution consists of adding a datatype with an empty lexical
>> space, and defining the abstract syntax and semantics of these literals
>> in an ad hoc fashion.
>> I don't have a problem with having an empty lexical space, but I see
>> issues with that proposal because it does not make things any more
>> uniform than before, except that tagged literals would now have a
>> datatype.
>> - First, syntactically, tagged literals are an exception to the standard
>> typed literals, since it would be inconsistent to write anything of the
>> form "xxx"^^rdf:LangString.
>> - Second, semantically, tagged literals are an exception too, since
>> typed literals are normally interpreted according to the L2V
>> mapping, which is empty in this case.
>> - Third, in SPARQL, the DATATYPE keyword would have to be redefined with
>> an exception, since currently SPARQL says nothing about typed literals
>> that cannot be written in the form "xxx"^^dt. I particularly dislike it
>> when the RDF working group imposes a change on another WG's
>> specification, especially at such a late stage.
> In SPARQL, under option 2d,
> DATATYPE("xyz"@en) would be rdf:LangString
> because RDF says so. There is no SPARQL exception.

Ok, sorry, I misread or misinterpreted the spec. I thought SPARQL was 
referring to typed literals as syntactic pairs <string, datatype> but in 
fact it's compatible with Richard's formulation.
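A minimal Python sketch of this reading of option 2d, assuming that a tagged literal keeps its lexical form and tag as separate components and that its datatype is simply defined to be rdf:LangString. STR, LANG and DATATYPE then stay plain accessors over the abstract syntax. The class and function names, and the full IRI used for rdf:LangString, are assumptions for illustration only, not taken from any spec or implementation.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
# Assumed IRI for the datatype discussed in this thread (name not final).
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#LangString"


@dataclass(frozen=True)
class Literal:
    """Abstract-syntax literal under option 2d (sketch)."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None

    def __post_init__(self) -> None:
        if self.lang:
            if self.datatype not in (None, RDF_LANGSTRING):
                raise ValueError("a tagged literal cannot have another datatype")
            # The datatype of a tagged literal is implied, never written.
            object.__setattr__(self, "datatype", RDF_LANGSTRING)
        elif self.datatype == RDF_LANGSTRING:
            # Empty lexical space: "xxx"^^rdf:LangString has no meaning.
            raise ValueError("rdf:LangString requires a language tag")
        elif self.datatype is None:
            # Untagged, untyped literals are read as xsd:string.
            object.__setattr__(self, "datatype", XSD_STRING)


# Accessors mirroring SPARQL's STR, LANG and DATATYPE.
def sparql_str(lit: Literal) -> str:
    return lit.lexical


def sparql_lang(lit: Literal) -> str:
    return lit.lang or ""


def sparql_datatype(lit: Literal) -> str:
    return lit.datatype


# Constructors mirroring SPARQL 1.1's STRLANG and STRDT.
def strlang(lexical: str, tag: str) -> Literal:
    return Literal(lexical, lang=tag)


def strdt(lexical: str, datatype: str) -> Literal:
    return Literal(lexical, datatype=datatype)
```

Under this reading the accessors themselves do not change; the only observable difference is that DATATYPE on a tagged literal returns rdf:LangString instead of raising an error.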

>> In contrast, I find that making rdf:LangString a "normal" datatype with
>> a non-empty lexical space makes everything more uniform (option 4).
>> - Syntactically, xxx@lll would simply be a shortcut for the abstract
>> syntax "xxx@lll"^^rdf:LangString. The only exceptional feature would be
>> that we recommend the concrete syntax xxx@lll, but we already made such
>> an exception for "xxx"^^xsd:string, which we recommend writing as "xxx".
>> - Semantically, tagged literals would be interpreted as standard typed
>> literals through the L2V mapping.
>> - In terms of SPARQL specs, DATATYPE(xxx@lll) would be rdf:LangString,
>> as required already by SPARQL without any change (since it is in fact
>> the same as DATATYPE("xxx@lll"^^rdf:LangString)).
> This is not accurate:
> 1/ Currently (SPARQL 1.0, SPARQL 1.1 LC) DATATYPE("xxx"@lll) is an error
> so there is a change.

I meant: the spec would not have to change, but of course the answer 
would (as it would with option 2d as well).

> 2/ DATATYPE("xxx"@lll) would be rdf:LangString simply because RDF says
> it is, whichever of option 2d or option 4 is chosen, just as
> DATATYPE(123) is xsd:integer because the abstract model says so.
>> Now, responding to a concern raised by Andy, who said that whichever
>> option is chosen, tagged literals are made "special" in some way [1]: I
>> do not think so. Option 4 makes lang tags part of the lexical form, so
>> that the language must be accessed by parsing the literal. That's how
>> information from a literal should always be accessed. For instance, the
>> time zone, year, hour and date in xsd:dateTimeStamp are obtained by
>> parsing the lexical form. The same goes for the exponent in xsd:float,
>> and for any component of any typed literal. Why should it be different
>> for lang tagged strings?
>>
>> In terms of pure specs, I think option 4 is much more elegant and simple.
>> However, I understand that there are practical issues with option 4: in
>> SPARQL, STR(xxx@lll) would return "xxx@lll" instead of "xxx", unless
>> an exception is added to SPARQL.
> So there is a special exception in SPARQL for option 4. This makes it
> very disruptive.
> And how would you get the lexical form of a literal in SPARQL if you did
> want it?

With a function of the same kind as the functions that get the year of a 
date. It could even be called "str", so as to guarantee backward 
compatibility. There could be functions specific to rdf:LangString, just 
as there are functions specific to xsd:dateTime, for instance.
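A minimal Python sketch of what this could look like under option 4, assuming every literal is a plain (lexical form, datatype) pair and a tagged literal has the lexical form "xxx@lll". The text and the tag are recovered by parsing, just as the year is recovered from an xsd:dateTime lexical form. All names (TypedLiteral, from_tagged, langstring_l2v, lang_of, str_of) and the rdf:LangString IRI are hypothetical; the tag check borrows the shape of Turtle's LANGTAG production, and splitting on the last '@' is an assumption that works because a tag cannot contain '@'.

```python
import re
from dataclasses import dataclass
from typing import Tuple

# Assumed IRI for the datatype under discussion (name not final).
RDF_LANGSTRING = "http://www.w3.org/1999/02/22-rdf-syntax-ns#LangString"
# Tag syntax modelled on Turtle's LANGTAG production, without the '@'.
LANGTAG = re.compile(r"[a-zA-Z]+(-[a-zA-Z0-9]+)*")


@dataclass(frozen=True)
class TypedLiteral:
    """Under option 4, every literal is just (lexical form, datatype)."""
    lexical: str
    datatype: str


def from_tagged(text: str, tag: str) -> TypedLiteral:
    """Read the concrete shortcut "text"@tag as "text@tag"^^rdf:LangString."""
    return TypedLiteral(f"{text}@{tag}", RDF_LANGSTRING)


def langstring_l2v(lexical: str) -> Tuple[str, str]:
    """Hypothetical L2V mapping: "xxx@lll" -> ("xxx", "lll").

    Splits on the last '@', so the text part may itself contain '@'.
    """
    text, sep, tag = lexical.rpartition("@")
    if not sep or not LANGTAG.fullmatch(tag):
        raise ValueError(f"not in the lexical space of rdf:LangString: {lexical!r}")
    return text, tag


def lang_of(lit: TypedLiteral) -> str:
    """Language tag obtained by parsing, much like YEAR() on a dateTime."""
    if lit.datatype != RDF_LANGSTRING:
        return ""
    return langstring_l2v(lit.lexical)[1]


def str_of(lit: TypedLiteral) -> str:
    """Backward-compatible "str": only the text part of a tagged literal."""
    if lit.datatype == RDF_LANGSTRING:
        return langstring_l2v(lit.lexical)[0]
    return lit.lexical
```

Here str_of keeps today's STR behaviour for tagged literals, which is the backward-compatibility point; a separate accessor would still be needed to expose the full lexical form "xxx@lll", which is exactly the question Andy raises.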

Bah, you're probably right, this is getting too disruptive...

> SPARQL treats literals in the abstract syntax: there are three aspects
> (they don't have to be independent), each with an accessor:
> lexical form : STR
> lang tag : LANG
> datatype : DATATYPE
> SPARQL 1.1 adds constructors for terms: STRDT, STRLANG.
>> I would not mind that, but I
>> understand that it may be unwelcome to some people. Other problems
>> exist with respect to APIs, and this may have consequences for existing
>> implementations. I am not sure to what extent this would cause trouble.
> This would be a huge amount of trouble.
> Are there any RDF systems that would not be affected?

Certainly this would require checking many lines of code, but there 
are RDF systems that never need to look at the lexical form of lang 
tagged strings, especially if you just want to publish RDF. Also, many 
reasoning tasks do not require checking the lexical part of a lang 
tagged literal.

>> All in all, I would not be against Richard's "minimal" proposal since it
>> does not imply dramatic changes and could be integrated quite smoothly,
>> but I still have a strong preference for option 4.
>> [1]
> Andy

Antoine Zimmermann
ISCOD / LSTI - Institut Henri Fayol
École Nationale Supérieure des Mines de Saint-Étienne
158 cours Fauriel
42023 Saint-Étienne Cedex 2
Tél:+33(0)4 77 42 66 03
Fax:+33(0)4 77 42 66 66

Received on Monday, 26 September 2011 12:18:09 UTC