
Re: Simplified proposal for string literals

From: Pat Hayes <phayes@ihmc.us>
Date: Wed, 18 May 2011 09:06:07 -0500
Cc: RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <03521CCE-0D16-4D06-8D79-73DA5F66A695@ihmc.us>
To: Richard Cyganiak <richard@cyganiak.de>

On May 18, 2011, at 7:56 AM, Richard Cyganiak wrote:

> On 17 May 2011, at 23:57, Pat Hayes wrote:
>> As my proposed extension to rdf:PlainLiteral seems to have fallen on deaf ears, allow me to suggest a simplified version of it which might be more acceptable. There are two versions. In the first, plain literals are no longer strings, so the current equivalence between "string" and "string"^^xsd:string no longer applies. The second keeps this equivalence.
>> 
>> Version A
> (snip)
> 
> Version A seems to make matters worse, as "foo" and "foo"^^xsd:string are still both allowed, are still distinct in the abstract syntax, but now actually are semantically different as well for no reason.
> 
>> Version B
>> 
>> 1. rdf:PlainLiteral is a unique special datatype, built into basic RDF (along with rdf:XMLLiteral) with a special, unique formulation. It applies to plain literal syntax, which is thought of as specifying either a character string, or a pair of a string and a language tag. The L2V mapping of this datatype takes both strings and pairs <string, tag> to themselves, i.e. it is the identity mapping on strings and on pairs.
>> Put another way, the datatype value of "string" is string and of "string"@tag is <string, tag>.
>> Every plain literal in RDF has the datatype rdf:PlainLiteral, even though this name is not used explicitly in the literal syntax.
>> 
>> 2. rdf:PlainLiteral MUST NOT be used as an explicit datatype name in any RDF literal of the form "string"^^datatype. Literals of the form "string@tag"^^rdf:PlainLiteral MUST be rewritten as a plain literal "string"@tag or flagged as an error.
>> 
>> 3. "string" and "string"^^xsd:string are equivalent, so to avoid equality reasoning, the datatype xsd:string is deprecated in RDF. RDF SHOULD NOT use xsd:string as a datatype in typed literals, and applications MAY rewrite any literal typed with xsd:string as a plain literal with no language tag.
> 
> Version B seems to be quite similar in its practical implications to the earlier proposal of allowing xsd:string typed literals to also have a language tag in the abstract syntax:
> http://lists.w3.org/Archives/Public/public-rdf-wg/2011May/0175.html

Except that it does not change the current abstract syntax, which I take to be a rather large advantage. 

> 
> Except Version B has two disadvantages over that proposal.
> 
> First, it standardizes on rdf:PlainLiteral, deprecating the well-known xsd:string; while the other proposal standardizes on xsd:string, keeping rdf:PlainLiteral only in its current fringe role as compatibility band-aid for systems that can't handle the presence of language tags. (see footnote 1)

I had formed the impression that there was a consensus against typing plain literals with xsd:string. If this is wrong, then obviously I need to get this clear. Can we decide this issue in isolation, I wonder? There seem to be several communities which take different positions on this rather basic choice. Ignoring language tags for the moment, should a plain, untyped string used as a literal in RDF be considered to have the type xsd:string or the type rdf:PlainLiteral, or to not have a type at all? 

Seems to me that while xsd:string is well-known, several people have suggested that it is not widely used in RDF data (compared to plain literals), and it has the problem that it cannot deal with language tags. I know that some people feel strongly that "chat" in English and "chat" in French are distinct entities, and should be counted as distinct. Reducing them to a single string value destroys this ability.
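
To make the distinction concrete, here is a minimal Python sketch of the identity L2V mapping described in point 1 of Version B. The names `PlainLiteral` and `l2v` are hypothetical, chosen only for illustration; they are not taken from any RDF library or from the proposal text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class PlainLiteral:
    """Hypothetical model of a plain literal: a string plus an optional language tag."""
    lexical: str
    lang: Optional[str] = None


def l2v(lit: PlainLiteral) -> Union[str, Tuple[str, str]]:
    """Sketch of the Version B L2V mapping: identity on strings and on <string, tag> pairs."""
    if lit.lang is None:
        # "string" denotes the string itself
        return lit.lexical
    # "string"@tag denotes the pair <string, tag>
    return (lit.lexical, lit.lang)


chat_en = l2v(PlainLiteral("chat", "en"))
chat_fr = l2v(PlainLiteral("chat", "fr"))
chat_plain = l2v(PlainLiteral("chat"))
```

Under this reading the three values stay distinct: the English and French "chat" are different pairs, and neither equals the bare string. A mapping that kept only the string would collapse all three.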

> 
> Second, it retains "foo" and "foo"^^xsd:string as distinct in the abstract syntax (they become "foo"^^rdf:PlainLiteral and "foo"^^xsd:string).

No, the proposal is not to have "foo"^^rdf:PlainLiteral in the syntax, even in the abstract syntax. In fact, it explicitly prohibits literals typed with 'rdf:PlainLiteral'. The proposal is to keep the syntax of plain literals exactly as it is at present.

> This means there are still two triples, and implementations may or may not rewrite one to the other

The proposal explicitly says that xsd:string literals can be rewritten to plain literals. I would be happy if it said MUST be, in fact. 
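
As a rough illustration of what such rewriting could look like, here is a Python sketch of points 2 and 3 of Version B. The `Literal` tuple and the `normalize` function are hypothetical names, not an existing API. Treating a missing or empty tag inside an rdf:PlainLiteral lexical form as an error is an assumption of this sketch; the proposal only says such literals are rewritten or flagged.

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_PLAIN_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral"


class Literal(NamedTuple):
    """Hypothetical literal record: lexical form, optional datatype IRI, optional language tag."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


def normalize(lit: Literal) -> Literal:
    """Rewrite typed string literals to plain literals, following Version B points 2 and 3."""
    if lit.datatype == XSD_STRING:
        # Point 3: "string"^^xsd:string becomes the plain literal "string"
        return Literal(lit.lexical)
    if lit.datatype == RDF_PLAIN_LITERAL:
        # Point 2: "string@tag"^^rdf:PlainLiteral becomes "string"@tag
        text, sep, tag = lit.lexical.rpartition("@")
        if not sep or not tag:
            # Assumption of this sketch: no usable tag means flag an error
            raise ValueError("rdf:PlainLiteral must not be used as an explicit datatype")
        return Literal(text, None, tag)
    # Plain literals and other typed literals pass through unchanged
    return lit
```

For example, `normalize(Literal("chat@fr", RDF_PLAIN_LITERAL))` yields the plain literal with lexical form "chat" and language tag "fr", so only one form of each string literal survives.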

> , with all the disadvantages that Eric discussed. In the other proposal, both "foo" and "foo"^^xsd:string would become "foo"^^xsd:string in the abstract syntax, and one would only find "foo"^^rdf:PlainLiteral in systems that can't represent "foo"^^xsd:string@en.

But that proposal, as I understand it, requires language tags in literals where language tags make no sense at all. I do not think this will fly, frankly: the user push-back will be overwhelming. 

Pat


> 
> So I'd argue that the proposal linked above, amended according to footnote 1, is preferable over Version B.
> 
> Footnote 1: The proposal linked above suggests to completely remove rdf:PlainLiteral. That doesn't work, it has to be kept as a compatibility band-aid, like it is now.
> 
> Best,
> Richard
> 
> 
>> 
>> --------
>> 
>> Either way, this keeps existing plain literal syntax exactly as it is at present, does not require anyone to rewrite any up-front code, and retains the rdf:PlainLiteral typing without getting involved with the trailing-@ messiness. It requires one exception in the RDF semantics to allow this slightly nonstandard datatype, but I don't think this is of any importance at all, especially as the L2V mapping is so trivial. It will require short changes to Concepts and Semantics, and a quick check over Testcases, but we will be doing this anyway.
>> 
>> FWIW, I marginally prefer version B, as it settles the xsd:string business once and for all. But only marginally.
>> 
>> Pat
>> 
>> 
>> 
>> ------------------------------------------------------------
>> IHMC                                     (850)434 8903 or (650)494 3973   
>> 40 South Alcaniz St.           (850)202 4416   office
>> Pensacola                            (850)202 4440   fax
>> FL 32502                              (850)291 0667   mobile
>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>> 
>> 
>> 
>> 
>> 
>> 
> 
> 

Received on Wednesday, 18 May 2011 14:06:38 GMT
