Re: Simplified proposal for string literals

From: Richard Cyganiak <richard@cyganiak.de>
Date: Wed, 18 May 2011 20:46:55 +0100
Cc: RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <05B04004-7ACB-44A4-9448-4C4B4A897246@cyganiak.de>
To: Pat Hayes <phayes@ihmc.us>

On 18 May 2011, at 19:51, Pat Hayes wrote:
> By 'abstract syntax' you mean the graph syntax, right?

Yes.

> As far as the internal form of URIs and literals is concerned, the graph syntax and surface syntaxes are virtually the same.

Well, no. There are many different surface syntaxes; some of them contain syntactic sugar, and some of the proposals under discussion hinge on adding more syntactic sugar. So there's plenty of potential for confusion.

> I thought that the proposal added language tags even to things like "123"^^xsd:number, etc., which was what I was referring to. But maybe that was a misapprehension. It is hard to keep track of the many proposals :-)

Most certainly!

>> 1) every literal has *both* a datatype and a (possibly empty) language tag;
>> 2) only xsd:string can have non-empty language tags;
> 
> ?? Why do we want to hallucinate tags which are required to be empty on all typed literals? This seems weird to me. 

It *is* weird, but so is the status quo, and so is every single proposal that has been brought forward. At least one aspect of the solution is going to be strange.

>> 3) plain literals don't exist;
>> 4) rdf:PlainLiteral only for use inside systems that can't do language tags;
> 
> ?? Are there any such systems?

OWL? RIF? (Not sure)

> I thought the point of this datatype was to provide a type for 'untyped', i.e. plain, literals.

I thought the point is that some downstream communities don't want to buy into RDF's unusual <lexical, langtag, datatype> representation for literals, and instead want to use the more common <lexical, datatype> representation. That's my impression from reading the introduction of the spec, anyways.

>> 5) "foo" in concrete syntaxes is syntactic sugar for "foo"^^xsd:string.
> 
> OK, but... 
> 
>> 6) "foo"@en in concrete syntaxes is syntactic sugar for "foo"^^xsd:string@en.
> 
> ... why change the literal syntax in this way?

Well, why not? We do agree that it has to be changed *somehow* if we want to unify the different representations of the same thing, right? And this has the benefit of keeping the established (way beyond RDF) xsd:string type around.
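
To make points 5 and 6 concrete, here is a minimal sketch in Python of how a parser might expand the sugared forms. It is only an illustration under stated assumptions, not part of the proposal: `Literal`, `desugar` and the simplified regex grammar are hypothetical, and the empty string stands in for the "empty language tag" of point 1.

```python
import re
from typing import NamedTuple

XSD_STRING = "xsd:string"


class Literal(NamedTuple):
    """Hypothetical abstract-syntax literal: every literal has a datatype
    and a (possibly empty) language tag, as in point 1."""
    lexical: str
    datatype: str
    lang: str  # "" means the language tag is empty


# Assumed, heavily simplified surface grammar (no escapes, no full IRIs):
#   "lex"   "lex"@tag   "lex"^^dt   "lex"^^dt@tag
_SURFACE = re.compile(
    r'^"(?P<lex>[^"]*)"(?:\^\^(?P<dt>[^@\s]+))?(?:@(?P<lang>[A-Za-z0-9-]+))?$'
)


def desugar(text: str) -> Literal:
    """Expand a surface-syntax literal into its unsugared form."""
    m = _SURFACE.match(text)
    if m is None:
        raise ValueError(f"not a literal in this toy grammar: {text!r}")
    # Points 5 and 6: a missing datatype is sugar for xsd:string.
    datatype = m.group("dt") or XSD_STRING
    lang = m.group("lang") or ""
    # Point 2: only xsd:string can have a non-empty language tag.
    if lang and datatype != XSD_STRING:
        raise ValueError("only xsd:string may carry a non-empty language tag")
    return Literal(m.group("lex"), datatype, lang)
```

With this sketch, `desugar('"foo"@en')` and `desugar('"foo"^^xsd:string@en')` produce the same `Literal`, which is the unification of representations the proposal is after.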

>> 7) the value of "xxx"^^yyy is L2V_yyy(xxx)
>> 8) the value of "xxx"^^yyy@zzz is <L2V_yyy(xxx), zzz>
> 
> But that surely means that the type of such a literal should *not* be xsd:string, since strings cannot contain language tags.

But there are no strings containing language tags in the proposal!

If we have a literal that has both a type (necessarily xsd:string) and a language tag, and consider the abstract syntax, then it's not really the lexical form that's being tagged, but the <lexical form, datatype> pair. In other words, you can think of the tag as annotating an xsd:string typed literal. And I can't see anything inherently wrong with tagging an xsd:string typed literal as @en. It makes sense for xsd:strings; current RDF just happens to somewhat arbitrarily forbid it.

> And so we still do not have a type for the values of plain tagged literals.

True, but I have not yet seen any proposal addressing this. rdf:PlainLiteral covers tagged and untagged.

> The problem, seems to me, is that a "language-tagged string" really is not a string.

Well, I'm not saying that it is. I'm saying it's a <string, tag> pair.

> What are the equality (sameAs) rules for tagged literal values, in this proposal? Is this consistent:
> 
> _:x owl:sameAs "chat"@en .
> _:x owl:sameAs "chat"@fr .

No, it's not, because once the syntactic sugar is removed we get:

_:x owl:sameAs "chat"^^xsd:string@en .
_:x owl:sameAs "chat"^^xsd:string@fr .

And the proposal says:

8) the value of "xxx"^^yyy@zzz is <L2V_yyy(xxx), zzz>

L2V_xsd:string("xxx") is still the Unicode string "xxx", as it always was. So the values of the two original literals are the pair <"chat", en> and the pair <"chat", fr>, as they always were. Those pairs are different from each other, so the graph is inconsistent.
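
The evaluation in points 7 and 8 can be sketched in Python as follows. This is a rough illustration only: `Literal`, `L2V` and `value` are hypothetical names, the `xsd:integer` entry is an assumption added for contrast, and the owl:sameAs check is reduced to plain value equality.

```python
from typing import Callable, Dict, NamedTuple


class Literal(NamedTuple):
    """Hypothetical unsugared literal: lexical form, datatype, language tag."""
    lexical: str
    datatype: str
    lang: str  # "" means the language tag is empty


# Lexical-to-value mappings per datatype (only two, for illustration).
L2V: Dict[str, Callable[[str], object]] = {
    "xsd:string": lambda lex: lex,  # the string itself
    "xsd:integer": int,             # assumed entry, not from the thread
}


def value(lit: Literal) -> object:
    """Point 7: value of "xxx"^^yyy is L2V_yyy(xxx).
    Point 8: value of "xxx"^^yyy@zzz is the pair <L2V_yyy(xxx), zzz>."""
    base = L2V[lit.datatype](lit.lexical)
    if lit.lang:
        return (base, lit.lang)
    return base


chat_en = Literal("chat", "xsd:string", "en")
chat_fr = Literal("chat", "xsd:string", "fr")

# Both literals cannot denote the same thing: their values are distinct pairs.
consistent = value(chat_en) == value(chat_fr)
```

Here `value(chat_en)` is the pair `("chat", "en")` and `value(chat_fr)` is `("chat", "fr")`, so `consistent` is `False`, matching the argument above.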

Best,
Richard



> 
> ? I think it should not be. 
> 
> Pat
> 
>> 
>> Best,
>> Richard
>> 
>> 
>> 
>> 
>> 
>>> 
>>> Pat
>>> 
>>> 
>>>> 
>>>> So I'd argue that the proposal linked above, amended according to footnote 1, is preferable over Version B.
>>>> 
>>>> Footnote 1: The proposal linked above suggests removing rdf:PlainLiteral completely. That doesn't work; it has to be kept as a compatibility band-aid, like it is now.
>>>> 
>>>> Best,
>>>> Richard
>>>> 
>>>> 
>>>>> 
>>>>> --------
>>>>> 
>>>>> Either way, this keeps existing plain literal syntax exactly as it is at present, does not require anyone to rewrite any up-front code, and retains the rdf:PlainLiteral typing without getting involved with the trailing-@ messiness. It requires one exception in the RDF semantics to allow this slightly nonstandard datatype, but I don't think this is of any importance at all, especially as the L2V mapping is so trivial. It will require short changes to Concepts and Semantics, and a quick check over Testcases, but we will be doing this anyway.
>>>>> 
>>>>> FWIW, I marginally prefer version B, as it settles the xsd:string business once and for all. But only marginally.
>>>>> 
>>>>> Pat
>>>>> 
>>>>> 
>>>>> 
>>>>> ------------------------------------------------------------
>>>>> IHMC                                     (850)434 8903 or (650)494 3973   
>>>>> 40 South Alcaniz St.           (850)202 4416   office
>>>>> Pensacola                            (850)202 4440   fax
>>>>> FL 32502                              (850)291 0667   mobile
>>>>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
Received on Wednesday, 18 May 2011 19:47:24 GMT