Re: Simplified proposal for string literals

On 18 May 2011, at 15:06, Pat Hayes wrote:
>>> Version B
>>> 
>>> 1.  rdf:PlainLiteral is a unique special datatype, built into basic RDF (along with rdf:XMLLiteral) with a special, unique formulation. It applies to plain literal syntax, which is thought of as specifying either a character string, or a pair of a string and a language tag.  The L2V mapping of this datatype takes both strings and pairs <string, tag> to themselves, i.e. it is the identity mapping on strings and on pairs. 
>>> Put another way, the datatype value of "string" is  string  and of "string"@tag is <string, tag>. 
>>> Every plain literal in RDF has the datatype rdf:PlainLiteral, even though this name is not used explicitly in the literal syntax. 
>>> 
>>> 2. rdf:PlainLiteral MUST NOT be used as an explicit datatype name in any RDF literal of the form "string"^^datatype. Literals of the form "string@tag"^^rdf:PlainLiteral MUST be rewritten as a plain literal "string"@tag or flagged as an error.
>>> 
>>> 3. "string" and "string"^^xsd:string are equivalent, so to avoid equality reasoning, the datatype xsd:string is deprecated in RDF. RDF SHOULD NOT use xsd:string as a datatype in typed literals, and applications MAY rewrite any literal typed with xsd:string as a plain literal with no language tag. 
>> 
>> Version B seems to be quite similar in its practical implications to the earlier proposal of allowing xsd:string typed literals to also have a language tag in the abstract syntax:
>> http://lists.w3.org/Archives/Public/public-rdf-wg/2011May/0175.html
> 
> Except that it does not change the current abstract syntax, which I take to be a rather large advantage. 

But *both* proposals change the abstract syntax!

Currently, a typed literal is a pair of a Unicode string and a datatype IRI.

You add an exception to that: If the type is rdf:PlainLiteral, then the other part is not a string but a pair of string and language tag.
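
A minimal sketch of the two shapes, in Python, may make the difference easier to see. The names TypedLiteral, fits_current_abstract_syntax and fits_version_b are hypothetical, introduced only for illustration, and the second check encodes one reading of Version B (string or pair allowed under rdf:PlainLiteral), not normative text:

```python
from dataclasses import dataclass
from typing import Tuple, Union

RDF_PLAIN_LITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# The "other part" of a typed literal: a string, or (assumed reading of
# Version B) a (string, language tag) pair.
Lexical = Union[str, Tuple[str, str]]


@dataclass(frozen=True)
class TypedLiteral:
    lexical: Lexical
    datatype: str  # datatype IRI


def _is_string_tag_pair(x) -> bool:
    return isinstance(x, tuple) and len(x) == 2 and all(isinstance(p, str) for p in x)


def fits_current_abstract_syntax(lit: TypedLiteral) -> bool:
    """Current rule: a typed literal pairs a Unicode string with a datatype IRI."""
    return isinstance(lit.lexical, str)


def fits_version_b(lit: TypedLiteral) -> bool:
    """Assumed Version B rule: only rdf:PlainLiteral may carry a pair."""
    if lit.datatype == RDF_PLAIN_LITERAL:
        return isinstance(lit.lexical, str) or _is_string_tag_pair(lit.lexical)
    return isinstance(lit.lexical, str)
```

A literal such as TypedLiteral(("chat", "en"), RDF_PLAIN_LITERAL) passes the second check but not the first, which is the sense in which the abstract syntax changes.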

> I had formed the impression that there was a consensus against typing plain literals with xsd:string. If this is wrong, then obviously I need to get this clear.

There was support for this option from some people, for example Lee and I believe Alex. See for example here:
http://lists.w3.org/Archives/Public/public-rdf-wg/2011May/0167.html

> Ignoring language tags for the moment, should a plain, untyped string used as a literal in RDF be considered to have the type xsd:string or the type rdf:PlainLiteral, or to not have a type at all? 

This seems to be hard for some to understand: You cannot consider this question while ignoring language tags! They are the only reason why this is complicated at all, and they are damn important!

> Seems to me that while xsd:string is well-known, several people have suggested that it is not widely used in RDF data (compared to plain literals)

That is not necessarily true -- seeing xsd:string in OWL files is extremely common. Part of that might be the desire to specify the range of a property as being strings of some sort.

> and it has the problem that it cannot deal with language tags. I know that some people feel strongly that "chat" in English and "chat" in French are distinct entities, and should be counted as distinct. Reducing them to a single string value destroys this ability.

You're knocking down strawmen. "chat"^^xsd:string@en and "chat"^^xsd:string@fr *are* distinct entities.

>> Second, it retains "foo" and "foo"^^xsd:string as distinct in the abstract syntax (they become "foo"^^rdf:PlainLiteral and "foo"^^xsd:string).
> 
> No, the proposal is not to have "foo"^^rdf:PlainLiteral in the syntax, even in the abstract syntax. In fact, it explicitly prohibits literals typed with 'rdf:PlainLiteral'. The proposal is to keep the syntax of plain literals exactly as it is at present. 

Please, Pat, if you say “syntax”, then say “abstract syntax” or “concrete syntax”. These are completely different things. I read your proposal above as saying that rdf:PlainLiteral *would* be in the abstract syntax, but *not* in the concrete syntax. It's really hard to understand what you're trying to say if you don't keep those apart.

> But that proposal, as I understand it, requires language tags in literals where language tags make no sense at all. I do not think this will fly, frankly: the user push-back will be overwhelming. 

So I write "chat"@en in Turtle, and that is interpreted as: has the datatype xsd:string, and the language tag "en". What about this makes no sense at all? What do you see as triggering user pushback?

(AndyS and SteveH gave some examples of implementation difficulties that I still have to respond to, but I don't think those are what you are talking about?)

Here's a clarified version of the proposal:

1) every literal has *both* a datatype and a (possibly empty) language tag;
2) only xsd:string can have non-empty language tags;
3) plain literals don't exist;
4) rdf:PlainLiteral is only for use inside systems that can't do language tags;
5) "foo" in concrete syntaxes is syntactic sugar for "foo"^^xsd:string;
6) "foo"@en in concrete syntaxes is syntactic sugar for "foo"^^xsd:string@en.
7) the value of "xxx"^^yyy is L2V_yyy(xxx)
8) the value of "xxx"^^yyy@zzz is <L2V_yyy(xxx), zzz>
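
The eight points can be sketched in a few lines of Python. This is an illustration under assumptions, not a reference implementation: the Literal class, the value function and the toy L2V table are hypothetical names, and only xsd:string and xsd:integer are modelled.

```python
from dataclasses import dataclass

XSD = "http://www.w3.org/2001/XMLSchema#"
XSD_STRING = XSD + "string"
XSD_INTEGER = XSD + "integer"

# Toy lexical-to-value (L2V) mappings; real datatypes have stricter rules.
L2V = {
    XSD_STRING: lambda s: s,
    XSD_INTEGER: int,
}


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: str = XSD_STRING  # points 3 and 5: bare "foo" means "foo"^^xsd:string
    lang: str = ""              # point 1: always present, possibly empty

    def __post_init__(self):
        # Point 2: only xsd:string can have a non-empty language tag.
        if self.lang and self.datatype != XSD_STRING:
            raise ValueError("only xsd:string literals may carry a language tag")


def value(lit: Literal):
    """Points 7 and 8: L2V(lexical), paired with the tag when one is present."""
    v = L2V[lit.datatype](lit.lexical)
    return (v, lit.lang) if lit.lang else v
```

With this model, Literal("chat", lang="en") and Literal("chat", lang="fr") are different literals with different values, while Literal("foo") and Literal("foo", XSD_STRING) are the same literal.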

Best,
Richard

> 
> Pat
> 
> 
>> 
>> So I'd argue that the proposal linked above, amended according to footnote 1, is preferable over Version B.
>> 
>> Footnote 1: The proposal linked above suggests to completely remove rdf:PlainLiteral. That doesn't work, it has to be kept as a compatibility band-aid, like it is now.
>> 
>> Best,
>> Richard
>> 
>> 
>>> 
>>> --------
>>> 
>>> Either way, this keeps existing plain literal syntax exactly as it is at present, does not require anyone to rewrite any up-front code, and retains the rdf:PlainLiteral typing without getting involved with the trailing-@ messiness. It  requires one exception in the RDF semantics to allow this slightly nonstandard datatype, but I don't think this is of any importance at all, especially as the L2V mapping is so trivial. It will require short changes to Concepts and Semantics, and a quick check over Testcases, but we will be doing this anyway. 
>>> 
>>> FWIW, I marginally prefer  version B, as it settles the xsd:string business once and for all. But only marginally.
>>> 
>>> Pat
>>> 
>>> 
>>> 
>>> ------------------------------------------------------------
>>> IHMC                                     (850)434 8903 or (650)494 3973   
>>> 40 South Alcaniz St.           (850)202 4416   office
>>> Pensacola                            (850)202 4440   fax
>>> FL 32502                              (850)291 0667   mobile
>>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>>> 
>> 
>> 
> 

Received on Wednesday, 18 May 2011 17:27:01 UTC