Re: Proposal for ISSUE-12, string literals from Pierre-Antoine Champin on 2011-05-17 (public-rdf-wg@w3.org from May 2011)

From: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
Date: Tue, 17 May 2011 21:03:07 +0200
To: Steve Harris <steve.harris@garlik.com>
CC: RDF Working Group WG <public-rdf-wg@w3.org>
Message-ID: <4DD2C66B.2060109@liris.cnrs.fr>
On 05/17/2011 11:06 AM, Steve Harris wrote:
> This idea seems to have some merit to me.
> 
> It strikes me as a little confused semantically - I'm not sure that
> integer / byte has a similar relationship to French / English, but as a
> self-confessed "scruffy" it's less gubbins to express the same
> information, which is a win.

granted, this is stretching the notion of datatype a bit.
But similarly, the reasons why "chat"@fr ≠ "chat"@en are not, in my
view, the same as the reasons why "chat"@fr ≠ "chien"@fr.

> So, I'm guessing as a formulation that rdflang:en would be a subtype
> of xsd:string,

sounds reasonable

> and rdflang:en-GB would be a subtype of rdflang:en, and
> so on?

I hadn't thought about that, but probably, yes.
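A minimal sketch of the subtype chain discussed above. It assumes a hypothetical `rdflang:` namespace (the IRI below is a placeholder, not a published vocabulary), and that a tag's direct supertype is the tag with its last subtag dropped, in the spirit of the truncation used by RFC 4647 lookup. Bare language subtags are treated as direct subtypes of xsd:string. Case normalization is deliberately left out here.

```python
# Placeholder namespace for the proposed language datatypes (hypothetical).
RDFLANG = "http://example.org/rdflang#"
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


def lang_datatype(tag: str) -> str:
    """Map a language tag such as 'en-GB' to its hypothetical datatype IRI."""
    return RDFLANG + tag


def supertypes(tag: str) -> list[str]:
    """Return the datatype chain from `tag` up to xsd:string, most specific first.

    Assumption: each datatype is a subtype of the one obtained by dropping
    the last '-'-separated subtag; a bare language subtag sits directly
    under xsd:string.
    """
    chain = []
    subtags = tag.split("-")
    while subtags:
        chain.append(lang_datatype("-".join(subtags)))
        subtags.pop()  # drop the most specific subtag
    chain.append(XSD_STRING)
    return chain


def is_subtype(sub_tag: str, super_iri: str) -> bool:
    """True if the datatype for `sub_tag` is `super_iri` or one of its subtypes."""
    return super_iri in supertypes(sub_tag)


print(supertypes("en-GB"))
# ['http://example.org/rdflang#en-GB', 'http://example.org/rdflang#en', 'http://www.w3.org/2001/XMLSchema#string']
```

Under this reading, "colour"^^rdflang:en-GB would also be an instance of rdflang:en and of xsd:string, but not the other way round.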
> 
> A few practical considerations:
> 
> 1) ISO language codes are not case sensitive, IRIs are. "foo"@fr =
> "foo"@FR, "foo"^^rdflang:fr != "foo"^^rdflang:FR. We'd need to define a
> canonical case for the datatype form.

I hadn't thought of that either, but yes, canonical case sounds like the
right thing to do.
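A minimal sketch of why a canonical case matters once tags become part of an IRI. The `rdflang:` IRI is a hypothetical placeholder, and lowercasing the whole tag is only one possible canonical form (BCP 47 itself recommends a mixed-case formatting convention, e.g. "en-GB"). The point is that "foo"@FR and "foo"@fr must end up with the same datatype IRI.

```python
from typing import NamedTuple

# Placeholder namespace, not a published vocabulary (hypothetical).
RDFLANG = "http://example.org/rdflang#"


class Literal(NamedTuple):
    lexical: str
    datatype: str


def canonical_tag(tag: str) -> str:
    """Language tags compare case-insensitively; lowercase is the assumed canonical form."""
    return tag.lower()


def from_lang_tagged(lexical: str, tag: str) -> Literal:
    """Rewrite "lexical"@tag as "lexical"^^rdflang:<canonical tag>."""
    return Literal(lexical, RDFLANG + canonical_tag(tag))


print(from_lang_tagged("foo", "FR") == from_lang_tagged("foo", "fr"))  # True
```

Without the `canonical_tag` step, the two literals would get different datatype IRIs and compare unequal, which is exactly the mismatch described in point 1.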

> 2) Should systems prefer language tags, or datatypes in external
> data?
> i.e. is "kludge"@en-GB the canonical form, or is it
> "kludge"^^rdflang:en-GB ? This affects RDF serialisations, and for e.g.
> SPARQL results. ^^ seems the most obvious choice in one sense, but it's
> more bytes, so less obvious in another.

I would favor the "language tag" notation as the canonical one.
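A minimal sketch of a serializer that follows this preference, assuming a hypothetical `rdflang:` namespace (placeholder IRI) and a Turtle-like output. Literals whose datatype is in that namespace are written with the "@tag" shortcut, and all others with "^^<IRI>". The string escaping is simplified and covers only backslash, double quote, and line breaks.

```python
from typing import NamedTuple

# Placeholder namespace for the proposed language datatypes (hypothetical).
RDFLANG = "http://example.org/rdflang#"


class Literal(NamedTuple):
    lexical: str
    datatype: str


def _quote(s: str) -> str:
    """Simplified Turtle-style string escaping (backslash, quote, CR, LF only)."""
    s = s.replace("\\", "\\\\").replace('"', '\\"')
    s = s.replace("\n", "\\n").replace("\r", "\\r")
    return f'"{s}"'


def to_turtle(lit: Literal) -> str:
    """Serialize a literal, preferring the language-tag notation for rdflang: datatypes."""
    if lit.datatype.startswith(RDFLANG):
        tag = lit.datatype[len(RDFLANG):]
        return f"{_quote(lit.lexical)}@{tag}"
    return f"{_quote(lit.lexical)}^^<{lit.datatype}>"


print(to_turtle(Literal("kludge", RDFLANG + "en-GB")))
# "kludge"@en-GB
print(to_turtle(Literal("42", "http://www.w3.org/2001/XMLSchema#integer")))
# "42"^^<http://www.w3.org/2001/XMLSchema#integer>
```

A parser would do the reverse, turning "kludge"@en-GB into the rdflang: datatype internally, so both notations denote the same literal while only the shorter one is emitted.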

> 3) What about rdf:PlainLiteral? Would this proposal make it
> obsolete?

yes -- or archaic, if you prefer :)

> 4) Is the value space all UTF-8 strings? If not, is it a type error
> to write "מחשב"^^rdflang:en?

well, currently I guess any UTF-8 string is valid. So yes, the value
space of all those datatypes would be all UTF-8 strings, if only for
the sake of backward compatibility (and because I sure don't want to walk down that path...)

  pa

> 
> - Steve
> On 2011-05-17, at 07:53, Pierre-Antoine Champin wrote:
> 
>> Hi all,
>>
>> here's another idea:
>>
>> why not consider language tags as special datatypes?
>> In other words,
>>
>>  "chat"@en
>>
>> would be a shortcut for something like
>>
>>  "chat"^^rdflang:en
>>
>> (even if the above notation could be forbidden in serialization
>> syntaxes, à la rdf:PlainLiteral)
>>
>> this would
>> * make everything much more regular
>> * while matching the current behaviour (a literal could not possibly
>> have a "language" datatype and another datatype)
>> * and make it more natural (in my view) to unify language-less literals
>> with xsd:string.
>>
>> Also, it seems to me that upper layers (SPARQL, programming APIs) could
>> continue working as they do (their current behaviour can easily be
>> emulated on top of this new model) and smoothly evolve to align to the
>> new model.
>>
>>  pa
>>
>>
>> On 05/14/2011 03:34 PM, Pat Hayes wrote:
>>>
>>> On May 13, 2011, at 4:47 PM, Steve Harris wrote:
>>>
>>>> On 2011-05-13, at 21:49, Pat Hayes wrote:
>>>> ...
>>>>> Advantages: Gives a type to plain literals; preserves rdf:PlainLiteral specs (extending them, but not contradicting them); allows people to use plain literals without getting involved with trailing @; and allows xsd:string to be deprecated in favor of plain literal syntax (or the reverse, of course).
>>>>>
>>>>> Disadvantages: might be thought too complicated; takes the notion of type slightly outside the current RDF datatype specs.  
>>>>>
>>>>> Thoughts?
>>>>
>>>> A lot of this complexity seems to stem from trying to make "foo" be an xsd:string. Instead, why not go with Plan A and make "foo"^^xsd:string a plain literal?
>>>
>>> I prefer that also. But there are still some issues remaining with this step: (a) people want a 'type' for plain literals, and (b) plain literals can have language tags, which breaks current RDF datatyping. The proposal is more an attempt to deal with this while staying faithful to existing RDF syntax and also to the rdf:PlainLiteral work.
>>>
>>> Pat
>>>
>>>>
>>>> xsd:strings are significantly rarer than plain literals in real-world RDF data (in my experience), so it's less weird overall to de-type xsd:strings than to try and add a type to every plain literal.
>>>>
>>>> It's not the prettiest solution but probably RDF shouldn't have had explicit xsd:strings in the first place.
>>>>
>>>> - Steve
>>>>
>>>> -- 
>>>> Steve Harris, CTO, Garlik Limited
>>>>
>>>>
>>>
>>
>>
>
Received on Tuesday, 17 May 2011 19:03:54 UTC