Re: Proposal for ISSUE-12, string literals from Steve Harris on 2011-05-18 (public-rdf-wg@w3.org from May 2011)

From: Steve Harris <steve.harris@garlik.com>
Date: Wed, 18 May 2011 09:37:47 +0100
To: Pierre-Antoine Champin <pierre-antoine@champin.net>
Cc: RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <09CCDFBC-920D-4D8D-BA6D-E8A683F14E8D@garlik.com>

On 2011-05-17, at 21:01, Pierre-Antoine Champin wrote:

> sorry, some second thoughts
> 
> On 05/17/2011 09:03 PM, Pierre-Antoine Champin wrote:
>> On 05/17/2011 11:06 AM, Steve Harris wrote:
> <snip/>
>>> So, I'm guessing as a formulation that rdflang:en would be a subtype
>>> of xsd:string,
> 
> as far as I understand, currently "chat"^^xsd:string ≠ "chat"@en and
> "chat" ≠ "chat"@en, and more generally no xsd:string or simple literal
> is equal to a plain literal with language tag. So the respective
> datatypes should have disjoint value spaces, hence no subtype relation.
> 
>>> and rdflang:en-GB would be a subtype of rdflang:en, and
>>> so on?
> 
> I'm not even sure "en-GB" is a valid language tag, reading [1]:

"en" is the language subtag and "GB" is a region subtag, see http://www.w3.org/International/articles/language-tags/
So yes, it is a valid language tag.

>  Note: When using the language tag, care must be taken not to confuse
>  language with locale. The language tag relates only to human language
>  text. Presentational issues should be addressed in end-user
>  applications.
> 
> [1]
> http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-Graph-Literal
> 
> but if it is, literals with @en-GB are disjoint from literals with @en
> and so the respective datatypes should be disjoint as well.

It's not quite that simple. @en matches @en-GB, but they're not equal, cf.
http://www.w3.org/International/articles/language-tags/#matching
and http://www.w3.org/TR/rdf-sparql-query/#func-langMatches
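
As a rough sketch of the difference between matching and equality (Python, loosely following the basic filtering scheme that langMatches relies on; the function name lang_matches is made up for illustration and this is not a complete implementation of language-range matching):

```python
def lang_matches(tag: str, lang_range: str) -> bool:
    """Simplified basic filtering: does the language range match the tag?"""
    tag = tag.lower()
    lang_range = lang_range.lower()
    if lang_range == "*":
        # The wildcard range matches any non-empty tag.
        return tag != ""
    # A range matches an identical tag, or a tag that extends it
    # with further subtags after a hyphen.
    return tag == lang_range or tag.startswith(lang_range + "-")


# The range "en" matches the tag "en-GB", but not the other way round,
# and the two tags are still different strings.
print(lang_matches("en-GB", "en"))
print(lang_matches("en", "en-GB"))
print("en-gb" == "en")
```

So matching is directional and is not an equivalence, which is why it does not translate directly into either equality or disjointness of the literals.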

>>> A few practical considerations:
>>> 
>>> 1) ISO language codes are not case sensitive, IRIs are. "foo"@fr =
>>> "foo"@FR, "foo"^^rdflang:fr != "foo"^^rdflang:FR. We'd need to define a
>>> canonical case for the datatype form.
>> 
>> I hadn't thought of that either, but yes, canonical case sounds like the
>> right thing to do.
> 
> and according to [1] again, the language tag is normalized to lowercase
> in the abstract syntax.

OK, that's easy.
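
Something along these lines, as a minimal sketch in Python. The namespace IRI behind the rdflang: prefix is purely a placeholder, since nothing in this thread defines one, and the helper name is hypothetical too:

```python
# Placeholder namespace for the proposed rdflang: prefix (an assumption,
# not an IRI defined anywhere in this thread).
RDFLANG_NS = "http://example.org/rdflang#"


def lang_datatype_iri(tag: str) -> str:
    """Build the datatype IRI for a language tag, lowercased as canonical form."""
    return RDFLANG_NS + tag.lower()


# "fr" and "FR" end up as the same datatype IRI once normalized.
print(lang_datatype_iri("fr") == lang_datatype_iri("FR"))
```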

> <snip />
>>> 4) Is the value space all UTF-8 strings? If not, is it a type error
>>> to write "מחשב"^^rdflang:en?
>> 
>> well, currently I guess any UTF-8 string is valid. So yes, the value
>> space of all those datatypes would be all UTF-8 strings, if only
>> for the sake of BC (and because I sure don't want to walk down that path...)
> 
> sorry, I was reading "lexical space".
> 
> The value space would be isomorphic to the set of UTF-8 strings, but
> different for each "language datatype". Defining it as the set of pairs
> <text, language-tag> as in RDF Semantics seems like a good option.

Sounds reasonable.
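
To make the pair formulation concrete, here is a small Python sketch. The names LangString and lang_value are invented for illustration, and lowercasing the tag is the normalization assumed earlier in this thread:

```python
from typing import NamedTuple


class LangString(NamedTuple):
    """A value in a language datatype's value space: a <text, language-tag> pair."""
    text: str
    lang: str


def lang_value(lexical: str, tag: str) -> LangString:
    """Map a lexical form and a language tag to its value, with the tag lowercased."""
    return LangString(lexical, tag.lower())


# Same text, different tags: different values, so the value spaces are disjoint.
print(lang_value("chat", "en") == lang_value("chat", "en-GB"))
# A pair is never equal to the bare string, as with "chat" vs "chat"@en.
print(lang_value("chat", "en") == "chat")
# Any string is accepted as the text, whatever the tag says.
print(lang_value("מחשב", "en"))
```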

- Steve

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD

Received on Wednesday, 18 May 2011 08:38:16 UTC