Re: Proposal for ISSUE-12, string literals from Steve Harris on 2011-05-18 (public-rdf-wg@w3.org from May 2011)

From: Steve Harris <steve.harris@garlik.com>
Date: Wed, 18 May 2011 10:12:32 +0100
To: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
Cc: RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <2C420A69-53A8-487C-9E0A-24A6DFDE53FD@garlik.com>
On 2011-05-18, at 10:07, Pierre-Antoine Champin wrote:

> On 05/18/2011 10:37 AM, Steve Harris wrote:
>> On 2011-05-17, at 21:01, Pierre-Antoine Champin wrote:
>> 
>>> sorry, some second thoughts
>>> 
>>> On 05/17/2011 09:03 PM, Pierre-Antoine Champin wrote:
>>>> On 05/17/2011 11:06 AM, Steve Harris wrote:
>>> <snip/>
>>>>> So, I'm guessing as a formulation that rdflang:en would be a subtype
>>>>> of xsd:string,
>>> 
>>> as far as I understand, currently "chat"^^xsd:string ≠ "chat"@en and
>>> "chat" ≠ "chat"@en, and more generally no xsd:string or simple literal
>>> is equal to a plain literal with language tag. So the respective
>>> datatypes should have disjoint value spaces, hence no subtype relation.
>>> 
>>>>> and rdflang:en-GB would be a subtype of rdflang:en, and
>>>>> so on?
>>> 
>>> I'm not even sure "en-GB" is a valid language tag, reading [1]:
>> 
>> It's a region subtag, see http://www.w3.org/International/articles/language-tags/
>> So, yes it is a valid language tag.
> 
> I meant "valid language tag *in RDF*" of course. But I guess the URL you
> refer to can apply to RDF as well (as language tags in RDF are obviously
> inherited from xml:lang).

They are, yes.

>>> Note: When using the language tag, care must be taken not to confuse
>>> language with locale. The language tag relates only to human language
>>> text. Presentational issues should be addressed in end-user
>>> applications.
>>> 
>>> [1]
>>> http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-Graph-Literal
>>> 
>>> but if it is, literals with @en-GB are disjoint from literals with @en
>>> and so the respective datatypes should be disjoint as well.
>> 
>> It's not quite that simple. @en matches @en-GB, but they're not equal, cf.
>> http://www.w3.org/International/articles/language-tags/#matching
>> and http://www.w3.org/TR/rdf-sparql-query/#func-langMatches
> 
> Do they *match* in the sense of the model theory? In other words, does

I don't know what "match" means in a model theoretic sense, only in an RFC 4647 sense.

>  :a :b "chat"@en-GB .
> 
> entail
> 
>  :a :b "chat"@en .
> 
> in any entailment regime defined by the RDF semantics?

No idea.

> I don't think so, which does not mean that it is not an interesting
> thing to consider, although it looks like a tricky can of worms...
> 
> In any case, I don't think that this entailment would mean that
> rdflang:en would be a supertype of rdflang:en-GB, as their value space
> would still be disjoint, in my view.

OK, seems reasonable.
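
To make the "matches but not equal" distinction concrete, here is a minimal Python sketch of RFC 4647 basic filtering, the scheme SPARQL's langMatches is defined in terms of. The function name basic_filter is only illustrative, and this says nothing about entailment: it is purely a comparison of tags.

```python
def basic_filter(language_range: str, tag: str) -> bool:
    """Sketch of RFC 4647 basic filtering (section 3.3.1).

    A range matches a tag if, compared case-insensitively, the tag equals
    the range or starts with the range followed by "-". The range "*"
    matches any tag.
    """
    if language_range == "*":
        return True
    r = language_range.lower()
    t = tag.lower()
    return t == r or t.startswith(r + "-")


# "en" as a range matches the tag "en-GB" ...
print(basic_filter("en", "en-GB"))
# ... but not the other way round ...
print(basic_filter("en-GB", "en"))
# ... and the two tags are still different strings.
print("en" == "en-GB")
```

So matching is a one-way prefix relation between a range and a tag, which is a different thing from two literals denoting the same value.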

- Steve

>>>>> A few practical considerations:
>>>>> 
>>>>> 1) ISO language codes are not case sensitive, IRIs are. "foo"@fr =
>>>>> "foo"@FR, "foo"^^rdflang:fr != "foo"^^rdflang:FR. We'd need to define a
>>>>> canonical case for the datatype form.
>>>> 
>>>> I hadn't thought of that either, but yes, canonical case sounds like the
>>>> right thing to do.
>>> 
>>> and according to [1] again, the language tag is normalized to lowercase
>>> in the abstract syntax.
>> 
>> OK, that's easy.
>> 
>>> <snip />
>>>>> 4) Is the value space all UTF-8 strings? If not, is it a type error
>>>>> to write "מחשב"^^rdflang:en?
>>>> 
>>>> well, currently I guess any UTF-8 string is valid. So yes, the value
>>>> space of all those datatypes would be all UTF-8 strings, if only for
>>>> the sake of backward compatibility (and because I sure don't want to
>>>> walk down that path...)
>>> 
>>> sorry, I was reading "lexical space".
>>> 
>>> The value space would be isomorphic to the set of UTF-8 strings, but
>>> different for each "language datatype". Defining it as the set of pairs
>>> <text, language-tag> as in RDF Semantics seems like a good option.
>> 
>> Sounds reasonable.
>> 
>> - Steve
>> 
> 
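
As a rough illustration of the points quoted above (lowercase as the canonical case, and a value space of <text, language-tag> pairs), here is a small Python sketch. LangValue and lang_literal are hypothetical names used only for this example, not anything defined by RDF or proposed in this thread.

```python
from typing import NamedTuple


class LangValue(NamedTuple):
    """Hypothetical value of a language-tagged literal: a <text, tag> pair."""
    text: str
    lang: str  # stored in lowercase, as in the RDF abstract syntax


def lang_literal(lexical: str, tag: str) -> LangValue:
    # Assumption: the tag is normalized to lowercase before comparison,
    # so that @fr and @FR give the same value.
    return LangValue(lexical, tag.lower())


# Case of the tag does not matter once normalized.
assert lang_literal("foo", "fr") == lang_literal("foo", "FR")

# Different tags give different pairs, so the value spaces are disjoint.
assert lang_literal("chat", "en") != lang_literal("chat", "en-GB")

# A pair is never equal to a bare string (simple literal / xsd:string value).
assert lang_literal("chat", "en") != "chat"

# Nothing here restricts which text may carry which tag.
assert lang_literal("מחשב", "en").lang == "en"
```

With pairs as values, the disjointness between rdflang:en, rdflang:en-GB and xsd:string falls out of ordinary equality, with no subtype relation needed.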

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Wednesday, 18 May 2011 09:13:03 UTC