Re: Proposal for ISSUE-12, string literals from Pierre-Antoine Champin on 2011-05-18 (public-rdf-wg@w3.org from May 2011)

From: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
Date: Wed, 18 May 2011 11:07:10 +0200
To: Steve Harris <steve.harris@garlik.com>
CC: RDF Working Group WG <public-rdf-wg@w3.org>
Message-ID: <4DD38C3E.4070405@liris.cnrs.fr>

On 05/18/2011 10:37 AM, Steve Harris wrote:
> On 2011-05-17, at 21:01, Pierre-Antoine Champin wrote:
> 
>> sorry, some second thoughts
>>
>> On 05/17/2011 09:03 PM, Pierre-Antoine Champin wrote:
>>> On 05/17/2011 11:06 AM, Steve Harris wrote:
>> <snip/>
>>>> So, I'm guessing as a formulation that rdflang:en would be a subtype
>>>> of xsd:string,
>>
>> as far as I understand, currently "chat"^^xsd:string ≠ "chat"@en and
>> "chat" ≠ "chat"@en, and more generally no xsd:string or simple literal
>> is equal to a plain literal with language tag. So the respective
>> datatypes should have disjoint value spaces, hence no subtype relation.
>>
>>>> and rdflang:en-GB would be a subtype of rdflang:en, and
>>>> so on?
>>
>> I'm not even sure "en-GB" is a valid language tag, reading [1]:
> 
> It's a region subtag, see http://www.w3.org/International/articles/language-tags/
> So, yes it is a valid language tag.

I meant "valid language tag *in RDF*" of course. But I guess the URL you
refer to can apply to RDF as well (as language tags in RDF are obviously
inherited from xml:lang).

>>  Note: When using the language tag, care must be taken not to confuse
>>  language with locale. The language tag relates only to human language
>>  text. Presentational issues should be addressed in end-user
>>  applications.
>>
>> [1]
>> http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-Graph-Literal
>>
>> but if it is, literals with @en-GB are disjoint from literals with @en
>> and so the respective datatypes should be disjoint as well.
> 
> It's not quite that simple. @en matches @en-GB, but they're not equal, cf.
> http://www.w3.org/International/articles/language-tags/#matching
> and http://www.w3.org/TR/rdf-sparql-query/#func-langMatches

Do they *match* in the sense of the model theory? In other words, does

  :a :b "chat"@en-GB .

entail

  :a :b "chat"@en .

in any entailment regime defined by the RDF Semantics?

I don't think so, which does not mean that it is not an interesting
thing to consider, although it looks like a tricky can of worms...

In any case, I don't think that this entailment would mean that
rdflang:en would be a supertype of rdflang:en-GB, as their value spaces
would still be disjoint, in my view.
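
To make the distinction concrete, here is a rough sketch (not anything
normative) of equality on language-tagged literals versus matching in the
sense of SPARQL's langMatches. It assumes a literal is modelled as a pair
<text, lowercased tag>, as in RDF Semantics; the names LangLiteral, literal
and lang_matches are made up for illustration.

```python
from typing import NamedTuple


class LangLiteral(NamedTuple):
    """Illustrative model: a plain literal with a language tag as a pair."""
    text: str
    lang: str


def literal(text: str, lang: str) -> LangLiteral:
    # The abstract syntax normalizes the language tag to lowercase.
    return LangLiteral(text, lang.lower())


def lang_matches(tag: str, language_range: str) -> bool:
    """Basic filtering: the range equals the tag or is a prefix of it
    that ends at a subtag boundary. "*" matches any non-empty tag."""
    if language_range == "*":
        return tag != ""
    tag, language_range = tag.lower(), language_range.lower()
    return tag == language_range or tag.startswith(language_range + "-")


# Matching is not equality: the range "en" matches the tag "en-GB",
# yet the two literals are different pairs, hence different values.
chat_gb = literal("chat", "en-GB")
chat_en = literal("chat", "en")
print(lang_matches(chat_gb.lang, "en"), chat_gb == chat_en)
```

Matching is also asymmetric (the range "en-GB" does not match the tag "en"),
which is one more reason not to read it as a statement about value spaces.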

  pa


> 
>>>> A few practical considerations:
>>>>
>>>> 1) ISO language codes are not case sensitive, IRIs are. "foo"@fr =
>>>> "foo"@FR, "foo"^^rdflang:fr != "foo"^^rdflang:FR. We'd need to define a
>>>> canonical case for the datatype form.
>>>
>>> I hadn't thought of that either, but yes, canonical case sounds like the
>>> right thing to do.
>>
>> and according to [1] again, the language tag is normalized to lowercase
>> in the abstract syntax.
> 
> OK, that's easy.
> 
>> <snip />
>>>> 4) Is the value space all UTF-8 strings? If not, is it a type error
>>>> to write "מחשב"^^rdflang:en?
>>>
>>> well, currently I guess any UTF-8 string is valid. So yes, the value
>>> space of all those datatypes would be all UTF-8 strings, if only
>>> for the sake of backward compatibility (and because I sure don't want
>>> to walk down that path...)
>>
>> sorry, I was reading "lexical space".
>>
>> The value space would be isomorphic to the set of UTF-8 strings, but
>> different for each "language datatype". Defining it as the set of pairs
>> <text, language-tag> as in RDF Semantics seems like a good option.
> 
> Sounds reasonable.
> 
> - Steve
>
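
For what it's worth, points 1) and 4) above could be sketched as follows.
This is only an illustration under assumptions: the namespace IRI below is
hypothetical (the thread never fixes one for rdflang:), and lang_datatype
and value_of are invented names.

```python
from typing import Tuple

# Hypothetical namespace for the "language datatypes"; not defined anywhere.
RDFLANG = "http://example.org/rdflang#"


def lang_datatype(tag: str) -> str:
    """Canonical datatype IRI for a language tag: IRIs are case sensitive,
    so the tag is lowercased before being appended."""
    return RDFLANG + tag.lower()


def value_of(lexical: str, datatype: str) -> Tuple[str, str]:
    """Lexical-to-value mapping: every string is in the lexical space, and
    its value is the pair <text, language-tag> for that datatype."""
    if not datatype.startswith(RDFLANG):
        raise ValueError("not a language datatype: " + datatype)
    return (lexical, datatype[len(RDFLANG):])


# "foo"@fr and "foo"@FR end up with the same datatype IRI and the same value.
print(value_of("foo", lang_datatype("FR")) == value_of("foo", lang_datatype("fr")))

# The same text under two language datatypes yields two different values,
# so the value spaces are isomorphic to the set of strings but disjoint.
print(value_of("chat", lang_datatype("en")) == value_of("chat", lang_datatype("en-GB")))
```

Under this reading, "מחשב"^^rdflang:en is not a type error: it simply
denotes the pair <"מחשב", "en">.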

Received on Wednesday, 18 May 2011 09:07:35 UTC