Re: Proposal for ISSUE-12, string literals from Pierre-Antoine Champin on 2011-05-18 (public-rdf-wg@w3.org from May 2011)

From: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
Date: Wed, 18 May 2011 09:03:50 +0200
To: Pierre-Antoine Champin <pierre-antoine.champin@liris.cnrs.fr>
CC: Steve Harris <steve.harris@garlik.com>, RDF Working Group WG <public-rdf-wg@w3.org>
Message-ID: <4DD36F56.70103@liris.cnrs.fr>

sorry, some second thoughts

On 05/17/2011 09:03 PM, Pierre-Antoine Champin wrote:
> On 05/17/2011 11:06 AM, Steve Harris wrote:
<snip/>
>> So, I'm guessing as a formulation that rdflang:en would be a subtype
>> of xsd:string,

as far as I understand, currently "chat"^^xsd:string ≠ "chat"@en and
"chat" ≠ "chat"@en, and more generally no xsd:string or simple literal
is equal to a plain literal with language tag. So the respective
datatypes should have disjoint value spaces, hence no subtype relation.
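
To illustrate, here is a minimal Python sketch of literal equality as I
read it in [1] (same lexical form, same language tag if any, same
datatype IRI if any). The Literal class and its field names are made up
for this example and are not part of any spec or library.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

@dataclass(frozen=True)
class Literal:
    """Illustrative model of an RDF literal (hypothetical class)."""
    lexical: str
    language: Optional[str] = None   # language tag, if any
    datatype: Optional[str] = None   # datatype IRI, if any

# The dataclass-generated __eq__ compares all three fields.
simple = Literal("chat")
typed = Literal("chat", datatype=XSD_STRING)
tagged = Literal("chat", language="en")

print(simple == tagged)  # simple literal vs. language-tagged literal
print(typed == tagged)   # xsd:string literal vs. language-tagged literal
```

Both comparisons come out unequal, which is why a subtype relation
between a language datatype and xsd:string would not fit the current
behaviour.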

>> and rdflang:en-GB would be a subtype of rdflang:en, and
>> so on?

I'm not even sure "en-GB" is a valid language tag, reading [1]:

  Note: When using the language tag, care must be taken not to confuse
  language with locale. The language tag relates only to human language
  text. Presentational issues should be addressed in end-user
  applications.

[1]
http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/#section-Graph-Literal

but if it is, literals with @en-GB are disjoint from literals with @en
and so the respective datatypes should be disjoint as well.

>> A few practical considerations:
>>
>> 1) ISO language codes are not case sensitive, IRIs are. "foo"@fr =
>> "foo"@FR, "foo"^^rdflang:fr != "foo"^^rdflang:FR. We'd need to define a
>> canonical case for the datatype form.
> 
> I hadn't thought of that either, but yes, canonical case sounds like the
> right thing to do.

and according to [1] again, the language tag is normalized to lowercase
in the abstract syntax.
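
A canonical case for the datatype form could then be as simple as the
following sketch. The RDFLANG namespace IRI is a placeholder (no actual
IRI has been proposed in this thread) and the function name is only for
illustration.

```python
# Placeholder namespace; the real IRI for "rdflang:" is not defined here.
RDFLANG = "http://example.org/rdflang#"

def language_datatype_iri(tag: str) -> str:
    """Build the datatype IRI for a language tag, lowercasing the tag
    so that case variants of the same tag map to the same IRI."""
    return RDFLANG + tag.lower()

print(language_datatype_iri("fr") == language_datatype_iri("FR"))
```

With that, "foo"@fr and "foo"@FR end up with the same datatype IRI.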

<snip />
>> 4) Is the value space all UTF-8 strings? If not, is it a type error
>> to write "מחשב"^^rdflang:en?
> 
> well, currently I guess any UTF-8 string is valid. So yes, the value
> space of all those datatypes would be all UTF-8 strings, if only
> for the sake of BC (and because I sure don't want to walk down that path...)

sorry, I had read "value space" as "lexical space".

The value space would be isomorphic to the set of UTF-8 strings, but
different for each "language datatype". Defining it as the set of pairs
<text, language-tag> as in RDF Semantics seems like a good option.
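
As a rough sketch of that option (the function name is hypothetical, and
lowercasing the tag is my assumption, following the normalization
mentioned above), the lexical-to-value mapping of each language datatype
would just pair the string with the tag:

```python
from typing import Tuple

def language_value(lexical: str, tag: str) -> Tuple[str, str]:
    """Lexical-to-value mapping for the language datatype of `tag`:
    any string is accepted and mapped to the pair (text, tag)."""
    return (lexical, tag.lower())

v_en = language_value("chat", "en")
v_fr = language_value("chat", "fr")
v_he = language_value("מחשב", "en")  # no type error: any string is in the lexical space

print(v_en != v_fr)    # same text, different tags: different values
print(v_en != "chat")  # a pair is never equal to a plain string
```

Each datatype's value space is then in one-to-one correspondence with
the strings, yet the value spaces for different tags (and for
xsd:string) do not overlap.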

  pa

Received on Wednesday, 18 May 2011 07:04:16 UTC