Re: Proposal for ISSUE-12, string literals

On 2011-05-17, at 21:01, Pierre-Antoine Champin wrote:

> sorry, some second thoughts
> On 05/17/2011 09:03 PM, Pierre-Antoine Champin wrote:
>> On 05/17/2011 11:06 AM, Steve Harris wrote:
> <snip/>
>>> So, I'm guessing as a formulation that rdflang:en would be a subtype
>>> of xsd:string,
> as far as I understand, currently "chat"^^xsd:string ≠ "chat"@en and
> "chat" ≠ "chat"@en, and more generally no xsd:string or simple literal
> is equal to a plain literal with language tag. So the respective
> datatypes should have disjoint value spaces, hence no subtype relation.
>>> and rdflang:en-GB would be a subtype of rdflang:en, and
>>> so on?
> I'm not even sure "en-GB" is a valid language tag, reading [1]:

It's a region subtag, see
So, yes it is a valid language tag.

>  Note: When using the language tag, care must be taken not to confuse
>  language with locale. The language tag relates only to human language
>  text. Presentational issues should be addressed in end-user
>  applications.
> [1]
> but if it is, literals with @en-GB are disjoint from literals with @en
> and so the respective datatypes should be disjoint as well.

It's not quite that simple. @en matches @en-GB, but they're not equal c.f.
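
To make the matches-versus-equal distinction concrete, here is a minimal sketch, assuming the basic filtering scheme of RFC 4647 (the one SPARQL's langMatches is defined in terms of). The function name lang_matches is made up for illustration and is not part of any spec or library.

```python
def lang_matches(tag: str, lang_range: str) -> bool:
    """Sketch of RFC 4647 basic filtering: a range matches a tag if they are
    equal, or if the range is a prefix of the tag ending on a '-' boundary.
    The comparison is case-insensitive."""
    if lang_range == "*":
        # Wildcard range: matches any non-empty language tag.
        return tag != ""
    tag, lang_range = tag.lower(), lang_range.lower()
    return tag == lang_range or tag.startswith(lang_range + "-")


# The range "en" matches the tag "en-GB", but not the other way round,
# and the two tags themselves are still different strings.
print(lang_matches("en-GB", "en"))
print(lang_matches("en", "en-GB"))
print("en-GB" == "en")
```

So matching is a one-directional prefix test, while literal equality would still compare the tags exactly.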

>>> A few practical considerations:
>>> 1) ISO language codes are not case sensitive, IRIs are. "foo"@fr =
>>> "foo"@FR, "foo"^^rdflang:fr != "foo"^^rdflang:FR. We'd need to define a
>>> canonical case for the datatype form.
>> I hadn't thought of that either, but yes, canonical case sounds like the
>> right thing to do.
> and according to [1] again, the language tag is normalized to lowercase
> in the abstract syntax.

OK, that's easy.
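
A rough sketch of what the canonical case could look like, assuming we lowercase the tag before minting the datatype IRI. The namespace below is a placeholder, since nothing in this thread fixes what rdflang: expands to, and lang_datatype_iri is a hypothetical helper.

```python
# Placeholder namespace: the thread does not define what rdflang: expands to.
RDFLANG_NS = "http://example.org/rdflang#"


def lang_datatype_iri(tag: str) -> str:
    """Build the datatype IRI for a language tag, lowercasing the tag first
    so that case variants of one tag share a single (case-sensitive) IRI."""
    return RDFLANG_NS + tag.lower()


# "fr" and "FR" now map to the same datatype IRI.
print(lang_datatype_iri("fr") == lang_datatype_iri("FR"))
```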

> <snip />
>>> 4) Is the value space all UTF-8 strings? If not, is it a type error
>>> to write "מחשב"^^rdflang:en?
>> well, currently I guess any UTF-8 string is valid. So yes, the value
>> space of all those datatypes would be all UTF-8 strings, if only
>> for the sake of BC (and because I sure don't want to walk down that path...)
> sorry, I was reading "lexical space".
> The value space would be isomorphic to the set of UTF-8 strings, but
> different for each "language datatype". Defining it as the set of pairs
> <text, language-tag> as in RDF Semantics seems like a good option.

Sounds reasonable.
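
For what it's worth, a minimal sketch of that value space, assuming the pair is <text, lowercased tag> and that a simple literal denotes the bare string. LangString and value_of are illustrative names, not anything defined in RDF Semantics.

```python
from typing import NamedTuple


class LangString(NamedTuple):
    """A value in a language datatype's value space: a <text, language-tag> pair."""
    text: str
    lang: str


def value_of(lexical: str, tag: str) -> LangString:
    """Map a lexical form and a language tag to a value, lowercasing the tag."""
    return LangString(lexical, tag.lower())


# The same text under different tags gives different values, and none of
# them is equal to the bare string that a simple literal would denote.
print(value_of("chat", "en") == value_of("chat", "fr"))
print(value_of("chat", "en") == value_of("chat", "en-GB"))
print(value_of("chat", "en") == "chat")
```

Each tag then picks out its own copy of the strings, which keeps the per-language value spaces disjoint from each other and from xsd:string.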

- Steve

Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD

Received on Wednesday, 18 May 2011 08:38:16 UTC