Re: Proposal for ISSUE-12, string literals

On 2011-05-18, at 10:07, Pierre-Antoine Champin wrote:

> On 05/18/2011 10:37 AM, Steve Harris wrote:
>> On 2011-05-17, at 21:01, Pierre-Antoine Champin wrote:
>>> sorry, some second thoughts
>>> On 05/17/2011 09:03 PM, Pierre-Antoine Champin wrote:
>>>> On 05/17/2011 11:06 AM, Steve Harris wrote:
>>> <snip/>
>>>>> So, I'm guessing as a formulation that rdflang:en would be a subtype
>>>>> of xsd:string,
>>> as far as I understand, currently "chat"^^xsd:string ≠ "chat"@en and
>>> "chat" ≠ "chat"@en, and more generally no xsd:string or simple literal
>>> is equal to a plain literal with language tag. So the respective
>>> datatypes should have disjoint value spaces, hence no subtype relation.
>>>>> and rdflang:en-GB would be a subtype of rdflang:en, and
>>>>> so on?
>>> I'm not even sure "en-GB" is a valid language tag, reading [1]:
>> It's a region subtag, see
>> So, yes it is a valid language tag.
> I meant "valid language tag *in RDF*" of course. But I guess the URL you
> refer to can apply to RDF as well (as language tags in RDF are obviously
> inherited from xml:lang).

They are, yes.

>>> Note: When using the language tag, care must be taken not to confuse
>>> language with locale. The language tag relates only to human language
>>> text. Presentational issues should be addressed in end-user
>>> applications.
>>> [1]
>>> but if it is, literals with @en-GB are disjoint from literals with @en
>>> and so the respective datatypes should be disjoint as well.
>> It's not quite that simple. @en matches @en-GB, but they're not equal, cf.
>> and
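To make the difference between matching and equality concrete, here is a minimal Python sketch of RFC 4647 basic filtering (section 3.3.1), the scheme SPARQL's langMatches() is defined in terms of. The function names are illustrative only and do not come from any library. Under basic filtering, a range matches a tag if, compared case-insensitively, it equals the tag or is a prefix of it that ends exactly at a "-".

```python
def basic_filter_matches(language_range: str, tag: str) -> bool:
    """RFC 4647 basic filtering: does `language_range` match `tag`?"""
    r = language_range.lower()
    t = tag.lower()
    if r == "*":
        # The wildcard range matches every tag.
        return True
    # Exact (case-insensitive) match, or the range is a prefix of the
    # tag that ends at a subtag boundary ("-").
    return t == r or t.startswith(r + "-")


def tags_equal(a: str, b: str) -> bool:
    # Tags compare case-insensitively; matching is not the same as equality.
    return a.lower() == b.lower()


print(basic_filter_matches("en", "en-GB"))  # True: "en" matches "en-GB"
print(basic_filter_matches("en-GB", "en"))  # False: matching is one-way
print(tags_equal("en", "en-GB"))            # False: the tags are not equal
print(basic_filter_matches("en", "eng"))    # False: prefix must end at "-"
```

Matching is directional and is not equality, which is consistent with the point below that the two value spaces can stay disjoint even if "en" matches "en-GB".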
> Do they *match* in the sense of the model theory? In other words, does

I don't know what "match" means in a model theoretic sense, only in an RFC 4647 sense.

>  :a :b "chat"@en-GB .
> entail
>  :a :b "chat"@en .
> in any entailment regime defined by the RDF semantics ??

No idea.

> I don't think so, which does not mean that it is not an interesting
> thing to consider, although it looks like a tricky can of worms...
> In any case, I don't think that this entailment would mean that
> rdflang:en would be a supertype of rdflang:en-GB, as their value space
> would still be disjoint, in my view.

OK, seems reasonable.

- Steve

>>>>> A few practical considerations:
>>>>> 1) ISO language codes are not case sensitive, IRIs are. "foo"@fr =
>>>>> "foo"@FR, "foo"^^rdflang:fr != "foo"^^rdflang:FR. We'd need to define a
>>>>> canonical case for the datatype form.
>>>> I hadn't thought of that either, but yes, canonical case sounds like the
>>>> right thing to do.
>>> and according to [1] again, the language tag is normalized to lowercase
>>> in the abstract syntax.
>> OK, that's easy.
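A minimal sketch of the canonical-case rule, assuming the hypothetical rdflang: prefix discussed in this thread. No namespace IRI has been defined for it, so the function below just builds a prefixed name. Lowercasing the tag first means "foo"@fr and "foo"@FR end up with the same datatype name.

```python
def language_datatype(tag: str) -> str:
    """Return the hypothetical rdflang: datatype name for a language tag."""
    # Language tags are case-insensitive but IRIs are not, so fix one
    # canonical (lowercase) spelling before building the name.
    return "rdflang:" + tag.lower()


print(language_datatype("fr"))     # rdflang:fr
print(language_datatype("FR"))     # rdflang:fr
print(language_datatype("en-GB"))  # rdflang:en-gb
```

Lowercase is chosen here only because it matches the normalization in the abstract syntax mentioned above. Any fixed case would work, as long as it is applied everywhere.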
>>> <snip />
>>>>> 4) Is the value space all UTF-8 strings? If not, is it a type error
>>>>> to write "מחשב"^^rdflang:en?
>>>> well, currently I guess any UTF-8 string is valid. So yes, the value
>>>> space of all those datatypes would be all UTF-8 strings, if only
>>>> for the sake of BC (and because I sure don't want to walk down that path...)
>>> sorry, I was reading "lexical space".
>>> The value space would be isomorphic to the set of UTF-8 strings, but
>>> different for each "language datatype". Defining it as the set of pairs
>>> <text, language-tag>, as in RDF Semantics, seems like a good option.
>> Sounds reasonable.
>> - Steve
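The value-space proposal can be sketched in Python under the assumptions made in this thread. Each hypothetical rdflang:<tag> datatype accepts any Unicode string as a lexical form, and its values are <text, language-tag> pairs with the tag lowercased. The names LangString, lang_literal_value and xsd_string_value are illustrative and do not come from any RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LangString:
    """One value of a hypothetical rdflang:<tag> datatype: the pair
    <text, language-tag>, as in RDF Semantics."""
    text: str
    tag: str

    def __post_init__(self) -> None:
        # Tags are case-insensitive: store them lowercased so that
        # "foo"@fr and "foo"@FR denote the same value.
        object.__setattr__(self, "tag", self.tag.lower())


def lang_literal_value(lexical_form: str, tag: str) -> LangString:
    # The lexical space is every Unicode string, so there is no
    # type-error case, whatever script the text is written in.
    return LangString(lexical_form, tag)


def xsd_string_value(lexical_form: str) -> str:
    # Simple literals and xsd:string literals denote plain strings.
    return lexical_form


print(lang_literal_value("foo", "fr") == lang_literal_value("foo", "FR"))      # True
print(lang_literal_value("chat", "en") == lang_literal_value("chat", "en-GB"))  # False
print(xsd_string_value("chat") == lang_literal_value("chat", "en"))            # False
print(lang_literal_value("מחשב", "en").tag)                                    # en
```

Because xsd:string values are plain strings and language-tagged values are pairs, the two value spaces never overlap in this model, so no subtype relation between xsd:string and the language datatypes falls out of it.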

Steve Harris, CTO, Garlik Limited

Received on Wednesday, 18 May 2011 09:13:03 UTC