Re: ISSUE-12 On languages and datatypes

On 09/06/2011 11:00, Jan Wielemaker wrote:
> On 06/09/2011 10:37 AM, Antoine Zimmermann wrote:
>> This has already been discussed and it's very problematic. The
>> conclusion was clearly "we won't do this". From the RDF spec
>> perspective, the lang tag is an opaque string which syntactically
>> follows RFC 5646. So no relation can be inferred (in RDF) between "en"
>> and "en-GB" (that's probably the best we can do). Notice though that a
>> multilingual system may apply further processing on language tags but it
>> is not RDF business.
>> In more detail:
>> On 09/06/2011 09:06, Jan Wielemaker wrote:
>>> On 06/09/2011 12:25 AM, William Waites wrote:
>>>> rdflang:en rdfs:subClassOf xsd:string;
>>>> rdfs:label "en".
>> this leads to the conclusion that strings with "en" tags are not
>> distinguishable from plain sequences of characters;
> In itself, that is not new. The same applies to rdfs:subClassOf
> :dog, :mammal.
> There is a problem if the RFC5646 rules cannot be expressed in simple
> subClassOf relations. I don't know the details here. I think it should
> work 99%, but it is likely that there are corner cases.
> Note that, even without adding the rdfs class hierarchy between the
> various language tags, you can perfectly well distinguish @en from a
> plain xsd:string. I certainly would not advocate for the RDF working
> group to define the relations between the various language tags. I
> sympathise with William's proposal to replace the two-dimensional
> literal space with a single, always present, classifier.
> Mapping @<lang> to rdflang:<lang> does (IMHO) make handling (un)typed
> and language-classified literals a lot more straightforward.

I'm not against mapping @<lang> to rdflang:<lang>, I'm just against
translating the relationship between lang tags into subClassOf relations.

If rdflang:en-GB is a subclass of rdflang:en, and rdflang:<lang> a 
subclass of xsd:string, then "chat"@en-GB is just a syntactic variant of 
"chat"@en and "chat"@fr. If this is true, I'm afraid it won't be helping 

Just to go on a little bit with the consequences: if you query a dataset
under a datatype entailment regime, it is required (per SPARQL 1.1
Entailment Regimes, Section 4) that literals are given in canonical form.
In the case of strings and all the datatypes derived from xsd:string, 
that would mean the canonical form of "xxx"@lang is "xxx". So you end up 
forgetting all lang tags.
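
To make the consequence concrete, here is a minimal Python sketch. It is only a toy model, not an RDF library API: the SUBCLASS_OF table, derives_from() and canonical() are hypothetical names introduced for illustration, and the canonicalization rule is the assumption discussed above (every datatype derived from xsd:string canonicalizes to the plain string).

```python
# Toy model of the proposal (hypothetical names, not a real RDF API):
# each language tag is a datatype, linked upward by rdfs:subClassOf.
SUBCLASS_OF = {
    "rdflang:en-GB": "rdflang:en",
    "rdflang:en": "xsd:string",
    "rdflang:fr": "xsd:string",
}


def derives_from(datatype, ancestor):
    """Follow rdfs:subClassOf links upward, starting at `datatype`."""
    while datatype is not None:
        if datatype == ancestor:
            return True
        datatype = SUBCLASS_OF.get(datatype)
    return False


def canonical(lexical, datatype):
    """Canonical form under the assumption stated above: a literal whose
    datatype derives from xsd:string is rewritten as a plain xsd:string."""
    if derives_from(datatype, "xsd:string"):
        return (lexical, "xsd:string")
    return (lexical, datatype)


literals = [
    ("chat", "rdflang:en-GB"),
    ("chat", "rdflang:en"),
    ("chat", "rdflang:fr"),
]

# All three literals collapse to a single canonical form.
canonical_forms = {canonical(lex, dt) for lex, dt in literals}
print(canonical_forms)
```

Under these assumptions the English and the French "chat" end up as the same plain string, which is exactly the loss of lang tags described above.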

(*) See for a 
proposal that maps lang tags to datatypes and preserves the distinction 
between languages.

Antoine Zimmermann
Researcher at:
Laboratoire d'InfoRmatique en Image et Systèmes d'information
Database Group
7 Avenue Jean Capelle
69621 Villeurbanne Cedex
Tel: +33(0)4 72 43 61 74 - Fax: +33(0)4 72 43 87 13
Lecturer at:
Institut National des Sciences Appliquées de Lyon
20 Avenue Albert Einstein
69621 Villeurbanne Cedex

Received on Thursday, 9 June 2011 09:38:04 UTC