Re: I18N issues an OWL2

Axel Polleres wrote:
> Felix Sasaki wrote:
>> Ivan Herman wrote:
>>> Axel Polleres wrote:
>>> [snip]
>>>> Sure!
>>>> As for the namespace, I personally prefer rdf: sharing jos' 
>>>> arguments here that it is in my opinion NOT problematic to do so. 
>>>> Several rdf: namespaced properties already do not have a specified 
>>>> formal semantics (the reification having been mentioned already, so 
>>>> what).
>>> Yes, that is indeed a good point.
>>> [snip]
>>>> A probably more feasible solution would be to do a real type 
>>>> hierarchy,
>>>> for language tags and - instead of a datatype 
>>>> owl:internationalizedString or rif:text which has pairs of strings 
>>>> and language tags as lexical space - define separate datatypes and 
>>>> (subtypes) for each lang-tag, ie.
>>>> use:
>>>> message("Hello"^^lang:en-US)
>>>> where e.g. lang:en-US is a subtype of lang:en, i.e.
>>>> that would also imply
>>>> message("Hello"^^lang:en)
>>>> (just as xsd:integer is a subtype of xsd:decimal in 
>>>> the XML Schema type hierarchy, see 
>>>> Anything wrong with that? To me this seems much cleaner than this 
>>>> fiddling around with pairs of strings and lang-tags.
>>> [snip]
>>> This is indeed quite nice, I must say. Addison already referred to 
>>> one caveat that I intended to raise, namely the possibly high number 
>>> of language tags (by the way, [1] gives a fairly readable overview 
>>> of those). Let us see where that discussion goes...
>> This caveat might be a severe problem for this approach. The BCP 47 
>> language tags rely on a generative approach using the ABNF in 
>> BCP 47 (so-called "well-formed" language tags) and, in addition, on 
>> the registry of subtags. I'm not sure if it will be feasible to 
>> relate these two types of conformance to the planned OWL2 datatype 
>> hierarchy, though I think it would be highly desirable ...
> The question then is rather whether we still want to go for the 
> somewhat crooked detour of having language tags outside the datatypes? 
> I mean, in what sense does the generic datatype rif:text or 
> owl:internationalizedText*) solve the problem instead of just hiding it?
> BCP 47 says: "Subtags are distinguished and separated from one another 
> by a hyphen ("-", ABNF [RFC4234] %x2D)."
> So, why could a lang: datatype hierarchy not simply state that the 
> hierarchy is defined *implicitly*. We don't need to list this 
> hierarchy explicitly, but could just define:
> <i>lang:tag1</i> is a supertype of <i>lang:tag2</i> if and only if
> <i>tag1</i> is a prefix of <i>tag2</i>, where <i>tag1</i> and
> <i>tag2</i> are both valid language tags, following [BCP 47].
> Maybe I am oversimplifying things here, but I really don't understand 
> the deep problem with this approach - which there probably is, but I'd 
> appreciate it if someone could point it out to me explicitly. 

I'm looking at , 
the currently planned revision of BCP 47. See esp.

 Many of the grandfathered tags have been superseded by the subsequent
   addition of new subtags: each superseded record contains a Preferred-
   Value field that ought to be used to form language tags representing
   that value.  For example, the tag "art-lojban" is superseded by the
   primary language subtag 'jbo'.

That is, for the language tags "art-lojban" and "jbo" there is no 
hierarchy. The language tags express the same language.

Another issue is with so-called macrolanguages and extended language 
subtags, see
I can't explain these concepts in detail here, but the problem with the 
notion of "a longer tag = deeper hierarchy" arises here:

o Each encompassed language's subtag SHOULD be used as the primary
language subtag. For example, a document in Mandarin Chinese
would be tagged "cmn" (the subtag for Mandarin Chinese) in
preference to "zh" (Chinese).
o If compatibility is desired or needed, the encompassed subtag MAY
be used as an extended language subtag. For example, a document
in Mandarin Chinese could be tagged "zh-cmn" instead of either
"cmn" or "zh".

That is, Mandarin Chinese could be tagged as "zh-cmn" or "cmn" or "zh". 
Again you have no clear "length to hierarchy" relation.
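
To make the concern concrete, here is a minimal sketch in Python of the 
prefix rule as I read it, applied to the tags quoted above. The function 
names are my own invention for illustration, and I am assuming that 
"prefix" means a prefix on whole subtags (split at hyphens) rather than 
on characters, and that tags compare case-insensitively. It does not 
check well-formedness or the subtag registry.

```python
def subtags(tag):
    """Split a language tag into lowercased subtags (hypothetical helper)."""
    return tag.lower().split("-")


def is_supertype(tag1, tag2):
    """Assumed prefix rule: lang:tag1 is a supertype of lang:tag2
    iff the subtags of tag1 are a leading run of the subtags of tag2."""
    a, b = subtags(tag1), subtags(tag2)
    return len(a) <= len(b) and b[:len(a)] == a


# The intended case: en is found to be a supertype of en-US.
print(is_supertype("en", "en-US"))

# Grandfathered tag vs. its preferred value: both denote the same
# language, yet the rule relates them in neither direction.
print(is_supertype("jbo", "art-lojban"))
print(is_supertype("art-lojban", "jbo"))

# Three possible tags for Mandarin Chinese: only zh / zh-cmn are related.
print(is_supertype("zh", "zh-cmn"))
print(is_supertype("zh", "cmn"))
print(is_supertype("cmn", "zh-cmn"))
```

Under these assumptions only "en"/"en-US" and "zh"/"zh-cmn" come out as 
related. "jbo" and "art-lojban" are unrelated although they express the 
same language, and "cmn" is unrelated to both "zh" and "zh-cmn".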

Addison can provide more examples and can judge if my concerns here are 


> Would it be a problem if all these datatypes would have the same 
> lexical space?
> Thanks for clarification,
> Axel
> *) no objection about coinflipping as suggested by Ian here, btw, if 
> we want to stick with it
>>> Another issue we have to look at is how well this works with the 
>>> OWL design (I have explicitly added Boris on the cc list to draw his 
>>> attention:-). My understanding of the current datatype restriction 
>>> design[2] is that one can define facets for a specific datatype, but 
>>> not across several datatypes; on the other hand in this proposal the 
>>> datatype for 'en-us' and 'en-gb' would be different and both would 
>>> be different from 'en' (although 'en-us' and 'en-gb' would both be 
>>> subtypes of 'en'). How could I define facets that involve all 
>>> these? Would that work well with the OWL design? I actually hope we 
>>> can find a way, because the usage of these URIs looks quite elegant...
>>> Cheers
>>> Ivan
>>> [1]
>>> [2]

Received on Monday, 14 July 2008 10:12:38 UTC