Re: I18N issues an OWL2

Axel Polleres wrote:
>
> Felix Sasaki wrote:
>> Ivan Herman wrote:
>>>
>>>
>>> Axel Polleres wrote:
>>> [snip]
>>>
>>>> Sure!
>>>>
>>>> As for the namespace, I personally prefer rdf:, sharing Jos' 
>>>> arguments here; in my opinion it is NOT problematic to do so. 
>>>> Several rdf: namespaced properties already lack a specified 
>>>> formal semantics (reification has been mentioned already), so 
>>>> what?
>>>>
>>>
>>> Yes, that is indeed a good point.
>>>
>>> [snip]
>>>
>>>>
>>>> A probably more feasible solution would be to do a real type 
>>>> hierarchy
>>>> for language tags and - instead of a datatype 
>>>> owl:internationalizedString or rif:text which has pairs of strings 
>>>> and language tags as lexical space - define separate datatypes 
>>>> (and subtypes) for each lang-tag, i.e.
>>>>
>>>> use:
>>>>
>>>> message("Hello"^^lang:en-US)
>>>>
>>>> where e.g. lang:en-US is a subtype of lang:en, i.e.
>>>> that would also imply
>>>>
>>>> message("Hello"^^lang:en)
>>>>
>>>> (just as xsd:integer is a subtype of xsd:decimal in 
>>>> the XML Schema type hierarchy, see 
>>>> http://www.w3.org/TR/xmlschema-2/#built-in-datatypes)
>>>>
>>>> Anything wrong with that? To me this seems much cleaner than this 
>>>> fiddling around with pairs of strings and lang-tags.
>>>>
>>> [snip]
>>>
>>> This is indeed quite nice, I must say. Addison already referred to 
>>> one caveat that I intended to raise, namely the possibly high number 
>>> of language tags (by the way, [1] gives a fairly readable overview 
>>> of those). Let us see where that discussion goes...
>>
>>
>> This caveat might be a severe problem for this approach. The BCP 47 
>> language tags rely on a generative approach using the ABNF in 
>> BCP 47 (so-called "well-formed" language tags), and in addition on the 
>> registry of subtags. I'm not sure if it will be feasible to relate 
>> these two types of conformance to the planned OWL2 datatype 
>> hierarchy, though I think it would be highly desirable ...
>
> The question is more then, whether we still want to go for the 
> somewhat crooked detour of having language tags outside the datatypes? 
> I mean, in what sense does the generic datatype rif:text or 
> owl:internationalizedText*) solve the problem instead of just hiding it?
>
> BCP 47 says: "Subtags are distinguished and separated from one another 
> by a hyphen ("-", ABNF [RFC4234] %x2D)."
>
> So, why could a lang: datatype hierarchy not simply state that the 
> hierarchy is defined *implicitly*? We don't need to list this 
> hierarchy explicitly, but could just define:
>
> <i>lang:tag1</i> is a supertype of <i>lang:tag2</i> if and only if
> <i>tag1</i> is a prefix of <i>tag2</i>, where <i>tag1</i> and
> <i>tag2</i> are both valid language tags, following [BCP 47].
>
> Maybe I am oversimplifying things here, but I really don't understand 
> the deep problem with this approach - which there probably is, but I'd 
> appreciate it if someone could point it out to me explicitly.

I'm looking at http://tools.ietf.org/html/draft-ietf-ltru-4646bis-16 , 
the currently planned revision of BCP 47. See esp.
http://tools.ietf.org/html/draft-ietf-ltru-4646bis-16#section-2.2.8

   Many of the grandfathered tags have been superseded by the subsequent
   addition of new subtags: each superseded record contains a Preferred-
   Value field that ought to be used to form language tags representing
   that value.  For example, the tag "art-lojban" is superseded by the
   primary language subtag 'jbo'.

That is, for the language tags "art-lojban" and "jbo" there is no 
hierarchy. The language tags express the same language.

Another issue is with so-called macrolanguages and extended language 
subtags, see
http://tools.ietf.org/html/draft-ietf-ltru-4646bis-16#section-4.1.2
I can't explain these concepts in detail here, but this is where the 
notion of "a longer language tag = deeper hierarchy" runs into problems:

[
o Each encompassed language's subtag SHOULD be used as the primary
language subtag. For example, a document in Mandarin Chinese
would be tagged "cmn" (the subtag for Mandarin Chinese) in
preference to "zh" (Chinese).
o If compatibility is desired or needed, the encompassed subtag MAY
be used as an extended language subtag. For example, a document
in Mandarin Chinese could be tagged "zh-cmn" instead of either
"cmn" or "zh".
]

That is, Mandarin Chinese could be tagged as "zh-cmn" or "cmn" or "zh". 
Again you have no clear "length to hierarchy" relation.
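
To make the mismatch concrete, here is a minimal Python sketch of the 
prefix rule quoted above, applied to the tags mentioned in this mail. 
The function names are mine and purely illustrative. I assume that 
"prefix" means a prefix on whole subtags (split at "-"), that the 
comparison is case-insensitive, and that a tag counts as a supertype of 
itself. The sketch does not check well-formedness or the subtag registry.

```python
def subtags(tag):
    """Split a language tag into lower-cased subtags (assumption: no validation)."""
    return tag.lower().split("-")


def is_supertype(tag1, tag2):
    """Prefix rule: lang:tag1 is a supertype of lang:tag2 iff the subtags
    of tag1 are a leading run of the subtags of tag2."""
    a, b = subtags(tag1), subtags(tag2)
    return len(a) <= len(b) and b[:len(a)] == a


if __name__ == "__main__":
    pairs = [
        ("en", "en-US"),        # the case the proposal has in mind
        ("jbo", "art-lojban"),  # same language, but no prefix relation
        ("art-lojban", "jbo"),
        ("zh", "zh-cmn"),       # related by prefix
        ("zh", "cmn"),          # Mandarin again, yet unrelated by prefix
        ("cmn", "zh-cmn"),
    ]
    for sup, sub in pairs:
        print(sup, "is a supertype of", sub, ":", is_supertype(sup, sub))
```

Under these assumptions only ("en", "en-US") and ("zh", "zh-cmn") are 
related. "jbo" and "art-lojban", and "cmn" and "zh-cmn", stay unrelated 
although they name the same language.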

Addison can provide more examples and can judge if my concerns here are 
valid.

Felix

> Would it be a problem if all these datatypes had the same 
> lexical space?
>
> Thanks for clarification,
>
> Axel
>
> *) no objection about coinflipping as suggested by Ian here, btw, if 
> we want to stick with it
>
>>>
>>> Another issue we have to look at is how well this works with the 
>>> OWL design (I have explicitly added Boris on the cc list to draw his 
>>> attention:-). My understanding of the current datatype restriction 
>>> design[2] is that one can define facets for a specific datatype, but 
>>> not across several datatypes; on the other hand in this proposal the 
>>> datatype for 'en-us' and 'en-gb' would be different and both would 
>>> be different from 'en' (although 'en-us' and 'en-gb' would both be 
>>> subtypes of 'en'). How could I define facets that involve all 
>>> these? Would that work well with the OWL design? I actually hope we 
>>> can find a way, because the usage of these URI-s looks quite elegant...
>>>
>>> Cheers
>>>
>>> Ivan
>>>
>>>
>>> [1] http://www.w3.org/International/articles/language-tags/
>>> [2] http://www.w3.org/2007/OWL/wiki/Syntax#Datatype_Restrictions
>

Received on Monday, 14 July 2008 10:12:38 UTC