Re: I18N issues an OWL2

Many thanks for the clarifications, Addison. I was not sure whether 
statements like "en-US is a subtype of en" are about well-formed or 
valid language tags, about tagging guidelines (see "zh" vs. "zh-cmn") or 
about matching. From your comments I see that the latter is the case, and 
for matching your proposal looks good to me. Just some questions below.

Phillips, Addison wrote:
>>> Felix Sasaki wrote:
>> That is, for the language tags "art-lojban" and "jbo" there is no
>> hierarchy. The language tags express the same language.
> However: it is always permitted to treat these two tags as being different. BCP 47 defines two classes of conformance (well-formed and validating). Validating language tag processors may canonicalize deprecated tags into their preferred form before matching, but this isn't required. Most specifications are "well-formed" and not "validating" anyway: none of the matching schemes in BCP 47 are validating (some aren't even "well-formed").
>> That is, Mandarin Chinese could be tagged as "zh-cmn" or "cmn" or
>> "zh".
>> Again you have no clear "length to hierarchy" relation.
> This isn't quite accurate. "zh" doesn't mean "Mandarin Chinese". It only means Chinese. This doesn't mean that some content in Mandarin can't be tagged with it--quite the contrary--but that this doesn't necessarily enter into tag processing.
> The problem here is that language tag matching/selection/etc. is really quite simplistic. It is based solely on (case-insensitive) string matching, with no recourse to the registry, the ABNF, or anything else. If you know the tag's structure, you can use it to your advantage, since the tag's structure was designed to help you do various operations in a useful manner (such as selecting all tags with a given script or a given region).
> However, we often struggle with it in XML contexts because attribute matching is usually Boolean all-or-nothing matching rather than looking for the trailing hyphen. OWL has a predilection for this kind of matching, since, by design, it is about "is-a" relationships that exactly match strings.

Could you give an example of the all-or-nothing matching?

> This suggests to me that language tags would most easily be implemented by using Basic Filtering from RFC 4647. That is, the language tag in an "internationalizedString" matches a given request iff the request is an exact prefix (where the next character is a hyphen or end of string) of the value.
> That is:
>   LanguageAssertion { hasLanguage "de" }
> Would match:
>   "gruß dich"^^lang:de
>   "gruß dich"^^lang:de-AT
>   "gruß dich"^^lang:de-1901
>   "gruß dich"^^lang:de-AT-1901

I guess that in your framework these would constitute a subtype relation, 
running from the beginning to the end of the list?

>   ... usw...
> But not:
>   "whoa"^^lang:del  # a string in the Delaware language; whoa isn't actually probably a word
>   "bonjour"^^lang:fr
>   ... etc...
> This addresses the "macrolanguage problem".

Could you give an example of how it does so, just so that I can be sure I 
understand?
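
For concreteness, here is how I read the Basic Filtering rule quoted above, as a minimal Python sketch. The function name basic_filter and the sample list are my own, hypothetical choices and not taken from RFC 4647 or from any OWL 2 draft. It assumes plain case-insensitive string comparison with no lookup in the registry, so "art-lojban" and "jbo" would still be treated as different tags.

```python
def basic_filter(language_range: str, tag: str) -> bool:
    """Sketch of RFC 4647 Basic Filtering (hypothetical helper name).

    The range matches when it equals the tag, or when it is a prefix
    of the tag and the next character of the tag is a hyphen.
    Comparison is case-insensitive; the wildcard "*" matches any tag.
    """
    rng = language_range.lower()
    candidate = tag.lower()
    if rng == "*":
        return True
    return candidate == rng or candidate.startswith(rng + "-")


# Sample tags taken from the examples quoted above.
tags = ["de", "de-AT", "de-1901", "de-AT-1901", "del", "fr"]
matches = [t for t in tags if basic_filter("de", t)]
```

With the range "de", the four German tags are selected, while "del" is rejected because the character following the prefix is not a hyphen. An exact string comparison would have accepted only "de" itself.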

Many thanks,


Received on Monday, 14 July 2008 16:14:41 UTC