Re: I18N issues an OWL2

Sandro Hawke wrote:
> Axel:
>>> So, why could a lang: datatype hierarchy not simply state that the
>>> hierarchy is defined *implicitly*. We don't need to list this
>>> hierarchy explicitly, but could just define:
>>> <i>lang:tag1</i> is a supertype of <i>lang:tag2</i> if and only if
>>> <i>tag1</i> is a prefix of <i>tag2</i>, where both <i>tag1</i> and
>>> <i>tag2</i> are both valid language tags, following [BCP 47].
>>> Maybe, I am oversimplifying things here, but I really don't understand
>>> the deep problem with this approach - which probably there is, but I'd
>>> appreciate if someone could point me explicitly.
> Felix:
>> I'm looking at ,
>> the currently planned revision of BCP 47. See esp.
>>  Many of the grandfathered tags have been superseded by the subsequent
>>    addition of new subtags: each superseded record contains a Preferred-
>>    Value field that ought to be used to form language tags representing
>>    that value.  For example, the tag "art-lojban" is superseded by the
>>    primary language subtag 'jbo'.
>> That is, for the language tags "art-lojban" and "jbo" there is no
>> hierarchy. The language tags express the same language.
>> Another issue is with so-called Macro languages and extended language
>> subtags, see
>> I can't explain these concepts in detail here, but the problem with the
>> notion of "a longer sub tag = deeper hierarchy" arises here:
>> [
>> Each encompassed language's subtag SHOULD be used as the primary
>> language subtag. For example, a document in Mandarin Chinese
>> would be tagged "cmn" (the subtag for Mandarin Chinese) in
>> preference to "zh" (Chinese).
>> o If compatibility is desired or needed, the encompassed subtag MAY
>> be used as an extended language subtag. For example, a document
>> in Mandarin Chinese could be tagged "zh-cmn" instead of either
>> "cmn" or "zh".
>> ]
>> That is, Mandarin Chinese could be tagged as "zh-cmn" or "cmn" or "zh".
>> Again you have no clear "length to hierarchy" relation.
>> Addison can provide more examples and can judge if my concerns here are
>> valid.
> It seems to me that we can use datatypes like this and simply refer to
> other specs for what the sub-type and equivalent-type relations are.
> But, imagining a better future, ...
> It would be nice (but doesn't seem necessary) for W3C to publish these
> relations in machine-usable form.  

The subtag registry has a lot of relations available. E.g. the relation
between "art-lojban" and "jbo" is described in the entry for "art-lojban":

Type: grandfathered
Tag: art-lojban
Description: Lojban
Added: 2001-11-11
*Preferred-Value: jbo
Deprecated: 2003-09-02
Comments: replaced by ISO code jbo*

To provide access for XML processing, various formats of the registry
have been created in XML, see
I think it would be useful to have it also available in RDF, for
Semantic Web processing.
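
As a rough illustration of how a record like the one above could be used
by a program, here is a minimal sketch in Python. It assumes the simple
"Field: value" layout shown in the "art-lojban" entry (no folded lines, no
repeated fields); the function names parse_record and preferred are made
up for this example and are not part of any registry tooling.

```python
# Sketch only: the record text is copied from the "art-lojban" entry above.
RECORD = """\
Type: grandfathered
Tag: art-lojban
Description: Lojban
Added: 2001-11-11
Preferred-Value: jbo
Deprecated: 2003-09-02
Comments: replaced by ISO code jbo
"""


def parse_record(text):
    """Parse one 'Field: value' record into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first colon only, so values may contain colons.
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return fields


def preferred(tag, records):
    """Return the recorded Preferred-Value for a tag, else the tag itself."""
    for rec in records:
        same_tag = rec.get("Tag", "").lower() == tag.lower()
        if same_tag and "Preferred-Value" in rec:
            return rec["Preferred-Value"]
    return tag


records = [parse_record(RECORD)]
print(preferred("art-lojban", records))
print(preferred("jbo", records))
```

With only this one record loaded, both calls yield "jbo", i.e. the two
tags are treated as naming the same language rather than as parent and
child.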

>  Since I'm on vacation, I'm just
> going to wonder about two things rather than look them up like I
> should.  :-)   
>     -  Does XSD give us a way to do that for data types? 

You mean "publishing the relations"? You could define a hierarchy of
simple types, e.g. with "en-US" being subordinate to "en". Though you
would again run into the "language tags are generative and hard to
enumerate as types" issue.
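
Since the tags cannot be enumerated, the "subordinate" relation would have
to be computed rather than listed. A minimal sketch in Python, assuming a
naive rule that compares whole subtags from the left; is_subordinate is a
hypothetical name, and the function does not check that its arguments are
valid BCP 47 tags:

```python
def is_subordinate(tag, ancestor):
    """True if `ancestor` is a proper, subtag-wise prefix of `tag`.

    Naive assumption: hierarchy follows the hyphen-separated subtags,
    compared case-insensitively. No BCP 47 validation is done.
    """
    t = tag.lower().split("-")
    a = ancestor.lower().split("-")
    return len(a) < len(t) and t[:len(a)] == a


print(is_subordinate("en-US", "en"))   # prefix on a subtag boundary
print(is_subordinate("zh-cmn", "zh"))  # extended language subtag form
print(is_subordinate("cmn", "zh"))     # same language, no shared prefix
```

The first two calls return True and the third returns False, which is
exactly the problem with "cmn" versus "zh-cmn" quoted above: a purely
prefix-based rule needs the registry relations as well.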

>     -  Can we do it with OWL by treating datatypes as
>        properties? 

I think you can, though the enumeration issue would probably be the same.

>   It seems clear to me that 
>             if bar is a datatype:
>                "foo"^^bar == [ bar foo ] 
>        ie bar is a property where the domain is the lexical space and
>        the range is the value space.  Read "xs:int" as "the integer
>        value serialized in this string".  If the RDF or OWL semantics
>        don't allow it, then we'd have to back off to
>                "foo"^^bar == [ bar2 foo ]
>        where there's a one-to-one correspondence between bar and bar2.
> That would allow people with a decent semantic web engine (which doesn't
> know anything about BCP 47)
> to query for lang=en and get results which
> were lang=en-US.

Sounds very reasonable to me.


Received on Monday, 14 July 2008 13:08:33 UTC