Re: I18N issues an OWL2 from Felix Sasaki on 2008-07-14 (public-owl-wg@w3.org from July 2008)

From: Felix Sasaki <fsasaki@w3.org>
Date: Mon, 14 Jul 2008 22:07:26 +0900
To: Sandro Hawke <sandro@w3.org>
CC: Axel Polleres <axel.polleres@deri.org>, Ivan Herman <ivan@w3.org>, "Phillips, Addison" <addison@amazon.com>, Jie Bao <baojie@cs.rpi.edu>, "public-owl-wg@w3.org" <public-owl-wg@w3.org>, "public-i18n-core-comments@w3.org" <public-i18n-core@w3.org>, "public-rif-comments@w3.org" <public-rif-comments@w3.org>, Boris Motik <boris.motik@comlab.ox.ac.uk>
Message-ID: <487B4F8E.3070804@w3.org>

Sandro Hawke wrote:
> Axel:
>   
>>> So why couldn't a lang: datatype hierarchy simply state that the
>>> hierarchy is defined *implicitly*? We don't need to list this
>>> hierarchy explicitly, but could just define:
>>>
>>> <i>lang:tag1</i> is a supertype of <i>lang:tag2</i> if and only if
>>> <i>tag1</i> is a prefix of <i>tag2</i>, where <i>tag1</i> and
>>> <i>tag2</i> are both valid language tags, following [BCP 47].
>>>
>>> Maybe I am oversimplifying things here, but I really don't understand
>>> the deep problem with this approach. There probably is one, and I'd
>>> appreciate it if someone could point it out to me explicitly.
>>>       
>
> Felix:
>   
>> I'm looking at http://tools.ietf.org/html/draft-ietf-ltru-4646bis-16,
>> the currently planned revision of BCP 47. See especially
>> http://tools.ietf.org/html/draft-ietf-ltru-4646bis-16#section-2.2.8
>>
>>  Many of the grandfathered tags have been superseded by the subsequent
>>    addition of new subtags: each superseded record contains a Preferred-
>>    Value field that ought to be used to form language tags representing
>>    that value.  For example, the tag "art-lojban" is superseded by the
>>    primary language subtag 'jbo'.
>>
>> That is, for the language tags "art-lojban" and "jbo" there is no
>> hierarchy. The language tags express the same language.
>>
>> Another issue is with so-called macrolanguages and extended language
>> subtags, see
>> http://tools.ietf.org/html/draft-ietf-ltru-4646bis-16#section-4.1.2
>> I can't explain these concepts in detail here, but this is where the
>> problem with the notion of "a longer subtag = a deeper hierarchy" arises:
>>
>> [
>> Each encompassed language's subtag SHOULD be used as the primary
>> language subtag. For example, a document in Mandarin Chinese
>> would be tagged "cmn" (the subtag for Mandarin Chinese) in
>> preference to "zh" (Chinese).
>> o If compatibility is desired or needed, the encompassed subtag MAY
>> be used as an extended language subtag. For example, a document
>> in Mandarin Chinese could be tagged "zh-cmn" instead of either
>> "cmn" or "zh".
>> ]
>>
>> That is, Mandarin Chinese could be tagged as "zh-cmn", "cmn", or "zh".
>> Again, there is no clear "length to hierarchy" relation.
>>
>> Addison can provide more examples and can judge whether my concerns here
>> are valid.
>>     
>
> It seems to me that we can use datatypes like this and simply refer to
> other specs for what the sub-type and equivalent-type relations are.
>
> But, imagining a better future, ...
>
> It would be nice (but doesn't seem necessary) for W3C to publish these
> relations in machine-usable form.  

The subtag registry already records many such relations. For example, the
relation between "art-lojban" and "jbo" is described in the entry for "art-lojban":

%%
Type: grandfathered
Tag: art-lojban
Description: Lojban
Added: 2001-11-11
Preferred-Value: jbo
Deprecated: 2003-09-02
Comments: replaced by ISO code jbo
%%
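For illustration, the Preferred-Value field in records like the one above could be used to map a superseded tag to its replacement, so that "art-lojban" and "jbo" compare equal. The following is a minimal Python sketch, assuming the registry's plain-text record layout shown above (records separated by "%%", one "Field: value" per line, continuation lines indented). The helper names are hypothetical. The sketch only handles whole-tag records (those with a "Tag" field), and repeated fields such as multiple Description lines simply overwrite each other.

```python
# Minimal sketch (assumed record layout, hypothetical helper names):
# parse registry records and use Preferred-Value to replace whole
# superseded tags such as grandfathered "art-lojban".

SAMPLE = """%%
Type: grandfathered
Tag: art-lojban
Description: Lojban
Added: 2001-11-11
Preferred-Value: jbo
Deprecated: 2003-09-02
Comments: replaced by ISO code jbo
%%"""

def parse_records(text):
    """Return a list of dicts, one per record between "%%" separators."""
    records = []
    for chunk in text.split("%%"):
        fields = {}
        last = None
        for line in chunk.splitlines():
            if not line.strip():
                continue
            if line[0].isspace() and last is not None:
                # Indented continuation line: append to the previous field.
                fields[last] += " " + line.strip()
                continue
            name, _, value = line.partition(":")
            last = name.strip()
            fields[last] = value.strip()  # repeated fields overwrite
        if fields:
            records.append(fields)
    return records

def preferred_values(records):
    """Map whole tags (records with a "Tag" field) to their Preferred-Value."""
    mapping = {}
    for rec in records:
        if "Tag" in rec and "Preferred-Value" in rec:
            mapping[rec["Tag"].lower()] = rec["Preferred-Value"].lower()
    return mapping

def canonicalize(tag, mapping):
    """Lowercase the tag (BCP 47 tags are case-insensitive) and replace it
    by its preferred value if the whole tag has been superseded."""
    key = tag.lower()
    return mapping.get(key, key)

mapping = preferred_values(parse_records(SAMPLE))
print(canonicalize("art-lojban", mapping))  # jbo
```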

To provide access for XML processing, various XML formats of the registry
have been created, see
http://www.langtag.net/registries.html
I think it would be useful to have it available in RDF as well, for
Semantic Web processing.
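As a rough illustration of what an RDF form might look like, the sketch below writes Preferred-Value relations as N-Triples lines. The namespace http://example.org/langtag# and the property name preferredValue are hypothetical placeholders, not an existing vocabulary.

```python
# Minimal sketch: emit registry relations as N-Triples.
# NS and "preferredValue" are hypothetical placeholders.
NS = "http://example.org/langtag#"

def to_ntriples(preferred):
    """preferred: dict mapping a tag to its Preferred-Value tag."""
    triples = []
    for tag, value in sorted(preferred.items()):
        triples.append(f"<{NS}{tag}> <{NS}preferredValue> <{NS}{value}> .")
    return "\n".join(triples)

print(to_ntriples({"art-lojban": "jbo"}))
```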

>  Since I'm on vacation, I'm just
> going to wonder about two things rather than look them up like I
> should.  :-)   
>     -  Does XSD give us a way to do that for data types? 
>   

You mean "publishing the relations"? You could define a hierarchy of
simple types, e.g. with "en-US" being subordinate to "en". However, you
would again run into the "language tags are generative and hard to
enumerate as types" issue.
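A hierarchy derived purely from subtags, as in Axel's proposal, can be sketched in a few lines of Python. The helper is_subordinate is hypothetical. It compares whole subtags rather than raw string prefixes, so "eng" is not placed under "en". It places "en-US" under "en" and "zh-cmn" under "zh", but it cannot relate "cmn" to "zh" or "jbo" to "art-lojban", which is the gap described in the quoted discussion.

```python
def is_subordinate(narrow, broad):
    """True if `broad` equals `narrow` or is a leading sequence of its
    subtags. Comparison is case-insensitive, as BCP 47 tags are."""
    n = narrow.lower().split("-")
    b = broad.lower().split("-")
    # If b has more subtags than n, n[:len(b)] is shorter and cannot equal b.
    return n[:len(b)] == b

print(is_subordinate("en-US", "en"))        # True
print(is_subordinate("zh-cmn", "zh"))       # True
print(is_subordinate("eng", "en"))          # False
print(is_subordinate("cmn", "zh"))          # False
print(is_subordinate("jbo", "art-lojban"))  # False
```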

>     -  Can we do it with OWL by treating datatypes as
>        properties? 

I think you can, though the enumeration issue would probably be the same.

>   It seems clear to me that 
>             if bar is a datatype:
>                "foo"^^bar == [ bar foo ] 
>        ie bar is a property where the domain is the lexical space and
>        the range is the value space.  Read "xs:int" as "the integer
>        value serialized in this string".  If the RDF or OWL semantics
>        don't allow it, then we'd have to back off to
>                "foo"^^bar == [ bar2 foo ]
>        where there's a one-to-one correspondence between bar and bar2.
>
> That would allow people with a decent semantic web engine (which doesn't
> know anything about BCP 47)
> to query for lang=en and get results which
> were lang=en-US.
>   


Sounds very reasonable to me.
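The "query for lang=en and get lang=en-US" behaviour corresponds to basic filtering as defined in RFC 4647, which is the matching used by SPARQL's langMatches function. Below is a minimal Python sketch over a hypothetical list of (text, tag) literals. Like the prefix hierarchy, it finds "zh-cmn" when filtering with "zh", but not "cmn".

```python
def basic_filter(language_range, tag):
    """RFC 4647 basic filtering: "*" matches any tag; otherwise the range
    matches when it equals the tag or is a prefix of it followed by "-".
    Comparison is case-insensitive."""
    r = language_range.lower()
    t = tag.lower()
    return r == "*" or t == r or t.startswith(r + "-")

# Hypothetical literals: (lexical form, language tag)
literals = [("colour", "en-GB"), ("color", "en-US"), ("Farbe", "de"), ("hue", "en")]
print([text for text, tag in literals if basic_filter("en", tag)])
# ['colour', 'color', 'hue']
```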

Felix

Received on Monday, 14 July 2008 13:08:31 UTC