- From: Felix Sasaki <fsasaki@w3.org>
- Date: Mon, 14 Jul 2008 19:11:35 +0900
- To: Axel Polleres <axel.polleres@deri.org>
- CC: Ivan Herman <ivan@w3.org>, "Phillips, Addison" <addison@amazon.com>, Jie Bao <baojie@cs.rpi.edu>, "public-owl-wg@w3.org" <public-owl-wg@w3.org>, "public-i18n-core-comments@w3.org" <public-i18n-core@w3.org>, "public-rif-comments@w3.org" <public-rif-comments@w3.org>, Boris Motik <boris.motik@comlab.ox.ac.uk>
Axel Polleres wrote:
>
> Felix Sasaki wrote:
>> Ivan Herman wrote:
>>>
>>>
>>> Axel Polleres wrote:
>>> [snip]
>>>
>>>> Sure!
>>>>
>>>> As for the namespace, I personally prefer rdf: sharing Jos'
>>>> arguments here that it is in my opinion NOT problematic to do so.
>>>> Several rdf: namespaced properties already do not have a specified
>>>> formal semantics (the reification having been mentioned already, so
>>>> what).
>>>>
>>>
>>> Yes, that is indeed a good point.
>>>
>>> [snip]
>>>
>>>>
>>>> A probably more feasible solution would be to do a real type
>>>> hierarchy for language tags and - instead of a datatype
>>>> owl:internationalizedString or rif:text which has pairs of strings
>>>> and language tags as lexical space - define separate datatypes (and
>>>> subtypes) for each lang-tag, i.e.
>>>>
>>>> use:
>>>>
>>>> message("Hello"^^lang:en-US)
>>>>
>>>> where e.g. lang:en-US is a subtype of lang:en, i.e.
>>>> that would also imply
>>>>
>>>> message("Hello"^^lang:en)
>>>>
>>>> (just as xsd:integer is a subtype of xsd:decimal in
>>>> the XML Schema type hierarchy, see
>>>> http://www.w3.org/TR/xmlschema-2/#built-in-datatypes)
>>>>
>>>> Anything wrong with that? To me this seems much cleaner than this
>>>> fiddling around with pairs of strings and lang-tags.
>>>>
>>> [snip]
>>>
>>> This is indeed quite nice, I must say. Addison already referred to
>>> one caveat that I intended to raise, namely the possibly high number
>>> of language tags (by the way, [1] gives a fairly readable overview
>>> of those). Let us see where that discussion goes...
>>
>>
>> This caveat might be a severe problem of this approach. The BCP 47
>> language tags are relying on a generative approach using the ABNF in
>> BCP 47 (so-called "well formed" language tags), and in addition the
>> registry of sub tags.
>> I'm not sure if it will be feasible to put
>> these two types of conformance in relation to the planned OWL2 data
>> type hierarchy, though I think it would be highly desirable ...
>
> The question is more, then, whether we still want to go for the
> somewhat crooked detour of having language tags outside the datatypes?
> I mean, in what sense does the generic datatype rif:text or
> owl:internationalizedText*) solve the problem instead of just hiding it?
>
> BCP 47 says: "Subtags are distinguished and separated from one another
> by a hyphen ("-", ABNF [RFC4234] %x2D)."
>
> So, why could a lang: datatype hierarchy not simply state that the
> hierarchy is defined *implicitly*? We don't need to list this
> hierarchy explicitly, but could just define:
>
> <i>lang:tag1</i> is a supertype of <i>lang:tag2</i> if and only if
> <i>tag1</i> is a prefix of <i>tag2</i>, where both <i>tag1</i> and
> <i>tag2</i> are valid language tags, following [BCP 47].
>
> Maybe, I am oversimplifying things here, but I really don't understand
> the deep problem with this approach - which probably there is, but I'd
> appreciate if someone could point me to it explicitly.

I'm looking at http://tools.ietf.org/html/draft-ietf-ltru-4646bis-16 ,
the currently planned revision of BCP 47. See esp.
http://tools.ietf.org/html/draft-ietf-ltru-4646bis-16#section-2.2.8

   Many of the grandfathered tags have been superseded by the subsequent
   addition of new subtags: each superseded record contains a
   Preferred-Value field that ought to be used to form language tags
   representing that value. For example, the tag "art-lojban" is
   superseded by the primary language subtag 'jbo'.

That is, for the language tags "art-lojban" and "jbo" there is no
hierarchy. The language tags express the same language.
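
To make the mismatch concrete, here is a minimal Python sketch of the
prefix rule quoted above. It is an illustration only, not part of BCP 47
or of any OWL/RIF draft. The function names and the one-entry
PREFERRED_VALUE table are hypothetical, "prefix" is assumed to mean a
subtag-wise prefix, and no well-formedness or registry validation is
attempted.

```python
def subtags(tag):
    """Split a tag into lower-cased subtags (tags are case-insensitive)."""
    return tag.lower().split("-")


def is_supertype(tag1, tag2):
    """Assumed reading of the proposed rule: lang:tag1 is a supertype of
    lang:tag2 iff tag1's subtags are a prefix of tag2's subtags."""
    a, b = subtags(tag1), subtags(tag2)
    return len(a) <= len(b) and b[:len(a)] == a


# Hypothetical one-entry excerpt of the registry's Preferred-Value data.
PREFERRED_VALUE = {"art-lojban": "jbo"}


def canonical(tag):
    """Map a superseded tag to its Preferred-Value, if one is listed."""
    return PREFERRED_VALUE.get(tag.lower(), tag.lower())


# The intended case works: "en" is a prefix of "en-US".
print(is_supertype("en", "en-US"))                    # True

# The grandfathered case does not: neither tag is a prefix of the other,
# although both identify the same language.
print(is_supertype("jbo", "art-lojban"))              # False
print(is_supertype("art-lojban", "jbo"))              # False
print(canonical("art-lojban") == canonical("jbo"))    # True
```

Under these assumptions the prefix rule treats "art-lojban" and "jbo" as
unrelated datatypes, so an implicit hierarchy would at least need a
canonicalization step against the registry before comparing tags.
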

Another issue is with so-called macrolanguages and extended language
subtags, see
http://tools.ietf.org/html/draft-ietf-ltru-4646bis-16#section-4.1.2
I can't explain these concepts in detail here, but the problem with the
notion of "a longer subtag = deeper hierarchy" arises here:

[
   Each encompassed language's subtag SHOULD be used as the primary
   language subtag. For example, a document in Mandarin Chinese
   would be tagged "cmn" (the subtag for Mandarin Chinese) in
   preference to "zh" (Chinese).

o  If compatibility is desired or needed, the encompassed subtag MAY
   be used as an extended language subtag. For example, a document
   in Mandarin Chinese could be tagged "zh-cmn" instead of either
   "cmn" or "zh".
]

That is, Mandarin Chinese could be tagged as "zh-cmn" or "cmn" or "zh".
Again you have no clear "length to hierarchy" relation.

Addison can provide more examples and can judge if my concerns here are
valid.

Felix

> Would it be a problem if all these datatypes would have the same
> lexical space?
>
> Thanks for clarification,
>
> Axel
>
> *) no objection about coinflipping as suggested by Ian here, btw, if
> we want to stick with it
>
>>>
>>> Another issue that we have to see is how well this works with the
>>> OWL design (I have explicitly added Boris on the cc list to draw his
>>> attention:-). My understanding of the current datatype restriction
>>> design[2] is that one can define facets for a specific datatype, but
>>> not across several datatypes; on the other hand in this proposal the
>>> datatype for 'en-us' and 'en-gb' would be different and both would
>>> be different from 'en' (although 'en-us' and 'en-gb' would both be
>>> subtypes of 'en'). How could I define facets that involve all
>>> these? Would that work well with the OWL design? I actually hope we
>>> can find a way, because the usage of these URI-s looks quite elegant...
>>>
>>> Cheers
>>>
>>> Ivan
>>>
>>>
>>> [1] http://www.w3.org/International/articles/language-tags/
>>> [2] http://www.w3.org/2007/OWL/wiki/Syntax#Datatype_Restrictions
>
Received on Monday, 14 July 2008 10:12:39 UTC