Re: I18N issues an OWL2

From: Axel Polleres <axel.polleres@deri.org>
Date: Wed, 09 Jul 2008 22:45:55 +0100
Message-ID: <48753193.2020509@deri.org>
To: "Phillips, Addison" <addison@amazon.com>
CC: Jie Bao <baojie@cs.rpi.edu>, "public-owl-wg@w3.org" <public-owl-wg@w3.org>, "public-i18n-core-comments@w3.org" <public-i18n-core@w3.org>, "public-rif-comments@w3.org" <public-rif-comments@w3.org>

Phillips, Addison wrote:
> Hi,
> 
> Would you consider including I18N WG in your joint task force? These issues seem to arise fairly frequently. We'd like to see consistent solutions develop.
> 
> Addison

Sure!

As for the namespace, I personally prefer rdf:, sharing Jos' arguments 
here: in my opinion it is NOT problematic to do so. Several rdf: 
namespaced properties already have no specified formal semantics 
(reification has been mentioned already, so what).

My point is rather that our intention from the beginning was to reflect 
the lang-tagged literals from RDF, realizing that these are not uniform 
with other (typed) literals.

However, the concerns you raised make me shift a bit towards a 
completely different proposal which, at first glance, seems much 
cleaner to me. As for your proposal/issues:

1) subtags: if we stick with the current understanding that we talk 
about a single datatype with pairs in its lexical space,  I am not sure 
whether this needs to be semantically reflected... do you mean to say 
that e.g. a fact

message("Hello"@en-US^^owl:internationalizedString)

  should in our semantics also imply

message("Hello"@en^^owl:internationalizedString)

???
  In RDF's semantics this is certainly not the case now and I think it 
would complicate things there ...

  A probably more feasible solution would be a real type hierarchy 
for language tags: instead of a single datatype 
(owl:internationalizedString or rif:text) which has pairs of strings and 
language tags as its lexical space, define separate datatypes (and 
subtypes) for each lang-tag, i.e.

use:

message("Hello"^^lang:en-US)

where e.g. lang:en-US is a subtype of lang:en, i.e.
that would also imply

message("Hello"^^lang:en)

  (just as xsd:integer is a subtype of xsd:decimal in the 
XML Schema type hierarchy, see 
http://www.w3.org/TR/xmlschema-2/#built-in-datatypes)

Anything wrong with that? To me this seems much cleaner than this 
fiddling around with pairs of strings and lang-tags.

The new lang: namespace could then be something in the control of
our task force and define a datatype-IRI for each of the lang-tags 
defined in BCP 47.

Again, this seems simpler than what we are into now, and 
feasible to me. The problem of addressing our concerns would then boil 
down to deriving a type hierarchy from the BCP 47 document, rather than 
defining a single datatype... and defining the lang: namespace, 
which could in its namespace-URI somehow refer to BCP 47.
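
To illustrate what "deriving a type hierarchy" could look like, here is a 
minimal Python sketch. It is an illustration only, not part of any draft. 
It assumes that the supertypes of a tag are obtained simply by dropping 
trailing subtags one at a time, and that tags compare case-insensitively; 
the lang: prefix and the function names are hypothetical, and special 
cases in BCP 47 (e.g. singleton or private-use subtags) are ignored.

```python
def supertypes(tag):
    """Return the hypothetical lang: datatypes implied by a language tag,
    most specific first, by dropping trailing subtags one at a time."""
    subtags = tag.lower().split("-")  # assumption: tags are case-insensitive
    return ["lang:" + "-".join(subtags[:i]) for i in range(len(subtags), 0, -1)]


def is_subtype(sub, sup):
    """True if lang:sub equals lang:sup or is one of its subtypes.
    Comparison is done on whole subtags, not on string prefixes."""
    a = sub.lower().split("-")
    b = sup.lower().split("-")
    return a[:len(b)] == b


print(supertypes("en-US"))        # ['lang:en-us', 'lang:en']
print(is_subtype("en-US", "en"))  # True
print(is_subtype("ena", "en"))    # False
```

Comparing whole subtags rather than string prefixes is what keeps "ena" 
from being treated as a subtype of "en".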

This said, summarizing: if that works, I am in favor of a new 
namespace of its own rather than rif or owl, I guess. If there is some 
problem with that and we want to stick with the current pairs in the 
lexical space, I am in favor of the rdf namespace.

2) As for your second issue, i.e. allowing several language tags 
in a priority list... I am not sure how/whether we could address this or 
whether this is really in the scope of what we want to define. We are 
talking about typed constants which either belong to a certain language 
or do not; also, it would probably complicate things... unless we define 
in the lang-type hierarchy that e.g. a type "en,fr" would be implicitly a 
supertype of both en and fr... would that work?

example:

   metal("Gold"^^lang:en,de)

would imply both

   metal("Gold"^^lang:de)

   metal("Gold"^^lang:en)

  Does that make sense? I am not sure whether I understand your issue 
well enough though and what more complications such an extension would 
bring. Still, it looks to me as if this considerably complicates things.
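
As a rough illustration of the reading in the example above, the 
following Python sketch expands a combined type into the single-language 
types it would imply. The comma-separated type name and the function 
name are assumptions made for this sketch only; nothing like this is 
defined in any draft.

```python
def implied_types(type_name):
    """Split a hypothetical combined type such as 'lang:en,de' into the
    single-language types it would imply under the reading above."""
    prefix, _, tags = type_name.partition(":")
    return [f"{prefix}:{t.strip()}" for t in tags.split(",") if t.strip()]


print(implied_types("lang:en,de"))  # ['lang:en', 'lang:de']
```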


best,
Axel

> Addison Phillips
> Globalization Architect -- Lab126
> 
> Internationalization is not a feature.
> It is an architecture.
> 
> 
>> -----Original Message-----
>> From: baojie@gmail.com [mailto:baojie@gmail.com] On Behalf Of Jie
>> Bao
>> Sent: Wednesday, July 09, 2008 11:33 AM
>> To: Phillips, Addison
>> Cc: public-owl-wg@w3.org; public-i18n-core-comments@w3.org; public-
>> rif-comments@w3.org
>> Subject: Re: I18N issues an OWL2
>>
>> Hi Addison
>>
>> Thank you for the suggestions. The OWL and RIF WGs are planning to
>> have a joint task force on internationalized strings. There are a
>> short state-of-the-art summary[2]  and a specification draft [1].
>> Further revisions will be made after further discussions between
>> the
>> WGs. Your comments are valuable and will definitely be considered.
>> I
>> will let you updated if there is any progress.
>>
>> [1] http://www.w3.org/2007/OWL/wiki/InternationalizedStringSpec
>> [2] http://www.w3.org/2007/OWL/wiki/InternationalizedString
>>
>> Best
>>
>> Jie
>>
>> On Tue, Jul 8, 2008 at 6:54 PM, Phillips, Addison
>> <addison@amazon.com> wrote:
>>> All,
>>>
>>> I am writing this note in response to Jeremy Carroll's note of 21
>> May [1] and in response to an action item from the
>> Internationalization Core WG [2]
>>> I've reviewed the various issue tracker materials you have and
>> have some comments. I hope you find these useful. Please note that
>> these are currently personal and not WG comments.
>>> First, a bit of summary/background. IETF BCP 47 defines language
>> tags. BCP 47 used to be RFC 3066. Currently, it is two RFCs: 4646
>> and 4647. The latter of these is about "Matching of Language Tags",
>> which is primarily the issue at hand. Generally speaking, there are
>> several forms of matching that you might describe in OWL2. Given
>> the general type of operations you provide, I think you'd be best
>> off if you implemented something similar to "extended filtering" in
>> 4647. This is the most "regular expression-like" syntax and allows
>> for the most flexibility for applications using it.
>>> The problem with the proposals I've seen so far are similar to
>> issues I have often seen with language tags elsewhere at W3C:
>> language tags have an internal structure made up of subtags
>> separated by hyphens. If one specifies "en*" (or, better, "en" or
>> "en-*"), this should match tags like "en-US" or "en-GB", but not
>> "ena" or "enf-US". That is, the tokens should be interpreted as
>> subtags.
>>> In reviewing plans, I noticed this message as the most recent
>> reference about formats and such [3]. This gave me a few concerns:
>>> 1. I'm not sure I like the name "internationalizedString". I
>> realize that this is an expansion on xsd:string and thus needs a
>> different name. However, it implies that other strings are somehow
>> "not internationalized". Perhaps something along the lines of
>> "languageString", "nlString" (nl for natural language), or similar.
>>> 2. Definitely langPattern should be case insensitive.
>> Alternatively, it is permitted to normalized both the literal and
>> the pattern to lowercase for matching purposes.
>>> 3. It would be best to use the terminology from RFC 4647 to the
>> extent possible. One question would be whether langPattern could be
>> a true "language priority list" (i.e. have more than one "language
>> range" in it). That would allow one to say something like:
>>>    DatatypeRestriction(owl:internationalizedString langPattern
>> "en,fr")
>>> ... which would mean: any string in some flavor of English or
>> French (but not, say, German or Japanese), and inclusive of tags
>> such as "fr-CA" and "EN-us".
>>> This may be difficult, since I don't think other pattern strings
>> allow for internal structure.
>>> I'd be happy, personally and on behalf of the I18N Core WG, to
>> spend time discussing this with your WG as appropriate. Please note
>> that I'm also the editor of BCP 47 and that a new revision is
>> coming up. It won't affect this discussion, but it is a good reason
>> why one should reference the BCP number and not the RFC :-)
>>> Best Regards,
>>>
>>> Addison
>>>
>>> [1] http://lists.w3.org/Archives/Public/public-i18n-
>> core/2008AprJun/0065.html
>>> [2] http://www.w3.org/2008/06/04-core-minutes.html#item07
>>> [3] http://lists.w3.org/Archives/Public/public-owl-
>> wg/2008May/0019.html
>>>
>>> Addison Phillips
>>> Globalization Architect -- Lab126
>>>
>>> Internationalization is not a feature.
>>> It is an architecture.
>>>
>>>
>>>


-- 
Dr. Axel Polleres, Digital Enterprise Research Institute (DERI)
email: axel.polleres@deri.org  url: http://www.polleres.net/

Everything is possible:
rdfs:subClassOf rdfs:subPropertyOf rdfs:Resource.
rdfs:subClassOf rdfs:subPropertyOf rdfs:subPropertyOf.
rdf:type rdfs:subPropertyOf rdfs:subClassOf.
rdfs:subClassOf rdf:type owl:SymmetricProperty.
Received on Wednesday, 9 July 2008 21:46:37 GMT