Re: I18N issues an OWL2


I am writing this note in response to Jeremy Carroll's note of 21 May [1] and in response to an action item from the Internationalization Core WG [2].

I've reviewed the various issue tracker materials you have and have some comments. I hope you find these useful. Please note that these are currently personal and not WG comments.

First, a bit of summary/background. IETF BCP 47 defines language tags. BCP 47 used to be RFC 3066. Currently, it is two RFCs: 4646 and 4647. The latter of these is about "Matching of Language Tags", which is primarily the issue at hand. Generally speaking, there are several forms of matching that you might describe in OWL2. Given the general type of operations you provide, I think you'd be best off if you implemented something similar to "extended filtering" in 4647. This is the most "regular expression-like" syntax and allows for the most flexibility for applications using it.

The problem with the proposals I've seen so far is similar to issues I have often seen with language tags elsewhere at W3C: language tags have an internal structure made up of subtags separated by hyphens. If one specifies "en*" (or, better, "en" or "en-*"), this should match tags like "en-US" or "en-GB", but not "ena" or "enf-US". That is, the tokens should be interpreted as subtags, not as arbitrary string prefixes.
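To make the subtag point concrete, here is a minimal Python sketch of "extended filtering" as I read Section 3.3.2 of RFC 4647. The function name extended_filter_match is only illustrative and is not part of any OWL2 proposal; treat this as a sketch under that reading, not a reference implementation.

```python
def extended_filter_match(lang_range: str, tag: str) -> bool:
    """Return True if `tag` matches `lang_range` under extended filtering.

    Sketch of RFC 4647, Section 3.3.2. The name is hypothetical.
    """
    # Matching is case-insensitive, so compare lowercased subtags.
    range_subtags = lang_range.lower().split("-")
    tag_subtags = tag.lower().split("-")

    # The first subtags must match, unless the range starts with "*".
    if range_subtags[0] != "*" and range_subtags[0] != tag_subtags[0]:
        return False

    r, t = 1, 1
    while r < len(range_subtags):
        if range_subtags[r] == "*":
            # A wildcard matches zero or more subtags: just skip it.
            r += 1
        elif t >= len(tag_subtags):
            # The range wants more subtags than the tag has.
            return False
        elif range_subtags[r] == tag_subtags[t]:
            r += 1
            t += 1
        elif len(tag_subtags[t]) == 1:
            # Singletons (including private-use "x") may not be skipped.
            return False
        else:
            # Skip a non-matching, non-singleton subtag in the tag.
            t += 1
    return True
```

Because whole subtags are compared, the range "en" matches "en-US" and "en-GB" but not "ena" or "enf-US", and a range such as "de-*-DE" matches "de-Latn-DE".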

In reviewing plans, I noticed this message as the most recent reference about formats and such [3]. This gave me a few concerns:

1. I'm not sure I like the name "internationalizedString". I realize that this is an expansion on xsd:string and thus needs a different name. However, it implies that other strings are somehow "not internationalized". Perhaps something along the lines of "languageString", "nlString" (nl for natural language), or similar would be better.

2. Definitely langPattern should be case insensitive. Alternatively, it is permitted to normalize both the literal and the pattern to lowercase for matching purposes.

3. It would be best to use the terminology from RFC 4647 to the extent possible. One question would be whether langPattern could be a true "language priority list" (i.e. have more than one "language range" in it). That would allow one to say something like:

    DatatypeRestriction(owl:internationalizedString langPattern "en,fr")

... which would mean: any string in some flavor of English or French (but not, say, German or Japanese), and inclusive of tags such as "fr-CA" and "EN-us".

This may be difficult, since I don't think other pattern strings allow for internal structure.
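As a rough illustration of the "language priority list" reading in point 3, the Python sketch below treats the pattern as a comma-separated list of basic language ranges and applies the simpler "basic filtering" of RFC 4647, Section 3.3.1, to each one. The comma syntax and the function name matches_priority_list are assumptions made for the example; nothing here is defined by OWL2.

```python
def matches_priority_list(priority_list: str, tag: str) -> bool:
    """Return True if `tag` matches any range in a comma-separated list.

    Hypothetical helper: the comma-separated form is assumed for
    illustration. Each range is matched with basic filtering
    (RFC 4647, Section 3.3.1).
    """
    # Normalize case once; language tag matching is case-insensitive.
    tag = tag.lower()
    for lang_range in priority_list.split(","):
        lang_range = lang_range.strip().lower()
        if not lang_range:
            continue  # ignore empty entries such as a trailing comma
        # A range matches if it is "*", equals the tag, or is a prefix
        # of the tag that ends exactly on a subtag boundary ("-").
        if (
            lang_range == "*"
            or tag == lang_range
            or tag.startswith(lang_range + "-")
        ):
            return True
    return False
```

With the pattern "en,fr", this accepts "fr-CA" and "EN-us" and rejects "de" and "ja-JP".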

I'd be happy, personally and on behalf of the I18N Core WG, to spend time discussing this with your WG as appropriate. Please note that I'm also the editor of BCP 47 and that a new revision is coming up. It won't affect this discussion, but it is a good reason why one should reference the BCP number and not the RFC :-)

Best Regards,





Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.

Received on Tuesday, 8 July 2008 22:55:22 UTC