A comment about I18N proposal by Axel Polleres (ISSUE-126 and ISSUE-71) from Boris Motik on 2008-07-17 (public-owl-wg@w3.org from July 2008)

From: Boris Motik <boris.motik@comlab.ox.ac.uk>
Date: Thu, 17 Jul 2008 09:54:22 +0100
To: <public-owl-wg@w3.org>
Message-ID: <000f01c8e7ea$b4eb6b70$7212a8c0@wolf>

Hello,

At yesterday's teleconf, Ivan drew my attention to the proposal by Axel Polleres for owl:internationalizedString:

http://lists.w3.org/Archives/Public/public-owl-wg/2008Jul/0223.html

If I understood the idea correctly, Axel is proposing to have a datatype per language tag. Thus, you'd have something like a lang:en
datatype, which would contain all strings in English (lang: is a namespace prefix yet to be defined). Furthermore, you might have
the lang:en-US datatype, which would contain all strings in the US variant of English. The datatype lang:en-US would be a
subdatatype of lang:en; hence, if you asked for all strings in English, you would also obtain all strings in the US variant.
Please correct me if I summarized the proposal incorrectly -- I apologize in advance.

I'm not really sure what the value space of all these datatypes would be. If you want to make literals of the form "aaa"@en and
"aaa"@en-US be different things (i.e., if you want to give them different identity), then you need to have different objects in the
value space. Axel's e-mail is silent about the value spaces; however, I assume that each literal with a language tag is still mapped
to a pair of the form

(text,langTag).

If this were not the case -- for example, if you mapped "aaa"@en and "aaa"@en-US to the same object "aaa" -- then there would be no
way to distinguish different values in the interpretation of lang:en and lang:en-US. Hence, it seems reasonable to me to
assume that the value space of the datatypes in Axel's proposal is identical to the value space of my proposal in
(http://lists.w3.org/Archives/Public/public-owl-wg/2008Jul/0306.html).

Furthermore, Axel's proposal is silent about the treatment of xsd:string. Since the value spaces in my proposal and his are the
same, however, I don't see any problem in mapping literals of the form "aaa"^^xsd:string into ("aaa","") -- that is, into pairs with
the empty language tag.
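
A minimal sketch of this value space in Python, under stated assumptions: the names LangString, from_lang_literal and
from_xsd_string are hypothetical and appear in neither proposal, and language tags are compared exactly as written (no case
normalization). It only illustrates that pairs keep "aaa"@en and "aaa"@en-US apart, and that an xsd:string literal lands on the
pair with the empty tag.

```python
from typing import NamedTuple


class LangString(NamedTuple):
    """A value of the assumed value space: a (text, langTag) pair."""
    text: str
    lang_tag: str  # "" stands for "no language tag"


def from_lang_literal(text: str, lang_tag: str) -> LangString:
    # "aaa"@en is mapped to ("aaa", "en"); the tag is kept verbatim (assumption).
    return LangString(text, lang_tag)


def from_xsd_string(text: str) -> LangString:
    # "aaa"^^xsd:string is mapped to ("aaa", ""), the pair with the empty tag.
    return LangString(text, "")


english = from_lang_literal("aaa", "en")
us_english = from_lang_literal("aaa", "en-US")
plain = from_xsd_string("aaa")

# Same text, different tags: three distinct objects in the value space.
print(english != us_english, english != plain, us_english != plain)
```

Had the literals been mapped to the bare text "aaa" instead, all three values would coincide, which is the situation the
paragraph above rules out.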

In fact, it seems to me that Axel's proposal is more related to ISSUE-71, which asks for a mechanism for identifying all strings in
a particular language. My proposal hasn't so far addressed this issue at all. In fact, I believe that ISSUE-71 is orthogonal to the
problem of structuring the value space of internationalized strings (which is the main goal of ISSUE-126). To be more precise, I
believe that, if we addressed ISSUE-126 in the way I outlined earlier, there would be nothing preventing us from employing Axel's
proposal for addressing ISSUE-71. The only thing we need to do is define the value space of each of the different lang:* datatypes.
For example, the value space of lang:en would be defined as the set of pairs of the form

("*","en[-*]")

(I hope you understand my pidgin regular expressions). To summarize, I believe we can go forward with ISSUE-126 and come back to
ISSUE-71 later.
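
As a rough illustration of how the value space of a lang:* datatype could be read, the sketch below treats values as
(text, langTag) tuples and tests membership by tag prefix. The function name in_lang_datatype is hypothetical, the pidgin pattern
"en[-*]" is interpreted as "the tag en, optionally followed by a hyphen and anything", and tags are compared case-sensitively;
all three are assumptions rather than part of either proposal.

```python
def in_lang_datatype(value: tuple[str, str], lang: str) -> bool:
    """True if the (text, langTag) pair belongs to the hypothetical lang:<lang> datatype."""
    _text, tag = value
    # Either the tag is exactly <lang>, or it refines it with a "-subtag" suffix.
    return tag == lang or tag.startswith(lang + "-")


values = [("colour", "en"), ("color", "en-US"), ("Farbe", "de"), ("aaa", "")]

in_en = [v for v in values if in_lang_datatype(v, "en")]
in_en_us = [v for v in values if in_lang_datatype(v, "en-US")]

# lang:en-US behaves as a subdatatype of lang:en: each of its values is also in lang:en.
print(all(v in in_en for v in in_en_us))
```

Checking for the hyphen explicitly keeps a tag such as "eng" out of lang:en, which a plain prefix test would wrongly admit.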

Regarding Axel's proposal for addressing ISSUE-71, it seems quite reasonable. I would like, however, to point out that ISSUE-71 can
be addressed in a rather simple way by adding another facet, langTagPattern. This facet would take a regular expression and
would restrict the value space of owl:internationalizedString to the set of pairs in which the language tag matches the regular
expression. For example, the datatype restriction

DatatypeRestriction( owl:internationalizedString langTagPattern "en[-*]" )

would have as the value space the set of pairs of the form

("*","en[-*]")

and would thus select all strings written in some variant of English. In contrast,

DatatypeRestriction( owl:internationalizedString langTagPattern "en" )

would have as the value space the set of pairs of the form

("*","en")

and would select only the strings that have no sublanguage specified. Regular expressions would thus provide us with quite a bit
of flexibility; in particular, they would allow us to explicitly distinguish between values with no language tag, only the language
tag, a language+sublanguage tag, and so on. I also believe the proposal would be really simple to implement: the extensions to the
datatype reasoning algorithm from my ISWC 2008 paper are rather trivial.
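
The effect of such a facet can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper
restrict_by_lang_tag is hypothetical, Python's re syntax stands in for whatever regular expression dialect the facet would use,
the pidgin pattern "en[-*]" is written as "en(-.*)?", and a pattern is taken to match the whole language tag.

```python
import re


def restrict_by_lang_tag(values, pattern):
    """Keep the (text, langTag) pairs whose tag matches the pattern in full."""
    regex = re.compile(pattern)
    return [(text, tag) for (text, tag) in values if regex.fullmatch(tag)]


values = [("aaa", ""), ("aaa", "en"), ("aaa", "en-US"), ("aaa", "de")]

any_english = restrict_by_lang_tag(values, r"en(-.*)?")  # some variant of English
plain_english = restrict_by_lang_tag(values, r"en")      # no sublanguage specified
sub_english = restrict_by_lang_tag(values, r"en-.+")     # language+sublanguage tag
untagged = restrict_by_lang_tag(values, r"")             # no language tag at all

print(any_english)
print(plain_english)
print(sub_english)
print(untagged)
```

Matching the whole tag rather than a substring is what lets the pattern "en" exclude "en-US".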

Regards,

Boris


Received on Thursday, 17 July 2008 08:56:00 UTC