- From: Jeremy Carroll <jjc@hpl.hp.com>
- Date: Thu, 09 Feb 2006 12:56:27 +0000
- To: "Vincent, Paul D" <PaulVincent@fairisaac.com>
- CC: public-rif-wg@w3.org
Vincent, Paul D wrote:
> I have always assumed that I18N or indeed localization issues are a user
> interface aspect, not a rule representation issue. Certainly this is the
> case in the commercial vendor space (that I am in).
>
> However, it could be that this is another facet of the human-readable rule
> use case (ie interchange of rule documentation, like policy statements for
> custom regulations). However, I have not seen "translation of rules" in
> any requirement.

The sort of problem I had in mind concerned the use of Unicode compatibility characters and the various Unicode normal form issues. The type of scenario I had in mind was:

====
User A creates a set of rules using a Chinese input system. These rules are exchanged with user B, who uses a Japanese input system. User B prints out the rules and starts referring to them by typing in their names.

Without some care, this scenario does not work, because users A and B may well have their systems set up so that:

User A maps Chinese characters into the Unicode code space with round-trip compatibility with their system's default encoding (which may be Big5); whereas user B maps Chinese characters into the Unicode code space with round-trip compatibility with, say, Shift-JIS.

My understanding (which is limited) is that this can result in essentially the same visual representation being encoded in different ways in Unicode, so that when user B enters a rule name, they have in fact entered a typo.
====

A similar scenario, on ground where I feel slightly more confident (although, because of its more limited scope, I would be surprised if it were a problem in practice), is the following:

====
User A creates a set of rules using a US keyboard. These rules are exchanged with user B, who uses a Japanese input system. User B prints out the rules and starts referring to them by typing in their names.

Without some care, this scenario does not work, because users A and B may well have their systems set up so that:

User A produces the standard ASCII character codes, in the range 0x00XX.
User B produces the compatibility 'wide' versions of these characters, in the range 0xFFXX.
====

I believe the bulk of these problems can be addressed by identifying rules by IRIs [1], particularly if we specify that SHOULDs in the IRI spec should be treated as MUSTs for the purposes of RIF (or perhaps even require Unicode normal form NFKC, which is encouraged but not formally RECOMMENDED in the IRI spec). More information about Unicode normalization can be found at [2].

Jeremy

[1] RFC 3987, Internationalized Resource Identifiers (IRIs)
http://www.apps.ietf.org/rfc/rfc3987.html
[2] Character Model for the World Wide Web 1.0: Normalization
http://www.w3.org/TR/charmod-norm/
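To make the failure mode concrete, here is a small Python sketch of both scenarios (the rule name "Rule42" is a hypothetical example, not from any RIF use case). It shows that a plain byte/code-point comparison of visually identical names fails, and that NFKC normalization, as suggested above, folds the compatibility characters back together:

```python
import unicodedata

# Second scenario: user A types plain ASCII on a US keyboard ...
ascii_name = "Rule42"
# ... while user B's Japanese input system produces the compatibility
# 'wide' (fullwidth) versions of the same characters, from U+FFXX.
fullwidth_name = "\uFF32\uFF55\uFF4C\uFF45\uFF14\uFF12"  # "Ｒｕｌｅ４２"

# The two names print alike but compare unequal code point by code point.
assert ascii_name != fullwidth_name

# NFKC maps compatibility characters to their canonical counterparts,
# so the two spellings become identical after normalization.
assert unicodedata.normalize("NFKC", fullwidth_name) == ascii_name

# First scenario, in miniature: a CJK compatibility ideograph (U+F900,
# carried over for round-trip compatibility with legacy encodings) is a
# different code point from the unified ideograph U+8C48 it looks like.
assert "\uF900" != "\u8C48"
assert unicodedata.normalize("NFC", "\uF900") == "\u8C48"
```

Note that the fullwidth case needs the compatibility normalization forms (NFKC/NFKD), whereas the CJK compatibility ideograph already collapses under plain canonical normalization (NFC/NFD).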
Received on Thursday, 9 February 2006 12:56:56 UTC