- From: Jeremy Carroll <jjc@hpl.hp.com>
- Date: Thu, 09 Feb 2006 12:56:27 +0000
- To: "Vincent, Paul D" <PaulVincent@fairisaac.com>
- CC: public-rif-wg@w3.org
Vincent, Paul D wrote:
> I have always assumed that I18N or indeed localization issues are a user
> interface aspect, not a rule representation issue. Certainly this is the
> case in the commercial vendor space (that I am in).
>
> However, it could be that this is another facet of the human-readable rule
> use case (ie interchange of rule documentation, like policy statements for
> custom regulations). However, I have not seen "translation of rules" in
> any requirement.

The sort of problem I had in mind concerned the use of Unicode compatibility characters and the various Unicode normal form issues. The type of scenario I had in mind was:

====
User A creates a set of rules using a Chinese input system. These rules are exchanged with user B, who uses a Japanese input system. User B prints out the rules and starts referring to them by typing in their names.

Without some care, this scenario does not work, because users A and B may well have their systems set up so that:

User A maps Chinese characters into the Unicode code space with round-trip compatibility with their system's default encoding (which may be Big5); whereas user B maps Chinese characters into the Unicode code space with round-trip compatibility with, say, Shift-JIS.

My understanding (which is limited) is that this can result in essentially the same visual representation being encoded in different ways in Unicode, so that when user B enters a rule name, they have in fact entered a typo.
====

A similar scenario, on ground where I feel slightly more confident (although, because of its more limited scope, I would be surprised if it were a problem in practice), is the following:

====
User A creates a set of rules using a US keyboard. These rules are exchanged with user B, who uses a Japanese input system. User B prints out the rules and starts referring to them by typing in their names.

Without some care, this scenario does not work, because users A and B may well have their systems set up so that:

User A produces the standard ASCII character codes, in the range 0x00XX.
User B produces the compatibility 'wide' versions of these characters, in the range 0xFFXX.
====

I believe the bulk of these problems can be addressed by identifying rules by IRIs [1], particularly if we specify that SHOULDs in the IRI spec should be treated as MUSTs for the purposes of RIF (or perhaps even require Unicode normal form NFKC, which is encouraged but not formally RECOMMENDED in the IRI spec). More information about Unicode normalization can be found at [2].

Jeremy

[1] RFC 3987, Internationalized Resource Identifiers (IRIs)
http://www.apps.ietf.org/rfc/rfc3987.html
[2] Character Model for the World Wide Web 1.0: Normalization
http://www.w3.org/TR/charmod-norm/
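To make the failure mode concrete, here is a small Python sketch of both scenarios (the rule name "Rule42" is a hypothetical example, not from any RIF use case). It shows that a plain byte/code-point comparison of visually identical names fails, and that NFKC normalization, as suggested above, folds the compatibility characters back together:

```python
import unicodedata

# Second scenario: user A types plain ASCII on a US keyboard ...
ascii_name = "Rule42"
# ... while user B's Japanese input system produces the compatibility
# 'wide' (fullwidth) versions of the same characters, from U+FFXX.
fullwidth_name = "\uFF32\uFF55\uFF4C\uFF45\uFF14\uFF12"  # "Ｒｕｌｅ４２"

# The two names print alike but compare unequal code point by code point.
assert ascii_name != fullwidth_name

# NFKC maps compatibility characters to their canonical counterparts,
# so the two spellings become identical after normalization.
assert unicodedata.normalize("NFKC", fullwidth_name) == ascii_name

# First scenario, in miniature: a CJK compatibility ideograph (U+F900,
# carried over for round-trip compatibility with legacy encodings) is a
# different code point from the unified ideograph U+8C48 it looks like.
assert "\uF900" != "\u8C48"
assert unicodedata.normalize("NFC", "\uF900") == "\u8C48"
```

Note that the fullwidth case needs the compatibility normalization forms (NFKC/NFKD), whereas the CJK compatibility ideograph already collapses under plain canonical normalization (NFC/NFD).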
Received on Thursday, 9 February 2006 12:56:56 UTC