Questions/feedback on character normalization from Erik Rissanen on 2008-10-30 (www-international@w3.org from October to December 2008)

From: Erik Rissanen <erik@axiomatics.com>
Date: Thu, 30 Oct 2008 15:40:07 +0100
To: www-international@w3.org
Message-ID: <4909C747.6090104@axiomatics.com>

Hello,

I am editing the upcoming 3.0 version of the OASIS XACML Standard. XACML
is an XML-based language for access control policies. Since it is XML
based, we have to deal with the issues of Unicode strings. Version 2.0
and earlier of the XACML specification did not address these Unicode
issues, so we are now working on being clearer in the next version.
Our need is mostly string identity matching and also case-insensitive
matching.

I have found the document at 
http://www.w3.org/TR/2005/WD-charmod-norm-20051027/ very useful, but 
there are some things I don't understand in it.

1. For string identity matching, in section C312: Why must the
normalization be done by the producers of the strings to be compared?
For XACML, this is difficult, since the strings are produced by
components outside the scope of the XACML specification, such as LDAP
servers. Maybe I don't understand what is meant.

2. Also, isn't it possible that step 3 of the algorithm for string
identity matching results in a non-normalized string? For example, let
one string consist of the single character U+00E7 and the second
string consist of the character U+0063 followed by the string "&#807;",
which is an XML character escape (for U+0327, COMBINING CEDILLA). These
two strings should match, right? Yet as far as I understand the
algorithm, they won't.
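
To make the example in question 2 concrete, here is a minimal Python
sketch. It assumes that step 3 is the expansion of character escapes,
and it uses the standard library's html.unescape as a stand-in for an
XML processor expanding "&#807;". The variable names are purely
illustrative and are not taken from the charmod-norm draft.

```python
import html
import unicodedata

# First string: the single precomposed character U+00E7.
precomposed = "\u00e7"

# Second string: U+0063 followed by the escape "&#807;" (decimal for U+0327).
escaped = "c&#807;"

# Stand-in for escape expansion; html.unescape resolves numeric
# character references the same way an XML parser would for this input.
expanded = html.unescape(escaped)

# After expansion the string is U+0063 U+0327, which is not in NFC.
is_nfc_after_expansion = unicodedata.normalize("NFC", expanded) == expanded

# A plain code point comparison of the two strings fails ...
identical_without_normalizing = precomposed == expanded

# ... but succeeds if the expanded string is normalized to NFC first.
identical_after_normalizing = (
    unicodedata.normalize("NFC", expanded) == precomposed
)

print(is_nfc_after_expansion)
print(identical_without_normalizing)
print(identical_after_normalizing)
```

In this sketch the comparison only succeeds when normalization is
applied after the escape has been expanded, which is the ordering the
question is asking about.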

Best regards,
Erik

Received on Thursday, 30 October 2008 16:46:34 UTC