- From: Phillips, Addison <addison@amazon.com>
- Date: Fri, 21 Nov 2008 09:31:48 -0800
- To: Erik Rissanen <erik@axiomatics.com>
- CC: "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Hi Erik,

No problem. Both I and the working group are glad to be of help.

Addison

Addison Phillips
Globalization Architect -- Lab126

Internationalization is not a feature.
It is an architecture.

> -----Original Message-----
> From: Erik Rissanen [mailto:erik@axiomatics.com]
> Sent: Friday, November 21, 2008 12:26 AM
> To: Phillips, Addison
> Cc: www-international@w3.org; public-i18n-core@w3.org
> Subject: Re: Questions/feedback on character normalization
>
> Hi Addison,
>
> At the XACML TC meeting yesterday we decided to adopt the text which you
> are proposing. Thank you very much to the I18N WG for helping us get
> this issue straight in the XACML 3.0 spec.
>
> Best regards,
> Erik
>
> Phillips, Addison wrote:
> > Hi Erik,
> >
> > Sorry about the delay in responding. Note that this response is on
> > behalf of the I18N Core WG.
> >
> > <quot>
> > That clears up my confusion. But it also means that the current draft
> > does not satisfy the requirements of XACML. Instead I will write like
> > this in a section on unicode issues:
> > </quot>
> >
> > That's correct. Your text isn't quite correct yet. Here's your proposal:
> >
> > --8<--
> > In Unicode it is possible to represent some letters by different
> > character sequences. The process of converting Unicode strings into
> > canonical character sequences is called normalization. An operation is
> > normalization-sensitive if its output(s) are different depending on the
> > state of normalization of the input(s); if the output(s) are textual,
> > they are deemed different only if they would remain different were they
> > to be normalized. (Quoted from
> > [http://www.w3.org/TR/2005/WD-charmod-norm-20051027])
> >
> > For more information on normalization see
> > [http://www.w3.org/TR/2005/WD-charmod-norm-20051027].
> >
> > An XACML implementation MUST NOT perform any normalization-sensitive
> > operations unless it has ensured that the inputs are normalized. An
> > XACML implementation MUST behave as if each normalization-sensitive
> > operation normalizes the string into Unicode normalization form NFC. An
> > implementation MAY use some other form of internal processing as long as
> > the externally visible results are identical to this specification.
> > --8<--
> >
> > I would propose something more like:
> >
> > --
> > In Unicode, some equivalent characters can be represented by more than
> > one different Unicode character sequence. See
> > [http://www.w3.org/TR/CharMod]. The process of converting Unicode
> > strings into equivalent character sequences is called "normalization"
> > [http://www.unicode.org/reports/tr15]. Some operations, such as string
> > comparison, are sensitive to normalization. An operation is
> > normalization-sensitive if its output(s) are different depending on the
> > state of normalization of the input(s); if the output(s) are textual,
> > they are deemed different only if they would remain different were they
> > to be normalized.
> >
> > For more information on normalization see
> > [http://www.w3.org/TR/2005/WD-charmod-norm-20051027].
> >
> > An XACML implementation MUST behave as if each normalization-sensitive
> > operation normalizes input strings into Unicode Normalization Form C
> > ("NFC"). An implementation MAY use some other form of internal
> > processing (such as using a non-Unicode, "legacy" character encoding)
> > as long as the externally visible results are identical to this
> > specification.
> > --
> >
> > Later you note:
> >
> > <quot>
> > For our string equility function I will write "The two strings are
> > equal, if they result in identical binary sequences when encoded into a
> > common Unicode encoding form".
> > </quot>
> >
> > This isn't complete. It should probably say instead something like:
> >
> > --
> > The two strings are equal, if they are composed of identical code point
> > sequences when normalized to Unicode Normalization Form C.
> > --
> >
> > It is true that two binary-identical strings in the same encoding are
> > equal. However, some strings that are not binary identical are still
> > 'equal', even when the same character encoding is used (that's what
> > normalizing the strings exposes).
> >
> > <quot>
> > One question though: Is it possible that there are some strings which
> > cannot be normalized into NFC? If so, we need to define error behavior
> > where this can occur.
> > </quot>
> >
> > This can never occur. All strings can be normalized to NFC.
> >
> > I hope this helps. Please don't hesitate to contact our WG or me for
> > more feedback or information.
> >
> > Best Regards,
> >
> > Addison
> >
> > Addison Phillips
> > Globalization Architect -- Lab126
> > Chair -- W3C Internationalization Core WG
> >
> > Internationalization is not a feature.
> > It is an architecture.
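
As an illustration of the equality rule discussed in the thread (two strings are equal if their code point sequences are identical after normalization to NFC), here is a minimal Python sketch. The helper name `nfc_equal` and the sample strings are assumptions chosen for this example. They are not taken from the XACML specification or from the messages above.

```python
import unicodedata


def nfc_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings as code point
    sequences after normalizing both to Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Two representations of the same text (sample data, assumed for illustration):
precomposed = "caf\u00e9"   # 'e with acute' as the single code point U+00E9
decomposed = "cafe\u0301"   # 'e' followed by combining acute accent U+0301

# Encoded in the same encoding form, the two strings are not binary identical.
binary_identical = precomposed.encode("utf-8") == decomposed.encode("utf-8")

# After NFC normalization, their code point sequences match.
equal_after_nfc = nfc_equal(precomposed, decomposed)

print(binary_identical)
print(equal_after_nfc)
```

Here the two inputs differ byte for byte in UTF-8, yet compare equal once both are in NFC, which is the case the binary-comparison wording would miss.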
Received on Friday, 21 November 2008 17:32:26 UTC