- From: Simon Montagu <smontagu@smontagu.org>
- Date: Thu, 01 Jun 2006 07:57:32 +0200
- To: www-international@w3.org
I am trying to understand the practical implications of the "Character Model Normalization" document, with particular reference to web browsers and DOM interfaces.

http://www.w3.org/TR/2005/WD-charmod-norm-20051027/#C302 says:

|A text-processing component that receives suspect text MUST NOT
|perform any normalization-sensitive operations unless it has first
|either confirmed through inspection that the text is in normalized form
|or it has re-normalized the text itself. Private agreements MAY,
|however, be created within private systems which are not subject to
|these rules, but any externally observable results MUST be the same as
|if the rules had been obeyed.

I wrote a testcase based on my understanding of this paragraph:

http://smontagu.org/testcases/normalizationTest.html

The testcase uses 5 different forms of the text "ngữ", using different combinations and orderings of "u", U+01B0 (LATIN SMALL LETTER U WITH HORN), U+0303 (COMBINING TILDE), U+0169 (LATIN SMALL LETTER U WITH TILDE), U+031B (COMBINING HORN), and U+1EEF (LATIN SMALL LETTER U WITH HORN AND TILDE).

Taking the examples of "normalization-sensitive operations" from http://www.w3.org/TR/2005/WD-charmod-norm-20051027/#def-normalization-sensitive, I tested counting the number of characters, deleting the last character, and string comparisons. My understanding of C302 is that in all cases, the number of characters should be 3, deleting the last character should give "ng", and comparing any of the strings to any of the others should find them equal.

I also tested creating a URL query string from the different forms of the text. Here, the browser is the producer of the query, so (by C312) it MUST perform full normalization.

No browser that I tested (Firefox, IE6, Konqueror, Opera) performs normalization in any of the testcases. I realize that "CharNorm" is a Working Draft and it's early days to expect compliance, but are my assumptions at least correct in theory?

Simon Montagu
Mozilla i18n
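For readers who want to reproduce the expectation outside a browser, here is a minimal sketch in Python. It is not the testcase itself: the five spellings in `FORMS` are an assumption reconstructed from the code points listed above, and the names `FORMS`, `nfc`, and `query_value` are made up for illustration. The query-string part also assumes UTF-8 percent-encoding, which the message does not specify.

```python
import unicodedata
from urllib.parse import quote

# Five assumed spellings of "ngữ" (hypothetical reconstruction, not the
# actual contents of the testcase page).
FORMS = {
    "precomposed": "ng\u1eef",               # U+1EEF
    "u-horn + tilde": "ng\u01b0\u0303",      # U+01B0 U+0303
    "u-tilde + horn": "ng\u0169\u031b",      # U+0169 U+031B
    "u + horn + tilde": "ngu\u031b\u0303",   # u U+031B U+0303
    "u + tilde + horn": "ngu\u0303\u031b",   # u U+0303 U+031B
}


def nfc(text: str) -> str:
    """Return the text in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text)


def query_value(text: str) -> str:
    """Normalize, then percent-encode as UTF-8 (the encoding is an assumption)."""
    return quote(nfc(text))


# Code point counts before normalization differ from form to form.
raw_lengths = {name: len(text) for name, text in FORMS.items()}

# After normalization every form should be the same three-character string.
normalized = {name: nfc(text) for name, text in FORMS.items()}

for name, text in normalized.items():
    print(f"{name}: raw={raw_lengths[name]} nfc={len(text)} "
          f"minus-last={text[:-1]!r} query={query_value(FORMS[name])}")
```

Without the `nfc` step, the same operations give different lengths and unequal comparisons for the five spellings, which is the behaviour the browsers showed.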
Received on Thursday, 1 June 2006 04:51:29 UTC