- From: Robert J Burns <rob@robburns.com>
- Date: Wed, 4 Feb 2009 16:35:01 -0600
- To: fantasai <fantasai.lists@inkedblade.net>
- Cc: public-i18n-core@w3.org, W3C Style List <www-style@w3.org>
Hi fantasai,

On Feb 4, 2009, at 4:13 PM, fantasai wrote:

> Robert J Burns wrote:
>> However Unicode has a SHOULD requirement that two canonically
>> equivalent but codepoint-differing strings match. Unicode's Chapter
>> 3 (conformance clause C6) says:
>>> A process shall not assume that the interpretations of two
>>> canonical-equivalent character sequences are distinct.
>
> That's a MUST requirement. SHALL == MUST, see RFC2119.

You're right, thanks for pointing that out. I don't often encounter
'shall' and incorrectly assumed it mapped to 'should'.

However, my other arguments remain, since this is quite convoluted
prose in the Unicode Standard that requires one to understand why
canonically equivalent characters exist in the first place (i.e., why
separate characters have been designated canonical equivalents). This
norm then creates a requirement that UAs not make assumptions that
would undermine the proper treatment of canonically equivalent
character sequences. So it is still not a MUST that canonically
equivalent character sequences be treated properly (which I wish it
was), but a MUST that UAs not get in the way of such proper treatment.

The performance issues raised by Henri could actually work in our
favor. If we (avoiding the bike-shed debates) simply pick NFC as the
W3C-endorsed normalization form for authoring, then we can require
UAs to normalize to NFC. The performance hit then falls on the
authors themselves who go against the recommendation and produce NFD
or non-normalized content. Confirming NFC upon parsing is not a
performance hit worth discussing; any real cost would arise only from
needing to rearrange combining characters and replace them with their
canonical equivalents.

Take care,
Rob
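P.S. To make the parsing point concrete, here is a minimal sketch of
the "confirm NFC" step, written against Python's unicodedata module
purely for illustration (the is_normalized quick check is my
assumption about the library at hand; a UA would use whatever Unicode
library it ships):

    import unicodedata

    def ensure_nfc(text):
        # Quick check: already-NFC input (the common case when authors
        # follow the recommendation) costs only a scan, no allocation.
        # NOTE: is_normalized is an assumed quick-check API (it exists
        # in Python 3.8+); a real UA's Unicode library would offer an
        # equivalent quick-check pass.
        if unicodedata.is_normalized('NFC', text):
            return text
        # Only NFD or unnormalized content pays for the reordering and
        # composition of combining characters here.
        return unicodedata.normalize('NFC', text)

    # U+0065 U+0301 (e + combining acute) composes to U+00E9 (e-acute):
    assert ensure_nfc('e\u0301') == '\u00e9'

The point of the two-step shape is exactly the cost split described
above: conforming NFC input takes the cheap verification path, and
only non-normalized input pays for the full normalization.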
Received on Wednesday, 4 February 2009 22:35:41 UTC