- From: Uche Ogbuji <uche@ogbuji.net>
- Date: Wed, 12 Sep 2012 07:38:46 -0600
- To: Mike Sokolov <sokolov@falutin.net>
- Cc: Tony Graham <tgraham@mentea.net>, public-microxml@w3.org
- Message-ID: <CAPJCua0xnCbY_o-uAoi0gNt9n2djuDLfubTou0_J1o3Lhv0HwQ@mail.gmail.com>
On Wed, Sep 12, 2012 at 7:25 AM, Mike Sokolov <sokolov@falutin.net> wrote:

> On 09/12/2012 09:16 AM, Tony Graham wrote:
>
>> I don't know whether this has been discussed, but while the current draft
>> specifies UTF-8 only, another way to simplify the character processing
>> (post-parser) would be to also specify Normalization Form C [2][3], which
>> would mean there would be only one way in MicroXML documents to represent
>> particular characters.
>
> NFC is called out in the Editor's Draft; I think the idea is you can use
> what you want, but parsers are free to normalize, caveat emptor, you might
> not get what you expect unless you use NFC. At least that was my breezy
> interpretation :) Read the spec if you want precision...

It's rather stronger than that:

    4.1 Document conformance

    ...

    [Unicode] says that canonically equivalent sequences of characters
    ought to be treated as identical. However, documents that are
    canonically equivalent according to Unicode but that use distinct code
    point sequences are considered distinct by MicroXML parsers. This gives
    rise to the possibility that the user might unintentionally create
    sequences of characters that are canonically equivalent but are treated
    as distinct by MicroXML parsers. To avoid this possibility, all
    documents SHOULD be in Normalization Form C as described by [Unicode].

That's an RFC SHOULD, which means use Normalization Form C unless you have
an absolutely compelling reason not to. I do wonder why not just strengthen
it that little bit to a MUST.

--
Uche Ogbuji                       http://uche.ogbuji.net
Founding Partner, Zepheira        http://zepheira.com
http://wearekin.org
http://www.thenervousbreakdown.com/author/uogbuji/
http://copia.ogbuji.net
http://www.linkedin.com/in/ucheogbuji
http://twitter.com/uogbuji
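
For anyone who wants to see the hazard that section 4.1 describes, here is a minimal Python sketch. It is only an illustration, not anything taken from the draft: the helper name `is_nfc` and the sample strings are my own assumptions, and it relies on the standard library's `unicodedata` module rather than on any MicroXML parser.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """Return True if text is already in Normalization Form C.

    Hypothetical helper for illustration; not part of any MicroXML API.
    """
    return unicodedata.normalize("NFC", text) == text


# Two canonically equivalent spellings of the same character.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# They are distinct code point sequences (and distinct UTF-8 byte
# sequences), so a comparison without normalization sees two values.
distinct_before = precomposed != decomposed

# After NFC normalization the two spellings become identical.
same_after = unicodedata.normalize("NFC", decomposed) == precomposed
```

A producer that ran a check like `is_nfc` before serializing would satisfy the SHOULD; a consumer that compares names or content without normalizing would treat the two spellings above as different.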
Received on Wednesday, 12 September 2012 13:39:18 UTC