- From: John Cowan <jcowan@reutershealth.com>
- Date: Wed, 16 Oct 2002 08:49:35 -0400 (EDT)
- To: www-xml-blueberry-comments@w3.org
At 7:09 AM -0400 10/16/02, John Cowan wrote:

>> Unicode character normalization should be performed on XML documents,
>> unless you don't feel like it, in which case you can ignore it. This almost
>> makes sense. Basically it says that parsers may change an e followed by a
>> combining accent acute into the single character é if they want to or the
>> client asks for it. The details are quite complicated, but at least it's
>> optional.
>
>No, not at all! XML 1.1 says that parsers should *check* normalization,
>not that they should *perform* it. So a parser that sees an e followed
>by a combining acute should report the lack of normalization to the
>calling application.
>

No, I still think there's an issue here, though maybe I don't have my
finger on it yet. Even if the document isn't transformed into normalized
form, the processor might still validate against the normalized form.
Maybe the correct behavior just needs to be spelled out better.

This is another one of those annoying errors that isn't exactly a
well-formedness error but it isn't exactly a validity error or a warning
either. At least as written, it's in the grey area of XML error reporting,
and that's caused problems before. The exact behavior of a parser
encountering non-normalized text should be locked down, probably as a
warning, not an error of any kind. That is, parsers should be required to
continue processing correctly after encountering non-normalized text.

Of course this is really the wrong solution to the problem. The right
solution is to kill XML 1.1 completely.
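A minimal sketch of the "check, don't perform, and keep going" behavior argued for above, assuming Python 3.8+ (for `unicodedata.is_normalized`). The helper name `check_normalization` is hypothetical, Python's `warnings` module merely stands in for whatever warning channel a real parser would use, and the check is plain NFC only; the "full normalization" that XML 1.1 refers to (from the W3C Character Model) is stricter than this.

```python
import unicodedata
import warnings


def check_normalization(text: str) -> str:
    """Check, but do not perform, Unicode normalization (NFC only).

    Hypothetical helper: XML 1.1's "full normalization" is stricter than NFC.
    """
    if not unicodedata.is_normalized("NFC", text):
        # Report the lack of normalization as a warning, not an error...
        warnings.warn("character data is not in Unicode Normalization Form C")
    # ...and hand back the text unchanged so processing can continue.
    return text


if __name__ == "__main__":
    composed = "\u00e9"        # é as a single code point (already NFC)
    decomposed = "e\u0301"     # e followed by COMBINING ACUTE ACCENT
    check_normalization(composed)     # no warning
    check_normalization(decomposed)   # warns; text is not rewritten
    # A validator comparing against the normalized form would treat these as equal:
    print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

The last line illustrates the ambiguity raised above: even though the text is never rewritten, a processor that validates against the normalized form would accept `e` plus combining acute wherever `é` is expected.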
--
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|          XML in a Nutshell, 2nd Edition (O'Reilly, 2002)           |
|              http://www.cafeconleche.org/books/xian2/              |
|  http://www.amazon.com/exec/obidos/ISBN%3D0596002920/cafeaulaitA/  |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML News: http://www.cafeconleche.org/    |
+----------------------------------+---------------------------------+

--
John Cowan    http://www.ccil.org/~cowan    <jcowan@reutershealth.com>
"Any legal document draws most of its meaning from context. A telegram
that says 'SELL HUNDRED THOUSAND SHARES IBM SHORT' (only 190 bits in
5-bit Baudot code plus appropriate headers) is as good a legal document
as any, even sans digital signature." --me
Received on Wednesday, 16 October 2002 08:51:09 UTC