Re: [xml-dev] The XML 1.1 Candidate Recommendation (fwd)

At 7:09 AM -0400 10/16/02, John Cowan wrote:

>>  Unicode character normalization should be performed on XML documents,
>>  unless you don't feel like it, in which case you can ignore it. This almost
>>  makes sense. Basically it says that parsers may change an e followed by a
>>  combining accent acute into the single character é if they want to or the
>>  client asks for it. The details are quite complicated, but at least it's
>>  optional.
>
>No, not at all!  XML 1.1 says that parsers should *check* normalization,
>not that they should *perform* it.  So a parser that sees an e followed
>by a combining acute should report the lack of normalization to the
>calling application.
>
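
To make the check-versus-perform distinction concrete, here is a
minimal sketch in Python. It assumes plain NFC as a stand-in for the
fuller normalization definitions in the Candidate Recommendation, and
the helper names is_nfc and to_nfc are made up for illustration. They
are not part of any parser API.

```python
import unicodedata

decomposed = "e\u0301"  # e followed by U+0301 COMBINING ACUTE ACCENT
composed = "\u00e9"     # the single precomposed character é


def is_nfc(text):
    # Checking: compare the text with its NFC form, leaving it untouched.
    # (Assumption: NFC approximates the spec's notion of "normalized".)
    return unicodedata.normalize("NFC", text) == text


def to_nfc(text):
    # Performing: return a new, possibly different, string in NFC.
    return unicodedata.normalize("NFC", text)
```

With these definitions, is_nfc(decomposed) is false while
to_nfc(decomposed) equals composed. A checking parser would only
report the first fact; a normalizing parser would hand the
application the second string.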

No, I still think there's an issue here, though maybe I don't have my 
finger on it yet. Even if the document isn't transformed into 
normalized form, the processor might still validate against the 
normalized form. Maybe the correct behavior just needs to be spelled 
out better.

This is another one of those annoying errors that isn't exactly a 
well-formedness error, but isn't exactly a validity error or a 
warning either. As written, it falls in the grey area of XML 
error reporting, and that has caused problems before. The exact 
behavior of a parser encountering non-normalized text should be 
locked down, probably as a warning rather than an error of any kind. 
That is, parsers should be required to continue processing correctly 
after encountering non-normalized text.
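
One way to picture the "warning, then keep going" behavior is the
sketch below. It is only an illustration under assumptions: the
function pass_through_with_warnings and its warn callback are
hypothetical names, text arrives as a list of strings rather than
through a real parser interface, and NFC again stands in for the
spec's definition of normalized text.

```python
import unicodedata


def pass_through_with_warnings(chunks, warn):
    """Deliver every chunk unchanged; call warn(index, chunk) for
    each chunk that is not in NFC, then continue processing."""
    delivered = []
    for index, chunk in enumerate(chunks):
        if unicodedata.normalize("NFC", chunk) != chunk:
            warn(index, chunk)   # report the problem, do not stop
        delivered.append(chunk)  # the text itself is never altered
    return delivered


warnings = []
result = pass_through_with_warnings(
    ["caf\u00e9", "cafe\u0301", "plain"],
    lambda index, chunk: warnings.append(index),
)
```

Here only the second chunk triggers the callback, and all three
chunks reach the application exactly as they were written.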

Of course this is really the wrong solution to the problem. The right 
solution is to kill XML 1.1 completely.
-- 

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|          XML in a  Nutshell, 2nd Edition (O'Reilly, 2002)          |
|              http://www.cafeconleche.org/books/xian2/              |
|  http://www.amazon.com/exec/obidos/ISBN%3D0596002920/cafeaulaitA/  |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML News: http://www.cafeconleche.org/    |
+----------------------------------+---------------------------------+


-- 
John Cowan    http://www.ccil.org/~cowan   <jcowan@reutershealth.com>
    "Any legal document draws most of its meaning from context. A telegram
    that says 'SELL HUNDRED THOUSAND SHARES IBM SHORT' (only 190 bits in
    5-bit Baudot code plus appropriate headers) is as good a legal document
    as any, even sans digital signature." --me

Received on Wednesday, 16 October 2002 08:51:09 UTC