
Re: XML 1.1 CR comment response for Harold-02

From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
Date: Thu, 3 Jul 2003 08:39:58 -0400
Message-Id: <p04330103bb29d2783987@[192.168.254.4]>
To: Paul Grosso <pgrosso@arbortext.com>
Cc: www-xml-blueberry-comments@w3.org

At 4:02 PM -0500 6/23/03, Paul Grosso wrote:
>In response to your email to the XML 1.1 CR recorded at
>http://lists.w3.org/Archives/Public/www-xml-blueberry-comments/2002Oct/0002.html
>the XML Core WG generated, discussed, and resolved the following issue:
>
>Issue Harold-02:
>normalization in XML 1.1
>
>Summary resolution: rejected
>
>Response
>--------
>Regarding "Even if the document isn't transformed into normalized form, the
>processor might still validate against the normalized form." This is not true;
>the definition of match (unchanged in XML 1.0) is explicit:  Characters with
>multiple possible representations in ISO/IEC 10646 (e.g., characters with
>both precomposed and base+diacritic forms) match only if they have the same
>representation in both strings. Regarding "parsers should be required to
>continue processing correctly after encountering non-normalized text." While
>interoperability is of great importance, the XML specification tries to allow
>leeway for various
>processors to handle questionable input in ways appropriate to their designed
>use. As noted, this is not the only place in the specification where such
>flexibility is permitted. The XML Core WG has consensus on the status quo
>wording here which leaves it open to implementations how to "report to the
>application" in the case of input that is not fully normalized.
>========
>
>Please let us know whether you accept our resolution of your comment,
>or wish to have an objection formally recorded.  If we do not hear
>from you within 10 days we will assume that you accept our response
>(though we would prefer to hear from you in any case if practical).
>

I absolutely do not accept this one. I think you have a major problem 
here, and I very much would like to record a formal objection. I went 
back and reread the currently published draft spec of XML 1.1. The 
current published version of this document leaves no room for the 
interpretation that parsers may validate and check for 
well-formedness against the normalized forms of characters when the 
unnormalized forms are present. As written, <e'></> is malformed 
(where e' means e followed by combining accent acute).

This is actually what I think should be the case. However, it appears 
that some members of the working group do not believe this is true, 
and think it is optional for parsers to report a fatal error when 
encountering such an element. This may be what the working group 
intended to say, but it is not what the spec does say. If this is 
your intent, then you need to change the language of the spec to 
indicate that the BNF productions, well-formedness constraints, and 
validity rules are verified only after normalization has taken place.
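
To make the distinction concrete, here is a minimal Python sketch of the 
two readings. It is only an illustration, not anything taken from the 
spec: the helper names are hypothetical, and Unicode NFC stands in for 
XML 1.1's "fully normalized", which adds further conditions.

```python
import unicodedata

# The two representations discussed above.
decomposed = "e\u0301"   # e followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE


def literal_match(a: str, b: str) -> bool:
    """Hypothetical helper: strings match only if they use the same
    representation, code point for code point (no normalization)."""
    return a == b


def normalized_match(a: str, b: str) -> bool:
    """Hypothetical helper: compare only after normalizing both sides.
    NFC is an assumption standing in for 'fully normalized'."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def is_nfc(s: str) -> bool:
    """True if s is already in NFC, i.e. normalizing it changes nothing."""
    return unicodedata.normalize("NFC", s) == s


if __name__ == "__main__":
    print(literal_match(decomposed, precomposed))
    print(normalized_match(decomposed, precomposed))
    print(is_nfc(decomposed), is_nfc(precomposed))
```

A parser that checks names the first way and one that checks them the 
second way will disagree about the same document, which is why the spec 
has to say which one it means.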
-- 

   Elliotte Rusty Harold
   elharo@metalab.unc.edu
   Processing XML with Java (Addison-Wesley, 2002)
   http://www.cafeconleche.org/books/xmljava
   http://www.amazon.com/exec/obidos/ISBN%3D0201771861/cafeaulaitA
Received on Thursday, 3 July 2003 08:46:22 GMT
