Erratum in summary list of Canonical XML spec

In the summary in section 1.1, the eleventh point states:

Special characters in attribute values and character content are
replaced by character references 

This should probably state something like:

Special characters in attribute values and character content are
replaced by predefined entity references, or if no such reference is
defined, by hexadecimal character references.

The exact wording can be worked on. However, most of the time
canonicalization replaces special characters in attribute values and
character content with entity references, not with character references. 

As currently written, the statement implies that a character such as < would
be replaced by &#x3C;. However, section 2.3 gives the more complete
description:

Attribute Nodes- a space, the node's QName, an equals sign, an open
quotation mark (double quote), the modified string value, and a close
quotation mark (double quote). The string value of the node is modified
by replacing all ampersands (&) with &amp;, all open angle brackets (<)
with &lt;, all quotation mark characters with &quot;, and the whitespace
characters #x9, #xA, and #xD, with character references. The character
references are written in uppercase hexadecimal with no leading zeroes
(for example, #xD is represented by the character reference &#xD;).

Text Nodes- the string value, except all ampersands are replaced by
&amp;, all open angle brackets (<) are replaced by &lt;, all closing
angle brackets (>) are replaced by &gt;, and all #xD characters are
replaced by &#xD;. 
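
To make the distinction concrete, here is a minimal Python sketch of the
replacements described in the section 2.3 text quoted above. The function
names (escape_attr_value, escape_text) and the table layout are my own,
chosen for illustration; they are not from the spec or from any
particular canonicalizer.

```python
# Hypothetical helpers; the names are illustrative, not taken from the spec.

# Attribute values: &, < and " become predefined entity references;
# #x9, #xA and #xD become uppercase hexadecimal character references.
_ATTR_REPLACEMENTS = {
    "&": "&amp;",
    "<": "&lt;",
    '"': "&quot;",
    "\t": "&#x9;",
    "\n": "&#xA;",
    "\r": "&#xD;",
}

# Text nodes: &, < and > become predefined entity references;
# only #xD becomes a character reference.
_TEXT_REPLACEMENTS = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    "\r": "&#xD;",
}


def _escape(value, table):
    # Map one character at a time so an "&" produced by an earlier
    # replacement is never escaped a second time.
    return "".join(table.get(ch, ch) for ch in value)


def escape_attr_value(value):
    """Escape an attribute's string value per the quoted Attribute Nodes rule."""
    return _escape(value, _ATTR_REPLACEMENTS)


def escape_text(value):
    """Escape a text node's string value per the quoted Text Nodes rule."""
    return _escape(value, _TEXT_REPLACEMENTS)
```

With these definitions, escape_text("a < b") yields "a &lt; b": the <
comes out as the entity reference &lt;, not as the character reference
&#x3C; that the summary wording suggests.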

Note that section 2.4 of the XML 1.0 spec, 2nd edition, clearly
indicates that entity references are not a special kind of character
reference:

Text consists of intermingled character data and markup. [Definition:
Markup takes the form of start-tags, end-tags, empty-element tags,
entity references, character references, comments, CDATA section
delimiters, document type declarations, processing instructions, XML
declarations, text declarations, and any white space that is at the top
level of the document entity (that is, outside the document element and
not inside any other markup).]

Or, from the BNF grammar:

Reference ::= EntityRef | CharRef
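
The two alternatives in that production can be told apart mechanically.
The following Python sketch is only an illustration: classify_reference
is a name I made up, and the entity-reference pattern uses a simplified,
ASCII-only approximation of the XML Name production (the real production
allows many more characters).

```python
import re

# CharRef ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
CHAR_REF = re.compile(r"&#(?:[0-9]+|x[0-9a-fA-F]+);")

# EntityRef ::= '&' Name ';'
# Assumption: Name is approximated with ASCII letters, digits and
# a few punctuation characters; this is not the full XML definition.
ENTITY_REF = re.compile(r"&[A-Za-z_:][A-Za-z0-9._:-]*;")


def classify_reference(s):
    """Return 'CharRef', 'EntityRef', or None for a candidate reference."""
    if CHAR_REF.fullmatch(s):
        return "CharRef"
    if ENTITY_REF.fullmatch(s):
        return "EntityRef"
    return None
```

Under this sketch, classify_reference("&lt;") returns "EntityRef" while
classify_reference("&#x3C;") returns "CharRef", which is the distinction
the summary wording blurs.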

-- 
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+ 
|               Java I/O (O'Reilly & Associates, 1999)               |
|            http://metalab.unc.edu/javafaq/books/javaio/            |
|   http://www.amazon.com/exec/obidos/ISBN=1565924851/cafeaulaitA/   |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://metalab.unc.edu/javafaq/ | 
|  Read Cafe con Leche for XML News: http://metalab.unc.edu/xml/     |
+----------------------------------+---------------------------------+

Received on Wednesday, 9 May 2001 17:57:16 UTC