Attribute normalisation from Richard Tobin on 2000-02-03 (xml-editor@w3.org from January to March 2000)

From: Richard Tobin <richard@cogsci.ed.ac.uk>
Date: Thu, 3 Feb 2000 14:21:43 GMT
To: xml-editor@w3.org
Cc: richard@cogsci.ed.ac.uk, ht@cogsci.ed.ac.uk
Message-Id: <5544.200002031421@doyle.cogsci.ed.ac.uk>

There is some uncertainty (discussed in xml-dev) about normalisation
of attributes containing character references.

Is the algorithm described in section 3.3.3 (updated in E24) applied
after entity expansion has already been done?  Presumably not, since
it includes processing entity references.  (It could mean that there
is an extra pass of entity expansion for attributes, but that would be
odd.)

Are the five bullet points in the algorithm intended to be mutually
exclusive alternatives for each character and reference, or are they
applied in sequence?  They appear to be alternatives, but some parsers
have interpreted them as meaning that the conversion of whitespace
characters to #x20 applies even to characters resulting from character
references.

To ensure that this is clear, how does the algorithm apply to this
example:

<!DOCTYPE el [
<!ELEMENT el ANY>
<!ATTLIST el at NMTOKENS #IMPLIED>
]>
<el at="a &#9; b"/>

Is the character reference converted to a space?

If it is not, is the document valid, and what value is returned to the
application?

If it is valid and the value returned is the sequence

  a space tab space b

then the normalisation has not had the (presumably desired) effect of
converting a tokenised attribute to a sequence of NMTOKENs separated
by single space characters.
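
The two readings can be made concrete with a minimal Python sketch.  It
is not taken from the specification: the name normalise and the flag
refs_become_space are hypothetical, entity references are left out, and
only character references and literal whitespace are handled.  The final
step (strip leading and trailing spaces, collapse runs of spaces) is
assumed to apply to non-CDATA attributes only.

```python
import re

# Matches a decimal or hexadecimal character reference, e.g. &#9; or &#x9;
CHAR_REF = re.compile(r"&#(x[0-9a-fA-F]+|[0-9]+);")
WHITESPACE = "\x20\x0d\x0a\x09"


def normalise(raw, refs_become_space, tokenised=True):
    """Sketch of attribute-value normalisation (hypothetical helper).

    refs_become_space=False: the bullet points are alternatives, so a
    character produced by a character reference is appended unchanged.
    refs_become_space=True: the whitespace rule is also applied to
    characters produced by character references.
    """
    out = []
    pos = 0
    while pos < len(raw):
        m = CHAR_REF.match(raw, pos)
        if m:
            body = m.group(1)
            if body.startswith("x"):
                ch = chr(int(body[1:], 16))
            else:
                ch = chr(int(body))
            if refs_become_space and ch in WHITESPACE:
                ch = " "
            out.append(ch)
            pos = m.end()
        elif raw[pos] in WHITESPACE:
            # Literal whitespace in the attribute value becomes #x20.
            out.append(" ")
            pos += 1
        else:
            out.append(raw[pos])
            pos += 1
    value = "".join(out)
    if tokenised:
        # Non-CDATA step: trim #x20 at both ends, collapse runs of #x20.
        value = re.sub(" +", " ", value.strip(" "))
    return value
```

For the example above, normalise("a &#9; b", refs_become_space=False)
keeps the tab and gives a, space, tab, space, b, while
refs_become_space=True gives "a b".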

-- Richard

Received on Thursday, 3 February 2000 09:21:47 UTC