Attribute value normalization post E70

The rules for attribute value normalization need to be looked at yet again.

Yes, I have seen Erratum E70, and it is fine as far as it goes.

The issue I am raising here is that the method of attribute value
normalization does not make sense and should be improved for the sake of
design consistency.

Consider an attribute of a type other than CDATA whose value has whitespace
sprinkled throughout.  According to both the XML spec and Erratum E70, that
whitespace is further normalized by stripping all leading and trailing
whitespace and by collapsing each run of consecutive whitespace characters
to a single space.
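
As a minimal Python sketch of that further pass (the function name is mine,
and it assumes the earlier pass has already replaced each whitespace
character with a #x20 space, per the spec):

    def normalize_tokenized(value):
        # Discard leading and trailing spaces; collapse each run of
        # spaces to a single #x20.
        return " ".join(t for t in value.split(" ") if t)

    # normalize_tokenized("  a   b  ")  ->  "a b"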

I think that this whitespace normalization scheme only works for NMTOKENS.
The remaining non-CDATA attribute types, such as ID, would benefit from
having all of the whitespace removed, period, along with any other
characters not permitted by the Name production.  Otherwise, why bother
normalizing these attributes at all, when the result after normalization
still violates the validity constraints?
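
Here is a sketch of the behavior I am proposing for the single-token types
(again the name is mine, and for brevity it removes only the whitespace,
not the other characters outside the Name production):

    def normalize_name_typed(value):
        # Proposed: remove the spaces entirely, since a single Name
        # or Nmtoken cannot legally contain whitespace anyway.
        return "".join(value.split(" "))

    # normalize_name_typed("  my  id  ")  ->  "myid"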

This issue comes up in characterizing the difference between validating and
non-validating processors when generating a canonical form of XML to be
used by an XML signature.  A non-validating processor that has not read the
attribute declarations must treat every attribute as CDATA, so it skips the
further normalization pass.  If the signer uses a non-validating processor,
he may therefore be able to create a signature over data that cannot be
verified by someone who uses a validating processor, since the two
processors normalize the same attribute values differently.
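
To make the divergence concrete, suppose the DTD declares foo as type ID
and the instance contains foo="  my  id  " (a hypothetical example).
After the universal pass that turns each whitespace character into a space:

    non-validating, DTD unread (CDATA assumed):   "  my  id  "
    validating (ID, further normalization):       "my id"

A signature computed over one of these forms will not verify against the
other.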

Sure, it's a weird case where the signer gets what he pays for, but it's
also a consistency thing.

John Boyer
PureEdge Solutions Inc.
jboyer@PureEdge.com
