Issue: inconsistency of S production and treatment of line endings.

In XML 1.0, the S production includes:

S := #x9 | #xA | #xD | #x20

In the discussion of handling of line endings, it is stated that #xD #xA
is normalized to #xA, as is #xD alone.

The order of processing does not seem to be specified.  Because #xD is
included in the S production, a processor might tokenize before
normalizing line endings.
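
A minimal sketch of the ordering question, in Python.  The names
S_XML10 and normalize_xml10 are made up for illustration and do not
come from any real parser; the regular expression is only my reading of
the S production quoted above.

```python
import re

# The S production as quoted above: #x9, #xA, #xD, #x20 (one or more).
S_XML10 = re.compile(r"[\x09\x0A\x0D\x20]+")

def normalize_xml10(text: str) -> str:
    """Translate #xD #xA, and any #xD on its own, to a single #xA."""
    return text.replace("\r\n", "\n").replace("\r", "\n")

raw = "<a\r\nb='1'/>"

# Tokenizing first: the white space token still contains #xD.
print(repr(S_XML10.search(raw).group()))                   # '\r\n'

# Normalizing first: the same token is a single #xA.
print(repr(S_XML10.search(normalize_xml10(raw)).group()))  # '\n'
```

Either order recognizes the run as white space in 1.0; only the
characters inside the token differ.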

In the 1.1 draft, the S production is unchanged.  However, the handling
of line endings now includes normalization of #xD #x85, #x85, and
#x2028.
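
As a rough illustration (Python again, hypothetical names, and assuming
the 1.1 draft keeps the 1.0 rules for #xD #xA and lone #xD alongside
the new sequences), the unchanged S production does not match the new
line-ending characters until they have been normalized:

```python
import re

# Unchanged S production: #x9, #xA, #xD, #x20 (one or more).
S = re.compile(r"[\x09\x0A\x0D\x20]+")

# Line endings as described for the 1.1 draft. Two-character sequences
# are listed first so they are replaced as a unit.
LINE_END_11 = re.compile("\r\n|\r\x85|\x85|\u2028|\r")

def normalize_xml11(text: str) -> str:
    """Replace every recognized line ending with a single #xA."""
    return LINE_END_11.sub("\n", text)

for ch in ["\x85", "\u2028"]:
    # Not white space if the processor tokenizes first...
    print(S.fullmatch(ch) is None)                       # True
    # ...but white space if it normalizes first.
    print(S.fullmatch(normalize_xml11(ch)) is not None)  # True
```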

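This creates an inconsistency with XML 1.0 that needs to be addressed:
in 1.0 every line-ending character is also matched by S, so the order
of processing matters little, but #x85 and #x2028 are not in S.
I can see three possible resolutions: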

1) Add language requiring line ending normalization before tokenization
(impose processing order requirements).  For consistency, redefine S to
remove #xD, which cannot appear after line ending normalization.

2) Change the S production to include every character that can appear
in a line ending before normalization:

S := #x9 | #xA | #xD | #x20 | #x85 | #x2028

3) Do not change line endings in 1.1.
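
A sketch of how options 1 and 2 might compare, in Python.  All names
are hypothetical, and the regular expressions are only my reading of
the productions above, not text from either specification.

```python
import re

# Line endings as described for the 1.1 draft.
LINE_END_11 = re.compile("\r\n|\r\x85|\x85|\u2028|\r")

# Option 1: normalization is required first, so S can drop #xD.
S_OPTION1 = re.compile(r"[\x09\x0A\x20]+")

# Option 2: S accepts every line-ending character before normalization.
S_OPTION2 = re.compile("[\x09\x0A\x0D\x20\x85\u2028]+")

def is_s_option1(text: str) -> bool:
    """Normalize line endings, then match the reduced S."""
    return S_OPTION1.fullmatch(LINE_END_11.sub("\n", text)) is not None

def is_s_option2(text: str) -> bool:
    """Match the extended S directly against unnormalized text."""
    return S_OPTION2.fullmatch(text) is not None

# Both options accept these samples as white space.
for sample in ["\r\n", "\x85", " \u2028\t", "\r\x85 "]:
    assert is_s_option1(sample) and is_s_option2(sample)
```

Under these assumptions both options classify the samples the same
way; they differ in whether the processing order has to be mandated.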

Amy!
-- 
Amelia A. Lewis
Architect, TIBCO/Extensibility, Inc.
alewis@tibco.com

Received on Wednesday, 16 October 2002 10:06:02 UTC