[Bug 1307] [XQuery] Line Endings

http://www.w3.org/Bugs/Public/show_bug.cgi?id=1307


scott_boag@us.ibm.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|chamberl@almaden.ibm.com    |scott_boag@us.ibm.com
             Status|ASSIGNED                    |NEW




------- Additional Comments From scott_boag@us.ibm.com  2005-05-17 19:53 -------
I can't see any technical issue with the grammar in doing uniform line-ending
normalization, except the need to pre-process the VersionDecl in the case of
XQuery, which has to be done in any event.  For XPath one has to assume the
encoding is already known, so that isn't an issue.

Since the normalization occurs essentially out-of-band to the syntax parsing
process, I don't think there has to be any effect on the rest of the document.
Of course, a real-world parser would not do two passes... it's just cleaner to
specify it this way.
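To illustrate the "behave as if" point: normalization can be folded into a
single pass over the input, even when the input arrives in chunks. The sketch
below is only an illustration, assuming XML 1.0 rules and text that is already
decoded; the function name normalize_eol_stream is hypothetical. The one subtle
case is a #xD at the end of one chunk followed by #xA at the start of the next.

```python
from typing import Iterable, Iterator


def normalize_eol_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Hypothetical single-pass, XML 1.0-style end-of-line normalizer.

    Yields normalized text chunk by chunk, so a lexer could consume it
    without a separate normalization pass over the whole input.
    """
    # True when the previous non-empty chunk ended in #xD, which has
    # already been emitted as #xA; a leading #xA in the next chunk is
    # then the second half of a CRLF pair and must be dropped.
    skip_lf = False
    for chunk in chunks:
        if not chunk:
            continue  # empty chunks carry no characters; keep the state
        if skip_lf and chunk[0] == "\n":
            chunk = chunk[1:]
        skip_lf = chunk.endswith("\r")
        if chunk:
            # Within a chunk: CRLF -> LF first, then any remaining lone CR.
            yield chunk.replace("\r\n", "\n").replace("\r", "\n")
```

For example, the chunks "a\r" and "\nb" together produce "a\nb", exactly as
if "a\r\nb" had been normalized in one piece.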

I suggest a new section immediately above the section on whitespace, where we
do pretty much the same as the XML specifications.  I'm not very happy with how
the XML 1.0 vs. XML 1.1 wording is done in the first paragraph... this would be
easier if we had a proper XML 1.1 named feature, or the like.  Any suggestions
on how to handle this better would be much appreciated.

=========
A.2.2 End-of-Line Handling

The [XPath/XQuery] processor MUST behave as if it normalized all line breaks on
input, before parsing. The normalization should be done according to whether
the processor chooses to support [XML 1.0] or [XML 1.1] lexical processing.

A.2.2.1 XML 1.0 End-of-Line Handling

For [XML 1.0] processing, all of the following MUST be translated to a single
#xA character:

   1.  the two-character sequence #xD #xA
   2.  any #xD character that is not immediately followed by #xA.
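A minimal sketch of the XML 1.0 rules above, assuming the input is already a
decoded string (the function name is hypothetical). Order matters: the #xD #xA
pair has to be collapsed before lone #xD characters are translated, or a CRLF
would turn into two line feeds.

```python
def normalize_eol_xml10(text: str) -> str:
    """Hypothetical helper: XML 1.0-style end-of-line normalization.

    Translates the sequence #xD #xA, and any #xD not followed by #xA,
    to a single #xA. #x85 and #x2028 are left untouched.
    """
    # Collapse CRLF first, then translate whatever lone CRs remain.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

For instance, "a\r\nb\rc" becomes "a\nb\nc", while "\r\r\n" (a lone #xD
followed by a CRLF pair) becomes two line feeds.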

A.2.2.2 XML 1.1 End-of-Line Handling

For [XML 1.1] processing, all of the following MUST be translated to a single
#xA character:

   1.  the two-character sequence #xD #xA
   2.  the two-character sequence #xD #x85
   3.  the single character #x85
   4.  the single character #x2028
   5.  any #xD character that is not immediately followed by #xA or #x85.
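The XML 1.1 list can be sketched as a single regular-expression substitution,
again assuming an already-decoded string and a hypothetical function name.
Because Python's regex alternation is tried left to right at each position, the
two-character sequences #xD #xA and #xD #x85 win over a lone #xD, which gives
the "not immediately followed by" behavior of item 5.

```python
import re

# Two-character sequences are listed before the lone #xD so that they are
# matched (and replaced) as a unit.
_XML11_EOL = re.compile("\r\n|\r\x85|\r|\x85|\u2028")


def normalize_eol_xml11(text: str) -> str:
    """Hypothetical helper: XML 1.1-style end-of-line normalization."""
    return _XML11_EOL.sub("\n", text)
```

So "a\r\x85b" becomes "a\nb", and "\x85\u2028\r" becomes three line feeds.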

(XQuery-only) The characters #x85 and #x2028 cannot be reliably recognized and
translated until the VersionDecl declaration (if present) has been read.
===========
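One reading of the XQuery-only note, consistent with the earlier remark about
pre-processing the VersionDecl: the encoding it declares has to be known before
the bytes can be decoded into characters such as #x85 (for example, byte 0x85
is #x85 in ISO-8859-1 but U+2026 in windows-1252). The sketch below is purely
illustrative. The regex is a hypothetical, simplified VersionDecl matcher that
assumes an ASCII-compatible prefix and ignores leading comments. It also relies
on Python's codec names matching the declared encoding names.

```python
import re

# Hypothetical, simplified VersionDecl matcher, e.g.
#   xquery version "1.0" encoding "ISO-8859-1";
# Group 3 captures the encoding name, if one is declared.
_VERSION_DECL = re.compile(
    rb'\s*xquery\s+version\s+(["\'])[^"\']*\1'
    rb'(?:\s+encoding\s+(["\'])([A-Za-z][A-Za-z0-9._-]*)\2)?\s*;'
)

# XML 1.1-style line-ending rules, applied only after decoding.
_XML11_EOL = re.compile("\r\n|\r\x85|\r|\x85|\u2028")


def decode_then_normalize(data: bytes, default_encoding: str = "utf-8") -> str:
    """Hypothetical helper: read the VersionDecl, decode, then normalize."""
    m = _VERSION_DECL.match(data)
    if m and m.group(3):
        encoding = m.group(3).decode("ascii")
    else:
        encoding = default_encoding
    text = data.decode(encoding)
    # Only now are #x85 and #x2028 visible as characters.
    return _XML11_EOL.sub("\n", text)
```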

Received on Tuesday, 17 May 2005 19:57:31 UTC