[Bug 29217] [SER31] Serialization of newlines

From: <bugzilla@jessica.w3.org>
Date: Tue, 17 Nov 2015 02:33:51 +0000
To: public-qt-comments@w3.org
Message-ID: <bug-29217-523-Z9DrHF6kwE@http.www.w3.org/Bugs/Public/>
https://www.w3.org/Bugs/Public/show_bug.cgi?id=29217

--- Comment #10 from C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> ---
For the record, the editors believe that the answer to the initial question
raised in this report is:

- We believe that in the text output method, CR is to be emitted literally (as
are also NEL and LINE SEPARATOR, if anyone wonders), and #xA (LF or newline)
MAY be emitted as any string expected by the environment.

It follows from this that the test case mentioned here will need to be revised
(see bug 29249).

The questions raised in comment 2:

  1. What is the default for output methods other than XML or text?

For the XHTML output method, the rules are as for XML.  (This follows, we
think, from the statement in section 6.1.3 on the encoding parameter.)

For the HTML method, the text already says that any sequence of whitespace
characters can be output as any sequence that has the same effect in a browser.

For the JSON method, the issue appears to arise only with strings; the rules
for JSON escaping call for #xD to be represented as \r and #xA as \n.  Whitespace
added by the serializer (e.g. when indent="yes") can contain whatever
characters the implementation likes.
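
As a rough illustration of the JSON escaping rule only, here is a sketch
using Python's standard json module.  This is an assumption made for the
sake of example: it is not an XSLT/XQuery serializer, but it escapes #xD
and #xA in strings the same way, and the sample string is made up.

```python
import json

# A string value containing a lone CR, a lone LF, and a CR+LF pair.
value = "one\rtwo\nthree\r\nfour"

# Inside a JSON string, #xD is written as \r and #xA as \n; neither is
# translated to the line-ending convention of the host environment.
escaped = json.dumps(value)
print(escaped)

# Whitespace added for indentation is a separate matter: here the json
# module happens to use #xA, but that choice belongs to the serializer.
indented = json.dumps({"text": value}, indent=2)
print(indented)
```

The raw CR and LF characters never appear inside the serialized string
value, so the question of environment line endings does not arise there.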

For the adaptive method, the issue of newline handling occurs only for the item
separator.  This is specified by the user, and the spec does not provide for
the implementation to override the user's specification.

  2. Do newline characters need to be normalized (see my initial comment)?

No, not if "need to be" means "MUST be".  They MAY be, under the rules for the
XML, XHTML, HTML, and Text methods.  

  3. Does "newline" always refer to "&#xa;" sequences in the input, or 
  does it also refer to "&#xd;&#xa;" ? 

When the word is used of characters in the XDM instance, we take it to mean
only #xA.  No instances of #xD in the XDM instance can have been part of line
ending sequences in any XML input:  they would have been omitted when the
newlines were normalized as part of XML parsing.  So any #xD in an XDM instance
created from XML will have had the XML form of a character reference; we think
it would be odd to refer to it as a newline.  (As for #xD characters in XDM
instances created from a non-XML source, we assume the creator of the XDM
instance will have been aware that XDM uses #xA as a line separator.  So
analogous considerations will apply.)
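
The point about XML parsing can be checked with any conforming parser.
A minimal sketch, assuming Python's xml.etree.ElementTree (the sample
document and variable names are made up for illustration):

```python
import xml.etree.ElementTree as ET

# The literal CR+LF pair is a line ending in the XML source; the
# character reference &#xD; is a CR that the author asked for explicitly.
doc = "<a>line1\r\nline2&#xD;line3</a>"

text = ET.fromstring(doc).text
print(repr(text))

# The literal CR+LF has been normalized to a single #xA, while the CR
# written as a character reference survives as #xD in the parsed data.
```

So the only #xD left in the parsed text is the one that was written as
a character reference.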

  4. Would it make sense to specify newline handling globally for 
  all rules in the spec?  

Perhaps, if we were drafting the spec from scratch.  But the cost/benefit ratio
seems to us too high to make us want to do it now. 

At tomorrow's joint call, the editors expect to present a change to the
spec that addresses this issue by adding the following note to section 8 
immediately before section 8.1:

  Note:

  The rule just stated applies to newline characters (#xA); it does not apply 
  to occurrences in the data model instance of carriage return (CR), NEL, 
  or LINE SEPARATOR characters; these should be output literally, regardless 
  of the conventions for line endings in the system environment.

  To illustrate, the following table shows the expected output for various 
  character sequences in environments which conventionally use #xA (LF, as in 
  Linux systems), #xD followed by #xA (CR+LF, Windows), #xD (CR only, older 
  versions of Mac OS), #x85 (NEL, some IBM operating systems), or #x2028 (LINE 
  SEPARATOR) to separate lines:

  -------------------------------------------------------------------------
   Input     | #xA       | #xD#xA    | #xD       | #x85      | #x2028 
             | systems   | systems   | systems   | systems   | systems
  -------------------------------------------------------------------------
   character | character | character | character | character | character
    #xD      |  #xD      |  #xD      |  #xD      |  #xD      |  #xD      
  -------------------------------------------------------------------------
   character | character | string    | character | character | character
    #xA      |  #xA      |  #xD+#xA  |  #xD      |  #x85     |  #x2028
  -------------------------------------------------------------------------
   string    | string    | string    | string    | string    | string
    #xD+#xA  |  #xD+#xA  |  #xD+#xD  |  #xD+#xD  |  #xD+#x85 |  #xD+#x2028
             |           |  +#xA     |           |           |           
  -------------------------------------------------------------------------
   string    | string    | string    | string    | string    | string
    #xD+#xD  |  #xD+#xD  |  #xD+#xD  |  #xD+#xD  |  #xD+#xD  |  #xD+#xD  
    +#xA     |  +#xA     |  +#xD+#xA |  +#xD     |  +#x85    |  +#x2028         
  -------------------------------------------------------------------------
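
The table can be reproduced with a small sketch like the following.  The
function name serialize_text and the CONVENTIONS mapping are hypothetical
names invented for this illustration, not part of any serializer API;
the sketch assumes an implementation that chooses to emit #xA as the
environment's line ending, which the rule permits but does not require.

```python
def serialize_text(value: str, line_ending: str = "\n") -> str:
    """Emit #xA as the environment's line ending; leave all else alone.

    CR (#xD), NEL (#x85) and LINE SEPARATOR (#x2028) in the input are
    output literally, whatever the environment's convention is.
    """
    return value.replace("\n", line_ending)


# Hypothetical labels for the five environments in the table.
CONVENTIONS = {
    "#xA": "\n",
    "#xD#xA": "\r\n",
    "#xD": "\r",
    "#x85": "\x85",
    "#x2028": "\u2028",
}

# The four input rows of the table.
INPUTS = ["\r", "\n", "\r\n", "\r\r\n"]

for value in INPUTS:
    for name, ending in CONVENTIONS.items():
        out = serialize_text(value, ending)
        print(f"{value!r:10} on {name:7} systems -> {out!r}")
```

Note that an input of #xD+#xA on a CR+LF system comes out as #xD+#xD+#xA:
the #xD is copied through and only the #xA is expanded.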

Received on Tuesday, 17 November 2015 02:33:55 UTC
