
Re: CR and LF in the input stream / as NCRs (detailed review of parsing algorithm)

From: Michael A. Puls II <shadow2531@gmail.com>
Date: Thu, 2 Aug 2007 15:46:20 -0400
Message-ID: <6b9c91b20708021246r284b48bcpef8a42046a33c14d@mail.gmail.com>
To: "Simon Pieters" <simonp@opera.com>
Cc: public-html <public-html@w3.org>

On 7/31/07, Simon Pieters <simonp@opera.com> wrote:
> Personally, I think attribute values should be parsed the same way as data
> is parsed wrt linebreaks.

That would be cool, but...

I think the ideal newline normalization for parsing (even attribute
values) into the DOM is:

line1\rline2 -> line1\nline2
line1\nline2 -> line1\nline2
line1\r\nline2 -> line1\nline2
line1\n\rline3 -> line1\n\nline3

line1&#13;line2 -> line1\nline2
line1&#10;line2 -> line1\nline2
line1&#13;&#10;line2 -> line1\nline2
line1&#10;&#13;line3 -> line1\n\nline3

line1&#x0D;line2 -> line1\nline2
line1&#x0A;line2 -> line1\nline2
line1&#x0D;&#x0A;line2 -> line1\nline2
line1&#x0A;&#x0D;line3 -> line1\n\nline3
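
The table above can be read as "decode CR/LF character references first, then normalize newlines". Here is a rough Python sketch of that reading. The names NCR, decode_newline_ncrs and normalize_ideal are made up for illustration and do not come from any browser or spec, and only numeric references for CR and LF are handled.

```python
import re

# Hypothetical helper pattern: matches any decimal or hex numeric
# character reference, e.g. &#13; or &#x0D;
NCR = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")

def decode_newline_ncrs(text):
    """Turn references to CR (13) and LF (10) into raw characters.

    All other references are left untouched; this sketch only cares
    about newlines.
    """
    def repl(match):
        hex_digits, dec_digits = match.group(1), match.group(2)
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code == 13:
            return "\r"
        if code == 10:
            return "\n"
        return match.group(0)
    return NCR.sub(repl, text)

def normalize_ideal(text):
    """Decode first, then collapse \\r\\n and lone \\r to \\n."""
    decoded = decode_newline_ncrs(text)
    # Order matters: handle the \r\n pair before lone \r.
    return decoded.replace("\r\n", "\n").replace("\r", "\n")
```

Because decoding happens before normalization, a \r\n pair written as references collapses to a single \n, the same as a raw pair, while \n\r in either form still yields two newlines.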

Firefox and Safari pretty much do this except for the \r\n situation.
A raw \r\n pair will be normalized to \n, but if \r\n is made up of
entities, it gets normalized to \n\n.

So, for stuff besides attribute values, the Firefox and Safari
normalization generally looks the same as above except for:

line1&#13;&#10;line2 -> line1\n\nline2
line1&#x0D;&#x0A;line2 -> line1\n\nline2
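
One way to model that quirk, purely as a guess at the order of operations and not a description of what Gecko or WebKit actually do internally, is to normalize raw newlines first and only afterwards decode CR/LF references, mapping each one to \n on its own. The names NCR and normalize_observed are hypothetical.

```python
import re

# Hypothetical helper pattern: matches any decimal or hex numeric
# character reference, e.g. &#13; or &#x0D;
NCR = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")

def normalize_observed(text):
    """Assumed model of the Firefox/Safari results described above."""
    # Step 1: raw newlines are normalized, so a raw \r\n becomes one \n.
    raw_normalized = text.replace("\r\n", "\n").replace("\r", "\n")

    # Step 2: each CR or LF reference becomes \n independently, so a
    # reference-based \r\n pair ends up as two newlines.
    def repl(match):
        hex_digits, dec_digits = match.group(1), match.group(2)
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return "\n" if code in (10, 13) else match.group(0)

    return NCR.sub(repl, raw_normalized)
```

Under this model the only difference from the ideal table is the reference-based \r\n case, which matches the two exceptions listed above.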

Opera generally doesn't do any normalization, except in textareas,
where it normalizes newlines to \r\n (to be form-data friendly, I guess).
IE generally normalizes like Firefox and Safari, but to \r instead of \n.

For attribute values, Firefox, Safari, Opera and IE generally accept
what's there and don't normalize anything. (Exceptions are certain
elements, like the param element in Opera and IE, and, I think, certain
situations with applets and params in Firefox.)

So, based on all that, so far, I'd say that newlines in attributes
(raw or entity-based) should not be normalized, if only because doing so
might be too big of a change. For the rest of the markup, the way Firefox
and Safari normalize, minus the entity-based \r\n quirk, would probably
be ideal. (I'm just talking about storage here, not rendering.)

At any rate, if all browsers normalized the same, that would be something.

-- 
Michael
Received on Thursday, 2 August 2007 19:46:36 GMT
