[whatwg] CR "entities" and LFCR from Michael A. Puls II on 2007-06-08 (public-whatwg-archive@w3.org from June 2007)

From: Michael A. Puls II <shadow2531@gmail.com>
Date: Fri, 8 Jun 2007 14:53:54 -0400
Message-ID: <6b9c91b20706081153re8d947ag42c4419623dc9967@mail.gmail.com>

On 6/8/07, Anne van Kesteren <annevk at opera.com> wrote:
> On Thu, 07 Jun 2007 23:12:38 +0200, Michael A. Puls II
> <shadow2531 at gmail.com> wrote:
> > On 6/7/07, Anne van Kesteren <annevk at opera.com> wrote:
> >> These should be converted to LF too. One thing that might be interesting
> >> to look into is the handling of LFCR in browsers (as opposed to CRLF). I
> >> haven't done that yet... Some browsers (just tested Opera) also normalize
> >> two newline entities following each other (CRLF pair).
> >
> > Not sure if it'll help, but whenever I do newline normalization to LF, I:
> >
> > Convert all CR + LF pairs to LF.
> > Then, I convert any CRs left over to LF.
>
> Sure, that's what the specification says to do as well. I was wondering if
> some user agents do something special for LFCR. For instance, if I
> remember correctly using \n\r in JavaScript gives a single newline in
> Firefox and two in Opera.

I believe Boris told me for FF, newline normalization (including
entities) is only done for parsing into the DOM and that any setting
of a string property in JS does zero newline normalization. So, if you
set \n\r, \n\r is stored as-is (which is visually equivalent to
having 2 newlines) and if there needs to be any normalization, it
needs to be done by the author of the JS code.
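
For what it's worth, a rough sketch of doing that normalization by hand
in JS, following the two steps quoted above (CR + LF pairs first, then
leftover CRs). The name normalizeNewlines is made up for illustration
and is not from any browser or spec:

```javascript
// Hypothetical helper (name is an assumption, not a browser API).
function normalizeNewlines(text) {
  // Step 1: convert every CR + LF pair to a single LF.
  // Step 2: convert any CR that is left over to LF.
  return text.replace(/\r\n/g, "\n").replace(/\r/g, "\n");
}

// CRLF collapses to one newline, but LFCR is not a pair in this
// scheme, so the LF stays and the lone CR becomes a second LF.
var fromCRLF = normalizeNewlines("a\r\nb"); // "a\nb"
var fromLFCR = normalizeNewlines("a\n\rb"); // "a\n\nb"
```

With this approach LFCR always ends up as two newlines, so it would not
match a user agent that treats \n\r as a single newline.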

As a side note, when checking how newlines are stored in JS, I usually
do alert(encodeURIComponent(element.nodeValue)) for example, so I
can see for sure what newline characters are present.
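
A small standalone example of that check, using a plain string instead
of a DOM node (the variable names and the sample string are just
placeholders I picked):

```javascript
// Sample value with LF followed by CR, as in the \n\r case above.
var stored = "one\n\rtwo";

// encodeURIComponent percent-encodes LF as %0A and CR as %0D,
// which makes the exact newline characters visible.
var encoded = encodeURIComponent(stored);
// encoded is "one%0A%0Dtwo"
```

If the string had been normalized somewhere along the way, the %0D
would be missing or replaced in the encoded output.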

-- 
Michael

Received on Friday, 8 June 2007 11:53:54 UTC