[whatwg] CR "entities" and LFCR

On 6/7/07, Anne van Kesteren <annevk at opera.com> wrote:
> These should be converted to LF too. One thing that might be interesting
> to look into is the handling of LFCR in browsers (as opposed to CRLF). I
> haven't done that yet... Some browsers (just tested Opera) also normalize
> two newline entities following each other (CRLF pair).

Not sure if it'll help, but whenever I do newline normalization to LF, I:

1. Convert all CR + LF pairs to LF.
2. Convert any CRs left over to LF.

Examples:

LF + CR + LF + CR -> LF + LF + LF.

CR + CR + LF -> LF + LF.
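
The two passes above can be sketched in a few lines of Python. This is only an illustration: the function name normalize_newlines is made up for this example, and the asserts are the two examples just given.

```python
def normalize_newlines(text: str) -> str:
    """Normalize newlines to LF in two passes (illustrative helper)."""
    # Pass 1: collapse every CR + LF pair into a single LF.
    text = text.replace("\r\n", "\n")
    # Pass 2: any CR still present is a lone CR; turn it into LF.
    return text.replace("\r", "\n")


# LF + CR + LF + CR -> LF + LF + LF
assert normalize_newlines("\n\r\n\r") == "\n\n\n"
# CR + CR + LF -> LF + LF
assert normalize_newlines("\r\r\n") == "\n\n"
```

The order of the passes matters: replacing lone CRs first would turn each CR + LF pair into LF + LF and double those newlines.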

Anyway,

In the case of

<!DOCTYPE html><html><head><title></title></head><body><div>1&#10;&#13;2</div></body></html>

Opera produces LF + CR in the DOM for the nodeValue of the div's text node.

Firefox produces LF + LF (what I'd expect).

IE6 produces a space. (If the div contains only those two entities,
without the 1 and the 2, IE6 throws the newlines away and the div has
no childNodes.)

FF's way seems right IMO.
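
For what it's worth, here is a rough Python sketch of why LF + LF is the result I'd expect for that div. It assumes the character references are decoded first and the two-pass normalization is applied afterwards. It says nothing about how any browser actually implements this, and normalize_newlines is a made-up helper name. html.unescape is the standard-library function for decoding character references.

```python
import html


def normalize_newlines(text: str) -> str:
    """Two-pass normalization to LF (illustrative helper)."""
    # CR + LF pairs first, then any CRs left over.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Text content of the div from the test document, as written in the markup.
raw = "1&#10;&#13;2"

# Decoding the references gives LF + CR between the digits
# (what Opera exposes in the DOM).
decoded = html.unescape(raw)
assert decoded == "1\n\r2"

# Normalizing after decoding gives LF + LF (what Firefox exposes).
assert normalize_newlines(decoded) == "1\n\n2"
```

LF + CR is not a CR + LF pair, so the first pass leaves it alone and the second pass converts the lone CR, which is how two newlines end up in the result.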

-- 
Michael

Received on Thursday, 7 June 2007 14:12:38 UTC