[whatwg] document.write("\r"): the spec doesn't say how to handle it. from Henri Sivonen on 2011-12-19 (public-whatwg-archive@w3.org from December 2011)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Mon, 19 Dec 2011 13:28:13 +0200
Message-ID: <CAJQvAucQyrpu0zjJW1O=cbbSf_GP88EONs_5TE3CK_R_WDjvwQ@mail.gmail.com>

On Wed, Dec 14, 2011 at 2:00 AM, Ian Hickson <ian at hixie.ch> wrote:
> I can remove the text "one at a time", if you like. Would that be
> satisfactory? Or I guess I could change the spec to say that the parser
> should process the characters, rather than the tokenizer, since really
> it's the whole shebang that needs to be involved (stream preprocessor and
> everything). Any opinions on what the right text is here?

I'd like the CRLF preprocessing to be defined as an eager stateful
operation so that there's one bit of state: "last was CR". Then, input
is handled as follows:
If the input character is CR, set "last was CR" to true and emit LF.
If the input character is LF and "last was CR" is true, don't emit
anything and set "last was CR" to false.
If the input character is LF and "last was CR" is false, emit LF.
Else set "last was CR" to false and emit the input character.
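A minimal sketch of the four rules above, assuming the input is already a sequence of characters. The function name `preprocess_crlf` is hypothetical, and the returned string stands in for whatever would actually be fed to the tokenizer.

```python
CR = "\r"
LF = "\n"


def preprocess_crlf(chars):
    """Apply the CR/LF rules using a single "last was CR" flag.

    Hypothetical helper: returns the characters that would be
    emitted to the tokenizer, joined into a string.
    """
    last_was_cr = False
    out = []
    for c in chars:
        if c == CR:
            # A CR is always turned into an LF right away.
            last_was_cr = True
            out.append(LF)
        elif c == LF:
            if last_was_cr:
                # LF directly after CR: the pair was already emitted as one LF.
                last_was_cr = False
            else:
                # Lone LF passes through unchanged.
                out.append(LF)
        else:
            # Any other character clears the flag and passes through.
            last_was_cr = False
            out.append(c)
    return "".join(out)


print(repr(preprocess_crlf("a\r\nb\rc\n\r\r\nd")))  # → 'a\nb\nc\n\n\nd'
```

CRLF, lone CR, and lone LF all come out as a single LF, while a CR followed by another CR yields two LFs, because only an LF immediately after a CR is swallowed.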

Here, "emit" means feeding the character into the tokenizer. By "eager", I
mean that the operation described above doesn't buffer. That is, the first
case emits an LF as soon as it sees a CR, without waiting to see whether an
LF follows in the input.
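Eagerness matters when input arrives in pieces, as with consecutive document.write() calls. In that case the "last was CR" flag has to survive between pieces, but the CR itself is never held back. A minimal sketch, assuming each piece arrives as a string chunk; the class `CrLfPreprocessor` and its `feed` method are hypothetical names used only for illustration.

```python
class CrLfPreprocessor:
    """Hypothetical streaming form of the CR/LF rules.

    The "last was CR" flag persists across feed() calls, and nothing
    is buffered: every call returns its output immediately.
    """

    def __init__(self):
        self.last_was_cr = False

    def feed(self, chunk):
        out = []
        for c in chunk:
            if c == "\r":
                self.last_was_cr = True
                out.append("\n")  # emitted now, not held back waiting for an LF
            elif c == "\n" and self.last_was_cr:
                self.last_was_cr = False  # second half of a CRLF pair: drop it
            else:
                # Lone LF or any other character: clear the flag, pass it through.
                self.last_was_cr = False
                out.append(c)
        return "".join(out)


p = CrLfPreprocessor()
first = p.feed("x\r")   # the LF for the CR reaches the tokenizer immediately
second = p.feed("\ny")  # the LF completing the CRLF pair is dropped
print(repr(first), repr(second))  # → 'x\n' 'y'
```

The tokenizer sees the line break from the first chunk right away, and a CRLF split across two chunks still produces exactly one LF in total.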

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/

Received on Monday, 19 December 2011 03:28:13 UTC