- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Mon, 19 Dec 2011 13:28:13 +0200
On Wed, Dec 14, 2011 at 2:00 AM, Ian Hickson <ian at hixie.ch> wrote:
> I can remove the text "one at a time", if you like. Would that be
> satisfactory? Or I guess I could change the spec to say that the parser
> should process the characters, rather than the tokenizer, since really
> it's the whole shebang that needs to be involved (stream preprocessor and
> everything). Any opinions on what the right text is here?

I'd like the CRLF preprocessing to be defined as an eager stateful operation so that there's one bit of state: "last was CR".

Then, input is handled as follows:

If the input character is CR, set "last was CR" to true and emit LF.

If the input character is LF and "last was CR" is true, don't emit anything and set "last was CR" to false.

If the input character is LF and "last was CR" is false, emit LF.

Else set "last was CR" to false and emit the input character.

Where "emit" feeds into the tokenizer.

By "eager", I mean that the operation described above doesn't buffer. I.e. the first case emits an LF upon seeing a CR without waiting for an LF also to appear in the input.

-- 
Henri Sivonen
hsivonen at iki.fi
http://hsivonen.iki.fi/
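
For illustration, the four cases above can be sketched as a small state machine. This is only a minimal sketch in Python, not text from the spec or code from any real parser: the names `NewlineNormalizer`, `feed` and `normalize` are hypothetical, and `emit` is assumed to be whatever callback feeds the tokenizer.

```python
from typing import Callable, List

CR = "\r"
LF = "\n"


class NewlineNormalizer:
    """Eager CR/LF normalizer with one bit of state (hypothetical name)."""

    def __init__(self, emit: Callable[[str], None]) -> None:
        self.emit = emit          # assumed callback that feeds the tokenizer
        self.last_was_cr = False  # the single bit of state: "last was CR"

    def feed(self, ch: str) -> None:
        """Process one input character, emitting zero or one characters."""
        if ch == CR:
            # Emit LF right away; do not wait to see whether an LF follows.
            self.last_was_cr = True
            self.emit(LF)
        elif ch == LF:
            if self.last_was_cr:
                # Second half of a CRLF pair: its LF was already emitted.
                self.last_was_cr = False
            else:
                self.emit(LF)
        else:
            self.last_was_cr = False
            self.emit(ch)


def normalize(text: str) -> str:
    """Convenience wrapper (hypothetical): run a whole string through feed()."""
    out: List[str] = []
    normalizer = NewlineNormalizer(out.append)
    for ch in text:
        normalizer.feed(ch)
    return "".join(out)
```

Because the only state is the flag, nothing is buffered: a CR that arrives at the end of one network chunk produces its LF immediately, and an LF at the start of the next chunk is then dropped.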
Received on Monday, 19 December 2011 03:28:13 UTC