[whatwg] document.write("\r"): the spec doesn't say how to handle it. from Ian Hickson on 2011-12-14 (public-whatwg-archive@w3.org from December 2011)

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 14 Dec 2011 00:00:26 +0000 (UTC)
Message-ID: <Pine.LNX.4.64.1112132356370.18028@ps20323.dreamhostps.com>

On Wed, 2 Nov 2011, David Flanagan wrote:
>
> The spec for document.write()
> http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#dom-document-write
> says: "... have the tokenizer process the characters that were inserted, one
> at a time, processing resulting tokens as they are emitted, and stopping when
> the tokenizer reaches the insertion point..."
> 
> But what happens if the last character written by document.write() is a
> carriage return?
> 
> The HTML parsing spec says that a CR followed by LF is ignored, but a CR
> followed by anything else is converted to LF. So if the last character
> is a CR, the tokenizer can't process all characters up to the
> insertion point, because it needs to look ahead at the next character,
> right?

I can remove the text "one at a time", if you like. Would that be 
satisfactory? Or I guess I could change the spec to say that the parser 
should process the characters, rather than the tokenizer, since really 
it's the whole shebang that needs to be involved (stream preprocessor and 
everything). Any opinions on what the right text is here?
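
As a rough illustration of the lookahead question, here is a minimal Python sketch of a stream preprocessor that normalizes newlines across separate writes. The class name NewlinePreprocessor and its write() method are hypothetical and do not come from the spec. The strategy shown (emit LF for a CR right away, then swallow an LF if one turns out to follow) is an assumption, one possible way to avoid waiting for the next character, not what the spec mandates.

```python
class NewlinePreprocessor:
    """Hypothetical sketch: normalize CR and CRLF to LF across chunked input."""

    def __init__(self):
        # True when the last character seen was a CR (its LF is already emitted).
        self.pending_cr = False

    def write(self, chunk: str) -> str:
        out = []
        for ch in chunk:
            if self.pending_cr:
                self.pending_cr = False
                if ch == "\n":
                    # This LF completes a CRLF pair whose LF was already emitted.
                    continue
            if ch == "\r":
                # Emit LF immediately; no lookahead at the next character needed.
                out.append("\n")
                self.pending_cr = True
            else:
                out.append(ch)
        return "".join(out)


p = NewlinePreprocessor()
first = p.write("a\r")   # chunk ends in a lone CR
second = p.write("\nb")  # next chunk starts with the matching LF
```

Under this assumed strategy, every character written can be processed up to the insertion point, and a CRLF split across two writes still yields a single LF.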


> Similarly, what should the tokenizer do if document.write() emits half
> of a UTF-16 surrogate pair as its last character?

Can you elaborate on what difficulty this would present?
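
For reference, the case in question can be sketched in Python as a write whose last UTF-16 code unit is a high surrogate (0xD800 to 0xDBFF), with the matching low surrogate arriving in a later write. The function name split_trailing_high_surrogate is hypothetical, and holding the trailing code unit back is only one conceivable behaviour, shown as an assumption rather than something the spec or this thread prescribes.

```python
def split_trailing_high_surrogate(units):
    """Return (complete, held_back) for a list of UTF-16 code units.

    Hypothetical helper: if the last unit is a high surrogate, it is
    separated out so the caller can decide what to do with it.
    """
    if units and 0xD800 <= units[-1] <= 0xDBFF:
        return units[:-1], units[-1:]
    return units, []


# U+1047E is encoded in UTF-16 as the pair 0xD801, 0xDC7E.
# Here the first write ends after the high surrogate only.
complete, held_back = split_trailing_high_surrogate([0x61, 0xD801])
```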

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Tuesday, 13 December 2011 16:00:26 UTC