[whatwg] document.write("\r"): the spec doesn't say how to handle it. from David Flanagan on 2011-11-02 (public-whatwg-archive@w3.org from November 2011)

From: David Flanagan <dflanagan@mozilla.com>
Date: Wed, 02 Nov 2011 16:57:43 -0700
Message-ID: <4EB1D8F7.6050909@mozilla.com>

The spec for document.write() 
http://www.whatwg.org/specs/web-apps/current-work/multipage/elements.html#dom-document-write 
says: "... have the tokenizer process the characters that were inserted, 
one at a time, processing resulting tokens as they are emitted, and 
stopping when the tokenizer reaches the insertion point..."

But what happens if the last character written by document.write() is a 
carriage return?

The HTML parsing spec says that CR followed by LF is ignored but CR 
followed by anything else is converted to LF.  So if the last character 
is CR, then the tokenizer can't process all characters up to the 
insertion point because it needs to look ahead at the next character, right?

Firefox, Chrome and Safari all seem to do the right thing: wait for the 
next character before tokenizing the CR.  And I think this means that 
the description of document.write() needs to be changed.  (Opera, on the 
other hand, just gets this wrong and emits a CR character.)
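
The "wait for the next character" behavior described above can be 
sketched as a small stateful preprocessor. This is only an illustration, 
not text from the spec or code from any browser: the name 
createNewlineNormalizer and its push()/end() methods are hypothetical, 
and each push() call stands in for one document.write() call.

```javascript
// Hypothetical helper (not from the spec): normalizes CR and CRLF to LF
// across chunk boundaries by holding back a trailing CR.
function createNewlineNormalizer() {
  let pendingCR = false; // true when the last character seen was "\r"

  return {
    // Returns the characters that are safe to hand to the tokenizer now.
    push(chunk) {
      let out = "";
      for (let i = 0; i < chunk.length; i++) {
        const ch = chunk[i];
        if (pendingCR) {
          pendingCR = false;
          out += "\n"; // the held-back CR becomes LF
          if (ch === "\n") continue; // CR LF collapses to a single LF
        }
        if (ch === "\r") {
          pendingCR = true; // defer: the next character decides
        } else {
          out += ch;
        }
      }
      return out;
    },
    // At end of input, a held-back CR is emitted as LF.
    end() {
      const out = pendingCR ? "\n" : "";
      pendingCR = false;
      return out;
    },
  };
}
```

With this sketch, push("a\r") returns only "a", and the CR is resolved by 
whatever arrives next: a following push("\nb") or push("b") both yield 
"\nb". The consequence is the one raised above: after a write ending in 
CR, not every written character has been tokenized yet.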

Similarly, what should the tokenizer do if document.write() emits half 
of a UTF-16 surrogate pair as the last character?
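
To make the surrogate case concrete, here is one possible approach by 
analogy with the CR case: hold back a trailing high surrogate until the 
next write supplies its partner. This is an assumption for illustration 
only; the message does not say what the spec or any browser does here, 
and createSurrogateBuffer is a hypothetical name.

```javascript
// Hypothetical helper: withholds a trailing high surrogate so that a
// pair split across two writes reaches the tokenizer as one character.
function createSurrogateBuffer() {
  let pendingHigh = ""; // a held-back high surrogate, or ""

  return {
    push(chunk) {
      const text = pendingHigh + chunk;
      pendingHigh = "";
      if (text.length === 0) return "";
      const last = text.charCodeAt(text.length - 1);
      // High (lead) surrogates occupy the range 0xD800..0xDBFF.
      if (last >= 0xd800 && last <= 0xdbff) {
        pendingHigh = text[text.length - 1];
        return text.slice(0, -1);
      }
      return text;
    },
  };
}
```

For example, push("a\uD83D") returns "a", and a following 
push("\uDE00b") returns the complete pair U+1F600 followed by "b".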

     David

Received on Wednesday, 2 November 2011 16:57:43 UTC