Re: An HTML language specification

On Nov 21, 2008, at 21:31, Mark Baker wrote:

> On Fri, Nov 21, 2008 at 11:10 AM, Boris Zbarsky <bzbarsky@mit.edu> wrote:
>> (Henri and you being quoted respectively).  Sounds to me like Henri is
>> correct: you have to keep state that includes at least all the bytes
>> you had before and the entire DOM you've built so far.
>
> So?  That's just state.


It's just state in the sense that any process is just memory/register  
state and program counter state.

The actual model in browsers is that parsing and script execution  
happen synchronously on one thread (or as if on one thread). This  
applies not only to HTML with document.write(), but also to HTML  
without document.write() *and* to XML (XHTML and SVG).

Even with XHTML and SVG scripts, the script is executed when the  
script end tag event is reported by the parser to the tree constructor.

In both the HTML and XML cases the following are true (once the  
character encoding has been established confidently):
  1) The only thing that appends to the *byte* stream is the network  
library. (document.write does not write into the byte stream.)
  2) The tokenizer sees exactly one sequence of characters. (That is,  
the tokenizer doesn't go back and reparse another sequence of  
characters at any point.)
  3) An object reference/pointer to the document object obtained at  
the start of the parse is equal to an object reference/pointer to the  
document object obtained at the end or in the middle of the parse.
  4) A script can observe an incomplete document tree before the parse  
has finished.
  5) A script can mutate the document tree before the parse has  
finished.
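
As a rough illustration of points 2 to 5, here is a toy model in Python. It is only a sketch, not how any browser is implemented: the names Node, parse and inline_script are made up for this example, tokens stand in for characters, and the tree is kept flat for brevity.

```python
from typing import Callable, Dict, List


class Node:
    """Hypothetical, minimal stand-in for a DOM node."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: List["Node"] = []


def parse(tokens: List[str], scripts: Dict[str, Callable[[Node], None]]) -> Node:
    """Toy tree constructor: one forward pass, scripts run synchronously."""
    document = Node("#document")  # the same object for the whole parse (point 3)
    for token in tokens:  # each token is seen exactly once (point 2)
        if token in scripts:
            document.children.append(Node("script"))
            # The script runs here, before the next token is consumed.
            scripts[token](document)
        else:
            document.children.append(Node(token))
    return document


observed: List[List[str]] = []
seen_documents: List[Node] = []


def inline_script(document: Node) -> None:
    seen_documents.append(document)
    # Point 4: the script observes a tree that is not yet complete.
    observed.append([child.name for child in document.children])
    # Point 5: the script mutates the tree before the parse has finished.
    document.children.append(Node("inserted-by-script"))


result = parse(["p", "SCRIPT", "div"], {"SCRIPT": inline_script})
final_names = [child.name for child in result.children]
```

In this sketch the script sees only "p" and "script", its inserted node ends up ahead of "div" in the final tree, and the document object it was handed is the very object that parse() returns.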

I posit that the mental model put forward by the spec should  
match the implementation model, and that it would be entirely  
counterproductive to try to explain away the fundamentally synchronous  
relationship of parsing and script execution by coming up with an  
alternative declarative or functional model.

(The case where the parser actually does end up examining different  
character sequences relates to declaring the character encoding late  
in the byte stream using a <meta> element.)
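
To make that exception concrete, here is a small Python sketch. It assumes, purely for illustration, that the parser tentatively decodes as windows-1252 and then finds a late <meta> declaring UTF-8; the sample markup and variable names are invented, and real encoding detection is considerably more involved.

```python
# The bytes on the wire are fixed; only their interpretation changes.
data = '<p>café</p><meta charset="utf-8">'.encode("utf-8")

# First attempt: the tentative encoding yields one character sequence.
tentative = data.decode("windows-1252")

# After the late declaration is seen, the same bytes are decoded again
# from the start, which yields a different character sequence.
redecoded = data.decode("utf-8")
```

The byte stream is identical in both cases, but the tokenizer would be looking at two different character sequences, which is why the list above is qualified by the encoding having been established confidently.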

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Saturday, 22 November 2008 13:20:48 UTC