- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Mon, 17 Nov 2008 17:56:07 +0200
- To: Ian Hickson <ian@hixie.ch>
- Cc: "public-html@w3.org WG" <public-html@w3.org>
On Jun 17, 2008, at 23:58, Ian Hickson wrote:
>> Aside: I find the concept of "insertion point" in a stream to be harder
>> to track than a concept of a stack of pending streams where each
>> document.write() pushes a new stream onto the pending stack.
>
> I don't know if a stack can be equivalent to the insertion point concept.
Right. A stack is insufficient.
> It depends whether you keep track of how much you have tokenised for each
> item in your stack, and whether you can append to an item on the stack.
>
> Consider:
>
> <script>
> document.write("a<script src=b><\/script>c");
> document.write("d");
> </script>...
>
> When the inline script is about to be done executing, the input stream
> looks like:
>
> v v
> ...ript>a<script src=b></script> cd ...
>                                  ^ ^
>                                  T I
>
> ...where T is the tokeniser's position ("c" is the "next input character")
> and I is the insertion point. However as soon as it is done executing the
> UA will pause for 'b', and if b does a document.write() it'll go where "T"
> is, not where "I" is.
OK. Would the following work?
There's a queue of UTF-16 buffers and keyed placeholders. That is,
there's one queue that contains an interleaving of objects that are
UTF-16 buffers or objects holding a magic key value.
The buffers have a start position that the tokenizer advances. A
buffer can be partially consumed, have its start position advanced
accordingly and be left in the queue for further consumption later.
The normal tokenization process consumes data from the front of the
queue. When a buffer is empty, it is dequeued and the next buffer is
consumed. Objects holding magic key values count as empty buffers for
the purpose of dequeuing.
Exception: There's always at least one buffer object in the queue and
the last buffer is never dequeued. Instead, it is left in the queue
when it is empty.
The network stream always adds data to the last buffer or appends a
new buffer to the queue.
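As a rough illustration of the queue described so far (not Gecko code), here is a minimal sketch. The names Buffer, KeyHolder, ParserQueue, append_from_network and next_char are made up for this example, Python strings stand in for UTF-16 buffers, and the network path is assumed to always append a new buffer rather than extend the last one.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Buffer:
    """A chunk of text with a start position that the tokenizer advances."""
    data: str
    start: int = 0

    def is_empty(self) -> bool:
        return self.start >= len(self.data)


@dataclass
class KeyHolder:
    """Placeholder holding a magic key; counts as an empty buffer."""
    key: object


class ParserQueue:
    def __init__(self) -> None:
        # The parser initializer puts one (empty) buffer in the queue.
        self.items = deque([Buffer("")])

    def append_from_network(self, text: str) -> None:
        # Assumption: always append a new buffer at the end of the queue.
        self.items.append(Buffer(text))

    def next_char(self):
        """Return the next character, or None when no data is available."""
        while True:
            head = self.items[0]
            if isinstance(head, Buffer) and not head.is_empty():
                ch = head.data[head.start]
                head.start += 1  # partially consumed buffers stay queued
                return ch
            if len(self.items) == 1:
                # The last buffer is never dequeued, even when empty.
                return None
            # Empty buffer or key holder: dequeue and move on.
            self.items.popleft()
```

Because only the start position moves, data never has to be copied once it is in a buffer.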
Each document.write call to the parser comes with a magic key value.
The magic key is guaranteed to be the same for all document.write
calls from a given script and different for different scripts within
a document.
On document.write, if there is a pending external script, the queue is
searched for a magic key holder with the same key value as the
document.write call. If there is such an object in the queue, the text
of the document.write call is inserted as a UTF-16 buffer into the
queue immediately before the key holder object. If there's no such
object in the queue, a key holder with the key for this document.write
call is inserted at the front of the queue and then the text is
inserted as a UTF-16 buffer in front of the key holder.
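To make the pending-script case concrete, here is a minimal sketch of that insertion rule. It is an illustration only: the queue is modelled as a plain Python list, and Buffer, KeyHolder and write_while_script_pending are hypothetical names, not anything from Gecko or the spec.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    data: str
    start: int = 0


@dataclass
class KeyHolder:
    key: object


def write_while_script_pending(queue: list, key: object, text: str) -> None:
    """Queue document.write text while an external script is pending."""
    for i, item in enumerate(queue):
        if isinstance(item, KeyHolder) and item.key == key:
            # Holder found: the text goes immediately before it.
            queue.insert(i, Buffer(text))
            return
    # No holder for this key yet: put one at the front of the queue,
    # then put the text in front of the holder.
    queue.insert(0, KeyHolder(key))
    queue.insert(0, Buffer(text))


# Two writes from the same script, ahead of data already queued.
queue = [Buffer("rest of the network data")]
write_while_script_pending(queue, "script-b", "x")
write_while_script_pending(queue, "script-b", "y")
```

Since each later write lands just before the holder, successive writes from one script stay in call order ("x" before "y" here), and all of them precede whatever was already in the queue.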
If there's no pending external script, tokenization of the text
argument is attempted immediately, with parser suspension for event
loop spins disabled. If the tree builder causes the parser to block and
there are untokenized characters in the text argument, the untokenized
tail of the argument is treated as in the previous paragraph.
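The no-pending-script case might be sketched like this. Everything here is hypothetical: tokenize is assumed to be a callback that returns how many characters it consumed before the tree builder blocked the parser, and defer stands for the queue insertion used when an external script is pending.

```python
from typing import Callable


def write_without_pending_script(
    text: str,
    tokenize: Callable[[str], int],
    defer: Callable[[str], None],
) -> None:
    """Tokenize immediately; hand any untokenized tail to the queue."""
    # Assumption: tokenize returns len(text) if the parser never blocked.
    consumed = tokenize(text)
    if consumed < len(text):
        # The parser blocked mid-argument: the tail is queued as if a
        # script had been pending when document.write was called.
        defer(text[consumed:])
```

With the quoted example, tokenizing "a<script src=b><\/script>c" would block after the script end tag, leaving "c" as the tail to be queued.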
Invariant: The last buffer of the queue is always a buffer that was
put in the queue by the parser initializer or by the method that
appends data from the network. The last object in the queue never
holds a magic key value.
(The motivation for not using the same concepts as the spec is that
magic keys are the mechanism Gecko already provides for managing
the context of document.writes, and this queuing mechanism never
requires moving UTF-16 data once it has been written into a buffer.)
--
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Monday, 17 November 2008 15:57:00 UTC