Re: Speculative tokenization and foreign content

Henri Sivonen <hsivonen@iki.fi> wrote on 12/10/2008 06:45:13 PM:
>
> document.write is not the problem here. There's a problem with
> speculating past <svg> ... <style> even when no document.writes occur.
>
> If we arrive at <style> without seeing <svg> or <math> before it, we
> know for sure that the tokenizer goes into the CDATA variant of the data
> state next. However, if we see a <style> start tag after having seen
> <svg> or <math>, we don't (trivially) know if actually performing the
> tree building would have bailed out of foreign content before reaching
> <style>. Therefore, we don't know if the tokenizer should go into the
> CDATA or PCDATA variant of the data state for continued speculation.
>
> The obvious course of action is to stop saving the tokens from that
> point onwards even if still looking for more src values to GET with
> less accuracy, but it would be nice to be able to do better.

Speculative execution of instruction streams on a modern CPU, in the
presence of conditional branch instructions, doesn't mean determining the
correct path with certainty every time; it simply means getting it right
often enough to make a difference.

Even if you can't reliably determine whether you "would have bailed out", you
might be able to do better than the rather pessimistic approach mentioned
above.  Considerably better.  The design of HTML 5 is focused on
robustness, even in the face of errors, and even though those errors are
relatively infrequent.  A simple approximation (<svg> or <math> starts
foreign content; </svg> or </math> stops it) may be right often enough to
make a difference.  You would still have to decide how to handle nesting,
and how to detect that the prediction was incorrect (i.e., once the tree
builder has bailed out even once, it must stop trusting the token stream
at the point where it encounters a <style> tag).
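A minimal sketch of that approximation, assuming a simplified token model of (kind, tag name) pairs rather than a real tokenizer. The function names and the bailout_index input (standing in for whatever signal the real tree builder would give when it bails out of foreign content) are hypothetical; nesting is handled with a plain depth counter.

```python
from typing import List, Optional, Tuple

# Simplified token model (an assumption, not a real tokenizer API):
# ("start", name) or ("end", name). Text, attributes, and self-closing
# flags are omitted.
Token = Tuple[str, str]

FOREIGN_ROOTS = {"svg", "math"}


def predict_style_states(tokens: List[Token]) -> List[Tuple[int, str, bool]]:
    """For each <style> start tag, guess the tokenizer state that follows it.

    Heuristic: <svg> or <math> starts foreign content, </svg> or </math>
    ends it; a depth counter handles nesting. Returns a list of
    (index, "CDATA" or "PCDATA", is_guess) triples.
    """
    depth = 0             # predicted nesting depth of foreign content
    seen_foreign = False  # once true, every <style> state is only a guess
    predictions: List[Tuple[int, str, bool]] = []
    for i, (kind, name) in enumerate(tokens):
        if name in FOREIGN_ROOTS:
            if kind == "start":
                depth += 1
                seen_foreign = True
            elif depth > 0:
                # Stray end tags at depth 0 are ignored.
                depth -= 1
        elif kind == "start" and name == "style":
            # In (predicted) foreign content, <style> is an ordinary element,
            # so what follows it would be tokenized as PCDATA.
            state = "PCDATA" if depth > 0 else "CDATA"
            # The state is certain only if no <svg>/<math> was seen before.
            predictions.append((i, state, seen_foreign))
    return predictions


def trusted_prefix(tokens: List[Token], bailout_index: Optional[int]) -> int:
    """Return how many speculative tokens can be kept.

    bailout_index (hypothetical) is where the real tree builder bailed out
    of foreign content, or None if it never did. After a bail-out, stop
    trusting the stream at the first guessed <style> following it.
    """
    if bailout_index is None:
        return len(tokens)
    for i, _state, is_guess in predict_style_states(tokens):
        if i > bailout_index and is_guess:
            # Keep the <style> tag itself; only what follows depends on
            # the possibly wrong CDATA/PCDATA guess.
            return i + 1
    return len(tokens)
```

Treating </svg> and </math> interchangeably keeps the counter trivial; a closer approximation would keep a stack of open foreign roots, at the cost of a little more bookkeeping per tag.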

> --
> Henri Sivonen
> hsivonen@iki.fi
> http://hsivonen.iki.fi/

- Sam Ruby

Received on Thursday, 11 December 2008 11:08:24 UTC