
Re: Speculative tokenization and foreign content

From: Sam Ruby <rubys@us.ibm.com>
Date: Thu, 11 Dec 2008 05:39:05 -0500
To: Henri Sivonen <hsivonen@iki.fi>
Cc: HTML WG <public-html@w3.org>
Message-ID: <OF03AFCFB8.F1E1B6F3-ON8525751C.003937F4-8525751C.003A82B4@us.ibm.com>

Henri Sivonen <hsivonen@iki.fi> wrote on 12/10/2008 06:45:13 PM:
>
> document.write is not the problem here. There's a problem with
> speculating past <svg> ... <style> even when no document.writes occur.
>
> If we arrive at <style> without seeing <svg> or <math> before it, we
> know for sure that the tokenizer goes into CDATA variant of the data
> state next. However, if we see a <style> start tag after having seen
> <svg> or <math>, we don't (trivially) know if actually performing the
> tree building would have bailed out of foreign content before reaching
> <style>. Therefore, we don't know if the tokenizer should go into the
> CDATA or PCDATA variant of the data state for continued speculation.
>
> The obvious course of action is to stop saving the tokens from that
> point onwards even if still looking for more src values to GET with
> less accuracy, but it would be nice to be able to do better.

Speculative evaluation of an instruction stream on a modern CPU, in the
presence of conditional branch instructions, doesn't mean determining the
correct path with certainty every time; it simply means getting it right
often enough to make a difference.

Even if you can't reliably determine if you "would have bailed out", you
might be able to do better than the rather pessimistic approach mentioned
above.  Considerably better.  The design of HTML 5 is focused on
robustness, even in the face of errors, and even if those errors are
relatively infrequent.  A simple approximation (<svg> or <math> starts
foreign content; </svg> or </math> ends it) may be right often enough to
make a difference.  You would still have to decide how to handle nesting,
and how to detect that a prediction was incorrect (i.e., once the tree
builder has bailed out of foreign content even once, the speculation must
stop trusting the token stream from the point where it encounters a
<style> tag).
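
To make the approximation concrete, here is a rough Python sketch.  It is
not taken from any real parser: the names predict_style_states,
FOREIGN_ROOTS, TAG_RE and STYLE_END_RE are made up for illustration, and
the regex-based scan is only a stand-in for a real tokenizer (it ignores
comments, quoted attribute values containing ">", and everything else the
tree builder does).  It counts nesting depth for <svg>/<math> and guesses
which variant of the data state follows each <style> start tag.

```python
import re

# Hypothetical names throughout; this is a sketch, not a real tokenizer.
FOREIGN_ROOTS = {"svg", "math"}
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")
STYLE_END_RE = re.compile(r"</style\s*>", re.IGNORECASE)


def predict_style_states(markup):
    """Guess 'CDATA' or 'PCDATA' for each <style> start tag in markup.

    Assumption: <svg>/<math> open foreign content and </svg>/</math>
    close it, tracked as a simple nesting depth.
    """
    depth = 0      # how many foreign roots we believe are open
    guesses = []
    pos = 0
    while True:
        match = TAG_RE.search(markup, pos)
        if match is None:
            return guesses
        pos = match.end()
        closing = match.group(1) == "/"
        name = match.group(2).lower()
        self_closing = match.group(0).endswith("/>")

        if name in FOREIGN_ROOTS:
            if closing:
                depth = max(0, depth - 1)
            elif not self_closing:
                depth += 1
        elif name == "style" and not closing:
            if depth > 0:
                # Predicted to be inside foreign content: keep tokenizing
                # the contents as ordinary markup.
                guesses.append("PCDATA")
            else:
                # Predicted HTML <style>: contents are raw text, so skip
                # ahead to the matching end tag.
                guesses.append("CDATA")
                end = STYLE_END_RE.search(markup, pos)
                if end is None:
                    return guesses
                pos = end.end()


print(predict_style_states("<svg><style>a</style></svg><style>b</style>"))
```

The sketch covers only the prediction half.  Detecting a wrong guess would
still require the real tree builder to report when it left foreign content
at a point the depth counter did not anticipate.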

> --
> Henri Sivonen
> hsivonen@iki.fi
> http://hsivonen.iki.fi/

- Sam Ruby
Received on Thursday, 11 December 2008 11:08:24 UTC
