- From: Rafael Weinstein <rafaelw@google.com>
- Date: Mon, 30 Apr 2012 19:27:12 -0700
- To: "Tab Atkins Jr." <jackalmage@gmail.com>
- Cc: Anne van Kesteren <annevk@opera.com>, public-webapps@w3.org, Ryosuke Niwa <rniwa@webkit.org>, Yehuda Katz <wycats@gmail.com>, Ms2ger <ms2ger@gmail.com>, Henri Sivonen <hsivonen@iki.fi>
On Mon, Apr 30, 2012 at 6:51 PM, Tab Atkins Jr. <jackalmage@gmail.com> wrote:
> On Mon, Apr 30, 2012 at 5:43 PM, Anne van Kesteren <annevk@opera.com> wrote:
>> I personally think it would be better if HTML kept defining all entry points
>> to the HTML parser. And at least conceptually this is a new insertion mode I
>> think contrary to what you suggest in
>> http://lists.w3.org/Archives/Public/public-webapps/2012AprJun/0334.html as
>> only insertion modes handle emitted tokens. And although I guess it does not
>> matter here for now, given that the tree builder can change the behavior of
>> the tokenizer decoupling them seems rather odd to me.
>
> This is simply invoking the fragment parsing algorithm that's already
> defined in DOMParsing, but intelligently supplying a context element.
> There's no need to worry about emitting tokens or anything, except
> insofar as DOMParsing already has to worry about that.

I think Anne's concern is that in order to find the first start tag,
the tokenizer must be used. In this case, the tokenizer would be used
in the absence of a parser.

I'm actually ok with that because the tokenizer is not at risk of
changing states (it starts in the DATA state and stops searching at
the first start tag, so for this use it can't change state), but I
understand the conceptual novelty.

We can put this in the parser spec, but I'm not yet convinced it
deserves a new insertion mode. UAs may implement it that way, so as to
avoid duplicating some tokenization work, but it seems cleaner to
describe it as running the tokenizer to "look ahead to the first start
tag".

>
>> The "Any other * tagName" design also seems somewhat fragile to me. I think
>> those lists need to be explicit and coordinated. We should at least put some
>> checks in place to make sure we are not introducing more overlapping element
>> names in the future.
>
> I'm fine with that, as long as implementations are okay with updating
> their lists of elements as the underlying languages (SVG and MathML)
> change. This *will* potentially cause a behavior difference, as
> elements that previously parsed as HTMLUnknownElement instead parse as
> some specific SVG or MathML element.
>
> ~TJ
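For illustration, the "look ahead to the first start tag" step (scan from the DATA state, skip everything up to the first start tag, then use that tag's name to pick a context element for fragment parsing) might be sketched roughly as below. This is a minimal Python sketch under loudly stated assumptions: the tokenizer scan is heavily simplified (no quoted '>' inside end-tag attributes, no other tokenizer edge cases), and `TABLE_CONTEXT`, `SVG_NAMES`, `MATHML_NAMES` and the tag-to-context mapping are hypothetical, abbreviated stand-ins, not the lists any spec or UA actually uses. The hypothetical `overlapping_names` helper shows the kind of check Anne suggests for catching element names that appear in more than one list.

```python
from typing import Dict, FrozenSet, List, Optional

# Hypothetical, abbreviated lists for illustration only; these are NOT the
# element lists from any spec or browser.
TABLE_CONTEXT = {
    "caption": "table", "colgroup": "table", "thead": "table",
    "tbody": "table", "tfoot": "table", "col": "colgroup",
    "tr": "tbody", "td": "tr", "th": "tr",
}
# The root elements <svg> and <math> are deliberately omitted: they already
# parse correctly with an HTML (body) context.
SVG_NAMES = frozenset({"circle", "rect", "path", "g", "ellipse"})
MATHML_NAMES = frozenset({"mi", "mo", "mn", "mrow", "msup"})

WHITESPACE = "\t\n\f "


def first_start_tag(markup: str) -> Optional[str]:
    """Return the lowercased name of the first start tag, or None.

    Simplified DATA-state scan: skips text, comments, <!...>/<?...> bogus
    comments and end tags. Unlike the real tokenizer it ignores quoted '>'
    inside end-tag attributes and other edge cases.
    """
    i, n = 0, len(markup)
    while i < n:
        lt = markup.find("<", i)
        if lt == -1 or lt + 1 >= n:
            return None
        nxt = markup[lt + 1]
        if nxt.isascii() and nxt.isalpha():
            # Start tag: the name runs until whitespace, '/' or '>'.
            j = lt + 1
            while j < n and markup[j] not in WHITESPACE + "/>":
                j += 1
            return markup[lt + 1:j].lower()
        if markup.startswith("<!--", lt):
            end = markup.find("-->", lt + 4)
            i = n if end == -1 else end + 3
        elif nxt in "!?/":
            # Doctype, bogus comment or end tag: skip past the next '>'.
            end = markup.find(">", lt + 1)
            i = n if end == -1 else end + 1
        else:
            i = lt + 1  # a '<' not followed by a letter, '!', '?' or '/' is text
    return None


def context_for(markup: str) -> str:
    """Pick a (hypothetical) context element name from the first start tag."""
    tag = first_start_tag(markup)
    if tag in TABLE_CONTEXT:
        return TABLE_CONTEXT[tag]
    if tag in SVG_NAMES:
        return "svg"
    if tag in MATHML_NAMES:
        return "math"
    return "body"  # no start tag, or any other tag name


def overlapping_names(**lists: FrozenSet[str]) -> Dict[str, List[str]]:
    """Map every name that appears in more than one list to those lists' labels."""
    seen: Dict[str, List[str]] = {}
    for label, names in lists.items():
        for name in names:
            seen.setdefault(name, []).append(label)
    return {name: labels for name, labels in seen.items() if len(labels) > 1}
```

For example, `context_for("<!-- x --> <TR><td>1</td></TR>")` skips the comment and whitespace, finds `tr`, and returns `"tbody"`. Keeping the lists as explicit sets also makes Tab's caveat concrete: adding a name to `SVG_NAMES` or `MATHML_NAMES` changes the result for markup that starts with that tag, so every list update is an observable behavior change.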
Received on Tuesday, 1 May 2012 02:27:42 UTC