- From: Rafael Weinstein <rafaelw@chromium.org>
- Date: Sat, 5 May 2012 04:12:57 -0700
[This time from the right email]

On Sat, May 5, 2012 at 3:39 AM, Rafael Weinstein <rafaelw at google.com> wrote:
> Let me back up here and say that I'm fine accomplishing the goal in a
> variety of ways. If this way isn't the best, I'm happy to go another
> way -- I'd just like help understanding the reasons why.
>
> On Fri, May 4, 2012 at 3:26 PM, Ian Hickson <ian at hixie.ch> wrote:
>> On Fri, 4 May 2012, Rafael Weinstein wrote:
>>> On Fri, May 4, 2012 at 2:46 PM, Ian Hickson <ian at hixie.ch> wrote:
>>> > On Fri, 4 May 2012, Rafael Weinstein wrote:
>>> >>
>>> >> This is the current proposal:
>>> >>
>>> >> http://lists.w3.org/Archives/Public/public-webapps/2012AprJun/0334.html
>>> >
>>> > I don't really understand the proposal.
>>> >
>>> > How does it relate to the template feature?
>>>
>>> The contents of <template> need to parse context-free (or implied
>>> context, or whatever). This adds the notion to HTML parsing so that
>>> <template> can use it.
>>>
>>> e.g. <template><tr><td>Foo</td></tr></template>
>>
>> I don't understand how this would work in the parser. The parser doesn't
>> have a "context element" concept, that's only for fragment parsing. If you
>> reset the insertion mode in the parser, it uses the stack of open
>> elements, which would always be a <template> element in this case when
>> you parse the <tr>.
>
> It would essentially be nested fragment parsing. As soon as the tree
> construction encounters a <template>, it goes into a nested fragment
> case. Conceptually, it pushes a DocumentFragment onto the stack of
> open elements (leaving the tokenizer in the DATA state), then queues
> tokens which cannot change the state (DOCTYPE, endTag, comment,
> character) until it finds the first start tag. It then sets the context
> element, resets the insertion mode appropriately, processes the
> queued tokens, and continues processing from the input stream.
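
The queueing step described above can be sketched roughly as follows. This is only an illustration, not the WebKit patch or spec text: `tokenize` and `splitAtFirstStartTag` are hypothetical names, and the regex-based tokenizer is a toy stand-in for the real HTML tokenizer (it ignores DOCTYPE tokens, attributes, and most edge cases).

```javascript
// Toy tokenizer (an assumption for illustration only): produces comment,
// endTag, startTag and character tokens. A stray "<" that starts none of
// these is silently skipped, which the real tokenizer would not do.
function tokenize(markup) {
  const tokens = [];
  const re = /<!--([\s\S]*?)-->|<\/([a-zA-Z][^\s>]*)\s*>|<([a-zA-Z][^\s/>]*)[^>]*>|([^<]+)/g;
  let m;
  while ((m = re.exec(markup)) !== null) {
    if (m[1] !== undefined) {
      tokens.push({ type: 'comment', data: m[1] });
    } else if (m[2] !== undefined) {
      tokens.push({ type: 'endTag', name: m[2].toLowerCase() });
    } else if (m[3] !== undefined) {
      tokens.push({ type: 'startTag', name: m[3].toLowerCase() });
    } else {
      tokens.push({ type: 'character', data: m[4] });
    }
  }
  return tokens;
}

// Queue every token that precedes the first start tag. Once the start tag
// is known, a caller could choose a context element from its name, then
// replay `queued` followed by `rest` through tree construction.
function splitAtFirstStartTag(tokens) {
  const queued = [];
  for (let i = 0; i < tokens.length; i++) {
    if (tokens[i].type === 'startTag') {
      return { queued, firstStartTag: tokens[i], rest: tokens.slice(i) };
    }
    queued.push(tokens[i]);
  }
  // End of input reached without seeing any start tag.
  return { queued, firstStartTag: null, rest: [] };
}

const split = splitAtFirstStartTag(tokenize('bla bla <caption> bla'));
console.log(split.queued.length, split.firstStartTag.name);
```

Note that the queue is unbounded in this sketch, so a very long run of leading text is held in memory until the first start tag or the end of input arrives.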
>
> That said, I was intending to focus on DocumentFragment.innerHTML as a
> first step because I think the <template> element is more complicated
> and less certain, so it kind of confuses the issue. I feel confident that
> whatever solution we come up with for this will work for the
> <template> element and the other issues with <template> element are
> orthogonal.
>
>>
>>
>>> > What does it do in the case of:
>>> >
>>> >   var frag = document.createDocumentFragment();
>>> >   frag.innerHTML = 'bla bla .. 1GB of text .. bla <caption> bla'
>>>
>>> Queue up pending tokens until you see the first start tag token or the
>>> end of file. The WebKit implementation is here:
>>>
>>> https://bugs.webkit.org/attachment.cgi?id=140125&action=review
>>
>> So:
>>
>>   frag.innerHTML = 'bla bla .. 1GB of text .. bla <caption> bla';
>>
>> ...results in a document fragment with one node containing " bla", while:
>>
>>   frag.innerHTML = 'bla bla .. 1GB of text .. bla <caqtion> bla';
>>
>> ...results in a document fragment with a 1GB text node, an unknown element
>> <caqtion>, and another text node?
>>
>> That seems pretty weird.
>
> This isn't introducing the weirdness. It's already in the HTML parser.
>
> Show me any solution that uses the HTML parser and I'll show you input
> that produces weird output.
>
>>
>>
>>> > Why do we imply a tbody if the input is "<tr></tr><div></div>"?
>>>
>>> Because there's nothing better to do.
>>
>> I think almost anything else would be better. :-)
>>
>> In particular, I think having the output be a <tr> element and <div>
>> element as siblings would be better, as would having the output be just a
>
> That is what you get. The output is equivalent to:
>
> document.createElement('tbody').innerHTML = "<tr></tr><div></div>";
>
> which is a <tr> with a <div> nextSibling
>
>> <tr> element or just a <div> element.
>>
>>
>>> > Since you need the context element to know how to initialise the
>>> > tokeniser, how do you find the first tag?
>>>
>>> You always start in the DATA state. Can you think of a case where this
>>> won't work?
>>
>> You describe the change as a "mere addition", but it sounds much more
>> invasive than that if you're going to assume a context element and then
>> change it later.
>>
>> It sounds like what you're really proposing is not to change the context
>> element but to have the parser start off in some new mode where we just
>> wait for the first open tag, and then we do some substitution to get a
>> surrogate node, and try to reset based on that surrogate node's name
>> instead of the stack of open elements.
>
> These seem like subjective evaluations. I defer to your judgement,
> but it'd be helpful to understand the objection at a more concrete
> level.
>
> For example, if there is a technical problem with the approach, what
> is it? Does it introduce inflexibility in extending the parser later?
> Honestly, the parser is pretty dense -- I'm just trying to be the
> midwife of a sensible solution here, and I'm not married at all to
> this approach.
>
> As far as how invasive the change is, I'd like to think that the
> WebKit patch above points to its non-invasiveness, in that it changes
> no logic in the tokenizer, and no construction logic -- only adding
> the notion that the tree construction is currently waiting to know
> what the next start tag is before it can continue construction.
>
>>
>> That seems pretty weird to me, but certainly isn't the weirdest thing
>> that's been proposed.
>>
>> Do we have a page or e-mail somewhere that documents all the cases we're
>> trying to support?
>
> The best write-up is Yehuda's initial request:
>
> http://lists.w3.org/Archives/Public/public-webapps/2011OctDec/0663.html
>
> I'm happy to build a complete list of input examples that need to
> produce specific output, if that will be helpful.
>
> Another thing to note that may not be apparent from Yehuda's request
> or the <template> element spec that Dimitri wrote is that, while the
> context element isn't known programmatically at innerHTML or parse
> time, it *is* known by the author of the content. In other words, the
> input markup is always intended to be children of a specific element.
> That's why jQuery implements this exactly the way I'm proposing:
> Generally, the context element will be implied by *all* of the
> top-level start tags -- picking the first one is just a sensible way
> to have deterministic output. I know of no use cases for attempting to
> do something "useful" with input that has a "mixed implied context
> element".
>
>>
>> --
>> Ian Hickson               U+1047E                )\._.,--....,'``.    fL
>> http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
>> Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
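
To make the "first top-level start tag implies the context element" idea from the message concrete, here is a minimal sketch. The names `IMPLIED_CONTEXT`, `DEFAULT_CONTEXT`, and `impliedContextFor` are hypothetical, and the table is an illustrative assumption rather than the mapping used by the proposal, the WebKit patch, or jQuery. In particular, falling back to `body` for unlisted tags is an assumption made here.

```javascript
// Illustrative mapping from a first start tag name to the element it
// implies as its parent. Only a few table-related cases are listed.
const IMPLIED_CONTEXT = new Map([
  ['tr', 'tbody'],
  ['td', 'tr'],
  ['th', 'tr'],
  ['tbody', 'table'],
  ['thead', 'table'],
  ['tfoot', 'table'],
  ['caption', 'table'],
  ['colgroup', 'table'],
  ['col', 'colgroup'],
]);

// Assumed fallback when no start tag is found or the tag is not listed.
const DEFAULT_CONTEXT = 'body';

function impliedContextFor(markup) {
  // Drop comments so a "<tr>" inside one is not mistaken for a start tag.
  const withoutComments = markup.replace(/<!--[\s\S]*?-->/g, '');
  // "<" followed by a letter begins a start tag; "</" does not match.
  const match = /<([a-zA-Z][^\s/>]*)/.exec(withoutComments);
  if (match === null) {
    return DEFAULT_CONTEXT;
  }
  const name = match[1].toLowerCase();
  return IMPLIED_CONTEXT.has(name) ? IMPLIED_CONTEXT.get(name) : DEFAULT_CONTEXT;
}

console.log(impliedContextFor('<tr></tr><div></div>'));   // first tag decides
console.log(impliedContextFor('bla bla <caqtion> bla'));  // unknown tag
```

Only the first start tag is consulted, so markup with a "mixed implied context" such as `<tr></tr><div></div>` still yields a single deterministic answer.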
Received on Saturday, 5 May 2012 04:12:57 UTC