Re: [webcomponents] More backward-compatible templates from Maciej Stachowiak on 2012-11-01 (public-webapps@w3.org from October to December 2012)

From: Maciej Stachowiak <mjs@apple.com>
Date: Thu, 01 Nov 2012 09:29:59 +0100
To: Adam Barth <w3c@adambarth.com>
Cc: Anne van Kesteren <annevk@annevk.nl>, "public-webapps@w3.org WG" <public-webapps@w3.org>
Message-id: <29092FE0-E81B-4565-A7BE-E46D315846F9@apple.com>
On Oct 31, 2012, at 7:45 PM, Adam Barth <w3c@adambarth.com> wrote:

> On Wed, Oct 31, 2012 at 11:27 AM, Anne van Kesteren <annevk@annevk.nl> wrote:
>> On Wed, Oct 31, 2012 at 7:23 PM, Adam Barth <w3c@adambarth.com> wrote:
>>> Then maybe I don't understand how parsing will work.  How does the
>>> parser know when the "Folder i" template stops?  It can't just scan
>>> ahead for <\/script> if we're using <\/script> to terminate the "Email
>>> i" template.  Similarly, it can't just match <script> and <\/script>
>>> tags because then nested templates will parse differently than
>>> top-level templates...
>> 
>> You'd need to special case nested templates. But you need to do that
>> anyway as you're inside a <script> element that normally only emits
>> character data.
> 
> I don't really understand what sort of parsing rules you're imagining.
> If you explain them concretely, I can try to provide problematic
> examples.
> 
> Trying to hijack <script> for this purpose works well if you don't
> need to nest templates.  Once you have nested templates, you either
> end up in an escaping nightmare or you need to start hacking up how
> HTML parsing works inside of <script type=template>.  If you go that
> route, you're in bad shape because these hacks need to work
> consistently with how <script> parses in a normal HTML parser.  As you
> can tell by looking at the number of states that <script> requires in
> the HTML tokenizer, that's not a simple thing.

Hi Adam,

This was a proposal based on a few minutes' discussion, so it does not yet exist in fleshed-out form. But here's an attempt to flesh it out just a little, to demonstrate some key properties.

First, I think I should clarify the "escaping" of script close tags. This should be thought of not as a special escape character, but just as an alternate way to spell "</script>" that non-template-aware browsers will not interpret as closing the script tag. To avoid confusion due to the escapey feeling of the backslash, let's call it "</+script>" for purposes of this post, but really it could be anything as long as it doesn't contain "</script>" as a literal substring, and does not match any real close or open tag.
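A minimal sketch of why the alternate spelling matters. It models, very roughly, where a non-template-aware parser would end a <script> element. The function name `legacy_script_text` and the sample markup are hypothetical, and the regular expression is a simplification that ignores the "<!--" escape states of the real HTML tokenizer.

```python
import re

# Simplified stand-in for how a legacy tokenizer ends script data:
# the first "</script" followed by whitespace, "/" or ">".
# Assumption: the "<!--" escape states of the real tokenizer are ignored.
SCRIPT_END = re.compile(r"</script[\s/>]", re.IGNORECASE)


def legacy_script_text(html_after_open_tag: str) -> str:
    """Return the text a non-template-aware parser would place inside <script>."""
    match = SCRIPT_END.search(html_after_open_tag)
    if match is None:
        return html_after_open_tag
    return html_after_open_tag[:match.start()]


# Hypothetical outer template body containing one nested template.
outer_body = (
    '<div class="folder">'
    "<script template><span>Email</span></+script>"
    "</div>"
)
# What follows the outer <script template> open tag in the page source.
page = outer_body + "</script><p>after</p>"

# The nested close tag is not mistaken for the end of the outer element.
print(legacy_script_text(page) == outer_body)
```

Because "</+script>" never contains "</script>" as a substring, the legacy parser skips over the nested close tag and stops only at the real one, so the outer element's text is the whole outer template.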

Given this assumption, here is a whack at some rules:

(1) The normal HTML parser is completely unchanged.

(2) The script element gains a new IDL attribute:

HTMLScriptElement {
    DocumentFragment template;
}

(3) When the template attribute is accessed, perform the following steps:
    (a) If the template IDL attribute of this script has been accessed previously, and neither the textContent of the <script> nor the presence of its "template" DOM attribute has changed:
        (a.i) Return the script element's [cached template return value].
        (a.ii) Terminate these steps.
    (b) Else, if the script does not have a DOM attribute named "template":
        (b.i) Set the script element's [cached template return value] to null.
        (b.ii) Return the script element's [cached template return value].
        (b.iii) Terminate these steps.
    (c) Else (in this case, due to (b), the script must have a "template" attribute):
        (c.i) Parse the textContent of the script using the template fragment parser, and set the resulting DOM fragment to be the script element's [cached template return value].
        (c.ii) Return the script element's [cached template return value].
        (c.iii) Terminate these steps.
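
The caching behaviour in step (3) can be sketched roughly as follows. This is an illustration only, not browser code: the `ScriptElement` class, its field names, and the injected `parse_fragment` callable are all hypothetical, and "has changed" is read here as "differs from the state seen at the previous access".

```python
from typing import Any, Callable, Optional, Tuple


class ScriptElement:
    """Toy model of a <script> element with a cached `template` getter."""

    def __init__(self, text_content: str, has_template_attr: bool,
                 parse_fragment: Callable[[str], Any]) -> None:
        self.text_content = text_content
        self.has_template_attr = has_template_attr
        self._parse_fragment = parse_fragment  # stand-in for the template fragment parser
        self._cache_key: Optional[Tuple[str, bool]] = None  # None means never accessed
        self._cached: Any = None  # the [cached template return value]

    @property
    def template(self) -> Any:
        key = (self.text_content, self.has_template_attr)
        if self._cache_key == key:
            # Step (a): accessed before and nothing relevant changed.
            return self._cached
        if not self.has_template_attr:
            # Step (b): no "template" DOM attribute, so the value is null.
            self._cached = None
        else:
            # Step (c): parse the text and cache the resulting fragment.
            self._cached = self._parse_fragment(self.text_content)
        self._cache_key = key
        return self._cached


# Usage with a fake parser that records how often it is invoked.
calls = []


def fake_parser(text: str) -> list:
    calls.append(text)
    return ["fragment", text]


element = ScriptElement("<b>hi</b>", True, fake_parser)
first = element.template
second = element.template  # served from the cache; the parser is not called again
```

Keying the cache on both the text and the attribute's presence means that removing the "template" attribute, or editing the script's text, forces the next access to recompute the value.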

(4) The template fragment parser operates like the HTML fragment parser, but with the following differences:
    (a) It parses as if for an XMLHttpRequest document response - scripting is disabled and referenced resources are not loaded.
    (b) A close tag named "+script" is treated as if it were a close tag named "script".
    (c) When an open tag named "script" with a markup attribute named "template" is encountered, then instead of parsing the contents as a script, perform the following steps:
        (i) Start parsing with the nested template fragment parser until it [returns to the parent template fragment parser].
        (ii) Set the newly created script element's [cached template return value] to the results of step (i).
        (iii) Continue parsing the fragment where the nested template fragment parser left off.

(5) The nested template fragment parser operates like the template fragment parser, but with the following additional difference:
    (a) When a close tag named "+script" is encountered which does not match any currently open script tag:
        (a.i) Consume the token for the close tag named "+script".
        (a.ii) Create a DocumentFragment containing the parsed contents of the fragment.
        (a.iii) [return to the parent template fragment parser] with the result of step (a.ii); the parent parser resumes after the "+script" close tag.
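
The hand-off between the parent and nested parsers in steps (4) and (5) can be sketched as a small recursive function. This is a toy under loud assumptions: it recognises only `<script ... template ...>` open tags and `</+script>` close tags, treats everything else as plain text instead of building real DOM nodes, uses a naive check for the "template" attribute, and silently drops a stray `</+script>` at the top level. The names `TemplateScript` and `parse_fragment` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple, Union

# Toy tokenizer: only template-script open tags and "+script" close tags.
# Assumption: any tag text containing the word "template" counts as having the attribute.
TOKEN = re.compile(
    r"(?P<open><script\b[^>]*\btemplate\b[^>]*>)|(?P<close></\+script\s*>)",
    re.IGNORECASE,
)


@dataclass
class TemplateScript:
    """Stands in for a nested <script template> and its cached fragment."""
    cached_template: List["Node"]


Node = Union[str, TemplateScript]


def parse_fragment(text: str, pos: int = 0,
                   nested: bool = False) -> Tuple[List[Node], int]:
    """Return (nodes, position where parsing stopped)."""
    nodes: List[Node] = []
    while True:
        match = TOKEN.search(text, pos)
        if match is None:
            if text[pos:]:
                nodes.append(text[pos:])
            return nodes, len(text)
        if match.start() > pos:
            nodes.append(text[pos:match.start()])
        pos = match.end()
        if match.group("open"):
            # Step (4)(c): hand off to the nested parser, then resume where it stopped.
            inner, pos = parse_fragment(text, pos, nested=True)
            nodes.append(TemplateScript(inner))
        elif nested:
            # Step (5)(a): the close tag is consumed and control returns to the parent.
            return nodes, pos
        # Otherwise: a stray close tag at the top level is dropped in this toy.


source = ("Folder <script template>Email <script template>Attachment"
          "</+script> end</+script> done")
tree, stopped_at = parse_fragment(source)
```

Each nesting level consumes exactly one `</+script>`, so a single spelling of the close tag works at any depth.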


This is pretty rough and I'm sure I got some details wrong. But I believe it demonstrates the following properties:
(A) Can be implemented without changes to the core HTML parser (though you'd need to introduce a new fragment parsing mode).
(B) Allows for perfect fidelity polyfills, because it will manifestly end the template in the same place that an unaware browser would close the <script> element.
(C) Does not require multiple levels of escaping.
(D) Can be implemented with near-identical behavior for XHTML, except that you'd need an XML fragment parser.

I hope this clarifies the proposal.

Notes: 
- Just because it's described this way doesn't mean it has to be implemented this way - implementations could do template parsing in a single pass with HTML parsing if desired. I wrote it this way mainly to demonstrate the desired properties.
- It would also be possible to require nested scripts to be named <+script> or whatever for consistency with the close tag.

Regards,
Maciej
Received on Thursday, 1 November 2012 08:30:43 UTC