Re: Working Group Decision on ISSUE-100 srcdoc from Andrew Fedoniouk on 2010-10-15 (public-html@w3.org from October 2010)

From: Andrew Fedoniouk <andrew.fedoniouk@live.com>
Date: Thu, 14 Oct 2010 21:18:07 -0700
To: "Julian Reschke" <julian.reschke@gmx.de>, "Tab Atkins Jr." <jackalmage@gmail.com>
Cc: "HTML WG" <public-html@w3.org>
Message-ID: <BAY141-DS34415F931186B33290275F8570@phx.gbl>

--------------------------------------------------
From: "Julian Reschke" <julian.reschke@gmx.de>
Sent: Thursday, October 14, 2010 9:30 AM
To: "Tab Atkins Jr." <jackalmage@gmail.com>
Cc: "Andrew Fedoniouk" <news@terrainformatica.com>; "HTML WG" 
<public-html@w3.org>
Subject: Re: Working Group Decision on ISSUE-100 srcdoc

> On 14.10.2010 18:23, Tab Atkins Jr. wrote:
>> ...
>> No, it does not.  This has exactly the same issues as the most naive
>> solution to the problem, a <sandbox> element.  Namely, the content you
>> include inside the script can have an unmatched </script>, breaking it
>> out of the sandbox and letting anything that follows be treated as
>> part of the normal page.  Arbitrary XSS follows in the obvious way.
>>
>> The discussion surrounding this issue went over this in depth, and my
>> Change Proposal quickly summarized the issues around it and several
>> similar solutions.  I suggest reading my Change Proposal first before
>> making further suggestions, as it is very likely that your idea has
>> already been discussed and found wanting.  This is a hard area where
>> the solutions are pulled in several different directions.
>> ...
>
> Well.
>
> Putting the user-supplied text into @srcdoc requires escaping. Putting it 
> into an element requires escaping as well, but once you understand you 
> need to escape, where's the big difference?
>

It is technically feasible to parse the content of <script type="text/html">
without any escaping at all. The only principal exception is the
<plaintext> element.

But with srcdoc we will need escaping in any case.

For example, the following requires a crazy amount of escaping
(even recursive escaping, sic):

<iframe srcdoc="<html><style>body[mode="edit"]{}</style>
   <script>var a="<a href='javascript:click(\"ok\")' ";</script>
   <body><iframe srcdoc='....'></body></html>">

In any case this will make the markup no longer human-readable
or comprehensible.
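
A minimal sketch of the escaping involved, assuming the document is placed
in a double-quoted srcdoc attribute so that "&" and the double quote must
be escaped. The helper name escapeForSrcdoc and the sample markup are
illustrative only and do not come from any proposal text. Nesting one
srcdoc inside another applies the escaping twice, which is the recursive
case mentioned above.

```javascript
// Hypothetical helper: escape a document so it can sit inside a
// double-quoted srcdoc="..." attribute.
function escapeForSrcdoc(html) {
  // Ampersands first, otherwise the "&" of "&quot;" would be escaped again.
  return html.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
}

// Innermost document, then a document embedding it, then the host markup.
const inner = '<p class="note">inner</p>';
const outer =
  '<body><iframe srcdoc="' + escapeForSrcdoc(inner) + '"></iframe></body>';
const page = '<iframe srcdoc="' + escapeForSrcdoc(outer) + '"></iframe>';

// The quotes of the innermost document are now escaped twice.
console.log(page);
```

Each extra level of nesting adds another "amp;" to every escaped
character, which is why the result quickly stops being readable.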

And another consideration: imagine that you need two <iframe>s on the
page that load the same source document. You will need to repeat
the same markup twice. That is actually a problem with data URLs too.

Actually, we already have an inclusion mechanism in HTML that allows
fragments of data to be included in the "host" markup: the <style> and
<script> elements. <script> is already widely used for the inclusion of
XML data islands. <script type="text/html"> is also used for the inclusion
of, e.g., client-side templates.
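
A simplified sketch of how such a block behaves, assuming a parser that
treats everything up to the first closing </script> tag as raw text. The
helper readScriptBlock is hypothetical, and a real HTML tokenizer has
additional rules (for example around "<!--" inside script content). It
shows both sides of the argument: quotes, ampersands and tags inside the
block need no escaping, but a stray </script> in the content ends the
block early, which is the objection raised in the quoted text.

```javascript
// Hypothetical, simplified model of locating the end of a <script> block.
function readScriptBlock(markup) {
  const open = /<script\b[^>]*>/i.exec(markup);
  if (!open) return null;
  const rest = markup.slice(open.index + open[0].length);
  // The block ends at the first closing tag, whatever the content was.
  const close = /<\/script\s*>/i.exec(rest);
  if (!close) return { content: rest, trailing: "" };
  return {
    content: rest.slice(0, close.index),
    trailing: rest.slice(close.index + close[0].length),
  };
}

const template =
  '<script type="text/html"><p title="a & b">"quoted" <b>text</b></p></script>';
const hostile =
  '<script type="text/html">hi</script><img src=x onerror=alert(1)></script>';

// Content comes back verbatim, no unescaping step required.
console.log(readScriptBlock(template).content);
// Everything after the first </script> falls outside the block.
console.log(readScriptBlock(hostile).trailing);
```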

So it is not clear why we need to invent anything else.

If we think about self-contained HTML documents that do not use
any external resources, then it is possible even now to use something
like this:

<html>
    <head>
         <style> .... </style>
         <script type="text/javascript"> .... </script>
         <script type="image/png" encoding="base64"> ... base64 stuff... </script>
    </head>
    <body>
         Markup.
    </body>
</html>
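
One way a consumer of such a document could use the embedded image block
is to turn its text into a data URL. This is only a sketch under stated
assumptions: browsers do not do this on their own, the encoding="base64"
attribute is this message's illustration and not a standard attribute,
and blockToDataUrl is a hypothetical helper name.

```javascript
// Hypothetical helper: build a data URL from the text content of a block
// such as <script type="image/png" encoding="base64"> ... </script>.
function blockToDataUrl(mimeType, base64Text) {
  // Base64 text in markup is usually line-wrapped and indented;
  // whitespace carries no meaning, so remove it.
  const compact = base64Text.replace(/\s+/g, "");
  return "data:" + mimeType + ";base64," + compact;
}

// Sample input with the kind of wrapping and indentation markup would add.
console.log(blockToDataUrl("image/png", "  iVBORw0K\n  Ggo=\n"));
```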

It means that in principle HTML alone can be used instead of
MIME-encoded email messages, for example:
<html  from='...' to='...'>....</html>
Or as a format for persisting the content of a browser window in a single
file. HTML can be parsed and processed relatively easily.

To be honest, I really do not see why we need markup inside attributes.

-- 
Andrew Fedoniouk

http://terrainformatica.com

Received on Friday, 15 October 2010 04:18:43 UTC