RE: <iframe doc=""> from Ian Hickson on 2010-01-24 (public-html@w3.org from January 2010)

From: Ian Hickson <ian@hixie.ch>
Date: Sun, 24 Jan 2010 11:43:41 +0000 (UTC)
To: "public-html@w3.org WG" <public-html@w3.org>
Message-ID: <Pine.LNX.4.64.1001241054200.6554@ps20323.dreamhostps.com>

I've now added <iframe srcdoc=""> to the spec.

On Wed, 13 Jan 2010, Leonard Rosenthol wrote:
>
> I don't understand how you can assume that the destination of the doc 
> URL is going to be text/HTML?

It's not a URL; it's just embedded content.


> Why couldn't the iFrame be pointing to an SVG image, for example, or a 
> PDF?  Those are also valid (and in the latter case of PDF, quite common) 
> things one would put in an iFrame and wish to refer to...

Sure. That is possible, and supported by the src="" attribute.


On Wed, 13 Jan 2010, Doug Schepers wrote:
> 
> The question still remains... would @doc allow SVG code, for example?

Insofar as text/html allows SVG, yes.

The main use case here is blog comments; people don't generally support 
SVG in blog comments anyway, and I'm not aware of any requests to support 
this, so I don't think we need to support XML-based SVG (or MathML, or 
XForms, or XBL2, or anything else really) explicitly. However, it's 
obviously something we could extend in a future version, e.g. by adding a
"srcdoctype" attribute, if people really wanted this.


On Fri, 15 Jan 2010, Leif Halvard Silli wrote:
> >
> > I'm not sure what to do for the XML variant;
> 
> Maybe Kornel had a solution to several problems at once: replace @doc
> with @body?

I wrestled with this for several days, but in the end I decided that the 
proposal I put into the spec should start with no XML magic, so in XHTML, 
the attribute takes an XML string, with no implied elements or namespace 
prefixes. This was mostly predicated on the following reasoning:

 * XML people tend to not like magic, so if you're using XML, you're 
   likely to want to see everything explicitly anyway, so saving a few 
   bytes by removing the boilerplate doesn't win over many authors, 
   unlike in the text/html case.

 * You don't really save that many bytes anyway after compression, since 
   the repeated boilerplate would compress really well.

 * Not implying anything means that CMSes don't have to cut things out, 
   with the various dangers that that implies -- they can just embed the 
   content directly in. It is presumed that anyone using XML as an output 
   format that contains user-generated content is almost certainly using 
lots and lots of support tools, such that they wouldn't see the
   verbosity anyway.

 * Not having magic going on means that there's no artificial barrier to 
   using non-XHTML vocabularies like SVG.

Given this, I decided to stick with "doc" in the proposal, rather than use 
"body", since in XML it wouldn't be body-specific. (I then added the "src" 
prefix to make it clearer that it was related to "src".)


> >> Would the code inside @doc be validated?
> 
> I meant by validators.

Yes, at least in theory.


> And in that regard: You and Boris discussed the similarity with data 
> URIs. And it was said that quotes inside @doc would have to be escaped. 
> But what kind of escaping?

&quot;
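A minimal sketch of that escaping, assuming the attribute value is wrapped in double quotes: only "&" and the quote character need escaping, and "&" has to be done first so the "&" in each newly introduced "&quot;" is not escaped again. The function name escape_for_srcdoc is hypothetical, not part of any spec or library.

```python
def escape_for_srcdoc(markup: str) -> str:
    """Escape an HTML fragment for use in a double-quoted srcdoc="" value.

    Hypothetical helper: escapes & first, then ", so the & of each
    &quot; we introduce is not itself escaped a second time.
    """
    return markup.replace("&", "&amp;").replace('"', "&quot;")


fragment = '<p title="greeting">Tom & Jerry</p>'
print('<iframe srcdoc="' + escape_for_srcdoc(fragment) + '"></iframe>')
```

Reversing the two replace() calls would turn each &quot; into &amp;quot;, which is the kind of accidental extra escaping level discussed further down the thread.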


On Mon, 18 Jan 2010, Henri Sivonen wrote:
> On Jan 15, 2010, at 04:26, Ian Hickson wrote:
> 
> > * Have doc="" in XML documents (and DOM-created documents that aren't 
> > flagged as an "HTML document") be parsed as XML. This has the 
> > advantage of being unsurprising.
> 
> Is it really unsurprising? I'd extrapolate from innerHTML that it's bad 
> for stuff you can script via the DOM to change behavior depending on the 
> HTMLness bit of the Document object.

I agree that it makes the transition worse, but one would suppose that if
someone is trying to transition from HTML to XHTML, they would want to do 
so wherever they use HTML. Otherwise, what's the point?


> It's bad that you have to revise innerHTML access in the bowels of a JS 
> library if you want to use the library with XHTML.

It's bad that you have to revise your markup throughout the document, 
everywhere in your CMS, in your template files, and in any content encoded 
in your database, too, but that's what transitioning to a different markup 
language _means_. I don't really understand why you would use XHTML if you 
didn't want to use it everywhere.


> As a general principle, I think new stuff that depends on the HTMLness 
> bit shouldn't be introduced.

In general I agree, but when the "new stuff" is whether something is 
parsed as HTML or not, it seems to be a valid exception.


> > * Have doc="" in all documents always be parsed as text/html. This 
> > would mean that you couldn't implement an XML-only HTML UA, which I 
> > think would be unfortunate.
> 
> Shouldn't XML-only UAs fall into the theoretical purity bucket for the 
> purposes of the Design Principles?

Pretty much everything to do with XML in HTML5 falls into the "theoretical 
purity bucket". However, since people are going to walk down that path 
with or without us, it seems best for us to at least contain the possible 
damage by making it all self-contained.


> > * Have some sort of selector, so you could embed HTML in XML and XML 
> > in HTML. It's not clear what the use case for this is, and it has the 
> > same disadvantage as the previous one -- it would mean that 
> > implementations would always be required to implement both text/html 
> > and XML, which we've so far avoided.
> 
> If you have html='...' and xml='...' attributes, you could say that an 
> HTML-only UA isn't required to implement xml='...' and an XHTML-only UA 
> isn't required to implement html='...'.

As theoretical as the problem is, we should still do things the right way. 
It would be bad to allow a situation in which you can have a conforming 
XHTML document that cannot be rendered by a conforming XHTML processor.


> Such non-requirements would be equally impractical as loading HTML in an 
> iframe in an XHTML-only UA or vice versa with the spec as it stands 
> today.

The difference is that loading HTML into an XHTML page in an XHTML-only UA 
is the same as loading DocBook into an HTML page in an HTML-only UA. (Or 
indeed, as loading XHTML into an HTML-only UA.) It's a situation the user 
understands: there's content that the UA doesn't support. But loading an 
XHTML page that relies on an optional-to-implement feature leads to 
content that -- to the user -- looks like it _should_ work not working.

Anyway, as you say, this is all academic, so let's not spend more time on it.


On Sun, 17 Jan 2010, Lachlan Hunt wrote:
> 
> One big disadvantage with putting markup in attributes, especially for 
> the doc proposal, is that ampersands will often have to be double 
> escaped as &amp;amp;, due to the content of doc effectively being parsed 
> twice - once as the content of the attribute, and then again to parse 
> the string as a document.
> 
> e.g.
> 
> Consider marking up a link containing this URL:
> 
>   ?name=foo&title=bar&sect=1
> 
> By only escaping the ampersands once like this, the following happens:
> 
>   <iframe doc="<a href="?name=foo&amp;title=bar&amp;sect=1">link</a>">
> 
> The &amp; entities are decoded as they are parsed the first time to obtain the
> attribute value.  This results in the following string:
> 
>   "<a href="?name=foo&title=bar&sect=1">link</a>"
> 
> This is then parsed again by a new instance of the HTML parser, which results
> in the first ampersand being flagged as a parse error, and the second being
> interpreted as §.  This is then equivalent to the following:
> 
>   <a href="?name=foo&title=bar§=1">link</a>

Indeed, if you forget to escape the &s in the user input you get 
weirdness. I actually forgot to escape one in the spec, since of course I 
have an example like the above in the markup of the spec, so it has to be 
escaped a third time, leading to an amusing "&amp;amp;amp;" now that I've 
done it right (I think).


> The parse error might be deemed acceptable in text/html because it's 
> non-fatal and ends up with the correct result, even though it would be 
> non-conforming, but the latter misinterpretation would break the link.

Right.


> But for XHTML, it gets worse, because the first ampersand would be 
> fatal. There are also other similar problems that would be caused by 
> using &lt; instead of double escaping it as &amp;lt;.

Yup.

If the content is being generated by a CMS, then the first level of 
escaping will have to be done earlier; it's not like you escape them twice
in a row. In fact, typically the first escapes will be done by hand (if at 
all) by the site user. The second &-escaping would be done by the same 
code as the "-escaping.
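To illustrate the two parsing passes with Lachlan's URL, here is a rough sketch using Python's lenient html.parser.HTMLParser, which decodes character references in attribute values. That parser is not a spec-conformant HTML5 tokenizer, so the sketch only shows the correctly double-escaped case (written with srcdoc="" rather than doc=""); how an unescaped "&sect" inside an attribute is treated is exactly where lenient parsers and browsers can differ. AttrCollector and attr_value are hypothetical helpers.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Records (tag, attribute, decoded value) for every start tag seen."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already decoded character references in each value.
        for name, value in attrs:
            self.found.append((tag, name, value))


def attr_value(markup: str, tag: str, name: str) -> str:
    """Return the first decoded value of tag[name] found in markup."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return next(v for t, n, v in parser.found if t == tag and n == name)


# Outer document as a CMS might emit it: each & in the URL escaped twice.
outer = ('<iframe srcdoc="<a href=&quot;?name=foo&amp;amp;title=bar'
         '&amp;amp;sect=1&quot;>link</a>"></iframe>')

# Pass 1: the outer parser decodes the srcdoc attribute value.
inner = attr_value(outer, "iframe", "srcdoc")
print(inner)

# Pass 2: the srcdoc content is parsed as a document of its own.
href = attr_value(inner, "a", "href")
print(href)
```

After the first pass each &amp;amp; has become &amp;, and only the second pass yields the literal "&" characters the link needs.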


On Mon, 18 Jan 2010, Boris Zbarsky wrote:
> 
> For what it's worth, my personal preference is that the @doc be 
> parsed/loaded asynchronously, just like @src would be.  At least in 
> Gecko's case we could actually pretty much reuse our normal pageload 
> code, with just a small shim to create a "network load" that feeds in 
> the @doc data, to implement that.  I can't speak to other 
> implementations.

I've specced it as a navigation, which is always async. (Except for 
about:blank, but I will probably change that in the coming week anyway.)


On Mon, 18 Jan 2010, Boris Zbarsky wrote:
> > 
> > This is more or less what I had in mind, except that I would blow away 
> > the old browsing context and create a new one, rather than navigating 
> > the previous one. I believe this is what happens with setting src="", 
> > too.
> 
> I'm not sure what the distinction is here...  Can you point me to the 
> relevant spec sections?

Never mind what I wrote there. I didn't do it. There's one browsing context
and it gets navigated whenever relevant.


> > Unless someone can indicate a reason why not to do this, I expect that 
> > I'll make about:blank asynchronous, and only make initial browsing 
> > context creation synchronous (like it already is -- this doesn't 
> > involve actually loading about:blank using the regular "navigation" 
> > steps).
> 
> That sounds probably fine to me, depending on when loads of about:blank 
> are triggered by the UA and when they aren't.

Not sure what you mean by "triggered by the UA". I'm proposing all 
about:blank loads would be async except for the very first load in a 
browsing context, which is _always_ about:blank (though that is often 
immediately replaced by another resource), and is always synchronous. But 
let's discuss this later this week in the other thread. :-)

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Sunday, 24 January 2010 11:44:14 UTC