
Re: During HTML parsing, are *all* named character references replaced by their corresponding glyph?

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Wed, 26 Jun 2013 11:47:39 +0200
To: "Jukka K. Korpela" <jukka.k.korpela@kolumbus.fi>
Cc: public-html@w3.org
Message-ID: <20130626114739638472.862b4f2a@xn--mlform-iua.no>
Jukka K. Korpela, Wed, 26 Jun 2013 10:03:24 +0300:
> 2013-06-26 2:31, Leif Halvard Silli kirjoitti:
>> Jukka K. Korpela, Mon, 24 Jun 2013 22:57:19 +0300:
>> 
>>> it seems
>>> to me that script, style, and xmp elements have special parsing rules
>>> whereas iframe, noembed, noframes, and noscript don’t.
>> It seems to me that Mike was definitely right:
>> http://software.hixie.ch/utilities/js/live-dom-viewer/saved/2371#dom

> 
> Right as regards to actual browser behavior, or as regards to draft 
> specifications?

Browser behavior (with exceptions). And spec.

> The latter seem to describe this only in the parsing rules, 

You mean, in the spec’s parsing section? [1] I don't think that is 100% 
correct. E.g. the specification of iframe has some paragraphs about 
its content model. Quoting [2]:

]]
   When used in HTML documents, the allowed content model of
   iframe elements is text, [ … snip … ]
   
   The iframe element must be empty in XML documents.

   NOTE: The HTML parser treats markup inside iframe elements
         as text.
[[

But what is difficult to understand is why, in HTML, it is 
*permitted* to place seemingly *any* text (except the string </iframe>) 
inside iframe. I.e., what’s the use case? I wouldn't mind if someone shed 
some light on that …
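
To make the spec's NOTE concrete, here is a rough Python sketch of what
"treats markup inside iframe elements as text" amounts to: scan for the
first </iframe end tag and hand back everything before it untouched, so
tags and character references inside stay literal. The function name
raw_text_content and the regex-based scan are illustrative assumptions;
the spec's tokenizer is a state machine with more edge cases than this.

```python
import re


def raw_text_content(source: str, tag: str = "iframe") -> str:
    """Return the text between <tag ...> and the first matching end tag.

    Hypothetical helper: a simplification, not the HTML tokenizer.
    """
    name = re.escape(tag)
    start = re.search(rf"<{name}\b[^>]*>", source, re.IGNORECASE)
    if start is None:
        raise ValueError(f"no <{tag}> start tag found")
    rest = source[start.end():]
    # The end tag is "</tag" followed by whitespace, "/" or ">".
    end = re.search(rf"</{name}[\s/>]", rest, re.IGNORECASE)
    # With no end tag, everything up to end of input counts as content.
    return rest if end is None else rest[: end.start()]


sample = '<iframe src="x"><p>Hi &auml;</p></iframe><p>after</p>'
print(raw_text_content(sample))  # <p>Hi &auml;</p>
```

Note that the <p> and the &auml; come back verbatim: nothing inside the
element is turned into child elements or decoded.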

I also don't understand why content is forbidden in XHTML. I mean, it 
is easy to think that content could function as fallback, for instance. 
But if it can’t, then why would someone place anything there at all? 
But if the content *can* be used as fallback, then it should be allowed 
even in XML!

In HTML4, it was used for fallback, but only for browsers that didn't 
understand iframe, see HTML4’s example.[3] And as it turns out, text 
browsers *do* parse the child content of iframe "normally" - that is, 
not as text but as markup.

> which are rather complicated and confusing.

For Polyglot Markup, I think we will describe content that is treated 
as text under a common heading.[4] This is because I agree that the 
subject is confusing and that it is difficult to get an overview of it.

> On IE 9, iframe, noembed, noframes, and noscript are parsed by normal rules.
> Isn’t this the browser tradition and required by all HTML 
> specifications up to
> HTML 4.01 and XHTML 1.1 (to the extent that they allow these elements
> in the first place)?
> 
> It’s a bit shocking that Firefox and Chrome as well as IE 10 deviate 
> from this.

I have not checked how IE9 (and below) behaves. But you are mistaken if 
you think that HTML5 always aligned with what legacy IE does/did. 
Sometimes it was one or more other legacy browsers that "won" the deal.

> The practical impact is very small, since browsers apply normal parsing
> to <noscript> content when scripting is disabled. It is normally irrelevant
> how <noscript> has been parsed when scripting is enabled. For <noembed>,
> and <noframes> as well as for content of <iframe>, the “fallback” content
> is not used in any normal situations in browsers, so it does not matter
> whether &auml; gets parsed literally or as ä.

As you and I note above, some browsers behave differently ...
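
For reference on the character reference in question: &auml; names
U+00E4 (ä), while U+00E5 (å) is &aring;. A quick check, using Python's
standard html module (chosen here only for illustration; it is not a
browser parser and says nothing about how a given element is tokenized):

```python
import html

# Decode two named character references with the standard library.
for ref in ("&auml;", "&aring;"):
    decoded = html.unescape(ref)
    print(ref, "->", decoded, f"U+{ord(decoded):04X}")

# Leaving a reference undecoded, as happens when content is treated as
# text, means the six characters "&auml;" are kept as they are.
literal = "&auml;"
print(len(literal), len(html.unescape(literal)))
```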

[1] http://www.w3.org/html/wg/drafts/html/CR/syntax.html#parsing

[2] 
http://www.w3.org/html/wg/drafts/html/CR/embedded-content-0.html#iframe-content-model

[3] http://www.w3.org/TR/REC-html40/present/frames.html#edef-IFRAME

[4] https://www.w3.org/Bugs/Public/show_bug.cgi?id=22436

-- 
leif halvard silli
Received on Wednesday, 26 June 2013 09:48:09 UTC
