Re: During HTML parsing, are *all* named character references replaced by their corresponding glyph?

2013-06-26 2:31, Leif Halvard Silli wrote:
> Jukka K. Korpela, Mon, 24 Jun 2013 22:57:19 +0300:
>
>> it seems
>> to me that script, style, and xmp elements have special parsing rules
>> whereas iframe, noembed, noframes, and noscript don’t.
> It seems to me that Mike was definitely right:
> http://software.hixie.ch/utilities/js/live-dom-viewer/saved/2371#dom


Right as regards actual browser behavior, or as regards draft specifications?
The latter seem to describe this only in the parsing rules, which are rather
complicated and confusing.

On IE 9, iframe, noembed, noframes, and noscript are parsed by normal rules.
Isn’t this the browser tradition, and required by all HTML specifications up to
HTML 4.01 and XHTML 1.1 (to the extent that they allow these elements
in the first place)?

It’s a bit shocking that Firefox and Chrome as well as IE 10 deviate from this.

The practical impact is very small, since browsers apply normal parsing
to <noscript> content when scripting is disabled. It is normally irrelevant
how <noscript> has been parsed when scripting is enabled. For <noembed>
and <noframes>, as well as for the content of <iframe>, the “fallback” content
is not used in any normal situation in browsers, so it does not matter
whether &auml; gets parsed literally or as ä.
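
To make the "literally or as ä" distinction concrete, here is a minimal sketch
in Python. It assumes the standard library's html.parser, which is not a
browser parser and does not claim to follow the draft parsing rules; which
elements it treats as raw text may also differ between Python versions. The
names TextCollector and collect are hypothetical helpers made up for this
illustration. Only <p> (normal rules) and <script> (raw text) are compared.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gather the character data seen inside each element."""

    def __init__(self):
        # convert_charrefs=True asks the parser to decode character references
        # in ordinary text; raw-text elements such as <script> are left alone.
        super().__init__(convert_charrefs=True)
        self.stack = []   # currently open elements
        self.text = {}    # tag name -> concatenated character data

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        if self.stack:
            key = self.stack[-1]
            self.text[key] = self.text.get(key, "") + data


def collect(markup):
    """Parse markup and return the text collected per element."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return parser.text


result = collect("<p>&auml;</p><script>&auml;</script>")
print(result)
```

In this sketch the reference inside <p> is decoded to ä, while the same six
characters inside <script> are passed through unchanged, which is the kind of
difference at issue for noscript, noframes and the others.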

It could matter to search engines, however. I’m mainly thinking of <noframes>
content. What might be the rationale for not recognizing character references
there? This, too, is largely theoretical on two grounds: search engines probably
won’t start applying such parsing rules; and <noframes> content is in practice
almost meaningless, or just a statement like “this page uses frames”.

-- 
Yucca, http://www.cs.tut.fi/~jkorpela/

Received on Wednesday, 26 June 2013 07:03:48 UTC