[Bug 22436] New: Give rules for content that is treated as text under a common heading

https://www.w3.org/Bugs/Public/show_bug.cgi?id=22436

            Bug ID: 22436
           Summary: Give rules for content that is treated as text under a
                    common heading
    Classification: Unclassified
           Product: HTML WG
           Version: unspecified
          Hardware: PC
               URL: http://dev.w3.org/html5/html-xhtml-author-guide/
                OS: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HTML/XHTML Compatibility Authoring Guide (ed: Eliot
                    Graff)
          Assignee: eliotgra@microsoft.com
          Reporter: xn--mlform-iua@xn--mlform-iua.no
        QA Contact: public-html-bugzilla@w3.org
                CC: eliotgra@microsoft.com, mike@w3.org,
                    public-html-admin@w3.org,
                    public-html-wg-issue-tracking@w3.org

See the thread “During HTML parsing, are *all* named character references
replaced by their corresponding glyph?”, and in particular this answer from
Michael:

http://www.w3.org/mid/20130624113437.GB37583@sideshowbarker

What Michael said is easy to forget, so I think this subject needs a little
more description in Polyglot Markup. Right now, only <script> and <style> are
covered, along with <noscript>.

I would propose to

  a) add a section that describes the general issue of content
     that, unlike in XML, is treated as text by the HTML parser.
     Motivation: This is an important and general gotcha and
     difference, both within pure HTML and especially when
     creating polyglots.

  b) In practice, this means listing all the elements
     that themselves - or whose children - are treated
     as text by the HTML parser. (This includes
     all elements that begin with the string “<no”, such
     as <noscript> and <noframes>, as well as <script>,
     <style>, <xmp>, <iframe> and perhaps some more (?).)

     NB: It may also make sense to mention, in a note,
         that the “sane” elements, such as <object>,
         <video> etc., are not treated that way.

  c) The section should give the various usage rules
     under this heading - some elements are forbidden
     outright, while others have special rules for
     polyglots. (Thus, the script/style rules should go
     there - or at least be represented by a link to the
     section where they are described.)
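
To illustrate the difference described in a) and b), here is a minimal
sketch using only Python's standard library. It is an approximation, not a
reference: html.parser is not a full HTML5 parser, and this sketch only
relies on its raw-text handling of <style> (it treats <script> the same
way). The sample markup and the names Recorder, parse_as_html and
parse_as_xml are made up for the illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical sample: a character reference and a child element inside <style>.
SAMPLE = "<style>p &gt; em <b>x</b></style>"


class Recorder(HTMLParser):
    """Collects the start tags and the text chunks the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        # Text may arrive in several chunks, so collect and join later.
        self.chunks.append(data)


def parse_as_html(markup):
    """Return (start tags seen, concatenated text) from the HTML-style parse."""
    rec = Recorder()
    rec.feed(markup)
    rec.close()
    return rec.tags, "".join(rec.chunks)


def parse_as_xml(markup):
    """Return (element names in document order, text directly in the root)."""
    root = ET.fromstring(markup)
    return [el.tag for el in root.iter()], root.text


if __name__ == "__main__":
    print("HTML:", parse_as_html(SAMPLE))
    print("XML: ", parse_as_xml(SAMPLE))
```

On this sample, the HTML-style parse should report only the style start tag
and leave both "&gt;" and the <b> markup in the text untouched, while the
XML parse sees a real b child element and decodes the reference to ">".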

Btw, note that HTML5 already says that the content of iframe must be empty in
XML, so describing iframe should be a no-brainer. See
http://www.w3.org/TR/html5/embedded-content-0.html#iframe-content-model
And HTML5 has similar things to say about most - if not all - of these
elements, so it is mostly a collection job.


Received on Monday, 24 June 2013 17:02:22 UTC