
Understanding HTML5 parsing

From: Norman Walsh <ndw@nwalsh.com>
Date: Thu, 06 Jan 2011 10:10:31 -0500
To: public-html-xml@w3.org
Message-ID: <m2vd22gja0.fsf_-_@nwalsh.com>
The HTML5 spec is a document of non-trivial complexity. I'm going to
go out on a limb and hypothesize that (1) I'm not the only one who
fails to appreciate all of its details and (2) there are people
on this list who do.

Please indulge me.

Assuming we're inside an HTML <body> element and that no error
correction has yet been required, the following content

  <div>
    <span>Text</span>
  </div>

produces a DOM that is isomorphic to what an XML parser would
produce for this content

  <div xmlns="http://www.w3.org/1999/xhtml">
    <span>Text</span>
  </div>

Is that right?
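
For the XML half of that comparison only, here is a minimal sketch using
Python's standard xml.etree.ElementTree module (my choice of tool, not
something the spec prescribes). It shows that the default namespace declared
on the div is inherited by the span. It says nothing about what an HTML5
parser produces.

```python
import xml.etree.ElementTree as ET

# The XML serialization from the example above.
xml_source = """<div xmlns="http://www.w3.org/1999/xhtml">
  <span>Text</span>
</div>"""

root = ET.fromstring(xml_source)
span = root[0]

# ElementTree reports names as "{namespace-uri}local-name".
print(root.tag)
print(span.tag)
print(span.text)
```

Both elements report the XHTML namespace URI, which is the property an
isomorphic HTML5 DOM would have to share.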

Does this content:

  <div>
    <para xmlns="http://docbook.org/ns/docbook">
       This is some text.
    </para>
  </div>

produce something isomorphic to what an XML parser would produce for
this:

  <div xmlns="http://www.w3.org/1999/xhtml">
    <para>
       This is some text.
    </para>
  </div>
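
To make concrete what the question is contrasting, the sketch below feeds
both snippets to an XML parser (Python's xml.etree.ElementTree, chosen only
for illustration) and lists the namespace-qualified element names. Under XML
rules the two trees differ; whether an HTML5 parser maps the first snippet
onto the second is exactly what is being asked, and this sketch does not
answer that.

```python
import xml.etree.ElementTree as ET

# First snippet, read as XML: the xmlns attribute is a namespace declaration.
first = ET.fromstring("""<div>
  <para xmlns="http://docbook.org/ns/docbook">
     This is some text.
  </para>
</div>""")

# Second snippet: para inherits the XHTML default namespace from div.
second = ET.fromstring("""<div xmlns="http://www.w3.org/1999/xhtml">
  <para>
     This is some text.
  </para>
</div>""")

first_tags = [el.tag for el in first.iter()]
second_tags = [el.tag for el in second.iter()]

print(first_tags)
print(second_tags)
```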

And, moving into the way elements with specific local names are
recognized, is this:

  <div>
    <para xmlns="http://docbook.org/ns/docbook">
       This is some <link>text</link>.
    </para>
  </div>

Like this:

  <div xmlns="http://www.w3.org/1999/xhtml">
    <para>
       This is some
       <link></link>
       text.
    </para>
  </div>

Or does more fixup occur, like ending the para too? (I'm experimenting
with "inspect element" in Google Chrome 8.0.552.231 on the Mac to
inform my guesses, but I don't assert anything about how Chrome deals
with HTML5, so...)
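
The difference between the two shapes above can be stated precisely in XML
terms. In this sketch (again xml.etree.ElementTree, with the markup
simplified by me so that both variants sit in the XHTML namespace and carry
no extra whitespace), the word "text" is either the content of the link
element or a text node that follows an empty link.

```python
import xml.etree.ElementTree as ET

NS = {"h": "http://www.w3.org/1999/xhtml"}

# Variant A: "text" is inside the link element.
nested = ET.fromstring(
    '<div xmlns="http://www.w3.org/1999/xhtml">'
    "<para>This is some <link>text</link>.</para></div>"
)

# Variant B: the link element is empty and "text." follows it.
split = ET.fromstring(
    '<div xmlns="http://www.w3.org/1999/xhtml">'
    "<para>This is some <link></link>text.</para></div>"
)

nested_link = nested.find(".//h:link", NS)
split_link = split.find(".//h:link", NS)

# .text is the element's own content; .tail is the text right after it.
print(repr(nested_link.text), repr(nested_link.tail))
print(repr(split_link.text), repr(split_link.tail))
```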

What about this:

  <div>
    <script type="application/xml">
      <para xmlns="http://docbook.org/ns/docbook">
        This is a <link>link</link>.
      </para>
    </script>
  </div>

Is it like this?

  <div xmlns="http://www.w3.org/1999/xhtml">
    <script type="application/xml">
      &lt;para&gt;
        This is a &lt;link&gt;link&lt;/link&gt;.
      &lt;/para&gt;
    </script>
  </div>

Or can I get the content of the script parsed into the DOM object I
might naively expect such that I can access it with JavaScript?
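
One possible workaround, if the script content does end up as plain text, is
to hand that text to an XML parser in a second step. The sketch below
illustrates the idea outside a browser, using Python's standard library. Note
the assumptions: html.parser is a simple tokenizer, not an implementation of
the HTML5 parsing algorithm, and the ScriptTextCollector class is a
hypothetical helper written for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

DOCBOOK_NS = "http://docbook.org/ns/docbook"

HTML_SOURCE = """<div>
  <script type="application/xml">
    <para xmlns="http://docbook.org/ns/docbook">
      This is a <link>link</link>.
    </para>
  </script>
</div>"""


class ScriptTextCollector(HTMLParser):
    """Hypothetical helper: gathers the character data inside <script>."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script content arrives as data, possibly in several pieces.
        if self.in_script:
            self.chunks.append(data)


collector = ScriptTextCollector()
collector.feed(HTML_SOURCE)
collector.close()
script_text = "".join(collector.chunks)

# Second step: parse the collected text as XML.
para = ET.fromstring(script_text.strip())
link = para.find(f"{{{DOCBOOK_NS}}}link")

print(para.tag)
print(link.tag, link.text)
```

In a browser the analogous steps would presumably be reading the script
element's text and passing it to an XML parser from JavaScript, but that is
an assumption on my part rather than something I have verified against the
spec.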

                                        Be seeing you,
                                          norm

-- 
Norman Walsh
Lead Engineer
MarkLogic Corporation
www.marklogic.com

Received on Thursday, 6 January 2011 15:11:05 GMT
