Re: Understanding HTML5 parsing

On 01/06/2011 10:10 AM, Norman Walsh wrote:
> The HTML5 spec is a document of non-trivial complexity. I'm going to
> go out on a limb and hypothesize that (1) I'm not the only one who
> fails to appreciate all of its details and (2) that there are people
> on this list who do.
>
> Please indulge me.

Here are two good utilities; the first uses whichever browser you are
currently running:

http://software.hixie.ch/utilities/js/live-dom-viewer/

I would also normally suggest:

http://livedom.validator.nu/

... but at the moment, it doesn't seem to be working for me.

If you would prefer offline tools, you can download html5lib from Google
Code or validator.nu from validator.nu.

> Assuming we're inside an HTML <body> element and that no error
> correction has yet been required, the following content
>
>    <div>
>      <span>Text</span>
>    </div>
>
> produces a DOM that is isomorphic to what an XML parser would
> produce for this content
>
>    <div xmlns="http://www.w3.org/1999/xhtml">
>      <span>Text</span>
>    </div>
>
> Is that right?

Correct.
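For the XML half of that comparison, Python's standard, namespace-aware ElementTree parser shows both elements landing in the XHTML namespace; on the HTML side, an HTML5-conformant browser reports the same URI through each element's namespaceURI. A minimal sketch, assuming only the Python standard library:

```python
import xml.etree.ElementTree as ET

xml_source = """<div xmlns="http://www.w3.org/1999/xhtml">
  <span>Text</span>
</div>"""

root = ET.fromstring(xml_source)

# ElementTree spells qualified names as {namespace-uri}local-name; the default
# namespace declared on <div> is inherited by <span>.
names = [el.tag for el in root.iter()]
print(names)
# ['{http://www.w3.org/1999/xhtml}div', '{http://www.w3.org/1999/xhtml}span']
print(root[0].text)  # Text
```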

> Does this content:
>
>    <div>
>      <para xmlns="http://docbook.org/ns/docbook">
>         This is some text.
>      </para>
>    </div>
>
> produce something isomorphic to what an XML parser would produce for
> this:
>
>    <div xmlns="http://www.w3.org/1999/xhtml">
>      <para>
>         This is some text.
>      </para>
>    </div>

David Carlisle is correct that this document will also have an attribute
named 'xmlns' on the <para> element.  A namespace-aware XML parser cannot
create such an attribute, because it treats xmlns as a namespace
declaration rather than as an ordinary attribute.
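The difference is easy to see by feeding similar markup to Python's namespace-aware ElementTree parser and to the standard html.parser tokenizer. The XML parser puts <para> in the DocBook namespace and reports no xmlns attribute at all; the tokenizer hands xmlns over as a plain attribute. This is only a sketch: html.parser tokenizes but does not implement HTML5 tree construction, and AttrCollector is a hypothetical helper written for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

xml_source = """<div xmlns="http://www.w3.org/1999/xhtml">
  <para xmlns="http://docbook.org/ns/docbook">
    This is some text.
  </para>
</div>"""

# Namespace-aware XML parse: xmlns is consumed as a namespace declaration.
para = ET.fromstring(xml_source)[0]
print(para.tag)     # {http://docbook.org/ns/docbook}para
print(para.attrib)  # {}


class AttrCollector(HTMLParser):
    """Records the attributes the HTML tokenizer reports for each start tag."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        self.attrs[tag] = dict(attrs)


collector = AttrCollector()
collector.feed('<div><para xmlns="http://docbook.org/ns/docbook">'
               'This is some text.</para></div>')
collector.close()
print(collector.attrs["para"])  # {'xmlns': 'http://docbook.org/ns/docbook'}
```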

> And, moving into the way elements with specific local names are
> recognized, is this:
>
>    <div>
>      <para xmlns="http://docbook.org/ns/docbook">
>         This is some <link>text</link>.
>      </para>
>    </div>
>
> Like this:
>
>    <div xmlns="http://www.w3.org/1999/xhtml">
>      <para>
>         This is some
>         <link></link>
>         text.
>      </para>
>    </div>
>
> Or does more fixup occur, like ending the para too? (I'm experimenting
> with "inspect element" in Google Chrome 8.0.552.231 on the Mac to
> inform my guesses, but I don't assert anything about how Chrome deals
> with HTML5, so...)

Closer to the following (where all elements are in the 
http://www.w3.org/1999/xhtml namespace):

<div>
   <para xmlns="http://docbook.org/ns/docbook">
      This is some <link/>text.
   </para>
</div>

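Apart from the xmlns attribute noted above, the only difference is
whitespace.  Because link is a void element in HTML, the parser closes it
immediately and ignores the stray </link> end tag; the <para> is not
ended early.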
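To illustrate why <link/> ends up empty, here is a deliberately tiny tree builder on top of Python's html.parser tokenizer. It models only two behaviours: void elements close as soon as they open, and an end tag with no matching open element is dropped (a simplification of the spec's "any other end tag" steps). ToyTreeBuilder and serialize are hypothetical names; this is not the HTML5 tree-construction algorithm, which also handles implied end tags, special elements, foster parenting and more.

```python
from html.parser import HTMLParser

# Void elements from the HTML spec: no content, no end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ToyTreeBuilder(HTMLParser):
    """Simplified tree builder; NOT the full HTML5 algorithm."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#root", "attrs": {}, "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID:  # void elements are never left open
            self.stack.append(node)

    def handle_endtag(self, tag):
        open_names = [n["tag"] for n in self.stack[1:]]
        if tag not in open_names:  # e.g. </link>: nothing to close, ignore it
            return
        while self.stack.pop()["tag"] != tag:  # pop up to and including tag
            pass

    def handle_data(self, data):
        self.stack[-1]["children"].append(data)


def serialize(node):
    """Turn the toy tree back into markup, writing void elements as <x/>."""
    if isinstance(node, str):
        return node
    inner = "".join(serialize(child) for child in node["children"])
    if node["tag"] == "#root":
        return inner
    attrs = "".join(f' {k}="{v}"' for k, v in node["attrs"].items())
    if node["tag"] in VOID:
        return f"<{node['tag']}{attrs}/>"
    return f"<{node['tag']}{attrs}>{inner}</{node['tag']}>"


builder = ToyTreeBuilder()
builder.feed('<div><para xmlns="http://docbook.org/ns/docbook">'
             'This is some <link>text</link>.</para></div>')
builder.close()
print(serialize(builder.root))
# <div><para xmlns="http://docbook.org/ns/docbook">This is some <link/>text.</para></div>
```

The whitespace-free input keeps the output short; with the indented original the same tree results, plus whitespace text nodes.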

> What about this:
>
>    <div>
>      <script type="application/xml">
>        <para xmlns="http://docbook.org/ns/docbook">
>          This is a <link>link</link>.
>        </para>
>      </script>
>    </div>
>
> Is it like this?
>
>    <div xmlns="http://www.w3.org/1999/xhtml">
>      <script type="application/xml">
>        &lt;para&gt;
>          This is a&lt;link&gt;link&lt;/link&gt;.
>        &lt;/para&gt;
>      </script>
>    </div>

Correct.  The content of a <script> element becomes a single text node,
whatever its type attribute says.
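Python's standard html.parser tokenizer treats script content the same way: everything up to </script> is reported as character data, so no para or link start tags are ever produced. A small sketch; ScriptText is a hypothetical helper, and data chunks are joined in case the tokenizer delivers them in pieces.

```python
from html.parser import HTMLParser


class ScriptText(HTMLParser):
    """Collects start-tag names and the raw text found inside <script>."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)


source = ('<div><script type="application/xml">'
          '<para xmlns="http://docbook.org/ns/docbook">'
          'This is a <link>link</link>.</para>'
          '</script></div>')

parser = ScriptText()
parser.feed(source)
parser.close()
script_text = "".join(parser.chunks)

print(parser.tags)   # ['div', 'script']
print(script_text)   # the DocBook markup, as one unparsed string
```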

> Or can I get the content of the script parsed into the DOM object I
> might naively expect such that I can access it with JavaScript?

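Yes: take the text of the script element and parse it with DOMParser, as
described here:

http://blogs.msdn.com/b/ie/archive/2010/10/15/domparser-and-xmlserializer-in-ie9-beta.aspx

(Note: DOMParser works across browsers that support HTML5, not just IE9.)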
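In a browser that boils down to new DOMParser().parseFromString(script.textContent, "application/xml"). As a rough, non-browser analogue, the sketch below parses the same raw script text with Python's xml.etree.ElementTree; it assumes the payload is well-formed XML (ET.fromstring raises ET.ParseError otherwise).

```python
import xml.etree.ElementTree as ET

DOCBOOK = "{http://docbook.org/ns/docbook}"

# The raw text of the <script> element, as the HTML parser leaves it.
script_text = ('<para xmlns="http://docbook.org/ns/docbook">'
               'This is a <link>link</link>.</para>')

# Python stand-in for DOMParser.parseFromString(text, "application/xml").
para = ET.fromstring(script_text)
link = para.find(DOCBOOK + "link")

print(para.tag)   # {http://docbook.org/ns/docbook}para
print(para.text)  # 'This is a '
print(link.text)  # link
print(link.tail)  # .
```

Unlike the HTML parse, this time link is a real DocBook element with its text inside it.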

>                                          Be seeing you,
>                                            norm

- Sam Ruby

Received on Thursday, 6 January 2011 16:27:40 UTC