Re: Understanding HTML5 parsing

On Thu, 06 Jan 2011 16:10:31 +0100, Norman Walsh <ndw@nwalsh.com> wrote:
> Assuming we're inside an HTML <body> element and that no error
> correction has yet been required, the following content
>
>   <div>
>     <span>Text</span>
>   </div>
>
> produces a DOM that is isomorphic to what an XML parser would
> produce for this content
>
>   <div xmlns="http://www.w3.org/1999/xhtml">
>     <span>Text</span>
>   </div>
>
> Is that right?

Right.
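For comparison, a namespace-aware XML parser puts both elements in the XHTML namespace through the default namespace declaration. The sketch below uses Python's standard `xml.etree.ElementTree`, chosen only for illustration. It reports names in `{namespace}local` form and does not expose namespace declarations as attributes.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"

# The XML version from the question: the default namespace declaration
# applies to <div> and is inherited by <span>.
root = ET.fromstring(
    '<div xmlns="http://www.w3.org/1999/xhtml">\n'
    '  <span>Text</span>\n'
    '</div>'
)

span = root[0]
print(root.tag)   # {http://www.w3.org/1999/xhtml}div
print(span.tag)   # {http://www.w3.org/1999/xhtml}span
print(span.text)  # Text
```

An HTML5 parser assigns the XHTML namespace to both elements without any declaration, which is why the two trees match.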


> Does this content:
>
>   <div>
>     <para xmlns="http://docbook.org/ns/docbook">
>        This is some text.
>     </para>
>   </div>
>
> produce something isomorphic to what an XML parser would produce for
> this:
>
>   <div xmlns="http://www.w3.org/1999/xhtml">
>     <para>
>        This is some text.
>     </para>
>   </div>

Yes. The <para> element will also have an "xmlns" attribute in no
namespace (rather than in the XMLNS namespace, as would be the case with
namespace-aware XML processors), with the value
"http://docbook.org/ns/docbook".
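The difference shows up when you process the same markup both ways. The Python sketch below rests on assumptions. `xml.etree.ElementTree` stands in for a namespace-aware XML processor. The standard library's `html.parser` is only a tokenizer, not an HTML5 tree builder, but it shows that an HTML-style tokenizer sees `xmlns` as just another attribute name.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

DOCBOOK = "http://docbook.org/ns/docbook"
MARKUP = ('<div><para xmlns="http://docbook.org/ns/docbook">'
          'This is some text.</para></div>')

# XML: the declaration is consumed by the namespace machinery, so <para>
# lands in the DocBook namespace and carries no "xmlns" attribute.
xml_para = ET.fromstring(MARKUP)[0]
print(xml_para.tag, xml_para.attrib)
# {http://docbook.org/ns/docbook}para {}

class StartTags(HTMLParser):
    """Records every start tag together with its attribute list."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, attrs))

# HTML-style tokenizing: "xmlns" is an ordinary name/value pair.
tok = StartTags()
tok.feed(MARKUP)
tok.close()
print(tok.tags[1])
# ('para', [('xmlns', 'http://docbook.org/ns/docbook')])
```

In a browser's HTML5 DOM the element is a `para` element in the XHTML namespace, and its `xmlns` attribute has a null `namespaceURI`.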


> And, moving into the way elements with specific local names are
> recognized, is this:
>
>   <div>
>     <para xmlns="http://docbook.org/ns/docbook">
>        This is some <link>text</link>.
>     </para>
>   </div>
>
> Like this:
>
>   <div xmlns="http://www.w3.org/1999/xhtml">
>     <para>
>        This is some
>        <link></link>
>        text.
>     </para>
>   </div>

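Yes, because <link> is a void element in HTML: it never has content, so
the parser closes it immediately and the stray </link> end tag is ignored.
The same would happen if you used e.g. <img> or <meta>. (<br> would be
slightly different, as the parser treats a </br> end tag as if it were a
<br> start tag, producing a second <br> element.)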
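The rules at work can be modeled with a hypothetical toy tree builder in Python, layered on the standard library's `html.parser` tokenizer. It is a deliberate simplification that implements only three things: void elements are popped immediately, `</br>` acts like `<br>`, and end tags matching no open element are ignored. The real HTML5 algorithm has many more rules. For instance, it stops searching the stack when it reaches a "special" element such as `<div>`.

```python
from html.parser import HTMLParser

# HTML void elements: they are never left open on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}

class ToyTreeBuilder(HTMLParser):
    """Hypothetical, heavily simplified tree builder (nodes are [name, children])."""

    def __init__(self):
        super().__init__()
        self.root = ["#root", []]
        self.stack = [self.root]          # stack of open elements

    def _append(self, node):
        children = self.stack[-1][1]
        if isinstance(node, str) and children and isinstance(children[-1], str):
            children[-1] += node          # adjacent text merges into one text node
        else:
            children.append(node)

    def handle_starttag(self, tag, attrs):
        node = [tag, []]
        self._append(node)
        if tag not in VOID:               # void elements are popped immediately
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        # Treat <x/> like <x>; do not synthesize an end tag.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag == "br":                   # </br> is treated as a <br> start tag
            self.handle_starttag("br", [])
            return
        if tag in [name for name, _ in self.stack[1:]]:
            # Pop up to and including the matching open element.
            while self.stack[-1][0] != tag:
                self.stack.pop()
            self.stack.pop()
        # Otherwise (e.g. </link>) nothing matches: the end tag is ignored.

    def handle_data(self, data):
        self._append(data)

def build(markup):
    builder = ToyTreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root[1]

print(build('<div><para>This is some <link>text</link>.</para></div>'))
# [['div', [['para', ['This is some ', ['link', []], 'text.']]]]]
```

The empty `link` node followed by the sibling text "text." mirrors the DOM described in the question.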


> Or does more fixup occur, like ending the para too? (I'm experimenting
> with "inspect element" in Google Chrome 8.0.552.231 on the Mac to
> inform my guesses, but I don't assert anything about how Chrome deals
> with HTML5, so...)

I believe Chrome has had a pretty much compliant HTML5 parser since version 7.


> What about this:
>
>   <div>
>     <script type="application/xml">
>       <para xmlns="http://docbook.org/ns/docbook">
>         This is a <link>link</link>.
>       </para>
>     </script>
>   </div>
>
> Is it like this?
>
>   <div xmlns="http://www.w3.org/1999/xhtml">
>     <script type="application/xml">
>       &lt;para&gt;
>         This is a &lt;link&gt;link&lt;/link&gt;.
>       &lt;/para&gt;
>     </script>
>   </div>

Yes.
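To see why, here is a small Python sketch that illustrates the behavior but does not reflect how browsers are implemented. It uses the standard library's `html.parser`, which, in a similar way to the HTML5 tokenizer, treats the contents of `<script>` as raw text up to the closing `</script>` tag.

```python
from html.parser import HTMLParser

class ScriptEvents(HTMLParser):
    """Logs tokenizer events so we can see how <script> content is treated."""
    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))

INNER = ('<para xmlns="http://docbook.org/ns/docbook">'
         'This is a <link>link</link>.</para>')
doc = '<div><script type="application/xml">' + INNER + '</script></div>'

p = ScriptEvents()
p.feed(doc)
p.close()

starts = [name for kind, name in p.events if kind == "start"]
raw = "".join(data for kind, data in p.events if kind == "text")
print(starts)  # ['div', 'script']: <para> and <link> were never seen as tags
print(raw)     # the DocBook markup, verbatim, as character data
```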


> Or can I get the content of the script parsed into the DOM object I
> might naively expect such that I can access it with JavaScript?

You can pass its text content to a DOMParser object (or some such) and
parse it as XML.
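In a browser that would be roughly `new DOMParser().parseFromString(text, "application/xml")`. The Python sketch below mirrors the idea under stated assumptions. The standard library's `html.parser` extracts the script's text, and `xml.etree.ElementTree` stands in for DOMParser. The class name `ScriptText` is hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

DOCBOOK = "{http://docbook.org/ns/docbook}"

class ScriptText(HTMLParser):
    """Hypothetical helper: collects the raw text of <script type="application/xml">."""
    def __init__(self):
        super().__init__()
        self.in_xml_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/xml":
            self.in_xml_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_xml_script = False

    def handle_data(self, data):
        if self.in_xml_script:
            self.chunks.append(data)

page = ('<div><script type="application/xml">\n'
        '  <para xmlns="http://docbook.org/ns/docbook">\n'
        '    This is a <link>link</link>.\n'
        '  </para>\n'
        '</script></div>')

extractor = ScriptText()
extractor.feed(page)
extractor.close()

# Re-parse the script's text as XML (the stand-in for DOMParser here).
para = ET.fromstring("".join(extractor.chunks).strip())
link = para.find(DOCBOOK + "link")
print(para.tag)   # {http://docbook.org/ns/docbook}para
print(link.text)  # link
```

Because the second parse is a real XML parse, <para> and <link> end up in the DocBook namespace, which is the "naive" tree the question asks for.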


-- 
Anne van Kesteren
http://annevankesteren.nl/

Received on Thursday, 6 January 2011 16:25:31 UTC