Re: Comments on HTML WG face to face meetings in France Oct 08

On Fri, 14 Nov 2008, Jonas Sicking wrote:
> On Thu, Nov 13, 2008 at 8:18 AM,  <noah_mendelsohn@us.ibm.com> wrote:
> > 
> > For example, all of the following will be parsed into DOMs, and 
> > presented to users if retrieved as text/html:
> >
> > a) <!-- clearly OK -->
> >   <html>
> >   <body>
> >   <div>
> >   <p>Para</p>
> >   </div>
> >   </body>
> >   </html>
> >
> > b) <html>
> >   <body>
> >   <div>
> >   <p>Para</div>   <!-- note bad nesting of tags -->
> >   </p>  <!-- note bad nesting of tags -->
> >   </body>
> >   </html>
> >
> > c) <html>
> >   <body>
> >   <!-- quoted attr -->
> >   <img src="http://example.com/img.jpg">
> >   </body>
> >   </html>
> >
> > d) <html>
> >   <body>
> >   <!-- unquoted attr -->
> >   <img src=http://example.com/img.jpg>
> >   </body>
> >   </html>
> >
> > e)  XXXXXX (Isn't obviously HTML at all,
> >            but browser will presumably
> >            build a DOM and render XXXXXX)
> >
> > The best examples I have of 'unclean' are (b), in which the close tags 
> > are in the wrong order, and (e), which has no tags at all.
> 
> Disregarding the <title> issue, HTML5 will only consider (a), (c) and 
> (d) valid. (Well, and maybe (e) too if you add the <title>, since all 
> other tags are optional as per HTML4; not quite sure.)

In the interests of accuracy, I should note that the HTML5 spec considers 
all five of the above examples invalid (non-conforming), as they all lack 
a DOCTYPE.

In particular:

> > a) 
> >   <html>
> >   <body>
> >   <div>
> >   <p>Para</p>
> >   </div>
> >   </body>
> >   </html>

Missing DOCTYPE, missing <title>.
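As an illustration, one possible conforming variant of (a) follows. This is a sketch rather than part of the original example: it assumes the HTML5 <!DOCTYPE html> syntax and uses a placeholder title, "Example", that is purely hypothetical.

```html
<!DOCTYPE html>
<html>
<head>
<!-- hypothetical placeholder title; a conforming document needs one -->
<title>Example</title>
</head>
<body>
<div>
<p>Para</p>
</div>
</body>
</html>
```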


> > b) <html>
> >   <body>
> >   <div>
> >   <p>Para</div>   <!-- note bad nesting of tags -->
> >   </p>  <!-- note bad nesting of tags -->
> >   </body>
> >   </html>

Missing DOCTYPE, missing <title>, unexpected </p>. (The missing </p> 
before the </div> is fine because HTML has made the </p> end tag 
optional ever since HTML2 or earlier.)
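A sketch of how (b) might be made conforming, under the same assumptions (HTML5 DOCTYPE, hypothetical placeholder title): the stray </p> after the </div> is dropped, while the omitted </p> before the </div> is kept as in the original.

```html
<!DOCTYPE html>
<html>
<head>
<title>Example</title>
</head>
<body>
<div>
<!-- the p end tag is omitted on purpose: the closing div tag ends the p -->
<p>Para</div>
</body>
</html>
```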


> > c) <html>
> >   <body>
> >   <!-- quoted attr -->
> >   <img src="http://example.com/img.jpg">
> >   </body>
> >   </html>

Missing DOCTYPE, missing <title>, missing alt="".


> > d) <html>
> >   <body>
> >   <!-- unquoted attr -->
> >   <img src=http://example.com/img.jpg>
> >   </body>
> >   </html>

Missing DOCTYPE, missing <title>, missing alt="".
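For (c) and (d), a conforming sketch might look like the following, again assuming the HTML5 DOCTYPE and a hypothetical title. The empty alt="" assumes the image is purely decorative; an image that carries meaning would need real alternative text instead. Both the quoted and the unquoted src forms are kept, since an unquoted attribute value is allowed when it contains no characters that would require quoting.

```html
<!DOCTYPE html>
<html>
<head>
<title>Example</title>
</head>
<body>
<!-- quoted attribute values, as in (c) -->
<img src="http://example.com/img.jpg" alt="">
<!-- unquoted src value, as in (d): this URL has no spaces or quote characters -->
<img src=http://example.com/img.jpg alt="">
</body>
</html>
```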


> > e)  XXXXXX (Isn't obviously HTML at all,
> >            but browser will presumably
> >            build a DOM and render XXXXXX)

Missing DOCTYPE, missing <title>. (Note that the other tags, the <html>, 
<head>, and <body> start and end tags, have been optional in HTML since 
HTML2 if not earlier. In earlier versions the SGML parser implies them; 
in HTML5 the HTML5 parser does.)
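Since those tags are implied, (e) can become conforming with very little markup. A minimal sketch, assuming the HTML5 DOCTYPE and a hypothetical placeholder title:

```html
<!DOCTYPE html>
<title>Example</title>
XXXXXX
<!-- no html, head or body tags are written: the parser implies all three -->
```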

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Friday, 14 November 2008 21:31:54 UTC