- From: Ian Hickson <ian@hixie.ch>
- Date: Fri, 14 Nov 2008 21:31:16 +0000 (UTC)
- To: Jonas Sicking <jonas@sicking.cc>
- Cc: noah_mendelsohn@us.ibm.com, public-html <public-html@w3.org>, www-tag@w3.org
On Fri, 14 Nov 2008, Jonas Sicking wrote:
> On Thu, Nov 13, 2008 at 8:18 AM, <noah_mendelsohn@us.ibm.com> wrote:
> >
> > For example, all of the following will be parsed into DOMs, and
> > presented to users if retrieved as text/html:
> >
> > a) <!-- clearly OK -->
> >    <html>
> >    <body>
> >    <div>
> >    <p>Para</p>
> >    </div>
> >    </body>
> >    </html>
> >
> > b) <html>
> >    <body>
> >    <div>
> >    <p>Para</div> <!-- note bad nesting of tags -->
> >    </p> <!-- note bad nesting of tags -->
> >    </body>
> >    </html>
> >
> > c) <html>
> >    <body>
> >    <!-- quoted attr -->
> >    <img src="http://example.com/img.jpg">
> >    </body>
> >    </html>
> >
> > d) <html>
> >    <body>
> >    <!-- unquoted attr -->
> >    <img src=http://example.com/img.jpg>
> >    </body>
> >    </html>
> >
> > e> XXXXXX (Isn't obviously HTML at all,
> >            but browser will presumably
> >            build a DOM and render XXXXXX)
> >
> > The best example I have of 'unclean' are (b), in which the close tags
> > are in the wrong order, and (e), which has no tags at all.
>
> Disregarding the <title> issue, HTML5 will only consider (a), (c) and
> (d) valid. (well, and maybe (e) too if you add the <title> due to all
> other tags being optional as per HTML4, not quite sure).

In the interests of accuracy, I should note that the HTML5 spec considers
all five of the above examples invalid (non-conforming) as they are
lacking a DOCTYPE. In particular:

> > a)
> >    <html>
> >    <body>
> >    <div>
> >    <p>Para</p>
> >    </div>
> >    </body>
> >    </html>

Missing DOCTYPE, missing <title>.

> > b) <html>
> >    <body>
> >    <div>
> >    <p>Para</div> <!-- note bad nesting of tags -->
> >    </p> <!-- note bad nesting of tags -->
> >    </body>
> >    </html>

Missing DOCTYPE, missing <title>, unexpected </p>. (The missing </p>
before the </div> is fine because HTML has always let the </p> end tag be
optional, ever since HTML2 or earlier.)

> > c) <html>
> >    <body>
> >    <!-- quoted attr -->
> >    <img src="http://example.com/img.jpg">
> >    </body>
> >    </html>

Missing DOCTYPE, missing <title>, missing alt="".

> > d) <html>
> >    <body>
> >    <!-- unquoted attr -->
> >    <img src=http://example.com/img.jpg>
> >    </body>
> >    </html>

Missing DOCTYPE, missing <title>, missing alt="".

> > e> XXXXXX (Isn't obviously HTML at all,
> >            but browser will presumably
> >            build a DOM and render XXXXXX)

Missing DOCTYPE, missing <title>. (Note that the other tags, <html>,
<head>, <body>, and their end tags, are optional in HTML, at least since
HTML2 if not earlier. The SGML parser, in earlier versions, and the HTML5
parser, in HTML5, will imply them.)

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Friday, 14 November 2008 21:31:52 UTC
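
For readers who want to try the checks listed in the message, here is a minimal sketch in Python. It is not the HTML5 conformance algorithm and is not part of the original message. It uses the standard library's html.parser, which tokenizes markup but does not build a tree or imply missing tags, so it can only spot three of the problems named above (missing DOCTYPE, missing <title>, <img> without alt) and will not notice the unexpected </p> in example (b). The names ConformanceSketch and problems are hypothetical.

```python
from html.parser import HTMLParser


class ConformanceSketch(HTMLParser):
    """Records three of the problems named in the message; not a validator."""

    def __init__(self):
        super().__init__()
        self.has_doctype = False
        self.has_title = False
        self.imgs_without_alt = 0

    def handle_decl(self, decl):
        # Called for <!DOCTYPE ...>; decl is the text between "<!" and ">".
        if decl.lower().startswith("doctype"):
            self.has_doctype = True

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag and attribute names in lowercase.
        if tag == "title":
            self.has_title = True
        elif tag == "img" and "alt" not in dict(attrs):
            self.imgs_without_alt += 1


def problems(markup):
    """Return the subset of the three checked problems found in markup."""
    checker = ConformanceSketch()
    checker.feed(markup)
    checker.close()
    found = []
    if not checker.has_doctype:
        found.append("missing DOCTYPE")
    if not checker.has_title:
        found.append("missing <title>")
    if checker.imgs_without_alt:
        found.append('missing alt=""')
    return found


if __name__ == "__main__":
    example_d = "<html><body><img src=http://example.com/img.jpg></body></html>"
    for problem in problems(example_d):
        print(problem)
```

Run on example (d), the sketch reports the same three problems as the message. A real check would need an HTML5 parser, since only tree construction can detect misnested end tags.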