- From: <noah_mendelsohn@us.ibm.com>
- Date: Fri, 14 Nov 2008 17:07:56 -0500
- To: Ian Hickson <ian@hixie.ch>
- Cc: Jonas Sicking <jonas@sicking.cc>, public-html <public-html@w3.org>, www-tag@w3.org
Yes, I should have been more careful in crafting my examples. I hope the spirit of the points came through in any case. As everyone seems to agree, some unicode documents are legal HTML 5, while many others are handled in HTML 5 browsers using error recovery logic. The point was to encourage the working group to focus on producing, in addition to the draft already being prepared, a document that would be a specification specifically for legal HTML 5. Michael Smith indicates that he is experimenting with the creation of such a draft (though I haven't yet looked at it in detail).

So, I feel that my concerns have not only been heard, they have been acted upon. Thank you.

Noah

--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Ian Hickson <ian@hixie.ch>
Sent by: www-tag-request@w3.org
11/14/2008 04:31 PM

To: Jonas Sicking <jonas@sicking.cc>
cc: noah_mendelsohn@us.ibm.com, public-html <public-html@w3.org>, www-tag@w3.org
Subject: Re: Comments on HTML WG face to face meetings in France Oct 08

On Fri, 14 Nov 2008, Jonas Sicking wrote:
> On Thu, Nov 13, 2008 at 8:18 AM, <noah_mendelsohn@us.ibm.com> wrote:
> >
> > For example, all of the following will be parsed into DOMs, and
> > presented to users if retrieved as text/html:
> >
> > a) <!-- clearly OK -->
> >    <html>
> >      <body>
> >        <div>
> >          <p>Para</p>
> >        </div>
> >      </body>
> >    </html>
> >
> > b) <html>
> >      <body>
> >        <div>
> >          <p>Para</div> <!-- note bad nesting of tags -->
> >        </p> <!-- note bad nesting of tags -->
> >      </body>
> >    </html>
> >
> > c) <html>
> >      <body>
> >        <!-- quoted attr -->
> >        <img src="http://example.com/img.jpg">
> >      </body>
> >    </html>
> >
> > d) <html>
> >      <body>
> >        <!-- unquoted attr -->
> >        <img src=http://example.com/img.jpg>
> >      </body>
> >    </html>
> >
> > e> XXXXXX (Isn't obviously HTML at all,
> >    but browser will presumably
> >    build a DOM and render XXXXXX)
> >
> > The best example I have of 'unclean' are (b), in which the close tags
> > are in the wrong order, and (e), which has no tags at all.
>
> Disregarding the <title> issue, HTML5 will only consider (a), (c) and
> (d) valid. (well, and maybe (e) too if you add the <title> due to all
> other tags being optional as per HTML4, not quite sure).

In the interests of accuracy, I should note that the HTML5 spec considers all five of the above examples invalid (non-conforming) as they are lacking a DOCTYPE. In particular:

> > a)
> >    <html>
> >      <body>
> >        <div>
> >          <p>Para</p>
> >        </div>
> >      </body>
> >    </html>

Missing DOCTYPE, missing <title>.

> > b) <html>
> >      <body>
> >        <div>
> >          <p>Para</div> <!-- note bad nesting of tags -->
> >        </p> <!-- note bad nesting of tags -->
> >      </body>
> >    </html>

Missing DOCTYPE, missing <title>, unexpected </p>. (The missing </p> before the </div> is fine because HTML has always let the </p> end tag be optional, ever since HTML2 or earlier.)

> > c) <html>
> >      <body>
> >        <!-- quoted attr -->
> >        <img src="http://example.com/img.jpg">
> >      </body>
> >    </html>

Missing DOCTYPE, missing <title>, missing alt="".

> > d) <html>
> >      <body>
> >        <!-- unquoted attr -->
> >        <img src=http://example.com/img.jpg>
> >      </body>
> >    </html>

Missing DOCTYPE, missing <title>, missing alt="".

> > e> XXXXXX (Isn't obviously HTML at all,
> >    but browser will presumably
> >    build a DOM and render XXXXXX)

Missing DOCTYPE, missing <title>. (Note that the other tags, <html>, <head>, <body>, and their end tags, are optional in HTML, at least since HTML2 if not earlier. The SGML parser, in earlier versions, and the HTML5 parser, in HTML5, will imply them.)

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
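
[Illustrative note, not part of the original message.] The per-example findings above (missing DOCTYPE, missing <title>, missing alt="") can be roughly reproduced with a short sketch. It assumes Python's standard-library `html.parser`, which is a tokenizer and not an HTML5 conformance checker. The names `RoughChecker` and `rough_problems` are hypothetical, and the sketch makes no attempt to detect the unexpected </p> in (b), since that requires real tree construction.

```python
from html.parser import HTMLParser


class RoughChecker(HTMLParser):
    """Records three facts named in the message; not a real validator."""

    def __init__(self):
        super().__init__()
        self.has_doctype = False
        self.has_title = False
        self.imgs_without_alt = 0

    def handle_decl(self, decl):
        # Called for <!DOCTYPE ...>; decl is the text after "<!".
        if decl.strip().lower().startswith("doctype"):
            self.has_doctype = True

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value).
        if tag == "title":
            self.has_title = True
        elif tag == "img" and "alt" not in dict(attrs):
            self.imgs_without_alt += 1


def rough_problems(markup):
    """Return the subset of the three problems found in markup."""
    checker = RoughChecker()
    checker.feed(markup)
    checker.close()
    problems = []
    if not checker.has_doctype:
        problems.append("missing DOCTYPE")
    if not checker.has_title:
        problems.append("missing <title>")
    if checker.imgs_without_alt:
        problems.append('missing alt=""')
    return problems


# Example (d) from the thread, with its unquoted attribute value.
example_d = "<html><body><img src=http://example.com/img.jpg></body></html>"
print(rough_problems(example_d))
print(rough_problems("XXXXXX"))  # example (e)
```

The quoted and unquoted `src` forms in (c) and (d) give the same result here, which matches the point that attribute quoting is not what makes those examples non-conforming.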
Received on Friday, 14 November 2008 22:08:42 UTC