- From: <noah_mendelsohn@us.ibm.com>
- Date: Fri, 14 Nov 2008 17:07:56 -0500
- To: Ian Hickson <ian@hixie.ch>
- Cc: Jonas Sicking <jonas@sicking.cc>, public-html <public-html@w3.org>, www-tag@w3.org
Yes, I should have been more careful in crafting my examples. I hope the
spirit of the points came through in any case. As everyone seems to
agree, some Unicode documents are legal HTML 5, while many others are
handled in HTML 5 browsers using error recovery logic. The point was to
encourage the working group to focus on producing, in addition to the
draft already being prepared, a document that would be a specification
specifically for legal HTML 5. Michael Smith indicates that he is
experimenting with the creation of such a draft (though I haven't yet
looked at it in detail). So, I feel that my concerns have not only been
heard, they have been acted upon. Thank you.
Noah
--------------------------------------
Noah Mendelsohn
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------
Ian Hickson <ian@hixie.ch>
Sent by: www-tag-request@w3.org
11/14/2008 04:31 PM
To: Jonas Sicking <jonas@sicking.cc>
cc: noah_mendelsohn@us.ibm.com, public-html <public-html@w3.org>, www-tag@w3.org
Subject: Re: Comments on HTML WG face to face meetings in France Oct 08
On Fri, 14 Nov 2008, Jonas Sicking wrote:
> On Thu, Nov 13, 2008 at 8:18 AM, <noah_mendelsohn@us.ibm.com> wrote:
> >
> > For example, all of the following will be parsed into DOMs, and
> > presented to users if retrieved as text/html:
> >
> > a) <!-- clearly OK -->
> > <html>
> > <body>
> > <div>
> > <p>Para</p>
> > </div>
> > </body>
> > </html>
> >
> > b) <html>
> > <body>
> > <div>
> > <p>Para</div> <!-- note bad nesting of tags -->
> > </p> <!-- note bad nesting of tags -->
> > </body>
> > </html>
> >
> > c) <html>
> > <body>
> > <!-- quoted attr -->
> > <img src="http://example.com/img.jpg">
> > </body>
> > </html>
> >
> > d) <html>
> > <body>
> > <!-- unquoted attr -->
> > <img src=http://example.com/img.jpg>
> > </body>
> > </html>
> >
> > e) XXXXXX (Isn't obviously HTML at all,
> > but browser will presumably
> > build a DOM and render XXXXXX)
> >
> > The best examples I have of 'unclean' are (b), in which the close tags
> > are in the wrong order, and (e), which has no tags at all.
>
> Disregarding the <title> issue, HTML5 will only consider (a), (c) and
> (d) valid. (well, and maybe (e) too if you add the <title> due to all
> other tags being optional as per HTML4, not quite sure).
In the interests of accuracy, I should note that the HTML5 spec considers
all five of the above examples invalid (non-conforming) as they are
lacking a DOCTYPE.
In particular:
> > a)
> > <html>
> > <body>
> > <div>
> > <p>Para</p>
> > </div>
> > </body>
> > </html>
Missing DOCTYPE, missing <title>.
> > b) <html>
> > <body>
> > <div>
> > <p>Para</div> <!-- note bad nesting of tags -->
> > </p> <!-- note bad nesting of tags -->
> > </body>
> > </html>
Missing DOCTYPE, missing <title>, unexpected </p>. (The missing </p>
before the </div> is fine because HTML has always let the </p> end tag be
optional, ever since HTML2 or earlier.)
> > c) <html>
> > <body>
> > <!-- quoted attr -->
> > <img src="http://example.com/img.jpg">
> > </body>
> > </html>
Missing DOCTYPE, missing <title>, missing alt="".
> > d) <html>
> > <body>
> > <!-- unquoted attr -->
> > <img src=http://example.com/img.jpg>
> > </body>
> > </html>
Missing DOCTYPE, missing <title>, missing alt="".
> > e) XXXXXX (Isn't obviously HTML at all,
> > but browser will presumably
> > build a DOM and render XXXXXX)
Missing DOCTYPE, missing <title>. (Note that the other tags, <html>,
<head>, <body>, and their end tags, are optional in HTML, at least since
HTML2 if not earlier. The SGML parser, in earlier versions, and the HTML5
parser, in HTML5, will imply them.)
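For illustration only, three of the checks listed above (DOCTYPE present,
<title> present, alt on every <img>) can be sketched with Python's standard
html.parser module. This is a rough sketch, not the HTML5 parsing algorithm
and not a conformance checker: the names ConformanceSketch and list_problems
are hypothetical, the tokenizer used here does not imply the optional
<html>, <head>, and <body> tags, and it does not detect the unexpected </p>
in example (b).

```python
from html.parser import HTMLParser


class ConformanceSketch(HTMLParser):
    """Hypothetical helper: records three of the problems named above."""

    def __init__(self):
        super().__init__()
        self.has_doctype = False
        self.has_title = False
        self.imgs_missing_alt = 0

    def handle_decl(self, decl):
        # Called for <!DOCTYPE ...>; decl is the text between "<!" and ">".
        if decl.strip().lower().startswith("doctype"):
            self.has_doctype = True

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "title":
            self.has_title = True
        elif tag == "img" and "alt" not in dict(attrs):
            self.imgs_missing_alt += 1


def list_problems(markup):
    """Return a list of the problems this sketch knows how to spot."""
    checker = ConformanceSketch()
    checker.feed(markup)
    checker.close()
    problems = []
    if not checker.has_doctype:
        problems.append("missing DOCTYPE")
    if not checker.has_title:
        problems.append("missing <title>")
    if checker.imgs_missing_alt:
        problems.append("img missing alt")
    return problems


if __name__ == "__main__":
    # Example (d) from the thread, and example (e).
    print(list_problems("<html><body><img src=http://example.com/img.jpg></body></html>"))
    print(list_problems("XXXXXX"))
```

Under these assumptions, example (e) is reported as missing only the DOCTYPE
and the <title>, which matches the list above.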
--
Ian Hickson U+1047E )\._.,--....,'``. fL
http://ln.hixie.ch/ U+263A /, _.. \ _\ ;`._ ,.
Things that are impossible just take longer. `._.-(,_..'--(,_..'`-.;.'
Received on Friday, 14 November 2008 22:08:40 UTC