Re: Comments on HTML WG face to face meetings in France Oct 08 from noah_mendelsohn@us.ibm.com on 2008-11-14 (www-tag@w3.org from November 2008)

From: <noah_mendelsohn@us.ibm.com>
Date: Fri, 14 Nov 2008 17:07:56 -0500
To: Ian Hickson <ian@hixie.ch>
Cc: Jonas Sicking <jonas@sicking.cc>, public-html <public-html@w3.org>, www-tag@w3.org
Message-ID: <OFB84E638A.06837E8A-ON85257501.00793A14-85257501.007993C4@lotus.com>

Yes, I should have been more careful in crafting my examples.  I hope the 
spirit of the points came through in any case. As everyone seems to 
agree, some Unicode documents are legal HTML 5, while many others are 
handled in HTML 5 browsers using error recovery logic.  The point was to 
encourage the working group to focus on producing, in addition to the 
draft already being prepared, a document that would be a specification 
specifically for legal HTML 5.  Michael Smith indicates that he is 
experimenting with the creation of such a draft (though I haven't yet 
looked at it in detail).  So, I feel that my concerns have not only been 
heard, they have been acted upon.  Thank you.

Noah

--------------------------------------
Noah Mendelsohn 
IBM Corporation
One Rogers Street
Cambridge, MA 02142
1-617-693-4036
--------------------------------------

Ian Hickson <ian@hixie.ch>
Sent by: www-tag-request@w3.org
11/14/2008 04:31 PM

        To:     Jonas Sicking <jonas@sicking.cc>
        cc:     noah_mendelsohn@us.ibm.com, public-html <public-html@w3.org>, www-tag@w3.org
        Subject:        Re: Comments on HTML WG face to face meetings in France Oct 08

On Fri, 14 Nov 2008, Jonas Sicking wrote:
> On Thu, Nov 13, 2008 at 8:18 AM,  <noah_mendelsohn@us.ibm.com> wrote:
> > 
> > For example, all of the following will be parsed into DOMs, and 
> > presented to users if retrieved as text/html:
> >
> > a) <!-- clearly OK -->
> >   <html>
> >   <body>
> >   <div>
> >   <p>Para</p>
> >   </div>
> >   </body>
> >   </html>
> >
> > b) <html>
> >   <body>
> >   <div>
> >   <p>Para</div>   <!-- note bad nesting of tags -->
> >   </p>  <!-- note bad nesting of tags -->
> >   </body>
> >   </html>
> >
> > c) <html>
> >   <body>
> >   <!-- quoted attr -->
> >   <img src="http://example.com/img.jpg">
> >   </body>
> >   </html>
> >
> > d) <html>
> >   <body>
> >   <!-- unquoted attr -->
> >   <img src=http://example.com/img.jpg>
> >   </body>
> >   </html>
> >
> > e)  XXXXXX (Isn't obviously HTML at all,
> >            but browser will presumably
> >            build a DOM and render XXXXXX)
> >
> > The best examples I have of 'unclean' are (b), in which the close tags 
> > are in the wrong order, and (e), which has no tags at all.
> 
> Disregarding the <title> issue, HTML5 will only consider (a), (c) and 
> (d) valid. (well, and maybe (e) too if you add the <title> due to all 
> other tags being optional as per HTML4, not quite sure).

In the interests of accuracy, I should note that the HTML5 spec considers 
all five of the above examples invalid (non-conforming) as they are 
lacking a DOCTYPE.

In particular:

> > a) 
> >   <html>
> >   <body>
> >   <div>
> >   <p>Para</p>
> >   </div>
> >   </body>
> >   </html>

Missing DOCTYPE, missing <title>.

> > b) <html>
> >   <body>
> >   <div>
> >   <p>Para</div>   <!-- note bad nesting of tags -->
> >   </p>  <!-- note bad nesting of tags -->
> >   </body>
> >   </html>

Missing DOCTYPE, missing <title>, unexpected </p>. (The missing </p> 
before the </div> is fine because HTML has always let the </p> end tag be 
optional, ever since HTML2 or earlier.)

> > c) <html>
> >   <body>
> >   <!-- quoted attr -->
> >   <img src="http://example.com/img.jpg">
> >   </body>
> >   </html>

Missing DOCTYPE, missing <title>, missing alt="".

> > d) <html>
> >   <body>
> >   <!-- unquoted attr -->
> >   <img src=http://example.com/img.jpg>
> >   </body>
> >   </html>

Missing DOCTYPE, missing <title>, missing alt="".

> > e)  XXXXXX (Isn't obviously HTML at all,
> >            but browser will presumably
> >            build a DOM and render XXXXXX)

Missing DOCTYPE, missing <title>. (Note that the other tags, <html>, 
<head>, <body>, and their end tags, are optional in HTML, at least since 
HTML2 if not earlier. The SGML parser, in earlier versions, and the HTML5 
parser, in HTML5, will imply them.)
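
As a rough illustration of the checks listed above, the three recurring 
problems (missing DOCTYPE, missing <title>, missing alt="") could be 
detected with something like the sketch below. It is not a conformance 
checker and not the HTML5 parsing algorithm: it uses Python's standard 
html.parser, which does not imply omitted tags and would not flag the 
unexpected </p> in (b). The names RoughConformanceCheck and 
rough_problems are hypothetical, made up for this example.

```python
from html.parser import HTMLParser


class RoughConformanceCheck(HTMLParser):
    """Hypothetical helper: records only the three problems named above."""

    def __init__(self):
        super().__init__()
        self.has_doctype = False
        self.has_title = False
        self.imgs_missing_alt = 0

    def handle_decl(self, decl):
        # Called for <!DOCTYPE ...>; decl is the text after "<!".
        if decl.lower().startswith("doctype"):
            self.has_doctype = True

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "title":
            self.has_title = True
        elif tag == "img" and "alt" not in dict(attrs):
            self.imgs_missing_alt += 1


def rough_problems(markup):
    """Return a list of the problems found (an assumption-laden sketch)."""
    checker = RoughConformanceCheck()
    checker.feed(markup)
    checker.close()
    problems = []
    if not checker.has_doctype:
        problems.append("missing DOCTYPE")
    if not checker.has_title:
        problems.append("missing <title>")
    if checker.imgs_missing_alt:
        problems.append('missing alt=""')
    return problems
```

For instance, rough_problems("XXXXXX") reports the missing DOCTYPE and 
missing <title>, matching the assessment of example (e), while a document 
such as "<!DOCTYPE html><title>t</title><p>Para" yields no problems from 
this sketch.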

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Friday, 14 November 2008 22:08:42 UTC