Re: Semicolon after entities from Lachlan Hunt on 2007-04-30 (www-html@w3.org from April 2007)

From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
Date: Tue, 01 May 2007 01:04:40 +1000
To: tina@greytower.net
CC: www-html@w3.org
Message-ID: <46360588.6060009@lachy.id.au>
Tina Holmboe wrote:
> On 30 Apr, Lachlan Hunt wrote:
> 
>> All browsers have bugs, but we need to determine the most reasonable 
>> behaviour based on the way existing browsers work.
> 
>  and
> 
>> Yes it is a direct result of undefined error handling.  Authors build 
>> pages that rely on bugs in whatever browser they use, so in the eyes
>> of the author, the buggy browser behaviour is correct.  That has
>> been
> 
>  While admitting that, will you still stand by the philosophy that the
>  most reasonable behaviour should be based on *what existing browsers
>  do*?

Yes.  Any behaviour we define has to be compatible with the existing 
content on the web, much of which relies on such behaviour.  This 
doesn't mean we need to document every single bug in every browser.  We 
need to define a common set

>> Those specs contain unimplemented features because they cannot be 
>> implemented.  Some things are left undefined, others are defined in
> 
>  Can you enumerate the features that /cannot/ be implemented?

There are many, but ok.

*Parsing*

* SHORTTAG NET cannot be implemented without breaking existing content.
   e.g. <br/>, <meta/>, etc. would result in '>' characters scattered
   throughout millions of pages.
* <title/Foo Bar/, which is short for <title>Foo Bar</title>.
   That syntax would break links with unquoted href attributes:
   <a href=http://example.com>...</a> is equivalent to
   <a href="http:"></a>example.com>...</a>
* <div>...</>
* Marked sections <![CDATA[]]>, etc.
* Processing Instructions <?foo ...>
* Internal subsets in the DOCTYPE.

* Omitted attribute names.
   e.g. <table border> is actually shorthand for <table frame="border">,
   but is required to be treated as <table border="1"> for compatibility.

* <noscript> parsing requirements are not compatible with SGML.  It has
   to be parsed differently depending on whether or not script is
   enabled.
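
To make the SHORTTAG NET items above concrete, here is a rough Python
sketch of how a NET-aware SGML parser would read a single start tag.
It is only an illustration under simplifying assumptions: net_view and
the partial EMPTY set are names made up for this example, and nothing
here is a real SGML parser.

```python
import re

# Elements declared EMPTY in the HTML 4 DTD (partial list, enough for the demo).
EMPTY = {"br", "hr", "img", "input", "link", "meta"}

def net_view(markup):
    """Return (element, attrs, content, rest) for the start tag at the
    beginning of `markup`, read the way SHORTTAG NET prescribes.

    content is None when the tag was closed normally with '>'.
    Hypothetical helper: it models only the NET rule, nothing else.
    """
    m = re.match(r"<([A-Za-z][A-Za-z0-9]*)", markup)
    if m is None:
        raise ValueError("markup does not begin with a start tag")
    name = m.group(1).lower()
    quote = None
    for i in range(m.end(), len(markup)):
        ch = markup[i]
        if quote:
            # Inside a quoted attribute value: only the matching quote matters.
            if ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == ">":
            # Ordinary start tag, no NET involved.
            return name, markup[m.end():i].strip(), None, markup[i + 1:]
        elif ch == "/":
            # NET-enabling start tag: the '/' closes the tag itself.
            attrs = markup[m.end():i].strip()
            after = markup[i + 1:]
            if name in EMPTY:
                # EMPTY elements end immediately; what follows is plain text.
                return name, attrs, "", after
            # Otherwise the next '/' is the null end tag closing the element.
            content, _, rest = after.partition("/")
            return name, attrs, content, rest
    raise ValueError("unterminated start tag")

print(net_view("<br/>"))
print(net_view("<title/Foo Bar/"))
print(net_view("<a href=http://example.com>...</a>"))
```

Under this reading, <br/> leaves a stray '>' as text, and the unquoted
href is cut off at "http:" with the rest of the URL spilling into the
page as character data.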

*Error Recovery*
(I've been told that SGML doesn't actually define any error 
processing, which makes these cases ambiguous, and thus unimplementable.)

* Comment parsing, e.g. <!-- -- --> -->
* Misnested elements <b>bold<i>bold italic</b>italic</i>.
* Elements like <title>, <meta>, <link>, etc. that occur within the body
   get moved to the HEAD in the DOM.

* <script>
     document.write("... </p>");
   </script>
   (The </p> would actually close the <script> in SGML)

* No error handling for overlapping cells in tables was defined.
* Table layout algorithm is not well defined
   (it's not even well defined in CSS yet)
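
The comment example at the top of this list can be sketched in Python.
Under SGML rules a comment declaration is "<!", then any number of
"-- ... --" comments separated by whitespace, then ">". The simpler
rule that authors tend to expect is "ends at the first -->". Both
function names below are made up for this illustration, and neither
handles anything beyond this one case.

```python
def sgml_comment_end(text):
    """Index just past the comment declaration at the start of `text`,
    following the SGML rule; None if it is malformed or unterminated."""
    if not text.startswith("<!--"):
        raise ValueError("text does not start with a comment declaration")
    i = 2  # position of the first '--'
    while i < len(text):
        if text.startswith("--", i):
            # A comment opens here; it runs to the next '--'.
            close = text.find("--", i + 2)
            if close == -1:
                return None
            i = close + 2
        elif text[i].isspace():
            i += 1
        elif text[i] == ">":
            return i + 1
        else:
            return None  # anything else between comments is an error
    return None

def simple_comment_end(text):
    """Index just past the first '-->' after the opening '<!--'."""
    if not text.startswith("<!--"):
        raise ValueError("text does not start with a comment")
    end = text.find("-->", 4)
    return None if end == -1 else end + 3

sample = "<!-- -- --> -->"
print(repr(sample[:sgml_comment_end(sample)]))
print(repr(sample[:simple_comment_end(sample)]))
```

The SGML reading swallows the whole string as one declaration, while
the simple reading stops after the first "-->" and leaves " -->" behind
as text.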

*Elements and Attributes*

* <object> is not defined well enough to be implemented and suffers from
   a serious lack of interop in the real world.
* <td char="" charoff="">
* <a shape="" coords=""> for image maps.
* <meta scheme>
* <head profile=""> (I know, it's been used by microformats, but nothing
   really made any use of it)
* <param valuetype="">

*Character Encoding*

* Character Encoding detection using <meta> was not defined.
* HTML4 doesn't take the BOM into account in the algorithm.

The spec states:
   "user agents must not assume any default value for the "charset"
    parameter."
   -- http://www.w3.org/TR/html401/charset.html#h-5.2.2

but it does not define an algorithm to sniff the encoding, and the 
reality is that UAs are usually forced to default to Windows-1252.
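
As a rough idea of what such a sniffing step might look like, here is
a Python sketch. It is not taken from HTML4 or from any browser: the
function name sniff_encoding, the 1024-byte prescan window, the regular
expression, and the order of the checks (HTTP charset, then BOM, then
<meta>, then the Windows-1252 fallback) are all assumptions made for
illustration.

```python
import codecs
import re

# Byte order marks and the encodings they imply (UTF-16 and UTF-8 only).
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Crude pattern for charset=... inside a <meta> tag; an assumption, not a spec rule.
META_RE = re.compile(rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9_:.-]+)", re.I)

def sniff_encoding(data, http_charset=None, fallback="windows-1252"):
    """Guess the encoding of the byte string `data` (hypothetical helper)."""
    # 1. An explicit charset from the HTTP Content-Type header wins.
    if http_charset:
        return http_charset.lower()
    # 2. A byte order mark at the very start of the stream.
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # 3. A <meta> declaration somewhere in the first 1024 bytes.
    m = META_RE.search(data[:1024])
    if m:
        return m.group(1).decode("ascii").lower()
    # 4. Nothing found: fall back to the de facto default.
    return fallback

print(sniff_encoding(b"\xef\xbb\xbf<html>"))
print(sniff_encoding(b'<meta http-equiv="Content-Type" '
                     b'content="text/html; charset=ISO-8859-1">'))
print(sniff_encoding(b"<p>no declaration at all</p>"))
```

The last case is the one the quoted sentence forbids in theory and
real pages require in practice.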


The conformance requirements are contradictory where the spec states:

   "A conforming user agent for HTML 4 is one that observes the
    mandatory conditions ("must") set forth in this specification,
    including the following points:"
    -- http://www.w3.org/TR/html401/conform.html

and then proceeds to list a SHOULD and a RECOMMENDED requirement.

There are also many sections that completely lack conformance criteria.

I'm sure there is much more, but I don't have the time to go through the 
whole spec.  They're just the issues I can remember.

-- 
Lachlan Hunt
http://lachy.id.au/
Received on Monday, 30 April 2007 15:09:50 UTC