[whatwg] [html5] Pre-Last Call Comments

On Sun, 5 Apr 2009, Giovanni Campagna wrote:
> A few comments, as requested by Ian Hickson.
> - End of 2.2.1, a typo: JavsScript instead of Javascript


> - From section 2.4.2 I don't understand if boolean attributes with
> invalid values represent "true" or "false". In addition, I don't
> understand if an empty value is false (as in XHTML1.0) or true (as in
> HTML4, because of the minimized syntax).
> From my experience, I expect that the empty string (which is
> equivalent to not specifying the attribute at all) is false, and any
> other value is true.

The spec says "The presence of a boolean attribute on an element 
represents the true value, and the absence of the attribute represents the 
false value"; is that not clear?
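
As a minimal sketch of the rule quoted above, the following Python snippet 
treats an attribute as true whenever it is present, whatever its value. It 
uses the standard library's html.parser, not a conforming HTML5 parser; the 
class and function names are hypothetical and exist only for this 
illustration.

```python
from html.parser import HTMLParser


class BooleanAttributeReader(HTMLParser):
    """Records, for each start tag, whether a given attribute is present."""

    def __init__(self, attribute):
        super().__init__()
        self.attribute = attribute
        self.results = []

    def handle_starttag(self, tag, attrs):
        # Only the attribute's presence matters; its value is ignored.
        names = {name for name, _value in attrs}
        self.results.append(self.attribute in names)


def boolean_attribute_states(markup, attribute):
    """Return one True/False per start tag found in the markup."""
    reader = BooleanAttributeReader(attribute)
    reader.feed(markup)
    reader.close()
    return reader.results


markup = (
    '<input disabled>'          # minimized syntax
    '<input disabled="">'       # empty value
    '<input disabled="false">'  # invalid value, but still present
    '<input>'                   # attribute absent
)
states = boolean_attribute_states(markup, "disabled")
```

Under this reading, the empty string and an invalid value such as "false" 
both represent true, and only the last element represents false.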

> - In 2.4.3 I don't see the point of all the digression about 
> contentEditable, since it is noted that it doesn't work like that. I 
> would leave the note to just "Note: The empty string can be one of the 
> keywords" or "Note: The empty string can be a valid keyword"


> - In (and maybe in other places) I would prefer [A|E]BNF
> instead of the prose description of a floating point number.

It's not obvious to me that this would be any clearer.

> I'm also not sure that the normative algorithm is needed.

You mean for parsing? How else would you know how to parse it? In some of 
the cases the algorithms don't accept any erroneous content at all, but 
in many cases we have to define how you handle bogus data, and I don't see 
how to do that any other way.
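
To illustrate the distinction drawn above, here is a rough Python sketch 
that contrasts a grammar (which can only say whether a string is valid) 
with a parsing procedure (which also says what to do with bogus input). 
Both regular expressions and both function names are assumptions made for 
this example; they are loosely modelled on a floating point syntax and are 
not the algorithm from the spec.

```python
import re

# Strict grammar: answers "is this string valid?" and nothing more.
# (Assumed syntax: optional "-", digits and/or a fraction, optional exponent.)
VALID_FLOAT = re.compile(
    r"-?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)(?:[eE][-+]?[0-9]+)?"
)

# Lenient pattern: finds the longest usable number at the start of the input.
LEADING_NUMBER = re.compile(
    r"[-+]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][-+]?[0-9]+)?"
)


def is_valid_float(text):
    """True only if the whole string matches the strict grammar."""
    return VALID_FLOAT.fullmatch(text) is not None


def lenient_parse_float(text):
    """Skip leading whitespace, parse a leading number, ignore trailing junk.

    Returns None when no number can be recovered at all.
    """
    match = LEADING_NUMBER.match(text.lstrip(" \t\n\f\r"))
    return float(match.group(0)) if match else None
```

With these definitions, is_valid_float("12px") is false, yet 
lenient_parse_float("12px") still recovers 12.0. A grammar alone does not 
specify that second behaviour.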

> I've also searched IEEE, IETF, ECMA, ISO and ANSI for another normative 
> version of the syntax and processing, but I've found none. If you think 
> that it is important to have it specified completely, you may submit an 
> ID, so future technologies won't need to rewrite it again.

I'm not sure to what you refer. I certainly wouldn't want anyone reusing 
most of these definitions; many are the result of years of bugs causing 
legacy content to depend on weird quirks.

> - The second paragraph in is hard to understand because the
> verb is at the end. I would rewrite as "A week-year with a number *yr*
> has 53 weeks if it corresponds to a year *yr* in the proleptic Gregorian
> calendar that has a Thursday as its first day (January 1st), or if
> *yr* where *yr* is a number divisible by 400, or a number divisible by
> 4 but not by 100. In all other cases it has 52 weeks"


> Also, don't rely on styles alone, use different words for identifiers
> and prose. This includes also the Note following, where no styles are
> applied and it is difficult to understand that "year year" is not a
> typo but rather is the year numbered "year".

I made the note use "y", but in general I find using anything but "year" 
here gets really ugly.
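
For reference, the 53-week rule discussed above can be sketched in Python 
as follows. This assumes the usual ISO 8601 week rule: a week-year has 53 
weeks if the corresponding year starts on a Thursday, or if it is a leap 
year that starts on a Wednesday. The function names are hypothetical. 
Python's datetime.date uses the proleptic Gregorian calendar, so the 
result can be cross-checked against the ISO week number of December 28th, 
which always falls in the last week of its week-year.

```python
from datetime import date


def is_leap_year(yr):
    """Divisible by 400, or divisible by 4 but not by 100."""
    return yr % 400 == 0 or (yr % 4 == 0 and yr % 100 != 0)


def weeks_in_week_year(yr):
    """Return 53 or 52 for the week-year numbered yr."""
    first_weekday = date(yr, 1, 1).weekday()  # Monday is 0, Thursday is 3
    starts_on_thursday = first_weekday == 3
    leap_starting_on_wednesday = is_leap_year(yr) and first_weekday == 2
    return 53 if starts_on_thursday or leap_starting_on_wednesday else 52


def last_iso_week(yr):
    """Independent check: ISO week number of December 28th."""
    return date(yr, 12, 28).isocalendar()[1]
```

For example, weeks_in_week_year(2009) gives 53, since January 1st 2009 was 
a Thursday.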

> - Can't CSS3 Color simply be referenced in 2.4.6?
> This way, implementors could have body[bgcolor] { background-color:
> attr(bgcolor,color,white); } in the default CSS instead of using HTML5
> specific rules.

The rules for parsing a legacy color value are very constrained and don't 
match CSS, no.

> - In 2.4.9 a valid hash reference must be equal to an ID, name is 
> supported only for backward compatibility.

No, <map> uses name="".

> - Section 2.6 is superfluous: handling of application cache is specified 
> in the appropriate section, handling of HTTP requests and caches is 
> defined in RFC2616, handling of cookie is defined in the appropriate RFC 
> (I don't remember the number), handling of about:blank is in the 
> proposed about-uri-scheme ID. In addition, serialized queue-based 
> handling of resources should not be mandated by the HTML5 specification 
> (can't UAs be multi-threaded?)

Section 2.6 (fetching) is needed to define how the fetching algorithms 
(HTTP, etc.) fit into the event loop mechanism and the storage mutex.

> - Rewriting 2.6.1 without the HTTP word is definitely better. Browsers 
> are not required to support HTTP, AFAIK. You can write "a GET method" 
> (because GET is anyway an English word), "a response code" (most 
> protocols have response codes) and "metadata" (instead of headers, that 
> SMTP, POP, FTP don't support)

I think that would be far less clear.

> - 2.6.2 should be implied by the HTTP-over-TLS RFC

Apparently implying it isn't good enough, given current implementations.

> - In section 2.7.1, in sentence "Extensions must not be used for 
> determining resource types for resources fetched over HTTP.", do you 
> mean "File extensions", like .txt or .png, or "User agent extensions" 
> (additions to the algorithm)?

This is fixed in Adam's draft now.

> - Still in section 2.7.1, why is the algorithm a violation of RFC2616? 
> Because it is case insensitive? Because it allows spaces? Because it 
> does not imply ISO-8859-1 if no charset is explicit? Because it does not 
> imply ASCII for text/* mime types?

Because it means not blindly honouring Content-Type.

> - Why don't you add "<?xml" to the sniffing table?

I'll leave this up to Adam.

> - In section 2.8, "x-x-big5" is not a different encoding than "big5",
> it rather seems an alias (and as such should be submitted to IANA)

Agreed; if anyone would like to volunteer to do this that would be very 

> - Later in the same section, I don't understand why you don't support 
> those encodings, if the encoding declaration is explicit in the protocol 
> layer or is allowed by a different specification. For example, XML 
> allows EBCDIC-based encodings.

UTF-32 is widely misimplemented. EBCDIC isn't widely supported. Generally 
speaking we're trying to reduce encoding proliferation.

> In addition, I don't understand why supporting UTF-32 or EBCDIC means a 
> change to the algorithms, which are defined in terms of Unicode code 
> points (very similar to UTF-32 characters)

Supporting UTF-32 or EBCDIC would mean changes to the character encoding 
sniffing algorithms.

> - In section 2.9.1, I completely don't understand the part about DOM 
> attributes of type HTMLElement, especially the subpart about setting.

I'm not sure how to clarify it... What don't you understand? Or rather, 
what _do_ you understand?

> - In section 2.9.5, instead of defining DOMStringMap only for ECMAScript, 
> use explicit indexing operations in the IDL, add the [NameGetter] / 
> [NameSetter] / [NameDeleter] attributes to them, and add a 
> [NoIndexingOperation] to the whole interface.


> - In section 2.9.6 you discourage use of hasFeature. Firstly, if an 
> implementation says true and it is not compliant, it is not a spec bug, 
> it is an implementation bug.

This isn't much comfort to authors.

> Secondly, to allow implementation granularity, you could define more 
> features (for example HTML 5.0, XHTML 5.0, HTMLCanvas2D 5.0, HTMLSection 
> 5.0, HTMLDatagrid 5.0, HTMLMediaObject 5.0 etc.)

Why not rely on the features themselves instead? The whole hasFeature() 
idea is deeply flawed, IMHO.

> - In section 3.2.1, it seems that interfaces other than Document and 
> HTMLDocument should be exposed by the object only if different 
> namespaces are found in the document. This is not true: SVG UAs for 
> example must always expose the SVGDocument interface on Document.

What SVG requires is defined by SVG; the spec here is just saying that 
HTML5 isn't attempting to push the other specs away.

> - document.lastModified should return null or the empty string if the 
> last modification date is not known (what if the document was really 
> last modified on January 1st 1970?)

This was changed to match implementations.

> - Parsing is outside the scope of section 3.2.3

I'm not sure I follow.

> and I don't understand why CSS1Compat vs BackCompat if the quirks are 
> limited to parsing

The names were invented by Microsoft long ago.

> - On setting document.charset, if the specified charset is not supported 
> it should be treated as non registered.


> - Why do we have both document.charset and document.characterSet?

I'd rather have neither, but implementations have both.

> - In section 3.2.4, about title in the author-only text, remember that 
> Document always implements SVGDocument and HTMLDocument.


> - What on earth does "incumbent" mean? (about document.body)

It's the one currently holding the office of "the body element", as 
opposed to the one that's about to replace it.

> - Is it necessary to have that mess of property indexing on HTMLDocument 
> (that, by the way, may be implemented along with other language specific 
> interfaces)? Just drop it altogether: existing browsers will continue to 
> implement it, but new browsers won't, and new sites won't use it either.

The idea is to define what it takes to write a browser that supports 
legacy pages, which is more or less what browsers do now, so 
unfortunately, we can't drop a feature just because we don't like it.

> - Named elements is defined twice: once before the algorithm, and once 
> after

I can only find one definition for HTMLDocument; can you elaborate?

> - In section, instead of defining the syntax of style
> attributes, reference <http://www.w3.org/TR/css-style-attr>

That draft is not actively maintained, so it's not clear that it is a good 
draft to reference yet.

> - In section, a document may have a default language even if it 
> doesn't have a content-language http-equiv, if it has a Content-Language 
> HTTP header.

No, the Content-Language HTTP header doesn't set the default language.

> - Section 4.2.7 should be completely delegated to CSSOM

This section defines the interface to CSSOM; why would CSSOM define the 
HTML behaviour?

> - Noscript should be allowed in XML, just without the complexity (and 
> simply treated as display:none if scripting is enabled)


> - The word "and" is a grammar mistake in "These juicy, green apples and make a 
> great filling for apple pies" (the example in 4.4.2)


> - I completely cannot understand

Assuming you mean the section that is now Distinguishing 
site-wide headings from page headings, could you elaborate? What is the 
first problem?

> - I would like to disagree with the man who disagreed with the other man 
> who disagreed with Ian Hickson (who said that things that are impossible 
> just take longer) (section about <q>)

Not sure if this is a joke or a request to change the spec. :-)

> - I don't think it is of any use to link a BBC article in 4.6.20

It just helps give context.

> - Section 4.8.3 still refers to the Window Object specification, which I 
> think has been superseded by HTML5

Yeah, I have a note about fixing this in the source. This will be fixed 
in due course.

> - classid is not a conforming attribute for object, and yet it is used 
> in the algorithm to find a plugin. AFAIK, classid is only used by IE 
> (along with COM) so I don't think it is a problem to drop it completely.

Actually it's used by a number of implementations much as described by the 

> - In HTMLFormElement, the function item should accept an integer, not a 
> DOMString (because it is an IndexGetter). Same in HTMLSelectElement.


> - In section 4.10.4, the table about which attributes apply to the 
> various input types overflows in Opera 9.64 (1280x768 being the 
> resolution, 12pt the font size) and it is very hard to read

Not sure how to improve this. There's a lot of data here.

> - In I expect that not even the user is able to see the password

If the user agent is able to hide the password from everyone but the user, 
that would be a conforming implementation (and a far more useful one than 
today's), so I disagree.

> - In an "A" is missing in the part number example


> I hope that this will help someone

Indeed, thanks!

Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Wednesday, 3 June 2009 02:15:58 UTC