- From: Jim Jewett <jimjjewett@gmail.com>
- Date: Thu, 20 Nov 2008 12:09:21 -0500
- To: "HTML WG" <public-html@w3.org>
(Sorry if you saw an earlier version already; I accidentally sent it to
the wrong list.)

Julian wrote:

> I agree that the language HTML5 should have a singular normative
> definition. I'd prefer it not to be the same document that describes all
> the rest.

I'll go farther and say that http://www.w3.org/html/wg/markup-spec/ is
such a good start that I'm ready to start commenting on it.

Many of these comments would apply to the original spec as well, but I
kept getting sort of lost there, because of the size. And that size
(plus the more general audience for HTML) is the reason it is so
important to separate out the various parts -- more important than for
some other specifications.

Section 2, Terminology:

Should "case-insensitive" be "ASCII-case-insensitive", which is the term
used in Section 3.6, Attributes?

Should "space characters" be called "spacing characters", to distinguish
them from the specific character named SPACE? Should it be called out
explicitly that these are only a subset of the Unicode characters having
the White_Space property?

Section 3, Syntax:

"an XML parser" should probably be in a dfn tag, if "an HTML parser" is,
unless you are intentionally delegating that definition ... but then be
explicit. The HTML parser definition should probably also be delegated
to the parsing (or at least processing/error-correction) document.

General, but first noticed in Section 3.1:

Should "MUST", "MUST NOT", "SHOULD", etc. be capitalized, as in other
recent specs?

Section 3.4, Character Encoding:

This should not always assume HTTP, so

    "... and if its encoding is not explicitly given by Content-Type
    metadata,"
=>
    "... and if its encoding is not explicitly known from external
    information, such as the HTTP Content-Type header,"

I couldn't quite make sense of the "or ... " clause for the meta
element. My suggestion is

    then the encoding must be specified using a meta element with a
    charset attribute or a meta element in the Encoding declaration
    state.
=>
    then the encoding must be specified using a meta element with a
    charset attribute.

Section 3.5, Elements:

    Attributes may be separated from each other
=>
    Attributes MUST be separated from each other

Section 3.5, Rule 6 implies that elements which *could* have content
cannot be self-closing. Therefore, <div /> is illegal. That is OK with
me, but it is worth being explicit, because this is arguably a change.

Section 3.6, Attributes:

Is the Attribute Names rule correct? It seems to imply that each of the
following single characters is a legitimate attribute name:

    ";"  "("  "<"  "\"

If so, should there at least be a SHOULD on using XML-compatible names?

Section 3.7, Text:

Why the extra work to ensure that <!--> is a valid escaping text span?
(Similar question on comments.) I understand that it is an edge case
which the parser needs to handle, but is there a reason to have such an
empty text span be valid?

Section 3.8, Character References:

Why are non-ambiguous ampersands allowed? Are they useful enough to
justify the extra complexity? (Maybe... but I'm not sure. To me, the
fact that &< is OK just makes the rules seem arbitrary.)

Section 4, The HTML Elements:

The assertions sections are very useful.

It is probably worth adding a classification subsection to each element.
For example, a is interactive, and can be either phrasing or block. b is
inline, is not interactive, and is a formatting element. There should
probably then be an informative paragraph to explain that these element
classifications are used by other standards, such as extra error
recovery for formatting elements in the processing standard.

Element a: I think a.elem.phrase is a strict subset of a.elem.prose, so
it might be worth adding a short note explaining the difference. (Even
if that does violate the separation of concerns... but I think it
doesn't. I think the difference is that using prose makes the tag itself
a block-level element instead of a phrase-level element.)
Element abbr: There is a stray ` character after the name Philip in the
example -- this seems to be copied straight from a similar typo in the
full spec.

Element acronym: Should this just be dropped from the valid markup spec,
and included only in the parsing-and-error-correction spec? At the very
least, it should say "Use the abbr element instead."

Element address: Should there be an invalid example that is still an
address, but just not a contact address, such as

    My Dad lives at <address>123 Memory Lane</address>?

Element area: needs some cleanup from the conversion, about the various
states that coords would represent.

Element canvas: I think most of this could be left in the processing
document. Just list the two attributes, and their default values. Maybe
specify that the coordinate space is in abstract units, which may not
correspond to pixels or pica or ex. Say it is typically used with
scripting, but maybe specify the default/initial appearance when no
script is run.

Element col: "If a col element has a parent ..." What does it represent
otherwise? (The current spec doesn't say either.) Maybe for the valid
markup, just reword it to show proper usage: "A col element represents
one (or more) columns within its parent colgroup element."

Element colgroup: Similar issue to col. Just drop the ", if it has a
parent and that is a table element."

(And no, I didn't finish reviewing all elements yet...)

-jJ
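
To make the two Section 2 terminology questions above concrete, here is
a minimal Python sketch. It is only an illustration, not text from
either spec: the helper names are hypothetical, the five "space
characters" are assumed from the HTML5 draft's definition, and the
White_Space set is the list from recent Unicode versions (it has changed
slightly over time).

```python
# Hypothetical helpers; none of these names come from the spec.

def ascii_lower(s):
    """Lowercase only A-Z; every other character is left untouched."""
    return "".join(chr(ord(ch) + 0x20) if "A" <= ch <= "Z" else ch for ch in s)

def ascii_case_insensitive_equal(a, b):
    """Compare the way "ASCII-case-insensitive" is usually understood."""
    return ascii_lower(a) == ascii_lower(b)

def unicode_case_insensitive_equal(a, b):
    """Compare using full Unicode case folding, for contrast."""
    return a.casefold() == b.casefold()

# Assumption: the five "space characters" of the HTML5 draft
# (SPACE, TAB, LF, FF, CR).
HTML_SPACE_CHARACTERS = frozenset("\u0020\u0009\u000A\u000C\u000D")

# Code points carrying the Unicode White_Space property
# (recent Unicode versions; older versions differ slightly).
UNICODE_WHITE_SPACE = frozenset(
    [chr(c) for c in range(0x0009, 0x000E)]      # TAB, LF, VT, FF, CR
    + [chr(c) for c in range(0x2000, 0x200B)]    # EN QUAD .. HAIR SPACE
    + [chr(c) for c in (0x0020, 0x0085, 0x00A0, 0x1680,
                        0x2028, 0x2029, 0x202F, 0x205F, 0x3000)]
)

if __name__ == "__main__":
    kelvin_key = "\u212Aey"  # starts with U+212A KELVIN SIGN, not ASCII "K"
    print(ascii_case_insensitive_equal("CHARSET", "charset"))  # True
    print(ascii_case_insensitive_equal(kelvin_key, "key"))     # False
    print(unicode_case_insensitive_equal(kelvin_key, "key"))   # True
    # The HTML set is a proper subset of White_Space:
    print(HTML_SPACE_CHARACTERS < UNICODE_WHITE_SPACE)         # True
```

The KELVIN SIGN case is why the qualifier matters: a matcher doing full
Unicode folding would accept a name that an ASCII-only matcher rejects.
Likewise NO-BREAK SPACE (U+00A0) has White_Space but is not one of the
five characters assumed here.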
Received on Thursday, 20 November 2008 17:09:57 UTC