HTML5 comments started at 2.3.8

1. Section 3.1.3. The 'lastModified' value uses the "MM/DD/YYYY hh:mm:ss" format. It's not clear whether this is done for historical/compatibility reasons (it is, isn't it?). There should be a note or reference explaining why this particular format is wanted.

2. (same). The lastModified field is defined to use the current local time zone, but does not convey any information about that time zone. Shouldn't it?
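To illustrate the concern, here is a minimal Python sketch (not anything from the spec). The helper name `interpret` and the two UTC offsets are assumptions chosen for illustration. It shows that the same "MM/DD/YYYY hh:mm:ss" string names different instants depending on which time zone the reader assumes.

```python
from datetime import datetime, timedelta, timezone

# strptime pattern matching the "MM/DD/YYYY hh:mm:ss" shape quoted above.
FORMAT = "%m/%d/%Y %H:%M:%S"


def interpret(value: str, utc_offset_hours: int) -> datetime:
    """Parse a lastModified-style string, assuming the given UTC offset.

    Hypothetical helper: the string itself carries no zone, so the
    caller has to guess one.
    """
    naive = datetime.strptime(value, FORMAT)
    return naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))


stamp = "07/17/2011 20:57:37"
in_utc = interpret(stamp, 0)        # reader assumes UTC
in_pacific = interpret(stamp, -7)   # reader assumes UTC-7

# The two readings of the identical string are seven hours apart.
gap = in_pacific - in_utc
print(gap)
```

Without a zone designator or offset in the value, a script has no way to tell which of these readings the user agent intended.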

3. (same) Document.defaultCharset. Provide a recommendation to return UTF-8? It's not clear what this field is good for :-)

4. (same) Document.charset and Document.characterSet appear to be the same thing, although charset has some additional capabilities and restrictions. Should these be harmonized? (Is 'characterSet' new? If so, we'd probably prefer to see "encoding" used instead)

5. Section 3.1.4. Is there a way to set or get the direction of the title attribute?

6. Section (dir). The 'auto' value has this note:

The heuristic used by this state is very crude (it just looks at the first character with a strong directionality, in a manner analogous to the Paragraph Level determination in the bidirectional algorithm). Authors are urged to only use this value as a last resort when the direction of the text is truly unknown and no better server-side heuristic can be applied.

Does HTML5 need to define auto so closely that no user-agent can provide a better algorithm? That seems counter-productive. Some room for innovation should be preserved.
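For reference, the "first strong character" heuristic described in the note can be sketched roughly as follows. This is a simplified illustration, not the spec's algorithm: the function name is hypothetical, and it ignores nested elements and anything else the full definition takes into account.

```python
import unicodedata
from typing import Optional


def first_strong_direction(text: str) -> Optional[str]:
    """Return 'ltr' or 'rtl' from the first strongly directional character.

    Strong bidi classes are L (left-to-right) and R / AL (right-to-left).
    Returns None when the text contains no strong character at all.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None


# Mostly Hebrew text that happens to begin with a Latin product name
# is classified as left-to-right.
print(first_strong_direction("HTML5 שלום עולם"))

# Digits and spaces are not strong, so the first Hebrew letter decides.
print(first_strong_direction("123 שלום"))
```

The first call shows how crude the heuristic is: a single leading Latin word decides the direction of an otherwise right-to-left string. A user agent allowed to do better (say, by weighing all strong characters) could get this case right.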

7. Section (lang). What does this mean:

If the resulting value is the empty string, then it must be interpreted as meaning that the language of the node is explicitly unknown.

Does an explicitly unknown language have any different effect? It might be a good idea to add text such as: 

If the resulting value is the empty string, then it must be interpreted as meaning that the language of the node is explicitly unknown, and any language-specific processing that is applied is implementation-defined.

8. Section For this much-discussed paragraph:

If none of the node's ancestors, including the root element, have either attribute set, but there is a pragma-set default language set, then that is the language of the node. If there is no pragma-set default language set, then language information from a higher-level protocol (such as HTTP), if any, must be used as the final fallback language instead. In the absence of any such language information, and in cases where the higher-level protocol reports multiple languages, the language of the node is unknown, and the corresponding language tag is the empty string.

Wouldn't an example be useful? I can imagine implementers not following what the heck we're talking about.
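As a strawman for the kind of example I mean, here is my reading of that paragraph as a small Python sketch. The function and parameter names are hypothetical, and it covers only the case the paragraph addresses (no element or ancestor has a language attribute set).

```python
from typing import Optional, Sequence


def fallback_language(pragma_default: Optional[str],
                      protocol_languages: Sequence[str]) -> str:
    """Language of a node when no ancestor sets lang or xml:lang.

    pragma_default: the pragma-set default language, or None if unset.
    protocol_languages: languages reported by a higher-level protocol
    such as HTTP (possibly empty, possibly more than one).
    """
    # 1. A pragma-set default language wins if there is one.
    if pragma_default is not None:
        return pragma_default
    # 2. Otherwise use the higher-level protocol, but only when it
    #    reports exactly one language.
    if len(protocol_languages) == 1:
        return protocol_languages[0]
    # 3. No information, or multiple languages: unknown (empty tag).
    return ""


print(fallback_language("fr", ["en"]))        # pragma default is used
print(fallback_language(None, ["de"]))        # single protocol language
print(repr(fallback_language(None, ["en", "fr"])))  # multiple: unknown
print(repr(fallback_language(None, [])))      # nothing at all: unknown
```

If this reading is wrong, that rather supports the point that the paragraph needs an example.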

9. Section There is a note that reads:

All attributes on HTML elements in HTML documents get ASCII-lowercased automatically, so the restriction on ASCII uppercase letters doesn't affect such documents.

Later in the section there are several references to ASCII-lowercasing and ASCII-uppercasing operations. There is no discussion of how to handle non-ASCII Unicode values (the wisdom of any such values appearing in this context is, of course, open to question). If such values are expected to occur, though, default Unicode case folding might be a good idea here.
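To make the difference concrete, here is a small Python sketch. The `ascii_lowercase` helper and the sample attribute name are made up for illustration. It compares ASCII-only lowercasing with Python's built-in `str.casefold()`, which applies Unicode full case folding.

```python
def ascii_lowercase(value: str) -> str:
    """Lowercase only A-Z; every other character is left untouched."""
    return "".join(
        chr(ord(ch) + 0x20) if "A" <= ch <= "Z" else ch
        for ch in value
    )


name = "DATA-ÉTAT"  # hypothetical attribute name with a non-ASCII letter

# ASCII lowercasing leaves the accented capital alone.
print(ascii_lowercase(name))   # data-État

# Unicode case folding maps it as well.
print(name.casefold())         # data-état

# Folding can also change length, e.g. sharp s folds to "ss".
print("straße".casefold())     # strasse
```

So two names that differ only in the case of a non-ASCII letter stay distinct under the ASCII rules but would match under Unicode case folding. The spec should say which behavior is intended.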

=== stopped at 3.2.5 ===

Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.

Received on Sunday, 17 July 2011 20:57:37 UTC