- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Mon, 19 Aug 2002 19:27:20 +0300
- To: www-html-editor@w3.org
- Cc: www-html@w3.org
The Namespace

The Draft specifies a new namespace. I think it would be nice to have some rationale for this design decision in the spec. Defining a new namespace certainly makes it easier for implementors of XHTML 2.0 user agents to resist requests to support legacy elements. However, for user agents that support both XHTML 1.x and 2.0, it would be easier to keep the same namespace and rename the elements whose semantics have changed.

External Entities and User Agent Performance

The Draft makes the inclusion of a doctype declaration in the document a "must" requirement and requires the PUBLIC id (if present) to reference a particular DTD. (The spec doesn't exactly require a lone SYSTEM id to reference an equivalent DTD resource, but I assume that is the intent.) I think this may cause performance problems for user agents that are able to parse DTDs if the namespace attributes are going to be handled the same way they are handled in XHTML 1.1.

In XHTML 1.1 each element has the xmlns attribute and the attribute has a #FIXED value. For any element in the document instance that doesn't carry the attribute explicitly, the attribute is supplied via attribute defaulting if the external subset is processed. However, if the external subset is not processed, the attribute values won't be there. From the point of view of the Namespaces in XML spec, the result is the same, but from the point of view of XML 1.0 validity constraints it makes a difference: if the document was declared standalone, a difference in attribute defaulting between processing the external subset and not processing it would be a violation of a validity constraint. As a result, if one wanted to construct a valid standalone document when the DTD has xmlns attribute defaults for each element, one would have to repeat the xmlns attribute on each element, which is obviously impractical.

However, being able to declare documents standalone would be useful in terms of user agent performance.
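
The attribute-defaulting difference described above can be sketched roughly as follows, assuming Python's standard xml.parsers.expat module. The internal subset is only a stand-in for a processed external subset (an assumption made so the example is self-contained), and the two ATTLIST declarations and the helper name collect_attributes are illustrative, not copied from any XHTML DTD.

```python
import xml.parsers.expat

XHTML2_NS = "http://www.w3.org/2002/06/xhtml2"

# Hypothetical declarations standing in for the #FIXED xmlns defaults
# that a DTD in the style of XHTML 1.1 would supply for each element.
DOCTYPE_WITH_DEFAULTS = f"""<!DOCTYPE html [
  <!ATTLIST html xmlns CDATA #FIXED "{XHTML2_NS}">
  <!ATTLIST p xmlns CDATA #FIXED "{XHTML2_NS}">
]>"""

# The document instance itself never writes xmlns on any element.
INSTANCE = "<html><p>Hello</p></html>"


def collect_attributes(document):
    """Return {element name: attributes} as reported by the parser."""
    seen = {}
    # No namespace separator is given, so xmlns is reported as a
    # plain attribute rather than being consumed as a declaration.
    parser = xml.parsers.expat.ParserCreate()

    def start_element(name, attrs):
        seen[name] = dict(attrs)

    parser.StartElementHandler = start_element
    parser.Parse(document, True)
    return seen


# Declarations processed: the defaulted attribute shows up on each element.
with_declarations = collect_attributes(DOCTYPE_WITH_DEFAULTS + INSTANCE)

# Declarations not processed: the same instance yields no xmlns attributes.
without_declarations = collect_attributes(INSTANCE)

print(with_declarations["p"])
print(without_declarations["p"])
```

The two runs report different attribute sets for the same document instance, which is the kind of difference that the standalone declaration is not allowed to hide.
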
Fetching a DTD with all the modules or even loading and parsing them from a local catalog causes a noticeable performance hit for user agents. (This can be seen when comparing DocZilla and Mozilla, for example. DocZilla actually parses the DTD but Mozilla doesn't.) Therefore, it would make sense to indicate to user agents that they may safely leave the external subset unparsed and that the external subset is only referenced for the purpose of validation. If the XHTML 2.0 spec requires the doctype declaration and uses attribute defaults similar to those in XHTML 1.1, it will be impractical to make valid standalone documents.

Content-Type

The Draft doesn't say which media type should be used for labeling the document instances when the transport uses media type labeling. I think making the media type labeling clear early on is crucial for interoperability of implementations. It would also be good to include an explicit "must not" against sending XHTML 2.0 as text/html.

Entity references

The definition of entity reference implies that the DTD will declare character entities in addition to the predefined ones. I think doing so is unnecessary, since XML allows the use of any Unicode character directly as UTF-8. Character entities move the responsibility of being able to deal with character aliases to the rendering end, even though it is more of an input issue that should be dealt with at data entry time. If someone really has to use an ASCII editor instead of a proper Unicode editor, the NCRs are there. On the other hand, allowing character entities makes it necessary to parse the external subset, and that would unnecessarily complicate lightweight user agents with non-validating parsers.

Classes

The class example is ill-formed. <p class="note">...</p> would look better than using <span>.

accesskey

The Draft says 'Apple systems, one generally has to press the "cmd" key in addition to the access key.'
The command key is the accelerator key for the keyboard shortcuts of the browser's own functions on Mac, so another modifier key (e.g. ctrl) would have to be used in order to avoid conflicts.

Deprecated Elements (like br)

Since XHTML 2.0 defines a new namespace, there are no pre-existing elements in the namespace. The deprecated elements are effectively *created as deprecated*. I think creating elements as deprecated doesn't make sense, since that would mean creating a burdensome legacy where none would otherwise exist. On the other hand, if it is considered necessary to keep the deprecated elements, I think it would make more sense to keep them in the XHTML 1.x namespace.

Headings

I like the <section> and <h> arrangement a lot. However, I think including <h1> through <h6> unnecessarily complicates things. I'd like to suggest including only one way of marking up headings (the <h> and <section> way) instead of including two incompatible ways.

Quote

I think dropping the <q> element in favor of <quote> is a very good thing. In practice, generating context-sensitive quotation marks in the user agent is really hard to get right. The Draft says: "Visual user agents are not required to add delimiting quotation marks - -". I think it would be better to make the statement stronger in order to avoid cases where both the author and the user agent add quotation marks. I suggest substituting "are not required to" with "must not".

Anchors

Like many others, I was surprised to find that the Draft uses a linking method of its own instead of simple XLink. Aren't simple XLinks supposed to be used in new specs exactly in cases like <a href="...">?

The Edit Module

"This element is unusual for XHTML in that they may serve as either block-level or inline elements (but not both)."

I think the dual nature of ins and del is quite undesirable. The rule "The del element must not contain block-level content when it is behaving as an inline element." can't be enforced in validation.
Also, if an element has a dual inline/block nature, it is more difficult to handle the presentation of the element in a user agent style sheet. I think it would be more straightforward to have separate elements for block and inline deletions and insertions (just like div and span are separate).

Referencing Style Sheets

The definition of <link/> includes the old HTML 4 style sheet linking. Since there is a general processing instruction for associating style sheets with XML documents, I think requiring user agents to support <link/> style sheets is unnecessary.

The Metainformation Module

The Draft says that http-equiv exists for the purpose of HTTP servers gathering information for HTTP headers. I could be mistaken, but when it comes to HTML and XHTML 1.x there don't seem to be actual servers implementing this feature. I think the unimplementedness suggests that the feature has failed in practice and could be removed from XHTML 2.0.

There are browsers that pay attention to HTML http-equiv, and the http-equiv in HTML is routinely used for three purposes other than information gathering on the server side:

  1) Trying to specify the charset parameter of the Content-Type header. I think this should not be supported in XHTML 2.0, because supporting it requires scanning the incoming data buffer before parsing: the character encoding is needed before parsing starts, so it can't be obtained from an attribute during parsing. Also, since the character encoding issue is dealt with at the XML level, it would be harmful to add another and less elegant way of specifying the character encoding.

  2) Trying to make a "redirect" ("meta refresh") without knowing about real HTTP redirects.

  3) Trying to manipulate cache behavior. Compared to proper HTTP headers this approach is harmful, because HTTP caches won't see the pseudo-HTTP header that the author thinks has some meaning to caching systems.
Then recently some authors have thought that including the tag <meta http-equiv="Content-Type" content="application/xhtml+xml" /> would do something good when the real HTTP header says text/html.

I think it would be appropriate to drop the http-equiv attribute. Or, if it is kept in XHTML 2.0, I think it would be good to include some notice that authors mustn't expect http-equiv attributes to have any useful effect unless their server actually gathers information from the http-equiv attributes.

The Scripting Module

There's an example with document.write(). The way document.write() works in text/html user agents and is used is very tag soupish. Parsing the markup is suspended and a script prints strings to the tag soup parser input stream.

The XML parser is usually developed separately from the application using it. This is a good thing, since it allows the development of robust and reusable XML parsers. It also makes implementing something like document.write() harder, which I think is a good thing, too. Implementing document.write() would likely require tampering with the separately developed XML parser or would require the use of a separate pseudo-XML parser in addition to the real parser so that the application could combine the element trees coming from the pseudo-XML parser with the main tree coming from the real parser.

I'd like to suggest disallowing the use of document.write() with XHTML 2.0 and with XML-based languages in general. This would simplify the implementation of user agents in other ways as well: when there is no document.write(), there is no need to allow script elements to occur as descendants of the body element and there is no need to begin the execution of scripts before the entire document has been parsed and the corresponding DOM tree fully created.

Also, the script element has an attribute called charset for indicating the character encoding of an external script. I can't find a good description of the attribute.
It seems to me that an author could use such an attribute for two purposes: to try to override the charset parameter of the Content-Type header of the script, or to let the user agent make a decision about not loading scripts whose characters are encoded in an unsupported way. I think that in the former case the author should be encouraged to get the real HTTP charset of the script itself right. As for the latter case, I'm inclined to think that the usefulness of the attribute would be minimal, because programming languages tend to be representable in common encodings.

Ruby

The Draft references Ruby. The Ruby spec doesn't say clearly what the proper namespace URL for the Ruby elements is, but in XHTML 1.1 the Ruby elements seem to be in the http://www.w3.org/1999/xhtml namespace. Since the module is unchanged in XHTML 2.0, it would be reasonable to assume that the elements are still in the http://www.w3.org/1999/xhtml namespace. Are they, or does the http://www.w3.org/2002/06/xhtml2 namespace get elements with the same local names and identical semantics?

Things That Weren't There

I've observed that the elements available in HTML and XHTML 1.x are structures that tend to appear in technical articles but that (X)HTML lacks named elements for many structures that appear on Web pages.

Many Web pages include some kind of footer after the main content. The footer tends to contain the address of the author, a copyright notice, the date of update, a couple of words about the author and things like that. In HTML, one could write:

  <hr>
  <div class="footer">
  <p>The author will be on vacation next week, so there won't be a new column next week. Last updated: 2002-08-17.</p>
  </div>

The use of footers is so common that I think footers would deserve an element of their own:

  <!-- no hr needed -->
  <footer>
  <p>The author will be on vacation next week, so there won't be a new column next week. Last updated: 2002-08-17.</p>
  </footer>

Another thing that I've noticed is that (X)HTML doesn't provide any semantic markup for indicating which parts of the page are main content and which parts are navigation. Usually news sites and the like have a lot of navigation alongside the main content. When using handheld user agents or tty user agents, it may be difficult to scroll around. I think it could be useful for these browsing situations as well as for styling to provide semantic markup for designating something as being part of the main content and something else as being part of navigation. This would allow easy switching between the main content and the navigation parts in handheld and tty clients. Also, providing a common way of marking up these things would make it easier to write user style sheets that applied user preferences to the main content while leaving the navigation the way the author had suggested.

-- 
Henri Sivonen
hsivonen@iki.fi
http://www.hut.fi/u/hsivonen/
Received on Monday, 19 August 2002 12:28:00 UTC