- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Mon, 19 Aug 2002 19:27:20 +0300
- To: www-html-editor@w3.org
- Cc: www-html@w3.org
The Namespace
The Draft specifies a new namespace. I think it would be nice to have
some rationale for this design decision in the spec. Defining a new
namespace certainly makes it easier for implementors of XHTML 2.0
user agents to resist requests to support legacy elements. However,
for user agents that support both XHTML 1.x and 2.0, it would be
easier to keep the same namespace and rename the elements whose
semantics have changed.
External Entities and User Agent Performance
The Draft makes the inclusion of a doctype declaration in the
document a "must" requirement and requires the PUBLIC id (if present)
to reference a particular DTD. (However, the spec doesn't exactly
require a lone SYSTEM id to reference an equivalent DTD resource, but
I assume it is the intent.) I think this may cause performance
problems for user agents that are able to parse DTDs if the namespace
attributes are going to be handled the same way they are handled in
XHTML 1.1.
In XHTML 1.1 each element has the xmlns attribute and the attribute
has a #FIXED value. If each and every element in the document
instance doesn't have the attribute explicitly, it will be provided
via attribute defaulting if the external subset is processed.
However, if the external subset is not processed, the attribute
values won't be there. From the point of view of the Namespaces in
XML spec, the result is the same, but from the point of view of XML
1.0 validity constraints it makes a difference: if the document were
declared standalone, a difference in attribute defaulting between
processing the external subset and not processing it would be a
violation of a validity constraint.
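To illustrate the defaulting difference described above, here is a
minimal Python sketch using the standard expat binding. It is only an
approximation: an internal subset stands in for the external DTD (an
assumption made so the example is self-contained), and the names
WITH_SUBSET, WITHOUT_SUBSET and attributes_seen are hypothetical.

```python
import xml.parsers.expat

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical stand-in for the DTD: an internal subset that gives <p> a
# #FIXED xmlns attribute, similar in spirit to the XHTML 1.1 declarations.
WITH_SUBSET = (
    '<!DOCTYPE html [\n'
    '  <!ATTLIST p xmlns CDATA #FIXED "%s">\n'
    ']>\n'
    '<html xmlns="%s"><p>text</p></html>' % (XHTML_NS, XHTML_NS)
)

# The same instance as seen by a parser that never reads the declarations.
WITHOUT_SUBSET = '<html xmlns="%s"><p>text</p></html>' % XHTML_NS


def attributes_seen(document):
    """Return {element name: attributes} as reported by a plain expat parse."""
    seen = {}
    parser = xml.parsers.expat.ParserCreate()  # namespace processing is off

    def start(name, attrs):
        seen[name] = dict(attrs)

    parser.StartElementHandler = start
    parser.Parse(document, True)
    return seen


with_defaults = attributes_seen(WITH_SUBSET)
without_defaults = attributes_seen(WITHOUT_SUBSET)
print(with_defaults["p"], without_defaults["p"])
```

The <p> element gets an xmlns attribute only when the declarations are
processed, which is the difference that matters for standalone
documents.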
As a result, if one wanted to construct a valid standalone document
when the DTD has xmlns attribute defaults for each element, one would
have to repeat the xmlns attribute on each element, which is
obviously impractical. However, being able to declare documents
standalone would be useful in terms of user agent performance.
Fetching a DTD with all the modules or even loading and parsing them
from a local catalog causes a noticeable performance hit for user
agents. (This can be seen when comparing DocZilla and Mozilla for
example. DocZilla actually parses the DTD but Mozilla doesn't.)
Therefore, it would make sense to indicate to user agents that they may
safely leave the external subset unparsed and that the external
subset is only referenced for the purpose of validation.
If the XHTML 2.0 spec requires the doctype declaration and uses
attribute defaults similar to those in XHTML 1.1, it will be
impractical to make valid standalone documents.
Content-Type
The Draft doesn't say which media type should be used for labeling
the document instances when the transport uses media type labeling. I
think making the media type labeling clear early on is crucial for
interoperability of implementations. It would also be good to include
an explicit "must not" against sending XHTML 2.0 as text/html.
Entity references
The definition of entity reference implies that the DTD will declare
character entities in addition to the predefined ones. I think doing
so is unnecessary since XML allows the use of any Unicode character
directly as UTF-8. Character entities move the responsibility of
being able to deal with character aliases to the rendering end even
though it is more of an input issue which should be dealt with at the
data entry time. If someone really has to use an ASCII editor instead
of a proper Unicode editor, the NCRs are there. On the other hand,
allowing character entities makes it necessary to parse the external
subset and that would complicate lightweight user agents with
non-validating parsers unnecessarily.
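As a rough illustration of the three ways of writing a character
discussed above, the following Python sketch feeds each form to the
standard library's non-validating parser with no DTD available. The
helper name parse_text is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_text(document):
    """Return the root element's text, or None if the parse fails."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError:
        return None


# The character written directly and sent as UTF-8 bytes.
direct = parse_text("<p>caf\u00e9</p>".encode("utf-8"))

# The same character as a numeric character reference (NCR).
ncr = parse_text("<p>caf&#233;</p>")

# A named character entity; it is undefined unless a DTD declares it.
named = parse_text("<p>caf&eacute;</p>")

print(direct, ncr, named)
```

The first two forms need nothing beyond the document itself, while the
named entity is a well-formedness error here because no declaration
for it has been read.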
Classes
The class example is ill-formed. <p class="note">...</p> would look
better than using <span>.
accesskey
The Draft says 'Apple systems, one generally has to press the "cmd"
key in addition to the access key.' The command key is the
accelerator key for the keyboard shortcuts of the browser's own
functions on Mac, so another modifier key (e.g. ctrl) would have to be
used in order to avoid conflicts.
Deprecated Elements (like br)
Since XHTML 2.0 defines a new namespace, there are no pre-existing
elements in the namespace. The deprecated elements are effectively
*created as deprecated*. I think creating elements as deprecated
doesn't make sense, since that would mean creating a burdensome
legacy where none would otherwise exist. On the other hand, if it is
considered necessary to keep the deprecated elements, I think it
would make more sense to keep them in the XHTML 1.x namespace.
Headings
I like the <section> and <h> arrangement a lot. However, I think
including <h1> through <h6> unnecessarily complicates things. I'd
like to suggest including only one way of marking up headings (the
<h> and <section> way) instead of including two incompatible ways.
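One property of the <h> and <section> arrangement is that the heading
level follows from nesting alone. A small Python sketch of that idea
is below; the sample markup omits the namespace for brevity, and the
names SAMPLE and heading_levels are made up for the illustration.

```python
import xml.etree.ElementTree as ET

SAMPLE = """\
<body>
  <section>
    <h>Introduction</h>
    <section>
      <h>Background</h>
    </section>
  </section>
  <section>
    <h>Conclusion</h>
  </section>
</body>
"""


def heading_levels(element, depth=0):
    """Yield (level, text) per <h>; level counts the enclosing <section>s."""
    for child in element:
        if child.tag == "section":
            yield from heading_levels(child, depth + 1)
        elif child.tag == "h":
            yield (depth, child.text)
        else:
            yield from heading_levels(child, depth)


outline = list(heading_levels(ET.fromstring(SAMPLE)))
print(outline)
```

With <h1> through <h6> in the same language, a user agent would also
have to decide how numbered headings interact with the computed level.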
Quote
I think dropping the <q> element in favor of <quote> is a very good
thing. In practice, generating context-sensitive quotation marks in
the user agent is really hard to get right.
The Draft says: "Visual user agents are not required to add
delimiting quotation marks - -". I think it would be better to make
the statement stronger in order to avoid cases where both the author
and the user agent add quotation marks. I suggest substituting "are
not required to" with "must not".
Anchors
Like many others, I was surprised to find that the Draft uses a
linking method of its own instead of simple XLink. Aren't simple
XLinks supposed to be used in new specs in exactly the cases like <a
href="...">?
The Edit Module
"This element is unusual for XHTML in that they may serve as either
block-level or inline elements (but not both)." I think the dual
nature of ins and del is quite undesirable. The rule "The del element
must not contain block-level content when it is behaving as an inline
element." can't be enforced in validation. Also, if an element has a
dual inline/block nature, it is more difficult to handle the
presentation of the element in a user agent style sheet. I think it
would be more straightforward to have separate elements for block
and inline deletions and insertions (just like div and span are
separate).
Referencing Style Sheets
The definition of <link/> includes the old HTML 4 style sheet
linking. Since there is a general processing instruction for
associating style sheets with XML documents, I think requiring user
agents to support <link/> style sheets is unnecessary.
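For reference, the processing instruction in question is
<?xml-stylesheet?>. The Python sketch below shows how an application
might find it without knowing anything about <link/>; the sample
document, the regular expression and the function name
stylesheet_links are assumptions made for the example, and the
pseudo-attribute parsing is deliberately simplistic.

```python
import re
from xml.dom import minidom

SAMPLE = (
    '<?xml-stylesheet href="style.css" type="text/css"?>'
    '<html><body/></html>'
)

# Matches name="value" or name='value' pairs inside the PI data.
PSEUDO_ATTR = re.compile(r"""([\w-]+)\s*=\s*(["'])(.*?)\2""")


def stylesheet_links(document):
    """Return one dict of pseudo-attributes per xml-stylesheet PI in the prolog."""
    doc = minidom.parseString(document)
    links = []
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            links.append(
                {m.group(1): m.group(3) for m in PSEUDO_ATTR.finditer(node.data)}
            )
    return links


print(stylesheet_links(SAMPLE))
```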
The Metainformation Module
The Draft says that http-equiv exists for the purpose of HTTP servers
gathering information for HTTP headers. I could be mistaken, but when
it comes to HTML and XHTML 1.x there don't seem to be actual servers
implementing this feature. I think the unimplementedness suggests
that the feature has failed in practice and could be removed from
XHTML 2.0.
There are browsers that pay attention to HTML http-equiv, and the
http-equiv in HTML is routinely used for three purposes other than
information gathering on the server side:
1) Trying to specify the charset parameter of the Content-Type
header. I think this should not be supported in XHTML 2.0. The
character encoding has to be known before parsing can start, so it
can't be discovered by parsing an attribute; supporting the feature
would require scanning the incoming data buffer before the real
parse. Also, since the character encoding issue is dealt with at the
XML level, it would be harmful to add another and less elegant way of
specifying the character encoding.
2) Trying to make a "redirect" ("meta refresh") without knowing about
real HTTP redirects.
3) Trying to manipulate cache behavior. Compared to proper HTTP
headers this approach is harmful, because HTTP caches won't see the
pseudo-HTTP header that the author thinks has some meaning to
caching systems.
Then recently some authors have thought that including the tag <meta
http-equiv="Content-Type" content="application/xhtml+xml" /> would do
something good when the real HTTP header says text/html.
I think it would be appropriate to drop the http-equiv attribute. Or,
if it is kept in XHTML 2.0, I think it would be good to include some
notice that authors mustn't expect http-equiv attributes to have any
useful effect unless their server actually gathers information from
the http-equiv attributes.
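On the charset point above, the following Python sketch shows the
encoding being settled at the XML level before any element is seen.
The sample document is invented for the illustration: its XML
declaration says ISO-8859-1 while a meta element claims UTF-8.

```python
import xml.etree.ElementTree as ET

# The bytes are ISO-8859-1 (0xE9 is e-acute); the meta element disagrees.
DOCUMENT = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>'
    b'<html><head>'
    b'<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>'
    b'</head><body><p>caf\xe9</p></body></html>'
)

root = ET.fromstring(DOCUMENT)
paragraph_text = root.find("body/p").text
meta_claim = root.find("head/meta").get("content")

print(paragraph_text, "|", meta_claim)
```

The XML parser decodes the text according to the XML declaration; to
it, the meta element is ordinary data with no effect on decoding.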
The Scripting Module
There's an example with document.write(). The way document.write()
works in text/html user agents and is used is very tag soupish.
Parsing the markup is suspended and a script prints strings to the
tag soup parser input stream.
The XML parser is usually developed separately from the application
using it. This is a good thing, since it allows the development of
robust and reusable XML parsers. It also makes implementing something
like document.write() harder, which I think is a good thing, too.
Implementing document.write() would likely require tampering with the
separately developed XML parser or would require the use of a
separate pseudo-XML parser in addition to the real parser so that the
application could combine the element trees coming from the
pseudo-XML parser with the main tree coming from the real parser.
I'd like to suggest disallowing the use of document.write() with
XHTML 2.0 and with XML-based languages in general. This would
simplify the implementation of user agents in other ways as well:
When there is no document.write() there is no need to allow script
elements to occur as descendants of the body element and there is no
need to begin the execution of scripts before the entire document has
been parsed and the corresponding DOM tree fully created.
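As a sketch of the alternative, the Python example below parses the
whole document first and only then lets a "script" add content through
DOM calls, so the parser's input stream is never touched. Python's
minidom stands in for a browser DOM here, and the function name
add_generated_paragraph is hypothetical.

```python
from xml.dom import minidom

SOURCE = "<html><body><p>Static content.</p></body></html>"


def add_generated_paragraph(doc, text):
    """Stand-in for a script: append a <p> to <body> through the DOM."""
    body = doc.getElementsByTagName("body")[0]
    paragraph = doc.createElement("p")
    paragraph.appendChild(doc.createTextNode(text))
    body.appendChild(paragraph)


# Parsing runs to completion before any script-like code executes.
doc = minidom.parseString(SOURCE)

# Only then does the "script" modify the finished tree.
add_generated_paragraph(doc, "Generated content.")

result = doc.documentElement.toxml()
print(result)
```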
Also, the script element has an attribute called charset for
indicating the character encoding of an external script. I can't find
a good description of the attribute. It seems to me that an author
could use such an attribute for two purposes: to try to override the
charset parameter of the Content-Type header of the script or to let
the user agent decide against loading scripts whose characters
are encoded in an unsupported way. I think that in the former case
the author should be encouraged to get the real HTTP charset of the
script itself right. As for the latter case, I'm inclined to think
that the usefulness of the attribute would be minimal, because
programming languages tend to be representable in common encodings.
Ruby
The Draft references Ruby. The Ruby spec doesn't say clearly what the
proper namespace URL for the Ruby elements is, but in XHTML 1.1 the
Ruby elements seem to be in the http://www.w3.org/1999/xhtml
namespace. Since the module is unchanged in XHTML 2.0, it would be
reasonable to assume that the elements are still in the
http://www.w3.org/1999/xhtml namespace. Are they or does the
http://www.w3.org/2002/06/xhtml2 get elements with the same local
names and identical semantics?
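To make the question concrete: for a namespace-aware processor, two
elements with the local name "ruby" in the two namespaces are
different elements. The Python sketch below shows the expanded names a
parser reports; the sample document and the x1 prefix are invented for
the example.

```python
import xml.etree.ElementTree as ET

XHTML1_NS = "http://www.w3.org/1999/xhtml"
XHTML2_NS = "http://www.w3.org/2002/06/xhtml2"

# One ruby element in the default (XHTML 2.0) namespace and one in the
# XHTML 1.x namespace, bound here to the made-up prefix x1.
SAMPLE = (
    '<html xmlns="%s" xmlns:x1="%s">'
    '<body><ruby/><x1:ruby/></body>'
    '</html>' % (XHTML2_NS, XHTML1_NS)
)

body = ET.fromstring(SAMPLE)[0]
expanded_names = [child.tag for child in body]
print(expanded_names)
```

A user agent has to know which of the two expanded names carries the
Ruby semantics, which is why the spec should say so explicitly.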
Things That Weren't There
I've observed that the elements available in HTML and XHTML 1.x are
structures that tend to appear in technical articles but that (X)HTML
lacks named elements for many structures that appear on Web pages.
Many Web pages include some kind of footer after the main content.
The footer tends to contain the address of the author, a copyright
notice, the date of update, a couple of words about the author and
things like that. In HTML, one could write:
<hr>
<div class="footer">
<p>The author will be on vacation next week, so there won't be a
new column next week. Last updated: 2002-08-17.</p>
</div>
The use of footers is so common that I think footers would deserve an
element of their own:
<!-- no hr needed -->
<footer>
<p>The author will be on vacation next week, so there won't be a
new column next week. Last updated: 2002-08-17.</p>
</footer>
Another thing that I've noticed is that (X)HTML doesn't provide any
semantic markup for indicating which parts of the page are main
content and which parts are navigation. Usually news sites and the
like have a lot of navigation alongside the main content. When using
handheld user agents or tty user agents, it may be difficult to
scroll around. I think it would be useful for these browsing
situations as well as for styling to provide semantic markup for
designating something as being part of the main content and something
else as being part of navigation.
This would allow easy switching between the main content and the
navigation parts in handheld and tty clients. Also, providing a common
way of marking up these things would make it easier to write user
style sheets that applied user preferences to the main content while
leaving the navigation the way the author had suggested.
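A rough Python sketch of what such switching could look like in a
client follows. The element names "navigation" and "main" are purely
hypothetical placeholders; nothing above proposes specific names, and
the function paragraphs_in is likewise invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <navigation> and <main> are placeholder names.
SAMPLE = """\
<body>
  <navigation><p>Home</p><p>Archive</p></navigation>
  <main><p>Today's column.</p></main>
</body>
"""


def paragraphs_in(root, region):
    """Return the text of every <p> inside elements named `region`."""
    return [p.text for part in root.iter(region) for p in part.iter("p")]


root = ET.fromstring(SAMPLE)
main_content = paragraphs_in(root, "main")
navigation = paragraphs_in(root, "navigation")
print(main_content, navigation)
```

A handheld or tty client could show main_content first and offer the
navigation list on request, without guessing from class names.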
--
Henri Sivonen
hsivonen@iki.fi
http://www.hut.fi/u/hsivonen/
Received on Monday, 19 August 2002 12:28:00 UTC