Some comments on the current draft

I would like to make some general, hopefully constructive, criticisms of
the current draft for XHTML 2.0 [1]. It should go without saying that
the following is all IMHO.

What's it for?

What is XHTML 2.0 for? The draft's introduction says:

> XHTML 2 is a general purpose markup language designed for representing
> documents for a wide range of purposes across the World Wide Web. To
> this end it does not attempt to be all things to all people, supplying
> every possible markup idiom, but to supply a generally useful set of
> elements, with the possibility of extension using the class and role
> attributes on the span and div elements in combination with style
> sheets, and attributes from the metadata attributes collection.

This is a bit vague.

One of the nice things about the original HTML 1.0 draft is that it gave
some examples of how it could be used [1]:

> * Hypertext news, mail, online documentation, and
>   collaborative hypermedia;
> * Menus of options;
> * Database query results;
> * Simple structured documents with inlined graphics.
> * Hypertext views of existing bodies of information 

With the proliferation of markup languages that we have today, I'd like
to see such a list make a return in XHTML 2.0. What's more, I would very
much appreciate a list of web content where XHTML 2.0 would *not* be
suitable markup. For instance, is XHTML 2.0 appropriate for marking up
blog posts, news articles, academic papers, critical editions of texts,
and web applications?

Default representation

Why aren't there suggested stylesheets for non-visual media too?

Can we require all user agents to distinguish somehow between *all*
structural and semantic components by default -- unless otherwise
specified by user preference or CSS? If so, would it be useful to
establish two classes of components? With the first class (e.g. links),
user agents must make the semantics trivially accessible (e.g. blue and
underlined).
With the second class, user agents may make the semantics slightly
more difficult to access (e.g. the user must right-click and select
properties) if this improves the overall user experience.

Confusing duplication of function

Too many components seem to be doing the same thing:

1. <h1>, <h2>, <h3>, <h4>, <h5>, <h6> and <h> (possibly also 
   <label>, <caption>, and <th>)

2. The so-called "structural" distinction between block and inline
   elements leads to the bifurcation of q/blockquote and code/blockcode
   (and discussions about a possible address/blockaddr). I've not yet
   read anything that persuades me that block and inline are not
   presentational qualities, as the ability to style elements block or
   inline with the CSS display property suggests.

3. I cannot understand the distinction the draft draws between <em>
   ("indicates emphasis for its contents") and <strong> ("indicates
   higher importance for its contents than that of the surrounding
   content"). The example [2] doesn't help me: "On
   <strong>Monday</strong> please put the rubbish out, but <em>not</em>
   before nightfall!"

4. Three forms of inclusion: the embedding attributes (src and srctype);
   img; and object.

Take courage

Given XHTML 2.0 will not actually be backwards compatible, is there any
evidence that including unnecessary elements like <img> will "ease the
transition to XHTML2" [3] rather than hinder it by making XHTML 2.0 more

There is a role for a document explaining how existing (X)HTML
techniques map to XHTML 2.0 techniques, along the lines of "XForms for
HTML authors" [4]. Such a document could explain that whereas in HTML
you might use <IMG> to mark up images, in XHTML 2.0 you would always
use <object>.

It would be worth working on such a document alongside the spec itself
(much as the accessibility WG drafts techniques at the same time as
drafting guidelines). This should help ensure that no useful (X)HTML
features are lost in the transition to XHTML 2.0.

But in general XHTML 2.0 *must* make sense to its author-base on its own
terms; it should *not* rely on legacy HTML elements to make itself

Similarly, I think it's deeply confusing to include elements/attributes
but then discourage their use, as with the style attribute [5]:

> use of the style attribute is strongly discouraged in favor of the
> style element and external style sheets. In addition, content
> developers are advised to avoid use of the style attribute on content
> intended for use on small devices, since those devices may not support
> the use of in-line styles.

When asked why Ruby on Rails is so popular, one of the reasons
identified by creator David Heinemeier Hansson was that [6]:

> Rails is opinionated software. It eschews placing the old ideals of
> software in a primary position. One of those ideals is flexibility—the
> notion that we should try to accommodate as many approaches as
> possible, that we shouldn't pass judgement on one form of development
> over another. Well, Rails does, and I believe that's why it works.

Advocates of the separation of content and style have had to argue from
a position of insurmountable weakness because of the presentational
elements and attributes included in all previous forms of (X)HTML. The
*only* sure way to stop developers using such elements and attributes is
to not include them in the standard in the first place.

XHTML 2.0 should be an opinionated specification.

But what does it all mean?

Reading the draft and through the mailing list, there seem to be six
ideas about how meaning can be conveyed in XHTML:

1. Language/symbols/punctuation used in context
2. XHTML elements
3. XML elements from other namespaces (e.g. from MathML, SVG, etc.)
   in XHTML+whatever documents
4. XHTML role attribute
5. class attribute
6. meta element

Does anyone else find this a tad confusing? As far as I can tell, the
dominant idea (though not necessarily the consensus?) is to move the
emphasis away from semantic elements to semantic roles.

I've seen two justifications for this departure. First, one element
might play more than one "role". This seems a little circular to me. Of
course, if you try and carve meaning up into "roles", bits of a document
will have more than one role. However, if you describe different
elements in terms of their roles, then that multiplicity of roles can be
implied by the element name just as easily as it can be stated
explicitly by the role attribute.  Wouldn't it be more efficient to
define roles for elements in a given namespace, and have an XHTML
document reference that definition for its elements?
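
To illustrate what I mean, here is a minimal Python sketch of such a
per-namespace definition: a lookup table from element names to the roles
they imply. The namespace URI, the element names and the role names are
all made up for the example; nothing here comes from the draft.

```python
# Hypothetical namespace URI and role names, for illustration only.
EXAMPLE_NS = "http://example.org/ns/text"

# One shared definition: each element in the namespace implies a set of roles.
ROLE_DEFINITIONS = {
    (EXAMPLE_NS, "soCalled"): {"distancing", "quotation"},
    (EXAMPLE_NS, "productName"): {"name", "trademark"},
}


def implied_roles(namespace, local_name):
    """Return the roles an element implies; unknown elements imply none."""
    return ROLE_DEFINITIONS.get((namespace, local_name), frozenset())


print(sorted(implied_roles(EXAMPLE_NS, "soCalled")))
```

A document would then only need to point at the definition once, rather
than restating the roles on every element.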

Second, there is a fear of creating a markup language with too many
elements. For instance, Laurens Holst wrote [7]:

> There is a basic set of elements in the language to add semantics, 
> however a line has to be drawn somewhere, otherwise you’ll end up with a 
> docbook-kind of specification and the introduction of <irony> elements.

The irony of this claim is that because XHTML 2.0 is supposed to make
use of other XML markup languages where possible, authors of
XHTML+whatever documents can make use of more elements than DocBook
authors. The XHTML 2 draft currently has 89 elements (including the
XForms Module), only a few fewer than HTML 4.01's 91 elements [8].
According to the WG's charter [9], the design goal of XHTML 2 "is to use
generic XML technologies as much as possible", apparently meaning "W3C's
work on areas such as math, scalable vector graphics, synchronized
multimedia, voice browsing and forms". In addition to XHTML 2.0's own 89
elements, SVG 1.1 has 81 elements, SMIL 2.1 has 37 elements, VoiceXML
2.0 has 43 elements, and MathML 2 has 301 elements [10-13], which makes a
grand total of 551 elements! That's far more than DocBook's 417 elements
[14], and somewhat more than TEI P5's current 535 [15].
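
For anyone who wants to check the sums, here is a small Python sketch
that tallies the counts quoted above. The variable names are mine, and
the figures are only as good as the sources cited.

```python
# Element counts as quoted in the text above.
xhtml_plus_whatever = {
    "XHTML 2.0 (incl. XForms Module)": 89,
    "SVG 1.1": 81,
    "SMIL 2.1": 37,
    "VoiceXML 2.0": 43,
    "MathML 2": 301,
}
docbook = 417
tei_p5 = 535

total = sum(xhtml_plus_whatever.values())
print(total)            # 551
print(total - docbook)  # 134
print(total - tei_p5)   # 16
```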

It wouldn't be *entirely* true to say that this is an unfair comparison
given that XHTML+whatever allows you to mark up more than DocBook or TEI
-- since both DocBook and TEI allow you to mark up things that you can't
with XHTML+whatever, e.g. DocBook's <ProductName> or TEI's <soCalled>.

The desire to avoid complexity might be a good argument for (say)
splitting XHTML into a basic and advanced version (as with the
Simplified DocBook that has only 116 elements [16]), or breaking up
complexity into modules (as with XForms). But it's not a great argument
for relegating semantics to roles that are not defined in a single
specification or repository. That just hides the complexity.

What criteria decide which semantics make the cut as elements? If the
idea is that all newly proposed semantics should be roles not elements,
then why shouldn't the same go for <em> or <samp>? Why do some semantics
(mathematics, graphics) deserve their own modules/markup languages,
while others (e.g. sophisticated text markup) must make do with roles?
Why can't we dump the current text module, and create one Text Module
with handy elements for marking up text and create another module
specifically for software documentation (which seems to be the use-case
for <samp> and friends)?

More importantly, shouldn't the meaning implied by web markup be a web
standard as far as is possible, and shouldn't all web browsers that
support XHTML 2.0 be able to communicate the meaning implied in XHTML
2.0 documents? In Steven Pemberton's XTech talk [17], the Chair says:

> In fact, anyone can add their own role values, so that whole
> communities can agree on new semantics to overlay on to the content.
> In fact this is exactly what microformats are about.

Microformats and roles are fine for machine processing, so long as
documents are required to declare which microformat/role profiles they
are using.

But what about web browsers? When creating a microformat/role profile,
is there a way of declaring a default audio/visual/tangible
representation of such semantics that will work even when the user has
CSS disabled? Is such a declaration required to create a conformant
profile? Are browsers required to download such declarations and apply
them? If so, shouldn't it be an *absolute requirement* (not just "best
practice") for "the URI associated with" a role's namespace to "resolve
to a resource that allows for the discovery of the definition of the
roles in the namespace" [18]?

At the moment, it seems that unrecognized roles might be treated in the
haphazard way current UAs treat the title attribute [19]. But if I were
to create a "socalled" role, I surely wouldn't want to be dependent on
the user happening to choose to right-click and read the properties for
the text in question (I certainly wouldn't regard such indication of
semantics as particularly accessible). At the very least elements with
unrecognized roles need some sort of signal that they have such roles,
just as links are distinguished from normal text.
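
As a rough illustration of the first step a user agent would need, here
is a Python sketch that finds elements whose role is not in a known set,
so that they could be flagged. The set of known roles is hypothetical,
the fragment is simplified and namespace-free, and I am assuming a role
attribute may hold several space-separated values.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of roles this user agent already understands.
KNOWN_ROLES = {"navigation", "main", "note"}

# Simplified fragment; "socalled" is the made-up role from the text above.
SAMPLE = """
<div>
  <p role="note">A <span role="socalled">free</span> lunch.</p>
  <div role="navigation">...</div>
</div>
"""


def unrecognized_roles(markup, known=KNOWN_ROLES):
    """Return (tag, role) pairs for every role value not in `known`."""
    found = []
    for element in ET.fromstring(markup).iter():
        # Assumption: the attribute may list several space-separated roles.
        for role in element.get("role", "").split():
            if role not in known:
                found.append((element.tag, role))
    return found


print(unrecognized_roles(SAMPLE))  # [('span', 'socalled')]
```

How the flagged elements are then signalled to the user is exactly the
part I would like the spec to address.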

I hope that, if nothing else, this indicates some areas where
communication about how this is going to work needs to be clearer. I
*could* have said lots of nice things about the draft (such as its
willingness to countenance more than six levels of headings, its
naturalized paragraph elements, its simpler markup for navigation lists,
its general dedication to the exclusion of presentational elements, its
use of XForms, etc.), but I think identifying points that may still need
a bit of work may be more helpful. :)


Benjamin Hawkes-Lewis

Received on Saturday, 23 September 2006 14:55:27 UTC