- From: Benjamin Hawkes-Lewis <benjaminhawkeslewis@hotmail.com>
- Date: Sat, 23 Sep 2006 15:55:15 +0100
- To: www-html@w3.org
I would like to make some general, hopefully constructive, criticisms of the current draft for XHTML 2.0 [1]. It should go without saying that the following is all IMHO.

What's it for?
==============

What is XHTML 2.0 for? The draft's introduction says:

> XHTML 2 is a general purpose markup language designed for representing
> documents for a wide range of purposes across the World Wide Web. To
> this end it does not attempt to be all things to all people, supplying
> every possible markup idiom, but to supply a generally useful set of
> elements, with the possibility of extension using the class and role
> attributes on the span and div elements in combination with style
> sheets, and attributes from the metadata attributes collection.

This is a bit vague. One of the nice things about the original HTML 1.0 draft is that it gave some examples of how it could be used [1]:

> * Hypertext news, mail, online documentation, and collaborative hypermedia;
> * Menus of options;
> * Database query results;
> * Simple structured documents with inlined graphics.
> * Hypertext views of existing bodies of information

With the proliferation of markup languages that we have today, I'd like to see such a list make a return in XHTML 2.0. What's more, I would very much appreciate a list of web content where XHTML 2.0 would *not* be suitable markup. For instance, is XHTML 2.0 appropriate for marking up blog posts, news articles, academic papers, critical editions of texts, and web applications?

Default representation
======================

Why aren't there suggested stylesheets for non-visual media too?

Can we require all user agents to distinguish somehow between *all* structural and semantic components by default, unless otherwise specified by user preference or CSS? If so, would it be useful to establish two classes of components? With the first class (e.g. links), user agents must make the semantics trivially accessible (e.g. blue and underlined).
With the second class, user agents may make the semantics slightly more difficult to access (e.g. the user must right-click and select properties) if this improves the overall user experience.

Confusing duplication of function
=================================

Too many components seem to be doing the same thing:

1. <h1>, <h2>, <h3>, <h4>, <h5>, <h6> and <h> (possibly also <label>, <caption>, and <th>)

2. The so-called "structural" distinction between block and inline elements leads to the bifurcation of q/blockquote and code/blockcode (and discussions about a possible address/blockaddr). I've not yet read anything that persuades me that block and inline are not presentational qualities, as the ability to style elements block or inline with the CSS display property suggests.

3. I cannot understand the distinction the draft draws between <em> ("indicates emphasis for its contents") and <strong> ("indicates higher importance for its contents than that of the surrounding content"). The example [2] doesn't help me: "On <strong>Monday</strong> please put the rubbish out, but <em>not</em> before nightfall!"

4. Three forms of inclusion: the embedding attributes (src and srctype); img; and object.

Take courage
============

Given XHTML 2.0 will not actually be backwards compatible, is there any evidence that including unnecessary elements like <img> will "ease the transition to XHTML2" [3] rather than hinder it by making XHTML 2.0 more complicated?

There is a role for a document explaining how existing (X)HTML techniques map to XHTML 2.0 techniques, along the lines of "XForms for HTML authors" [4]. Such a document could explain that whereas in HTML you might use <IMG> to mark up images, in XHTML 2.0 you would always use <object>. It would be worth working on such a document alongside the spec itself (much as the accessibility WG drafts techniques at the same time as drafting guidelines).
This should help ensure that no useful (X)HTML features are lost in the transition to XHTML 2.0. But in general XHTML 2.0 *must* make sense to its author-base on its own terms; it should *not* rely on legacy HTML elements to make itself comprehensible.

Similarly, I think it's deeply confusing to include elements/attributes but then discourage their use, as with the style attribute [5]:

> use of the style attribute is strongly discouraged in favor of the
> style element and external style sheets. In addition, content
> developers are advised to avoid use of the style attribute on content
> intended for use on small devices, since those devices may not support
> the use of in-line styles.

When asked why Ruby on Rails is so popular, one of the reasons identified by creator David Heinemeier Hansson was that [6]:

> Rails is opinionated software. It eschews placing the old ideals of
> software in a primary position. One of those ideals is flexibility—the
> notion that we should try to accommodate as many approaches as
> possible, that we shouldn't pass judgement on one form of development
> over another. Well, Rails does, and I believe that's why it works.

Advocates of the separation of content and style have had to argue from a position of insurmountable weakness because of the presentational elements and attributes included in all previous forms of (X)HTML. The *only* sure way to stop developers using such elements and attributes is to not include them in the standard in the first place. XHTML 2.0 should be an opinionated specification.

But what does it all mean?
==========================

Reading through the draft and the mailing list, there seem to be six ideas about how meaning can be conveyed in XHTML:

1. Language/symbols/punctuation used in context
2. XHTML elements
3. XML elements from other namespaces (e.g. from MathML, SVG, etc.) in XHTML+whatever documents
4. XHTML role attribute
5. class attribute
6. meta element

Does anyone else find this a tad confusing?
As far as I can tell, the dominant idea (though not necessarily the consensus?) is to move the emphasis away from semantic elements to semantic roles. I've seen two justifications for this departure.

First, one element might play more than one "role". This seems a little circular to me. Of course, if you try and carve meaning up into "roles", bits of a document will have more than one role. However, if you describe different elements in terms of their roles, then that multiplicity of roles can be implied by the element name just as easily as it can be stated explicitly by the role attribute. Wouldn't it be more efficient to define roles for elements in a given namespace, and have an XHTML document reference that definition for its elements?

Second, there is a fear of creating a markup language with too many elements. For instance, Laurens Holst wrote [7]:

> There is a basic set of elements in the language to add semantics,
> however a line has to be drawn somewhere, otherwise you’ll end up with a
> docbook-kind of specification and the introduction of <irony> elements.

The irony of this claim is that because XHTML 2.0 is supposed to make use of other XML markup languages where possible, authors of XHTML+whatever documents can make use of more elements than DocBook authors. The XHTML 2 draft currently has 89 elements (including the XForms Module), only two fewer than HTML 4.01's 91 elements [8].

According to the WG's charter [9], the design goal of XHTML 2 "is to use generic XML technologies as much as possible", apparently meaning "W3C's work on areas such as math, scalable vector graphics, synchronized multimedia, voice browsing and forms". In addition to XHTML 2.0's own 89 elements, SVG 1.1 has 81 elements, SMIL 2.1 has 37 elements, VoiceXML 2.0 has 43 elements, and MathML 2 has 301 elements [10-13], which makes a grand total of 551 elements! That's far more than DocBook's 417 elements [14], and even more than TEI P5's current 535 [15].
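For anyone who wants to check the tally, here is a small Python sketch of the arithmetic. The counts are simply the figures quoted in this message from references [8] to [16]; they are not independently recounted from the specifications, and the dictionary labels are just illustrative names.

```python
# Element counts as quoted in this message (refs [8]-[16]); not recounted.
xhtml_plus_whatever = {
    "XHTML 2.0 (incl. XForms module)": 89,
    "SVG 1.1": 81,
    "SMIL 2.1": 37,
    "VoiceXML 2.0": 43,
    "MathML 2": 301,
}
comparisons = {
    "HTML 4.01": 91,
    "Simplified DocBook": 116,
    "DocBook": 417,
    "TEI P5": 535,
}

# Sum every vocabulary an XHTML+whatever author could draw on.
total = sum(xhtml_plus_whatever.values())
print(f"XHTML+whatever: {total} elements")

for name, count in comparisons.items():
    # How many more elements XHTML+whatever offers than `name` does.
    print(f"{name}: {count} elements ({total - count} fewer than XHTML+whatever)")
```

Running it gives a total of 551, which is 134 more than DocBook and 16 more than TEI P5.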
One might object that this is an unfair comparison, given that XHTML+whatever allows you to mark up more than DocBook or TEI. But that wouldn't be *entirely* true, since both DocBook and TEI allow you to mark up things that you can't with XHTML+whatever, e.g. DocBook's <ProductName> or TEI's <soCalled>.

The desire to avoid complexity might be a good argument for (say) splitting XHTML into a basic and advanced version (as with the Simplified DocBook that has only 116 elements [16]), or breaking up complexity into modules (as with XForms). But it's not a great argument for relegating semantics to roles that are not defined in a single specification or repository. That just hides the complexity.

What criteria decide which semantics make the cut as elements? If the idea is that all newly proposed semantics should be roles not elements, then why shouldn't the same go for <em> or <samp>? Why do some semantics (mathematics, graphics) deserve their own modules/markup languages, while others (e.g. sophisticated text markup) must make do with roles? Why can't we replace the current Text Module with one module of handy elements for marking up general text and another specifically for software documentation (which seems to be the use case for <samp> and friends)?

More importantly, shouldn't the meaning implied by web markup be a web standard as far as is possible, and shouldn't all web browsers that support XHTML 2.0 be able to communicate the meaning implied in XHTML 2.0 documents?

In his XTech talk [17], Steven Pemberton, the WG's Chair, says:

> In fact, anyone can add their own role values, so that whole
> communities can agree on new semantics to overlay on to the content.
> In fact this is exactly what microformats are about.

Microformats and roles are fine for machine processing, so long as documents are required to declare which microformat/role profiles they are using. But what about web browsers?
When creating a microformat/role profile, is there a way of declaring a default audio/visual/tangible representation of such semantics that will work even when the user has CSS disabled? Is such a declaration required to create a conformant profile? Are browsers required to download such declarations and apply them? If so, shouldn't it be an *absolute requirement* (not just "best practice") for "the URI associated with" a role's namespace to "resolve to a resource that allows for the discovery of the definition of the roles in the namespace" [18]?

At the moment, it seems that unrecognized roles might be treated in the same haphazard way current UAs treat the title attribute [19]. But if I were to create a "socalled" role, I surely wouldn't want to be dependent on the user happening to choose to right-click and read the properties for the text in question (I certainly wouldn't regard such indication of semantics as particularly accessible). At the very least, elements with unrecognized roles need some sort of signal that they have such roles, just as links are distinguished from normal text.

I hope that, if nothing else, this indicates some areas where communication about how this is going to work needs to be clearer. I *could* have said lots of nice things about the draft (such as its willingness to countenance more than six levels of headings, its naturalized paragraph elements, its simpler markup for navigation lists, its general dedication to the exclusion of presentational elements, its use of XForms, etc.), but I think identifying points that may still need a bit of work may be more helpful. :)

References
==========

[1] http://www.w3.org/TR/2006/WD-xhtml2-20060726
[2] http://www.w3.org/TR/2006/WD-xhtml2-20060726/mod-text.html#edef_text_strong
[3] http://www.w3.org/TR/2006/WD-xhtml2-20060726/mod-image.html#sec_20.1.
[4] http://www.w3.org/MarkUp/Forms/2003/xforms-for-html-authors.html
[5] http://www.w3.org/TR/2006/WD-xhtml2-20060726/mod-styleAttribute.html#s_styleAttributemodule
[6] http://www.oreillynet.com/pub/a/network/2005/08/30/ruby-rails-david-heinemeier-hansson.html
[7] http://lists.w3.org/Archives/Public/www-html/2005May/0158
[8] http://www.w3.org/TR/REC-html40/index/elements.html
[9] http://www.w3.org/2002/05/html/charter
[10] http://www.w3.org/TR/SVG11/eltindex.html
[11] http://www.w3.org/TR/2005/REC-SMIL2-20051213/elements.html
[12] http://www.w3.org/TR/voicexml20/#dml1.4
[13] http://www.w3.org/TR/MathML2/appendixl.html#index.elem
[14] http://www.docbook.org/tdg/en/html/part2.html
[15] http://www.tei-c.org/release/doc/tei-p5-doc/html/REFTAG.html
[16] http://www.oasis-open.org/docbook/xml/simple/sdocbook/elements.html
[17] http://www.w3.org/2005/Talks/05-steven-xtech/
[18] http://www.w3.org/TR/2006/WD-xhtml-role-20060725/
[19] http://lists.w3.org/Archives/Public/www-html/2006Aug/0167

--
Benjamin Hawkes-Lewis
Received on Saturday, 23 September 2006 14:55:27 UTC