Comments on 27 May 2005 Working Draft of XHTML 2

This document contains requests for clarifications, proposals, comments relating to accessibility, editorial and other comments on the 27 May 2005 Working Draft of XHTML 2. It reflects only the view of its author, and does not necessarily represent the view of any organisations or projects in which he is active.

The comments, requests, proposals etcetera are phrased as comments etcetera to the HTML Working Group. The @@todo notation is only used to indicate reminders to the author of this document.

0. General Comments

The list of terms and definitions rightly distinguishes between element, element type and generic identifier (or element type name). However, this distinction is not observed in other sections of the specification. Chapters 7-29 use headings such as “The html element” and “The col and colgroup elements”; for consistency, they should read “The html element type”, etcetera. Also, a phrase like “The pre element indicates that whitespace in the enclosed text has semantic relevance” really means “(The occurrence of) an instance of the pre element type …”. All such phrases should therefore be reworded, e.g. “A p element represents a paragraph.”, “A pre element indicates …”, etcetera. Phrases like “The element and attributes defined by this module are …” should be reworded as “The element and attribute types defined by this module are …”.

The HTML 4.01 specification listed the attributes for each element in the section where that element was discussed, which made it a more usable reference than the current XHTML 2.0 draft.

Many element and attribute type names (p, pre, abbr, img, src, href, col, td, …) are abbreviations and should be marked up accordingly.

Some basic markup for scientific equations and formulae would not be amiss. Learning MathML is too much trouble for people who want to include simple mathematical or chemical expressions in XHTML documents.

1. Introduction

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/introduction.html.

Regarding major differences with XHTML 1, the introduction says: “the definitions of whitespace are given by XML for input, and CSS for output”. Please clarify what “input” and “output” mean here. It appears that the editors of the HTML and XHTML Frequently Answered Questions and some people on the HTML IG mailing list misinterpret xml:space, as has been pointed out by Björn Höhrmann (14 July 2004) and others (1 August 2005). Contrary to what the HTML and XHTML Frequently Answered Questions says, xml:space does not merely control whether whitespace will be present in the DOM. The XML specification says: “A special attribute named xml:space MAY be attached to an element to signal an intention that in that element, white space should be preserved by applications” (emphasis on “applications” added). Refer to the definitions in the introduction of the XML specification:

[Definition: A software module called an XML processor is used to read XML documents and provide access to their content and structure.] [Definition: It is assumed that an XML processor is doing its work on behalf of another module, called the application.] This specification describes the required behavior of an XML processor in terms of how it must read XML data and the information it must provide to the application.

In the case of XHTML 2, the application is the user agent or browser; the DOM is an API for XML processors. If xml:space="preserve" is defined on an element, it is the browser, and not just the DOM, that should preserve whitespace. If a browser collapsed whitespace in spite of xml:space="preserve" before looking at the CSS, then the CSS rule white-space:pre could have no effect. A browser should only collapse whitespace if a stylesheet (author-defined, user-defined or the browser's default stylesheet/white-space handling) defines this kind of whitespace handling. However, according to the XML specification, xml:space="preserve" means that applications should preserve whitespace: the specification uses a lowercase should instead of an RFC2119 “MUST”, and it therefore allows applications to use their default whitespace handling (which really means: override xml:space="preserve").
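The ordering argument above can be illustrated with a minimal Python sketch. It rests on two simplifying assumptions: the hypothetical render function models a user agent as an optional early collapse step followed by the CSS white-space value, and collapse_whitespace approximates white-space: normal by turning every run of whitespace into a single space (the real CSS rules are more detailed). Because collapsing is lossy, a later white-space: pre cannot restore what an earlier stage has already discarded.

```python
import re

def collapse_whitespace(text: str) -> str:
    """Approximate CSS white-space: normal: each run of whitespace becomes one space."""
    return re.sub(r"\s+", " ", text)

def render(text: str, collapse_early: bool, css_white_space: str) -> str:
    """Hypothetical pipeline: an optional early collapse that ignores
    xml:space="preserve", followed by the CSS white-space value."""
    if collapse_early:
        text = collapse_whitespace(text)
    if css_white_space == "normal":
        return collapse_whitespace(text)
    # "pre": keep whatever whitespace survived the earlier stage
    return text

source = "If\n   I  had"
print(repr(render(source, collapse_early=False, css_white_space="pre")))
print(repr(render(source, collapse_early=True, css_white_space="pre")))
```

The first call keeps the line break and the runs of spaces; the second returns 'If I had', which no later CSS rule can turn back into the original layout.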

2. Terms and Definitions

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/terms.html.

The definition of deprecated says: a feature marked as deprecated is in the process of being removed from this recommendation. In HTML 4, deprecated meant that a feature was outdated by newer constructs and that it might become obsolete in future versions of HTML. Saying “this recommendation” may give the impression that a feature may be removed in future drafts of this version (XHTML 2) — as opposed to future versions of XHTML (XHTML 3?). Please clarify if this difference is intentional and make the wording unambiguous.

Editorial comment: provide each definition with an ID, so that each can be referenced from other chapters of the specification.

3. Conformance Definition

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/conformance.html.

The dark orange background of some lists and paragraphs provides poor contrast with the text. Please choose another colour. Contact WCAG WG if in doubt.

The namespace URI for XHTML 2.0 is defined to be http://www.w3.org/2002/06/xhtml2/. Is this URI definitive or will it be adapted to the publication date of the final recommendation? I find the latter option preferable. A URI that does not reflect the publication date of the recommendation will provoke comments, even though namespace URIs are “just names”. (The namespace URI is also mentioned in the chapter on the XHTML Document Module.)

Example of an XHTML 2.0 document: at the end of the example code, there is a closing </pre> tag after the </html> end tag; the </pre> tag should be removed.

Change the heading XHTML Family User Agent Conformance to “XHTML 2 Family User Agent Conformance”.

Criterion 1: please add the requirement that user agents should inform the user when a document is not well-formed or valid, or that this information should at least be available to the user upon request.

Criterion 4 (“If a user agent encounters an element it does not recognize, it must continue to process the content of that element.”) and criterion 5 (“If a user agent encounters an attribute it does not recognize, it must ignore the entire attribute specification (i.e., the attribute and its value).”) describe how user agents must cope with unknown markup. However, documents must conform to the schemas for XHTML 2. Does this mean that documents must be valid but that the user agent's XML processor is not required to validate documents? Please make this explicit in the specification.
What should a user agent do with attributes such as id, class, xml:space and xml:lang if they appear in an element that it does not recognise? What should a user agent do with any attributes that appear in an element that it does not recognise? Should elements and attributes that are not recognised become nodes in the DOM or not? Does it make a difference whether these elements and attributes are defined in XHTML 2 or not? Please clarify these issues in the specification.

Criterion 8 (“White space must be handled according to the rules of [XML]. All XHTML 2 elements preserve whitespace.”): please clarify how this is defined in the schema(s). Please apply xml:space="preserve" only to elements where all whitespace is significant (e.g. pre). If xml:space="preserve" is defined on every element, it will be necessary for every stylesheet to define html {white-space: normal;} and then override this again for the elements where whitespace should be preserved (see the comment on whitespace in the introduction). If you find that you really must preserve whitespace in all elements, please don't use <!ATTLIST pre xml:space (preserve) #FIXED 'preserve'> (or any of its equivalents in other schema languages) but <!ATTLIST pre xml:space (default|preserve) 'preserve'>.

Please add a criterion that requires a user agent to inform the user when it encounters a document in an HTML/XHTML format it does not know. (Jukka Korpela suggested this earlier in his critical review of the HTML 4.0 draft.)

Please add what media type or types is/are required or allowed for serving XHTML 2, or add a statement that this will be clarified in an update of the W3C Note XHTML Media Types.

4. The XHTML 2.0 Document Type

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/xhtml2-doctype.html.

Issues: “Identifying XHTML version in ansence of DTDs.” (“ansence” should read “absence”.)

Notes: “for the present, DTD's are required for entity resolution.” It is possible for an XML document to reference both a DTD (which can contain entities) and another type of schema. In fact, this is what the example in the section on conformance does. Please clarify why this is an issue.

5. Module Definition Conventions

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/abstraction.html.

The section Abstract Module Definitions defines the term “abstract module”, although it is already defined in Terms and Definitions. Please make the definitions consistent with each other. (If the definitions had IDs, you would be able to reference them.)

6. XHTML Attribute Collections

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-attribute-collections.html.

Forms: “Attributes that designate provide a mechanism of repeating table rows within a form.” Attributes that designate what? Or should it be “Attributes that provide a mechanism for repeating table rows within a form”, i.e. without “designate” and with “for” instead of “of”?

7. XHTML Document Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-document.html.

The html element type: the version attribute seems superfluous with xmlns. (Unless it is a way to identify document profiles such as xhtml+voice or xhtml+mathml+svg, but this is not made clear by the description.)

The title element type: for reasons of accessibility, please allow abbreviations/acronyms and language switches inside the content of this element type (instead of merely PCDATA). Language switches are also important for internationalisation. This would allow titles like

<title><abbr title="Web Content Accessibility Guidelines">WCAG</abbr> 2.0</title>

and

<title>What computers <em>can't</em> do</title>

and

<title xml:lang="fr">Introduction à <span xml:lang="en">Smalltalk-80</span></title>

and

<title xml:lang="fr">Introduction à <abbr xml:lang="en" title="Evidence-Based Medicine">EBM</abbr></title>

(The above examples are based on titles actually seen on the Web; for example the article What computers can't do tries to do this in HTML 4.01. A proposal for richer markup for abbreviated forms can be found in the comments in section 9.)

For reasons of accessibility, identifying changes in natural language is a requirement in WCAG: at level 1 in WCAG 1.0, and at level 2 in the 30 June draft of WCAG 2.0. Please provide features to support this in every element type where such changes can occur.

For reasons of accessibility, please also encourage the use of meaningful text instead of something like <title>:: CEN :: Specifications for a a complete European Web Accessibility certification scheme and a Quality Mark - WS/WAC.</title>, where the characters before and after “CEN” are meaningless, especially to a screen reader user.

The body element type: editorial comment: use active instead of passive voice where possible. Replace “The content may be processed by a user agent in a variety of ways. For example by visual browsers it can be presented as text, images, colors, graphics, etc., …” with “A user agent may process the content in a variety of ways. For example, a visual browser can present it as text, images, colors, graphics, etc., …”

8. XHTML Structural Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-structural.html.

The p element type: allowing blockcode, blockquote, pre and table inside paragraphs is completely unnecessary since the introduction of the section element. Please keep the content model of the p element type clean.

The pre element type: because of the existence of blockcode, xml:space, layout and the CSS property white-space, this element type is now redundant. Poems where whitespace has meaning should also work with something like the following:

<p class="poem" layout="relevant">
          If
       I      had
   any           talent
       I      would
         be a

        poet
</p>

Since xml:space="preserve" only indicates that applications should preserve whitespace, but are not required to do so, that attribute is quite superfluous here. The above code should generate the following rendering:

          If
       I      had
   any           talent
       I      would
         be a

        poet

The separator element type: editorial comment: the second line in the second example reads <lable>Navigation</label> instead of <label>Navigation</label>.

Since HTML and XHTML are very frequently used to publish news (not only on newspaper sites but also on blogs and many other types of sites), please add a pullquote element type and other element types that are useful for publishing news.

9. XHTML Text Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-text.html.

The abbr element type: please rephrase “The abbr element indicates that a text fragment is an abbreviation (e.g., W3C, XML, Inc., Ltd., Mass., etc.); this includes acronyms” as “An abbr element indicates that a text fragment is an abbreviated form (e.g., W3C, XML, Inc., Ltd., Mass., etc.); abbreviated forms include abbreviations, acronyms, and initialisms”. Alternatively, introduce richer markup, as in the proposal for abbreviated forms below.

The requirement that the title attribute must be used on every abbr element is not a good idea, and a superficial glance at any HTML or XHTML specification is sufficient to illustrate this. Many element type names (p, abbr, a, div, …) are abbreviations and must be marked up as such! However, if each instance were marked up as an abbreviation with a title attribute, a screen reader user who has title reading turned on would need to turn that option off in order to find out the actual element type names. Below, in my comment on the dfn element type, I provide a solution to this.

One consequence of the above is that the markup of the XHTML 2 specification itself is incorrect: instead of

<h2><a id="sec_9.1." name="sec_9.1.">9.1.</a> The <a class="edef" id="edef_text_abbr">abbr</a> element</h2>

the code for the headings should use the following pattern:

<h2><a id="sec_9.1." name="sec_9.1.">9.1.</a> The <a class="edef" id="edef_text_abbr"><code><abbr>abbr</abbr></code></a> element type</h2>

It is disappointing that support for abbreviated forms has become poorer compared to previous versions of HTML and XHTML, especially since the late 20th century saw the emergence of more creative ways to abbreviate names and phrases, such as forms that use recursion (PHP) and forms that replace a sequence of the same letter with a single letter followed by a number indicating frequency (W3C). Some abbreviated forms have a meaning that is forgotten by the majority of people (e.g. replacing IBM with International Business Machines would confuse many people), some lose their original meaning (e.g. DVD apparently no longer means Digital Versatile Disk; CERN is the short name of European Organization for Nuclear Research rather than the abbreviation of Centre Européen pour la Recherche Nucléaire), and some have a meaning in a language that differs from what people would expect (ABS stands for Anti-Blockier-System, invented in 1980 by Bosch).

The HTML 4 specification was confusing on the subject of abbreviations and acronyms (see for example <ABBR> vs <ACRONYM> in the HTML 4 Specification by Ben Meadowcroft, HTML is not an acronym … by Craig Saila, and ABBR and ACRONYM are for user agents not for end users by Jesper Tverskov). However, removing element types is not the appropriate response. We need richer, not poorer markup. Another downside of the current proposal is that the expansion of abbreviated forms must still be provided in an attribute instead of an element, and this does not take into account that expanded forms can also contain abbreviated forms. The following element types should cover most types of abbreviated forms:

abbr for abbreviations: single words that are abbreviated; examples:
- <abbr>etc</abbr>,
- <abbr xml:lang="de"><title>beziehungsweise</title>bzw.</abbr>.
acronym for abbreviated forms that are created by taking the initial letters of a string of words (possibly not each word) and that can be read as a word; examples:
- <acronym>NASA</acronym>,
- <acronym>WAI</acronym>,
- <acronym xml:lang="fr"><title>Société Anonyme Belge d'Exploitation de la Navigation Aérienne</title>SABENA</acronym>,
- <acronym>radar</acronym>,
- <acronym>laser</acronym>,
- Web<acronym title="Accessibility in Mind">AIM</acronym>.
initialism for abbreviated forms that are created by taking the initial letters of a string of words (possibly not each word) and that cannot be read as a word; examples:
- <initialism>RNIB</initialism>,
- <initialism>WCAG</initialism>,
- <initialism lang="de" xml:lang="de"><title>Gesellschaft mit beschränkter Haftung</title>GmbH</initialism>
- <initialism><title>for example</title>e.g.</initialism> (in this case, most people would find the original expansion of this initialism — exempli gratia — confusing).
abbrform for abbreviated forms that do not fit in any of the above categories, for example because they are the result of a creative use of characters; examples:
- <abbrform>Web<acronym>AIM</acronym></abbrform>,
- <abbrform><title>Inclusive Design Curriculum Network</title><initialism>IDC</initialism><abbr>net</abbr></abbrform>,
- <abbrform><title>World Wide Web Consortium</title>W3C</abbrform>,
- <abbrform><title>European Atomic Energy Community</title><abbr>Eur</abbr><abbr>atom</abbr></abbrform>,
- <abbrform><title>octavo</title>8vo</abbrform>,
- <abbrform><title>system administrator</title><abbr>sys</abbr><abbr>admin</abbr></abbrform>,
- <abbrform><title><initialism>PHP</initialism> Hypertext Preprocessor</title><initialism>PHP</initialism></abbrform> (recursion in an initialism).

An alternative proposal is to make abbrform the container for all abbreviated forms (and only allow abbr, acronym and initialism inside abbrform) and to use a more specific element type to indicate the type of abbreviation, as in the following examples.

<abbrform><abbrform>W3C</abbrform><title>World Wide Web Consortium</title></abbrform>,
<abbrform lang="de" xml:lang="de"><initialism>GmbH</initialism><title>Gesellschaft mit beschränkter Haftung</title> </abbrform>,
<abbrform><initialism>PHP</initialism><title><initialism>PHP</initialism> Hypertext Preprocessor</title></abbrform>.

The cite element type can be used for a citation or a reference to other sources, but it should really be restricted to citations (references to cited works). If markup for speakers is necessary, please define an appropriate element type (e.g. speaker).

The dfn element type: the definition of this element type is not clear: “defining instance” could be understood to mean “definition” instead of the occurrence of the term where that term is defined (which is possibly also the first occurrence of that term); see Terms That Are Misunderstood, by Tommy Olsson. The element type name dfn is confusing because it looks like an abbreviation for “definition”; please rename it, remove it or make the definition unambiguous. Because of the way dfn is currently defined, there is no markup for definitions and no mechanism for linking definitions with “defining instances” or abbreviated forms. Therefore, I propose the following mechanism, which is inspired by HTML 4's label element.

Create a definition element type and make it contain the actual definition of a term, phrase or abbreviated form that is used elsewhere in a document (or “delivery unit”, in WCAG parlance). This would allow the following use:

An <span id="term-acronym">acronym</span> is <definition for="term-acronym">a word formed from the initial letters or groups of letters of words in a set phrase or series of words</definition>. In <initialism id="acr-HTML">HTML</initialism> (<definition for="acr-HTML" id="term-HTML">Hypertext Markup Language</definition>), acronyms are marked up with the acronym element type.(…)
Hypertext Markup Language is <definition for="term-HTML"><quote cite="http://boxmind.leeds.ac.uk/lectures/frank_van_harmelen/glossary.htm#html">the authoring language used to create documents on the World Wide Web</quote></definition>.

This approach has several advantages:

There is now an element type for definitions (which is not the case with the current draft).
It is possible to create a “definition cascade” or “definition chain” (see the HTML example above).
It is possible to provide definitions of abbreviated forms without duplicating the content of the title attribute of abbr, so the requirement that the title attribute must be used on every abbr element can be removed.
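A definition chain like this can be resolved mechanically by following the for attributes as IDREFs. The following Python sketch uses xml.etree.ElementTree on a small fragment adapted from the example above; definition is the element type proposed here, not part of any XHTML draft, and definition_chain is a hypothetical helper. If several definitions point at the same id, the sketch simply keeps the last one.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Hypothetical markup using the proposed definition element type; its for
# attribute is read as an IDREF to the term or abbreviated form it defines.
FRAGMENT = """<div>
  <p>In <initialism id="acr-HTML">HTML</initialism>
  (<definition for="acr-HTML" id="term-HTML">Hypertext Markup Language</definition>),
  acronyms are marked up with the acronym element type.</p>
  <p>Hypertext Markup Language is <definition for="term-HTML">the authoring
  language used to create documents on the World Wide Web</definition>.</p>
</div>"""

def definition_chain(root: ET.Element, start_id: str) -> List[str]:
    """Follow definition/@for links from start_id and return each definition's text."""
    by_for = {d.get("for"): d for d in root.iter("definition")}
    chain: List[str] = []
    seen = set()  # guards against circular definitions
    current: Optional[str] = start_id
    while current is not None and current in by_for and current not in seen:
        seen.add(current)
        definition = by_for[current]
        # Join nested text and normalise whitespace for display.
        chain.append(" ".join("".join(definition.itertext()).split()))
        # The definition's own id lets a further definition refer to it.
        current = definition.get("id")
    return chain

root = ET.fromstring(FRAGMENT)
for step in definition_chain(root, "acr-HTML"):
    print(step)
```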

The quote element type: please explain what types of values the cite attribute can accept: only URIs?

The strong element type: please mention that strong really means strong emphasis, i.e. an even stronger emphasis than provided by the em element type.

The sup element type: this section states that “Many scripts (e.g., French) require superscripts or subscripts for proper rendering.” However, superscript does not appear to be required in the abbreviation Mlle (second example). Please provide an example where superscript is required, e.g., C^ie or n^o. In some languages, many authors use superscript where this is not correct: e.g. in Dutch, 2^de should be 2de. In French, superscript is only required if the second part of the abbreviation consists of only vowels:

Abbreviation procedures

a) The word is reduced to its beginning, and the abbreviation ends with a full stop.
(…)

b) The word is reduced to its beginning and its end, and in this case there is no full stop. The ending is written above the line in smaller type.

M^e for Maître; M^me for Madame; M^lle for Mademoiselle;
M^gr for Monseigneur; D^r for Docteur.
C^ie for compagnie; n^o for numéro.

In handwritten text, the second part of the abbreviation is often written on the same line as the initial letter: Mme, Mlle, Mgr, Dr; this is ruled out when the second part consists only of vowels: *Me, *no would risk being misunderstood.

Nouvelle grammaire française. 3^e édition. Louvain-la-Neuve: Duculot, 1995. p. 41 (translated from French).

Please add note, endnote, footnote and/or sidenote element types to the specification. It is strange that endnotes and footnotes were not introduced earlier, especially since HTML was originally a language for marking up and linking scientific information at CERN. (Joe Clark's blog entry on footnotes points out that footnotes within footnotes, as used by David Foster Wallace, cannot be adequately represented in HTML. Authors resort to hacks to simulate footnotes; see for example John Gruber's footnotes.)

10. XHTML Hypertext Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-hypertext.html.

The a element type: it would be a good idea to explain the href attribute type before explaining that the a element type is not strictly necessary. A reader who reads the specification from beginning to end has not yet seen the explanation of href on arriving at this section.

For reasons of accessibility, please emphasize that link text should be meaningful, i.e. not just “here”, “go” or the URL contained in the href attribute. For greater device independence, please add that link text should not mention the device that may be used to activate a hyperlink, as in “click here”.

I am not in favour of allowing href on every element type; please exclude div, h, hx, blockquote, blockcode, p, form, table, tbody, thead, tfoot, tr, ol, ul, dl, nl, object. If large blocks are allowed to serve as links, users may often activate a link by accident.

11. XHTML List Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-list.html.

The ol and ul element types: please specify where numbering starts for ol (e.g. at 0 or 1 for roman numerals?) and if language and locale allow for other types of numbering.
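The question about Roman numerals matters because Roman numerals have no symbol for zero. A minimal Python sketch of a conventional integer-to-Roman conversion (the name to_roman is hypothetical, not taken from any specification) shows that an item numbered 0, as in the value example below, has no Roman marker at all, so the specification would need to say what a user agent does in that case.

```python
from typing import Optional

# (value, numeral) pairs in descending order, including subtractive pairs.
ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
         (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
         (5, "V"), (4, "IV"), (1, "I")]

def to_roman(n: int) -> Optional[str]:
    """Return the upper-case Roman numeral for n, or None when n < 1."""
    if n < 1:
        return None  # there is no Roman numeral for zero or negative numbers
    parts = []
    for value, numeral in ROMAN:
        count, n = divmod(n, value)
        parts.append(numeral * count)
    return "".join(parts)

for number in (0, 1, 4, 9, 14):
    print(number, to_roman(number))
```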

The li element type: please provide an example that uses the value attribute. For example, does the following example show what is intended?

<ol>
  <li value="0">[associated number is zero instead of 1]</li>
  <li>[associated number is 1]</li>
  <li value="5">[associated number is 5]</li>
  <li>[associated number is 6]</li>
</ol>

Please clarify how this interacts with the numbering systems defined in CSS 2's list-style-type property.
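Assuming the HTML 4 semantics for value (an explicit value sets the item's number and later items continue counting from it), the numbering in the example above can be computed with a short Python sketch; item_numbers and its start parameter are hypothetical names used only for illustration.

```python
from typing import List, Optional, Sequence

def item_numbers(values: Sequence[Optional[int]], start: int = 1) -> List[int]:
    """Number list items: an explicit value wins, otherwise previous number + 1."""
    numbers: List[int] = []
    current = start - 1
    for value in values:
        current = value if value is not None else current + 1
        numbers.append(current)
    return numbers

# None stands for an li without a value attribute.
print(item_numbers([0, None, 5, None]))  # [0, 1, 5, 6]
```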

For reasons of accessibility, it would be good if user agents were allowed (through a configuration option) to use a long numbering format for nested ordered lists. This would make it easier for screen reader users to keep track of their location in a nested list, as in the following example.

<ol>
  <li>Tea
    <ol>
      <li>Darjeeling</li>
      <li>Assam
        <ol>
          <li>Dejoo Assam</li>
          <li>Gingia</li>
          <li>Tarajulie</li>
        </ol>
      </li>
      <li>Ceylon Orange Pekoe</li>
    </ol>
  </li>
  <li>Coffee</li>
</ol>

If a long numbering format were allowed, the above code example could result in the following rendering:

1. Tea
  1.1 Darjeeling
  1.2 Assam
    1.2.1 Dejoo Assam
    1.2.2 Gingia
    1.2.3 Tarajulie
  1.3 Ceylon Orange Pekoe
2. Coffee
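A user agent could derive the long numbering format from the nesting depth alone. The following Python sketch assumes a hypothetical representation of the list as (label, children) pairs and indents each nesting level by two spaces; for the tea example above it produces the numbers 1., 1.1, 1.2.1 and so on.

```python
from typing import List, Tuple

# A list item is (label, nested items); leaf items have an empty nested list.
Item = Tuple[str, List["Item"]]

def long_numbers(items: List[Item], prefix: Tuple[int, ...] = (), indent: str = "  ") -> List[str]:
    """Return one line per item, numbered 1., 1.1, 1.2.1, ... and indented by depth."""
    lines: List[str] = []
    for position, (label, children) in enumerate(items, start=1):
        number = prefix + (position,)
        marker = ".".join(str(part) for part in number)
        if len(number) == 1:
            marker += "."  # top-level items keep the usual trailing full stop
        lines.append(indent * (len(number) - 1) + marker + " " + label)
        lines.extend(long_numbers(children, number, indent))
    return lines

TEAS: List[Item] = [
    ("Tea", [
        ("Darjeeling", []),
        ("Assam", [("Dejoo Assam", []), ("Gingia", []), ("Tarajulie", [])]),
        ("Ceylon Orange Pekoe", []),
    ]),
    ("Coffee", []),
]

print("\n".join(long_numbers(TEAS)))
```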

12. XHTML Core Attributes Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-core.html.

class attribute type: please clarify that this can contain a space-separated list of classes (for people who are not used to reading schemas).

layout=irrelevant*|relevant: does the asterisk after irrelevant have a special meaning (default value?) or is it a typographical error?

Is irrelevant the default for all element types, meaning that it contradicts xml:space="preserve"?

13. XHTML Hypertext Attributes Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-hyperAttributes.html.

The nextfocus attribute type: case 2.1: If the root element of the document has a nextfocus attribute, and the element referred to by the attribute is focusable, the element must receive focus. Please add “when the user requests that the user agent navigate to the next element that can receive focus”, unless you mean “when the document is loaded”.

The first rule for determining the next focus means that when (for example) a screen reader user has tabbed to the end of a document, focus automatically returns to the beginning. However, some notification that the end of the document has been reached seems desirable, to prevent users from “tabbing in circles” before they notice that they are rereading the same document.

Please also add that nextfocus (which I assume replaces HTML 4.0's tabindex) should only be used if the document's normal tab order has usability issues.

I don't think tables are a good example to demonstrate nextfocus: first, WCAG 1.0 requires that tables linearize properly, then, XHTML 2.0 encourages the use of nextfocus to obscure the table structure?! Please provide an example where nextfocus actually helps the user! An alternative example follows.

<html nextfocus="search" xmlns="http://www.w3.org/2002/06/xhtml2/" xml:lang="en"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.w3.org/2002/06/xhtml2/
    http://www.w3.org/MarkUp/SCHEMA/xhtml2.xsd">
…
<ul class="navigation">
  <li>
    <a href="index.htm" id="homelink" nextfocus="sitemaplink">Home Page</a>
  </li>
  …
  <li>
    <a href="sitemap.htm" id="sitemaplink" nextfocus="content">Site Map</a>
  </li>
</ul>
…
<form method="get" name="search" action="search.jsp">
  <p><label for="search">Search: </label>
    <input nextfocus="homelink" id="search" 
      name="search" type="text" size="20" maxlength="40" />
  </p>
  <p>
    <span class="button"><input type="submit" value="Go!" /></span>
  </p>
</form>
…
<h1 id="content">Services</h1>
[Main content: description of services]

The above example first takes the user to the search form, then to the “Home Page” link, then to the “Site Map” link, and finally to the actual content. After that, focus goes to the next focusable element in document order.
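The focus order described above can be checked against a deliberately simplified Python model of nextfocus. The assumptions are significant: this is not the draft's full algorithm; the model treats the listed ids as the only focusable elements (including the h1 with id content, plus a hypothetical submit id for the Go! button and a hypothetical contact id for a link after the content); it follows a nextfocus attribute when its target is focusable and otherwise moves on in document order; and at the end of the document it stops instead of wrapping, which is where a user agent could notify the user, as suggested earlier.

```python
from typing import Dict, List, Optional

def next_focus(current: Optional[str], order: List[str],
               nextfocus: Dict[str, str], root_nextfocus: Optional[str]) -> Optional[str]:
    """Simplified model: follow nextfocus if its target is focusable,
    otherwise move to the next focusable element in document order."""
    focusable = set(order)
    if current is None:
        # Nothing has focus yet: the root element's nextfocus is tried first.
        if root_nextfocus in focusable:
            return root_nextfocus
        return order[0] if order else None
    target = nextfocus.get(current)
    if target in focusable:
        return target
    index = order.index(current) + 1
    # None signals the end of the document instead of silently wrapping around.
    return order[index] if index < len(order) else None

def focus_sequence(order: List[str], nextfocus: Dict[str, str],
                   root_nextfocus: Optional[str], limit: int = 50) -> List[str]:
    """Collect the focus order, stopping at the end or after limit steps (cycle guard)."""
    sequence: List[str] = []
    current: Optional[str] = None
    for _ in range(limit):
        current = next_focus(current, order, nextfocus, root_nextfocus)
        if current is None:
            break
        sequence.append(current)
    return sequence

# Focusable ids in document order; "submit" and "contact" are hypothetical.
ORDER = ["homelink", "sitemaplink", "search", "submit", "content", "contact"]
NEXTFOCUS = {"homelink": "sitemaplink", "sitemaplink": "content", "search": "homelink"}

print(focus_sequence(ORDER, NEXTFOCUS, root_nextfocus="search"))
```

Under these assumptions the sequence is search, homelink, sitemaplink, content, contact. The model never reaches the hypothetical submit button, because the chain jumps from the search field back to the navigation links and then past the form; authors using nextfocus may need to watch for such gaps.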

14. XHTML I18N Attribute Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-i18n.html.

Regarding language inheritance: why is the HTTP Content-Language header not considered here? Why are meta elements (e.g. with Dublin Core: <meta property="dc:language">en</meta>) not considered here? If they don't count (even though the author property has been replaced with the Dublin Core creator property), please make this explicit.
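To make the question concrete, here is a minimal Python sketch of one possible resolution order, which is an assumption and not what the draft specifies: the nearest xml:lang on the element or an ancestor wins, and only if none is found does a document-level value (for instance from an HTTP Content-Language header or a Dublin Core language property) apply. The function name language_of and the content_language parameter are hypothetical; the child-to-parent map is needed because ElementTree elements do not link to their parents.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# ElementTree's name for the xml:lang attribute (the xml prefix is predeclared).
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def language_of(element: ET.Element, parents: Dict[ET.Element, ET.Element],
                content_language: Optional[str] = None) -> Optional[str]:
    """Nearest xml:lang on the element or an ancestor; otherwise the
    document-level fallback (e.g. a Content-Language value), if any."""
    node: Optional[ET.Element] = element
    while node is not None:
        lang = node.get(XML_LANG)
        if lang is not None:
            return lang  # an explicit xml:lang="" is returned as-is
        node = parents.get(node)
    return content_language

XHTML = "{http://www.w3.org/2002/06/xhtml2/}"
doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/2002/06/xhtml2/" xml:lang="fr"><body>'
    '<p>Introduction à <span xml:lang="en">Smalltalk-80</span></p></body></html>')
# Build a child-to-parent map so the sketch can walk up the ancestors.
parents = {child: parent for parent in doc.iter() for child in parent}
print(language_of(doc.find(f".//{XHTML}span"), parents))  # en
print(language_of(doc.find(f".//{XHTML}p"), parents))     # fr
```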

Please also state what authors should do with the xml:lang attribute on the html element if there is no obvious “main language” in the page. For example, an XHTML page may display a type sample of Latin, Cyrillic, and Greek glyphs that has no language and hence can have no language code; see the test file at joeclark.org/dossiers/FontGlyphSynopsis-2.html (discussed by Joe Clark in his blog entry of 19 April 2005).

Please also recommend that authors indicate all changes in natural language for reasons of accessibility.

Editorial comment: I18N is an abbreviation and should be marked up as such (<abbr title="Internationalization">I18N</abbr>).

15. XHTML Bi-directional Text Attribute Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-bidi.html.

Please move the definition of block-level and inline element to the section “Terms and Definitions” and link to it.

16. XHTML Edit Attributes Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-edit.html.

Previous HTML specifications did not allow authors to indicate that a deletion and an insertion belonged together and constituted a replacement. The proposed Edit Attributes Module still does not allow this unless something like the following is valid markup:

<p>I will do it next 
  <span edit="changed">
    <span edit="deleted">week</span>
    <span edit="inserted">month</span>
  </span>.
</p>

Are nested span elements allowed in XHTML 2? Alternatively, XHTML 2 might preserve the del and ins elements and add optional ID & IDREF attribute types that allow linking insertions and deletions. (Jukka Korpela suggested other elements in his critical review of the HTML 4.0 draft.)

Why not add an attribute that allows the author to insert a comment or rationale for the edit, and an attribute that can contain the name or other identifier of the editor? With these attributes added, XHTML 2 would support collaborative authoring and editing of documents, just like some word processing formats.

17. XHTML Embedding Attributes Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-embedding.html.

Please specify that even when accessing the remote resource does not fail, the user should still have access to the content of the element (for reasons of accessibility). In addition to this, the user agent must provide a means to access this element content, even if the embedded resource is available and supported by the user agent (including plug-ins). For example, it is possible that a user who is deaf or hard of hearing has media players installed for video and that these same media players also handle audio; when audio is present in a page, the user should be able to get at the alternative content (for example, a transcript) without first uninstalling or disabling his media players.

18. XHTML Handler Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-handler.html.

(No comment.)

19. XHTML Image Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-image.html.

This section says: “Like the object element, this element's content is only presented if the referenced resource is not presentable.” However, for reasons of accessibility, it is important to add that user agents must allow users to access the element's content even if the referenced resource is “presentable”.

What does “presentable” mean? If the user has turned off images, but the image is available at the given URI, is the image considered “presentable” or not?

Another accessibility issue: there is a well-documented convention (see for example Creating Accessible Images by WebAIM, Jim Thatcher's web accessibility tutorial, IBM's Web accessibility developer guidelines, Alexander Day's presentation notes on text alternatives, RNIB's Web Access Centre) that HTML 4.x/XHTML 1.x img elements that screen readers should ignore (especially spacer images and purely decorative images) have an empty alt attribute. However, as Joe Clark has pointed out on the WCAG mailing list, this convention is not in the specification. In other words, the meaning of an empty alt attribute is undefined. XHTML 2 replaces the alt attribute type with mixed content inside the img element type. Please define the meaning of an empty img element (i.e. one without a text alternative).
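The cases that need a defined meaning can be contrasted as follows (file names and text are placeholders; the first fragment is HTML 4.x/XHTML 1.x, the others follow the XHTML 2 draft's mixed-content model for img):

```xml
<!-- HTML 4.x / XHTML 1.x convention: empty alt, so screen readers skip the image -->
<img src="spacer.gif" alt="" />

<!-- XHTML 2: the text alternative is the element content -->
<img src="logo.png">Example Company</img>

<!-- XHTML 2: no content at all; the meaning of this case is currently undefined -->
<img src="spacer.gif" />
```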

The src attribute may be applied to any element, but XHTML still has no element or attribute to provide captions for images (or other objects), even though captions exist for tables. Why not add an optional caption element to the content model of img? The content model could be (#PCDATA | Text)* | (caption, (#PCDATA | Text)*).
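A sketch of an image with a caption under the proposed content model. The caption child of img is hypothetical and not part of the draft; the file name and text are invented:

```xml
<img src="figure1.png">
  <!-- hypothetical caption element: presented to all users, like a table caption -->
  <caption>Figure 1. Number of visitors per month in 2004</caption>
  <!-- text alternative, as in the current draft -->
  A bar chart showing that the number of visitors rose steadily from January to December.
</img>
```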

20. XHTML Image Map Attributes Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-csImgMap.html.

Please add that, for reasons of accessibility, client-side image maps are preferred to server-side image maps. Server-side image maps should only be used if a client-side implementation is not feasible (for example, if exact co-ordinates — as opposed to regions — are required).

21. XHTML Media Attribute Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-mediaAttribute.html.

(No comment.)

22. XHTML Metainformation Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-meta.html.

Forward and reverse links: this section says that Both the rel and rev attributes may be specified simultaneously. Can you provide an example where providing both simultaneously makes sense? It seems contradictory that the relationship and the reverse relationship of one document with another can be the same. If hreflang is defined in addition to rel and rev, is it correct to assume that hreflang applies to both relationships?

The meta element type: (last two examples) if both the href attribute and the element content contain a URI, which one takes precedence?
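A minimal case of the ambiguity, with placeholder URIs at example.org; the dc prefix is assumed to be bound to the Dublin Core element set namespace:

```xml
<!-- Which URI is the value of dc:source: the href attribute or the element content? -->
<meta xmlns:dc="http://purl.org/dc/elements/1.1/"
      property="dc:source"
      href="http://www.example.org/original.html">http://www.example.org/copy.html</meta>
```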

For reasons of accessibility, please discourage authors from using meta to refresh the current page after a number of seconds or to forward to another page.

Chaining Metadata: according to the explanation of the penultimate example, <link rel="dc:source" href="urn:isbn:0140449132"> means that The quote has a source of Crime and Punishment. However, ISBN numbers do not uniquely identify books but editions of books. The famous Constance Garnett translation of Crime and Punishment has a different ISBN than Jessie Coulson's translation, and both have a different ISBN than an edition of the original Russian text (which obviously did not have an ISBN number when it was published for the first time, because ISBN was introduced in the second half of the 20th century). Moreover, a hardcover and a flexicover edition of the same book (i.e. the same edition which is available in different bindings) also get different ISBN numbers. The ISBN number provided in the example does not refer to the “work” Crime and Punishment but to the paperback edition of David McDuff's translation, published by Penguin in the series Penguin Classics in 2002, and to this edition only. Please correct the explanation of the example.

23. XHTML Metainformation Attributes Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-metaAttributes.html.

The datatype attribute type: it is not clear if this attribute type completely replaces HTML 4's scheme attribute. For example, in HTML 4, you could write:

<meta name="DC.Type" scheme="DCMIType" content="Text" />
<meta name="DC.Language" scheme="ISO 639-1" content="en" />
<meta name="DC.Date.dateSubmitted" content="2004-11-30" />
<meta name="DC.Source" scheme="URI" content="#worksCited" />
<meta name="DC.Creator.email" content="Christophe.Strobbe@united-nitpickers.com" />

Data types such as languages, dates and URIs are covered by W3C XML Schema's datatypes, but others, like ISBN and the DCMITypes (and the PRISM types and categories), are not. Please state that types other than XSD types are allowed, and whether the labels defined in Dublin Core, PRISM and other standards may be used in this attribute type (otherwise, something like HTML 4's scheme attribute needs to be reintroduced).
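A sketch of how part of the HTML 4 example above might be written in XHTML 2, assuming prefixes are declared with xmlns as in the draft's metadata examples. Whether a Dublin Core encoding scheme such as dcterms:DCMIType is an acceptable datatype value is precisely the open question:

```xml
<head xmlns:dc="http://purl.org/dc/elements/1.1/"
      xmlns:dcterms="http://purl.org/dc/terms/"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
  <title>Comments on 27 May 2005 Working Draft of XHTML 2</title>
  <!-- Covered by XSD datatypes -->
  <meta property="dc:language" datatype="xsd:language">en</meta>
  <meta property="dcterms:dateSubmitted" datatype="xsd:date">2004-11-30</meta>
  <!-- Not an XSD type: is a Dublin Core encoding scheme allowed here? -->
  <meta property="dc:type" datatype="dcterms:DCMIType">Text</meta>
</head>
```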

The property attribute type: a note below the second example says: Note that previous versions of XHTML included an author property; this has now been replaced with the Dublin Core creator property. However, neither XHTML 1.0 nor XHTML 1.1 defined this property. HTML 4.x contains examples where an author property is used, but these examples seemed more descriptive (of current practice) than prescriptive. HTML 4 also contained examples with description, keywords, copyright and date, but did not define how they should be used. Each of these properties has an equivalent in Dublin Core (DC.Description, DC.Subject, DC.Rights and DC.Date, respectively). Replacing only the author property (and not keywords, title or description) with its Dublin Core equivalent seems rather inconsistent. Please clarify this issue in the specification.

The rel attribute type: for the values index and glossary, please specify whether the index or glossary should cover the current resource or the whole website, or whether this does not matter.

Metadata as content: in the two examples, span elements are used where heading elements and paragraph elements seem more appropriate. Please make sure that all code examples in the specification use the most appropriate element types.

24. XHTML Object Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-object.html.

Rules for processing objects: please change these rules to take into account the accessibility-related comments mentioned in relation with the Embedding Attributes Module. This means that the statement: When a user agent is able to successfully process an object element it MUST not attempt to process inner elements needs to be modified: a user agent MUST always provide a means to access the content of an object element (for example, through the DOM, if that is the API that the user agent implements).

The penultimate example in the section Rules for processing objects demonstrates how text alternatives may be used. This example does not improve accessibility for people with disabilities; these people require a description or a transcript. Please adapt the example. Note that the term used in the current WCAG draft is “text alternative” instead of “alternate text”.

The last example in the section Referencing object data demonstrates a bad practice, reminiscent of “This website uses frames but your browser does not support it” (in an HTML 4 noframes element). Please replace the text This user agent cannot process this movie with a description or transcript to enhance accessibility.
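A sketch of the suggested replacement. The file name, description and transcript are invented, and the src and srctype attributes are assumed to come from the Embedding Attributes Module:

```xml
<object src="interview.mpg" srctype="video/mpeg">
  <!-- Fallback content: a description and a transcript instead of an error message -->
  <p>Video (3 minutes): an interview with a screen reader user about navigating
     data tables. Both people sit at a desk with a laptop.</p>
  <p>Interviewer: How do you find your way around a large data table?</p>
  <p>Interviewee: I first listen to the caption and the summary, and then
     I move through the header cells.</p>
</object>
```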

The standby element type: please specify the content model of this element type: only PCDATA, or mixed content that also allows acronyms, language switches, etcetera?

25. XHTML Role Access Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-role.html.

[The access element type is controversial: see the discussion in early June on the WAI Interest Group's mailing list and a follow-up discussion in early July on the WAI Interest Group's mailing list. The PF Working Group is working on a position statement regarding hotkey requirements. A draft of this position statement was sent to the WCAG Working Group.]

Role collection: the number of suggested standard roles is very limited. There must be some research that can be drawn upon to define a larger number of roles. For example, graphical information can be classified in different types: real world images, maps, schematic diagrams, charts, and graphical user interfaces (P. Blenkhorn & D. Evans: “Using Speech and Touch to Enable Blind People to Access Schematic Diagrams.” Journal of Network and Computer Applications. 1998. 17-29).

Earlier this year, a student at City University (London) asked users to evaluate the home pages of 16 big websites and to identify “information units”, in order to find out which kinds of units are common. The results indicated that a common set of information units exists in many webpages:

logo,
main content,
picture/image (a large picture, often found below the top navigation or next to the left-hand navigation, that is not really meaningful but that catches the visitor's eye),
search,
news,
top navigation or main navigation,
copyright (often referred to as “small print”).

Main content, navigation and search are already covered (by the roles main, navigation and search, respectively), but it is possible to go further.

Other useful roles would be “introduction”, “conclusion”, “summary” and “warning”. Element types for these structures would be better; roles should not be used as an excuse to turn down proposals for new elements.
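To show how roles derived from this study could sit next to the existing ones, here is a sketch of a home page body. The role values logo, news and copyright are hypothetical; navigation, search and main are the roles in the draft:

```xml
<body>
  <div role="logo"><img src="logo.png">Example Company</img></div>   <!-- hypothetical -->
  <div role="navigation"><p><a href="products.html">Products</a></p></div>
  <div role="search"><p>Search form goes here.</p></div>
  <div role="main"><p>Welcome to Example Company.</p></div>
  <div role="news"><p>New opening hours from 1 September.</p></div>    <!-- hypothetical -->
  <div role="copyright"><p>Copyright 2005 Example Company</p></div>    <!-- hypothetical -->
</body>
```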

@@todo: find research on users recognizing sections/zones in web pages and suggest other properties based on this research.

For reasons of accessibility, please provide a role property (or, even better, an element type) for ASCII art. Screen reader users need a mechanism to skip ASCII art, and unambiguously identifying ASCII art so that screen readers can recognize it is more useful than hacks such as skip links. If you don't consider a role an appropriate mechanism for identifying ASCII art, please provide another one. The pre element as it is currently defined is not sufficient.
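A sketch of how such a role might be used; the value asciiart is an invented placeholder, not a role defined in the draft:

```xml
<!-- Hypothetical role value: lets a screen reader announce the block and skip it -->
<pre role="asciiart">
 +--------+        +--------+
 | Client | -----> | Server |
 +--------+        +--------+
</pre>
<p>The diagram above shows a client sending a request to a server.</p>
```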

26. Ruby Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-ruby.html.

Please provide at least one example of XHTML 2 with Ruby annotation.
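A minimal example of the kind requested, assuming the module imports the simple ruby markup of the W3C Ruby Annotation Recommendation unchanged into the XHTML 2 namespace. The rp elements supply parentheses for user agents that do not render ruby:

```xml
<p>The
  <ruby>
    <rb>WWW</rb>
    <rp>(</rp><rt>World Wide Web</rt><rp>)</rp>
  </ruby>
  was invented at CERN.
</p>
```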

27. XHTML Style Attribute Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-styleAttribute.html.

(No comment.)

28. XHTML Style Sheet Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-styleSheet.html.

Specifying external stylesheets: apparently, the link element type is no longer recommended as a mechanism for specifying external stylesheets. However, using processing instructions for this purpose pushes an important feature into a construct that is outside the reach of schemas and validators (unless RELAX NG can define pseudo-attributes for processing instructions?). If the link element type can no longer be used for this purpose, please state this explicitly, either in Module 28 or in Module 22 (Metainformation), and in the section on Major Differences with XHTML 1. However, if the link element type can still be used for specifying external stylesheets, please make this explicit and provide example code. Apparently, link still has all the necessary attributes: rel (module 23: metainformation), title (module 12: core attributes), href (module 13: hypertext attributes) and hreftype, which apparently replaces type (module 13: hypertext attributes).
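The two mechanisms side by side, in a sketch that assumes the XHTML 2 namespace URI used in the 2005 drafts and a placeholder style.css. A real document would use one mechanism or the other, not both; whether the link form is still permitted is the question above:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Option 1: processing instruction; schemas and validators cannot check it -->
<?xml-stylesheet href="style.css" type="text/css"?>
<html xmlns="http://www.w3.org/2002/06/xhtml2/">
  <head>
    <title>Example</title>
    <!-- Option 2: link element with rel, href and hreftype; still allowed? -->
    <link rel="stylesheet" href="style.css" hreftype="text/css" />
  </head>
  <body>
    <p>Some text.</p>
  </body>
</html>
```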

29. XHTML Tables Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-tables.html.

The sentence: When this module is used, it adds the table element to the Structural content set of the Structural Module sounds rather mysterious, especially because this module is already part of the Structural Module. Please reformulate this.

The col and colgroup element types: in the example near the end of this section the table element contains an em element: is this allowed?

The summary element type: the example in this section also has an em element inside the table element: is this allowed?

[Table rendering by non-visual user agents: @@discuss on xtech mailing list, unless the Protocols and Formats Working Group has already provided input on this.]

Accessibility: please mention that nested tables are difficult to navigate for users with screen readers. Also encourage the use of CSS instead of tables for layout.

30. XForms Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-xforms.html.

Statements like When this module is included, the XForms Group content set is added to the Structural content set, and to the Text content set do not clarify how XForms is to be combined with XHTML 2: do both vocabularies share the same namespace?

Please provide a full example of XHTML 2 with XForms.
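One way the combination might look, assuming XForms elements keep their own namespace (http://www.w3.org/2002/xforms, as in XForms 1.0) and are mixed into an XHTML 2 document; whether this or a shared namespace is intended is exactly what the draft should clarify. The example.org URI is a placeholder:

```xml
<html xmlns="http://www.w3.org/2002/06/xhtml2/"
      xmlns:xforms="http://www.w3.org/2002/xforms">
  <head>
    <title>Site search</title>
    <xforms:model>
      <xforms:instance>
        <!-- instance data in no namespace, so the default namespace is undeclared -->
        <query xmlns=""><q/></query>
      </xforms:instance>
      <xforms:submission id="search" method="get"
                         action="http://www.example.org/search"/>
    </xforms:model>
  </head>
  <body>
    <p>
      <xforms:input ref="q"><xforms:label>Search for</xforms:label></xforms:input>
      <xforms:submit submission="search"><xforms:label>Go</xforms:label></xforms:submit>
    </p>
  </body>
</html>
```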

Note: it should be possible to write a schema (for example, in W3C XML Schema) that defines where XForms elements and attributes are allowed in XHTML 2 documents. (If this is not possible, the XHTML Working Group has a serious problem.)

31. XML Events Module

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/mod-xml-events.html.

(No comment.)

Appendix H. Style sheet for XHTML 2

URL: http://www.w3.org/TR/2005/WD-xhtml2-20050527/xhtml2-style.html.

General comment: the Web Content Accessibility Guidelines 1.0 recommend that a background colour should be specified if a foreground colour is specified (and vice versa), and that numbers instead of names should be used for colours. Please make sure that the default stylesheet, and any other stylesheets that may be published as part of the specification, always set background and foreground text colours atomically (i.e. both or neither) in the style rules. (Specifying colours with numbers instead of names may be debatable.)

body { padding: 8px; line-height: 1.2 }: a line height of 1.3 or greater would be more readable. (Several sources point out that line height should be greater when text columns are wider. See Typographical measurement systems by Jan Roland Eriksson for recommendations on line height for different column widths, and Forgotten Times by Andy Hume for advice on Times New Roman.) Adding some word spacing (word-spacing:0.1em) and letter spacing (letter-spacing:0.01em) would also make text easier to read. More whitespace can also help; see Reading Online Text: A Comparison of Four White Space Layouts by Barbara Chaparro and others.

The stylesheet does not define fonts for body or for @media print. Some tests and research have shown that serif fonts are easier to read on a printed page and that sans-serif fonts are easier on a computer screen (see for example HTML E-Mail: Text Font Readability Study by Ralph F. Wilson).

h1, h2, h3, h4, h5, h6 { font-family: sans-serif; font-weight: bolder }: serif fonts actually work quite well for large text; the Coudal website is a good example. Why is the h element type not in this list?

@@todo: improve stylesheet

Christophe Strobbe: Christophe.Strobbe ( @ ) esat.kuleuven.be