Spiderman and the XHTML Kindergarten from Bjoern Hoehrmann on 2009-05-13 (www-archive@w3.org from May 2009)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Thu, 14 May 2009 00:13:39 +0200
To: www-archive@w3.org
Message-ID: <1igm05tt25ip8kg10hureru72agmo2n2c9@hive.bjoern.hoehrmann.de>

Good news everyone,

What is now known as the "XHTML2 Working Group" has recently published
Proposed Edited Recommendations for various XHTML specifications [1]. As
usual, the Director should not have approved them for publication due to
obvious procedural, technical, and editorial deficiencies.

That in itself is of course not too much of a problem: process
requirements and quality standards can be overly demanding, and with
limited resources you might be unable to live up to them. To manage
expectations and allow for corrective action, such problems have to be
pointed out by those encountering them.

That is in fact required by the Process document. In order to enter PER
status, a Working Group has to enumerate any and all known substantive
issues the group failed to address in a particular revision [2]. This is
sensible: it allows groups to address important problems quickly, and
others to verify that the important problems are being addressed quickly.

As one can find out in a couple of seconds by clicking through the
group's issue tracking system [3], there are dozens of unresolved issues
with the documents in question and their dependencies, all within the
group's charter, and none of them are reported as unaddressed.

To pick one such issue: HTML 4.0 did not include the name attribute for
the form and img elements; HTML 4.01 changed that [4], adding it to all
document types. XHTML 1.0 is supposedly based on HTML 4.01, and it does
include the attribute in the Frameset and Transitional variants but not
in the Strict variant, without accounting for this difference in the
specification.

The appropriate way to address this issue would affect conformance, and
as such this is a substantive issue. The group has been aware of this
problem for six years now [5] and even though it would take minutes to
fully address it, the group has taken no action on it, and did not note
this failure as required either.

The proposed document in fact pretty much claims the opposite [6], that
the document reflects corrections based on community feedback. It is
very rare that the group responds to community feedback at all, but even
issues where they promised swift action six years ago remain unchanged.

To pick one such issue: the current specification marks all references
as informative. Clearly that is incorrect, as acknowledged by the group
[7], and in addition to being quite embarrassing, it makes reasoning
about the specification difficult. The proposed document has the same
flaw [8].

The References section is in fact rather curious, as pretty much none of
the references have been updated, even though that was the stated
purpose of the publication [9]. It lists RFC 2396, which has been
obsoleted by RFC 3986, the first edition of Namespaces in XML, and,
quite importantly, the second edition of XML 1.0.

That is a particular problem. The fifth edition changed which characters
are allowed in Names, which affects what is allowed in, say, ID
attributes. Had the specification not been updated at all, it would be
reasonable to assume that XHTML 1.0 inherits those changes.

But with an updated specification that references an outdated one, we
cannot make such an assumption. Perhaps there is some subtle reason,
beyond the imagination of the uninitiated, that caused them to
intentionally not update it, which would ultimately cause speculation
and argument in places such as the www-validator mailing list.
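To make the Names change concrete, here is a minimal Python sketch of
the Name production as defined in section 2.3 of the Fifth Edition of
XML 1.0. The helper name is my own, and it only checks the Fifth Edition
rule; it does not implement the Unicode 2.0 based character tables of
the earlier editions for comparison. Under the new rule a name starting
with U+1200 ETHIOPIC SYLLABLE HA is fine, while, as far as I can tell,
Ethiopic only entered Unicode in version 3.0 and so is not covered by
the old tables.

```python
import re

# NameStartChar, XML 1.0 Fifth Edition, section 2.3 (production [4]).
NAME_START_CHARS = (
    ":A-Z_a-z"
    "\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D\u037F-\u1FFF"
    "\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    "\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# NameChar (production [4a]) adds "-", ".", digits, U+00B7 and two ranges.
NAME_CHARS = NAME_START_CHARS + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

# Name (production [5]): one start character, then any name characters.
NAME_RE = re.compile(f"[{NAME_START_CHARS}][{NAME_CHARS}]*")


def is_fifth_edition_name(candidate: str) -> bool:
    """True if candidate matches the Fifth Edition Name production."""
    return NAME_RE.fullmatch(candidate) is not None


for value in ["intro", "a-b.c", "1intro", "-x", "\u1200\u1208"]:
    print(repr(value), is_fifth_edition_name(value))
```

Since values of ID attributes have to match the Name production, this is
exactly the check whose outcome depends on which edition XHTML 1.0 is
taken to reference.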

For the Validator this update would have been particularly helpful, as
then the SGML declaration for XML 1.0 as included in the specifications
would have been updated, saving the Validator developers the trouble of
creating their own unofficial version, which I take it has yet to happen
[A].

Editorially we need not look further than the SotD section [6] to know
how much care was put into drafting the document. According to it, the
only change of note is an update to Appendix A. Yet Appendix A has not
been changed at all; it enumerates the document type definitions and
entity sets, which have been left unchanged, and still lists INRIA as a
W3C host.

The change is actually to the controversial Appendix C. The idea is to
move its content into the accountability-free realm of Working Group
Notes [B]. Needless to say, in drafting said Note the group did not find
reason to address the issues with the text collected in its own issue
tracking system, or many of the newly reported ones, drawing the usual
complaints about that [C].

The group also proposes to needlessly break all links into the section,
such as those that tools like my own Appendix C validator [D] generate.
Of course, the document does not actually follow said guidelines: they
did manage to use only xml:lang instead of both xml:lang and lang for
the affiliation of one of the Chairs; the other documents have similar
problems.

Let's move on to XHTML 1.1. Procedurally, it is important to note that,
unlike the new version of XHTML 1.0, it did not pop out of nowhere; it
had a preceding Working Draft [E]. Of course, this PER has many of the
same issues as the XHTML 1.0 PER.

Let's take for example the matter of unaddressed substantive issues. To
pick one, there are certain nesting rules that cannot be expressed in
the formal languages used to describe the format. XHTML 1.0 calls those
out explicitly; XHTML 1.1, and XHTML Modularization on which it is
based, do not.

One may imagine that the rules from XHTML 1.0 are inherited, but XHTML
1.1 also includes elements from Ruby annotation [F] and does not call
out nesting prohibitions for those elements. Again, addressing this
would affect conformance, and the issue has been known for half a decade
[G], but it remains unaddressed.

For actually acknowledged issues you can also go back as many as eight
years like with [H] which somewhat ironically is quite related to the
issue [5].

The References section naturally has issues similar to those of XHTML
1.0, though this time they manage to reference the slightly less
outdated Fourth Edition of XML 1.0, for example. One may wonder, though,
why XHTML 1.0 is a normative reference even though it is referenced only
in informative sections, while, say, Namespaces in XML is informative
even though it is referenced in normative text.

Moving on, we can have a look at the changes. The news piece [1] and the
Status section [I] inform us that there are a couple of corrections and
clarifications. Nothing could be further from the truth; we actually
have the greatest paradigm shift since the advent of XHTML: DOCTYPE
declarations are now optional.

That is a gross violation of the W3C Process. Given that the document was
a Working Draft beforehand, at least one of [J] and [K] applies, and of
course [L] and again [2].

The new way to specify the dialect you are using now seems to be the
version attribute. The version attribute is a long-dead relic of the
1990s; being redundant, it was deprecated in HTML 4 and never made it
into its Strict variant, or into XHTML 1.0 for that matter. If you have
an XHTML 1.1 document and do not explicitly specify the version
attribute, it is quite probably non-conforming now. One has to wonder,
though, whether adding it would change that, given its deprecated
nature [M].

One thing to say in their favour, though, is that they realized, more by
accident than design, that the version attribute must contain a formal
public identifier per XHTML Modularization 1.1. They failed to realize
that when creating another dialect [N]. Naturally, this change from
XHTML 1.0 is not mentioned in the document.
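With DOCTYPE declarations optional, a consumer that wants to know which
dialect it is looking at now has two places to look. A minimal Python
sketch using xml.dom.minidom, which does not fetch the external DTD; the
helper name and the precedence (a DOCTYPE public identifier first, the
version attribute as fallback) are my assumptions, not something the
drafts spell out.

```python
from typing import Optional
from xml.dom import minidom


def declared_dialect(source: str) -> Optional[str]:
    """Return the formal public identifier naming the dialect, if any.

    Assumption: a DOCTYPE public identifier wins; otherwise fall back to
    the version attribute on the root element.
    """
    doc = minidom.parseString(source)
    if doc.doctype is not None and doc.doctype.publicId:
        return doc.doctype.publicId
    # getAttribute returns "" when the attribute is absent.
    return doc.documentElement.getAttribute("version") or None


FPI = "-//W3C//DTD XHTML 1.1//EN"
with_doctype = (
    f'<!DOCTYPE html PUBLIC "{FPI}" '
    '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">'
    '<html xmlns="http://www.w3.org/1999/xhtml"/>'
)
with_version = f'<html xmlns="http://www.w3.org/1999/xhtml" version="{FPI}"/>'
neither = '<html xmlns="http://www.w3.org/1999/xhtml"/>'

for label, text in [("doctype", with_doctype), ("version", with_version),
                    ("neither", neither)]:
    print(label, declared_dialect(text))
```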

The other change, the re-addition of the lang attribute, is called out
in the status section, but is otherwise quite similar. Of course, the
motivation given in the status section, compatibility with user agents
and assistive technologies, is not the actual motivation that called for
this change [O].

The reasoning is rather ludicrous: if you want to use the attribute, you
can use XHTML 1.0 unless you also want to use Ruby. If there is a need
to use legacy attributes from a decade ago alongside Ruby and you want
to have the result validate against some W3C approved document type,
then a new document type could easily be made. As noted in the abstract
the purpose [P] of XHTML 1.1 is a different one.

Clearly, if compatibility is suddenly a concern, one might expect them
to re-evaluate other compatibility concerns they previously rejected.
One such issue is the definition of the usemap attribute. It was changed
in XHTML 1.1 in a way that causes compatibility problems. Despite a
number of promises to re-evaluate and change it [Q] [R], there is no
change at all. But perhaps nobody noticed? [S] [T] are suggestive.

Of course, the new XHTML 1.1 DTD does not actually allow using the lang
attribute. They tried to allow it by defining the lang.attrib parameter
entity in the document model module, but when that is read the entity is
already defined, and the first declaration always wins [U]. It's telling
that they cannot even use their own framework for making such changes.
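The first-declaration rule [U] is easy to observe with any conforming
XML processor. The sketch below is not the XHTML module setup: for
brevity it uses a general entity in an internal subset, which section
4.2 of XML 1.0 treats the same way, and lets expat (via ElementTree) do
the expansion. The helper bind_first and the module order in the last
lines are hypothetical and only model how a DTD processor fills its
entity table.

```python
import xml.etree.ElementTree as ET

# Two declarations of the same entity; XML 1.0 section 4.2 says the first
# one encountered is binding, so the second is silently ignored.
DOC = """<!DOCTYPE r [
  <!ENTITY attrs "declared first">
  <!ENTITY attrs "declared second">
]>
<r>&attrs;</r>"""


def bind_first(declarations):
    """Model an entity table: keep the first value seen for each name."""
    table = {}
    for name, value in declarations:
        table.setdefault(name, value)  # later redeclarations are ignored
    return table


# expat, used by ElementTree, expands internal entities itself.
print(ET.fromstring(DOC).text)

# Hypothetical order: a module read early defines the entity, a module
# read later tries to redefine it and has no effect.
print(bind_first([("lang.attrib", "early module"),
                  ("lang.attrib", "later module")]))
```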

Needless to say, [2] requires the group to demonstrate that support for
the lang attribute and for DOCTYPE-less documents has been implemented,
yet there is no implementation report whatsoever. Last time [V] they
figured checking for DTD support in XML processors would suffice; had
they bothered to check a document with a lang attribute, they might
just have noticed this issue.

The idea that motivated this change [O], that you can now use text/html
for XHTML 1.1, is also hardly accommodated by it, as the XHTML 1.1
specification requires the use of application/xhtml+xml for documents.

If they wanted to do something good for language declarations in XHTML,
they might at least have brought XHTML 1.0 in line with the now rather
outdated second edition of XML 1.0 which they reference and allow the
empty string in xml:lang attributes as per [W]. Or at least not refer
to HTML 4 for the definition of the lang attribute in XHTML 1.1 [X] as
HTML 4 prohibits using the empty string as value for it [Y].

Now XHTML 1.1 is based on XHTML Modularization 1.1 and so inherits all
the issues with that document. That Recommendation is a successor of the
2004 Modularization of XHTML 1.0 Second Edition draft [Z]. In 2005 the
group, or rather its predecessor, wanted to have it advanced to PER
status, but there were too many changes [a] [b] to proceed that way.

So the group renamed the draft XHTML Modularization 1.1 and had it
published as a Proposed Recommendation in 2006 [c]. Now, a problem with
that was that reviewers had filed about two dozen issues against the
2004 draft, most of which were not addressed, even though addressing
them is mandatory to enter PR status.

Unusually, the Team investigated the complaints resulting from this, and
found there were no procedural problems, except that the Disposition of
Comments [d] was not properly linked from the document. Now, that DoC
is not even for the document that was advanced, but rather for its
predecessor; unless, perhaps, you want to believe that the group
discarded the 2004 draft, started over, came up with materially the
same document, and had that published instead, replacing the first
edition of XHTML Modularization 1.0, as the second edition would have.

Be that as it may, an issue of particular import, raised quite a number
of times, was the definition of the profile attribute. It is supposed to
take one or more URIs that identify meta data profiles, but the draft
allowed only a single URI. When the PR was published, there was no
record of a group decision on the matter. Alerted by process complaints,
the group added one in haste [e], saying that this was a deliberate
change, made to use the attribute for something else entirely: to
identify compound document profiles.

A couple of weeks later, this was actually found to be an inadvertent
editorial error that would be corrected [f]. Given the attention the
issue received, surely this error has been corrected in the
Recommendation?

Well, as it turns out, not really. Both the DTD and the Schema in XHTML
Modularization [g] [h] allow only a single URI, or, in the case of the
Schema, associate incorrect semantics with the attribute value.
Helpfully, the DTD version notes that the profile attribute is reserved
for future use with document profiles. So much for an inadvertent
editorial error.

Worse, the DTD defines a default value for the attribute; in the case of
XHTML 1.1 that is the empty string. The implication is that the default
meta data profile of each XHTML 1.1 document is the document itself,
which is almost always incorrect.
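To spell out the implication: read as in HTML 4, the value is a
whitespace-separated list, so the empty string means no profile at all;
read as a single URI reference, as the DTD does, the empty string is a
same-document reference that resolves to the document's own address. A
small Python sketch with urllib.parse.urljoin; the base URI is a
placeholder and both helper names are made up for illustration.

```python
from urllib.parse import urljoin

BASE = "http://example.org/page.html"  # placeholder document address


def profiles_as_list(value: str, base: str) -> list:
    """HTML 4 reading: zero or more URIs separated by white space."""
    return [urljoin(base, ref) for ref in value.split()]


def profile_as_single_uri(value: str, base: str) -> str:
    """Single-URI reading: the whole value is one URI reference."""
    return urljoin(base, value)


print(profiles_as_list("", BASE))       # empty list: no profiles
print(profile_as_single_uri("", BASE))  # the document's own address
print(profiles_as_list("a.html b.html", BASE))
```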

We can of course effortlessly find as yet unreported issues simply by
using a little script [i] and our favourite diff tool to compare the
proposed XHTML 1.1 DTD with the previous one, and find, for example,
that the bdo element suddenly lacks event attributes. And [3] has plenty
of unfixed old issues.
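The comparison does not need much. The script in [i] is not reproduced
here; what follows is a rough Python stand-in, assuming flattened DTDs
(all parameter entities already expanded and no leading text
declaration), since parameter entity references are not allowed inside
markup declarations in an internal subset, which is where this sketch
puts the text. It collects attribute names per element with pyexpat's
AttlistDeclHandler and reports attributes that disappeared. The two
ATTLIST snippets are toy inputs, not excerpts from the actual DTDs.

```python
import xml.parsers.expat


def attributes_by_element(dtd_text: str) -> dict:
    """Map element name -> set of declared attribute names.

    Assumes a flattened DTD: no parameter entity references inside
    declarations and no leading text declaration.
    """
    found = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_attlist(elname, attname, type_, default, required):
        found.setdefault(elname, set()).add(attname)

    parser.AttlistDeclHandler = on_attlist
    # Wrap the declarations in an internal subset so expat reports them.
    parser.Parse(f"<!DOCTYPE dummy [{dtd_text}]><dummy/>", True)
    return found


def removed_attributes(old_dtd: str, new_dtd: str) -> dict:
    """Attributes declared in old_dtd but no longer in new_dtd."""
    old = attributes_by_element(old_dtd)
    new = attributes_by_element(new_dtd)
    removed = {}
    for element, attrs in old.items():
        gone = attrs - new.get(element, set())
        if gone:
            removed[element] = sorted(gone)
    return removed


# Toy inputs for illustration only.
OLD_TOY = '<!ATTLIST bdo dir (ltr|rtl) #REQUIRED onclick CDATA #IMPLIED>'
NEW_TOY = '<!ATTLIST bdo dir (ltr|rtl) #REQUIRED>'
print(removed_attributes(OLD_TOY, NEW_TOY))
```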

Moving on we have XHTML Basic 1.1 and XHTML Print Second Edition. Shall
we have a look at how they deal with the issues we discussed before? So,
take the version attribute, for example. Recall that it is forbidden in
XHTML 1.0 and recommended in XHTML 1.1. In XHTML Print it's not
applicable since it was deprecated all the way back in HTML 4 [j]. XHTML
Basic 1.1 does not mention it.

How about DOCTYPEs? Well they are required in XHTML 1.0, optional in the
proposed XHTML 1.1 version, and in Basic 1.1 and Print they are required
again. Also of note is that in XHTML 1.1, documents must somehow conform
to the Schema *and* the DTD, while in Basic 1.1 and Print one or the
other is sufficient.

The lang attribute? It's not in Print, but it's in Basic, except that it
is not really there, thanks to the group modifying the DTD incorrectly.
XHTML Print then thankfully references the third edition of XML 1.0, so
we have a nice set with the 2nd, 3rd, and 4th, missing only the first
and the current edition.

Oh, while we are at it: do not override parameter entities in the
internal subset when using XHTML Print or XHTML Basic 1.1; that is not
allowed there. Use XHTML 1.1 instead. Always remember the rules: if you
need to open documents in a new tab, specify list item values, or use
input modes, use XHTML Basic; if you need Ruby, use XHTML 1.1. If you
need both, you're screwed.

Actually, this is kinda funny, because XHTML 1.1 is supposed to have the
target attribute [k], and it was in the 2007 Working Draft, but only in
the DTD, not the schema [l]. That issue appears to be fixed now: the
attribute is in the schema, but not in the specification prose or the
DTD; except in the flat version, as that one has not been updated since
the 2007 draft, and is not in fact a DTD at all. If that change had been
noted in the draft as required, perhaps someone might have spotted the
mistake.

[1] http://www.w3.org/News/2009#item73
[2] http://www.w3.org/2005/10/Process-20051014/process.html#cfr-edited
[3] http://htmlwg.mn.aptest.com/cgi-bin/voyager-issues
[4] http://www.w3.org/TR/html4/appendix/changes.html#19991224
[5] http://htmlwg.mn.aptest.com/cgi-bin/voyager-issues/XHTML-1.0?id=6504
[6] http://www.w3.org/TR/2009/PER-xhtml1-20090507/#status
[7] http://htmlwg.mn.aptest.com/cgi-bin/voyager-issues/XHTML-1.0?id=6674
[8] http://www.w3.org/TR/2009/PER-xhtml1-20090507/#refs
[9] http://lists.w3.org/Archives/Public/public-xhtml2/2009Mar/0087.html
[A] http://dev.w3.org/cvsweb/validator/htdocs/sgml-lib/xml.dcl
[B] http://www.w3.org/TR/2009/NOTE-xhtml-media-types-20090116/
[C] http://lists.w3.org/Archives/Public/public-xhtml2/2009Feb/0032.html
[D] http://qa-dev.w3.org/~bjoern/appendix-c/validator/
[E] http://www.w3.org/TR/2007/WD-xhtml11-20070216/
[F] http://www.w3.org/TR/ruby/
[G] http://htmlwg.mn.aptest.com/cgi-bin/voyager-issues/XHTML-1.1-DTDs?id=8840
[H] http://htmlwg.mn.aptest.com/cgi-bin/voyager-issues/XHTML-1.1-text?id=480
[I] http://www.w3.org/TR/2009/PER-xhtml11-20090507/#status
[J] http://www.w3.org/2005/10/Process-20051014/process.html#return-to-wg
[K] http://www.w3.org/2005/10/Process-20051014/process.html#correction-classes
[L] http://www.w3.org/2005/10/Process-20051014/process.html#DocumentStatus
[M] http://www.w3.org/TR/html4/struct/global.html#adef-version
[N] http://www.w3.org/TR/2008/REC-rdfa-syntax-20081014/#docconf
[O] http://lists.w3.org/Archives/Public/public-xhtml2/2009Jan/0049.html
[P] http://www.w3.org/TR/2009/PER-xhtml11-20090507/xhtml11.html#abstract
[Q] http://lists.w3.org/Archives/Public/www-validator/2002Apr/0019.html
[R] http://ln.hixie.ch/?start=1172653243&count=1
[S] http://www.google.com/search?q=usemap+%22XHTML+1.1%22
[T] https://bugzilla.mozilla.org/show_bug.cgi?id=109445
[U] http://www.w3.org/TR/xml/#sec-entity-decl
[V] http://web.archive.org/*/http://www.w3.org/MarkUp/2006/m12n-11-implementation.html
[W] http://www.w3.org/XML/xml-V10-2e-errata#E41
[X] http://www.w3.org/TR/2009/PER-xhtml11-20090507/xhtml11.html#s_doctype
[Y] http://www.w3.org/TR/html4/struct/dirlang.html#langcodes
[Z] http://www.w3.org/TR/2004/WD-xhtml-modularization-20040218/
[a] http://www.w3.org/mid/42A99B2E.6020808%40w3.org
[b] http://www.w3.org/mid/op.sxtcfwl4smjzpq%40r600.lan
[c] http://www.w3.org/TR/2006/PR-xhtml-modularization-20060213/
[d] http://htmlwg.mn.aptest.com/htmlwg/xhtml-m12n-schema-lc-doc-20050907.html
[e] http://lists.w3.org/Archives/Public/www-html/2006Feb/0087.html
[f] http://www.w3.org/mid/44634B86.8080805%40w3.org
[g] http://www.w3.org/TR/xhtml-modularization/xhtml-modularization.html#a_modules_basicmods_2
[h] http://www.w3.org/TR/xhtml-modularization/xhtml-modularization.html#a_modules_basicmods
[i] http://lists.w3.org/Archives/Public/www-archive/2005Feb/0029.html
[j] http://www.w3.org/TR/2009/PER-xhtml-print-20090507/#s3.2
[k] http://www.w3.org/mid/op.sxtcfwl4smjzpq%40r600.lan
[l] http://htmlwg.mn.aptest.com/cgi-bin/voyager-issues/Modularization-Schemas?id=9721

regards,
--
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

Received on Wednesday, 13 May 2009 22:14:21 UTC