Re: XHTML Considered Harmful from Arjun Ray on 2001-06-27 (www-talk@w3.org from May to June 2001)

From: Arjun Ray <aray@q2.net>
Date: Wed, 27 Jun 2001 02:18:51 -0400 (EDT)
To: www-talk@w3.org
Message-ID: <Pine.LNX.4.21.0106270120300.20998-100000@info.q2.net>

On Tue, 26 Jun 2001, William F. Hammond wrote:

> Arjun Ray <aray@q2.net> writes, 24 Jun 2001 22:33:50 -0400 (EDT):

> Isn't it acceptable if there is no user agent tolerance in regard
> to XML conformance and if the use of tags outside of the default
> namespace is compliant with namespace rules?

Ian and I just went over the conformance requirements, with less than 
happy conclusions.  Do you disagree with them?   

> While it is true that a given instance will not validate as both
> classical HTML and as XHTML, this is no more serious than saying
> that a given instance of HTML 4.0 may not validate as HTML 3.2.

The difference has to do with basic syntax.  LINK elements have EMPTY
declared content, and are subelements of HEAD which does not allow
mixed content.  Thus, the form <LINK> will not validate as XML, and
the form <LINK/> will not validate as RCS SGML.
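
A minimal sketch of the XML half of that claim, assuming Python's standard
xml.etree.ElementTree as the parser. It checks well-formedness only, not
validity against a DTD, and the two sample documents and the helper name
is_well_formed are made up for illustration. The SGML half (under the
reference concrete syntax, the "/" in <LINK/> is read as NET, leaving a
stray ">" as data inside HEAD) is not shown, since the standard library
has no SGML parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Classical HTML form: LINK is EMPTY, so it has no end tag at all.
SGML_STYLE = (
    '<html><head><title>t</title>'
    '<link rel="stylesheet" href="s.css">'
    '</head><body><p>x</p></body></html>'
)

# XML form: the empty element must be closed explicitly with "/>".
XML_STYLE = (
    '<html><head><title>t</title>'
    '<link rel="stylesheet" href="s.css"/>'
    '</head><body><p>x</p></body></html>'
)

print(is_well_formed(SGML_STYLE))  # the unclosed <link> is rejected
print(is_well_formed(XML_STYLE))   # the self-closed form is accepted
```

The same byte sequence cannot satisfy both parsers, which is the point:
the difference is at the level of syntax, not of DTD versions.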

When ordinary people - non-geeks confronting a geeky distinction - are
harangued with all the hype about compatibility and whatnot, to belie
the natural expectation *encouraged* by words such as "compatibility"
- namely, that "it oughta all work either way" and therefore the same
document "should" "validate" in both regimes - is not merely very bad
engineering, but also an open invitation for vendors to give people
what they've been *asked* to want (i.e. "make it so" *in practice*).

The idea that non-geeks should respect geeky niceties is Canutism at
its worst.  "Zero tolerance" is one thing if end-users can be made to
expect it; it's another when precisely the opposite is the expectation
being sold to the public.

> In the W3C family of classical HTML specs there have been at least
> 3 different underlying SGML declarations.  

Substantively, only two, having to do with differences in the document
character set.  The SGML declarations have kept all the RCS features
intact - e.g. retaining / for NET - in some starry-eyed belief that
SGML-aware systems are going to take HTML-since-Mosaic seriously.

> Any correct validating system for classical HTML needs to
> comprehend that fact and needs to digest the document type
> declaration before picking the correct SGML declaration and,
> hence, before parsing.

I'm sorry, I don't understand this.  I know of no ratified notion of
"correct validation" which predicates (the contents of) an SGML
declaration on (the contents of) a document type declaration.  I
believe you are trying to retrofit a justification onto the guesswork
of online services such as validator.w3.org and www.htmlhelp.com.
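
To make "guesswork" concrete, here is a hypothetical sketch of the kind of
heuristic in question. It is not taken from either service: the function
name, the lookup table, and the declaration file names are all assumptions
for illustration. Only the two public identifiers are real.

```python
import re

# Hypothetical table: which SGML declaration to load for which DTD.
# The file names are placeholders, not files shipped by any validator.
DECL_FOR_FPI = {
    "-//W3C//DTD HTML 3.2 Final//EN": "html32.decl",
    "-//W3C//DTD HTML 4.01//EN": "html4.decl",
}

# A regex scan of the raw text. No SGML parsing has happened yet,
# because parsing is what the SGML declaration is needed for.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+\w+\s+PUBLIC\s+"([^"]+)"', re.IGNORECASE
)


def guess_sgml_declaration(text):
    """Guess a declaration file from the DOCTYPE's public identifier."""
    match = DOCTYPE_RE.search(text)
    if match is None:
        return None  # no DOCTYPE found: nothing to guess from
    return DECL_FOR_FPI.get(match.group(1))


doc = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"><title>t</title>'
print(guess_sgml_declaration(doc))
```

Note the circularity: the document type declaration is itself SGML, so it
has to be sniffed out by pattern matching before the declaration that
governs its parsing has been chosen.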

[As usual, W3C myths of convenience are institutionalizing ad hoc
heuristics despite a standardized mechanism.  The WebSGML TC allows an
SGML declaration to have an external body dereferenced via a public
identifier.  Annex K.3.1.]

> It is, therefore, a nearly trivial matter to add XHTML to a correct
> pre-existing validating system for classical HTML.

Only to the extent that guesswork is trivial.
 
> Furthermore, in regard to namespace extensions of XHTML the crucial
> case in point at this time is MathML.

Besides "namespace extension" being a maguffin to start with, I'm not
sure how this is relevant.  Architectures have been the answer for
years now.



Arjun
Received on Wednesday, 27 June 2001 02:04:34 UTC