Re: XHTML Considered Harmful

(Responding to Arjun Ray and Ian Hixie)

I think that the three of us agree on most of this.  In particular, I
do not see serious issues about markup here.

But I believe that each of them, in opposite ways, sees no overlap
between text/html and XML markup, although the W3C XHTML spec, RFC
2854, and W3C's Amaya all provide for one.  Of course, it is correct
that there is no overlap between XHTML and tag soup, where I
understand "tag soup" to mean HTML without a document type
declaration.  And I perceive the classical mass market user agents as
ignoring document type declarations even when they are provided, so
that these agents have always been tag soup handlers.

My understanding is that the overlap is provided to make it possible
for content providers to bring up XHTML documents that are usable --
at least to some extent -- in old user agents.  That necessarily means
they will be handled by old, non-rigorous user agents as tag soup.

There is no suggestion in any spec I know that a given FPI should have
more than one method of construal in an SGML or XML parser.  (Arjun:
were you seriously suggesting that a given FPI used in a document type
declaration to refer to an external document type definition, with,
for the sake of discussion, no internal subset, can be used with more
than one SGML declaration?)
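
For concreteness, the kind of declaration I have in mind is the
familiar

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

that is, an FPI together with one system identifier for the external
subset, and no internal subset.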

Whatever notion of "compatibility" arises from the informative Appendix
C of the XHTML spec rests simply on the observation that there are ways
to prepare a sophisticated XHTML document -- for example, one under the
Carlisle/Altheim "XHTML 1.1 plus MathML 2.0" document type -- so that
old mass market user agents can parse it as tag soup.
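
Purely as an illustration -- this is my sketch, not an example taken
from the spec -- a minimal document in the Appendix C style looks
something like

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head><title>Example</title></head>
    <body>
    <p>A rule follows.</p>
    <hr />
    </body>
    </html>

The XML declaration is omitted and the empty element is written with
whitespace before "/>", so an old user agent simply ignores the stray
"/" as tag soup while an XML parser sees a well-formed document.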

The two user agents that I perceive as the most widely distributed in
the mass market are behaving in diametrically opposite ways on the
MIME type issue.  This is a big problem for content providers.

If the web is to move beyond tag soup in a smooth way, I think it is
clear that text/html should be the primary content-type for all XML
document types that are extensions of the historic HTML language and
that have been prepared to degrade to tag soup.  This is necessary to
enable content providers to make a smooth, orderly transition.
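
In HTTP terms that is nothing exotic.  An Appendix-C-style XHTML
document would simply go out under the header that RFC 2854 covers
(the charset here is only illustrative):

    Content-Type: text/html; charset=iso-8859-1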

It would be outrageous for a new XHTML-capable user agent to deny
content providers the reward for this effort during the rather long
time that the content providers are concerned about reaching readers
with old user agents.  This is simply NOT the W3C-specified model,
and it is not the behavior of W3C's Amaya.

The writers of XHTML-capable user agents need to understand the not
very complicated subtleties of document prolog construction that arise
with XHTML in order to serve old and new smoothly.  This is not a
run-time performance hit.  Notice how smoothly Amaya navigates from
tag soup to Carlisle/Altheim documents.

If Amaya can do it, then the big guys can do it, too.

                                    -- Bill

-----------------------------------------------------------------------

Responses on finer points follow below.

Arjun wrote:

> Ian and I just went over the conformance requirements, with less than 
> happy conclusions.  Do you disagree with them?   

I agree with Arjun, to the extent that I've looked.

> The difference has to do with basic syntax.  LINK elements have EMPTY
> declared content, and are subelements of HEAD which does not allow
> mixed content.  Thus, the form <LINK> will not validate as XML, and
> the form <LINK/> will not validate as RCS SGML.

[ Actually: <link/>.  :-) ]

Yes, a given instance cannot be both valid HTML 4.01 and valid XML
of any kind.

But the compatibility assertion is that "<link ...  />" or
"<link ...
/>"
(with a positive amount of whitespace) degrades as tag soup in old
user agents.  Example: the root URI at W3C.
 
> I'm sorry, I don't understand this.  I know of no ratified notion of
> "correct validation" which predicates (the contents of) an SGML
> declaration on (the contents of) a document type declaration.  I

Realistically, a human never sees a fully assembled HTML 4.01
document.  For example, with SP one uses a catalog.  The catalog may
be specified as an argument to SP.  Each catalog points to an SGML
declaration.  Each FPI in the catalog points to a system identifier
for the document type declaration (external subset only).
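
As a sketch -- the file names here are mine, nothing standard -- a
catalog for SP might contain

    SGMLDECL "html4.decl"
    PUBLIC "-//W3C//DTD HTML 4.01//EN" "strict.dtd"

and a validation run is then something like

    nsgmls -c catalog -s document.html

so the SGML declaration and the external subset get assembled by the
tool, never read as one piece by a human.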

Ian wrote in reply to Arjun:

> > The idea that non-geeks should respect geeky niceties is Canutism at
> > its worst. "Zero tolerance" is one thing if end-users can be made to
> > expect it; it's another when precisely the opposite is the
> > expectation being sold to the public.
> 
> I would tend to agree with this. I don't think we (the W3C and its
> community) should be bothering to promote "compatability" of XHTML and
> Tag Soup. Here is how I think it should work:

XHTML and tag soup are very different.  The point, however, is that
there is an easy way for most XHTML, strictly conforming or not, to be
prepared so that it qualifies both as XHTML in a new user agent and as
tag soup in an old user agent.  That is the essence of the advertised
compatibility.  Nobody ever said that tag soup would get non-failing
treatment in a new XHTML user agent.  Check out Amaya, which yells
about errors in XHTML but not about tag soup.  For this purpose an
example of classical tag soup might be Friday's HTML version of the
"Scout Report", which does have a document type declaration but has
some validation issues.  Amaya will not yell about it, but Amaya will
yell about problems in XHTML, regardless of MIME type.

>   4. Document authors use XHTML (text/xml).
...
> Step 4 is in the future.

Step 4 is realized by Amaya, which handles XHTML either as text/html
or as text/xml, though there is no justification in any XHTML-related
specification for serving XHTML as text/xml.  Still, it would appear
to be justified by RFC 3023.
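
If a content provider did want to experiment with serving some XHTML
as text/xml, it is a one-line matter in, say, Apache (an assumption
about the server; the .xhtml extension is only illustrative):

    AddType text/xml .xhtml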

Why not bring Mozilla up to speed?

> I fail to understand the point of that.

It's a service for content providers.  It makes it possible for a
huge web of documents to be moved slowly from the old world to the
new world without having to worry about whether readers have old
user agents or new user agents.

The Mozilla 0.9.1 behavior forces content providers either to keep
dual archives or else, in serving their sleek new XHTML as text/html,
to give up the benefit of new handling in Mozilla.  Worse than that,
if they do not keep dual archives and if they are not validating, they
won't really know whether it "works" until they're in deep trouble.
Or they might end up thinking that someone else's new browser is doing
a much better job.

> All I see are many reason not to do it, the primary one being that it
> will cause XHTML UAs to have to be backwards compatible with a
> premature Step 4's supposedly-XHTML content which works in today's
> browsers... otherwise known as Tag Soup. Welcome back to Step 1.

No, the new user agent needs, like Amaya, to make a quick early decision
about which way to go.
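
My guess at what that decision keys on -- an assumption about Amaya's
internals, not something I have seen documented -- is just the first
few lines of the document prolog, roughly:

    <?xml ...?> or an XHTML FPI in the DOCTYPE   -> handle as XML
    an HTML 4.x (or older) FPI in the DOCTYPE    -> handle as classic HTML
    no document type declaration at all          -> handle as tag soup

That is exactly why the prolog subtleties mentioned above matter.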

As I've said before, the W3C HTML WG could give user agent writers a
bit more help in deciding how to proceed here.

                                    -- Bill
