Re: Why your XHTML article is wrong from Aaron Swartz on 2003-01-20 (www-archive@w3.org from January 2003)

From: Aaron Swartz <me@aaronsw.com>
Date: Mon, 20 Jan 2003 11:49:15 -0600
To: Ian Hickson <ian@hixie.ch>
Cc: public web archive <www-archive@w3.org>
Message-Id: <7DF35AEE-2C9F-11D7-A606-0003936780B2@aaronsw.com>

>>> XHTML sent as text/html is treated as legacy tag soup by UAs. Legacy
>>> tag soup does not support namespaces.
>> I'm not sure what this means. How do you want the UA to support
>> namespaces?
> In the DOM, in CSS, in mapping elements to their semantics and default
> presentations, etc.

Does CSS have namespace support yet? It would be nice... If so, please 
let me know what I should add to 
http://www.aaronsw.com/2002/HTMLnamespaces

> Why, what did you mean by it?

I just wanted to add namespaced elements and attributes so as to 
prevent collisions.

>> Even if it was normative, it doesn't outlaw namespaces.
> The normative section of XHTML that I quoted above says "...as they are
> compatible with most HTML browsers". I am not aware of any HTML browsers
> that support namespaces, therefore it seems unreasonable to assume that
> namespaces may be used with text/html "HTML-compatible" documents.

They're certainly backwards-compatible when used properly; just adding 
an xmlns attribute to the <head> is.
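To make the disagreement concrete, here is a rough sketch. It assumes 
Python's html.parser as a stand-in for a tag-soup parser and 
xml.etree.ElementTree as the XML parser; neither is what any browser 
actually uses, and the ex: prefix and its URI are made up for 
illustration. The tag-soup side sees xmlns as an ordinary attribute and 
"ex:note" as an opaque tag name, which is harmless, while the XML side 
resolves real namespaces.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical document: the ex: prefix and its URI are illustrative only.
DOC = (
    '<html xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:ex="http://example.org/ns">'
    '<head><title>t</title></head>'
    '<body><ex:note>hi</ex:note></body></html>'
)


class SoupTags(HTMLParser):
    """Collect start-tag names the way a namespace-unaware parser sees them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # The colon is just another character in the tag name here.
        self.tags.append(tag)


def soup_tags(doc):
    """Return start-tag names as reported by the tag-soup parser."""
    parser = SoupTags()
    parser.feed(doc)
    parser.close()
    return parser.tags


def xml_tags(doc):
    """Return namespace-expanded element names from the XML parser."""
    return [el.tag for el in ET.fromstring(doc).iter()]


if __name__ == "__main__":
    print(soup_tags(DOC))
    print(xml_tags(DOC))
```

The same bytes parse cleanly both ways; only the XML parser attaches a 
namespace URI to each element.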

>>> Documents sent as text/html are handled as tag soup by most UAs.
>>> Most authors only check their documents look good in their UA of
>>> choice. This means that most authors are not checking for validity.
>> This might be because most UA authors are, in the words of Mark
>> Pilgrim, "alpha male[s who] chase after bleeding edge technologies".
> When I say "authors" I mean document authors, when I say
> "implementers" I mean UA writers.

My point is that such strictness is utter stupidity. It might make 
sense if the goal was to be able to get rid of SGML processors in web 
browsers, but that will never happen because HTML is a format and not a 
protocol. The same warning that applies to mail applies here:

"""
** A warning about evolution

It's easy to add syntax to a protocol. First upgrade all the readers to 
understand the syntax; then the writers can safely start using it.

For most protocols, it's just as easy to remove syntax. First upgrade 
all the writers to avoid the syntax; then the readers can safely stop 
supporting it.

Mail is different. A mail message lasts forever. Millions of people 
have saved billions of messages; and they expect every new MUA to be 
able to parse every one of those messages. Tomorrow's readers have to 
be compatible with yesterday's writers.

If you're a new implementor, you'll be shocked at how badly 822 was 
designed. Extracting even the simplest information from a message---the 
author's address, for example, or the sending date---is excruciatingly 
painful. And I see no sign that we'll ever be rid of the horrors of 822 
syntax; how can we convince users to convert their old mailboxes to a 
sensible new format?
"""

So what exactly is the point except to show how macho one's markup is?

> Users would fast get irritated at a UA which said "this page is
> invalid" to almost every page they visited.

But they'd love one that refused to display almost every page?

>>>>>  * The only real advantage to using XHTML rather than HTML is
>>>>>    that it is then possible to use XML tools with it. However, if
>>>>>    tools are being used, then the same tools might as well
>>>>>    produce HTML for you. Alternatively, the tools could take SGML
>>>>>    as input instead of XML.
>>>> And tools could parse and produce TeX too. By your reasoning, it'd
>>>> be safe for the Web to move to TeX.
>>> TeX is not semantically rich, so it is not even relevant here.
>> I was pointing out the error in your reasoning. If tools are being
>> used, then the same tools "might as well" name me Supreme Overlord
>> of the W3C. But they don't. Why are you bringing it up?
> The context is "why XHTML is no better than HTML". Since one of the
> primary arguments I have been presented with is "XHTML has more
> tools", I was refuting this myth.

I was pointing out that your reasoning was invalid. The tools _do not_ 
produce HTML for me. They do not produce TeX for me. Why should I waste 
my time making them support this ugly hard-to-parse format you profess 
to hate?

>>>> It also found a Mozilla bug:
>>>> Test case: http://www.aaronsw.com/2002/fixedxmlns
>>>> Do you want to file a bug?
>>> That isn't a bug. Mozilla is a non-validating parser, and as such
>>> does not have to do attribute defaulting.
>> That's insane. How does it handle the declared HTML entities then?
> It has a built in SGML catalogue which maps known DOCTYPEs to built in
> DTDs that define the entities. (It also correctly handles HTML ID
> attributes, and does so by hard coding at the C++ level, I believe.)
>
> Why is that insane? Validation is a very time-consuming process which
> provides little benefit to the user.

It could hardcode this FIXED attribute for this very popular format.
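
A small sketch of the behaviour in question, assuming Python's 
xml.etree.ElementTree as an example of a non-validating parser that 
does not fetch the external DTD (this says nothing about how Mozilla is 
implemented). The XHTML 1.0 DTD declares xmlns on <html> as #FIXED, but 
a parser that never reads the DTD only sees a namespace when the 
document states it.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
)
# Illustrative document body; the content itself is arbitrary.
REST = '<head><title>t</title></head><body><p>hi</p></body></html>'


def root_tag(doc):
    """Parse doc and return the root element's (possibly expanded) name."""
    return ET.fromstring(doc).tag


# Relies on the DTD's #FIXED default; the external DTD is never read.
IMPLICIT = DOCTYPE + '<html>' + REST

# States the namespace in the document itself.
EXPLICIT = DOCTYPE + '<html xmlns="' + XHTML_NS + '">' + REST

if __name__ == "__main__":
    print(root_tag(IMPLICIT))
    print(root_tag(EXPLICIT))
```

Hardcoding the default for known DOCTYPEs would make the first case 
behave like the second.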

>>> Anyway, that document is invalid.
>> Why?
> It is missing the xmlns attribute on the root node, as per 3.1.1.3.
>    http://www.w3.org/TR/xhtml1/#strict

That's a definition of strict conformance, not validity.

>    1. XHTML relies on tag soup parsers' error recovery.

Sorry, can you explain why this is a disadvantage?

>    2. Users of XHTML expect to be able to switch to an XML parsing
>       mode with little effort, but in practice there are big
>       differences that mean this transition is not smooth.

This is not a disadvantage but a caution.

> what other argument is there to defend sending XHTML as text/html in 
> the face of the disadvantages I have listed?

Namespaces are just one of the XML tools that can be used with XHTML.

-- 
Aaron Swartz [http://www.aaronsw.com/]
Received on Monday, 20 January 2003 12:49:18 UTC