Re: Why your XHTML article is wrong

On Sun, 24 Nov 2002, Aaron Swartz wrote:
>> 
>> XHTML sent as text/html is treated as legacy tag soup by UAs. Legacy 
>> tag soup does not support namespaces.
>
> I'm not sure what this means. How do you want the UA to support
> namespaces?

In the DOM, in CSS, in mapping elements to their semantics and default
presentations, etc.

Why, what did you mean by it?
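
To make the DOM point concrete, here is a minimal sketch in Python. It
assumes the standard library's two parsers are fair stand-ins for a UA's
XML and tag soup code paths (no browser is implemented this way), and the
sample document and function names are made up for illustration. The XML
parser reports each element with its namespace URI; the tag soup parser
sees only a literal name with a colon in it.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: XHTML with a MathML fragment from a foreign namespace.
DOC = (
    '<html xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:m="http://www.w3.org/1998/Math/MathML">'
    '<body><p>x</p><m:math><m:mi>y</m:mi></m:math></body></html>'
)


def xml_names(text):
    """Element names as reported by a namespace-aware XML parser."""
    return [el.tag for el in ET.fromstring(text).iter()]


class SoupNames(HTMLParser):
    """Collects start-tag names the way a tag soup parser reports them."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # The prefix is just part of the name; no namespace lookup happens.
        self.names.append(tag)


def soup_names(text):
    parser = SoupNames()
    parser.feed(text)
    parser.close()
    return parser.names
```

With the XML parser, "m:math" becomes an element named "math" in the
MathML namespace. With the tag soup parser it is an unknown element
literally called "m:math", which nothing can map to MathML semantics.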


>> Only XHTML documents that are compatible with legacy tag soup (as 
>> defined by XHTML 1.0 Appendix C) may be sent as text/html.
>
> That appendix is informative so I don't need to follow it

XHTML 1.0:

# 5. Compatibility Issues
#
# This section is normative. [...]
#
# XHTML Documents which follow the guidelines set forth in Appendix C,
# "HTML Compatibility Guidelines" may be labeled with the Internet Media
# Type "text/html" [RFC2854], as they are compatible with most HTML
# browsers.
 -- http://www.w3.org/TR/xhtml1/#media


RFC2854, section 2, refers to the above:

# In addition, [XHTML1] defines a profile of use of XHTML which is
# compatible with HTML 4.01 and which may also be labeled as text/html.
 -- http://www.ietf.org/rfc/rfc2854


In any case, if there was any ambiguity about the thinking of the HTML
working group, the XHTML Media Types note published to clarify these
issues should remove any doubts (section 3.1):

# In particular, 'text/html' is NOT suitable for XHTML Family document
# types that adds elements and attributes from foreign namespaces
 -- http://www.w3.org/TR/2002/NOTE-xhtml-media-types-20020801/#text-html


> but I'm pretty sure I do. Even if it was normative, it doesn't outlaw
> namespaces.

The normative section of XHTML that I quoted above says "...as they are
compatible with most HTML browsers". I am not aware of any HTML browsers
that support namespaces, therefore it seems unreasonable to assume that
namespaces may be used with text/html "HTML-compatible" documents.


>> All XHTML documents can be mapped directly to equivalent HTML documents
> Not if they include namespaces.

My bad, that was supposed to be "All HTML documents can be mapped
directly to equivalent XHTML documents".


>> Document sent as text/html are handled as tag soup by most UAs.
>> Most authors only check their documents look good in their UA of
>> choice. This means that most authors are not checking for validity.
> 
> This might be because most UA authors are, in the words of Mark
> Pilgrim "alpha male[s who] chase after bleeding edge technologies".

When I say "authors" I mean document authors; when I say
"implementers" I mean UA writers.


> Maybe if they spent time creating a UI indication of validity (ala
> iCab) instead of subjecting all readers to "user-hostile behavior",
> they wouldn't need such idiocy.

I'm not sure what idiocy you are referring to, but in any case, such a
UI indicator is a non-starter.

Users would quickly get irritated with a UA that said "this page is
invalid" on almost every page they visited. Irritating your users is a
very bad idea if you want to gain any sort of market share, especially
in a market like this one, where users will take any excuse to jump to
the market leader.


>>>>  * The only real advantage to using XHTML rather than HTML is
>>>>    that it is then possible to use XML tools with it. However, if
>>>>    tools are being used, then the same tools might as well
>>>>    produce HTML for you. Alternatively, the tools could take SGML
>>>>    as input instead of XML.
>>> And tools could parse and produce TeX too. By your reasoning, it'd
>>> be safe for the Web to move to TeX.
>> TeX is not semantically rich, so it is not even relevant here.
> I was pointing out the error in your reasoning. If tools are being
> used, then the same tools "might as well" name me Supreme Overlord
> of the W3C. But they don't. Why are you bringing it up?

The context is "why XHTML is no better than HTML". Since one of the
primary arguments I have been presented with is "XHTML has more
tools", I was refuting this myth.


>>>>  * HTML 4.01 contains everything that XHTML contains,
>>> HTML 4.01 doesn't allow namespaces.
>> Neither does XHTML sent as text/html.
> According to who?

See above: The HTML WG.
 

>>>>  so there is little reason to use XHTML in the real world.
>>> Even if the premise was true, that doesn't follow.
>> Assume for the moment that the premises are true, why does it not 
>> follow?
> 
> Does this follow? "The Library of Congress contains everything my
> home library contains, so there is little reason to use my home
> library in the real world." There's more to markup than what it can
> express, as you point out above (error handling, for example).

That is a false analogy.

The following are the only differences between HTML and XHTML sent as
text/html:

   1. XHTML relies on tag soup parsers' error recovery.

   2. Users of XHTML expect to be able to switch to an XML parsing
      mode with little effort, but in practice there are big
      differences that mean this transition is not smooth.

Since these are both reasons to avoid XHTML, it seems to me that using
XHTML (sent as text/html) is a pointless exercise.


>>> I have no problem sending my content with a special mime type to a
>>> client which will do the right thing with it, do you have code that
>>> will do this for me?
> 
> I've asked Mark for the code he's using.

It's based on:

   http://www.damowmow.com/playground/demos/mime-mod_rewrite/

...after changes to make it work in the real world.
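
The general idea of picking a MIME type per client can be sketched as
follows. This is not the mod_rewrite configuration at the URL above, nor
Mark's modified version; it is a hypothetical Python equivalent that
assumes the decision is made purely from the request's Accept header. The
function names are invented for the example.

```python
def parse_accept(header):
    """Return {media_type: q} for an HTTP Accept header value."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs[media_type] = q
    return prefs


def choose_content_type(accept_header):
    """Serve XHTML as XML only to clients that explicitly ask for it."""
    prefs = parse_accept(accept_header or "")
    xhtml_q = prefs.get("application/xhtml+xml", 0.0)
    html_q = prefs.get("text/html",
                       prefs.get("text/*", prefs.get("*/*", 0.0)))
    if xhtml_q > 0 and xhtml_q >= html_q:
        return "application/xhtml+xml"
    return "text/html"
```

A wildcard such as "*/*" deliberately does not count as a request for
application/xhtml+xml here, so legacy UAs keep getting text/html. A
server doing this should also send "Vary: Accept" so that caches do not
hand the wrong variant to the wrong client.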


>>> It also found a Mozilla bug:
>>> Test case: http://www.aaronsw.com/2002/fixedxmlns
>>> Do you want to file a bug?
>> That isn't a bug. Mozilla is a non-validating parser, and as such
>> does not have to do attribute defaulting.
> 
> That's insane. How does it handle the declared HTML entities then?

It has a built-in SGML catalogue which maps known DOCTYPEs to built-in
DTDs that define the entities. (It also correctly handles HTML ID
attributes, and does so by hard-coding at the C++ level, I believe.)

Why is that insane? Validation is a very time-consuming process which
provides little benefit to the user.
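
The catalogue approach described above can be sketched like this. It is
an illustration only, not Mozilla's code: the table layout and function
name are hypothetical, and Python's HTML 4 entity table is assumed to be
a close enough approximation of the entity sets the XHTML 1.0 DTDs
declare.

```python
from html.entities import name2codepoint

# The five entities every XML parser knows without any DTD.
XML_PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}

# Assumption: the HTML 4 entity names stand in for the XHTML 1.0 ones.
XHTML_ENTITIES = {name: chr(cp) for name, cp in name2codepoint.items()}

# Known public identifiers mapped to built-in entity tables, so the
# external DTD never has to be fetched or parsed.
CATALOGUE = {
    "-//W3C//DTD XHTML 1.0 Strict//EN": XHTML_ENTITIES,
    "-//W3C//DTD XHTML 1.0 Transitional//EN": XHTML_ENTITIES,
    "-//W3C//DTD XHTML 1.0 Frameset//EN": XHTML_ENTITIES,
}


def resolve_entity(public_id, name):
    """Return the replacement text for &name;, or None if it is unknown."""
    if name in XML_PREDEFINED:
        return XML_PREDEFINED[name]
    return CATALOGUE.get(public_id, {}).get(name)
```

Note what this does not do: it supplies entity definitions, but it does
not apply attribute defaults from the DTD, which is exactly the behaviour
at issue in the test case.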


>> Anyway, that document is invalid.
> 
> Why?

It is missing the xmlns attribute on the root node, as per 3.1.1.3.

   http://www.w3.org/TR/xhtml1/#strict
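
A quick way to see the problem from the point of view of a parser that
does not read the DTD is sketched below. It assumes Python's
ElementTree, which is non-validating and does not apply attribute
defaults from an external DTD; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def root_in_xhtml_namespace(text):
    """True if the root element is 'html' in the XHTML namespace."""
    root = ET.fromstring(text)
    # ElementTree spells namespaced names as "{uri}localname".
    return root.tag == "{%s}html" % XHTML_NS
```

Without the xmlns attribute in the document itself, the root is just an
element called "html" in no namespace, whatever the DTD says the default
would have been.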


>>> We know that we will have to rewrite HTML pages to be XHTML.
>> Why?
> 
> OK, maybe only I will have to rewrite pages to be XHTML. It's a
> court-imposed term of my probation.

Seriously though. Why?

I don't see any reason to ever convert my HTML documents (or even my
text/plain documents, of which I have many) to XHTML.



Anyway, it seems to me your main argument is "XHTML-as-text/html is ok
because I can use namespaces with it, and this outweighs the two main
disadvantages of XHTML-as-text/html". Since namespaces cannot be used
with text/html (probably not in theory and definitely not in
practice), what other argument is there to defend sending XHTML as
text/html in the face of the disadvantages I have listed?

-- 
Ian Hickson                                      )\._.,--....,'``.    fL
"meow"                                          /,   _.. \   _\  ;`._ ,.
http://index.hixie.ch/                         `._.-(,_..'--(,_..'`-.;.'

Received on Tuesday, 26 November 2002 09:56:53 UTC