From: Ian Hickson <ian@hixie.ch>
Date: Wed, 20 Nov 2002 07:27:02 +0000 (GMT)
To: Aaron Swartz <me@aaronsw.com>
Cc: "www-archive@w3.org" <www-archive@w3.org>
First, let me be absolutely clear about what my opinion is, so that we
don't argue at cross-purposes. I am in favour of XHTML itself, and have
nothing against the technology. The only thing I have a problem with is
sending XHTML as text/html.

Second, let's make sure we agree on some core facts:

 * XHTML sent as text/html is treated as legacy tag soup by UAs.
 * Legacy tag soup does not support namespaces.
 * Only XHTML documents that are compatible with legacy tag soup (as
   defined by XHTML 1.0 Appendix C) may be sent as text/html.
 * XML requires that UAs abort with a fatal error when parsing an
   ill-formed document.
 * XHTML is an XML application and thus all of XML's parsing rules
   apply to XHTML.

On Tue, 19 Nov 2002, Aaron Swartz wrote:
>>
>> It is suggested that authors should use HTML 4.01 instead of XHTML
>> [...]
>
> XHTML is simpler,

XHTML has fewer esoteric syntax rules, agreed.

> more aesthetically pleasing,

That is a subjective argument and not really one I am concerned about.
All XHTML documents can be mapped directly to equivalent HTML documents
and vice versa, meaning that either form can be used for content
development, which is the only time a format's aesthetic qualities
matter.

> and works with deployed XML and HTML tools and specs (like
> namespaces).

I am not entirely sure what this means. Certainly, UAs do not support
namespaces in legacy tag soup documents (or XHTML documents sent as
text/html, which are treated as legacy tag soup documents).

HTML works equally well with deployed SGML and HTML tools and specs.

>> you are [...] relying on their error handling.
>
> I don't see why this is bad.

Because error handling is not defined anywhere, and you are therefore
relying on what is basically proprietary technology.

> Relying on such slack is how we can build backwards-compatible specs;

This is very much incorrect, so much so that an entire document
metaformat was developed with one overwhelming requirement: that error
handling rules be explicitly defined. (That format is now known as
XML.)

> otherwise upgrading would require a flag day.

An example of a format which is backwards compatible due to good
design is CSS. It has _forward_ compatible parsing rules which ensure
that any conforming UA will treat any document in a predictable way.
Thus upgrading CSS does _not_ require a "flag day".

HTML is a technology with undefined error handling. The slack is in
fact the source of most of the _incompatibilities_ between UAs.

>> * <script> and <style> elements
>
> I don't use any.

You are fortunate. Most people use both extensively.

>> * Documents sent as text/html are handled as tag soup [2] by most
>>   UAs. This means that authors are not checking for validity,
>
> That doesn't follow.

You are correct, I am missing a step in that argument:

Documents sent as text/html are handled as tag soup by most UAs. Most
authors only check that their documents look good in their UA of
choice. This means that most authors are not checking for validity.

>> the main advantage of using XHTML [is] that it has to be valid
>
> That's pretty subjective; I don't consider that the main advantage.

It was one of the primary goals of XML's development, as discussed
above.

>> * If you ever switch your XHTML documents from text/html to
>>   text/xml, then you will in all likelihood end up with a
>>   considerable number of XML errors, meaning your content won't be
>>   readable by users. (Most XHTML documents do not validate.)
>
> Sure, invalid documents are invalid. What does this have to do with
> XHTML?

XHTML UAs _must_ refuse to render ill-formed documents, per the XML
spec. This does not apply to legacy tag soup (aka HTML) UAs. This
means that if ill-formed XHTML content is sent using an XML MIME type
to UAs, it will no longer be readable by users, as compliant UAs will
refuse to render the content.
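To make that last point concrete, here is a rough sketch of what any
conforming XML processor does with an ill-formed fragment. (This is
purely illustrative: Python's standard-library parser stands in for a
browser's XML parser, and the input string is made up.)

    import xml.etree.ElementTree as ET

    # Illustrative input: an unclosed <li> is routine in tag soup,
    # but it is a well-formedness error in XML.
    ill_formed = "<ul><li>first item<li>second item</ul>"

    try:
        ET.fromstring(ill_formed)
        print("parsed")
    except ET.ParseError as err:
        # A conforming XML processor must stop here with a fatal
        # error; a compliant UA would render nothing at all.
        print("fatal error:", err)

A tag soup parser, by contrast, quietly guesses where the missing
</li> belongs and renders the page anyway.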
>> * A CSS stylesheet written for an HTML document has subtly different
>>   semantics [...]
>
> I don't take advantage of these.

Surprisingly, this does indeed appear to be the case.

>> * The only real advantage to using XHTML rather than HTML is that it
>>   is then possible to use XML tools with it. However, if tools are
>>   being used, then the same tools might as well produce HTML for
>>   you. Alternatively, the tools could take SGML as input instead of
>>   XML.
>
> And tools could parse and produce TeX too. By your reasoning, it'd be
> safe for the Web to move to TeX.

TeX is not semantically rich, so it is not even relevant here.

>> * HTML 4.01 contains everything that XHTML contains,
>
> HTML 4.01 doesn't allow namespaces.

Neither does XHTML sent as text/html.

>> so there is little reason to use XHTML in the real world.
>
> Even if the premise was true, that doesn't follow.

Assuming for the moment that the premises are true, why does it not
follow?

>> UAs can't handle XHTML sent as text/html as XML
>
> I agree with this, and I'd like to hear suggestions on how to address
> this problem.

It's not a problem. Use application/xhtml+xml (or any of the other
MIME types suggested by RFC 3023).

> I have no problem sending my content with a special mime type to a
> client which will do the right thing with it, do you have code that
> will do this for me?

Yes. See:

   http://software.hixie.ch/utilities/cgi/xhtml-for-ie/

Alternatively, see:

   http://www.damowmow.com/playground/demos/mime-mod_rewrite/

This is a problem I've had to solve for people several times.
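The idea behind both is simple content negotiation: look at the Accept
header and only use the XML MIME type for clients that say they can
handle it. A rough sketch of that idea as a CGI script (illustrative
only, not the actual code behind either of those URLs; "page.xhtml" is
a made-up file name):

    #!/usr/bin/env python
    # Rough sketch only: choose a MIME type for an XHTML page based on
    # what the client says it accepts. A real deployment would also
    # honour q-values in the Accept header and follow the Appendix C
    # guidelines for the markup itself.
    import os

    accept = os.environ.get("HTTP_ACCEPT", "")

    if "application/xhtml+xml" in accept:
        mime = "application/xhtml+xml"   # real XML UA: XML rules apply
    else:
        mime = "text/html"               # legacy UA: handled as tag soup

    with open("page.xhtml", encoding="utf-8") as f:   # placeholder name
        body = f.read()

    print("Content-Type: %s; charset=utf-8" % mime)
    print("Vary: Accept")   # so caches keep the two responses apart
    print()
    print(body)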
> Kluge: I've set up http://xhtml.aaronsw.com/ to be identical to
> aaronsw.com except serve pages with an XHTML mime type. Doing this
> found a few problems, as your article suggests it would.

If you'd used HTML4, you would never have had to find those problems,
because you would never have had to take existing content and change
its MIME type, effectively thrusting it into a world with new rules.

I am all in favour of using XHTML, _for new content_, sent as an XML
MIME type from the start.

> It also found a Mozilla bug:
> Test case: http://www.aaronsw.com/2002/fixedxmlns
> Do you want to file a bug?

That isn't a bug. Mozilla is a non-validating parser, and as such does
not have to do attribute defaulting. Anyway, that document is invalid.

>> There are few advantages to using XHTML if you are sending the
>> content as text/html, and many disadvantages.
>
> You have not listed one disadvantage.

Correction: you have not agreed to one disadvantage. I have listed at
least six, including:

   1. Relying on proprietary error handling technology
   2. Syntactic differences such as <script> and <style> content models
   3. Lack of syntax checking in UAs
   4. Switching to the right MIME type causes problems
   5. Differences in CSS semantics
   6. Differences in DOM semantics

>> Authors should stick to writing valid HTML 4.01 for the time being.
>> Once user agents that support XML and XHTML sent as text/xml are
>> widespread, then authors may reconsider learning and using XHTML.
>
> This makes no sense. We know that we will have to rewrite HTML pages
> to be XHTML.

Why? Why would you ever want to do this?

Assuming Google is complete (it's not), there are approximately three
billion pages out there, of which the overwhelming majority is HTML.
Even if we assume that only a third of those are HTML (and the likely
number is more like 99%, if not higher, and that's not even counting
the invalid documents labelled as XHTML which will also need to be
corrected before "switching" to XHTML), that's still one billion HTML
documents. Why do you think we'll need to rewrite these billion
documents?

--
Ian Hickson                 )\._.,--....,'``.    fL
"meow"                     /,   _.. \   _\  ;`._ ,.
http://index.hixie.ch/    `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 20 November 2002 02:27:05 UTC