Re: XHTML and MIME

From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
Date: Sun, 03 Sep 2006 11:26:20 +1000
Message-ID: <44FA2F3C.7040708@lachy.id.au>
To: John Boyer <boyerj@ca.ibm.com>
CC: public-appformats@w3.org, www-forms@w3.org

John Boyer wrote:
> Responses to a few people:
> 
> JB: > 2) Why do you say "text/html is not XML"?
> 
> Lachlan:
> Um.  Because it's not!  See earlier in the thread where it was mentioned 
> that XHTML documents served as text/html are not treated as XML, but 
> rather as any other erroneous HTML document, in tag-soup parsers.
> 
> JB: Exclamation is not explanation.  XHTML served as text/html are not 
> treated as XML because your current code makes no effort to attempt that 
> first.   In my earliest posts on this subject, I said that an application
> should lead with the attempt to parse XML, then follow with recovery 
> strategies, or that it could try HTML first until it found "new features" 
> then switch to an attempt to use XML.

I get the feeling you're basing this and other arguments on the fallacy 
that text/html can be treated as XML because RFC 2854 (or any other 
spec) doesn't explicitly define it as not being XML and because XHTML 
1.0 can be *compatible* with HTML4 browsers.

It's exactly like saying text/plain can be treated as HTML because it 
isn't explicitly defined as not being HTML, and HTML source code can be 
sent as text/plain.  Unfortunately, IE does exactly that and I'm sure 
you're aware of the mess that has caused!

Content sniffing for (X)HTML is not defined anywhere and it is not 
endorsed by the HTML WG, who have previously stated that XHTML as 
text/html should be treated as HTML; and major browser vendors have 
already decided that they cannot and will not implement such a feature now.

Any solution developed *must not* ignore the major desktop browser 
vendors.  To do so would only further divide the two camps and result in 
another specification from you that will be ignored by both browser 
vendors and authors, making it effectively useless in the real world.

The solution must also be compatible with the current state of the web.
If mainstream browsers did actually attempt what you suggest by
parsing real-world text/html content as XML and switching to HTML at the 
first well-formedness error, then in approximately 99.9% of all cases 
(if not more!), the browser would simply be wasting time with an XML 
parser when it's just going to end up using the tag-soup parser anyway.
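To make that cost concrete, here is a minimal sketch of the try-XML-first, fall-back-to-tag-soup strategy under discussion (Python, purely illustrative; the function name and sample markup are my own, not anything a browser actually ships):

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

def parse_with_fallback(markup: str) -> str:
    """Attempt a strict XML parse first; on any well-formedness
    error, fall back to a forgiving tag-soup HTML parse."""
    try:
        ET.fromstring(markup)
        return "xml"
    except ET.ParseError:
        # Real-world text/html almost always lands here, so the
        # strict first pass is wasted work for the common case.
        parser = HTMLParser()
        parser.feed(markup)  # lenient: never raises on malformed markup
        return "tag-soup"

# Typical real-world markup: valid HTML, fatally malformed as XML.
print(parse_with_fallback('<p>Unclosed paragraph<br>'))   # tag-soup
print(parse_with_fallback('<p>Well formed</p>'))          # xml
```

For the overwhelming majority of deployed pages the first branch fails, which is exactly the wasted-work scenario described above.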

Given that, and the other technical reasons given by Anne and Henri, 
it's time to give up the idea that text/html content can be treated as 
XML in the real world and move on.

> The explanation for why not to do it this way has so far been "Cuz we
> don't wanna!"

No, my arguments have been based on technical reasons, specs and 
evidence of real-world authoring habits.

> On the technical side, Mark B has already shown it works,  and Raman
> described an even smoother technique that would allow an even more 
> graceful degradation.

Unfortunately, the "it works for me" argument they've given simply
doesn't hold up.  It is you, and the others in your camp, who have been
arguing "Cuz we can" and presenting evidence that relies on *undefined* 
handling of XML in text/html.

> Anne: Partially because a pretty large community (as I perceive it anyway) 
> considers that to be harmful. I also don't really see the point in doing 
> failure recovery when parsing XHTML, except perhaps for debugging...
> 
> JB: Declaration isn't explanation either.  Why do you consider it harmful?

One simply has to look at real-world evidence of XHTML served as 
text/html to see how many fatal mistakes are made by millions of 
authors, which would make any real switch to XML incredibly painful. 
Such mistakes include (among others) well-formedness errors, character 
encoding issues, and scripts and stylesheets relying on HTML handling.
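As an illustration, each of these authoring patterns (hypothetical snippets, but typical of deployed XHTML-as-text/html) is a fatal well-formedness error to any conforming XML parser, even though tag-soup parsers recover from all of them silently:

```python
import xml.etree.ElementTree as ET

# Hypothetical but typical markup mistakes: HTML parsers recover
# from every one; a conforming XML parser must treat each as fatal.
common_mistakes = {
    "unescaped ampersand": '<p><a href="page?a=1&b=2">link</a></p>',
    "unclosed element":    '<p>text<br></p>',
    "mismatched nesting":  '<p><b>bold<i>both</b>italic</i></p>',
}

for name, markup in common_mistakes.items():
    try:
        ET.fromstring(markup)
        print(f"{name}: well-formed")
    except ET.ParseError as err:
        print(f"{name}: fatal ({err})")
```

Every entry above prints "fatal", which is the point: content that renders fine today would simply stop rendering after a naive switch to XML processing.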

> The problem here is that sometimes folks are advocating for relaxed 
> graceful degradation and at other times rigid adherence to rules...

Graceful degradation and adherence to the rules are not mutually 
exclusive goals.  There is no problem here.

> ...that have little justification other than preventing a useful 
> migration from happening over time.

There have been plenty of reasons given, and none of them have anything
to do with preventing migration to XML.

> Elliote Harold: In a typical browser, yes. However I routinely download 
> such pages with non-browser-tools based on XML parsers; and there the 
> results are quite different. In these contexts, the XML-nature of these 
> pages is very useful to me.

See above.  We cannot ignore typical browsers when developing a 
solution.  Also, what you do with a file once you've downloaded it for 
offline use is up to you.  There are no interoperability concerns with
that, and it is not relevant to this discussion.

-- 
Lachlan Hunt
http://lachy.id.au/
Received on Sunday, 3 September 2006 01:26:34 GMT
