Re: text/html for html and xhtml from Boris Zbarsky on 2008-04-21 (www-svg@w3.org from April 2008)

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Mon, 21 Apr 2008 13:47:35 -0500
To: William F Hammond <hammond@csc.albany.edu>
CC: public-html@w3.org, www-math@w3.org, www-svg@w3.org
Message-ID: <480CE147.30508@mit.edu>
William F Hammond wrote:
>>> 1.  Many search engines appear not to look at application/xhtml+xml.
>> That seems like a much simpler thing to fix in search engines than in
>> the specification and UAs, to be honest.
> 
> Technically yes, but politically no.

Why, exactly?  There are no more major search engines than major UAs, and the 
change on their end would have much faster uptake (no need to get users to upgrade).

Have you actually brought this up with any search engine providers?

>>                                            I don't see this as a
>> compelling reason to add complexity to the parsing model.
> 
> Not all that complex.  Even arguendo if it is, the issue is between
> one-time complexity for half a dozen user agent authors and many-time
> complexity for tens of thousands of content providers

Uh... We're talking about a tradeoff between complexity in all shipping HTML 
parsers and complexity in search engines.  Content providers don't even enter 
the picture here.

>> This is the argument for any type of content-type sniffing, no?
> 
> It's not.  It's merely saying that the boundary between text/html
> and application/xhtml+xml is (i) artificial and (ii) not well understood
> by content providers.

It's not that artificial: the two are parsed very differently, and content 
that's "safe" (say, in the sense of not executing script) when parsed as one is 
not necessarily safe when parsed as the other.

Which means that this sniffing has all the same security issues that any kind of 
content sniffing does, and would require updates to firewall software, etc, etc.
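
As a rough illustration of why the two parsing models disagree about what is 
"safe", here is a minimal Python sketch.  It assumes Python's standard-library 
html.parser and xml.etree.ElementTree as stand-ins for a UA's HTML and XML 
parsers (they are not the parsers any browser ships), and the sample document 
and helper names are made up for the example.  In the HTML parse the content of 
<style> is raw text, so no script start tag is seen; in the XML parse the same 
characters produce a real script element.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical sample document: a <script> nested inside <style>.
DOC = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head>'
    '<style><script>alert(1)</script></style>'
    '</head><body><p>hi</p></body></html>'
)


class TagCollector(HTMLParser):
    """Records the name of every start tag the HTML parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_start_tags(text):
    # Parse as tag soup; the content of <style> is treated as raw text.
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags


def xml_local_names(text):
    # Parse as XML; every well-formed tag becomes an element.
    root = ET.fromstring(text)
    return [el.tag.rsplit('}', 1)[-1] for el in root.iter()]


html_tags = html_start_tags(DOC)
xml_tags = xml_local_names(DOC)
print("script" in html_tags, "script" in xml_tags)
```

A filter that inspects this document under one model and a UA that renders it 
under the other would disagree about whether it contains a script element.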

Not to mention that you never answered my concerns about ambiguous doctype 
detection.

>>> (And, of course, "text/xml" and "application/xml" are non-specific
>>> mimetypes for which there is no base namespace.  They are sane content
>>> channels for web browsers only when display is entirely controlled
>>> with something like CSS.)
>> Uh...  Have you tested this? ...
> 
> I hope you are not disagreeing with my characterization of the two
> umbrella XML mimetypes from a standards perspective.

Sure I am.  From a standards perspective, any XML can be sent as those types 
(modulo the encoding constraints on text/xml); what happens to it afterwards 
depends on the namespaces used and what a UA decides to do with the type.  All 
the semi-popular UAs that support XHTML support receiving it as these types.  I 
don't see what the "sane ... only when display is entirely controlled with 
something like CSS" conclusion follows from.  A UA is free to apply its default 
HTML stylesheet to any document that contains elements in the XHTML namespace, 
and UAs do just that.

> Long term those
> mimetypes might better be handled by XML triage agents than by web
> browsers.

I'm not sure what that conclusion follows from, since right now browsers handle 
those types just fine if the content is something they know what to do with.

>> If you're talking about UAs other than those three that support
>> application/xhtml+xml,  ...
> 
> Mozilla [Gecko], Opera, Safari, as you say, but also Amaya and
> IE-with-MathPlayer, possibly others that do not come to mind.

I repeat: have you tested this?  Amaya works exactly as Gecko, Opera, and Safari 
do, as far as I can tell.  Certainly an XHTML file sent as text/xml is rendered 
as XHTML in Amaya.

> See David Carlisle's message related to this:
> http://lists.w3.org/Archives/Public/www-math/2008Apr/0190.html

From the article cited in that message, IE+MathPlayer treats the following 
types pretty much identically for purposes of this discussion:

   'application/xhtml+xml'
   'text/xml'
   'text/xml; charset=utf-8'
   'text/xml; charset=iso-8859-1'

I strongly suspect that there is no technical reason it couldn't also support 
application/xml using the exact same codepath...
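
To make the "same codepath" idea concrete, here is a minimal Python sketch of 
dispatching on the media type while ignoring parameters such as charset.  It is 
purely illustrative and says nothing about how MathPlayer is actually 
implemented; the function names and the set of types are assumptions made for 
the example.

```python
# Hypothetical set of media types routed to one XML rendering path.
XML_CODEPATH_TYPES = {
    "application/xhtml+xml",
    "text/xml",
    "application/xml",
}


def base_media_type(content_type):
    # Drop parameters such as "; charset=utf-8" and normalize case.
    return content_type.split(";", 1)[0].strip().lower()


def uses_xml_codepath(content_type):
    # True when the type would be handed to the shared XML path.
    return base_media_type(content_type) in XML_CODEPATH_TYPES


for header in (
    "application/xhtml+xml",
    "text/xml; charset=utf-8",
    "text/xml; charset=iso-8859-1",
    "application/xml",
    "text/html",
):
    print(header, uses_xml_codepath(header))
```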

My point is that the application/xhtml+xml vs application/xml distinction is 
useful in providing extra information (that the XML document is in fact expected 
to be XHTML), but by no means necessary to render XHTML.  Further, as things 
stand an application/xhtml+xml document that contains MathML is actually invalid 
(per the rules about what content you can label as application/xhtml+xml).  This 
is one of the things this working group aims to fix, but right now the only 
technically standards-compliant way to have MathML in XHTML is to serve the 
document as application/xml or text/xml, without an XHTML doctype, and rely on 
XML namespace handling in the UA to do the right thing.  Which it largely does.
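
The namespace-based handling described above can be sketched in Python as 
follows.  This is only an illustration under stated assumptions: it uses the 
standard-library xml.etree.ElementTree rather than any UA's code, the sample 
document is made up, and the helper names are hypothetical.  It shows that a 
document with no doctype and a generic XML media type still identifies its 
XHTML and MathML content through namespaces alone.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Hypothetical document: XHTML with inline MathML and no doctype.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <p>Inline math:
      <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>
    </p>
  </body>
</html>"""


def namespaces_used(text):
    # Collect the namespace URI of every element in the document.
    root = ET.fromstring(text)
    found = set()
    for el in root.iter():
        if el.tag.startswith("{"):
            found.add(el.tag[1:].split("}", 1)[0])
    return found


def root_is_xhtml(text):
    # ElementTree spells qualified names as "{namespace}localname".
    return ET.fromstring(text).tag == "{%s}html" % XHTML_NS


print(sorted(namespaces_used(DOC)))
print(root_is_xhtml(DOC))
```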

-Boris
Received on Monday, 21 April 2008 18:48:30 UTC