- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Mon, 21 Apr 2008 13:47:35 -0500
- To: William F Hammond <hammond@csc.albany.edu>
- CC: public-html@w3.org, www-math@w3.org, www-svg@w3.org
William F Hammond wrote:
>>> 1. Many search engines appear not to look at application/xhtml+xml.
>> That seems like a much simpler thing to fix in search engines than in
>> the specification and UAs, to be honest.
>
> Technically yes, but politically no.

Why, exactly? There are no more major search engines than major UAs, and the change on their end would have much faster uptake (no need to get users to upgrade). Have you actually brought this up with any search engine providers?

>> I don't see this as a
>> compelling reason to add complexity to the parsing model.
>
> Not all that complex. Even arguendo if it is, the issue is between
> one-time complexity for half a dozen user agent authors and many-time
> complexity for tens of thousands of content providers

Uh... We're talking about a tradeoff between complexity in all shipping HTML parsers and complexity in search engines. Content providers don't even enter the picture here.

>> This is the argument for any type of content-type sniffing, no?
>
> It's not. It's merely saying that the boundary between text/html
> and application/xhtml+xml is (i) artificial and (ii) not well understood
> by content providers.

It's not that artificial: the two are parsed very, very differently, and content that's "safe" (say, in the sense of not executing script) when parsed as one is not necessarily safe when parsed as the other. Which means that this sniffing has all the same security issues that any kind of content sniffing does, and would require updates to firewall software, etc, etc.

Not to mention that you never answered my concerns about ambiguous doctype detection.

>>> (And, of course, "text/xml" and "application/xml" are non-specific
>>> mimetypes for which there is no base namespace. They are sane content
>>> channels for web browsers only when display is entirely controlled
>>> with something like CSS.)
>> Uh... Have you tested this? ...
>
> I hope you are not disagreeing with my characterization of the two
> umbrella XML mimetypes from a standards perspective.

Sure I am. From a standards perspective, any XML can be sent as those types (modulo the encoding constraints on text/xml); what happens to it afterwards depends on the namespaces used and what a UA decides to do with the type. All the semi-popular UAs that support XHTML support sending it as these types.

I don't see what the "sane ... only when display is entirely controlled with something like CSS" conclusion follows from. A UA is free to apply its default HTML stylesheet to any document that contains elements in the XHTML namespace, and UAs do just that.

> Long term those
> mimetypes might better be handled by XML triage agents than by web
> browsers.

I'm not sure how that conclusion follows, since right now browsers handle those types just fine if the content is something they know what to do with.

>> If you're talking about UAs other than those three that support
>> application/xhtml+xml, ...
>
> Mozilla [Gecko], Opera, Safari, as you say, but also Amaya and
> IE-with-MathPlayer, possibly others that do not come to mind.

I repeat: have you tested this? Amaya works exactly as Gecko, Opera, and Safari do, as far as I can tell. Certainly an XHTML file sent as text/xml is rendered as XHTML in Amaya.

> See David Carlisle's message related to this:
> http://lists.w3.org/Archives/Public/www-math/2008Apr/0190.html

From the article cited in that message, IE+MathPlayer treats the following types pretty much identically for purposes of this discussion:

  'application/xhtml+xml'
  'text/xml'
  'text/xml; charset=utf-8'
  'text/xml; charset=iso-8859-1'

I strongly suspect that there is no technical reason it couldn't also support application/xml using the exact same codepath...
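
As a rough illustration of that shared codepath (a sketch only, not how any of the UAs above is actually implemented): the hypothetical helper `namespaces_used` below routes all three XML types through one parser and then looks only at element namespaces to decide what the content is. The function name, the `XML_TYPES` set, and the sample document are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Types this hypothetical UA hands to its XML parser (assumed set).
XML_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def namespaces_used(mime_type, body):
    """Return the element namespaces found in body, or None if the
    type is not one we route to the XML parser."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    essence = mime_type.split(";", 1)[0].strip().lower()
    if essence not in XML_TYPES:
        return None
    found = set()
    for el in ET.fromstring(body).iter():
        tag = el.tag
        # ElementTree spells namespaced tags as "{namespace}localname".
        if isinstance(tag, str) and tag.startswith("{"):
            found.add(tag[1:].split("}", 1)[0])
    return found


# Sample document (made up): XHTML with inline MathML.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>t</title></head>
  <body><p>x = <math xmlns="http://www.w3.org/1998/Math/MathML"><mn>1</mn></math></p></body>
</html>"""

for mime in ("application/xhtml+xml", "text/xml; charset=utf-8", "application/xml"):
    print(mime, sorted(namespaces_used(mime, DOC)))
```

The label on the wire only decides whether the XML parser is used at all; once it is, the same namespaces come out whichever of the three types was sent.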
My point is that the application/xhtml+xml vs application/xml distinction is useful in providing extra information (that the XML document is in fact expected to be XHTML), but by no means necessary to render XHTML.

Further, as things stand an application/xhtml+xml document that contains MathML is actually invalid (per the rules about what content you can label as application/xhtml+xml). This is one of the things this working group aims to fix, but right now the only technically standards-compliant way to have MathML in XHTML is to serve the document as application/xml or text/xml, without an XHTML doctype, and rely on XML namespace handling in the UA to do the right thing. Which it largely does.

-Boris
Received on Monday, 21 April 2008 18:48:30 UTC