From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Mon, 21 Apr 2008 21:52:51 -0500
To: William F Hammond <hammond@csc.albany.edu>
CC: public-html@w3.org, www-math@w3.org, www-svg@w3.org
William F Hammond wrote:
>>>>> 1. Many search engines appear not to look at application/xhtml+xml.
>>>> That seems like a much simpler thing to fix in search engines than in
>>>> the specification and UAs, to be honest.
>>> Technically yes, but politically no.
>> Why, exactly?
>
> I already explained that.

I'm sorry, but you in fact did not. You just said that "search engines
won't do it because they don't see a benefit". UAs see no benefit to
complicating the parsing model. Why is the claim true that it is
politically easier to make the UAs change than to make the search
engines change?

I'm glad we agree that technically (and especially in terms of speed of
roll-out) the search engine change is likely to be easier.

>> Have you actually brought this up with any search engine providers?
>
> It was mentioned in the parent of this cross-posted thread;
> see http://lists.w3.org/Archives/Public/www-math/2008Mar/0042.html

That doesn't really answer my question...

>> Uh... We're talking about a tradeoff between complexity in all
>> shipping HTML parsers and complexity in search engines. Content
>> providers don't even enter the picture here.
>
> Yes they do; it's been a consistent theme over the years since 2001
> in www-math@w3.org.

How do they enter, exactly? Your complaint is that search engines don't
search application/xhtml+xml, so it would be good if UAs would sniff
some text/html content as XHTML. Why would a change in search engine
behavior here entail any effort whatsoever on the part of content
providers?

>> Not to mention that you never answered my concerns about ambiguous
>> doctype detection.
>
> But I did; and I said in the worst case new specs could provide
> an easy-for-browsers method, going forward, to flag the distinction.

As I said in my previous mail, any such method introduces serious
security concerns, and mitigating those will involve updates to a lot
more software than just web browsers.
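[Editor's illustration, not part of the original mail.] A minimal sketch of the kind of doctype-based sniffing the thread is debating, and of the ambiguity Boris raises: the function name, regex, and sample payload below are hypothetical, not drawn from any spec or browser.

```python
import re

# Hypothetical sniffer: decide from the first bytes of a text/html payload
# whether it carries an XHTML doctype and so "should" go to an XML parser.
XHTML_DOCTYPE = re.compile(
    rb'<!DOCTYPE\s+html\s+PUBLIC\s+"-//W3C//DTD XHTML', re.IGNORECASE)

def looks_like_xhtml(payload: bytes) -> bool:
    """Return True if an XHTML doctype appears in the first 1024 bytes."""
    return bool(XHTML_DOCTYPE.search(payload[:1024]))

# The ambiguity: plenty of real-world tag soup is served with an XHTML
# doctype but is not well-formed XML, so the doctype alone cannot safely
# select the parser.
tag_soup = (b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n'
            b' "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n'
            b'<html><body><p>unclosed paragraph</body></html>')
print(looks_like_xhtml(tag_soup))  # True, yet an XML parser would reject it
```

The point of the sketch is that the doctype is a claim, not a guarantee: a sniffer keyed on it would route ill-formed documents to a draconian XML parser, which is one reason the proposal is contentious.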
>> I'm not sure where the conclusion follows from, since right now
>> browsers handle those types just fine if the content is something they
>> know what to do with.
>
> The world of xml has two parts: (1) documents for human reading
> and (2) electronic data. Not every xml instance is suitable for
> browsers.

Agreed, but what does that have to do with this discussion, which is
about whether it should be possible to send XHTML to a web browser
using the text/html MIME type and have it be parsed as XML?

>> My point is that the application/xhtml+xml vs application/xml
>> distinction is useful in providing extra information (that the XML
>> document is in fact expected to be XHTML),
>
> to be XHTML and not be random EDI stuff largely unsuitable for
> display

How commonly does such "random EDI stuff" contain XHTML-namespace
nodes? How often is a web browser pointed to it? How do you tell apart
"random EDI stuff" and a multi-namespace document?

In any case, all this is rather far afield from the original proposal
to ignore the text/html type and perform content sniffing to see
whether it's "really" HTML or whether it's actually XHTML.

> David Carlisle has referred you to RFC 3236 for this.

Indeed. As I replied to him, I stand corrected.

-Boris
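[Editor's illustration, not part of the original mail.] The extra information the application/xhtml+xml type conveys can be recovered from a generic application/xml payload only by parsing it and inspecting the root namespace. A minimal sketch of that check follows; the function name and the EDI-style sample namespace are my own inventions.

```python
from xml.etree import ElementTree

XHTML_NS = "http://www.w3.org/1999/xhtml"

def root_is_xhtml(xml_text: str) -> bool:
    """Parse an XML payload and report whether its root element lives in
    the XHTML namespace -- the fact that the dedicated
    application/xhtml+xml media type states up front, without parsing."""
    root = ElementTree.fromstring(xml_text)
    return root.tag == "{%s}html" % XHTML_NS

print(root_is_xhtml(
    '<html xmlns="http://www.w3.org/1999/xhtml"><body/></html>'))   # True
print(root_is_xhtml(
    '<Order xmlns="urn:example:edi"><Item/></Order>'))               # False
```

Note that this root-element test says nothing about the multi-namespace case Boris asks about: a document whose root is in some other namespace can still contain XHTML-namespace nodes deeper down.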
Received on Tuesday, 22 April 2008 02:54:27 UTC