Re: text/html for html and xhtml from Boris Zbarsky on 2008-04-22 (www-math@w3.org from April 2008)

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Mon, 21 Apr 2008 21:52:51 -0500
To: William F Hammond <hammond@csc.albany.edu>
CC: public-html@w3.org, www-math@w3.org, www-svg@w3.org
Message-ID: <480D5303.1020204@mit.edu>
William F Hammond wrote:
>>>>> 1.  Many search engines appear not to look at application/xhtml+xml.
>>>> That seems like a much simpler thing to fix in search engines than in
>>>> the specification and UAs, to be honest.
>>> Technically yes, but politically no.
>> Why, exactly?
> 
> I already explained that.

I'm sorry, but you in fact did not.  You just said that "search engines won't do 
it because they don't see a benefit".  UAs see no benefit to complicating the 
parsing model either.  So why would it be politically easier to get the UAs to 
change than to get the search engines to change?

I'm glad we agree that technically (and especially in terms of speed of 
roll-out) the search engine change is likely to be easier.

>> Have you actually brought this up with any search engine providers?
> 
> It was mentioned in the parent of this cross-posted thread;
> see http://lists.w3.org/Archives/Public/www-math/2008Mar/0042.html

That doesn't really answer my question...

>> Uh... We're talking about a tradeoff between complexity in all
>> shipping HTML parsers and complexity in search engines.  Content
>> providers don't even enter the picture here.
> 
> Yes they do; it's been a consistent theme over the years since 2001
> in www-math@w3.org.

How do they enter, exactly?  Your complaint is that search engines don't search 
application/xhtml+xml, so it would be good if UAs would sniff some text/html 
content as XHTML.  Why would a change in search engine behavior here entail 
any effort whatsoever on the part of content providers?

>> Not to mention that you never answered my concerns about ambiguous
>> doctype detection.
> 
> But I did; and I said in the worst case new specs could provide
> an easy-for-browsers method, going forward, to flag the distinction.

As I said in my previous mail, any such method introduces serious security 
concerns, and mitigating those will involve updates to a lot more software than 
just web browsers.

>> I'm not sure where the conclusion follows from, since right now
>> browsers handle those types just fine if the content is something they
>> know what to do with.
> 
> The world of xml has two parts: (1) documents for human reading
> and (2) electronic data.  Not every xml instance is suitable for
> browsers.

Agreed, but what does that have to do with this discussion, which is about 
whether it should be possible to send XHTML to a web browser using the text/html 
MIME type and have it be parsed as XML?

>> My point is that the application/xhtml+xml vs application/xml
>> distinction is useful in providing extra information (that the XML
>> document is in fact expected to be XHTML), 
> 
> to be XHTML and not be random EDI stuff largely unsuitable for
> display

How commonly does such "random EDI stuff" contain XHTML-namespace nodes?  How 
often is a web browser pointed to it?  How do you tell apart "random EDI stuff" 
and a multi-namespace document?
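
To make the last question concrete, here is a minimal sketch of the only check 
that seems mechanically available: listing which element namespaces a document 
uses.  The function names are hypothetical and the sample document is made up 
for illustration; nothing here is taken from any UA.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

def namespaces_used(xml_text):
    """Return the set of element namespace URIs found in an XML document.

    Elements in no namespace are recorded as the empty string.
    """
    root = ET.fromstring(xml_text)
    found = set()
    for el in root.iter():
        tag = el.tag
        if not isinstance(tag, str):
            continue  # defensive: skip anything that is not a plain element
        if tag.startswith("{"):
            found.add(tag[1:].split("}", 1)[0])
        else:
            found.add("")
    return found

def contains_xhtml(xml_text):
    """True if at least one element is in the XHTML namespace."""
    return XHTML_NS in namespaces_used(xml_text)

# Hypothetical data document that happens to embed an XHTML note.
SAMPLE = (
    '<order xmlns="urn:example:orders">'
    '<note><p xmlns="http://www.w3.org/1999/xhtml">Fragile</p></note>'
    '</order>'
)
print(sorted(namespaces_used(SAMPLE)))
```

The check reports that the sample mixes two namespaces, but it cannot say 
whether the author meant it as a data record or as a compound document for 
display, which is the distinction being asked about.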

In any case, all this is rather far afield from the original proposal to ignore 
the text/html type and perform content sniffing to see whether it's "really" 
HTML or whether it's actually XHTML.
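
As an illustration of why doctype-based sniffing is ambiguous, here is a rough 
sketch.  The heuristic and the names naive_sniff and is_well_formed are 
assumptions made up for this example; they are not the proposal's algorithm and 
not what any browser does.

```python
import re
import xml.etree.ElementTree as ET

# Assumed heuristic: an XML declaration or an XHTML public identifier near
# the start of the body means "treat this text/html response as XML".
XHTML_DOCTYPE = re.compile(
    rb'<!DOCTYPE\s+html\s+PUBLIC\s+"-//W3C//DTD XHTML', re.IGNORECASE
)

def naive_sniff(body: bytes) -> str:
    """Return "xml" or "html" based only on the first 1024 bytes."""
    head = body[:1024]
    if head.lstrip().startswith(b"<?xml") or XHTML_DOCTYPE.search(head):
        return "xml"
    return "html"

def is_well_formed(body: bytes) -> bool:
    """True if an XML parser accepts the document without a fatal error."""
    try:
        ET.fromstring(body)
        return True
    except ET.ParseError:
        return False

# A page with an XHTML doctype that is really tag soup: <br> is never closed
# and </body> arrives while <p> is still open.
TAG_SOUP = (
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
    b"<html><body><p>one<br>two</body></html>"
)

print(naive_sniff(TAG_SOUP), is_well_formed(TAG_SOUP))
```

Under this heuristic the page is routed to the XML parser, which then rejects 
it, even though it renders when parsed as HTML.  Any doctype test of this kind 
has to decide what to do with such content.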

> David Carlisle has referred you to RFC 3236 for this.

Indeed.  As I replied to him, I stand corrected.

-Boris
Received on Tuesday, 22 April 2008 02:54:27 UTC