[whatwg] HTML5 doctypes incompatible with XHR if named entities present

From: Boris Zbarsky <bzbarsky@MIT.EDU>
Date: Thu, 12 Nov 2009 00:33:17 -0500
Message-ID: <4AFB9E1D.8040101@mit.edu>
On 11/11/09 11:57 PM, Aryeh Gregor wrote:
> A number of popular web apps output mostly well-formed XML, as far as
> I know: vBulletin, WordPress, etc.

I assume you meant "mostly" as in "most of the pages are well-formed", 
not "pages are mostly well-formed", since the latter is useless, right?

I did a brief survey of obvious sites fitting those descriptions that I 
had in my browser history at the moment.  These were not-well-formed:


These are:


So either you're looking at a totally different dataset or "mostly" is a 
bit of a stretch....

> Not even close to most websites, of course, but a significant number, I'd think.

Sure.  0.01% of all websites is a "significant number".  I just think 
well-formedness is broken often enough, and easy enough to break by 
accident, that relying on it for screen scraping is unlikely to be 
happening on a wide scale....

>> Yes, but browsers would have to add explicit support for it.
> That mostly defeats the point -- they could equally add explicit
> support for non-XML responseXML first.


> This makes it sound like if Wikipedia switches to HTML5 and isn't
> willing to break all screen-scrapers on principle, we'll have to use
> an obsolete but conforming doctype.

Or stop using HTML named entities, yes.
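
To illustrate the named-entity point, here is a minimal sketch using Python's stock xml.etree.ElementTree parser as a stand-in for an XML parser that has no DTD to consult. It is not a model of any browser's XHR implementation, and the two page fragments and both helper names are hypothetical. It only shows that, under the HTML5 doctype, a named entity such as &nbsp; is a well-formedness error for such a parser, while the equivalent numeric character reference is accepted.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragments: identical markup, differing only in how the
# no-break space is written.
NAMED = "<!DOCTYPE html><html><body><p>a&nbsp;b</p></body></html>"
NUMERIC = "<!DOCTYPE html><html><body><p>a&#160;b</p></body></html>"


def parses_as_xml(text):
    """Return True if the DTD-less XML parser accepts the text as well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


def paragraph_text(text):
    """Return the text of the first <p>, or None if the document does not parse."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return None
    return root.find("./body/p").text


if __name__ == "__main__":
    for label, page in (("named", NAMED), ("numeric", NUMERIC)):
        print(label, parses_as_xml(page), repr(paragraph_text(page)))
```

The numeric form needs no entity declarations at all, which is why it survives the doctype change.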

Received on Wednesday, 11 November 2009 21:33:17 UTC
