- From: Aryeh Gregor <Simetrical+w3c@gmail.com>
- Date: Wed, 11 Nov 2009 23:16:02 -0500
I already filed a bug <http://www.w3.org/Bugs/Public/show_bug.cgi?id=8268>, but figured I'd copy it here to get more discussion.

Wikipedia just experimented with switching to an HTML5 doctype. A lot of user tools broke, and after two hours of investigation, we determined that the problem is intractable and switched back to XHTML 1.0 Transitional.

XMLHttpRequest was historically intended only for XML, and lots of scripts rely on the responseXML property being set to a Document. In current browsers, this only happens when the document is actually well-formed XML. But named entities are treated differently based on the doctype. Consider this document:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html><head>
<title>Hello</title>
</head>
<body>
<p>&nbsp;</p>
</body>
</html>

This works just fine in all browsers I tested in (latestish versions of Firefox, Chrome, Opera). However, if you serve the exact same document but replace the doctype with <!DOCTYPE html>, all of them throw a syntax error on &nbsp;. Practically speaking, this means that any site that wants to serve content compatible with XHR cannot use either of the two doctypes that the spec recommends for authors.

There are a variety of widely-used scripts on Wikipedia that rely on XHR, so this is currently a blocker for us. It's very unlikely that we'll deploy HTML5 in the foreseeable future if it means our users have to rewrite all their scripts. I'm pretty sure that XHR is used for screen-scraping beyond Wikipedia, too, so this will probably crop up elsewhere too.

I don't know what the extent of the magic is that causes this problem. Could some reasonably minimal, distinctive doctype be invented that would avoid the problem but not make the document look to humans and validators like it thinks it's some old version of XHTML?
If an existing XHTML doctype must be reused, should validators continue to raise warnings as they do now, or should an XHTML doctype be promoted from "obsolete permitted DOCTYPE" to a fully permitted doctype? Also, is this a wider problem? Are there any other tools besides browsers that might be magically allowing named entities for some doctypes only?
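
For anyone who wants to poke at the symptom outside a browser, here is a rough Python sketch that imitates it. It is only a model built on an assumption: that the parser recognizes a fixed list of XHTML public identifiers and, for those alone, behaves as if the HTML named entities were declared. I have not confirmed that any browser works this way. The names KNOWN_XHTML_PUBLIC_IDS, DOCTYPE_RE, and parse_like_response_xml are made up for this illustration, and the function returns None where responseXML would be null.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Assumption: the set of public identifiers that trigger the special handling.
KNOWN_XHTML_PUBLIC_IDS = {
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.0 Strict//EN",
}

# Matches <!DOCTYPE html> and <!DOCTYPE html PUBLIC "..." "...">.
DOCTYPE_RE = re.compile(r'<!DOCTYPE\s+html(?:\s+PUBLIC\s+"([^"]*)"\s+"[^"]*")?\s*>')

# The predefined XML entities need no declaration, so leave them out.
PREDEFINED = {"amp", "lt", "gt", "quot"}


def parse_like_response_xml(text):
    """Return the root element, or None if the text is not well-formed XML."""
    match = DOCTYPE_RE.search(text)
    if match and match.group(1) in KNOWN_XHTML_PUBLIC_IDS:
        # Emulate a built-in DTD by declaring the HTML entities inline.
        subset = "".join(
            f'<!ENTITY {name} "&#{code};">'
            for name, code in name2codepoint.items()
            if name not in PREDEFINED
        )
        text = text[:match.start()] + f"<!DOCTYPE html [{subset}]>" + text[match.end():]
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        # Undefined entity (or any other well-formedness error).
        return None


BODY = "<html><head><title>Hello</title></head><body><p>&nbsp;</p></body></html>"
XHTML_DOC = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n' + BODY
)
HTML5_DOC = "<!DOCTYPE html>\n" + BODY
```

Calling parse_like_response_xml(XHTML_DOC) yields a parsed tree whose paragraph holds a no-break space, while parse_like_response_xml(HTML5_DOC) yields None because the entity is undefined. That mirrors what user scripts see when responseXML stops being a Document.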
Received on Wednesday, 11 November 2009 20:16:02 UTC