[Bug 8268] XMLHttpRequest fails for documents with named entities due to doctype from bugzilla@wiggum.w3.org on 2009-11-12 (public-html-bugzilla@w3.org from November 2009)

From: <bugzilla@wiggum.w3.org>
Date: Thu, 12 Nov 2009 04:00:21 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1N8Qr7-0001o3-Te@wiggum.w3.org>

http://www.w3.org/Bugs/Public/show_bug.cgi?id=8268





--- Comment #1 from Michael(tm) Smith <mike@w3.org>  2009-11-12 04:00:21 ---
Aryeh Gregor:
> Wikipedia just experimented with switching to an HTML5 doctype.  A lot of user
> tools broke, and after two hours of investigation, we determined that the
> problem is intractable and switched back to XHTML 1.0 Transitional.
> 
> XMLHttpRequest was historically intended only for XML, and lots of scripts rely
> on the responseXML property being set to a Document.  In current browsers, this
> only happens when the document is actually well-formed XML.  But named entities
> are treated differently based on the doctype.  Consider this document:
> 
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
> <html><head>
> <title>Hello</title>
> </head>
> <body>
> <p>&nbsp;</p>
> </body>
> </html>
> 
> This works just fine in all browsers I tested in (latestish versions of
> Firefox, Chrome, Opera).  However, if you serve the exact same document but
> replace the doctype with <!DOCTYPE html>, all of them throw a syntax error on
> &nbsp;.
> 
> Practically speaking, this means that any site that wants to serve content
> compatible with XHR cannot use either of the two doctypes that the spec
> recommends for authors.  There are a variety of widely-used scripts on
> Wikipedia that rely on XHR, so this is currently a blocker for us.  It's very
> unlikely that we'll deploy HTML5 in the foreseeable future if it means our
> users have to rewrite all their scripts.  I'm pretty sure that XHR is used for
> screen-scraping beyond Wikipedia, too, so this will probably crop up
> elsewhere.
> 
> I don't know the full extent of the magic that causes this problem.  Could
> some reasonably minimal, distinctive doctype be invented that would avoid the
> problem but not make the document look to humans and validators like it thinks
> it's some old version of XHTML?  If an existing XHTML doctype must be reused,
> should validators continue to raise warnings as they do now, or should an XHTML
> doctype be promoted from "obsolete permitted DOCTYPE" to a fully permitted
> doctype?
> 
> Also, is this a wider problem?  Are there any other tools besides browsers that
> might be magically allowing named entities for some doctypes only?
> 

[no comment, just repeating the problem description for purposes of echoing it
to public-html]
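The report leaves the exact mechanism open. One way to picture it is the sketch below. It assumes that browsers key the behavior on the DOCTYPE public identifier: a recognized XHTML identifier makes HTML named entities such as &nbsp; resolve as if the DTD had been loaded, while `<!DOCTYPE html>` leaves a plain XML parser that rejects them. `parse_like_browser` and `KNOWN_XHTML_PUBLIC_IDS` are hypothetical names, and the identifier set is an illustrative subset rather than any browser's actual list. Python's own ElementTree does none of this; the function only simulates it.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Hypothetical, illustrative subset of DOCTYPE public identifiers that, in
# this model, switch on the HTML entity table. Not a definitive list.
KNOWN_XHTML_PUBLIC_IDS = {
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Frameset//EN",
    "-//W3C//DTD XHTML 1.1//EN",
}

# The five entities any XML parser understands without a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

DOCTYPE_PUBLIC_RE = re.compile(r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]*)"')
NAMED_ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def _to_numeric_ref(match: "re.Match[str]") -> str:
    name = match.group(1)
    if name in XML_PREDEFINED or name not in name2codepoint:
        # Leave predefined and unknown references for the XML parser.
        return match.group(0)
    # e.g. &nbsp; becomes &#160;, which needs no DTD.
    return f"&#{name2codepoint[name]};"


def parse_like_browser(source: str) -> ET.Element:
    """Hypothetical model of how responseXML ends up set or unset.

    With a recognized XHTML public identifier, HTML named entities are
    resolved as if the DTD were available; otherwise the text goes
    straight to an XML parser, which raises ParseError on &nbsp;.
    """
    m = DOCTYPE_PUBLIC_RE.search(source)
    if m and m.group(1) in KNOWN_XHTML_PUBLIC_IDS:
        # Naive textual substitution: it also rewrites references inside
        # comments and CDATA sections, which a real parser would not.
        source = NAMED_ENTITY_RE.sub(_to_numeric_ref, source)
    return ET.fromstring(source)
```

Fed the XHTML 1.0 Transitional test document from the report, this returns a tree whose `<p>` holds a non-breaking space. Fed the same document with `<!DOCTYPE html>`, it raises `ET.ParseError` for the undefined entity, which corresponds to the case where a browser leaves responseXML unset.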


Received on Thursday, 12 November 2009 04:00:30 UTC