[Bug 8268] New: XMLHttpRequest fails for documents with named entities due to doctype

http://www.w3.org/Bugs/Public/show_bug.cgi?id=8268

           Summary: XMLHttpRequest fails for documents with named entities
                    due to doctype
           Product: HTML WG
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Keywords: NE
          Severity: normal
          Priority: P2
         Component: HTML5 spec bugs
        AssignedTo: dave.null@w3.org
        ReportedBy: Simetrical+w3cbug@gmail.com
         QAContact: public-html-bugzilla@w3.org
                CC: ian@hixie.ch, mike@w3.org, public-html@w3.org


Wikipedia just experimented with switching to an HTML5 doctype.  A lot of user
tools broke, and after two hours of investigation, we determined that the
problem is intractable and switched back to XHTML 1.0 Transitional.

XMLHttpRequest was historically intended only for XML, and lots of scripts rely
on the responseXML property being set to a Document.  In current browsers, this
only happens when the document is actually well-formed XML.  But named entities
are treated differently based on the doctype.  Consider this document:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html><head>
<title>Hello</title>
</head>
<body>
<p>&nbsp;</p>
</body>
</html>

This works just fine in all the browsers I tested (recent versions of Firefox,
Chrome, and Opera).  However, if you serve the exact same document but replace
the doctype with <!DOCTYPE html>, all of them throw a syntax error on &nbsp;.
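
The doctype dependence can be reproduced outside a browser.  The sketch below
is only an illustration, and it assumes Python's standard expat binding as a
stand-in for a non-validating XML parser; the names XHTML_DOCTYPE,
HTML5_DOCTYPE, BODY and is_well_formed are made up for the example.  It relies
on the XML 1.0 rule that a reference to an undeclared entity is a
well-formedness error only when the document has no external DTD subset.  It
does not show what browsers do internally, which may well be something else.

```python
from xml.parsers import expat

# The two doctypes compared in this report.
XHTML_DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n'
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
)
HTML5_DOCTYPE = "<!DOCTYPE html>"

# Same markup as the test document above, minus the doctype.
BODY = """
<html><head>
<title>Hello</title>
</head>
<body>
<p>&nbsp;</p>
</body>
</html>
"""


def is_well_formed(document: str) -> bool:
    """Return True if expat parses the document without an error.

    The parser is left in its default configuration, so it never
    fetches the external DTD named in the doctype.
    """
    parser = expat.ParserCreate()
    try:
        parser.Parse(document, True)
    except expat.ExpatError:
        return False
    return True


if __name__ == "__main__":
    print("XHTML 1.0 doctype:", is_well_formed(XHTML_DOCTYPE + BODY))
    print("HTML5 doctype:    ", is_well_formed(HTML5_DOCTYPE + BODY))
```

Note that in the XHTML case expat merely skips the reference it cannot resolve,
so a parser can accept the document without knowing what &nbsp; stands for.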

Practically speaking, this means that any site that wants to serve content
compatible with XHR cannot use either of the two doctypes that the spec
recommends for authors.  There are a variety of widely-used scripts on
Wikipedia that rely on XHR, so this is currently a blocker for us.  It's very
unlikely that we'll deploy HTML5 in the foreseeable future if it means our
users have to rewrite all their scripts.  I'm pretty sure that XHR is used for
screen-scraping beyond Wikipedia as well, so this will probably crop up
elsewhere.

I don't know the extent of the magic that causes this problem.  Could
some reasonably minimal, distinctive doctype be invented that would avoid the
problem but not make the document look to humans and validators like it thinks
it's some old version of XHTML?  If an existing XHTML doctype must be reused,
should validators continue to raise warnings as they do now, or should an XHTML
doctype be promoted from "obsolete permitted DOCTYPE" to a fully permitted
doctype?

Also, is this a wider problem?  Are there any other tools besides browsers that
might be magically allowing named entities for some doctypes only?



Received on Thursday, 12 November 2009 03:55:08 UTC