- From: Ian Hickson <ian@hixie.ch>
- Date: Thu, 11 Feb 2010 11:13:16 +0000 (UTC)
- To: HTML WG <public-html@w3.org>
On Fri, 30 Oct 2009, Alexey Proskuryakov wrote:
>
> As noted in
> <http://www.whatwg.org/specs/web-apps/current-work/#writing-xhtml-documents>,
> there is no guarantee that authors can use character entity references
> such as in XHTML, because XML parsers are not required to process
> external DTD subsets. This works in at least Firefox, Safari and Opera,
> but it's depressing that such a major feature is not interoperable per
> the spec.

HTML5 now attempts to navigate the XML spec in a manner that encourages
interoperability here as much as possible without strictly violating XML's
requirements on the matter.

> I think that it's important to guarantee that character entity
> references work in XHTML

Insofar as XML allows us to guarantee interoperability at all, I have now
done so.

> (even when parsing fragments, e.g. with innerHTML - which doesn't
> currently work in Firefox or Safari, and is confusing to authors).

I have not done this; innerHTML on elements does not support entities in
XML documents. In general I would discourage use of this API, and use of
true XHTML in general is pretty rare, so it doesn't seem worth the
additional potential engineering cost to add this.

On Mon, 2 Nov 2009, Henri Sivonen wrote:
>
> There are three classes of documents:
>
> DTDless: Entities other than the 5 built-in ones must not "work" in
> these. Here we have interop:
> http://hsivonen.iki.fi/test/moz/entity-without-dtd.xhtml
>
> Known DTD: The browser pretends to have loaded the DTD from the network
> but actually does something else. Here we have interop, too, to the
> extent the list of known DTDs is the same:
> http://hsivonen.iki.fi/test/moz/entity-with-known-dtd.xhtml
>
> Bogus DTD: Here we don't have interop: Opera falls back to behaving like
> an XML parser that hasn't loaded the DTD. Gecko and WebKit resolve the
> bogus DTD to a zero-length stream and then let the XML parser proceed
> thinking it has read the DTD (hence invoking the clauses of the XML spec
> that make unknown entity refs fatal). Well, that's what Gecko does. I
> didn't check WebKit's code, but the black-box behavior is the same.
> http://hsivonen.iki.fi/test/moz/entity-with-bogus-dtd.xhtml

A strict reading of the text in the spec now implies Opera's behaviour, I
believe. I can change that if people think we should make all external
entities resolve; it seemed like that would cross the line into violating
XML more explicitly, which is why I avoided doing this.

> IIRC, WebKit's known list of doctype doesn't cover the legacy MathML
> doctypes that Gecko's list covers and that are used in legacy content,
> so I think we should standardize Gecko's list--not WebKit's list--if we
> end up standardizing a list.

The only difference was "-//W3C//DTD MathML 2.0//EN" which was only
present in Gecko's list:

   http://trac.webkit.org/browser/trunk/WebCore/dom/XMLTokenizerLibxml2.cpp#L1245
   http://mxr.mozilla.org/mozilla-central/source/parser/htmlparser/src/nsExpatDriver.cpp#287

I've used Gecko's list here. This is probably a violation of MathML's
rules too, though I haven't mentioned that in the spec.

> If we standardize a list, there's the question if the list is a minimum
> list or the closed list forever. (See
> http://groups.google.com/group/mozilla.dev.tech.mathml/browse_thread/thread/e7f7efbb5e161348/9fde74f46fb0b5d2
> )

Given that the entity list is no longer growing, and that DTDs are no
longer useful other than for entities, I've made it a closed list. The
list of entities is the complete list of entities supported in text/html,
for all public identifiers.

This will cause a small memory footprint increase in WebKit, though that
hit would be taken anyway when implementing the HTML5 parser. It will also
cause some engineering cost to Gecko and Opera to avoid a performance
regression (since their parsers parse the external subset each time),
though for optimal performance in XML modes, such engineering work would
likely be needed anyway.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
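
For readers following the thread, the three document classes Henri Sivonen lists (DTDless, known DTD, bogus DTD) and the two competing behaviours for the bogus case can be modelled roughly as below. This is only a sketch, not text from the spec or code from any browser: the names `classify`, `expand`, `DocClass`, `FatalEntityError` and the `bogus_is_fatal` flag are hypothetical, `KNOWN_PUBLIC_IDS` is a partial, illustrative subset rather than the full closed list, and Python's `html.entities.html5` table is assumed to be a reasonable stand-in for the text/html entity list.

```python
from enum import Enum
from html.entities import html5  # stand-in for the text/html entity table
from typing import Optional

# The five entities every XML parser predefines.
BUILT_IN = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}

# Partial, illustrative subset of known public identifiers (assumption:
# the real closed list is longer).
KNOWN_PUBLIC_IDS = frozenset({
    "-//W3C//DTD XHTML 1.0 Strict//EN",
    "-//W3C//DTD XHTML 1.0 Transitional//EN",
    "-//W3C//DTD MathML 2.0//EN",
})


class DocClass(Enum):
    DTDLESS = "no external subset"
    KNOWN_DTD = "public identifier on the known list"
    BOGUS_DTD = "any other external subset"


class FatalEntityError(Exception):
    """Models a fatal error for an undeclared entity reference."""


def classify(has_external_subset: bool, public_id: Optional[str]) -> DocClass:
    """Sort a document into one of the three classes from the thread."""
    if not has_external_subset:
        return DocClass.DTDLESS
    if public_id in KNOWN_PUBLIC_IDS:
        return DocClass.KNOWN_DTD
    return DocClass.BOGUS_DTD


def expand(name: str, doc_class: DocClass,
           bogus_is_fatal: bool = True) -> Optional[str]:
    """Return the replacement text for &name;.

    Returns None to model a reference the parser leaves unexpanded
    (the Opera-like reading); raises FatalEntityError otherwise.
    """
    if name in BUILT_IN:
        return BUILT_IN[name]
    if doc_class is DocClass.KNOWN_DTD:
        value = html5.get(name + ";")
        if value is not None:
            return value
    elif doc_class is DocClass.BOGUS_DTD and not bogus_is_fatal:
        # Parser behaves as if the DTD was never loaded.
        return None
    raise FatalEntityError(f"undeclared entity: &{name};")
```

Under these assumptions, `bogus_is_fatal=True` corresponds to the Gecko/WebKit behaviour described in the quoted text (the DTD is treated as read but empty), and `bogus_is_fatal=False` to the Opera behaviour that a strict reading of the spec is said to imply.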
Received on Thursday, 11 February 2010 11:13:45 UTC