- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Mon, 2 Nov 2009 14:52:15 +0200
- To: Alexey Proskuryakov <ap@webkit.org>
- Cc: HTML WG <public-html@w3.org>
On Oct 31, 2009, at 01:10, Alexey Proskuryakov wrote:

> As noted in <http://www.whatwg.org/specs/web-apps/current-work/#writing-xhtml-documents>,
> there is no guarantee that authors can use character entity
> references such as in XHTML, because XML parsers are not
> required to process external DTD subsets. This works in at least
> Firefox, Safari and Opera, but it's depressing that such a major
> feature is not interoperable per the spec.

The above is an oversimplification. There are three classes of documents:

DTDless: Entities other than the 5 built-in ones must not "work" in these. Here we have interop:
http://hsivonen.iki.fi/test/moz/entity-without-dtd.xhtml

Known DTD: The browser pretends to have loaded the DTD from the network but actually does something else. Here we have interop, too, to the extent the list of known DTDs is the same:
http://hsivonen.iki.fi/test/moz/entity-with-known-dtd.xhtml

Bogus DTD: Here we don't have interop: Opera falls back to behaving like an XML parser that hasn't loaded the DTD. Gecko and WebKit resolve the bogus DTD to a zero-length stream and then let the XML parser proceed thinking it has read the DTD (hence invoking the clauses of the XML spec that make unknown entity refs fatal). Well, that's what Gecko does. I didn't check WebKit's code, but the black-box behavior is the same.
http://hsivonen.iki.fi/test/moz/entity-with-bogus-dtd.xhtml

IIRC, WebKit's list of known doctypes doesn't cover the legacy MathML doctypes that Gecko's list covers and that are used in legacy content, so I think we should standardize Gecko's list--not WebKit's list--if we end up standardizing a list. If we standardize a list, there's the question of whether the list is a minimum list or a closed list forever.
(See http://groups.google.com/group/mozilla.dev.tech.mathml/browse_thread/thread/e7f7efbb5e161348/9fde74f46fb0b5d2 )

Opera's behavior in the unknown DTD case is cleaner than Gecko's and WebKit's behavior from the XML spec POV, but I don't know if the off-the-shelf parsers used by Gecko and WebKit have enough API surface for that behavior. (I don't like it that Opera reportedly has put new doctypes on the list of known doctypes, though. I'm personally in the frozen list camp myself.)

> I think that it's important to guarantee that character entity
> references work in XHTML (even when parsing fragments, e.g. with
> innerHTML - which doesn't currently work in Firefox or Safari, and
> is confusing to authors).

Test cases:
http://hsivonen.iki.fi/test/moz/innerHTML-no-doctype.xhtml
http://hsivonen.iki.fi/test/moz/innerHTML-xhtml1-doctype.xhtml

Opera supports entities in innerHTML setter regardless of the doctype of the document. Gecko and WebKit don't support entities in the innerHTML setter.

Frankly, I'm a bit annoyed to see Opera supporting entities here, because now we don't have a stable state and Gecko and WebKit may end up putting engineering cycles into tweaking stuff that's marginal on the Web scale, since it doesn't work in IE at all.

Why does Opera support entities here? It seems logical (as far as the XML spec goes) not to support entities here.

Authors who use application/xhtml+xml are explicitly asking for XML. If they don't want XML the way it is, they shouldn't ask for it. I think we shouldn't paper over the flaws of XML one by one. Instead, I think we should take XML 1.0 as it is until the time is ripe and XML Core does XML5 all at once (with all the MathML entities predefined, the tokenizer state machine borrowed from HTML5, non-Draconian tree builder, no DTDs, etc.).
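
To illustrate the innerHTML point above, here is a minimal Python sketch of what a plain XML parser does with a fragment when no DTD is in play. It is only an analogy: Python's expat-based ElementTree parser stands in for a browser's XML parser, which is an assumption and not a description of what Gecko, WebKit or Opera actually run. The helper names parse_fragment and entity_works are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def parse_fragment(markup: str) -> ET.Element:
    """Parse markup as the content of a wrapper element, with no DOCTYPE.

    Hypothetical helper: because there is no DTD at all, only the five
    built-in entities and numeric character references are available.
    """
    wrapped = '<div xmlns="%s">%s</div>' % (XHTML_NS, markup)
    return ET.fromstring(wrapped)


def entity_works(markup: str) -> bool:
    """Return True if the fragment parses, False on a well-formedness error."""
    try:
        parse_fragment(markup)
    except ET.ParseError:
        # An undeclared entity reference is fatal in a DTDless document.
        return False
    return True


if __name__ == "__main__":
    for sample in ("a &amp; b", "a&#160;b", "a&nbsp;b"):
        print(repr(sample), "parses" if entity_works(sample) else "is rejected")
```

In this sketch the built-in &amp; and the numeric reference &#160; parse, while &nbsp; is rejected, which matches the DTDless class described earlier in this message.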
> For obvious performance reasons, it is impractical to ask UAs to
> utilize validating XML parsers, so this guarantee may need to be
> specified in a way that doesn't require full DTD support.

There are three classes of XML processors:
1) Non-validating XML processors that don't process the external DTD subset.
2) Non-validating XML processors that process the external DTD subset.
3) Validating XML processors that process the external DTD subset.

It's not a dichotomy between #1 and #3.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Monday, 2 November 2009 12:53:02 UTC