- From: David Carlisle <davidc@nag.co.uk>
- Date: Thu, 17 Jan 2013 21:02:55 +0000
- To: whatwg@lists.whatwg.org
On 17/01/2013 18:58, Ian Hickson wrote:
> On Thu, 17 Jan 2013, David Carlisle wrote:
>>
>> By adding
>>
>> "-//W3C//ENTITIES HTML MathML Set//EN//XML"
>>
>> To the list in
>>
>> 13.2 Parsing XHTML documents
>>
>> Of Identifiers that are recognised when parsing XHTML syntax documents.
>
> What problem does this solve?

We tried to spell out various problems in the referenced document at
http://www.w3.org/2003/entities/2007doc/xhtmlpubid.html

But basically it solves the problem that the existing list leads to a situation where data corruption and user confusion are both inevitable, as the only way to enable entities to be loaded into an XHTML agent is to reference a DTD that defines a different, incompatible set of entities.

>
>
>> The current list gives no way to specify the identifier of a compatible
>> set of entity definitions so makes it highly likely that documents will
>> be interpreted differently by an XHTML user agent and a standard XML
>> toolchain.
>
> I do not understand what this means. Can you give an example?

Yes. If, for example, you use the entity reference for ⟬, then in an XHTML user agent, if you specify one of the blessed DTD identifiers, the HTML entity set will be loaded and the entity will expand to U+27EC (MATHEMATICAL LEFT WHITE TORTOISE SHELL BRACKET), as intended. However, this character was added in Unicode 5.1, years after MathML 2 and XHTML 1, specifically for this use, so the definitions in the legacy DTDs are different. Currently you have to specify either the XHTML 1 DTD or the MathML 2 DTD. If you use the former, then any (normally configured) XML toolchain will load the XHTML 1 DTD, in which the entity is not defined, and the entire document is rejected with a fatal error. If you specify the latter, then the MathML 2 DTD will be loaded and the entity will expand to the Asian punctuation character U+3018 (LEFT WHITE TORTOISE SHELL BRACKET).
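A minimal Python sketch of the three outcomes above, offered as an illustration under stated assumptions: each DTD's definition is simulated with an internal DTD subset rather than loaded from the real external DTD, and `loang` is used only as an illustrative entity name for ⟬.

```python
import xml.etree.ElementTree as ET

# Document body; the entity reference is resolved by the XML parser.
BODY = "<math>&loang;</math>"

# Simulated entity definitions. These internal subsets stand in for the
# real external DTDs (an assumption), and the entity name is illustrative.
DEFINITIONS = {
    "HTML entity set (browser)": '<!ENTITY loang "&#x27EC;">',
    "MathML 2 DTD": '<!ENTITY loang "&#x3018;">',
    "XHTML 1 DTD": "",  # the entity is not defined at all
}


def expand(subset: str) -> str:
    """Parse BODY with the given internal subset.

    Returns the code point the entity expanded to, or the parser's
    fatal error if the entity is undefined.
    """
    doc = f"<!DOCTYPE math [{subset}]>{BODY}"
    try:
        text = ET.fromstring(doc).text
        return f"U+{ord(text):04X}"
    except ET.ParseError as err:
        return f"fatal error: {err}"


for name, subset in DEFINITIONS.items():
    print(f"{name}: {expand(subset)}")
```

The same markup yields U+27EC, U+3018, or an undefined-entity fatal error depending only on which definitions the processor sees.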
The sole purpose of the requested change is to allow the document to reference a set of entity definitions that matches the definitions that will be used in the browser.

>
>
> Fundamentally, I'd rather be removing these magic strings than adding
> more. If there's a compatibility need, then we should add it, but if the
> browsers don't already support the string, then there's no compat need
> that I can see.

It _used_ to be possible to reference a usable DTD. The MathML 2 DTD worked in Firefox (every version up to 3), Internet Explorer, and every other browser of the period that I was aware of. It was your first drafts of HTML(5) that introduced this bug, by restricting the doctype handling in a way that excluded any DTD that defined the correct set of entities. Browsers have since converged on that erroneous list.

There is something very broken with the process if it is impossible to fix bugs in the spec once some implementations implement the broken spec text. There is more to compatibility than compatibility between the browsers. For XHTML there needs to be compatibility between browsers and XML tools (otherwise why use XML at all? I know you would rather people didn't, but so long as the spec allows them to, it should not mandate a situation that makes document corruption so likely).

David
Received on Thursday, 17 January 2013 21:05:16 UTC