- From: John Cowan <cowan@mercury.ccil.org>
- Date: Sun, 9 Nov 2003 15:47:07 -0500
- To: xml-dev@lists.xml.org
- Cc: elharo@metalab.unc.edu, xml-editor@w3.org
In two different messages, Elliotte Rusty Harold writes:

> I noticed the following production in the latest proposed
> recommendation of XML 1.1:
>
> [78] extParsedEnt ::= TextDecl? content - Char*
> RestrictedChar Char*
>
> In XML 1.0 the production is
>
> [78] extParsedEnt ::= TextDecl? content
>
> The rationale for this change is unclear to me. Could anyone explain
> this?

and

> In direct contradiction to the W3C's advertised policies, the
> recently released XML 1.1 Proposed Recommendation makes a very
> substantive change since the candidate recommendation. The candidate
> recommendation used the following production for char:
>
> [2] Char ::= #x9 | #xA | #xD | [#x20-#x7E] | #x85 | [#xA0-#xD7FF]
> | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
>
> Note that control characters such as bell and vertical tab were not
> allowed. They could be inserted using character references as
> indicated in section 4.1.
>
> In the new proposed recommendation, however, this is no longer true.
> C0 controls can now be directly included in XML documents. Its Char
> production is:
>
> [2] Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] |
> [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate
> blocks, FFFE, and FFFF. */
> [2a] RestrictedChar ::= [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] |
> [#x7F-#x84] | [#x86-#xBF]

> Despite production 2a, I can find no text in the spec that restricts
> these characters to being included only via character references and
> prevents them from being included literally.

I'm responding jointly because the answer to one message is (all
unawares) in the other message. The whole point of the modification to
production 78 is precisely to forbid a text containing a RestrictedChar
from appearing in an extParsedEntity. It seems clear to me that the same
modification should have been made to production 1, to forbid
RestrictedChars from appearing in document entities; I'm not sure at
this point why that didn't happen.
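
As a rough illustration only (this is not text from the spec), the effect of the subtracted term in production 78 can be sketched in Python: an entity's text is ruled out if a RestrictedChar appears in it literally, while the same character written as a character reference passes. The function name and sample strings are hypothetical, and the ranges are copied from production [2a] as quoted above.

```python
import re

# RestrictedChar ranges as quoted in production [2a] above.
# Assumption: these are taken from the quoted PR text; the published
# XML 1.1 Recommendation gives the last range as [#x86-#x9F].
RESTRICTED_CHAR = re.compile(r"[\x01-\x08\x0B\x0C\x0E-\x1F\x7F-\x84\x86-\xBF]")


def has_literal_restricted_char(text: str) -> bool:
    """Return True if text contains a RestrictedChar as a literal character.

    This mirrors the "- Char* RestrictedChar Char*" subtraction: any text
    with at least one literal RestrictedChar is excluded.
    """
    return RESTRICTED_CHAR.search(text) is not None


literal = "<a>\x1b</a>"        # ESC (U+001B) appears directly in the text
referenced = "<a>&#x1B;</a>"   # ESC appears only as a character reference

print(has_literal_restricted_char(literal))
print(has_literal_restricted_char(referenced))
```

The check runs on the raw entity text before any reference expansion, which is why the second string is not flagged.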
The Core WG believed that the current approach, where the prohibition
of RestrictedChars appearing directly is enforced very globally, was
more likely to get corner cases correct than the CR version, which
excluded them from Char (like 1.0) but then allowed them as a special
case in character references. In particular, since character references
in entity values are expanded greedily, a construction like
<!ENTITY esc "&#x1B;"> is illegal in the CR version, but legal in the
PR version.

Disclaimer: this is my personal response as editor, not an official
Core WG response.

Therefore, this change is not substantive but editorial; it is a
different formulation that better captures the intent of allowing ISO
controls in character reference form.

-- 
Evolutionary psychology is the theory           John Cowan
that men are nothing but horn-dogs,             http://www.ccil.org/~cowan
and that women only want them for their money.  http://www.reutershealth.com
        --Susan McCarthy (adapted)              jcowan@reutershealth.com
Received on Sunday, 9 November 2003 15:47:18 UTC