Production 78 / Process failure in XML 1.1

In two different messages, Elliotte Rusty Harold writes:

> I noticed the following production in the latest proposed
> recommendation of XML 1.1:  
> 
> [78]     extParsedEnt     ::=     TextDecl? content - Char* RestrictedChar Char*
> 
> In XML 1.0 the production is
> 
> [78]     extParsedEnt     ::=     TextDecl? content
> 
> The rationale for this change is unclear to me. Could anyone explain
> this?

and 

> In direct contradiction to the W3C's advertised policies, the
> recently released XML 1.1 Proposed Recommendation makes a very
> substantive change since the candidate recommendation. The candidate
> recommendation used the following production for Char:
> 
> [2]     Char    ::=    #x9 | #xA | #xD | [#x20-#x7E] | #x85 | [#xA0-#xD7FF]
>                       | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
> 
> Note that control characters such as bell and vertical tab were not
> allowed. They could be inserted using character references as       
> indicated in section 4.1.
> 
> In the new proposed recommendation, however, this is no longer true.
> C0 controls can now be directly included in XML documents. Its Char
> production is:
> 
> [2]     Char     ::=     [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
>         /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
> [2a]    RestrictedChar    ::=    [#x1-#x8] | [#xB-#xC] | [#xE-#x1F] |
> [#x7F-#x84] | [#x86-#x9F]
> 
> Despite production 2a, I can find no text in the spec that restricts
> these characters to being included only via character references and
> prevents them from being included literally.

I'm responding jointly because the answer to one message is (all
unawares) in the other message.  The whole point of the modification to
production 78 is precisely to forbid text containing a literal
RestrictedChar from appearing in an extParsedEnt.  It seems clear to me
that the same modification should have been made to production 1, to
forbid RestrictedChars from appearing in document entities; I'm not sure
at this point why that didn't happen.
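
To make the effect of the new production 78 concrete, here is a rough
sketch in Python of the check it implies: the raw text of an entity may
not contain a RestrictedChar literally, while a character reference to
one is just ordinary characters as far as this check goes.  The names
RESTRICTED_CHAR and has_literal_restricted_char are mine, not the
spec's, and the ranges are those of production 2a in the published
XML 1.1 Recommendation, whose last range ends at #x9F.

```python
import re

# RestrictedChar, production [2a], with ranges as in the published
# XML 1.1 Recommendation (assumption: the last range ends at #x9F).
RESTRICTED_CHAR = re.compile(r"[\x01-\x08\x0B\x0C\x0E-\x1F\x7F-\x84\x86-\x9F]")


def has_literal_restricted_char(entity_text: str) -> bool:
    """Return True if the raw entity text contains a RestrictedChar literally.

    This only models the "- Char* RestrictedChar Char*" exclusion; it does
    not check TextDecl or content, so it is not a well-formedness test.
    """
    return RESTRICTED_CHAR.search(entity_text) is not None


literal_escape = "before\x1bafter"      # ESC typed directly into the entity
referenced_escape = "before&#x1B;after"  # ESC written as a character reference
```

Under this sketch, has_literal_restricted_char(literal_escape) is true and
has_literal_restricted_char(referenced_escape) is false, which is the
distinction the modified production is meant to draw.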

The Core WG believed that the current approach, where the prohibition of
RestrictedChars appearing directly is enforced globally, was more
likely to get corner cases correct than the CR version, which excluded
them from Char (as in 1.0) but then allowed them as a special case in
character references.  In particular, since character references in
entity values are expanded greedily (that is, as soon as the declaration
is parsed), a construction like

	<!ENTITY esc "&#x1B;">

is illegal in the CR version, but legal in the PR version.
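
The corner case can be sketched by modelling only the two Char
productions quoted above and expanding the reference the way an entity
value does.  This is an illustration under that narrow assumption, not a
parser; is_char_cr, is_char_pr and expand_char_ref are hypothetical
helper names.

```python
def is_char_cr(cp: int) -> bool:
    """Char as quoted from the Candidate Recommendation."""
    return (
        cp in (0x9, 0xA, 0xD, 0x85)
        or 0x20 <= cp <= 0x7E
        or 0xA0 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_char_pr(cp: int) -> bool:
    """Char as quoted from the Proposed Recommendation."""
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def expand_char_ref(ref: str) -> str:
    """Expand a hexadecimal character reference such as '&#x1B;'."""
    if not (ref.startswith("&#x") and ref.endswith(";")):
        raise ValueError("not a hexadecimal character reference: %r" % ref)
    return chr(int(ref[3:-1], 16))


# The entity value "&#x1B;" becomes a literal ESC in the replacement text.
replacement_text = expand_char_ref("&#x1B;")
legal_in_cr = all(is_char_cr(ord(c)) for c in replacement_text)
legal_in_pr = all(is_char_pr(ord(c)) for c in replacement_text)
```

Here legal_in_cr comes out false and legal_in_pr true: once the reference
has been expanded, the replacement text holds a character that the CR's
Char does not match but the PR's Char does.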

Therefore, this change is not substantive but editorial; it is a different
formulation that better captures the intent of allowing ISO controls in
character reference form.

Disclaimer: this is my personal response as editor, not an official
Core WG response.

-- 
Evolutionary psychology is the theory           John Cowan
that men are nothing but horn-dogs,             http://www.ccil.org/~cowan
and that women only want them for their money.  http://www.reutershealth.com
        --Susan McCarthy (adapted)              jcowan@reutershealth.com

Received on Sunday, 9 November 2003 15:47:18 UTC