- From: Rick Jelliffe <ricko@topologi.com>
- Date: Tue, 14 Jan 2003 00:55:29 +1100
- To: <www-xml-blueberry-comments@w3.org>
The 1.1 spec has an ambiguity: it is not clear whether the various productions for characters apply to the characters that can appear in the infoset or to the characters that can appear in the text of a document. Until 1.1, I don't think this was a problem, because no such distinction existed. In 1.1, the control characters are a case where the distinction matters.

I believe the correct way out is:

1) To clarify that the productions relate to the characters that can appear in the infoset.

2) To make control character rejection a part of input conditioning. This may be best done by renaming s2.11 "End-of-Line and Control Code Handling" and adding the following text:

   "It is a well-formedness error for control characters (characters in the range 0x00 to 0x1F and 0x7F to 0x9F) to appear in an external parsed entity, with the exception of the whitespace characters in the previous paragraph. Control characters, except 0x00, must be marked up using numeric character references."

3) To revise production 2 to be

   [2] Char ::= [#x01-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

I believe the WG could fairly claim that this is a change in expression, not in intent, and so perhaps does not require a new CR?

It would also prevent the horrible problem of the current formulation: control characters cannot appear in entities without causing a WF error, and would have to be escaped (and re-escaped for each level of depth of the entity reference). It is clearly bogus to expect the expression of a character to be dependent on its use in references: &amp;#01;! Furthermore, it is against ISO 8879. Furthermore, it is unusable and confusing for people.

Cheers
Rick Jelliffe

P.S.
Another approach, which may fit in with some implementations better, would be to revise production 2 to be

   [2] Char ::= [#x01-#x10FFFF]

then add a disconnected production for the allowed values of an external parsed entity

   [x] EPE_Char ::= #x9 | #xA | #xD | [#x20-#x7E] | #x85 | [#xA0-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

and then add a disconnected production for the allowed values for the value of an NCR

   [x] NCR_Char ::= [#x01-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

In other words, make it clear that you don't need to do multiple checks: just one when you bring a character in, and one when you dereference an NCR.
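
To illustrate the two-check idea, here is a rough Python sketch. It assumes the EPE_Char and NCR_Char ranges exactly as proposed in this P.S.; the function names (is_epe_char, is_ncr_char, condition_entity_text, expand_ncrs) are hypothetical, and the regular expression for numeric character references is a simplification rather than a full XML reference parser.

```python
import re

# Ranges taken from the proposed EPE_Char production (raw entity text).
EPE_RANGES = [(0x9, 0x9), (0xA, 0xA), (0xD, 0xD), (0x20, 0x7E), (0x85, 0x85),
              (0xA0, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]

# Ranges taken from the proposed NCR_Char production (values of an NCR).
NCR_RANGES = [(0x01, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]

# Simplified pattern for decimal and hexadecimal NCRs (an assumption).
NCR_PATTERN = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def _in_ranges(code_point, ranges):
    return any(low <= code_point <= high for low, high in ranges)


def is_epe_char(code_point):
    """True if the code point may appear literally in an external parsed entity."""
    return _in_ranges(code_point, EPE_RANGES)


def is_ncr_char(code_point):
    """True if the code point may be the value of a numeric character reference."""
    return _in_ranges(code_point, NCR_RANGES)


def condition_entity_text(text):
    """Check 1: reject disallowed characters once, as the entity text is read."""
    for offset, char in enumerate(text):
        if not is_epe_char(ord(char)):
            raise ValueError(
                f"U+{ord(char):04X} at offset {offset} is not allowed literally")
    return text


def expand_ncrs(text):
    """Check 2: validate each NCR value once, at the point it is dereferenced."""
    def replace(match):
        hex_digits, dec_digits = match.group(1), match.group(2)
        code_point = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
        if not is_ncr_char(code_point):
            raise ValueError(f"NCR value {code_point:#x} is not allowed")
        return chr(code_point)
    return NCR_PATTERN.sub(replace, text)
```

Under these assumptions, a literal U+0001 in the entity text fails the first check, while the same character written as &#x01; passes the first check and is accepted by the second. No further check is needed on the resulting infoset character.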
Received on Monday, 13 January 2003 08:55:21 UTC