XML 1.1: Non-ascii chars in XML/text declaration from Bjoern Hoehrmann on 2004-10-17 (xml-editor@w3.org from October to December 2004)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Sun, 17 Oct 2004 11:41:28 +0200
To: xml-editor@w3.org
Message-ID: <41933622.186625473@smtp.bjoern.hoehrmann.de>

Dear XML Core Working Group,

  I am unable to find your public response formally addressing an
issue that David Carlisle raised on the XML 1.1 Proposed Recommendation.
His comment is publicly archived at

  http://lists.w3.org/Archives/Public/xml-editor/2003OctDec/0048.html

You have apparently rejected it, since the Recommendation contains the
same apparently contradictory text. The relevant text in the
Recommendation is:

[...]
  To simplify the tasks of applications, the XML processor MUST behave
  as if it normalized all line breaks in external parsed entities
  (including the document entity) on input, before parsing, by
  translating all of the following to a single #xA character:

  [...]

  The characters #x85 and #x2028 cannot be reliably recognized and
  translated until an entity's encoding declaration (if present) has
  been read. Therefore, it is a fatal error to use them within the XML
  declaration or text declaration. 
[...]

I do not understand this either. Please point me to your response to
David which will hopefully answer the following questions:

  * Why is it (in theory) not possible to recognize these characters
    reliably, or why can they (in theory) only be recognized with
    less reliability than any other character such as U+0020?

  * How can a processor detect this error if it is not possible to
    recognize the offending characters reliably?

  * How can a processor detect this error if, due to line break
    normalization, these characters can no longer be present when
    the XML declaration is parsed?
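
To make the last two questions concrete, here is a minimal sketch in
Python. It assumes the line-break list from section 2.11 of the XML 1.1
Recommendation (the excerpt above elides that list), and the function
names are hypothetical, chosen only for illustration. It is not meant to
describe how any real processor is implemented.

```python
import re

# Line-break sequences that XML 1.1 section 2.11 maps to a single #xA.
# Assumption: this list is taken from the Recommendation itself; the
# excerpt quoted in this message elides it. Two-character sequences come
# first so that they are consumed as a unit.
_LINE_BREAKS = re.compile("\r\n|\r\u0085|\u0085|\u2028|\r")


def normalize_line_breaks(text):
    """Translate every XML 1.1 line-break sequence to a single #xA."""
    return _LINE_BREAKS.sub("\n", text)


def decode_byte_0x85(encoding):
    """Decode the single byte 0x85 under the given (hypothetical) guess."""
    return b"\x85".decode(encoding)


# A declaration that contains U+0085 before the closing "?>".
declaration = '<?xml version="1.1"\u0085?>'
normalized = normalize_line_breaks(declaration)

# After normalization the offending character is gone, so a parser that
# only ever sees normalized input has nothing left to report.
print("\u0085" in declaration)   # True
print("\u0085" in normalized)    # False

# The same byte is a different character depending on the encoding, which
# is presumably what "cannot be reliably recognized" refers to.
print(decode_byte_0x85("latin-1") == "\u0085")  # True: NEL
print(decode_byte_0x85("cp1252") == "\u2026")   # True: horizontal ellipsis
```

If the sketch is a fair reading of the text, the error could only be
detected on the input as it was before normalization, and only once the
encoding is known, which seems to be at odds with the stated rationale.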

Regards,

Received on Sunday, 17 October 2004 09:42:17 UTC