- From: Rick Jelliffe <ricko@topologi.com>
- Date: Mon, 11 Feb 2002 20:38:34 -0500 (EST)
- To: <www-tag@w3.org>
The current XML 1.1 draft and the discussions of the XML Core WG alter XML 1.0 to:

1) Allow NUL and control characters: such XML would not be compatible with MIME text/* and typical C APIs.
2) Allow almost any character as a name character: such XML names would not be compatible with languages that follow Unicode's guidelines on characters that are suitable for markup.
3) Allow characters at which line-breaking can occur in names: a document innocently made with these could be corrupted merely by opening the data in a text editor which follows the Unicode properties for line-wrapping. (Of course, auto-wrapping editors can already add extra whitespace when a document is opened, but that does not take the document from WF to non-WF.)
4) Reduce the ability of XML to catch encoding errors, in the particular case where the real encoding and the nominal encoding are mutually feasible, non-ASCII markup has been used, and the names are being used by some generic processing system (e.g. names used in IDs or IDREFs in any case, and names for elements, attributes and enumerations used by non-validating systems).

That the Core WG feels free to ignore constraints coming in from IETF, Unicode, and the existing technological base shows a serious problem either with the XML 1.1 Requirements document, which does not mention the outside world, or with the Core WG's understanding of how an interchange technology such as XML operates: that maximum compatibility is essential.

The XML Core WG has decided not to discuss any individual character problems. In doing this they are refusing to look at any evidence; instead they wish to treat XML as something that can be considered in isolation. However, almost no part of XML can be justified in isolation.
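As a rough illustration of point 1 above, the sketch below shows how one existing XML 1.0 parser treats a control character today. It uses Python's standard-library ElementTree parser purely as an example; the helper name is_well_formed is hypothetical, and other parsers may report the error differently.

```python
import xml.etree.ElementTree as ET


def is_well_formed(doc: str) -> bool:
    """Hypothetical helper: True if the bundled XML 1.0 parser accepts doc."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Ordinary text content is accepted.
print(is_well_formed("<a>ok</a>"))

# A literal U+0001 control character in content is rejected.
print(is_well_formed("<a>\x01</a>"))

# The same character written as a character reference is rejected too.
print(is_well_formed("<a>&#1;</a>"))
```

Software that relies on this rejection today would see such documents accepted if the character rules were relaxed.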
The idea that XML should be treated as merely a serialization format for any Unicode database, which is where the Core WG is surely heading, can only lead (and in fact is leading, in the XML 1.1 draft) to the removal of any features of XML needed to support editing XML as text, human usability, or early catching of encoding errors.

I believe this is fundamentally an organization/architecture problem. The XML Core WG may indeed feel that architecture issues are now the TAG's domain, and that they are somehow bound to ignore pragmatic issues.

I call on the TAG to give guidance to the XML Core WG, and to ask the Core WG to add the following to any list of design principles they have for XML:

1) XML is a text format.
2) Any XML document should be able to be sent as text/xml.
3) Any WF XML document should be able to be opened in a text editor for the encoding of that document and not become non-WF merely because the text editor has followed Unicode guidelines for its line-wrapping.
4) Binary data should be sent using Hex or Base64 encoding as provided by XML Schemas.
5) In XML documents, control characters have their direct significance, and are not "data". For example, a flow control character in an XML stream is an inband signal and does not form part of the text of the document.
6) Support for as-strong-as-possible detection of encoding errors is critical for the current state of technology.

In this regard, I note that the introduction of the Euro means that for Western European documents it is no longer workable merely to work in CP1252 (Windows "ANSI") and then relabel the document "ISO8859-1", as can be done now if only the 8859-1 characters are used. Some transcoding libraries will correctly detect that 0x80 (Euro in CP1252) is not in ISO 8859-1, but many will not. So the Core WG's decision to remove so many checks has particularly bad timing.

I believe this is a matter that should be dealt with sooner rather than later.
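The CP1252 relabelling problem described above can be sketched in a few lines. This is only an illustration using Python's built-in codecs; the helper name find_c1_controls and the sample text are assumptions made for the example, not part of any XML tool.

```python
def find_c1_controls(raw: bytes) -> list:
    """Hypothetical check: offsets of bytes that become C1 controls
    (U+0080 to U+009F) when the data is read as ISO 8859-1."""
    # Decoding as ISO 8859-1 never fails: every byte maps to a code point.
    text = raw.decode("iso-8859-1")
    return [i for i, ch in enumerate(text) if 0x80 <= ord(ch) <= 0x9F]


# Text written in CP1252, where the Euro sign is the single byte 0x80.
raw = "price: 5 €".encode("cp1252")

# Read with the real encoding, the Euro sign comes back intact.
print(raw.decode("cp1252"))

# Read with the relabelled encoding, the decode silently succeeds and
# yields the control character U+0080 where the Euro sign was.
print(repr(raw.decode("iso-8859-1")))

# An explicit check is needed to notice the mislabelling.
print(find_c1_controls(raw))
```

The silent success of the second decode is the case where a transcoder does not detect the error; only a rule that treats such characters as suspect brings it to light.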
If W3C is dumping XML as text, then the user community should be told, and have the rationale presented on a character-by-character basis for why new problems are not being introduced. If W3C is not dumping XML as text, then the Core WG needs to be told so, so that it can approach the NEL and Unicode 3.1 issue accordingly.

Furthermore, it is clear from private communication that members of the Core WG believe that XML 1.1 is not a temporary fix to particular problems, but a permanent solution which any XML 2.0 would also adopt. This greatly increases the significance of XML 1.1, from being a hack to overcome some temporary problems to being an important architectural decision which may favour some commercial members of W3C more than the interests of the general public.

Cheers
Rick Jelliffe
Received on Tuesday, 12 February 2002 08:30:17 UTC