New Issue request for TAG: Where does XML fit into the Web Architecture

The current XML 1.1 draft, and discussions by the XML Core WG, alter XML 1.0 to:

1) Allow NUL and control characters: such XML would not be compatible
with MIME text/* and typical C APIs.

2) Allow almost any characters as name characters: such XML names would not be
compatible with languages that follow Unicode's guidelines on characters that
are suitable for markup.

3) Allow characters at which line-breaking can occur in names: a document innocently
made with these could be corrupted merely by opening the data in a text editor which
follows the Unicode properties for line-wrapping.  (Of course, auto-wrapping
editors can already add extra whitespace when a document is opened, but that does
not take a document from WF to non-WF.)

4) Reduce the ability of XML to catch encoding errors. This affects the particular
case where the real encoding and the nominal encoding are mutually feasible,
non-ASCII markup has been used, and the names are being used by some generic
processing system (e.g. names used in IDs or IDREFs in any case, and names for
elements, attributes and enumerations used by non-validating systems).
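
As a rough illustration of point 1 (a sketch only, not taken from the draft): if
NUL were permitted in content, any API that treats the document as a C string
would silently stop at the first NUL. The sample document below is hypothetical,
and Python's ctypes is used merely to mimic how a C string is read.

```python
import ctypes

# Hypothetical document: element content with an embedded NUL byte.
doc = b"<a>before\x00after</a>"

# create_string_buffer copies every byte; .raw exposes the whole buffer,
# while .value reads it the way a C string function would: up to the first NUL.
buf = ctypes.create_string_buffer(doc)
seen_by_c = buf.value

print(len(doc), len(seen_by_c))
print(seen_by_c)
```

Everything after the NUL, including the end tag, is invisible to such an API.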

That the Core WG feel free to ignore constraints coming in from IETF, Unicode,
and the existing technological base shows a serious problem, either with the
XML 1.1 Requirements document, which does not mention the outside world, or with
the Core WG's understanding of how an interchange technology such as XML
operates: maximum compatibility is essential.

The XML Core WG has decided not to discuss any individual character problems. 
In doing this they are refusing to look at any evidence; instead they wish to treat
XML as something that can be treated in isolation. However, almost no part of
XML can be justified in isolation. The idea that XML should be treated as merely
a serialization format for any Unicode database, which is where the Core WG is surely
heading, can only lead (and in fact is leading, in the XML 1.1 draft) to the removal
of any features of XML needed to support editing XML as text, human usability,
or early catching of encoding errors.

I believe this is fundamentally an organization/architecture problem. The XML Core WG
may indeed feel that architecture issues are now the TAG's domain, and that they
are somehow bound to ignore pragmatic issues.

I call on the TAG to give guidance to the XML Core WG, and to ask the Core WG
to add to any list of design principles they have for XML the following:

1) XML is a text format.

2) Any XML document should be able to be sent as text/xml.

3) Any WF XML document should be able to be opened in a text editor for the encoding
of that document and not become non-WF merely because the text editor has followed 
Unicode guidelines for its line-wrapping.

4) Binary data should be sent using hex or Base64 encoding as provided by XML Schemas.

5) In XML documents, control characters have their direct significance, and are not
"data".  For example, the presence of a flow control character in an XML stream is
an inband signal and does not form part of the text of the document.

6) Support for the strongest possible detection of encoding errors is critical
given the current state of technology.  In this regard, I note that the introduction of
the Euro means that for Western European documents it is no longer workable
merely to work in CP1252 (Windows "ANSI") and then relabel the document
"ISO8859-1", as can be done now if only the 8859-1 characters are used.
Some transcoding libraries will correctly detect that 0x80 (Euro in CP1252)
is not in ISO 8859-1, but many will not. So the Core WG's decision to
remove so many checks is particularly badly timed.
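
To make the Euro case in point 6 concrete, here is a minimal sketch using
Python's standard codecs, which happen to be an example of a library that does
not complain. The sample document and the helper name suspicious_c1 are
hypothetical, and treating any C1 control as a sign of mislabelling is an
assumption of this sketch, not a rule from any specification.

```python
# Hypothetical document authored in CP1252; "\u20ac" is the Euro sign,
# which CP1252 encodes as the single byte 0x80.
doc = "<price>\u20ac10</price>".encode("cp1252")

as_cp1252 = doc.decode("cp1252")      # the real encoding
as_latin1 = doc.decode("iso-8859-1")  # the (wrong) nominal encoding; no error is raised

def suspicious_c1(text):
    """Return the code points in text that fall in the C1 range U+0080..U+009F."""
    return [ord(ch) for ch in text if 0x80 <= ord(ch) <= 0x9F]

print(suspicious_c1(as_cp1252))
print(suspicious_c1(as_latin1))
```

Decoding under the wrong label succeeds silently and turns the Euro into the
control character U+0080, so the error can only be caught by a later check on
which characters are allowed.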

I believe this is a matter that should be dealt with sooner rather than later.
If the W3C is dumping XML as text, then the user community should be told,
and have the rationale presented on a character-by-character basis why new problems
are not being introduced.   If the W3C is not dumping XML as text, then the
Core WG needs to be told so, so that it can approach the NEL and
Unicode 3.1 issues accordingly.

Furthermore, it is clear from private communication that members of the Core
WG believe that XML 1.1 is not a temporary fix to particular problems,
but a permanent solution which any XML 2.0 would also adopt.  This greatly
increases the significance of XML 1.1, from being a hack to overcome
some temporary problems to being an important architectural decision
which may favour some commercial members of W3C more than the
interests of the general public.

Cheers
Rick Jelliffe

Received on Tuesday, 12 February 2002 08:30:17 UTC