Re: [xml-dev] version numbers and infosets from Elliotte Rusty Harold on 2002-07-26 (www-xml-blueberry-comments@w3.org from July 2002)

From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
Date: Fri, 26 Jul 2002 07:47:40 -0400
To: <xml-dev@lists.xml.org>
Cc: <www-xml-blueberry-comments@w3.org>
Message-Id: <p04330101b966e3597e1c@[192.168.254.4]>

At 3:03 PM +1000 7/26/02, Rick Jelliffe wrote:

>In http://www.w3.org/TR/newline the three use cases are:
>

Looking at this document, I note that its author(s) had some serious 
misconceptions about XML. For example, they state:

Well-formed but invalid - because the [NEL] character appears in 
element content:
<a>[NEL]<b/>[NEL]</a>
where the corresponding DTD contains
<!ELEMENT b EMPTY> <!ELEMENT a (b)>

In fact, however, the second example is invalid with or without 
allowing NEL. Valid elements declared empty may not contain white 
space.

A similar misconception is seen later when the authors state:

\n printf output: OS/390 C or Java program	[NEL]

This may be true in C. It is not true in Java. In Java \n always 
results in a linefeed. If it's producing a NEL on OS/390, then the 
OS/390 JVM is not conformant to the Java spec either.

>>  Using native system string functions, such as atoi and atof, to 
>>convert XML strings, documents, or fragments, to other data types

This really goes to the heart of the problem: atoi and atof are ASCII 
functions that are simply not suitable for Unicode-based XML 
regardless of what we do with NEL. The atof() signature is:

double atof(const char *nptr);

It's been a while since I've written C, but my recollection is that 
the char type is always one byte wide. Processing XML in C requires 
using different kinds of wide chars and wide string types. You can't 
use native system string functions to work with XML data because XML 
data is Unicode, not ASCII. For instance, in the Apache Xerces-C DOM 
"String is represented by 'XMLCh*' which is a pointer to unsigned 16 
bit type holding utf-16 values, null terminated." Other schemes are 
possible. However, you simply cannot use C's traditional 1-byte 
strings and characters and their associated functions. This is not an 
OS/390 issue. It is a C issue. The same is true on Windows, Mac OS, 
Unix, and every other platform that uses C.
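
To make the point concrete, here is a minimal sketch of what a C 
program has to do before it can hand a UTF-16 string to the narrow 
string functions. The helper names (utf16_ascii_to_narrow, 
parse_utf16_double) are hypothetical, not part of Xerces-C or any 
other library, and the sketch assumes an ASCII-compatible execution 
character set, which is exactly what an EBCDIC platform does not have:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: copy a NUL-terminated UTF-16 string into a
 * narrow char buffer, accepting only ASCII code units (<= 0x7F).
 * Returns 0 on success, -1 if the input contains a non-ASCII code
 * unit or does not fit in the buffer.
 * Assumption: the execution character set is ASCII-compatible. */
static int utf16_ascii_to_narrow(const uint16_t *src, char *dst,
                                 size_t dst_size)
{
    size_t i;

    if (dst_size == 0)
        return -1;
    for (i = 0; src[i] != 0; i++) {
        if (i + 1 >= dst_size || src[i] > 0x7F)
            return -1;
        dst[i] = (char)src[i];
    }
    dst[i] = '\0';
    return 0;
}

/* Hypothetical helper: parse a double from a UTF-16 string.
 * strtod, like atof, only sees bytes, so the text must be narrowed
 * first. Returns 0 on success, -1 on any conversion failure. */
static int parse_utf16_double(const uint16_t *src, double *out)
{
    char buf[64];
    char *end;

    if (utf16_ascii_to_narrow(src, buf, sizeof buf) != 0)
        return -1;
    *out = strtod(buf, &end);
    /* Succeed only if at least one character was consumed and
     * nothing was left over. */
    return (end != buf && *end == '\0') ? 0 : -1;
}
```

The sketch uses strtod rather than atof only because atof gives no way 
to detect a failed conversion. Either way the narrowing step is needed 
with or without NEL, and it rejects anything outside ASCII, such as a 
non-ASCII digit like U+0663.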

All of the other functions we're talking about are similar. Even with 
NEL, you still shouldn't be using these to process XML. OS/390 needs 
to get some modern libraries. XML does not need to change. If 
mainframe programmers think that NEL is the only problem they have, 
they are sorely mistaken. IBM is asking us to break XML for many 
thousands of users for something that won't even fix their own 
problems. Short of moving XML to ASCII (a solution we all rightly 
abhor), the only way to solve the OS/390 problem is to fix OS/390. 
XML *cannot* be fixed enough to make XML usable on OS/390 in the way 
IBM wants.
-- 

+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|          XML in a  Nutshell, 2nd Edition (O'Reilly, 2002)          |
|              http://www.cafeconleche.org/books/xian2/              |
|  http://www.amazon.com/exec/obidos/ISBN%3D0596002920/cafeaulaitA/  |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML News: http://www.cafeconleche.org/    |
+----------------------------------+---------------------------------+

Received on Friday, 26 July 2002 08:15:40 UTC