- From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
- Date: Fri, 26 Jul 2002 12:28:52 -0400
- To: xml-dev@lists.xml.org
- Cc: www-xml-blueberry-comments@w3.org
At 9:06 AM -0400 7/26/02, John Cowan wrote:

>For that matter, the Java situation is not open and shut either.
>Although in Java it is guaranteed that '\n' == '\013', which is not
>guaranteed in C, the specific encoding employed by PrintStream to print
>characters is explicitly platform-specific, and it is not unreasonable
>for a Java implementation to output a NEL when it is asked to print '\n'.

Anybody using a PrintStream to do serious work deserves the bugs they
get. They've got problems before they even start thinking about XML.
In fact, I wrote one 600 page book inspired mostly by exactly the
problems with PrintStream. Good code uses the other stream and writer
classes, in which this behavior is unambiguously specified.

>But to meet your larger point, there is nothing inappropriate in the use
>of 8-bit functions in XML processing. XML parsers that return UTF-8 are
>not unknown, and every XML file I generate for publication (~200 a day)
>is generated with 8-bit operations, and is either in UTF-8 or in 8859-1
>(properly labeled).
>

Do you really mean to suggest that using UTF-8 code points as C chars
is adequate? I suppose you could do that, but it most certainly is not
convenient and completely fails your stated goal of making XML files
plain text files. You're basically suggesting we treat them as binary
data rather than text.

>> All of the other functions we're talking about are similar. Even with
>> NEL, you still shouldn't be using these to process XML. OS/390 needs
>> to get some modern libraries. XML does not need to change.
>
>The issue remains: XML files on the mainframe are not plaintext files
>according to local conventions.

Yes, that's true and the issue is *much* broader than merely adding
NEL to the white space production. Even if we do this, XML files on
mainframes will still not be plain text files. Adding NEL won't fix
the problem. This whole notion of the "plain text" file may be a red
herring.
The community has realized over the last several years that calling
XML files plain text really isn't accurate on any platform. Hence the
move from text/xml to application/xml.

>XML processing is specified to be done in terms of LF only, with all
>other line-terminator conventions translated to LF. Suppose this
>had not been done, and all XML storage representations had been
>defined to require LF only. "What about Windows?" "Oh well, they
>can run an external program to convert CR/LF to LF before parsing,
>and LF to CR/LF after generation." If that had been the story, there
>damned well would be no significant amount of XML on Windows.
>You can rearrange this story using any line terminator and OS you like.

You're confusing issues by merging together two different time frames:
before and after XML 1.0 was released. Had IBM raised this issue
during the development of XML, it could have been considered on
different grounds. They failed to do so, and I see no justification
for reopening the case now. It is far more important for XML to remain
stable than to allow a minuscule number of users (possibly as few as
zero) not to upgrade their software to something that supports XML 1.0
conventions.

I find it completely reasonable to ask editors and other tools to
support the line ending conventions of the files they're editing. I do
this routinely on Mac, Windows, and Unix. I find it hard to believe
that it is so much more difficult for mainframe programmers to do this.

>Mainframes and EBCDIC are far from dead. XML 1.0 Appendix F makes a
>point of talking about how to autodetect EBCDIC encodings, for example;
>there is no reason why XML files can't start 4C 6F A7 94.
>There is no reason not to convert the occasional 0x15 (or 0x85 in
>the ASCII-compatible encoding) to an XML end of line, either.

Airline reservation clerks and bank tellers don't count. They never
see the XML. How many actual users are there writing raw XML who have
problems? So far I haven't seen any.
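
A minimal Java sketch of specifying both the encoding and the line
ending explicitly when generating XML, using OutputStreamWriter rather
than PrintStream. This is an illustration only: the class name
XmlLineEndings, the method writeGreeting, and the tiny document it
emits are all made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class XmlLineEndings {

    // Hypothetical helper: writes a small document with an explicit
    // encoding (UTF-8) and an explicit line terminator (LF, #xA).
    // Nothing here depends on the platform's default encoding or on
    // the platform's line separator.
    public static void writeGreeting(OutputStream out) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        w.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        w.write("<greeting>\n");
        w.write("  <text>Hello</text>\n");
        w.write("</greeting>\n");
        w.flush();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeGreeting(buffer);

        // Count the line feeds actually written, to show that the
        // terminators are exactly the bytes the code asked for.
        int lineFeeds = 0;
        for (byte b : buffer.toByteArray()) {
            if (b == 0x0A) {
                lineFeeds++;
            }
        }
        System.out.println(buffer.size() + " bytes, " + lineFeeds + " line feeds");
    }
}
```

The design choice is simply to name the encoding and write "\n"
directly instead of calling println(), which uses the platform's line
separator.
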
A programmer generating XML from code can easily specify the line
ending that XML requires. A programmer reading XML through a parser
will just see line feeds anyway. You're trying to fix a non-existent
problem.

>Speaking for myself and not necessarily the Core WG, I agree that there
>is no need to redefine the S production, merely to do line-terminator
>mapping on input. IMHO, there is no reason for #xD to be part of S
>either, as all real CRs are already mapped away, and having #xD be
>part of S serves only to allow very strange abuse of character
>references in entities containing attribute values and the like.
>However, I am certainly not suggesting that #xD be removed from S.
>

Again, it's a time frame issue. We are not discussing what XML would
be in an ideal world, had we known everything in 1996 that we know
now. We are discussing what is best to do now. Failing to add NEL in
no way justifies removing CR.

--
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
|          XML in a Nutshell, 2nd Edition (O'Reilly, 2002)           |
|              http://www.cafeconleche.org/books/xian2/              |
|  http://www.amazon.com/exec/obidos/ISBN%3D0596002920/cafeaulaitA/  |
+----------------------------------+---------------------------------+
|  Read Cafe au Lait for Java News:  http://www.cafeaulait.org/      |
|  Read Cafe con Leche for XML News: http://www.cafeconleche.org/    |
+----------------------------------+---------------------------------+
Received on Friday, 26 July 2002 12:36:48 UTC