- From: Elliotte Rusty Harold <elharo@metalab.unc.edu>
- Date: Sat, 15 Dec 2001 09:08:27 -0400
- To: "Champion, Mike" <Mike.Champion@SoftwareAG-USA.com>, xml-dev@lists.xml.org
At 7:57 PM -0700 12/14/01, Champion, Mike wrote:

>I'm out of my depth here, but this argument doesn't smell right to me. I
>thought we concluded in the massive Blueberry thread a few months back that
>#x85 probably should have been included in the S production in the first
>place, and wasn't mainly because of a lack of mainframe expertise among the
>members of the original WG.

No, we didn't conclude that. A lot of us thought then, and still think, that XML 1.0 got this right: #x85 should not have been part of the S production and still shouldn't be.

>pragmatism and leave them out. BUT there is an IMMENSE amount of data in
>mainframe databases that will probably be exposed via XML one day. It's not
>IBM that will pay the cost of debugging all the programs that neglect to
>translate #x85 into a politically correct separator when exposing these
>legacy systems as web services. And it is potentially OUR bank accounts and
>insurance policies in these legacy systems that are vulnerable to someone
>getting this wrong.

And exactly *none* of this data is in XML. If you want to take it out of the database and put it in XML, then it must be translated with or without XML 1.1. The same is true of Oracle, FileMaker, SQL Server, and every other legacy database product on the market. It is trivial to translate #x85 to #xA or #xD or both in the process.

However, even that isn't necessary! #x85 is allowed in character data, i.e. in element content and attribute values, today, with XML 1.0. All fields from IBM's databases that contain #x85 characters can be included in XML 1.0 documents without translation. The only place you can't put #x85 is inside tags, as the whitespace separating the element name from the first attribute or one attribute from the next.

The issue is not IBM databases and never has been. The issue is that IBM has some brain-damaged text editors that insert a #x85 every time you hit the return key instead of inserting a #xA or #xD or both.
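The distinction between character data and tag whitespace can be checked with a short Python sketch. It assumes Python 3's standard `xml.etree.ElementTree`, which wraps the expat XML 1.0 parser; the sample documents and the helper names `parses` and `nel_to_lf` are hypothetical, made up for illustration. It shows #x85 accepted in element content and in an attribute value, rejected where it stands in for the whitespace between attributes, and a blunt conversion pass that turns the rejected start-tag into well-formed XML.

```python
import xml.etree.ElementTree as ET

NEL = "\u0085"  # NEXT LINE, #x85

def parses(doc: str) -> bool:
    """Hypothetical helper: True if expat (XML 1.0 rules) accepts `doc`."""
    try:
        ET.fromstring(doc.encode("utf-8"))
        return True
    except ET.ParseError:
        return False

def nel_to_lf(text: str) -> str:
    """Hypothetical conversion pass: replace every #x85 with #xA.

    This also rewrites #x85 inside character data, which XML 1.0
    would have accepted unchanged.
    """
    return text.replace(NEL, "\n")

# #x85 in element content and in an attribute value: legal XML 1.0 characters.
in_data = f'<field note="a{NEL}b">line one{NEL}line two</field>'

# #x85 as the separator between attributes: not in the S production, so rejected.
in_tag = f'<name att1="value"{NEL}att2="value"{NEL}att3="value"/>'

print(parses(in_data))             # True
print(parses(in_tag))              # False
print(parses(nel_to_lf(in_tag)))   # True

root = ET.fromstring(in_data.encode("utf-8"))
print(repr(root.text))             # 'line one\x85line two'
```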
Files created with those #x85-inserting editors are not well-formed XML without an additional conversion pass. Similarly, IBM has some programming languages and tools that generate a #x85 when they do a println() or that language's equivalent. That's all.

This has nothing to do with letting data move from IBM databases into XML. It has everything to do with IBM not wanting to update their software to the standards the rest of the world has been using for more than 20 years. Worst of all, IBM wants to start shipping around XML documents they generate with these strange line-ending characters that will not behave appropriately in the installed base of software the rest of the world is using. I'm not just talking about XML software here, but about much more widely installed things like text editors and programming languages.

For instance, suppose an IBM tool generates a start-tag like this using #x85:

<name
att1="value"
att2="value"
att3="value"
>

Looks like well-formed ASCII, right? But it's not. Here's what you'll see if you open the document containing that tag in a typical Windows text editor:

<name... att1="value"... att2="value"... att3="value"...>

(Actual ellipsis characters will be used instead of three periods, but you get the idea.) Open it on a Mac and all the ellipses will change into O with two dots above instead.

This isn't just a question of recognizing the right encoding. It's a question of attaching the right semantics to the characters. #x85 isn't just another character. It's a character with special meaning for many text-processing systems. Unfortunately, IBM has chosen to assign different semantics to this character than pretty much everyone else in the world. Even if the document is labeled as ISO-8859-1 and the editor recognizes that and can tell that #x85 is not a graphic character, it still won't break the lines when it sees #x85!
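The display problem can be reproduced with Python's standard codecs. This sketch assumes the tool wrote the start-tag as single-byte ISO-8859-1 text with a 0x85 byte at each line end (the byte string itself is hypothetical). It decodes the same bytes as Windows-1252 (`cp1252`) and Mac OS Roman (`mac_roman`), and shows that a byte-oriented line splitter that knows only CR and LF keeps the whole tag on one line until the 0x85 bytes are translated.

```python
# Hypothetical bytes an #x85-emitting tool might write for a multi-line
# start-tag, assuming a single-byte ISO-8859-1 encoding where 0x85 is NEL.
raw = b'<name\x85att1="value"\x85att2="value"\x85att3="value"\x85>'

# The same bytes under three interpretations.
latin1_view = raw.decode("latin-1")    # 0x85 -> U+0085, an invisible C1 control
windows_view = raw.decode("cp1252")    # 0x85 -> U+2026 HORIZONTAL ELLIPSIS
mac_view = raw.decode("mac_roman")     # 0x85 -> U+00D6 O WITH DIAERESIS

print(latin1_view.count("\x85"))       # 4
print(ascii(windows_view))
# '<name\u2026att1="value"\u2026att2="value"\u2026att3="value"\u2026>'
print(ascii(mac_view))
# '<name\xd6att1="value"\xd6att2="value"\xd6att3="value"\xd6>'

# bytes.splitlines() breaks only on \r and \n, so without translation
# the whole tag stays on a single "line".
print(len(raw.splitlines()))                          # 1
print(len(raw.replace(b"\x85", b"\n").splitlines()))  # 5
```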
--
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@metalab.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| The XML Bible, 2nd Edition (Hungry Minds, 2001)                    |
| http://www.ibiblio.org/xml/books/bible2/                           |
| http://www.amazon.com/exec/obidos/ISBN=0764547607/cafeaulaitA/     |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://www.cafeaulait.org/        |
| Read Cafe con Leche for XML News: http://www.ibiblio.org/xml/      |
+----------------------------------+---------------------------------+
Received on Saturday, 15 December 2001 09:14:50 UTC