- From: Jon Bosak <bosak@atlantic-83.Eng.Sun.COM>
- Date: Sat, 17 May 1997 02:05:44 -0700
- To: w3c-sgml-wg@w3.org
[Andrew Layman:]

| I'm getting a lot of pushback from database people regarding this
| point. They are very concerned that we make it possible for them to be
| more economical in their encoding. Accommodating their needs means
| opening up a whole additional category of XML user.

I forgot one important category of existing user who would be inconvenienced by short tags if that were the only form allowed (this is relevant because I think that there shouldn't be options in this case). This is every current large-scale SGML user whose tool (like Adept) normalizes files when saving them by providing explicit end tags. There are a few hundred million pages of such stuff out there, and I don't want to make this the illegal form. (Yes, they have to filter for the new empty-tag syntax anyway, but this is an additional annoyance for tools vendors.) Note that no one ever complains about the extra space this normalization requires; on the contrary, most people consider this a feature. I do.

The larger point is what gets traded for what is gained. I'm just not seeing the big savings in byte count that motivate this proposal. Simply adopting Unicode roughly doubles file sizes, and adopting UTF-8 as the default encoding adds roughly another 50 percent for the Asian users; I'm not hearing anyone complaining about these tradeoffs. Drop one extra GIF image onto a page and you've wiped out any conceivable reduction in size gained by short tags. And in the pure database examples, more space would have been saved by a smarter choice of GIs than by leaving them out of the end tags.

I'm still trying to maintain an open mind on this one, but it's reminding me so much of all the articles written in the 1970s on how to save space in BASIC programs by leaving out the spaces between tokens that I'm having a difficult time shaking off the feeling that I've seen this particular movie already.

Jon
Received on Saturday, 17 May 1997 05:06:14 UTC