- From: Steven J. DeRose <sjd@eps.inso.com>
- Date: Tue, 20 May 1997 15:07:28 -0400
- To: w3c-sgml-wg@w3.org
At 12:08 AM 05/19/97 +0900, Weichel Bernhard (K3/EES4) wrote: >I am strongly supporting SD1 - Short End Tags, primarily with respect to >the useage of XML to exchange databases. In some examples from my desk - >we are exchanging data for engine management system - I was saving 40% >size of the instance by simply shorten the end tags. This sounds a bit like "I really want smaller heads on these hammers you're selling -- I'm using them to turn screws and they're just not very efficient." Usually if 80% of your files is markup (roughly that, since by shortening half the tags you say you're saving 40%), there's a bigger problem going on. The biggest savings come if you've got a whole lot of little tiny fields. Little tiny fields seldom have much internal structure; we're usually talking about integers, dates, social security and phone numbers, etc. In such cases, you're a whole lot better off sticking them in attributes. Conceptually, it makes at least as much sense; and attributes are good at reprenting little chunks of info that each have little internal structure; and SGML and XML can validate a number of datatypes for attributes (and, perhaps will add a few), whereas it is not possible even to require that #PCDATA content *exist*. To me, the big problem is not that you have to give the GI twice. It's that you have to give it on every *instance*, which simply doesn't make sense for RDBs. If you're shipping 100 records with 10 fields each, you don't want to just get from 2000 GIs to 1000 GIs. You want to get to 10 GIs. The total costs for different forms, given an n-char GI are: <gi> </gi> or 5+2n chars per field <gi> </> or 5+n chars per field <gi/ / or 3+n chars per field gi=" " or 5+n chars per field |gi or 1 chars per field. Maybe I'm an incorrigible CS nerd, but I find O(k) a lot better than O(n), and O(n) not very much better than O(2n). You can save the 2n vs. n amount just by shortening GIs, and most modems will compress it away anyway. Doesn't seem worth it, esp. since we have a known constituency that it will hurt (the DPH, not the parser writers, of course). Saving part of the end-tag is just a patch; if we really want to address this case we should save an awful lot more than that, and it seems to me that is best done some more radical way if at all. Steven J. DeRose, Ph.D., Chief Scientist Inso Electronic Publishing Solutions (formerly EBT)
Received on Tuesday, 20 May 1997 15:10:32 UTC