- From: Robert Streich <streich@slb.com>
- Date: Sun, 22 Sep 96 03:31:16 CDT
- To: w3c-sgml-wg@w3.org
I put these statistics together a few days ago in the momentary confusion over document sizes. I was curious about the implications of various proposals on the size of a document. When the confusion was cleared up, I began wondering why we were concerned about being able to parse without the DTD. So, I decided to send this out anyway just to provide a few data points.

The sample of documents that I used to get these statistics was a little over 44 Mbytes of SGML in 138 documents. The average size of a document is 320 Kbytes. The range was from 2 Kbytes to 2.7 Mbytes.

Empty end tags

The names in end tags make up 15% of the total size of the documents. I didn't try to differentiate, but this is probably due in large part to the number of tables in these docs (just over 5000 of them). I didn't try to determine the mode, which would have been interesting but more work than I was willing to invest. The range was from 7.5% to 20.6%.

"Pseudo-element" delimiters

Eliminating mixed content, as James and Charles have proposed, would increase the average document size in this set by 0.98% for every character used to bound the "pseudo-element." So, for example, if you had a single-character name for the element, file size would increase by 6.86% (three characters in the start tag, four in the end tag). The per-character range was from 0.5% to 1.9%.

End tags on EMPTY elements

Adding end tags to EMPTY elements increased file size by 0.9%. The range was from 0.3% to 3.3%.

Attribute value literals

Currently, all of the attribute values in these docs are quoted, i.e., they are attribute value literals. If I were to strip out all unnecessary quotes, I could reduce the file size by 2%. The range was from 0.3% to 4.2%.

DTD size

As is, our DTD, without comments, is 25 Kbytes. If I were to trim out all of the fat and rebuild it with an emphasis on keeping size to a minimum, I'm sure I could get it to around 20 Kbytes. There are lots of declarations that could be paired up. At 20 Kbytes, the DTD is 6.3% the size of an average document.

Stylesheets

The DynaText stylesheets associated with these documents are 94 Kbytes, 29% of the size of an average document and 4.7 times the size of the DTD. There are some oddities in the DynaText stylesheet language that might make it more verbose than a corresponding DSSSL stylesheet, but I kind'a doubt it, since there are also a lot of pieces that encapsulate a lot of behavior into single functions.

Robert Streich           streich@slb.com
Schlumberger             voice: 1 512 331 3318
Austin Research          fax: 1 512 331 3760
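
The pseudo-element and size-ratio figures above are simple arithmetic, and a minimal sketch in Python makes them easy to re-run with other element-name lengths. The function and constant names are hypothetical, chosen for illustration; only the numeric inputs (0.98% per character, 320, 20 and 94 Kbytes) come from the message.

```python
# Inputs quoted in the message. All names here are illustrative assumptions.
PER_CHAR_INCREASE_PCT = 0.98  # average growth per delimiter character
AVG_DOC_KB = 320              # average document size
DTD_KB = 20                   # trimmed DTD size
STYLESHEET_KB = 94            # DynaText stylesheets


def pseudo_element_overhead(name_len, per_char_pct=PER_CHAR_INCREASE_PCT):
    """Percent growth when each pseudo-element is wrapped in <name>...</name>."""
    start_tag_chars = name_len + 2  # '<' + name + '>'
    end_tag_chars = name_len + 3    # '</' + name + '>'
    return per_char_pct * (start_tag_chars + end_tag_chars)


def percent_of(part_kb, whole_kb):
    """Size of one item as a percentage of another."""
    return 100.0 * part_kb / whole_kb


if __name__ == "__main__":
    # Single-character name: 3 + 4 = 7 delimiter characters.
    print(f"pseudo-element overhead: {pseudo_element_overhead(1):.2f}%")
    print(f"DTD vs. average doc:     {percent_of(DTD_KB, AVG_DOC_KB):.2f}%")
    print(f"stylesheets vs. avg doc: {percent_of(STYLESHEET_KB, AVG_DOC_KB):.2f}%")
    print(f"stylesheets vs. DTD:     {STYLESHEET_KB / DTD_KB:.1f}x")
```

With a two-character element name the same function uses nine delimiter characters, so the estimate scales linearly with the name length. The sketch assumes the per-character figure applies uniformly, which is how the message uses it.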
Received on Sunday, 22 September 1996 04:31:40 UTC