- From: Gavin Nicol <gtn@eps.inso.com>
- Date: Mon, 19 May 1997 09:44:20 -0400
- To: andrewl@microsoft.com
- CC: w3c-sgml-wg@w3.org
>Where they do become important is when XML is machine-generated as a
>transport protocol by an automated process. For example, it is very
>important to me to consider using XML as a format for getting results
>back from database queries. They might be financial records, electronic
>commerce records, purchase orders, etc. These are neither written by
>humans nor meant to be read by humans. In many of these cases, the
>volume of data is large, but is mainly short fields, so the overhead of
>lengthy tags is pretty high relative to the basic data. I'm getting a
>lot of pushback from database people regarding this point. They are very
>concerned that we make it possible for them to be more economical in
>their encoding. Accommodating their needs means opening up a whole
>additional category of XML user.

I think you'll find at most a 30% reduction in size using short tags.
I generated a pseudo-Excel spreadsheet (using a simple C program) that
looks like this:

    <WORKBOOK>
    <ROW NAME="1">
    <COLUMN NAME="A" TYPE=TIME>
    Mon May 19 09:32:23 EDT 1997
    </COLUMN>
    ....
    </ROW>
    ....
    </WORKBOOK>

with 1024 rows and 25 columns (A-Y). I got the following sizes:

    1738016 May 19 09:34 all.xml       --- Newlines and end-tag GIs
    1684820 May 19 09:35 nonl.xml      --- End-tag GIs, but no newlines
    1581497 May 19 09:36 stag.xml      --- Shorttag with newlines
    1529324 May 19 09:41 stagnonl.xml  --- Shorttag with no newlines

You could get a far greater reduction by simply using <R> and <C>
instead of <ROW> and <COLUMN>, and by choosing a smaller representation
of time.
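
To make the comparison concrete, here is a minimal Python sketch (not the original C program, which was not posted). It computes the percentage reduction of each variant relative to all.xml from the four file sizes reported above, and it builds a small document in the same shape to compare <ROW>/<COLUMN> against <R>/<C>. The names `reduction_percent` and `build_workbook` are hypothetical, and the exact line layout is an assumption, so the generated byte counts are not expected to match the reported ones.

```python
# File sizes in bytes, copied from the listing in the message.
REPORTED_SIZES = {
    "all.xml": 1738016,       # newlines and end-tag GIs
    "nonl.xml": 1684820,      # end-tag GIs, no newlines
    "stag.xml": 1581497,      # short tags, with newlines
    "stagnonl.xml": 1529324,  # short tags, no newlines
}


def reduction_percent(baseline: int, size: int) -> float:
    """Return how much smaller `size` is than `baseline`, in percent."""
    return 100.0 * (baseline - size) / baseline


def build_workbook(rows: int, cols: int, row_tag: str, col_tag: str,
                   cell_text: str) -> str:
    """Build a pseudo-spreadsheet shaped like the example in the message.

    Assumed layout: one tag or value per line, full end tags, and the
    unquoted TYPE=TIME attribute exactly as shown in the message.
    """
    parts = ["<WORKBOOK>\n"]
    for r in range(1, rows + 1):
        parts.append(f'<{row_tag} NAME="{r}">\n')
        for c in range(cols):
            name = chr(ord("A") + c)  # A, B, C, ... (A-Y for 25 columns)
            parts.append(f'<{col_tag} NAME="{name}" TYPE=TIME>\n')
            parts.append(f"{cell_text}\n")
            parts.append(f"</{col_tag}>\n")
        parts.append(f"</{row_tag}>\n")
    parts.append("</WORKBOOK>\n")
    return "".join(parts)


# Reduction of each reported variant relative to all.xml.
baseline = REPORTED_SIZES["all.xml"]
for filename, size in REPORTED_SIZES.items():
    print(f"{filename:14s} {size:8d} {reduction_percent(baseline, size):6.2f}%")

# Effect of shorter element names on a document of the same shape.
stamp = "Mon May 19 09:32:23 EDT 1997"
long_doc = build_workbook(1024, 25, "ROW", "COLUMN", stamp)
short_doc = build_workbook(1024, 25, "R", "C", stamp)
print(f"long names:  {len(long_doc)} bytes")
print(f"short names: {len(short_doc)} bytes "
      f"({reduction_percent(len(long_doc), len(short_doc)):.2f}% smaller)")
```

On the reported figures, the reductions relative to all.xml come out at roughly 3% (no newlines), 9% (short tags), and 12% (short tags and no newlines), all well under the 30% bound mentioned above.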
Received on Monday, 19 May 1997 09:45:43 UTC