RE: SD1 - Short End Tags from Gavin Nicol on 1997-05-19 (w3c-sgml-wg@w3.org from May 1997)

From: Gavin Nicol <gtn@eps.inso.com>
Date: Mon, 19 May 1997 09:44:20 -0400
To: andrewl@microsoft.com
CC: w3c-sgml-wg@w3.org
Message-Id: <199705191344.JAA26636@nathaniel.ebt>

>Where they do become important is when XML is machine-generated as a
>transport protocol by an automated process. For example, it is very
>important to me to consider using XML as a format for getting results
>back from database queries. They might be financial records, electronic
>commerce records, purchase orders, etc. These are neither written by
>humans nor meant to be read by humans. In many of these cases, the
>volume of data is large, but is mainly short fields, so the overhead of
>lengthy tags is pretty high relative to the basic data. I'm getting a
>lot of pushback from database people regarding this point. They are very
>concerned that we make it possible for them to be more economical in
>their encoding. Accommodating their needs means opening up a whole
>additional category of XML user.

I think you'll find at most a 30% reduction in size using short tags.
I generated a pseudo-Excel spreadsheet (using a simple C program) that
looks like this:

   <WORKBOOK>
   <ROW NAME="1">
   <COLUMN NAME="A" TYPE=TIME>
   Mon May 19 09:32:23 EDT 1997
   </COLUMN>
   ....
   </ROW>
   ....
   </WORKBOOK>

with 1024 rows, and 25 columns (A-Y). I got the following sizes:

   1738016 May 19 09:34 all.xml      --- Newlines and end-tag GIs
   1684820 May 19 09:35 nonl.xml     --- End-tag GIs, but no newlines
   1581497 May 19 09:36 stag.xml     --- Shorttag with newlines
   1529324 May 19 09:41 stagnonl.xml --- Shorttag with no newlines
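
As a quick check on the figures above, the relative savings can be worked
out directly from the four sizes. This is only a minimal Python sketch; the
names `sizes` and `reduction` are illustrative and the byte counts are
copied from the listing:

```python
# File sizes in bytes, copied from the listing above.
sizes = {
    "all.xml": 1738016,       # newlines, full end tags
    "nonl.xml": 1684820,      # no newlines, full end tags
    "stag.xml": 1581497,      # newlines, short end tags
    "stagnonl.xml": 1529324,  # no newlines, short end tags
}


def reduction(before, after):
    """Return the size reduction from `before` to `after` as a percentage."""
    return 100.0 * (before - after) / before


# Effect of short end tags alone, holding the newline choice fixed.
with_newlines = reduction(sizes["all.xml"], sizes["stag.xml"])
without_newlines = reduction(sizes["nonl.xml"], sizes["stagnonl.xml"])

print(f"short end tags, with newlines:    {with_newlines:.1f}%")
print(f"short end tags, without newlines: {without_newlines:.1f}%")
```

On these numbers, short end tags save about 9% in both cases.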

You could get a far greater reduction by simply using <R> and <C> instead
of <ROW> and <COLUMN>, and by choosing a smaller representation of time.
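
For a rough sense of scale, the effect of the shorter names alone can be
estimated from the row and column counts. This is a back-of-the-envelope
Python sketch, not a measurement: it assumes every row and every column has
exactly one start tag and one full end tag, and it ignores the time
representation entirely. All variable names are illustrative.

```python
ROWS = 1024
COLUMNS_PER_ROW = 25
ALL_XML_BYTES = 1738016  # size of all.xml from the listing above

# Characters saved per element: the name appears once in the start tag
# and once in the end tag (assumption: full end tags are kept).
saved_per_row = 2 * (len("ROW") - len("R"))
saved_per_column = 2 * (len("COLUMN") - len("C"))

saved = ROWS * saved_per_row + ROWS * COLUMNS_PER_ROW * saved_per_column
pct = 100.0 * saved / ALL_XML_BYTES

print(f"estimated saving: {saved} bytes ({pct:.1f}% of all.xml)")
```

Under these assumptions the shorter names would save roughly 260,000 bytes,
about 15% of all.xml, before the time representation is touched.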

Received on Monday, 19 May 1997 09:45:43 UTC