Re: SD1 - Short End Tags

At 09:58 PM 05/16/97 -0500, len bullard wrote:
>But an old category of SGML user.  You are now discovering some 
>of the reasons for SGML features which were tossed away for the 
>ease of the DPH.  ...

But SGML, even with DATATAG and everything else, is still not a particularly
good tool for transferring RDB fragments. A fundamental (intrinsic, required
by definition) characteristic of a relational table is that every row has
the same list of fields in it (some might be empty or null, but they still exist).
This is fundamentally NOT true of elements in typical documents ("every
section has the same number of paras" - not).

And that is exactly why the common transfer/export formats for RDBs use
things like a header to say ONCE what order the fields come in, then just
separate them by commas or tabs thereafter. You don't restate the field name
before every instance, because it's perfectly predictable.
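
To make the header-once idea concrete, here is a minimal sketch in Python using the standard csv module. The function names export_rows and import_rows are made up for illustration, and the second row is invented sample data; only the John/Smith/NC record comes from the examples in this message.

```python
import csv
import io

def export_rows(field_names, rows):
    """Write the field names ONCE as a header, then one delimited line per row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(field_names)
    writer.writerows(rows)
    return buf.getvalue()

def import_rows(text):
    """Rebuild name/value pairs for every row from the header line alone."""
    reader = csv.DictReader(io.StringIO(text))
    return [dict(row) for row in reader]

# Hypothetical table: every row has the same fields in the same order.
text = export_rows(
    ["FIRST", "LAST", "STATE"],
    [["John", "Smith", "NC"], ["Jane", "Doe", "RI"]],
)
records = import_rows(text)
print(text)
print(records)
```

The field names appear once in the output, yet each row can still be mapped back to named fields, which works only because the row structure is perfectly predictable.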

Likewise, SGML and XML reflect the expectation that perfect symmetry does
not usually hold. Therefore they put the "field" name on every "field"
(element). You *must* have that information somewhere if you can't predict
what the next one is.

So, when putting perfectly predictable/symmetrical data in XML, you have an
awkward situation (just like RDBs have an awful time representing SGML-like
structures). You can re-introduce as much SGML or similar minimization as
you want, to save as many bytes as you want. At best/worst you can get down
to a single byte separator (wow, just like tab-delimited files!). 

But every step that helps for RDB-ish data hurts for document data (by
complicating the parser, compromising error-detection possibilities,
complicating the DTD and perhaps making it required, reducing redundancy,
etc). The short-end-tag proposal is just one point along the continuum.
Where do we want to be?
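
The error-detection cost can be illustrated with a rough sketch. The toy checker below is not an SGML or XML parser; it is a hypothetical stack-based check (the name check_nesting and the regular expression are assumptions for illustration) that only compares end-tag names against the currently open element.

```python
import re

# Matches <NAME>, </NAME>, and the short end tag </> (empty name).
TOKEN = re.compile(r"<(/?)([A-Za-z]*)>")

def check_nesting(markup):
    """Return the end-tag mismatches that a simple stack check can detect."""
    stack, errors = [], []
    for match in TOKEN.finditer(markup):
        is_end = match.group(1) == "/"
        name = match.group(2)
        if not is_end:
            stack.append(name)
            continue
        if not stack:
            errors.append(f"unexpected end tag </{name}>")
            continue
        open_name = stack.pop()
        # A short end tag carries no name, so it matches whatever is open.
        if name and name != open_name:
            errors.append(f"</{name}> closes <{open_name}>")
    return errors

# The same authoring mistake (FIRST never closed before LAST opens),
# written once with full end tags and once with short end tags.
full_errors = check_nesting("<REC><FIRST>John<LAST>Smith</FIRST></LAST></REC>")
short_errors = check_nesting("<REC><FIRST>John<LAST>Smith</></></>")
print(full_errors)
print(short_errors)
```

With named end tags the checker reports two mismatches; with short end tags the same mistake passes silently, because the redundancy that exposed it is gone.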

<REC><FIRST>John</FIRST>
     <LAST>Smith</LAST>
     <STATE>NC</STATE>     or add short end-tags for

<REC><FIRST>John</>
     <LAST>Smith</>
     <STATE>NC</>          or add NET for

<REC><FIRST/John/
     <LAST/Smith/
     <STATE/NC/            or add declarations and OMITTAG for

<REC><FIRST>John
     <LAST>Smith
     <STATE>NC             or possibly add more declarations for

<REC><>John
     <>Smith
     <>NC                  or add a lot of SHORTREF maps for

<REC>John,Smith,NC
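
For a rough sense of what each step buys, the sketch below generates the six forms above for the one sample record and counts characters. The function names are made up for illustration, and the line breaks and indentation shown above are left out, so the counts cover markup and data only.

```python
FIELDS = [("FIRST", "John"), ("LAST", "Smith"), ("STATE", "NC")]

def full_tags(fields):
    return "<REC>" + "".join(f"<{n}>{v}</{n}>" for n, v in fields)

def short_end_tags(fields):
    return "<REC>" + "".join(f"<{n}>{v}</>" for n, v in fields)

def net(fields):
    return "<REC>" + "".join(f"<{n}/{v}/" for n, v in fields)

def omittag(fields):
    return "<REC>" + "".join(f"<{n}>{v}" for n, v in fields)

def empty_start_tags(fields):
    return "<REC>" + "".join(f"<>{v}" for _, v in fields)

def shortref(fields):
    return "<REC>" + ",".join(v for _, v in fields)

FORMS = [full_tags, short_end_tags, net, omittag, empty_start_tags, shortref]

# Character count of each form for the single sample record.
sizes = {form.__name__: len(form(FIELDS)) for form in FORMS}
for name, size in sizes.items():
    print(f"{name:18}{size:3}")
```

For this record the counts fall from 59 characters with full tags to 18 with the comma-separated form, of which 11 are the data itself.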

You pays your money and you takes your choice.

Steven J. DeRose, Ph.D., Chief Scientist
Inso Electronic Publishing Solutions
   (formerly EBT)

Received on Monday, 19 May 1997 11:42:18 UTC