Re: SD1 - Short End Tags [fmt] from altheim on 1997-05-16 (w3c-sgml-wg@w3.org from May 1997)

From: altheim <altheim@mehitabel.Eng.Sun.COM>
Date: Fri, 16 May 1997 15:42:37 -0700 (PDT)
To: w3c-sgml-wg@w3.org
Message-ID: <libSDtMail.9705161542.5040.altheim@mehitabel>
From: Jean Paoli <jeanpa@microsoft.com>
[...]
>  SD1 - Short End Tags
>  -----------------------------------
> 
>  Structured data introduces a lot of tags, which can overwhelm the
>  actual data content. Repetition of an element's name within an end tag
>  can increase readability of a document when the tags are physically
>  distant. However, when used for representing database records, on the
>  leaf nodes, the repetition adds only clutter.

There's a number of strong suppositions here that have emotionally-charged
words attached. "Overwhelm" and "clutter" have no bearing on processing.
A 486 processor has no difficulty dealing with data content that has
end tags -- it's easier (and possibly faster) *with* them. Database storage
need contain no XML markup, so this seems to be an issue only for
transferring or transforming data outside of the database. What you call
clutter may also be deemed very helpful information for both simple
processors, CGI scripts, and people reading or editing the markup. 

Where a simple processor could simply search forward for the end tag, this
would no longer be the case. We now need to build a parse tree. I'm sure that
Microsoft's (or any large vendor's) application design would have no
problems dealing with this, but our college programmer's project just
got a lot more complex, and that little perl script just got a *lot* bigger.
And documents got a lot harder to read. For what? Some savings in disk
space? 2GB drives come cheap nowadays, and the difference in file sizes
at 28.8K would hardly be noticed. We're not transferring 15MB databases
of predominantly markup, are we? 
 
>  Proposal: For readability and convenience when dealing with database
>  records, the element name within an end tag is optional. That is,
>  </NAME> is same as </>.

This seems somewhat Orwellian. For "readability and convenience", the XML spec
has made end tag GIs *required*. You're turning that idea upside-down.

This proposal also diminishes its own impact by showing short end tags
only applied to leaf nodes. Let's be more realistic. It gets a whole lot
more complicated and difficult to parse if this proposal is accepted as
stated, which allows short end tags everywhere. A typical example from my desk:

    <informaltable frame="none" pgwide="1">
    <tgroup cols="2" colsep="0" rowsep="0">
    <colspec colname="COLUMN2" colwidth="165*">
    <colspec colname="COLUMN3" colwidth="231*">
    <thead><row><entry align="left" valign="bottom">
    <para>Problem</para></entry>
    <entry align="left" valign="bottom">
    <para>How to Fix the Problem</para></entry></row></thead>
    <tbody><row><entry align="left" valign="top">
    <para>The system is not connected to the network.</para></entry>
    <entry align="left" valign="top">
    <para>If this is a non-networked system, ignore this message.
    If this is a networked system, make sure the Ethernet cabling
    is attached securely.
    </para></entry></row></tbody></tgroup></informaltable>
 
becomes

    <informaltable frame="none" pgwide="1">
    <tgroup cols="2" colsep="0" rowsep="0">
    <colspec colname="COLUMN2" colwidth="165*">
    <colspec colname="COLUMN3" colwidth="231*">
    <thead><row><entry align="left" valign="bottom">
    <para>Problem</></>
    <entry align="left" valign="bottom">
    <para>How to Fix the Problem</></></></>
    <tbody><row><entry align="left" valign="top">
    <para>The system is not connected to the network.</></>
    <entry align="left" valign="top">
    <para>If this is a non-networked system, ignore this message.
    If this is a networked system, make sure the Ethernet cabling
    is attached securely.
    </></></></></></>

and we save how many characters? About 70 out of 680. Allowing only
leaf node minimization saves only 16 chars. And if the database (or
more generally, any XML content) contains a higher ratio of content to
markup, the savings are even less. In a typical book or article, the 
difference would be neglible. It would be the same in a database whose
field contents were of any length.

We also pay a very high price in losing the simplicity of matching
start and end tags, particularly for the grassroots-type application
and content developers. CGI and perl script writers dealing with
transactions now would have to keep track of level, and humans would
probably just give up after about three or four levels. Heck, that's the
reason I don't like chess.

Is this really worth all the headache?

Murray

...........................................................................
Murray Altheim, SGML Grease Monkey                    <altheim@eng.sun.com>
Member of Technical Staff, Tools Development & Support
Sun Microsystems, 2550 Garcia Ave., MS UMPK17-102, Menlo Park, CA 94043 USA
         "Give a monkey the tools and he'll build a typewriter."
Received on Friday, 16 May 1997 18:43:10 UTC