Re: SD1 - Short End Tags from Paul Prescod on 1997-05-19 (w3c-sgml-wg@w3.org from May 1997)

From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
Date: Mon, 19 May 1997 14:46:48 -0400
To: w3c-sgml-wg@w3.org
Message-ID: <3380A018.81E6E3E9@calum.csclub.uwaterloo.ca>

Steven J. DeRose wrote:
> 
> But every step that helps for RDB-ish data hurts for document data (by
> complicating the parser, compromising error-detection possibilities,
> complicating the DTD and perhaps making it required, reducing redundancy,
> etc). The short-end-tag proposal is just one point along the continuum.

Thanks, Steven, for that well-reasoned post on documents and RDB data.
Here's my take:

When we started talking about DTD-less documents, I had all kinds of
interesting ideas about database records in XML, .ini/.rc files, catalog
files etc. The furor over XML-style DTDs and catalogs shows that others
have these same ideas. But the more I think about it the less I care
about those other applications. How often do you really want to process
a relational database or .ini file in an SGML editor? How often do you
want to look at a relational database using the Grove model? Why did I
care back in the heady days of October?

I think I had the kind of monopolistic ideas that Lisp programmers from
the sixties had: code is data, data is code. If everything shares the
same syntax everything can be manipulated uniformly. It turns out not to
be so interesting. Hardly any Lisp programmers care about the fact that
Lisp uses a data-like syntax anymore. Hardly anyone builds code at
runtime as parenthesized strings. I mean there are major benefits to the
fact that Lisp has a simple syntax, but not that it has a *uniform
syntax that is the same as its data*. That's just an analogy, but I
think an important one. Simplicity is important but uniformity is not.
.ini files and comma-delimited database files are simple: easy to parse
and use. Why change them? Five years from now the world will not be a
significantly different place if CDF or OCF is XML-based or not. I would
be interested to hear from the MS folks if I am wrong: are there
significant technical benefits to uniting these syntaxes or are we just
playing buzzword games?
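
To make the "easy to parse and use" point concrete, here is a minimal sketch using only the Python standard library. The sample data and the helper names parse_ini and parse_csv are invented for illustration; they do not come from any real application.

```python
import configparser
import csv
import io

# Hypothetical sample inputs, made up for this illustration.
INI_TEXT = """\
[window]
width = 640
height = 480
"""

CSV_TEXT = """\
id,name,price
1,widget,2.50
2,gadget,10.00
"""


def parse_ini(text):
    """Return {section: {key: value}} for .ini-style text (values stay strings)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


def parse_csv(text):
    """Return one dict per data row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(text)))


settings = parse_ini(INI_TEXT)
rows = parse_csv(CSV_TEXT)
print(settings["window"]["width"])  # 640
print(rows[1]["name"])              # gadget
```

Both formats map straight onto flat key/value or row/column structures, so neither needs a tree model to be useful.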

Now if Microsoft were thinking about redoing RTF or HTML, markup
languages, then I would listen very carefully. Markup languages are our
target audience. A failure in dealing with RTF will quite possibly
indicate a failure in dealing with other markup languages. The line
between a "document" and a "database" is vague, but we can usually make
the distinction by asking "what is the right formalism for this data?"
If it is a grove, the thing may well be a document. If it is a
relational or flat-file database the thing is probably not.

I think it would be an interesting project to make a standard that
unites all file formats under a machine-readable syntax description
file. But I don't think that XML and XML DTDs are really the right
starting place. Perhaps BNF, or ASN.1, or YACC, or even SGML (which
already has features for minimizing/changing syntax).

 Paul Prescod
Received on Monday, 19 May 1997 14:50:30 UTC