Re: XML, SGML & the Web (was: Shorthand for default attributes) from Steven J. DeRose on 1997-05-15 (w3c-sgml-wg@w3.org from May 1997)

From: Steven J. DeRose <sjd@eps.inso.com>
Date: Thu, 15 May 1997 16:40:26 -0400
To: Bert Bos <bbos@mygale.inria.fr>, w3c-sgml-wg@w3.org
Message-Id: <2.2.32.19970515204026.00b46858@pop>

At 03:00 PM 05/15/97 +0200, Bert Bos wrote:

> > Not true! Most HTML *files* are small. There are many massive documents on
> > the Web that are broken into non-intuitive, hard to use chunks because the
> > Web is massively optimized for small documents instead of for retrieving
> > small parts of large documents. *WE MUST NOT PERPETUATE THIS MISTAKE*.
>
>OK, the Web is one huge document...
>
>No, I don't agree with you. There are nodes in the Web, we usually
>call them documents. It is convenient for people to work with chunks
>of information of a certain size. There is usually some intuitive

But rehetorical/conceptual convenience is not what is going on on the Web
for the most part. Things are broken up due to bandwidth constraints, and
because navigational sophistication is limited by limited markup and interfaces.

>Anything larger is also unlikely to be hierarchical. It is hard enough
>to create a linear document of a dozen pages, for something the size
>of a book you already need several months. The Web gives an alternate
>structuring method, so use it! What is XML-link for, if not for that?

This is incorrect. Most big documents are richly, intensely, fundamentally
hierarchical. There are lots of reasons for this, including cognitive and
linguistics ones as well as practical/access ones. I've done statistical
analysis on the markup of large documents (ranging up to hundreds of MB). 

>With current network speeds, a book of 300 pages will not yet be
>downloaded in 3 seconds, but that situation will improve. Parsing 300
>pages is not a problem for current computers. Maybe it would be a
>problem to parse the whole Encyclopeadia Brittannica, but as I said,
>that "document" is an exception.

Parsing 300 pages will always be annoyingly slow. Try it off your local HD;
the net is not the only problem. If document open time rises from one second
to three, it's a big problem. And the last time I benchmarked NS 3 on a
Pentium 120, it took several *minutes* to bring up a 400 page document off a
*local* and very fast disk.

>    3. XML shall be compatible with SGML.
>
>       1.Existing SGML tools will be able to read and write XML data.
>
>       2.XML instances are SGML documents as they are, without changes to
>	 the instance.
>
>       3.For any XML document, a DTD can be generated such that SGML will
>	 produce "the same parse" as would an XML processor.
>
>       4.XML should have essentially the same expressive power as SGML.

>
>Clearly points 1 and 2 are not met, so, according to the note, the
>spec should instead have a section on the recommended way to translate
>back and forth, with minimal loss of information.

Huh? A very large set of existing SGML tools can and do read XML documents.
And a lot of them didn't need anything but a new SGML declaration. And XML
document instances *are* SGML document instances. We've said from the very
beginning that we were not requiring them to be SGML *under the same DTD and
SGML declaration*; just that such declarations exist.

Steven J. DeRose, Ph.D., Chief Scientist
Inso Electronic Publishing Solutions
   (formerly EBT)

Received on Thursday, 15 May 1997 16:43:30 UTC