- From: Bert Bos <bbos@mygale.inria.fr>
- Date: Thu, 15 May 1997 23:57:55 +0200 (MET DST)
- To: w3c-sgml-wg@w3.org
Steven J. DeRose writes:
> At 03:00 PM 05/15/97 +0200, Bert Bos wrote:
>
> > > Not true! Most HTML *files* are small. There are many massive documents on
> > > the Web that are broken into non-intuitive, hard to use chunks because the
> > > Web is massively optimized for small documents instead of for retrieving
> > > small parts of large documents. *WE MUST NOT PERPETUATE THIS MISTAKE*.
> >
> >OK, the Web is one huge document...
> >
> >No, I don't agree with you. There are nodes in the Web, we usually
> >call them documents. It is convenient for people to work with chunks
> >of information of a certain size. There is usually some intuitive
>
> But rhetorical/conceptual convenience is not what is going on on the Web
> for the most part. Things are broken up due to bandwidth constraints, and
> because navigational sophistication is limited by limited markup and interfaces.
>
>
> >Anything larger is also unlikely to be hierarchical. It is hard enough
> >to create a linear document of a dozen pages, for something the size
> >of a book you already need several months. The Web gives an alternate
> >structuring method, so use it! What is XML-link for, if not for that?
>
> This is incorrect. Most big documents are richly, intensely, fundamentally
> hierarchical. There are lots of reasons for this, including cognitive and
> linguistic ones as well as practical/access ones. I've done statistical
> analysis on the markup of large documents (ranging up to hundreds of MB).
Statistics never tell you which direction the correlation goes. Your
finding supports my argument exactly: people can deal with large
documents only if they have a rigid structure. If information doesn't
have that structure, it will be put in hypertext instead.
XML is not a hypertext format. XML is a format for *one node* in a
hypertext, just as HTML is.
>
> >With current network speeds, a book of 300 pages will not yet be
> >downloaded in 3 seconds, but that situation will improve. Parsing 300
> >pages is not a problem for current computers. Maybe it would be a
> >problem to parse the whole Encyclopaedia Britannica, but as I said,
> >that "document" is an exception.
>
> Parsing 300 pages will always be annoyingly slow. Try it off your local HD;
> the net is not the only problem. If document open time rises from one second
> to three, it's a big problem. And the last time I benchmarked NS 3 on a
> Pentium 120, it took several *minutes* to bring up a 400 page document off a
> *local* and very fast disk.
>
>
> > 3. XML shall be compatible with SGML.
> >
> > 1. Existing SGML tools will be able to read and write XML data.
> >
> > 2. XML instances are SGML documents as they are, without changes to
> > the instance.
> >
> > 3. For any XML document, a DTD can be generated such that SGML will
> > produce "the same parse" as would an XML processor.
> >
> > 4. XML should have essentially the same expressive power as SGML.
>
> >
> >Clearly points 1 and 2 are not met, so, according to the note, the
> >spec should instead have a section on the recommended way to translate
> >back and forth, with minimal loss of information.
>
> Huh? A very large set of existing SGML tools can and do read XML documents.
> And a lot of them didn't need anything but a new SGML declaration. And XML
> document instances *are* SGML document instances. We've said from the very
> beginning that we were not requiring them to be SGML *under the same DTD and
> SGML declaration*; just that such declarations exist.
Originally I thought that it should be possible, and it was an
interesting puzzle to try to find such a declaration and a rewrite of
a DTD. I never managed to do it. And it isn't of any practical use
either, since rewriting the DTD will not be a mechanical process and
most tools can't change the SGML declaration anyway.
Which tools did you try (and what SGML declaration)?
- (n)sgmls can't read them without a doctype.
- Even with a doctype it can't deal with "/>", unless I set NET to
  be "/>", but that is incorrect and leads to erroneous results in
  many cases.
- (n)sgmls also ignored some REs, no matter what the content model
  (I even tried to rewrite the DTD to use inclusion
  exceptions as much as possible - it helped some, but not enough).
- The various HTML browsers I tried couldn't deal with "/>"; some
  could when I preceded it with a space.
- Dan Connolly's sgml-lex
  (http://www.w3.org/pub/WWW/MarkUp/SGML/#sgml-lex) couldn't
  either.
- psgml can't deal with "/>".
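
The space workaround mentioned for the HTML browsers can be sketched as
follows. This is only an illustration in Python, not something used in
the tests above: the function name space_before_net is hypothetical, and
the pattern assumes quoted attribute values that contain no ">" and
input without comments or CDATA sections containing "/>".

```python
import re

# Matches an empty-element tag such as <br/> or <img src="a.gif"/>.
# Assumption: attribute values are quoted and never contain ">".
EMPTY_TAG = re.compile(r'<([A-Za-z_][\w.\-:]*)((?:\s+[^<>]*?)?)\s*/>')


def space_before_net(text: str) -> str:
    """Rewrite <x/> as <x /> so that the "/>" is preceded by a space."""
    def fix(match: re.Match) -> str:
        name = match.group(1)
        attrs = match.group(2).rstrip()  # drop whitespace already before "/>"
        return f"<{name}{attrs} />"
    return EMPTY_TAG.sub(fix, text)


if __name__ == "__main__":
    print(space_before_net('<p>line one<br/>line two</p>'))
```

This only helps the tools that tolerate " />"; it does nothing for the
NET or RE problems listed above.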
Bert
--
Bert Bos ( W 3 C ) http://www.w3.org/
http://www.w3.org/pub/WWW/People/Bos/ INRIA/W3C
bert@w3.org 2004 Rt des Lucioles / BP 93
+33 4 93 65 77 71 06902 Sophia Antipolis Cedex, France
Received on Thursday, 15 May 1997 17:57:58 UTC