- From: Bert Bos <bbos@mygale.inria.fr>
- Date: Thu, 15 May 1997 23:57:55 +0200 (MET DST)
- To: w3c-sgml-wg@w3.org
Steven J. DeRose writes:
> At 03:00 PM 05/15/97 +0200, Bert Bos wrote:
>
> > > Not true! Most HTML *files* are small. There are many massive documents on
> > > the Web that are broken into non-intuitive, hard to use chunks because the
> > > Web is massively optimized for small documents instead of for retrieving
> > > small parts of large documents. *WE MUST NOT PERPETUATE THIS MISTAKE*.
> >
> >OK, the Web is one huge document...
> >
> >No, I don't agree with you. There are nodes in the Web, we usually
> >call them documents. It is convenient for people to work with chunks
> >of information of a certain size. There is usually some intuitive
>
> But rhetorical/conceptual convenience is not what is going on on the Web
> for the most part. Things are broken up due to bandwidth constraints, and
> because navigational sophistication is limited by limited markup and interfaces.
>
>
> >Anything larger is also unlikely to be hierarchical. It is hard enough
> >to create a linear document of a dozen pages, for something the size
> >of a book you already need several months. The Web gives an alternate
> >structuring method, so use it! What is XML-link for, if not for that?
>
> This is incorrect. Most big documents are richly, intensely, fundamentally
> hierarchical. There are lots of reasons for this, including cognitive and
> linguistic ones as well as practical/access ones. I've done statistical
> analysis on the markup of large documents (ranging up to hundreds of MB).

Statistics never tell you which direction the correlation goes. Your
finding supports my argument exactly: people can deal with large
documents only if they have a rigid structure. If information doesn't
have that structure, it will be put in hypertext instead.

XML is not a hypertext format. XML is a format for *one node* in a
hypertext, just as HTML is.

>
> >With current network speeds, a book of 300 pages will not yet be
> >downloaded in 3 seconds, but that situation will improve. Parsing 300
> >pages is not a problem for current computers. Maybe it would be a
> >problem to parse the whole Encyclopaedia Britannica, but as I said,
> >that "document" is an exception.
>
> Parsing 300 pages will always be annoyingly slow. Try it off your local HD;
> the net is not the only problem. If document open time rises from one second
> to three, it's a big problem. And the last time I benchmarked NS 3 on a
> Pentium 120, it took several *minutes* to bring up a 400 page document off a
> *local* and very fast disk.
> >
> >  3. XML shall be compatible with SGML.
> >
> >     1. Existing SGML tools will be able to read and write XML data.
> >
> >     2. XML instances are SGML documents as they are, without changes to
> >        the instance.
> >
> >     3. For any XML document, a DTD can be generated such that SGML will
> >        produce "the same parse" as would an XML processor.
> >
> >     4. XML should have essentially the same expressive power as SGML.
> >
>
> >Clearly points 1 and 2 are not met, so, according to the note, the
> >spec should instead have a section on the recommended way to translate
> >back and forth, with minimal loss of information.
>
> Huh? A very large set of existing SGML tools can and do read XML documents.
> And a lot of them didn't need anything but a new SGML declaration. And XML
> document instances *are* SGML document instances. We've said from the very
> beginning that we were not requiring them to be SGML *under the same DTD and
> SGML declaration*; just that such declarations exist.

Originally I thought that it should be possible, and it was an
interesting puzzle to try to find such a declaration and a rewrite of
a DTD. I never managed to do it. And it isn't of any practical use
either, since rewriting the DTD will not be a mechanical process and
most tools can't change the SGML declaration anyway.

Which tools did you try (and what SGML declaration)?

- (n)sgmls can't read them without a doctype.

- Even with a doctype it can't deal with "/>", unless I set NET to be
  "/>", but that is incorrect and leads to erroneous results in many
  cases.

- (n)sgmls also ignored some REs, no matter what the content model (I
  even tried to rewrite the DTD to use inclusion exceptions as much as
  possible - it helped some, but not enough).

- The various HTML browsers I tried couldn't deal with "/>"; some
  could when I preceded it by a space.

- Dan Connolly's sgml-lex
  (http://www.w3.org/pub/WWW/MarkUp/SGML/#sgml-lex) couldn't either.

- psgml can't deal with "/>".


Bert
--
  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/pub/WWW/People/Bos/                      INRIA/W3C
  bert@w3.org                             2004 Rt des Lucioles / BP 93
  +33 4 93 65 77 71               06902 Sophia Antipolis Cedex, France
Received on Thursday, 15 May 1997 17:57:58 UTC