- From: Bert Bos <bbos@mygale.inria.fr>
- Date: Thu, 15 May 1997 15:00:17 +0200 (MET DST)
- To: w3c-sgml-wg@w3.org
Paul Prescod writes:
> > > Yes, but it creates unbounded linear dependencies, forcing the parsing of an entire document from the beginning, with all entity references resolved. A State-independent solution allows "lazy" entity parsing, and re-use of partial documents as well-formed XML fragments.
> >
> > True, in the worst case, but there are several arguments why this is not a big problem:
> >
> > - The vast majority of documents is small, on the Web that is even more true than elsewhere.
>
> Not true! Most HTML *files* are small. There are many massive documents on the Web that are broken into non-intuitive, hard-to-use chunks because the Web is massively optimized for small documents instead of for retrieving small parts of large documents. *WE MUST NOT PERPETUATE THIS MISTAKE*.

OK, the Web is one huge document... No, I don't agree with you. There are nodes in the Web; we usually call them documents. It is convenient for people to work with chunks of information of a certain size. There is usually some intuitive reason for putting a certain amount of information in a document, and it turns out that most people write documents (both on the Web and elsewhere) of a similar size. Letters are one or two pages, articles are less than a dozen pages, books are about 300 pages. Anything larger than that is an exception. If you look at a graph of the number of documents versus their size, you'll see a curve that falls off exponentially with increasing document size. This is not (only) due to the computer; it is the way people function.

Anything larger is also unlikely to be hierarchical. It is hard enough to create a linear document of a dozen pages; for something the size of a book you already need several months. The Web gives an alternate structuring method, so use it! What is XML-link for, if not for that?

With current network speeds, a book of 300 pages will not yet be downloaded in 3 seconds, but that situation will improve.
Parsing 300 pages is not a problem for current computers. Maybe it would be a problem to parse the whole Encyclopaedia Britannica, but as I said, that "document" is an exception. And the example of the encyclopedia also shows that large documents tend to be very regular in structure: they are databases made up of records. It is no coincidence that the only really large documents are databases. To handle things that large, people need a rigid structure. DBMSs deal with gigabytes pretty well; requiring a generic XML parser to deal with them as well doesn't sound reasonable to me. Instead, pipe the DBMS output into the XML parser and be done with it.

> >
> > - You can arbitrarily limit namespaces by putting a !doctype somewhere.
>
> Then you introduce many OTHER namespace problems like IDs, entities etc.

IDs must be unique in the whole document, not just the subdocument. (Otherwise we'll have to change the xpointer syntax, and I rather like it the way it is.) Of course, parsers don't care whether an ID is unique or not; they just assume it is. I don't need entities (but if you can convince me that I do, they are local, just like attributes).

> > I agree with you there, but there is a fallacy in calling them "PIs", since PIs are a term from SGML, and in SGML they are not targeted at SGML parsers, but at the applications built on top of the parsers.
> >
> > You're defining XML, you need a widget to define something that is common to, and obligatory for all XML parsers. You can use whatever syntax you like. Who cares whether it looks like SGML or not?
>
> Please see: http://www.textuality.com/sgml-erb/dd-1996-0001.html
>
> These are our goals and I feel that it is too late to change them. XML would be a very different language if SGML compatibility were not an important goal.

Maybe. But how important is this compatibility? Here is a quote from the document you mentioned:

    3. XML shall be compatible with SGML.
        1. Existing SGML tools will be able to read and write XML data.
        2. XML instances are SGML documents as they are, without changes to the instance.
        3. For any XML document, a DTD can be generated such that SGML will produce "the same parse" as would an XML processor.
        4. XML should have essentially the same expressive power as SGML.

    Note: #1 and #2 describe our goal in its ideal form. If this goal is not achievable in its fullest form, then we may back out to a weaker form: it shall be simple to transform XML documents into equivalent SGML documents, and vice versa. Our intention, however, is to bite the bullet and ensure if we can that no transformation is needed to allow SGML tools to read and write XML document instances. #3 and #4 indicate our intentions accurately, but it is not yet clear how best to formalize and explain the phrase "the same parse", or the phrase "essentially the same expressive power". These remain open questions; see point 8 also.

Clearly points 1 and 2 are not met, so, according to the note, the spec should instead have a section on the recommended way to translate back and forth, with minimal loss of information. It is my feeling that points 1 and 2 *had* to fail, and I'm glad that they did. Now the WG should indeed `bite the bullet' and spend some resources on discussing the best translation. (Not too many resources, though, because there are more important things to do.)

(I said "minimal loss of information" because it is not clear what the information content of an SGML document is (nor of an XML document, for that matter, but it's still early enough to fix that; see point 8 in the above-mentioned document). The "grove" concept that was retrofitted onto SGML is an intellectual tour de force, but also proof that the SGML spec was incomplete. If the SGML spec had said explicitly that no meaning must be attached to such things as the choice of delimiters or the order of attributes, then the grove wouldn't have been necessary.)
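To make the last point concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` (a modern XML parser, not anything from the drafts discussed above). It assumes that the "information content" of a document is simply what the parser reports: element names, attribute name/value pairs, and text. Under that assumption, two documents that differ only in quote delimiters, attribute order, and whitespace inside tags come out identical. The helper `parsed_content` and the sample documents are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def parsed_content(elem):
    """Reduce an element to (tag, attributes, text, children).

    Quote characters and attribute order are never reported by the parser,
    so they cannot affect the result. Simplifications (assumptions): text is
    whitespace-stripped and tail text after child elements is ignored.
    """
    return (
        elem.tag,
        dict(elem.attrib),  # dict equality ignores attribute order
        (elem.text or "").strip(),
        [parsed_content(child) for child in elem],
    )


# Same content, different delimiters, attribute order and spacing.
doc_a = '<rec id="r1" lang="en"><title>Letters</title></rec>'
doc_b = "<rec lang='en'   id='r1'><title>Letters</title></rec>"

same = parsed_content(ET.fromstring(doc_a)) == parsed_content(ET.fromstring(doc_b))
print(same)  # True
```

A spec that defines its information content this way up front would not need a separate model retrofitted later to say which differences are insignificant.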
Bert
--
  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/pub/WWW/People/Bos/                      INRIA/W3C
  bert@w3.org                          2004 Rt des Lucioles / BP 93
  +33 4 93 65 77 71            06902 Sophia Antipolis Cedex, France
Received on Thursday, 15 May 1997 09:00:35 UTC