- From: David G. Durand <dgd@cs.bu.edu>
- Date: Mon, 30 Sep 1996 12:10:42 -0400
- To: w3c-sgml-wg@w3.org
This is an analysis spurred by Paul's Posting "Re: Newlines in element content (i.e TABLES)"

At 9:51 AM 9/30/96, Paul Prescod wrote:
>The point of Charles' True Information spiel is that the application should
>never see data that has not been normalized according to SGML/XML's rules.
>If a data character (such as a newline under the banish RS/RE proposal)
>occurs in element content, it should be an ERROR, and in an SGML parser's
>interpretation it will be. So an SGML-based application (i.e. Panorama) will
>report an error (if it supports remapping RS/RE).
>
>That's why RS/RE must either remain as it stands or must be banished from
>element content and replaced by a convention like this:
><P
>>A new paragraph</P>

This same problem occurs for spaces and tabs in element content. My original proposal (for SGML) avoids this problem because \n and \r characters would be declared as SPACE characters, and thus would be ignored in element content. But for XML, we have a problem with any kind of space elimination in element content when used with DTD-less processing.

It's easy to use my approach with SGML, but with XML, there is a real problem because without a DTD, we can't tell the difference between element content and other content.

So, contra your claim, and my previous assumptions, RE handling is not the key issue here. Whatever we decide on RE processing we will still have to deal with element content in a nasty way because of other whitespace being treated as data.

In fact, it's not clear to me how XML and SGML can be compatible when processing element content in the absence of a DTD, since we don't know in that case whether or not we have element content. We basically cannot afford to process element and non-element content differently with regard to whitespace or anything else.

==> So we can't allow any ignored whitespace anywhere without resorting to quoting, because of the non-DTD parsing requirement.
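To make the DTD-less case concrete, here is a minimal sketch in Python. It is only an illustration, not part of any proposal in this thread: it assumes a present-day non-validating parser (the standard library's xml.etree.ElementTree), and the element names "table" and "row" are made up. Because the parser never sees a content model, it cannot know that "table" is meant to have element content, so the newlines and indentation between the "row" elements come back as ordinary character data.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: <table> is intended to hold only <row> elements,
# but no DTD is supplied, so the parser has no content model to consult.
doc = "<table>\n  <row>a b</row>\n  <row>c</row>\n</table>"

root = ET.fromstring(doc)

# Whitespace between child elements is reported as data, exactly like
# the text inside <row>. Nothing marks it as ignorable.
print(repr(root.text))      # character data before the first <row>
print(repr(root[0].tail))   # character data between the two <row> elements
print(repr(root[1].tail))   # character data after the last <row>
print(repr(root[0].text))   # genuine data inside a <row>
```

Any decision to drop those whitespace-only strings has to be made by the application, using knowledge the parser does not have.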
Perhaps the correct approach to DTD-less processing is to say that the information returned _is_ different in that case. In this case, if an instance had whitespace in element content, it would be required to send the DTD (or at least the content model for the relevant elements). I don't like this at all, because we now have two possible correct abstract syntaxes for the document. This should be a non-starter.

I hate the quoting syntax with a passion, and I suspect that selling it would be pretty hard. I'd rather just outlaw whitespace in element content, and live with the problem (which is at least already familiar from HTML). We can leave it to applications to implement whitespace-ignoration based on stylesheets, but the parse tree should simply make it _all_ significant everywhere.

If we do implement some kind of quoting, why not go all the way, with one of the radical syntaxes that were proposed earlier on the list, which make markup syntax isomorphic to LISP syntax? In any case, we might want to figure out a way to not SGML-ify the '"' character as well as the '<>' characters. So I guess stupid NET tricks might be useful after all.

Of course, stupid NET tricks have the same concrete disadvantage as the other variants of the quoting proposal, i.e. there is ZERO surface-level compatibility with current practice in tagging document instances.

I was just reading Richard Gabriel's excellent new book "Patterns of Software," and his comments on programming language design make me think that any syntax that looks unfamiliar will severely endanger the acceptance of XML. I commend to your attention the chapters "The End of History and the Last Programming Language" and "Money through Innovation Reconsidered." His take on how products and languages (i.e. standards) get accepted in the computing community seems very compelling in the light of history.
My application of his theories says that we should change as little as possible from HTML (the market leader), while adding the minimum we can manage to get the most useful new functionality. I must say that I don't see the point of targeting only the SGML community, because they already have SGML.

  -- David

RE delenda est.
--------------------------------------------+--------------------------
David Durand              dgd@cs.bu.edu     | david@dynamicDiagrams.com
Boston University Computer Science          | Dynamic Diagrams
http://www.cs.bu.edu/students/grads/dgd/    | http://dynamicDiagrams.com/
Received on Monday, 30 September 1996 12:12:22 UTC