- From: Derek Denny-Brown <ddb@criinc.com>
- Date: Fri, 13 Dec 1996 08:33:38 -0800
- To: w3c-sgml-wg@w3.org
At 09:34 AM 12/13/96 -0500, Gavin wrote:
>>>We seem to be confusing parsing XML, and parsing the grammar defined
>>>by the DTD is you ask me...
>>
>>But one of the important points about SGML (of which XML is a subset) is a
>>contract between the parser and the application: "I will not hand you data
>>which does not conform to the DTD."

Is it expected that the parser will parse the instance differently if it has a DTD vs. if it does not? That is, if I were to construct a grove from the result of the parse, would the portion of the grove representing the instance be equivalent (isomorphic?), or would the presence of the DTD imply the (strong) likelihood of a differing parse? It seems to me that, for some applications, it would be best if I could expect the results of the parses to be the same regardless of DTD. Parsing with regard to the DTD could be viewed as a filter on the parse without the DTD (a subset, almost). It may be easier to handle this issue if the DTD-aware parse is treated as a (potential) second step which requires a DTD.

>>This is *central*. Without it, we can seldom do intelligent things
>>with documents.

(Gavin cut this from the above, but I assume it went with the previous quote.) The problem, as I see it, is that the application may have an idea what the DTD would be, but the parser does not. So long as the application knows exactly what it is going to get, this should not be a problem.

>>Your solution would leave it up entirely to applications, which will (IMO)
>>almost inevitably lead to incompatibility.
>
>Depends. At least all the applications will know *exactly* what
>they'll be handed.

Unfortunately, I agree on both fronts.
Since we do have a number of concrete ideas about how the parser should report the document to the application if it has a DTD, why not consolidate those ideas, define what the parser should return when it parses a document relative to a DTD, and then say it is up to the application to treat an instance which it knows to conform to a DTD as if it had been parsed with regard to that DTD?

Going back to my two-step parser model, an application with a fixed set of DTDs would always take the raw parse and have hard-coded into it some procedures which filter the parser's events appropriately (with regard to the DTD). A generic DSSSL style sheet might include some extension which would tell the parser whether to return the raw parse or to require a DTD....

I see almost unanimous agreement that there is no clean way to tell how to handle white space in the document without a DTD. I also hear people pounding that they want DTDless operation. I don't see any easy way to resolve this short of making all white space relevant, which throws everyone who wants readable XML into fits (with very good reason, if that is your criterion). The only way to know how to deal with white space is to have the DTD, so one end of the transaction needs to know the DTD in order to normalize white space.

The only other solution I can seriously think of is to say that all white space in a document is significant unless you have a DTD. This opens the door to all sorts of mess (why does it look different over there vs. here....), though, and I hesitate even on that.

The view of a document as a text file formatted for human readability and the view of it as a hierarchical encoding of some data are fundamentally different, and they happen to correspond exactly to the views of the human author/viewer and of the parser/application. Reconciling these views is one of the problems I have had with SGML all along...
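To make the two-step model above a little more concrete, here is a rough sketch in Python. It is only an illustration under stated assumptions, not a proposal for how any real parser works. Step 1 reports every character of white space, with no DTD involved. Step 2 is a separate filter that drops white-space-only text inside elements which a DTD would declare as having element content. The names raw_parse, filter_with_dtd and ELEMENT_CONTENT are hypothetical, the element-content set is written by hand rather than read from a DTD, and the standard-library expat binding is used purely as a stand-in for a raw, DTD-less parse.

```python
import xml.parsers.expat


def raw_parse(document):
    """Step 1: DTD-less parse. Every character of white space is reported."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # coalesce adjacent character data into one event
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda data: events.append(("text", data))
    parser.Parse(document, True)
    return events


def filter_with_dtd(events, element_content):
    """Step 2: drop white-space-only text directly inside elements that the
    DTD declares as element content. `element_content` is a hypothetical
    stand-in for information a real DTD would supply."""
    open_elements = []
    filtered = []
    for kind, value in events:
        if kind == "start":
            open_elements.append(value)
        elif kind == "end":
            open_elements.pop()
        elif open_elements and open_elements[-1] in element_content:
            if value.strip() == "":
                continue  # insignificant white space between child elements
        filtered.append((kind, value))
    return filtered


# Assumption: a DTD that says <list> contains only <item> elements.
ELEMENT_CONTENT = {"list"}
DOC = "<list>\n  <item>one</item>\n  <item>two  words</item>\n</list>"

raw = raw_parse(DOC)
filtered = filter_with_dtd(raw, ELEMENT_CONTENT)

for event in filtered:
    print(event)
```

The raw event list is the same whether or not anyone has a DTD, and the filtered list is a subset of it. An application with a fixed set of DTDs could hard-code the equivalent of ELEMENT_CONTENT, and white space inside item is left alone in both steps.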
-derek

"that which is not slightly distorted lacks sensible appeal: from which it follows that irregularity - that is to say, the unexpected, surprise, and astonishment, are an essential part and characteristic of beauty" - Charles Baudelaire
Received on Friday, 13 December 1996 11:37:23 UTC