- From: Paul Grosso <paul@arbortext.com>
- Date: Mon, 16 Dec 96 12:26:41 CST
- To: w3c-sgml-wg@w3.org
I have tried to re-read the last week's worth of postings on REs, and I'm still not sure I've internalized all the pros and cons. Nevertheless, I'm going to try to summarize some of my current thoughts. I want to be able to use HyTime treelocs and datalocs to address into my XML document, and I want the addresses that I created in my authoring tool that has/reads/uses the DTD to point to the same place when I bring the document up in a browser that doesn't have/read/use the DTD. Insofar as XML defines whitespace handling so that the datalocs resolve differently in the two cases, I think we have made a mistake. (There are other reasons I want whitespace handling to be identical with and without the DTD, but this is one obvious reason.) The above implies for me that we cannot leave the decision of what whitespace is significant to the style sheet. (We can and should leave the decision of what the application does with the significant whitespace up to the application in consultation with the style sheet. Therefore, deciding if an RE in data characters should be formatted as an interword space or if consecutive whitespace is collapsed should be a matter left to the style sheet and application, but whether the XML parser/grove constructor passes a given whitespace character on to the application as data must be well-defined in the XML spec without reference to the style sheet.) I really do not like having special attributes in the document instance. Having magic attributes that affect parsing feels quite wrong to me to say nothing of how kludgy it looks to have -XML-... attributes mentioned in the XML spec. It also means all tools that will use DTDs will have to know about these magic attributes, since they may be in the instance but not in the DTD (trust me, someone will use a DTD without -XML- attrs and generate some XML instance, send it to another tool that doesn't use DTDs, use that tool to insert -XML- attrs, then return it to the original tool and be surprised if that original tool complains about the -XML- attrs that are part and parcel of the XML standard itself). If we have defined the concept of well-formed XML precisely so that we can deal with XML instances without DTDs, then I suggest we refine the definition of well-formedness to include what we might call "normalized whitespacing." An XML document is well-formed (and therefore can be properly processed without reference to a DTD) *only* if it contains no (non-markup) whitespace that would be insignificant if it were parsed with reference to its DTD. The above paragraph doesn't have to mean that every RE is significant. As Tim posted recently, editors will break lines, "deal with it." (For those who want to debate whether this is really a requirement or not by making technical arguments, please accept the following: my customers require it--end of discussion. For what it's worth, my customers do not want record ends introduced in markup. We have actually had several customers submit "bugreports" when they find we have introduced a record end within a start tag.) What the previous paragraph does seem to mean is that whatever REs XML deems to be insignificant must be determined to be insignificant regardless of the content model since we may not have the DTD. For example, we can say (as I believe we have in the current XML draft) that an RE immediately following a start tag is insignificant thereby providing places editors and others can introduce record ends while still allowing all XML processors to interpret the results identically without requiring knowledge of the DTD. paul
Received on Monday, 16 December 1996 13:32:45 UTC