Re: RS/RE, again (sorry)

I have tried to re-read the last week's worth of postings on REs,
and I'm still not sure I've internalized all the pros and cons.
Nevertheless, I'm going to try to summarize some of my current
thoughts.

I want to be able to use HyTime treelocs and datalocs to address into
my XML document, and I want the addresses that I created in my
authoring tool that has/reads/uses the DTD to point to the same place
when I bring the document up in a browser that doesn't have/read/use
the DTD.  Insofar as XML defines whitespace handling so that the
datalocs resolve differently in the two cases, I think we have made a
mistake.  (There are other reasons I want whitespace handling to be
identical with and without the DTD, but this is one obvious reason.)

The above implies for me that we cannot leave the decision of what
whitespace is significant to the style sheet.  (We can and should
leave the decision of what the application does with the significant
whitespace up to the application in consultation with the style sheet.
Therefore, deciding if an RE in data characters should be formatted as
an interword space or if consecutive whitespace is collapsed should be
a matter left to the style sheet and application, but whether the XML
parser/grove constructor passes a given whitespace character on to the
application as data must be well-defined in the XML spec without
reference to the style sheet.)

I really do not like having special attributes in the document
instance.  Having magic attributes that affect parsing feels quite
wrong to me to say nothing of how kludgy it looks to have -XML-...
attributes mentioned in the XML spec.  It also means all tools that
will use DTDs will have to know about these magic attributes, since
they may be in the instance but not in the DTD (trust me, someone will
use a DTD without -XML- attrs and generate some XML instance, send it
to another tool that doesn't use DTDs, use that tool to insert -XML-
attrs, then return it to the original tool and be surprised if that
original tool complains about the -XML- attrs that are part and parcel
of the XML standard itself).

If we have defined the concept of well-formed XML precisely so that we
can deal with XML instances without DTDs, then I suggest we refine the
definition of well-formedness to include what we might call "normalized
whitespacing." An XML document is well-formed (and therefore can be
properly processed without reference to a DTD) *only* if it contains no
(non-markup) whitespace that would be insignificant if it were parsed
with reference to its DTD.

The above paragraph doesn't have to mean that every RE is significant.
As Tim posted recently, editors will break lines, "deal with it." (For
those who want to debate whether this is really a requirement or not by
making technical arguments, please accept the following:  my customers
require it--end of discussion.  For what it's worth, my customers do
not want record ends introduced in markup.  We have actually had
several customers submit "bugreports" when they find we have introduced
a record end within a start tag.)  What the previous paragraph does
seem to mean is that whatever REs XML deems to be insignificant must be
determined to be insignificant regardless of the content model since we
may not have the DTD.  For example, we can say (as I believe we have in
the current XML draft) that an RE immediately following a start tag is
insignificant thereby providing places editors and others can introduce
record ends while still allowing all XML processors to interpret the
results identically without requiring knowledge of the DTD.

paul

Received on Monday, 16 December 1996 13:32:45 UTC