Re: Whitespace from Peter Murray-Rust on 1997-05-08 (w3c-sgml-wg@w3.org from May 1997)

From: Peter Murray-Rust <Peter@ursus.demon.co.uk>
Date: Thu, 08 May 1997 23:31:16 GMT
To: w3c-sgml-wg@w3.org
Message-Id: <6363@ursus.demon.co.uk>
In message <3.0.32.19970507174505.009e7b00@pop.intergate.bc.ca> Tim Bray writes:
[...]
> 
> Actually, it's worse than Peter thinks.  There are at least three ways
> in which DTD-less and DTD-ful processing can produce different 
> results:
> 
>  1. White space in element content
>  2. Default attributes
>  3. Attribute values that are space/case normalized only if you
>     read the DTD and know they are NMTOKEN or ID or something.
> 
> Actually, I don't find #1, the issue that got Peter going, is all
> that severe.  I think that addressing into documents based on counting

1. Now that Joe and others have shown me how to get round it, I have no real
problem, since it is possible to create files without whitespace, ugly though
they are.  (When I first started CML it was crafted in SGML and parsed 
against the DTD, so this problem didn't arise.)
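
To make the whitespace point concrete, here is a minimal sketch using
Python's standard xml.dom.minidom, which reads no external DTD and so keeps
every whitespace text node.  The element names (mol, atom) are made up for
illustration and are not taken from the CML DTD.

```python
from xml.dom import minidom

# Two serialisations of the same (hypothetical) molecule.
PRETTY = "<mol>\n  <atom/>\n  <atom/>\n</mol>"
COMPACT = "<mol><atom/><atom/></mol>"


def child_summary(xml_text):
    """Return the node names of the root element's direct children."""
    root = minidom.parseString(xml_text).documentElement
    return [child.nodeName for child in root.childNodes]


pretty_children = child_summary(PRETTY)
compact_children = child_summary(COMPACT)

# Whitespace between the tags shows up as '#text' children, so "the
# second child" is not the same node in the two files.
print(pretty_children)
print(compact_children)
```

A processor that knows from the DTD that mol has element content could
discard those text nodes; one that does not has no way to tell.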

2. I dislike defaults, because (whether we like it or not) people will read
CML/XML documents without reading the DTD.  And even if they *do* have the
DTD it's often so full of PEs that it's quite impossible for anyone to expand
them.  So - unless we get universally good software for presenting the
structure of the DTD to authors and readers - people are going to be confused.
For that reason CML has no defaults.
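
A small sketch of why defaults can surprise a reader who sees only the
instance, assuming Python's xml.etree.ElementTree (built on expat, which
applies attribute defaults declared in the internal subset).  The attribute
name "units" and its value are hypothetical, not taken from CML.

```python
import xml.etree.ElementTree as ET

# Same element, once with an internal subset declaring a default
# attribute and once without any declarations.
WITH_SUBSET = """<!DOCTYPE atom [
  <!ATTLIST atom units CDATA "angstrom">
]>
<atom/>"""
WITHOUT_SUBSET = "<atom/>"

with_decl = dict(ET.fromstring(WITH_SUBSET).attrib)
without_decl = dict(ET.fromstring(WITHOUT_SUBSET).attrib)

# The start tag is written identically in both documents, yet the
# application sees a different attribute set.
print(with_decl)
print(without_decl)
```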

3. For the same reason, CML does not - at present - have IDs.  It *does*
have NAME(CDATA) (a la HTML) since people are familiar with that.  I'd like
suggestions about how to introduce IDs gently - they are clearly not possible
in WF docs without a DTD subset, and even then they have to be spelt out for
every element (whereas <!ATTLIST * ID ID #IMPLIED> would be a reasonable
magic incantation for most chemists to be introduced to, and would wean 
them to the purpose).

> nodes, without checking their type, or even that they *are* real nodes,
> is inherently non-portable and shouldn't be done, unless you *know*
> the doc is read-only.  ID's, or element types, or attribute values,

I am *not* a supporter of node counting - I simply used it as an example.
I will have to navigate by counting node types and attributes.  IDs will
not be easy, because it will be very difficult to create CML documents where 
these are unique.  The reason for this is that CML documents will frequently
be created by mixing together components from several different sources, and
authors will need good software to manage not only the uniqueness but also
the links to any nodes which have to be renumbered.  IMO this is a major
challenge for XML authoring.
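
As a rough illustration of the bookkeeping involved, the sketch below merges
fragments and renames identifiers so they stay unique, rewriting the links
at the same time.  Everything here is an assumption made for the example:
the attribute names "id" and "ref", the element names, the prefixing scheme,
and the premise that every link points inside its own fragment.

```python
import xml.etree.ElementTree as ET


def merge_with_unique_ids(fragments, id_attr="id", ref_attr="ref"):
    """Merge XML fragments under one root, prefixing ids and refs.

    Each fragment gets its own prefix (f1., f2., ...), so identical
    ids coming from different sources no longer collide.
    """
    merged = ET.Element("merged")
    for index, text in enumerate(fragments, start=1):
        root = ET.fromstring(text)
        prefix = f"f{index}."
        for elem in root.iter():
            if id_attr in elem.attrib:
                elem.set(id_attr, prefix + elem.get(id_attr))
            # Links are assumed to be local to the fragment, so they
            # receive the same prefix as the ids they point at.
            if ref_attr in elem.attrib:
                elem.set(ref_attr, prefix + elem.get(ref_attr))
        merged.append(root)
    return merged


A = '<mol><atom id="a1"/><bond ref="a1"/></mol>'
B = '<mol><atom id="a1"/><bond ref="a1"/></mol>'

merged = merge_with_unique_ids([A, B])
ids = [e.get("id") for e in merged.iter() if e.get("id")]
refs = [e.get("ref") for e in merged.iter() if e.get("ref")]
print(ids)
print(refs)
```

Links that cross fragment boundaries would need a real mapping table rather
than a blanket prefix, which is where the hard part begins.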

> are a much better way to go.  Yes, I appreciate that for maneuvering
> through read-only docs, node-counting is sometimes de rigueur (although
> I'd think that in something like CML you could at least count
> nodes of particular types) - in the case where counting children is
> absolutely necessary, such a count is going to have to be qualified
> as DTD-ful or DTD-less, to be sure.
> 
> On the other hand, James has suggested just bagging the pseudo-element
> thing entirely.  On the one hand this would be painful, but I suspect
> you have a better understanding now of why he's saying this. -T.

If it isn't going over old ground, and leads to a useful way forward, I
would be interested to have this expanded here...

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/
Received on Thursday, 8 May 1997 19:18:10 UTC