Re: RE deleta est from Peter Murray-Rust on 1997-06-12 (w3c-sgml-wg@w3.org from June 1997)

From: Peter Murray-Rust <Peter@ursus.demon.co.uk>
Date: Thu, 12 Jun 1997 22:47:22 GMT
To: w3c-sgml-wg@w3.org
Message-Id: <7936@ursus.demon.co.uk>

In message <199706111436.KAA20852@www10.w3.org> Michael Sperberg-McQueen writes:
[...]
> 
> So the difference between processors which read the DTD and those
> which don't is not validating vs. non-validating, and not conformant
> vs. non-conformant.  It's just processors which read the DTD vs.
> those which don't.  The former is a superset of validating
> processors.

We are still struggling with this on xml-dev.  May I suggest an alteration
to the parsing/reading philosophy?

2.10 rightly identifies three areas where knowledge of the DTD is required to
pass 'correct' output to the application.  #1 is default attribute values; 
#2 is entities.  *IF* these are declared in the DTD I cannot see that it's
reasonable for a processor/parser to ignore them.  The spec states that
RMD="NONE" is allowed only if such declarations don't exist; otherwise it's 
an error, but a non-DTD-reading parser (or one blindly following NONE) can never
flag this error.  Therefore the only course that seems reasonable to me is:

<PROPOSAL>
All XML processors must read the DTD(s)
</PROPOSAL>

The problem seems to arise with #3, the analysis of whitespace :-(  The 
appropriate treatment of this depends on:

"reading and analysing the element content"

and I suggest that a phrase like this should is what is intended by "reads the
DTD".  (I think it only occurs in this context).  (Whether it is possible to
analyse the whitespace problem without validating the content at the same time
anyway I don't know.)

The consequences of this - if accepted - are that all parsers have to be able 
to deal with any construct in the DTD, including PEs :-).  So my final
analysis - shoot it down - is:

1. all parsers must read the DTD or DTDs if present and process them including:
	- substitution of parameter entities
	- handling of multiple ATTLIST declarations
1a. all parsers must add default attributes to start-tags in the document
1b. all parsers must expand entities in the document as in 4.3.1 and 4.3.2
2. a parser is validating or non-validating.
2a. a non-validating parser may/must/must_not[1] analyse an element's 
content and may/must/must_not apply the current rule for whitespace insertion
or deletion.
3. [2]
4. [3]

[1] to be determined by the ERB
[2] error recovery in validation of a WF document, to be determined by the ERB
[3] error recovery in WFness of a non-WF document, ditto.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

Received on Thursday, 12 June 1997 18:14:38 UTC