Re: RS/RE, again (sorry) from Paul Prescod on 1996-12-18 (w3c-sgml-wg@w3.org from December 1996)

From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
Date: Wed, 18 Dec 1996 12:48:00 -0500 (EST)
To: gtn@ebt.com (Gavin Nicol)
Cc: w3c-sgml-wg@w3.org
Message-Id: <199612181748.MAA28345@calum.csclub.uwaterloo.ca>

> >Fine. So in your opinion, should the "validator" be part of the XML 
> >specification or not? 
> 
> Not part of the *language* specification.

Well, XML is a language (as well as a meta-language), so I interpret this to 
mean: "XML should define the syntax of markup declarations and markup, 
but should not specify the meaning of the markup declarations or the 
constraints they place on the instance markup."

Is that accurate?

> I do think we need pGroves and validator behaviour defined though.

Could you expand on that? Do you mean that you want to define what is and 
isn't valid, or how to "hook in" an arbitrary validator?

> >If so, does it really matter what the "parser" returns?
> 
> Yes, because it is the *foundation* of the entire application
> architecture. If it is not rigorously defined such that it
> is trivial to prove it correct, no other part of the system can
> be known to be correct. I like foundations of stone, not sand.

I don't think that languages whose parsers are moderately complex are
"built on foundations of sand." The proposals for whitespace elimination
in XML are not brain surgery: "look out for this attribute", "look out for
this character", "watch for this list of tags."
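
To make the three kinds of rule concrete, here is a rough sketch of each as a
one-line predicate. It is only an illustration: the attribute name
"keep-space", the tag list, and the function names are all invented for this
example and come from no XML draft or proposal.

```python
# Illustrative sketch only. KEEP_ATTR and ELEMENT_ONLY_TAGS are invented
# names; no XML draft defines them.
KEEP_ATTR = "keep-space"
ELEMENT_ONLY_TAGS = {"list", "table"}

def is_blank(text):
    """True if text is empty or consists solely of whitespace."""
    return text.strip() == ""

def rule_attribute(attrs, text):
    """'Look out for this attribute': drop blank text unless the attribute says keep."""
    return is_blank(text) and attrs.get(KEEP_ATTR) != "yes"

def rule_character(text):
    """'Look out for this character': drop blank text that contains a line break."""
    return is_blank(text) and "\n" in text

def rule_tag_list(tag, text):
    """'Watch for this list of tags': drop blank text inside listed elements."""
    return is_blank(text) and tag in ELEMENT_ONLY_TAGS
```

Each predicate answers only "may this chunk of text be dropped?", which is
roughly the amount of extra work any of these proposals would ask of a parser.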

> >Should we specify a single standardized validation scheme or not? If we do,
> >what do you propose it should say about whitespace? If we do not, how can we
> >claim to be even vaguely SGML compatible? As you mentioned in your last 
> >message, SGML's validation scheme would be just one of an infinite number
> >of equally "valid" schemes.
> 
> I propose two: a "pure" XML validator, which does no transformation
> of pGroves, and another "SGML" validator, that removes whitespace 
> according to "normal" SGML rules.

So a document can be valid according to the SGML validator, but 
invalid according to the "pure" XML validator because it has whitespace in the 
wrong place? And both validators are "correct?"

And when the same document is parsed and filtered through these two 
different systems, one could give a real "RE Delenda est" behaviour (for
instance a browser written from scratch) and one could remove whitespace
according to SGML rules. So the behaviour of the "parser" would be absolutely 
dead-simple (a tokenization), but the input to the formatting process would
still be up in the air as far as the user is concerned. And both systems would
be correct.
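
A small sketch of how the two validators could disagree on one document. It
assumes a hypothetical element whose content model allows child elements only,
and it reduces "SGML rules" to "drop whitespace-only text", which is much
cruder than the real RS/RE rules. All names here are invented for the example.

```python
# Illustrative sketch only: a made-up element-only content model and a
# deliberately crude stand-in for SGML whitespace handling.

def valid_element_only(children):
    """children is a list of ("elem", name) or ("text", data) tuples."""
    return all(kind == "elem" for kind, _ in children)

def pure_validate(children):
    """'Pure' validator: no transformation, so whitespace counts as data."""
    return valid_element_only(children)

def sgml_like_validate(children):
    """'SGML' validator: drop whitespace-only text, then check the model."""
    kept = [c for c in children
            if not (c[0] == "text" and c[1].strip() == "")]
    return valid_element_only(kept)

# The same parsed content, indented the way an author would type it.
doc = [("text", "\n  "), ("elem", "item"),
       ("text", "\n  "), ("elem", "item"), ("text", "\n")]

print(pure_validate(doc))       # False
print(sgml_like_validate(doc))  # True
```

The tokenization is identical in both cases; only the filter applied before
validation differs, and that is enough to flip the answer.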

It also isn't clear to me from your proposal above whether the two validators
you propose are (a) actually in the XML spec or in some other spec, and
(b) exclusively "valid." Is my foo-sep filter equally valid? Could I write up
a spec for it and have my documents be "valid XML"? Or are only the two
filters you propose in the actual spec?

 Paul Prescod

Received on Wednesday, 18 December 1996 12:48:05 UTC