Re: RS/RE, again (sorry) from Paul Prescod on 1996-12-18 (w3c-sgml-wg@w3.org from December 1996)

From: Paul Prescod <papresco@calum.csclub.uwaterloo.ca>
Date: Wed, 18 Dec 1996 08:34:08 -0500 (EST)
To: gtn@ebt.com (Gavin Nicol)
Cc: w3c-sgml-wg@w3.org
Message-Id: <199612181334.IAA10239@calum.csclub.uwaterloo.ca>

It's becoming a little clearer. I'm starting to wonder if we're talking 
apples and oranges.  Your proposal seems to define an application
architecture (what components say what to what). But I don't understand
how that translates into a *language* (what is valid, invalid, and what
constructs mean).

But in XML and SGML, the questions "what is the parser" and "what is the
validator" are not as interesting to users as "what is the parser going to
return to the application" and "what is the validator going to report as
correct." So separating the "parser" and the "validator" is only interesting
to CS types. Although on other days I would find that fascinating, today I
only care about the language.

What do you want the specification to say about a "valid" XML document? That
it has a different definition in different situations, depending on your legacy
needs? That whitespace is allowed in #PCDATA content, or not?

What do you want the specification to say about what the parser returns to the
application? The same thing? Depends on what you want? Let the tools figure it
out?

Thanks for furthering my education...

 Paul Prescod

Gavin said:
> >>that having all whitespace be significant still seems a reasonable
> >>way to go. 
> >
> >Can you please describe *exactly* what that means?
> ....
> >
> >At other points, there has been discussion of having a DTD-reading "filter"
> >remove the whitespace. Which seems to imply that the former would be *valid*
> >as long as the filter is applied before the validation takes place. In this
> >case, the grove which is being validated is different from the grove that a
> >DTD-less parser would use.
> 
> I repeat my viewpoint:
> 
>    1) The *parser* does not use a DTD, and so creates a pGrove (to use
>       Elliot's term) in which *all* non-markup characters occur (lots
>       of pseudo-elements). 
>       [pGrove -> pGrove]
>    2) For pure XML *validators* of the pGrove, the following:
> 
>           <LIST>
>           <ITEM>foo</ITEM>
>           </LIST>
> 
>       would cause an error if LIST couldn't contain #PCDATA.       
>       [pGrove -> validator]
>    3) For XML *validators* of the pGrove that are built to support
>       legacy SGML systems, the following:
> 
>           <LIST>
>           <ITEM>foo</ITEM>
>           </LIST>
> 
>       would not cause an error (i.e. "normal" SGML behaviour, because
>       they would perform some transformation of the pGrove).
>       [pGrove -> validator -> epGrove].
> 
> I expect to see most new applications built around (1), and many
> others to use (3) to obtain the semantics they desire.
> 
> A "parser" is something that tokenises the stream, and checks only
> the syntactic constraints imposed by the XML grammar.
> 
> A "validator" is something that takes a pGrove, and checks that it
> conforms to the constraints imposed by the grammar as defined by a
> DTD.
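
To make the three cases quoted above concrete, here is a minimal sketch of how they might fit together. Everything in it is an assumption made for illustration: the names parse, validate, strip_ignorable and CONTENT_MODEL are hypothetical, the "DTD" is reduced to a table of allowed children, and the whitespace transformation is a crude stand-in, not the actual SGML RS/RE rules.

```python
import re

# Hypothetical stand-in for a DTD: element name -> set of allowed children.
CONTENT_MODEL = {"LIST": {"ITEM"}, "ITEM": {"#PCDATA"}}

# Matches a start tag, an end tag, or a run of non-markup characters.
TOKEN = re.compile(r"<(/?)([A-Za-z]+)>|([^<]+)")


def parse(text):
    """Case (1): DTD-less parse. Every non-markup character is kept,
    whitespace included. Nodes are (name, children) tuples; text is str."""
    root = ("#ROOT", [])
    stack = [root]
    for close, name, data in TOKEN.findall(text):
        if data:
            stack[-1][1].append(data)
        elif close:
            if len(stack) == 1 or stack[-1][0] != name:
                raise ValueError(f"mismatched end tag </{name}>")
            stack.pop()
        else:
            node = (name, [])
            stack[-1][1].append(node)
            stack.append(node)
    if len(stack) != 1:
        raise ValueError("unclosed element")
    # Return the first element child of the artificial root.
    return next(c for c in root[1] if isinstance(c, tuple))


def validate(node):
    """Case (2): strict validation of the pGrove. Returns error strings."""
    name, children = node
    allowed = CONTENT_MODEL.get(name, set())
    errors = []
    for child in children:
        if isinstance(child, str):
            if "#PCDATA" not in allowed:
                errors.append(f"{name}: character data {child!r} not allowed")
        else:
            if child[0] not in allowed:
                errors.append(f"{name}: element {child[0]} not allowed")
            errors.extend(validate(child))
    return errors


def strip_ignorable(node):
    """Case (3): the legacy transformation. Drops whitespace-only text
    from elements whose content model has no #PCDATA."""
    name, children = node
    element_only = "#PCDATA" not in CONTENT_MODEL.get(name, set())
    kept = []
    for child in children:
        if isinstance(child, str):
            if not (element_only and not child.strip()):
                kept.append(child)
        else:
            kept.append(strip_ignorable(child))
    return (name, kept)


DOC = "<LIST>\n<ITEM>foo</ITEM>\n</LIST>"
pgrove = parse(DOC)
strict_errors = validate(pgrove)                    # newlines in LIST are flagged
legacy_errors = validate(strip_ignorable(pgrove))   # newlines removed first
```

Under these assumptions the same document is rejected by the strict validator and accepted by the legacy one, and the grove each validator hands on differs, which is the situation the questions at the top of this message are asking the specification to settle.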

Received on Wednesday, 18 December 1996 08:34:08 UTC