Re: RS/RE, again (sorry) from Gavin Nicol on 1996-12-18 (w3c-sgml-wg@w3.org from December 1996)

From: Gavin Nicol <gtn@ebt.com>
Date: Wed, 18 Dec 1996 13:38:10 -0500
To: papresco@calum.csclub.uwaterloo.ca
CC: w3c-sgml-wg@w3.org
Message-Id: <199612181838.NAA00499@nathaniel.ebt>
>> Not part of the *language* specification.
>
>Well, XML is a language (as well as a meta-language), so I interpret this to 
>mean: "XML should define the syntax of markup declarations and markup, 
>but should not specify the meaning of the markup declarations or the 
>constraints they place on the instance markup."

Close, though I would also have the parser perform simple validation
of the gross tag structure (i.e. make sure all end tags are present in
the right place, etc.).

>> I do think we need pGroves and validator behaviour defined though.
>
>Could you expand on that? Do you mean that you want to define what is and 
>isn't valid, or how to "hook in" an arbitrary validator?

pGroves define what the validator has to work with. Validator
definition would entail specifying what tasks a validator is to
perform.

>I don't think that languages with parsers that are of moderate complexity
>are "built on foundations of sand." The proposals for whitespace elimination
>in XML are not brain surgery: "look out for this attribute", "look out for
>this character", "watch for this list of tags."

These are all a) not necessary, and b) things that imply some feedback
between the parser and whatever is interpreting the parser output.

>>I propose two: a "pure" XML validator, which does no transformation,
>>of pGroves and another "SGML" validator, that removes whitespace 
>>according to "normal" SGML rules.
>
>So a document can be valid according to the SGML validator, but 
>invalid according to the "pure" XML validator because it has
>whitespace in the wrong place? And both validators are "correct?"

Yes. A document that is valid according to the (stricter) XML
validator would also be valid in the "SGML" validator, though the
opposite may not be true.
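
To make the relationship between the two concrete, here is a rough
sketch in Python. Everything in it is an assumption made for
illustration: the token tuples standing in for a pGrove, the
ELEMENT_ONLY rule set, and above all drop_whitespace, which is a crude
stand-in for the real SGML whitespace rules (those are more involved).
It only shows that the "SGML" validator can be the strict one plus a
transformation step.

```python
# Hypothetical rule set: names of elements that may contain only child
# elements (no character data). Not taken from any real DTD.
ELEMENT_ONLY = {"list"}

def pure_valid(tokens):
    """Strict check: no text of any kind directly inside an element-only element."""
    stack = []
    for kind, value in tokens:
        if kind == "start":
            stack.append(value)
        elif kind == "end":
            stack.pop()
        elif stack and stack[-1] in ELEMENT_ONLY:
            return False  # text (even whitespace) where only elements are allowed
    return True

def drop_whitespace(tokens):
    """Remove whitespace-only text directly inside element-only elements.

    A simplified stand-in for the "normal" SGML rules, assumed for this sketch.
    """
    out, stack = [], []
    for kind, value in tokens:
        if kind == "start":
            stack.append(value)
        elif kind == "end":
            stack.pop()
        elif stack and stack[-1] in ELEMENT_ONLY and value.strip() == "":
            continue  # discard the whitespace-only text token
        out.append((kind, value))
    return out

def sgml_valid(tokens):
    """The "SGML" validator: transform first, then apply the strict check."""
    return pure_valid(drop_whitespace(tokens))

# The same document with and without a line break after <list>.
tight = [("start", "list"), ("start", "item"), ("text", "x"),
         ("end", "item"), ("end", "list")]
loose = [("start", "list"), ("text", "\n"), ("start", "item"), ("text", "x"),
         ("end", "item"), ("end", "list")]
```

Under these assumptions, tight passes both validators, while loose
passes only sgml_valid.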

>And when the same document is parsed and filtered through these two 
>different systems, one could give a real "RE Delenda est" behaviour (for
>instance a browser written from scratch) and one could remove whitespace
>according to SGML rules. So the behaviour of the "parser" would be absolutely 
>dead-simple (a tokenization), but the input to the formatting process would
>still be up in the air as far as the user is concerned. And both systems would
>be correct.

No. The parser would be very simple: it would tokenise, and check the
gross structural soundness of an instance. Input to the formatting
process is not "up in the air" because "the formatting process" would
accept a pGrove. It could then *choose* which "validator" it preferred.
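
As a rough illustration of how small such a parser could be, here is a
sketch in Python. The token pattern, the tuple format, and the parse
function are all hypothetical, and attributes, comments, entities, and
declarations are left out entirely. The point is only that the parser
tokenises and checks tag nesting, and passes whitespace through
untouched for a later stage to deal with.

```python
import re

# Hypothetical token pattern: a start tag, an end tag, or a run of
# character data. Attributes and declarations are deliberately ignored.
TOKEN = re.compile(r"<(/?)([A-Za-z][\w.-]*)\s*>|([^<]+)")

def parse(text):
    """Tokenise text and check that each end tag closes the innermost open element."""
    tokens, stack, pos = [], [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unrecognised markup at offset {pos}")
        closing, name, data = m.groups()
        if data is not None:
            tokens.append(("text", data))  # whitespace is kept as-is
        elif closing:
            if not stack or stack[-1] != name:
                raise ValueError(f"unexpected end tag </{name}>")
            stack.pop()
            tokens.append(("end", name))
        else:
            stack.append(name)
            tokens.append(("start", name))
        pos = m.end()
    if stack:
        raise ValueError(f"missing end tag for <{stack[-1]}>")
    return tokens
```

For example, parse("<a>\n<b>x</b>\n</a>") returns the tokens with both
line breaks preserved as text, while parse("<a><b></a>") raises
ValueError.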

>It also isn't clear to me from your proposal above if the two that
>you propose are a) actually in the XML spec, or in some other spec
>and b) exclusively "valid." Is my foo-sep filter equally valid? Could
>I write up a spec for it and have my documents be "valid XML"? Or are
>only the two filters you propose in the actual spec?

I would say that these two validators should be defined as part of the
overall XML effort. The "pure XML" validator is simple, and the "SGML
validator" is not much more difficult. As you note, there could be
others (after all, the application gets to choose), but with these two
we get:

    a) A parser that is truly trivial to implement
    b) SGML whitespace compatibility for those who really, really need
       it (and I do not think this group will be as large as some make
       out).
    c) A well-defined foundation upon which to build new applications.
Received on Wednesday, 18 December 1996 13:39:39 UTC