Re: Production 21 (and others)

From: W. Eliot Kimber <eliot@isogen.com>
Date: Thu, 30 Jan 1997 08:56:51 -0900
Message-Id: <>
To: w3c-sgml-wg@www10.w3.org

At 08:31 PM 1/29/97 +0000, Digitome Ltd. wrote:
>A related question, in SGML two docs are considered equal if they produce
>the same ESIS stream.

Unless I'm very much mistaken, nothing in the system of standards making up
SGML gives a formal or fixed definition of document equality.  It is up to
data owners to define their equality requirements.
If ESIS is good enough, fine.  Others only consider two documents to be the
same if they are byte-for-byte identical.

With property sets and groves it is possible to formally define your
equality requirements by defining a grove plan that defines the subset of
properties you care about.  If you use the entire SGML property set, parse
two documents into groves, and then compare those groves, the two groves
will be equal if and only if both documents are byte-for-byte identical
(because the full SGML property set includes all the original data strings
from the parsed document--the so-called "markup properties").  If, on the other
hand, you use a grove plan that omits all the markup properties, then you
get more or less "ESIS"-level equality.  The key is you can choose.  By the
same token, programs that modify SGML documents can define what they do and
don't preserve by publishing a grove plan that omits the properties they
don't preserve (for example, most SGML editors don't preserve the markup
properties for elements).  [If they were really cool, tools would also let
you define the grove plan you want them to use, so you could choose how to
balance data preservation against performance--if I
want every byte preserved, I should have that option, with the
understanding that performance may suffer--this would allow, for example,
SGML database tuning without requiring the database vendor to make
unalterable optimization choices.]
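
As a rough illustration of the grove-plan idea above, here is a minimal Python
sketch.  It does not use the real SGML property set or any actual grove API:
a grove node is modeled as a plain dict, a grove plan as a set of property
names to keep, and the property names (gi, attributes, content, markup) are
hypothetical stand-ins chosen for the example.

```python
def project(node, plan):
    """Return a copy of a grove node keeping only the properties in the plan."""
    out = {}
    for name, value in node.items():
        if name not in plan:
            continue  # property is outside the grove plan: drop it
        if name == "content":
            # Children are either nested nodes (dicts) or data strings.
            out[name] = [project(c, plan) if isinstance(c, dict) else c
                         for c in value]
        else:
            out[name] = value
    return out


def groves_equal(a, b, plan):
    """Two groves are equal under a plan if their projections are equal."""
    return project(a, plan) == project(b, plan)


# Hypothetical plans: everything, versus everything except the markup strings.
FULL_PLAN = {"gi", "attributes", "content", "markup"}
ESIS_PLAN = FULL_PLAN - {"markup"}

# Same element, written with different literal markup in the source.
doc_a = {"gi": "p", "attributes": {"id": "x1"}, "content": ["hello"],
         "markup": '<p id="x1">'}
doc_b = {"gi": "p", "attributes": {"id": "x1"}, "content": ["hello"],
         "markup": "<P ID='x1'>"}

same_under_esis = groves_equal(doc_a, doc_b, ESIS_PLAN)
same_under_full = groves_equal(doc_a, doc_b, FULL_PLAN)
```

Under these assumptions the two nodes compare equal with the plan that omits
the markup property and unequal with the full plan, which is the choice
described above.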

The HyTime TC also defines a string representation of groves (the
"canonical grove representation (CGR) document type") intended to enable
string comparisons of groves: parse two documents into groves, generate a
CGR document from each, run DIFF on those documents, and see where they differ.
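
The serialize-then-diff workflow can be sketched in Python as follows.  The
serialization here is NOT the CGR document type defined by the HyTime TC; it
is a made-up line-per-property format, and the dict-based grove model and the
function names are assumptions for illustration only.  The diff itself uses
the standard difflib module.

```python
import difflib


def serialize(node, depth=0):
    """Emit one line per property, in sorted name order, so that equal
    groves always produce identical text."""
    pad = "  " * depth
    lines = []
    for name in sorted(node):
        value = node[name]
        if isinstance(value, list):
            lines.append(f"{pad}{name}:")
            for child in value:
                if isinstance(child, dict):
                    lines.extend(serialize(child, depth + 1))
                else:
                    lines.append(f"{pad}  {child!r}")
        else:
            # Scalar property values are written with repr().
            lines.append(f"{pad}{name}={value!r}")
    return lines


def grove_diff(a, b):
    """Unified diff of the two serialized groves; empty if they match."""
    return list(difflib.unified_diff(serialize(a), serialize(b),
                                     "a", "b", lineterm=""))


left = {"gi": "p", "content": ["hello"]}
right = {"gi": "p", "content": ["goodbye"]}
changes = grove_diff(left, right)
```

An empty result means the two groves agree on every serialized property;
otherwise the -/+ lines show which property values differ.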

XML can formally define its normative equality requirements by defining a
grove plan against the SGML property set with which XML documents should be
compared (for example, it could exclude the DOCTYPE property to allow the
comparison of well-formed and valid documents for equality).


Received on Thursday, 30 January 1997 09:59:44 UTC
