- From: W. Eliot Kimber <eliot@isogen.com>
- Date: Thu, 30 Jan 1997 08:56:51 -0900
- To: w3c-sgml-wg@www10.w3.org
At 08:31 PM 1/29/97 +0000, Digitome Ltd. wrote: >A related question, in SGML two docs are considered equal if they produce >the same ESIS stream. Unless I'm very much mistaken, nothing in any of the system of standards making up SGML defines a formal or fixed definition of equality of documents. It is up to data owners to define their equality requirements. If ESIS is good enough, fine. Others only consider two documents to be the same if they are byte-for-byte identical. With property sets and groves it is possible to formally define your equality requirements by defining a grove plan that defines the subset of properties you care about. If you use the entire SGML property set, parse two documents into groves, then compare those two groves; the two groves will be equal IFF both documents are byte-for-byte identical (because the full SGML property set includes all the original data strings from the parsed document--the so-called "markup properties"). If, on the other hand, you use a grove plan that omits all the markup properties, then you get more or less "ESIS"-level equality. The key is you can choose. By the same token, programs that modify SGML documents can define what they do and don't preserve by publishing the grove plan that omits those properties they don't preserve (for example, most SGML editors don't preserve the markup properties for elements). [If they were really cool, tools would also support the definition of the grove plan you want it to use, letting you choose how you want to balance data preservation with performance--if I want every byte preserved, I should have that option, with the understanding that performance may suffer--this would allow, for example, SGML database tuning without requiring the database vendor to make unalterable optimization choices.] The HyTime TC also defines a string representation of groves (the "canonical grove representation (CGR) document type") intended to enable string comparisons of groves: parse two documents into groves, generate the CGR documents from each, run DIFF on those documents, see where they differ. XML can formally define its normative equality requirements by defining a grove plan against the SGML property set with which XML documents should be compared (for example, it could exclude the DOCTYPE property to allow the comparison of well-formed and valid documents for equality). Cheers, E.
Received on Thursday, 30 January 1997 09:59:44 UTC