and being identical
*except for* the contents of this div are considered equivalent.
This kind of measure helps confirm that pages have a consistent
presentation.
(7) As (6), but in addition to ignoring the content of that div, we may
apply some further normalisation to the remaining content, e.g.
to ignore differences in the document title, meta elements, date,
and an advertising banner.
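As an illustration of a test like (7), the following sketch prunes "don't care" elements before comparing two documents. It assumes the pages are well-formed XML (e.g. XHTML). The ignore rules (`IGNORED_TAGS`, and the `id` values in `IGNORED_IDS`) are hypothetical placeholders, not the rules of any real site; the div from (6) would be added by its own `id`.

```python
import xml.etree.ElementTree as ET

# Hypothetical ignore rules; substitute whatever the site's templates use.
IGNORED_TAGS = {"title", "meta"}
IGNORED_IDS = {"banner", "date"}


def local_name(tag):
    # Strip any "{namespace}" prefix that ElementTree adds to tag names.
    return tag.rsplit("}", 1)[-1]


def prune(elem):
    """Remove ignored subtrees from elem in place and return it."""
    for child in list(elem):  # iterate over a copy: we mutate elem
        if local_name(child.tag) in IGNORED_TAGS or child.get("id") in IGNORED_IDS:
            elem.remove(child)  # the child's tail text goes with it
        else:
            prune(child)
    return elem


def normalised(xml_text):
    root = prune(ET.fromstring(xml_text))
    # Canonical XML (C14N 2.0) fixes attribute order and empty-element syntax.
    return ET.canonicalize(ET.tostring(root, encoding="unicode"))


def equivalent(a, b):
    return normalised(a) == normalised(b)
```

Canonicalising after pruning means that incidental serialisation differences (attribute order, `<p/>` versus `<p></p>`) do not break equivalence, while any change outside the ignored regions does.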
We can express equivalence by defining arbitrary equivalence classes
of markup. The relation should be one we can apply programmatically
and that others can replicate: in short, the same rules discussed
for normalisation apply.
For example, for a metadatum that is invalidated if the document's
element structure changes but that is indifferent to attributes, text
content, CDATA, etc., we might:
(1) Normalise to XML DOM.
(2) Discard attribute nodes and text nodes.
(3) Compute a hash of the result and store it Base64-encoded.
(4) Trust our metadatum so long as the hash is unchanged.
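The four steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the referenced XSLT or webservice: it assumes well-formed XML input, uses an ElementTree tree in place of a full XML DOM, and picks SHA-256 as the hash because the text does not name one.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET


def structure_only(elem):
    """Copy elem keeping only element names: no attributes, text or tails."""
    copy = ET.Element(elem.tag)
    for child in elem:
        copy.append(structure_only(child))
    return copy


def structure_hash(xml_text):
    # (1) Parse to a tree. CDATA becomes ordinary text, and the default
    #     parser drops comments and processing instructions.
    root = ET.fromstring(xml_text)
    # (2) Discard attributes and text, keeping the element skeleton.
    skeleton = structure_only(root)
    # (3) Hash the serialised skeleton (SHA-256 is an assumption) and
    #     Base64-encode the digest for storage.
    digest = hashlib.sha256(ET.tostring(skeleton, encoding="utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


def still_valid(stored_hash, xml_text):
    # (4) Trust the metadatum only while the hash is unchanged.
    return structure_hash(xml_text) == stored_hash
```

Any choice of hash works, provided the same serialisation and hash are applied every time and are documented so that others can replicate the check.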
http://example.org/elements.xsl
http://example.org/dom-elements-svc
If we have a webservice that performs all of the above, we might
reference that instead, though we should preferably also specify
the full method:
http://example.org/norm_elements_hash-svc
http://example.org/norm_elements_hash.html
(as above)
Storing such checksums with the metadata gives us a means of ascertaining
whether the metadata are still valid after the document changes. Where
metadata validity is not a simple binary property, we might reference
it to several different checksums, and treat different combinations
of pass/fail as different outcomes, such as "partially
invalidated".
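To illustrate referencing one metadatum to several checksums, the sketch below records two checks when the metadata are written and later maps the combination of passes and failures to an outcome. Both checks (`raw_hash` and the whitespace-insensitive `words_hash`) and the three-way outcome policy are hypothetical examples; a real system would plug in equivalence relations such as those above.

```python
import hashlib


def raw_hash(doc):
    # Strict check: any change to the text fails it.
    return hashlib.sha256(doc.encode("utf-8")).hexdigest()


def words_hash(doc):
    # Hypothetical looser check: ignores differences in whitespace.
    return hashlib.sha256(" ".join(doc.split()).encode("utf-8")).hexdigest()


CHECKS = {"raw": raw_hash, "words": words_hash}


def record(doc):
    """Checksums to store alongside the metadata when it is written."""
    return {name: check(doc) for name, check in CHECKS.items()}


def validity(stored, doc):
    """Map the pass/fail combination to an outcome (illustrative policy)."""
    failed = {name for name, check in CHECKS.items() if check(doc) != stored[name]}
    if not failed:
        return "valid"
    if failed == set(CHECKS):
        return "invalidated"
    return "partially invalidated"
```

Here, a whitespace-only edit fails the strict check but passes the looser one, so it yields "partially invalidated" rather than discarding the metadata outright.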
Experience with Site Valet's problem reporting and tracking database
shows that a wide range of metadata can usefully be referenced to a
small number of checksums such as those above, since the same
equivalence relations serve many different metadata.