a URI is not a unique ID fit for a skeptic

AG::

Summary:

In all but rare cases, to collect or compare evaluation results you need
a more precise identification [Note 1] of what was evaluated than a
URI-reference provides.  Keep copies or hashes of what you evaluated.  We
need to be able to ascertain whether differences in evaluation results
were caused by differences in the processing or by differences in the
input.

An evaluation transaction is like a scientific experiment.  When you
publish the results, you have to give enough information about the
conditions under which you obtained your results that an independent
agent can reproduce them.  Identifying what you evaluated by a
URI-reference to the service you accessed does not meet this
requirement.  Instability in that service process is one of the suspects
when results mis-compare.  The evaluation activity, if it deals in Web
resources, must be structured so as to observe whether that is what is
happening.

Details follow interleaved.

At 03:55 PM 2001-03-19 +0900, Martin Duerst wrote:
>Hello Daniel,
>
>At 16:49 01/03/14 +0100, Daniel Dardailler wrote:
>
>>Starting from the f2f notes at
>>  http://www.w3.org/WAI/ER/2001/03/01-f2f-minutes.html
>>
>>and detailing a bit (in custom data typing notation).
>
>>For the resource being evaluated we mostly want to have
>>   unique id: URL
>>     e.g.  http://example.org/page#img[3]
>>           http://foo.com/svgplayer1.23
>>   nature of resource: ENUM [web-content, tool]
>>   optionally:
>>     a version number
>>     a date (date the resource was last modified, released)
>>     a snapshot: URL (copy of resource at the time it was evaluated)
>
>You also need the values of various negotiation parameters, such as
>Accept-Language.  Different language versions of a language-negotiated
>resource may exhibit different WAI conformance.
>

AG::

Negotiation parameters are but one example of why the URL is not a
"unique id" sufficient to use in a test report.

For evaluated objects, you need a stronger ability to know when someone
has used this same URI but recovered different contents; in other words,
to know when the results of evaluation should be expected to differ.
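
As a rough sketch of the kind of check this implies (the URI, the
Accept-Language values, and the helper name are illustrative assumptions,
nothing from any agreed report format), one could fingerprint each
retrieval and compare fingerprints before comparing evaluation results:

    # Sketch: fingerprint two retrievals of the same URI to see whether
    # the recovered contents are the same bytes.  The URI and the
    # Accept-Language values are made-up examples.
    import hashlib
    import urllib.request

    def fingerprint(uri, language):
        request = urllib.request.Request(
            uri, headers={"Accept-Language": language})
        with urllib.request.urlopen(request) as response:
            return hashlib.sha256(response.read()).hexdigest()

    digest_en = fingerprint("http://example.org/page", "en")
    digest_fr = fingerprint("http://example.org/page", "fr")
    if digest_en != digest_fr:
        print("same URI, different recovered contents;")
        print("evaluation results may legitimately differ")

If the digests differ, a mis-compare in the evaluation results points to
the input; if they match, the processing becomes the prime suspect.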

If you have anything detailed to say in your evaluation results, a copy
of the actual contents evaluated should be retained.  In the report, a
URI referencing the retained replica of the resource as evaluated could
be used.  But some log of what was actually evaluated has to be
maintained by the evaluator.  Even in the unusual case where a sourcing
service has established and promised persistence policies, it won't be
trustworthy unless audited.  And its auditors will need independent
copies of the samples checked as part of their mode of operation.  If no
detailed results are reported, a signature (see the XML Signature work
for appropriate methods) could be retained in lieu of a full copy.
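
A minimal sketch of such a log entry (the file layout, field names, and
directory are assumptions for illustration) would tie the retained copy
to the URI, the retrieval time, the negotiation parameters, and a digest:

    # Sketch: retain the evaluated bytes plus a record tying them to the
    # URI, retrieval time, and negotiation parameters.  The file layout
    # and field names are illustrative only.
    import hashlib
    import json
    import os
    import time

    def log_evaluated_copy(uri, negotiation, body, archive_dir="archive"):
        os.makedirs(archive_dir, exist_ok=True)
        digest = hashlib.sha256(body).hexdigest()
        # The digest doubles as the name of the retained replica, so a
        # report can cite archive/<digest> as the resource-as-evaluated.
        with open(os.path.join(archive_dir, digest), "wb") as replica:
            replica.write(body)
        record = {
            "uri": uri,
            "retrieved": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            "negotiation": negotiation,  # e.g. {"Accept-Language": "en"}
            "sha256": digest,
        }
        with open(os.path.join(archive_dir, "log.jsonl"), "a") as log:
            log.write(json.dumps(record) + "\n")
        return digest

When no full copy is kept, retaining just the digest record (or a proper
XML Signature over the content) still lets a challenge be answered with
"what we evaluated hashed to X," which any later retrieval can be
checked against.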

There is no guarantee in general that what one gets on different "recover"
transactions with the same URI-reference will produce the same results on
application of any given [evaluation] transform operation.  In practice, there
is rarely any guarantee.

The Internet has become popular because of its speed.  A major
attraction of this channel is its ability to share rapidly changing
information such as stock prices.  And those things are not given dated
URIs.

The evaluation activity has to create a record of what it evaluated. 
URI-references are not sufficient to answer challenges or other questions
concerning the basis for the evaluation results.

I would be curious to know what accounting practices are used in the
auditing industry.  Do the auditors keep independent copies of the
records that they evaluate?  The financial accounting industry has a lot
of practice that it follows to assure the integrity of the records that
are its working medium.  The Web has no such established pattern of
practice, and cannot be assumed to provide resource persistence to any
given standard of resource integrity.  There is no Web-wide standard for
what constitutes resource integrity, nor standard practice for assuring
it.  There are scraps of a practice in things like 'expires' headers
used in cache management.  But these do not add up to a whole solution
for much of the Web at all.
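
For instance (a sketch only, against a made-up URI), the scraps HTTP
does provide can at least be inspected; these headers hint at how
volatile a resource is, but they guarantee nothing about its integrity:

    # Sketch: inspect the cache-management headers HTTP provides.
    # They suggest volatility but do not guarantee integrity.
    # The URI is a made-up example.
    import urllib.request

    with urllib.request.urlopen("http://example.org/page") as response:
        for name in ("Expires", "Last-Modified", "ETag", "Cache-Control"):
            print(name, ":", response.headers.get(name))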

One of the practices that industry sometimes follows in search of
quality or management wisdom is "benchmarking."  A review of how
auditing functions within the overall practice of financial accounting,
and in data archiving, could be a useful "benchmarking" step for a
quality initiative within the Web.

Al

[Note 1] Where I say "precise identification" I am using
'identification' in the sense of software configuration identification:
trustworthy descriptors.  Review the literature on data integrity and
software configuration control.  The identification has to be strong
enough to support practices which control the integrity of information
[re-]sources.

PS: [for the quality activity]  Quality programs are in a sense all
about sampling strategies.  One of the tools of a quality program is to
monitor and observe the actual quality of what you are producing.  This
involves a systematic, structured exercise in skepticism, so as to make
what you are monitoring more worthy of the trust, the leap of faith,
that others extend to it.  The concept of a URI as identifying an
information configuration is just one of the things we have to be
prepared to be skeptical about in order to perform the "observe actuals"
function within a quality program (for services that touch or pass
through the Web).

Slogan for the "observe actuals" function within a quality program (old
industrial saying):
"Do not _ex_pect what you do not _in_spect."


>
>Regards,   Martin.
>  
