Handles for data, accesses, time and space

From: William Waites <ww@styx.org>
Date: Thu, 23 Sep 2010 12:44:10 +0100
Message-ID: <4C9B3D8A.30409@styx.org>
To: public-xg-prov@w3.org

Hi all,

I was looking at the list archives and I found a discussion
back in December [0] about where access time belongs. There
seemed to be some argument that because the access/retrieval
operation ostensibly did not change the data, it shouldn't
be treated as a Process in the OPMV sense.

Recently, Ed Summers was doing some analysis and making some
pretty pictures from the billion triple challenge data [1]
and as it turned out the data he retrieved was corrupted.
Worse, this wasn't noticed until later, after he had already
transformed and summarised it (i.e. applied further
processes).

In this context I would like to make these points:

 * Even where the process is expected to make an identity
   transformation (i.e. no change), it might actually
   unexpectedly change the data.
 * The access *time* doesn't seem as relevant here as
   a checksum or hash would be.
 * Even if the retrieval is successful, with no errors,
   I think it still makes sense to treat it as a process
   that makes an identity transformation, if only to have
   a place to record things like hashes and access times
   and to confirm that it happened successfully.
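
A minimal sketch of what such a retrieval record might hold,
assuming SHA-256 as the checksum. The function name, the field
names and the example URL are hypothetical and are not OPMV
vocabulary:

```python
import hashlib
from datetime import datetime, timezone

def record_retrieval(source, data, expected_sha256=None):
    """Describe a retrieval as a process with a checksum and access time.

    `source` is wherever the bytes came from, `data` is the bytes received.
    If the publisher's checksum is known, pass it as `expected_sha256`.
    """
    digest = hashlib.sha256(data).hexdigest()
    return {
        "process": "retrieval",
        "source": source,
        "accessed": datetime.now(timezone.utc).isoformat(),
        "sha256": digest,
        # None means "nothing to compare against", not "ok".
        "verified": None if expected_sha256 is None
                    else digest == expected_sha256,
    }

# Hypothetical usage: the URL is a placeholder.
record = record_retrieval("http://example.org/dump.nt", b"<a> <b> <c> .\n")
```

A later process can recompute the hash of its input and compare
it with the recorded one before transforming the data further.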

The idea of a checksum or hash might be useful as well in
Requirement 1.1 - constructing a handle or token that
represents a piece of data. If this were done simply, i.e.
just hash(data), it would mean that two people holding the
same data could discover this by comparing the hashes. Is
this approach feasible?
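
As a rough illustration of hash(data) as a handle, here is a
sketch assuming SHA-256 and a made-up "sha256:" prefix. Neither
choice comes from the requirements document:

```python
import hashlib

def handle(data: bytes) -> str:
    """Return a token that depends only on the bytes of `data`."""
    return "sha256:" + hashlib.sha256(data).hexdigest()

# Two parties holding the same bytes derive the same handle
# without exchanging the data itself.
alice_copy = b"some dataset contents\n"
bob_copy = b"some dataset contents\n"
same = handle(alice_copy) == handle(bob_copy)
```

Note that this only identifies byte-identical copies: the same
content in a different encoding or serialisation gets a
different handle.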

Harder, but potentially interesting, is where the data in
question is RDF. Can we come up with a serialisation-
independent algorithm for representing a graph? Or put
another way, can we find a function f such that
f(g1) = f(g2) iff g1 and g2 are equivalent? Some
transformation to ground form, imposing a lexical ordering
when serialising, treating blank nodes and variables
specially? (A further question: what about nested graphs?)
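
For the easy case only, a sketch of the lexical-ordering idea:
it assumes a ground graph (no blank nodes or variables) whose
terms are already written in N-Triples syntax, and it simply
refuses blank nodes rather than labelling them canonically.
The function name is hypothetical, and equal outputs imply
equal graphs only up to hash collisions:

```python
import hashlib

def graph_hash(triples):
    """Hash a ground RDF graph independently of triple order.

    `triples` is an iterable of (subject, predicate, object) strings,
    each already in N-Triples term syntax, e.g. "<http://example.org/a>".
    """
    lines = set()  # a graph is a set, so duplicate triples collapse
    for s, p, o in triples:
        if s.startswith("_:") or o.startswith("_:"):
            raise ValueError("blank nodes are not handled by this sketch")
        lines.add(f"{s} {p} {o} .")
    # Sorting imposes one fixed serialisation for any input order.
    canonical = "".join(line + "\n" for line in sorted(lines))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = ("<http://example.org/a>", "<http://example.org/p>", '"1"')
b = ("<http://example.org/b>", "<http://example.org/p>", '"2"')
same = graph_hash([a, b]) == graph_hash([b, a, a])
```

Blank nodes are the hard part, since their labels are arbitrary
and two equivalent graphs may use different ones.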

Cheers,
-w

[0] http://lists.w3.org/Archives/Public/public-xg-prov/2009Dec/0002.html
[1] http://lists.w3.org/Archives/Public/public-esw-thes/2010Sep/0009.html
-- 
William Waites                       <ww@styx.org>
Mob: +44 789 798 9965
Fax: +44 131 464 4948
CD70 0498 8AE4 36EA 1CD7  281C 427A 3F36 2130 E9F5


Received on Thursday, 23 September 2010 11:45:59 GMT
