Re: where does access time belong in the provenance dimension?

Hey Paul,

On Monday 07 December 2009 08:55:31 Paul Groth wrote:
> Hi Olaf,
> 
> So I agree with you that access time is another time. But I think it's
> part of what I'll call the access process.
> [...]
> It may be a particularly important process but it's a process nonetheless.
> If we were to add a dimension I would therefore put it under process.

Okay, I see how data access can be understood as a specific kind of process. 
On the other hand, many people seem to understand "process" as something 
during which things are created. For instance, in our wiki it says: 
"provenance as the process used to create a new artifact". Similarly, the OPM 
document defines process as an "Action or series of actions performed on or 
caused by artifacts, and resulting in new artifacts." Both notions of process 
even stress that the things created are new. This is clearly not 
the case for a data item that is retrieved from the Web during a data access 
process. Hence, putting data access under the Process dimension 
requires a broader understanding of "process". For this reason, I propose to 
adjust the wiki entry to "provenance as the process that yielded an artifact."
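
To make the distinction concrete, here is a minimal sketch of the broader
notion, in which both creation and access count as processes that yield an
artifact. All names (ProcessKind, Process, Artifact, yields_new_artifact), the
server URL, and the timestamps are made up for illustration; none of them come
from the wiki or the OPM document.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class ProcessKind(Enum):
    CREATION = "creation"  # results in a new artifact (the narrower reading)
    ACCESS = "access"      # retrieves an already existing artifact from the Web


@dataclass(frozen=True)
class Process:
    kind: ProcessKind
    time: datetime         # creation time or access time, depending on kind
    server: str = ""       # hypothetical field, only meaningful for ACCESS


@dataclass(frozen=True)
class Artifact:
    name: str
    yielded_by: Process    # "the process that yielded an artifact"


def yields_new_artifact(artifact: Artifact) -> bool:
    """True only if the artifact came out of a creation process."""
    return artifact.yielded_by.kind is ProcessKind.CREATION


# Illustrative values only: x is retrieved from the Web, b is created afterwards.
x = Artifact("x", Process(ProcessKind.ACCESS, datetime(2009, 12, 4, 9, 0),
                          "http://example.org/"))
b = Artifact("b", Process(ProcessKind.CREATION, datetime(2009, 12, 4, 9, 30)))
```

Under the current wiki wording only b would have a process; under the proposed
wording x has one as well, and its access time and server can be recorded there.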

> Also I think the name "Data Access" maybe should be changed because we
> already have an "Access" under the heading management.

Any suggestions? The only thing that comes to my mind is "Retrieval", which 
could easily be confused with information retrieval and is thus not a good 
name.

Greetings,
Olaf

> Regards,
> Paul
> 
> Olaf Hartig wrote:
> > Hey Paul,
> >
> > On Friday 04 December 2009 17:42:34 you wrote:
> >> Hi Olaf,
> >>
> >> It seems to me that the generation time of information is part of the
> >> process (e.g.  b was generated from a version of x that was created at
> >> 10:13) Thus, I think it belongs under the process dimension.
> >
> > I agree: the generation time (or creation time as I called it in the
> > timeliness use case) belongs to the process dimension.
> >
> > However, the use case mentions another time: the access time. Both b and
> > c, were created by using x and before using x it had to be retrieved from
> > the Web. The use case demonstrates that information about the access time
> > might be relevant for timeliness assessment (due to missing information
> > about the creation time of x in the case of Carol's data creation). The
> > question is, to which of the dimensions in the Content category does the
> > access time belong. I think it doesn't fit in one of the proposed
> > dimensions. Instead, I suggest adding another dimension, called "Data
> > Access", here. This dimension comprises all kinds of information about
> > the access of data items on the Web. This includes not only access time
> > but, for instance, information about which server has been accessed as well as
> > the provider/operator of the server. Such information might also be
> > relevant in other information quality assessment scenarios not just
> > timeliness. For instance, in the other use case discussed today - simple
> > trustworthiness: here we have Alice providing a data publishing server.
> > Someone may decide not to trust any data accessed from this server
> > because he/she thinks Alice is not trustworthy and may have manipulated
> > Bob's and Carol's data provided by her server. And again, it's not just
> > about the access of the assessed data itself but also about the access of
> > source data as the timeliness use case illustrates.
> >
> > Greetings,
> > Olaf
> 

Received on Monday, 7 December 2009 10:52:45 UTC