
Re: OA and provenance

From: Paolo Ciccarese <paolo.ciccarese@gmail.com>
Date: Thu, 15 Aug 2013 10:08:47 -0400
Message-ID: <CAFPX2kBXcQ1-p86UUeq1gskQkz9Kz36VYqNg8DGe0N115cG0Dg@mail.gmail.com>
To: Jacco van Ossenbruggen <Jacco.van.Ossenbruggen@cwi.nl>
Cc: public-openannotation <public-openannotation@w3.org>
Hi Jacco,
thank you for your observations, comments inline.

On Thu, Aug 15, 2013 at 4:08 AM, Jacco van Ossenbruggen <
Jacco.van.Ossenbruggen@cwi.nl> wrote:

> On 14-08-13 16:00, Paolo Ciccarese wrote:
>> I would like to keep track of:
>> - the agent that creates the OA annotation
>> - the application the agent used to create the annotation (could be
>> different than the application that serialized the annotation)
>> - the author of the body of the annotation (third party)
>> - the author of the original association of the annotation with the
>> original text
> Paolo,
> In the cultural heritage I've seen cases that are similar but not exactly
> like your use case.  Two observations might be useful:
> 1. Sometimes two agents work on the same artifact but have a different
> role in its creation, e.g. the author versus the publisher of a book, or
> the artist versus the printer of a graphical print. In these cases it is
> common to model the different roles explicitly, along with the dates,
> places etc that are associated with the different roles. You can have the
> same situation with annotations, and I think you can achieve all of this
> with subclasses/subproperties in OA.  The special semantics of the roles,
> however, might get lost if the data was processed by a general OA
> application.

Normally (see the last example in the response to Stian's email), I
encode the different agents with the annotation, according to their
roles. So I can have generic contributors (pav:contributedBy) or more
specialized authors (pav:authoredBy, a sub-property of
pav:contributedBy), curators (pav:curatedBy, a sub-property of
pav:contributedBy), editors and so on. I agree with you that this rich set
of metadata might get lost when a generic OA application reads the
document. Sometimes that is an issue (as in my use cases), sometimes I
guess it is not.
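
To make the role hierarchy concrete, here is a minimal sketch in plain Python (no RDF library), assuming triples are represented as simple string tuples. The sub-property relations are the ones named above; the `<ann1>`, `<Darwin>` and `<Student1>` identifiers and the `contributors` helper are illustrative assumptions, not part of OA or PAV.

```python
# Sub-property relations as named in the message (PAV vocabulary).
SUB_PROPERTY_OF = {
    "pav:authoredBy": "pav:contributedBy",
    "pav:curatedBy": "pav:contributedBy",
}

# Hypothetical annotation metadata; the agent names are illustrative only.
TRIPLES = [
    ("<ann1>", "pav:authoredBy", "<Darwin>"),
    ("<ann1>", "pav:curatedBy", "<Student1>"),
]


def contributors(triples, subject):
    """Return agents linked by pav:contributedBy or one of its sub-properties."""
    found = []
    for s, p, o in triples:
        if s != subject:
            continue
        if p == "pav:contributedBy" or SUB_PROPERTY_OF.get(p) == "pav:contributedBy":
            found.append(o)
    return found


print(contributors(TRIPLES, "<ann1>"))
```

A client that only asks for generic contributors still sees both agents, while a client that knows the specialized properties can also tell the author from the curator.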

Could you elaborate on  "achieve all of this with subclasses/subproperties
in OA"? Are you thinking of using sub-properties of oa:annotatedBy for

> 2. Sometimes there are really two annotations, one annotating the work and
> a second annotating the first annotation.  We use this, for example, to
> model annotations that arise when one agent is tagging or rating the
> annotation of another agent. So in your case you could have one annotation
> modeling the original annotation and one annotation modeling the things you
> wanted to say about the creation process of digitizing the first
> annotation. Again, OA allows annotations to be the target of other
> annotations, so there is no problem there, while it remains questionable
> how other OA applications would treat them.

That is an interesting point. I think the annotation of annotation is a
possible approach.
However, practically speaking, I'd rather have a single annotation with
richer metadata.
I'll try to explain why; hopefully it will make some sense.

Let's say I have:

<ann1> oa:annotatedBy <Darwin>
<ann2> oa:annotatedBy <the person who digitized Darwin's annotation <ann1>>

If I look at the document of <ann1> and I don't have access to <ann2> I
have no idea about who encoded that into the digital artifact.
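
A small sketch of that situation, assuming each annotation lives in its own document and that `<ann2>` points at `<ann1>` through oa:hasTarget (the message does not spell out that link, so it is an assumption, as are the `<Digitizer>` identifier and the `digitizers_of` helper):

```python
# Each annotation is published as its own document (a list of triples).
ANN1_DOC = [("<ann1>", "oa:annotatedBy", "<Darwin>")]
ANN2_DOC = [
    ("<ann2>", "oa:hasTarget", "<ann1>"),        # assumed link between the two
    ("<ann2>", "oa:annotatedBy", "<Digitizer>"),  # hypothetical agent name
]


def digitizers_of(annotation, documents):
    """Agents who created an annotation whose target is `annotation`."""
    triples = [t for doc in documents for t in doc]
    about = {s for s, p, o in triples if p == "oa:hasTarget" and o == annotation}
    return [o for s, p, o in triples if p == "oa:annotatedBy" and s in about]


print(digitizers_of("<ann1>", [ANN1_DOC]))            # only <ann1> is available
print(digitizers_of("<ann1>", [ANN1_DOC, ANN2_DOC]))  # both documents available
```

With only the first document in hand the lookup comes back empty; the digitizer becomes visible only when the second document is also retrieved.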

I wonder:
(1) is it fair for a client accessing the document of <ann1> to understand that
Darwin created that annotation even if there has been a transformation
process in between that we did not track down provenance-wise?
(2) is it fair to use Annotations to do the job that RDF does already?
Where am I drawing the line?

In regards to (2) I would stick to the annotation activity: a Student
digitizing Darwin's annotation is one task.

In regards to (1) I would probably rather say:

<ann1> oa:annotatedBy <Darwin>
             pav:authoredBy <Darwin> # This is redundant for some reasons related to my application but could be omitted
             pav:curatedBy <Student1> # Student that extracted the
             pav:createdBy <Student2> # Student that encoded the annotation

This is legal and self-contained, it does not require altering the
interpretation of the current model, and it carries with it the full provenance
of what happened (without de-chaining).
If an OA client does not understand PAV, it misses the additional provenance
details. The discussion here might be: is that ok?
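
The trade-off can be sketched as follows, assuming a client simply ignores properties from vocabularies it does not know. The triples mirror the single-annotation example; the `provenance` helper and the prefix-based filtering are illustrative assumptions, not how any particular OA client works.

```python
# The single, self-contained annotation document from the example.
ANN1_DOC = [
    ("<ann1>", "oa:annotatedBy", "<Darwin>"),
    ("<ann1>", "pav:authoredBy", "<Darwin>"),
    ("<ann1>", "pav:curatedBy", "<Student1>"),
    ("<ann1>", "pav:createdBy", "<Student2>"),
]


def provenance(triples, subject, known_prefixes):
    """Collect the statements about `subject` that a client can interpret."""
    view = {}
    for s, p, o in triples:
        prefix = p.split(":", 1)[0]
        if s == subject and prefix in known_prefixes:
            view.setdefault(p, []).append(o)
    return view


generic_view = provenance(ANN1_DOC, "<ann1>", {"oa"})       # OA-only client
pav_view = provenance(ANN1_DOC, "<ann1>", {"oa", "pav"})    # PAV-aware client
print(generic_view)
print(pav_view)
```

The OA-only client still gets a valid answer (Darwin), just without the curation and encoding steps; nothing it reads is contradicted by the extra PAV statements.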

Also this approach does not preclude the annotation of annotation approach
if that is what you need for other reasons.

For instance:
<ann2> can say <ann1> is wrong as <ann1> was curated by somebody else
instead. And that would be perfectly fine.
I still know everything I needed to know about the provenance of <ann1>.

Or if I find an annotation without the full provenance I can annotate it by
adding what I claim is missing there.
But that is a second task, different from the one that created the first
annotation document.

I guess these are all topics potentially up for discussion.

In my specific case, if the mapping of the pav:authoredBy to oa:annotatedBy
seems reasonable, I am good for now.

Received on Thursday, 15 August 2013 14:09:14 UTC
