OA and provenance

Dear all,
I would like to share a provenance solution that I am currently implementing
in Domeo, and ask a related question. Apologies in advance for the length of
the email.

Use Case: I am dealing with an existing annotation written on paper. The
author of the annotation can be the author of the original manuscript or a
third party (let's assume the latter for this example). The annotation is
anchored to a specific location in the original text. My user is
transforming that paper annotation into an OA annotation. This is very
similar to the Darwin annotation example in the spec [1], but I reached a
slightly different conclusion.

I would like to keep track of:
- the agent that creates the OA annotation
- the application the agent used to create the annotation (which may differ
from the application that serialized the annotation)
- the author of the body of the annotation (the third party)
- the author of the original association between the annotation and the
original text

In Domeo I use PAV (Provenance, Authoring and Versioning ontology) [2][3],
and I attach the following properties to the oa:Annotation:

1) pav:createdBy -> Domeo user
An agent primarily responsible for encoding the digital artifact or
resource representation. This creation is distinct from forming the
content, which is indicated with pav:contributedBy or its subproperties.
It is more specific than dct:creator, which might or might not be
interpreted to also cover the creation of the content of the artifact.

2) pav:createdOn -> When the Domeo user created the digital object
The date of creation of the digital artifact or resource representation.
The agents responsible can be indicated with pav:createdBy.

3) pav:createdAt -> Where the user created the digital object
The geo-location of the agent that created the annotation.

4) pav:createdWith -> In my case, the Domeo tool
The software/tool used by the creator (pav:createdBy) when making the
digital resource, for instance a word processor or an annotation tool. A
more independent software agent that creates the resource without direct
interactions by a human creator should instead be indicated using
pav:createdBy.

5) pav:authoredBy -> The author of the original annotation on paper
Indicates an agent that originated or gave existence to the work that is
expressed by the digital resource. The author of the content of a resource
may be different from the creator of that resource representation
(pav:createdBy), although they are often the same. The author is usually
not a software agent (which would be indicated with pav:createdWith,
pav:createdBy or pav:importedBy), unless the software actually authored the
content itself; for instance an artificial intelligence algorithm which
authored a piece of music or a machine learning algorithm that authored a
classification of a tumor sample.

6) pav:authoredOn -> The date of the original annotation
Indicates the date this resource was authored by the agents given by
pav:authoredBy. Note that pav:authoredOn is different from pav:createdOn,
although their values are often the same.

In summary I have something like:

<ann1> a oa:Annotation ;
    pav:createdBy <Paolo> ;
    pav:createdOn "today" ;
    pav:createdWith <Domeo> ;
    pav:createdAt <BostonLocation> ;
    pav:authoredBy <OriginalAuthor> ;
    pav:authoredOn "date of the original annotation" .

In other words, using PAV I can keep the distinction between the creator of
the digital artifact and the author of the original content/association.
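To make the split concrete, here is a minimal Python sketch (standard library only) of how a tool might serialize the six PAV properties above as Turtle. The `PavProvenance` class and `to_turtle` function are hypothetical illustrations, not Domeo's actual code; the agent names are placeholders for real IRIs, and dates are assumed to arrive already formatted as `xsd:dateTime` strings.

```python
from dataclasses import dataclass

# Namespace IRIs for Open Annotation, PAV and XML Schema datatypes.
PREFIXES = [
    "@prefix oa:  <http://www.w3.org/ns/oa#> .",
    "@prefix pav: <http://purl.org/pav/> .",
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
]


@dataclass
class PavProvenance:
    """Hypothetical container for the PAV provenance of one oa:Annotation."""
    created_by: str    # agent who encoded the digital annotation (e.g. the Domeo user)
    created_on: str    # xsd:dateTime of the digital encoding
    created_with: str  # software the creator used (e.g. Domeo)
    created_at: str    # location where the digital object was created
    authored_by: str   # agent who authored the original annotation on paper
    authored_on: str   # xsd:dateTime of the original annotation


def to_turtle(ann: str, p: PavProvenance) -> str:
    """Serialize the annotation and its PAV provenance as a Turtle document."""
    body = [
        f"<{ann}> a oa:Annotation ;",
        # Who encoded the digital artifact, when, with what, and where.
        f"    pav:createdBy <{p.created_by}> ;",
        f'    pav:createdOn "{p.created_on}"^^xsd:dateTime ;',
        f"    pav:createdWith <{p.created_with}> ;",
        f"    pav:createdAt <{p.created_at}> ;",
        # Who originated the content/association, and when.
        f"    pav:authoredBy <{p.authored_by}> ;",
        f'    pav:authoredOn "{p.authored_on}"^^xsd:dateTime .',
    ]
    return "\n".join(PREFIXES + [""] + body) + "\n"


if __name__ == "__main__":
    # Placeholder values only, chosen for illustration.
    prov = PavProvenance(
        created_by="Paolo",
        created_on="2013-08-14T00:00:00Z",
        created_with="Domeo",
        created_at="BostonLocation",
        authored_by="OriginalAuthor",
        authored_on="1900-01-01T00:00:00Z",
    )
    print(to_turtle("ann1", prov))
```

Keeping creator and author as separate fields means the two can diverge (transcription) or coincide (born-digital annotation) without changing the serializer.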

However, there may be a couple of overlaps with the current OA
properties. As I would also like to provide the OA provenance, I am
wondering which of the following applies:
<ann1> a oa:Annotation ;
    oa:annotatedBy <Paolo> .
or
<ann1> a oa:Annotation ;
    oa:annotatedBy <OriginalAuthor> .

Or compared to PAV:
- pav:createdBy =? oa:annotatedBy --or--
- pav:authoredBy =? oa:annotatedBy
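The two candidate readings amount to a choice of which PAV property oa:annotatedBy should mirror. The Python sketch below is hypothetical (the `Reading` enum and `annotated_by_turtle` function are not part of OA or PAV); it only shows how a tool could emit oa:annotatedBy under either interpretation until the question is settled.

```python
from enum import Enum


class Reading(Enum):
    """Which PAV property oa:annotatedBy is assumed to correspond to."""
    ENCODER = "pav:createdBy"  # whoever formalized the RDF oa:Annotation
    AUTHOR = "pav:authoredBy"  # whoever originally associated body and target


def annotated_by_turtle(ann: str, pav: dict, reading: Reading) -> str:
    """Emit the oa:annotatedBy statement implied by the chosen reading."""
    agent = pav[reading.value]
    return f"<{ann}> a oa:Annotation ;\n    oa:annotatedBy <{agent}> ."


pav = {"pav:createdBy": "Paolo", "pav:authoredBy": "OriginalAuthor"}
print(annotated_by_turtle("ann1", pav, Reading.ENCODER))  # first option above
print(annotated_by_turtle("ann1", pav, Reading.AUTHOR))   # second option above
```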

Looking at the Darwin example in the spec, if the student is digitizing
a note that Darwin wrote on his own work, I would write:
<ann2> a oa:Annotation ;
    pav:createdBy <Student> ;
    pav:createdOn "2013" ;
    pav:createdWith <Domeo> ;
    pav:createdAt <BostonLocation> ;
    pav:authoredBy <Darwin> ;
    pav:authoredOn "date of the original annotation" .

Of course, the ‘body’ of the annotation can also be authored by the
original author of the annotation. But, as pointed out above, it is
important for me to also attribute the association of body and target to
the original author, as that represents its historical provenance.
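One way to record both attributions is to put pav:authoredBy on the body resource (authorship of the content) and on the annotation itself (authorship of the body/target association), while pav:createdBy on the annotation records who transcribed it. A minimal Python sketch of the resulting triples, with prefixed names as plain strings; `attribution_triples` and the resource names `<note>` and `<page>` are hypothetical, while oa:hasBody and oa:hasTarget are the OA core properties.

```python
def attribution_triples(ann, body, target, original_author, encoder):
    """Return (subject, predicate, object) triples for one transcribed annotation."""
    return [
        (ann, "rdf:type", "oa:Annotation"),
        (ann, "oa:hasBody", body),
        (ann, "oa:hasTarget", target),
        # The association of body and target: its historical provenance.
        (ann, "pav:authoredBy", original_author),
        # The digital encoding of that association.
        (ann, "pav:createdBy", encoder),
        # The content of the body itself.
        (body, "pav:authoredBy", original_author),
    ]


for s, p, o in attribution_triples("<ann2>", "<note>", "<page>", "<Darwin>", "<Student>"):
    print(s, p, o, ".")
```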

This basically comes down to what an oa:Annotation really is: “an
Annotation expresses the relationship between two or more resources, and
their metadata, using an RDF graph”. We talked about this before; my
question here is whether oa:annotatedBy indicates who formed the
relationship (the ‘author’ of the conceptual annotation), or the person who
(using some OA-aware tool) formalized it as an oa:Annotation data
structure (the RDF structure).

Best,
Paolo


[1] http://www.openannotation.org/spec/core/core.html#Provenance
[2] http://arxiv.org/abs/1304.7224
[3] http://code.google.com/p/pav-ontology/


-- 
Dr. Paolo Ciccarese
http://www.paolociccarese.info/
Biomedical Informatics Research & Development
Instructor of Neurology at Harvard Medical School
Assistant in Neuroscience at Mass General Hospital
Member of the MGH Biomedical Informatics Core
+1-857-366-1524 (mobile)   +1-617-768-8744 (office)

Received on Wednesday, 14 August 2013 14:00:57 UTC