
Re: OA and provenance

From: Paolo Ciccarese <paolo.ciccarese@gmail.com>
Date: Wed, 14 Aug 2013 21:22:56 -0400
Message-ID: <CAFPX2kDY7XySyCpNj8cBZsaMeidzQbiyn6mbFbzp-=DLyC_w0A@mail.gmail.com>
To: Robert Bolick <robert.bolick@gmail.com>
Cc: public-openannotation <public-openannotation@w3.org>
Dear Robert,
my comments inline.

On Wed, Aug 14, 2013 at 6:00 PM, Robert Bolick <robert.bolick@gmail.com> wrote:

> A curious case, Paolo.

The use case I presented is already in production (with Annotation Ontology
and PAV), and the idea is that Domeo tracks both the primary and the
secondary actors, as both are important.
In my case (biomedicine and science) the secondary actor is also very
important, as there could be some substantial 'curation' involved and you
are really interested in knowing who performed it. I used the Darwin
example because that is the one we put in the specs.

> In trying to place it in context, I thought of a scholar who might be
> preparing a digital set of volumes annotated by some notable person.
> In that context, would it be so important to keep track of the scholar at
> the individual annotation level?  Wouldn't the identity of the scholar be
> associated with the digital set -- Darwin's Annotated Library, The Digital
> Edition, edited by Paolo Ciccarese?

The way Domeo works is that we have 'annotation sets' that are created by
an agent but can then be enriched and modified by other agents to whom the
first agent gave editing permissions.
We are basically allowing for crowdsourcing of annotation creation and
curation (normally to scientists or scholars).

Surely you can have cases in which the whole annotation set is contributed
by one single agent, but that is not always the case. For instance, I can
crowdsource to an entire class the task of encoding all the annotations of
a manuscript, and the editor could be the instructor. The students might
even be evaluated for their individual contributions.
Again, that is just an example; in general the idea with Domeo is to allow
flexibility in the management of annotations and their collections.

> At the most granular level, the establishing of a relationship between a
> resource and the annotation of the resource would in most cases be the act
> (Darwin's annotation of a passage in a book in his library or in his
> manuscripts) that is more meaningful to track, not that of the secondary
> actor (the scholar).

I tend to track both the agent and the application that has been used to
perform the task. When something is not accurate or even wrong I would want
to know what happened. Even the simple association task could require some

> I'm struggling to come up with a context in which there are multiple
> actors engaged in using some OA aware tools to formalize the capture of the
> conceptual annotation, and it is necessary or meaningful to track them
> individually.
> Perhaps some collaborative activity of commentators on annotations?   Or
> students in a VLMS each commenting on one or more of Darwin's annotations?
>  If the latter counts as a legitimate contextualization of the case, do we
> end up with some further complications?

> 1) Is there a need to designate differently the student who comments on
> another student's comments about the annotation?
> 2) Is there a need to designate differently the instructor who comments on
> each of the students' comments about the annotation?

I am not sure I fully comprehend the two questions. I'll give it a shot and
you can tell me if I am off topic.

I track the agent that contributed each annotation. Some of these agents
can be students, some researchers, and so on.
By using proper provenance it is possible to distinguish all the possible
cases. In some contexts that is a very important distinction.
If we annotate classroom material we probably want to know whether a comment
has been made by a student, by an instructor or by a very important expert
in the field.
We can also have annotations of annotations, so a student or instructor can
comment on an existing annotation, and so on.
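
As a rough sketch of an annotation of an annotation, assuming the OA and PAV
namespaces below and purely hypothetical example IRIs under ex: (none of
these identifiers come from Domeo), the second annotation simply targets the
first, and each one carries its own contributing agent:

```turtle
@prefix oa:  <http://www.w3.org/ns/oa#> .
@prefix pav: <http://purl.org/pav/> .
@prefix ex:  <http://example.org/> .   # hypothetical namespace for this sketch

# A student's comment on a passage of the classroom material.
ex:ann1 a oa:Annotation ;
    oa:hasBody ex:comment1 ;
    oa:hasTarget ex:passage1 ;
    pav:createdBy ex:studentA .

# An instructor's comment whose target is the student's annotation.
ex:ann2 a oa:Annotation ;
    oa:hasBody ex:comment2 ;
    oa:hasTarget ex:ann1 ;
    pav:createdBy ex:instructor .
```

Whether an agent is a student, an instructor or an expert would then be
stated on the description of the agent itself, not on the annotation.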

In any case, getting back to my original email, I am debating how the
listed PAV properties and oa:annotatedBy relate to each other. I cannot
drop the PAV properties as they guarantee me the expressiveness I need but
I want to make sure the whole representation is coherent.
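
To make the two candidate readings concrete, here is a sketch with the same
assumed prefixes and hypothetical ex: IRIs. The PAV statements stay fixed;
the two commented-out triples are the alternatives under discussion, not a
recommendation of either one:

```turtle
@prefix oa:  <http://www.w3.org/ns/oa#> .
@prefix pav: <http://purl.org/pav/> .
@prefix ex:  <http://example.org/> .   # hypothetical namespace for this sketch

ex:ann1 a oa:Annotation ;
    pav:createdBy   ex:Paolo ;           # encoded the digital artifact
    pav:createdWith ex:Domeo ;           # tool used by the creator
    pav:authoredBy  ex:OriginalAuthor .  # wrote the annotation on paper

# Reading A: oa:annotatedBy aligns with pav:createdBy.
# ex:ann1 oa:annotatedBy ex:Paolo .

# Reading B: oa:annotatedBy aligns with pav:authoredBy.
# ex:ann1 oa:annotatedBy ex:OriginalAuthor .
```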


> On Wed, Aug 14, 2013 at 3:00 PM, Paolo Ciccarese <
> paolo.ciccarese@gmail.com> wrote:
>> Dear all,
>> I would like to share a solution that I am currently implementing in
>> Domeo in relation to provenance and a question related to it. Apologies in
>> advance for the length of the email.
>> Use Case: I am dealing with an existing annotation that is written on
>> paper. The author of the annotation can be the author of the original
>> manuscript or a third party (let's assume the latter for this example). The
>> annotation is anchored in a specific location of the original text. My user
>> is transforming that annotation into an OA annotation. It is very similar to
>> the Darwin annotation in the specs [1] but I got to a slightly different
>> conclusion.
>> I would like to keep track of:
>> - the agent that creates the OA annotation
>> - the application the agent used to create the annotation (could be
>> different than the application that serialized the annotation)
>> - the author of the body of the annotation (third party)
>> - the author of the original association of the annotation with the
>> original text
>> In Domeo I use PAV (Provenance Authoring and Versioning ontology) [2][3]
>> and I append to the oa:Annotation the following properties
>> 1) pav:createdBy -> Domeo user
>> An agent primarily responsible for encoding the digital artifact or
>> resource representation. This creation is distinct from forming the
>> content, which is indicated with pav:contributedBy or its subproperties.
>> It is more specific than dct:creator - which might or might not be
>> interpreted to also cover the creation of the content of the artifact.
>> 2) pav:createdOn -> When the Domeo user created the digital object
>> The date of creation of the digital artifact or resource representation.
>> The agents responsible can be indicated with pav:createdBy.
>> 3) pav:createdAt -> Where the user created the digital object
>> The geo-location of the agent that created the annotation.
>> 4) pav:createdWith -> In my case the Domeo tool
>> The software/tool used by the creator (pav:createdBy) when making the
>> digital resource, for instance a word processor or an annotation tool. A
>> more independent software agent that creates the resource without direct
>> interactions by a human creator should instead be indicated using
>> pav:createdBy.
>> 5) pav:authoredBy -> The author of the original annotation on paper
>> Indicates an agent that originated or gave existence to the work that is
>> expressed by the digital resource. The author of the content of a resource
>> may be different from the creator of that resource representation
>> (pav:createdBy), although they are often the same. The author is usually
>> not a software agent (which would be indicated with pav:createdWith,
>> pav:createdBy or pav:importedBy), unless the software actually authored the
>> content itself; for instance an artificial intelligence algorithm which
>> authored a piece of music or a machine learning algorithm that authored a
>> classification of a tumor sample.
>> 6) pav:authoredOn -> The date of the original annotation
>> Indicates the date this resource was authored by the agents given by
>> pav:authoredBy. Note that pav:authoredOn is different from pav:createdOn,
>> although their values are often the same.
>> In summary I have something like:
>> <ann1> a oa:Annotation
>>    pav:createdBy -Paolo-
>>    pav:createdOn -today-
>>    pav:createdWith -Domeo-
>>    pav:createdAt -Boston location-
>>    pav:authoredBy -Annotation’s author-
>>    pav:authoredOn -Date of the original annotation-
>> In other words, using PAV I can keep the distinction between the creator
>> of the digital artifact and the author of the original content/association.
>> However, there are possibly a couple of overlaps with the current OA
>> properties. As I would like to provide the OA provenance as well, I am
>> wondering which of the following applies:
>> <ann1> a oa:Annotation ;
>>     oa:annotatedBy <Paolo> .
>> or
>> <ann1> a oa:Annotation ;
>>     oa:annotatedBy <OriginalAuthor> .
>> Or compared to PAV:
>> - pav:createdBy =? oa:annotatedBy --or--
>> - pav:authoredBy =? oa:annotatedBy
>> Looking at the Darwin example in the specs, if the student is
>> digitizing a note from Darwin on his own content I would say:
>> <ann2> a oa:Annotation
>>    pav:createdBy -Student-
>>    pav:createdOn -2013-
>>    pav:createdWith -Domeo-
>>    pav:createdAt -Boston location-
>>    pav:authoredBy -Darwin-
>>    pav:authoredOn -Date of the original annotation-
>> Then of course the ‘body’ of the annotation can be also authored by the
>> original author of the annotation. But, as pointed out above, it is
>> important for me to attribute also the association of body and target to
>> the original author as that represents the historical provenance of it.
>> What this comes down to is basically what an oa:Annotation really is: “an
>> Annotation expresses the relationship between two or more resources, and
>> their metadata, using an RDF graph”. We talked about this before - my
>> question here is whether oa:annotatedBy indicates who formed the
>> relationship (the ‘author’ of the conceptual annotation) or the person who
>> (using some OA aware tools) formalized this as an oa:Annotation data
>> structure (the RDF structure).
>> Best,
>> Paolo
>> [1] http://www.openannotation.org/spec/core/core.html#Provenance
>> [2] http://arxiv.org/abs/1304.7224
>> [3] http://code.google.com/p/pav-ontology/
>> --
>> Dr. Paolo Ciccarese
>> http://www.paolociccarese.info/
>> Biomedical Informatics Research & Development
>> Instructor of Neurology at Harvard Medical School
>> Assistant in Neuroscience at Mass General Hospital
>> Member of the MGH Biomedical Informatics Core
>> +1-857-366-1524 (mobile)   +1-617-768-8744 (office)
> --
> Robert Bolick
> Books On Books <http://www.scoop.it/t/books-on-books> site
> My Profile <http://uk.linkedin.com/pub/robert-bolick/4/8bb/ba2> site

Dr. Paolo Ciccarese
Biomedical Informatics Research & Development
Instructor of Neurology at Harvard Medical School
Assistant in Neuroscience at Mass General Hospital
Member of the MGH Biomedical Informatics Core
+1-857-366-1524 (mobile)   +1-617-768-8744 (office)
Received on Thursday, 15 August 2013 01:23:23 UTC
