Re: OA and provenance from Antoine Isaac on 2013-08-16 (public-openannotation@w3.org from August 2013)

From: Antoine Isaac <aisaac@few.vu.nl>
Date: Fri, 16 Aug 2013 13:12:46 +0200
To: <public-openannotation@w3.org>
Message-ID: <520E092E.1000403@few.vu.nl>
Hi,

Wow, the return of a serious discussion, in an even more complex form, awesome ;-)

First a remark on Stian's comment:
"and the conclusion seemed to have been that it is simpler to merge the conceptual annotation with the formalized annotation as a
datastructure."

Yes, and this was about the data structure only. The annotation is really of conceptual nature. We just allow for attributes (e.g. oa:serializedBy) that shortcut some provenance info. A full, correct representation has the serialization appear as a fully-fledged (PROV) entity, distinct from the oa:Annotation, as pictured at
http://www.openannotation.org/spec/core/appendices.html#ProvMapping

Based on this, pav:authoredBy is indeed a sub-property of oa:annotatedBy (or an equivalent, in the specific context).
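
To make the sub-property reading concrete, here is a minimal Turtle sketch. It assumes the usual namespace URIs for OA, PAV and RDFS; the ex: resources are made up for illustration, and the sub-property axiom is a local assumption for this context, not something either vocabulary declares.

```turtle
@prefix oa:   <http://www.w3.org/ns/oa#> .
@prefix pav:  <http://purl.org/pav/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .

# Assumption for this context only: neither OA nor PAV states this axiom.
pav:authoredBy rdfs:subPropertyOf oa:annotatedBy .

# Hypothetical annotation, authored (in the PAV sense) by Darwin.
ex:ann2 a oa:Annotation ;
    pav:authoredBy ex:Darwin .

# Under RDFS entailment, a reasoner could then infer:
#   ex:ann2 oa:annotatedBy ex:Darwin .
```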

The question next is how to handle the extra level of "digital annotation" - the guy who captures the annotation in the system (I'll just focus on the "creator" aspect, the discussion is long enough, let's ignore "with" "at" and "on").

I like Jacco's and Stian's suggestions of double annotations (whether one is the target or the body of the other...). It is complex, but it represents the situation quite well. In this case both are annotators.
An alternative is to create one annotation (oa:annotatedBy Darwin) and another non-annotation resource. Something similar to the PROV entity we have for the serialization. It would represent the act of capturing an annotation in the system, where the student plays the creator role.

In any case, that's two resources.
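
A rough Turtle sketch of the two variants, under stated assumptions: all ex: resources are hypothetical, the second annotation is arbitrarily shown as targeting the first (it could equally be related through the body), and ex:captured is an invented property standing in for whatever link would connect the capture to the annotation. Modelling the capture as a prov:Activity is one possible reading, not a settled choice.

```turtle
@prefix oa:   <http://www.w3.org/ns/oa#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.org/> .

# Variant A: double annotation. Darwin's annotation is itself the
# target of a second annotation made by the student.
ex:annDarwin a oa:Annotation ;
    oa:annotatedBy ex:Darwin ;
    oa:hasTarget ex:manuscriptPage .

ex:annStudent a oa:Annotation ;
    oa:annotatedBy ex:Student ;
    oa:hasTarget ex:annDarwin .

# Variant B: one annotation plus a non-annotation resource that
# represents the act of capturing it in the system.
ex:annDarwinB a oa:Annotation ;
    oa:annotatedBy ex:Darwin ;
    oa:hasTarget ex:manuscriptPage .

ex:capture1 a prov:Activity ;
    prov:wasAssociatedWith ex:Student ;
    ex:captured ex:annDarwinB .   # hypothetical property, not from OA or PROV
```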

But as for the serialization case, you may want to have only one resource in a 'core' solution. Three options here:

1. Consider that the oa:Annotation is the result of the intellectual work of both Darwin and the student. In this case both are the object of an oa:annotatedBy. I think this choice is borderline, but in a specific application context, where students spend hard work deciphering/interpreting an annotation, why not?
If you still want to use pav:curatedBy, then you would need to make it a sub-property of oa:annotatedBy.

2. Consider that the role of the student is minor. In this case, I think a property with a name like pav:curatedBy still makes sense. But it would be a specialization of something more general, maybe dc:contributor. And its semantics would in fact be that of a "short-cut" for the more complex situation where a second annotation (or a PROV entity) exists to represent the situation at the right granularity.

3. Consider that the role of Darwin is minor (very borderline maybe). In this case the student is the oa:annotatedBy, and Darwin a mere dc:contributor.
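
For comparison, the three single-resource options above could look roughly like this in Turtle. This is only a sketch: the ex: resources are hypothetical, the namespace URIs are the usual ones for OA, PAV and Dublin Core elements, and nothing here is prescribed by the OA spec.

```turtle
@prefix oa:  <http://www.w3.org/ns/oa#> .
@prefix pav: <http://purl.org/pav/> .
@prefix dc:  <http://purl.org/dc/elements/1.1/> .
@prefix ex:  <http://example.org/> .

# Option 1: Darwin and the student both count as annotators.
ex:annOption1 a oa:Annotation ;
    oa:annotatedBy ex:Darwin, ex:Student .

# Option 2: Darwin is the annotator; the student is recorded through a
# short-cut property standing in for a second resource.
ex:annOption2 a oa:Annotation ;
    oa:annotatedBy ex:Darwin ;
    pav:curatedBy ex:Student .

# Option 3: the student is the annotator; Darwin is a mere contributor.
ex:annOption3 a oa:Annotation ;
    oa:annotatedBy ex:Student ;
    dc:contributor ex:Darwin .
```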
  

In any case I don't think you can do anything practical with a solution that would only have one resource of type oa:Annotation and a short-cut property with a name like pav:createdBy. The name and intuitive semantics are really too close to dc:creator and oa:annotatedBy (as the creator of the Annotation)! In fact pav:curatedBy is much better, which is why I think it could be defined as a short-cut in option 2 above.

Note that PAV gives dct:createdBy as the super-property of pav:createdBy, but to my knowledge dct:createdBy does not exist. In fact I really believe PAV would benefit from removing pav:createdBy. If you need it, re-introduce it with a better name, and clearer semantics!

Cheers,

Antoine


> Dear all,
> I would like to share a solution that I am currently implementing in Domeo in relation to provenance and a question related to it. Apologies in advance for the length of the email.
>
> Use Case: I am dealing with an existing annotation that is written on paper. The author of the annotation can be the author of the original manuscript or a third party (let's assume the latter for this example). The annotation is anchored in a specific location of the original text. My user is transforming that annotation into an OA annotation. It is very similar to Darwin's annotation in the specs [1] but I came to a slightly different conclusion.
>
> I would like to keep track of:
> - the agent that creates the OA annotation
> - the application the agent used to create the annotation (could be different than the application that serialized the annotation)
> - the author of the body of the annotation (third party)
> - the author of the original association of the annotation with the original text
>
> In Domeo I use PAV (Provenance Authoring and Versioning ontology) [2][3] and I append to the oa:Annotation the following properties
>
> 1) pav:createdBy -> Domeo user
> An agent primarily responsible for encoding the digital artifact or resource representation. This creation is distinct from forming the content, which is indicated with pav:contributedBy or its subproperties.
> It is more specific than dct:createdBy - which might or might not be interpreted to also cover the creation of the content of the artifact.
>
> 2) pav:createdOn -> When the Domeo user created the digital object
> The date of creation of the digital artifact or resource representation. The agents responsible can be indicated with pav:createdBy.
>
> 3) pav:createdAt -> Where the user created the digital object
> The geo-location of the agent that created the annotation.
>
> 4) pav:createdWith -> In my case the Domeo tool
> The software/tool used by the creator (pav:createdBy) when making the digital resource, for instance a word processor or an annotation tool. A more independent software agent that creates the resource without direct interactions by a human creator should instead be indicated using pav:createdBy.
>
> 5) pav:authoredBy -> The author of the original annotation on paper
> Indicates an agent that originated or gave existence to the work that is expressed by the digital resource. The author of the content of a resource may be different from the creator of that resource representation (pav:createdBy), although they are often the same. The author is usually not a software agent (which would be indicated with pav:createdWith, pav:createdBy or pav:importedBy), unless the software actually authored the content itself; for instance an artificial intelligence algorithm which authored a piece of music or a machine learning algorithm that authored a classification of a tumor sample.
>
> 6) pav:authoredOn -> The date of the original annotation
> Indicates the date this resource was authored by the agents given by pav:authoredBy. Note that pav:authoredOn is different from pav:createdOn, although their values are often the same.
>
> In summary I have something like:
>
> <ann1> a oa:Annotation
>     pav:createdBy -Paolo-
>     pav:createdOn -today-
>     pav:createdWith -Domeo-
>     pav:createdAt -Boston location-
>     pav:authoredBy -Annotation’s author-
>     pav:authoredOn -Date of the original annotation-
>
> In other words, using PAV I can keep the distinction between the creator of the digital artifact and the author of the original content/association.
>
> However, there are possibly a couple of overlaps with the current OA properties. As I would like to provide the OA provenance as well, I am wondering which of the following applies:
> <ann1> a oa:Annotation ;
>      oa:annotatedBy <Paolo> .
> or
> <ann1> a oa:Annotation ;
>      oa:annotatedBy <OriginalAuthor> .
>
> Or compared to PAV:
> - pav:createdBy =? oa:annotatedBy --or--
> - pav:authoredBy =? oa:annotatedBy
>
> Looking at Darwin’s example in the specs, if the student is digitizing a note from Darwin on his own content I would say:
> <ann2> a oa:Annotation
>     pav:createdBy -Student-
>     pav:createdOn -2013-
>     pav:createdWith -Domeo-
>     pav:createdAt -Boston location-
>     pav:authoredBy -Darwin-
>     pav:authoredOn -Date of the original annotation-
>
> Then of course the ‘body’ of the annotation can also be authored by the original author of the annotation. But, as pointed out above, it is important for me to also attribute the association of body and target to the original author, as that represents its historical provenance.
>
> What this comes down to is basically what an oa:Annotation really is: “an Annotation expresses the relationship between two or more resources, and their metadata, using an RDF graph”. We talked about this before - my question here is whether oa:annotatedBy indicates who formed the relationship (the ‘author’ of the conceptual annotation), or the person who (using some OA-aware tools) formalized this as an oa:Annotation data structure (the RDF structure).
>
> Best,
> Paolo
>
>
> [1] http://www.openannotation.org/spec/core/core.html#Provenance
> [2] http://arxiv.org/abs/1304.7224
> [3] http://code.google.com/p/pav-ontology/
>
>
> --
> Dr. Paolo Ciccarese
> http://www.paolociccarese.info/
> Biomedical Informatics Research & Development
> Instructor of Neurology at Harvard Medical School
> Assistant in Neuroscience at Mass General Hospital
> Member of the MGH Biomedical Informatics Core
> +1-857-366-1524 (mobile)   +1-617-768-8744 (office)
Received on Friday, 16 August 2013 11:13:48 UTC