Re: OA and provenance

Hi Antoine,

On Fri, Aug 16, 2013 at 7:12 AM, Antoine Isaac <aisaac@few.vu.nl> wrote:

> Hi,
>
> Wow, the return of a serious discussion, in an even more complex form,
> awesome ;-)
>
> First a remark on Stian's comment:
>
> "and the conclusion seemed to have been that it is simpler to merge the
> conceptual annotation with the formalized annotation as a
> datastructure."
>
> Yes, and this was about the data structure only. The annotation is really
> of conceptual nature. We just allow for attributes (e.g. oa:serializedBy)
> that shortcut some provenance info. A full, correct representation has the
> serialization appear as a fully-fledged (PROV) entity, distinct from the
> oa:Annotation, as pictured at
> http://www.openannotation.org/spec/core/appendices.html#ProvMapping
>
> Based on this indeed pav:authoredBy is a sub-property of oa:annotatedBy
> (or an equivalent, in the specific context).
>

I would say it is more of an equivalent in this specific context, since
pav:authoredBy is not used only for annotations.


>
> The question next is how to handle the extra level of "digital annotation"
> - the guy who captures the annotation in the system (I'll just focus on the
> "creator" aspect, the discussion is long enough, let's ignore "with" "at"
> and "on").
>
> I like Jacco's and Stian's suggestions of double annotations (whether one
> is the target or the body or the other...). It is complex, but it
> represents the situation quite well. In this case both are annotators.
>

This would complicate the implementation, though. In some sense, I see the
extra level of "digital annotation" more as an extra level of provenance, so
I can stay 'compact'.
We had a similar issue with the representation of Claims. You have the
conceptual Claim and then multiple embodiments of that claim in text. It is a
tough problem.

But sure, you could think of both of them as annotators. Not sure Darwin
would like that, but I cannot speak for him :)


> An alternative is to create one annotation (oa:annotatedBy Darwin) and
> another non-annotation resource. Something similar to the PROV entity we
> have for the serialization. It would represent the act of capturing an
> annotation in the system, where the student plays the creator role.
>
> In any case, that's two resources.
>
> But as for the serialization case, you may want to have only one resource
> in a 'core' solution. Two options here:
>
> 1. Consider that the oa:Annotation is the result of the intellectual work
> of both Darwin and the student. In this case both are the object of an
> oa:annotatedBy. I think this choice is borderline, but in a specific
> application context, where students spend hard work
> deciphering/interpreting an annotation, why not?
> If you want to use pav:curatedBy still, then you would need to have it a
> sub-property of oa:annotatedBy
>

We cannot really do that, as pav:curatedBy is also used for objects that are
not annotations.
What I could do for now is:

<ann1>
      oa:annotatedBy <Darwin> ;
      oa:annotatedBy <Student> ;
      pav:authoredBy <Darwin> ;
      pav:curatedBy <Student> .

It is redundant, but that way the semantics are clear for both OA and PAV, and
it allows OA clients to get to the provenance.
What do you think of it?


>
> 2. Consider that the role of the student is minor. In this case, I think a
> property with a name like pav:curatedBy still makes sense. But it would be
> a specialization of something more general, maybe dc:contributor. And its
> semantics would in fact be those of a "short-cut" for the more complex
> situation where a second annotation (or a PROV entity) exists to represent
> the situation at the right granularity.
>

In PAV, most of the properties are short-cuts. The idea is to have a single
object rather than a series of them. It does not solve everything, but it
works for many use cases.
At the moment pav:curatedBy is a sub-property of both prov:wasAttributedTo
and dct:contributor. So I think we are on the same page.
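
A minimal Python sketch of how such a short-cut can be expanded by a client
that only understands the more general properties. It assumes the two
sub-property relations mentioned above; the `expand` helper and the plain
string triples are hypothetical names for illustration, not part of PAV, OA
or any RDF library.

```python
# Sub-property declarations as stated in this thread (hard-coded here for
# illustration, not loaded from the PAV ontology file).
SUB_PROPERTY_OF = {
    "pav:curatedBy": {"prov:wasAttributedTo", "dct:contributor"},
}


def expand(triples, sub_property_of=SUB_PROPERTY_OF):
    """Return the input triples plus those implied by rdfs:subPropertyOf."""
    result = set(triples)
    frontier = list(triples)
    while frontier:
        subject, predicate, obj = frontier.pop()
        for parent in sub_property_of.get(predicate, ()):
            implied = (subject, parent, obj)
            if implied not in result:
                result.add(implied)
                # Re-queue so that chains of sub-properties are followed too.
                frontier.append(implied)
    return result


# The compact form: a single statement on the annotation.
compact = {("ann1", "pav:curatedBy", "Student")}
expanded = expand(compact)
```

With this, a consumer that only looks for dct:contributor or
prov:wasAttributedTo still finds the student, while the annotation itself
carries a single pav:curatedBy statement.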


>
> 3. Consider that the role of Darwin is minor (very borderline maybe). In
> this case the student is the oa:annotatedBy, and Darwin a mere
> dc:contributor.
>

Frankly, I feel uncomfortable with this approach. But it is true that it
depends on how you interpret the annotation.


>
> In any case I don't think you can do anything practical with a solution
> that would only have one resource of type oa:Annotation and a short-cut
> property with a name like pav:createdBy. The name and intuitive semantics
> are really too close to dc:creator and oa:annotatedBy (as the creator of
> the Annotation)! In fact pav:curatedBy is much better, which is why I think
> it could be defined as a short-cut in option 2 above.
>

In PAV, pav:createdBy is used for the digital artifact only, while
pav:authoredBy, pav:curatedBy and so on are for the content of the
artifact.
So you can have both pav:createdBy (the person who created the artifact) and
pav:curatedBy (the person who collected and curated its content).
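
A small Python sketch of that distinction, assuming the grouping of
properties below (it is my reading of the definitions quoted in this thread,
not something taken from the ontology file). The `split_provenance` helper
and the dictionary layout are hypothetical.

```python
# Properties describing the digital artifact itself.
ARTIFACT_LEVEL = {
    "pav:createdBy", "pav:createdOn", "pav:createdAt", "pav:createdWith",
}
# Properties describing the content expressed by the artifact.
CONTENT_LEVEL = {"pav:authoredBy", "pav:authoredOn", "pav:curatedBy"}


def split_provenance(properties):
    """Split a property/value mapping into artifact-level and content-level parts."""
    artifact = {p: v for p, v in properties.items() if p in ARTIFACT_LEVEL}
    content = {p: v for p, v in properties.items() if p in CONTENT_LEVEL}
    return artifact, content


# The Darwin example: the student digitizes and curates, Darwin is the author.
annotation = {
    "pav:createdBy": "Student",
    "pav:createdWith": "Domeo",
    "pav:authoredBy": "Darwin",
    "pav:curatedBy": "Student",
}
artifact, content = split_provenance(annotation)
```

Both groups sit on the same resource, so the same person can appear once as
creator of the artifact and once as curator of its content.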


>
> Note that PAV mentions dct:createdBy as the super-property of
> pav:createdBy, which to my knowledge does not exist. In fact I really
> believe PAV would benefit from removing pav:createdBy. If you need it,
> re-introduce it with a better name, and clearer semantics!
>

That is just a typo in the description; it should be dct:creator.
pav:createdBy is more specific than dct:creator, as it refers only to the
digital artifact.

Best,
Paolo


>
>  Dear all,
>> I would like to share a solution that I am currently implementing in
>> Domeo in relation to provenance and a question related to it. Apologies in
>> advance for the length of the email.
>>
>> Use Case: I am dealing with an existing annotation that is written on
>> paper. The author of the annotation can be the author of the original
>> manuscript or a third party (let's assume the latter for this example). The
>> annotation is anchored in a specific location of the original text. My user
>> is transforming that annotation into an OA annotation. It is very similar to
>> Darwin's annotation in the specs [1], but I came to a slightly different
>> conclusion.
>>
>> I would like to keep track of:
>> - the agent that creates the OA annotation
>> - the application the agent used to create the annotation (could be
>> different than the application that serialized the annotation)
>> - the author of the body of the annotation (third party)
>> - the author of the original association of the annotation with the
>> original text
>>
>> In Domeo I use PAV (Provenance Authoring and Versioning ontology) [2][3]
>> and I append to the oa:Annotation the following properties
>>
>> 1) pav:createdBy -> Domeo user
>> An agent primarily responsible for encoding the digital artifact or
>> resource representation. This creation is distinct from forming the
>> content, which is indicated with pav:contributedBy or its subproperties.
>> It is more specific than dct:createdBy - which might or might not be
>> interpreted to also cover the creation of the content of the artifact.
>>
>> 2) pav:createdOn -> When the Domeo user created the digital object
>> The date of creation of the digital artifact or resource representation.
>> The agents responsible can be indicated with pav:createdBy.
>>
>> 3) pav:createdAt -> Where the user created the digital object
>> The geo-location of the agent that created the annotation.
>>
>> 4) pav:createdWith -> In my case the Domeo tool
>> The software/tool used by the creator (pav:createdBy) when making the
>> digital resource, for instance a word processor or an annotation tool. A
>> more independent software agent that creates the resource without direct
>> interactions by a human creator should instead be indicated using
>> pav:createdBy.
>>
>> 5) pav:authoredBy -> The author of the original annotation on paper
>> Indicates an agent that originated or gave existence to the work that is
>> expressed by the digital resource. The author of the content of a resource
>> may be different from the creator of that resource representation
>> (pav:createdBy), although they are often the same. The author is usually
>> not a software agent (which would be indicated with pav:createdWith,
>> pav:createdBy or pav:importedBy), unless the software actually authored the
>> content itself; for instance an artificial intelligence algorithm which
>> authored a piece of music or a machine learning algorithm that authored a
>> classification of a tumor sample
>>
>> 6) pav:authoredOn -> The date of the original annotation
>> Indicates the date this resource was authored by the agents given by
>> pav:authoredBy. Note that pav:authoredOn is different from pav:createdOn,
>> although their values are often the same.
>>
>> In summary I have something like:
>>
>> <ann1> a oa:Annotation
>>     pav:createdBy -Paolo-
>>     pav:createdOn -today-
>>     pav:createdWith -Domeo-
>>     pav:createdAt -Boston location-
>>     pav:authoredBy -Annotation’s author-
>>     pav:authoredOn -Date of the original annotation-
>>
>> In other words, using PAV I can keep the distinction between the creator
>> of the digital artifact and the author of the original content/association.
>>
>> However, there are possibly a couple of overlaps with the current OA
>> properties. As I would like to provide the OA provenance as well, I am
>> wondering which of the following applies:
>> <ann1> a oa:Annotation ;
>>      oa:annotatedBy <Paolo> .
>> or
>> <ann1> a oa:Annotation ;
>>      oa:annotatedBy <OriginalAuthor> .
>>
>> Or compared to PAV:
>> - pav:createdBy =? oa:annotatedBy --or--
>> - pav:authoredBy =? oa:annotatedBy
>>
>> Looking at the Darwin example in the specs, if the student is
>> digitizing a note from Darwin on his own content I would say:
>> <ann2> a oa:Annotation
>>     pav:createdBy -Student-
>>     pav:createdOn -2013-
>>     pav:createdWith -Domeo-
>>     pav:createdAt -Boston location-
>>     pav:authoredBy -Darwin-
>>     pav:authoredOn -Date of the original annotation-
>>
>> Then of course the ‘body’ of the annotation can also be authored by the
>> original author of the annotation. But, as pointed out above, it is
>> important for me to also attribute the association of body and target to
>> the original author, as that represents its historical provenance.
>>
>> What this comes down to is basically what an oa:Annotation really is: “an
>> Annotation expresses the relationship between two or more resources, and
>> their metadata, using an RDF graph”. We talked about this before. My
>> question here is whether oa:annotatedBy indicates who formed the
>> relationship (the ‘author’ of the conceptual annotation), or the person who
>> (using some OA-aware tool) formalized this as an oa:Annotation data
>> structure (the RDF structure).
>>
>> Best,
>> Paolo
>>
>>
>> [1] http://www.openannotation.org/spec/core/core.html#Provenance
>> [2] http://arxiv.org/abs/1304.7224
>> [3] http://code.google.com/p/pav-ontology/
>>
>>
>> --
>> Dr. Paolo Ciccarese
>> http://www.paolociccarese.info/
>> Biomedical Informatics Research & Development
>> Instructor of Neurology at Harvard Medical School
>> Assistant in Neuroscience at Mass General Hospital
>> Member of the MGH Biomedical Informatics Core
>> +1-857-366-1524 (mobile)   +1-617-768-8744 (office)
>>
>
>
>

Received on Friday, 16 August 2013 12:18:36 UTC