
Re: OA and provenance

From: Leyla Jael García Castro <leylajael@gmail.com>
Date: Fri, 16 Aug 2013 13:38:27 +0100
Message-ID: <CACLxDV7tTiq37BhAoSw5DbTzBdZuBd2yKB4yZ4g58U129Tojig@mail.gmail.com>
To: Paolo Ciccarese <paolo.ciccarese@gmail.com>
Cc: Antoine Isaac <aisaac@few.vu.nl>, public-openannotation <public-openannotation@w3.org>
Hi Paolo, all,

Some comments on Paolo's last mail:

> What I could think of doing now is:
>
> <ann1>
>       oa:annotatedBy <Darwin>
>       oa:annotatedBy <Student>
>       pav:authoredBy <Darwin>
>       pav:curatedBy <Student>
>
> It is redundant but that way the semantics is clear for both OA and PAV
> and it allows OA clients to get to the provenance.
> What do you think of it?
>

I would say pav:curatedBy makes sense only if the student is doing
something additional to just creating the annotation in Domeo. In another
scenario, rather than a student you could have a software agent creating
annotations that were retrieved from somewhere else (an entity-recognition
tool or a digital version of Darwin's annotated text). In that case I would
go for pav:createdBy <a software agent> and pav:authoredBy <Darwin> (or
pav:authoredBy <recognition tool>). Does that make sense to you?
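
A minimal Turtle sketch of the software-agent case described above. The
ex: IRIs (ex:ann1, ex:ImportAgent, ex:Darwin) are hypothetical placeholders,
and the prefix IRIs are assumptions about the OA and PAV namespaces rather
than something stated in this thread:

```turtle
@prefix oa:  <http://www.w3.org/ns/oa#> .
@prefix pav: <http://purl.org/pav/> .
@prefix ex:  <http://example.org/> .

# Hypothetical annotation imported by a software agent.
ex:ann1 a oa:Annotation ;
    pav:createdBy  ex:ImportAgent ;   # agent that produced the digital artifact
    pav:authoredBy ex:Darwin .        # author of the original annotation content
```

If the content itself came from an entity-recognition tool, pav:authoredBy
would point to that tool instead of ex:Darwin.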


>
>> 3. Consider that the role of Darwin is minor (very borderline maybe). In
>> this case the student is the oa:annotatedBy, and Darwin a mere
>> dc:contributor.
>>
>
> Frankly I feel uncomfortable with this approach. But it is true, it depends
> on how you intend the annotation.
>

I agree with Paolo: for the particular case that he has described, I would
also feel uncomfortable with dc:contributor <Darwin>.

Cheers,
Leyla


On Fri, Aug 16, 2013 at 1:18 PM, Paolo Ciccarese
<paolo.ciccarese@gmail.com> wrote:

> Hi Antoine,
>
> On Fri, Aug 16, 2013 at 7:12 AM, Antoine Isaac <aisaac@few.vu.nl> wrote:
>
>> Hi,
>>
>> Wow, the return of a serious discussion, in an even more complex form,
>> awesome ;-)
>>
>> First a remark on Stian's comment:
>>
>> "and the conclusion seemed to have been that it is simpler to merge the
>> conceptual annotation with the formalized annotation as a
>> datastructure."
>>
>> Yes, and this was about the data structure only. The annotation is really
>> of conceptual nature. We just allow for attributes (e.g. oa:serializedBy)
>> that shortcut some provenance info. A full, correct representation has the
>> serialization appear as a fully-fledged (PROV) entity, distinct from the
>> oa:Annotation, as pictured at
>> http://www.openannotation.org/spec/core/appendices.html#ProvMapping
>>
>> Based on this indeed pav:authoredBy is a sub-property of oa:annotatedBy
>> (or an equivalent, in the specific context).
>>
>
> I would say it is more an equivalent as pav:authoredBy is not only for
> annotations.
>
>
>>
>> The question next is how to handle the extra level of "digital
>> annotation" - the guy who captures the annotation in the system (I'll just
>> focus on the "creator" aspect, the discussion is long enough, let's ignore
>> "with" "at" and "on").
>>
>> I like Jacco's and Stian's suggestions of double annotations (whether one
>> is the target or the body or the other...). It is complex, but it
>> represents the situation quite well. In this case both are annotators.
>>
>
> This would complicate the implementation though. In some sense, I see the
> "extra level of "digital annotation"" more as extra level of provenance so
> I can stay 'compact'.
> We had a similar issue with Claims representation. You have the conceptual
> Claim and then multiple embodiments of that claim in text. It is a tough
> problem.
>
> But sure, you could think of both of them as annotators. Not sure Darwin
> would like that but I cannot talk for him :)
>
>
>> An alternative is to create one annotation (oa:annotatedBy Darwin) and
>> another non-annotation resource. Something similar to the PROV entity we
>> have for the serialization. It would represent the act of capturing an
>> annotation in the system, where the student plays the creator role.
>>
>> In any case, that's two resources.
>>
>> But as for the serialization case, you may want to have only one resource
>> in a 'core' solution. Two options here:
>>
>> 1. Consider that the oa:Annotation is the result of the intellectual work
>> of both Darwin and the student. In this case both are the object of an
>> oa:annotatedBy. I think this choice is borderline, but in a specific
>> application context, where students spend hard work
>> deciphering/interpreting an annotation, why not?
>> If you want to use pav:curatedBy still, then you would need to have it a
>> sub-property of oa:annotatedBy
>>
>
> We cannot really do that as pav:curatedBy is also used for objects that
> are not annotations.
> What I could think of doing now is:
>
> <ann1>
>       oa:annotatedBy <Darwin>
>       oa:annotatedBy <Student>
>       pav:authoredBy <Darwin>
>       pav:curatedBy <Student>
>
> It is redundant but that way the semantics is clear for both OA and PAV
> and it allows OA clients to get to the provenance.
> What do you think of it?
>
>
>>
>> 2. Consider that the role of the student is minor. In this case, I think
>> a property with a name like pav:curatedBy still makes sense. But it would
>> be a specialization of something more general, maybe dc:contributor. And
>> its semantics would in fact be that of a "short-cut" for the more complex
>> situation where a second annotation (or a PROV entity) exists to represent
>> the situation at the right granularity.
>>
>
> In PAV most of the properties are short-cuts. The idea is to have a single
> object rather than a series of them. It does not solve everything, but it
> works for many use cases.
> At the moment pav:curatedBy is sub-property of prov:wasAttributedTo and
> also dct:contributor. So I think we are on the same page.
>
>
>>
>> 3. Consider that the role of Darwin is minor (very borderline maybe). In
>> this case the student is the oa:annotatedBy, and Darwin a mere
>> dc:contributor.
>>
>
> Frankly I feel uncomfortable with this approach. But it is true, it
> depends on how you intend the annotation.
>
>
>>
>> In any case I don't think you can do anything practical with a solution
>> that would only have one resource of type oa:Annotation and a short-cut
>> property with a name like pav:createdBy. The name and intuitive semantics
>> are really too close to dc:creator and oa:annotatedBy (as the creator of
>> the Annotation)! In fact pav:curatedBy is much better, which is why I think
>> it could be defined as a short-cut in option 2 above.
>>
>
> In PAV, pav:createdBy is used for the digital artifact only, while
> pav:authoredBy, pav:curatedBy and so on are for the content of the
> artifact.
> So you can have both pav:createdBy (person that created the artifact) and
> pav:curatedBy (person that collected and curated its content).
>
>
>>
>> Note that PAV mentions dct:createdBy as the super-property of
>> pav:createdBy, which to my knowledge does not exist. In fact I really
>> believe PAV would benefit from removing pav:createdBy. If you need it,
>> re-introduce it with a better name, and clearer semantics!
>>
>
> That is just a typo in the description; it should be dct:creator. pav:createdBy is
> more specific than dct:creator as it refers only to the digital artifact.
>
> Best,
> Paolo
>
>
>>
>>> Dear all,
>>> I would like to share a solution that I am currently implementing in
>>> Domeo in relation to provenance and a question related to it. Apologies in
>>> advance for the length of the email.
>>>
>>> Use Case: I am dealing with an existing annotation that is written on
>>> paper. The author of the annotation can be the author of the original
>>> manuscript or a third party (let's assume the latter for this example). The
>>> annotation is anchored in a specific location of the original text. My user
>>> is transforming that annotation into an OA annotation. It is very similar to
>>> Darwin's annotation in the specs [1] but I got to a slightly different
>>> conclusion.
>>>
>>> I would like to keep track of:
>>> - the agent that creates the OA annotation
>>> - the application the agent used to create the annotation (could be
>>> different than the application that serialized the annotation)
>>> - the author of the body of the annotation (third party)
>>> - the author of the original association of the annotation with the
>>> original text
>>>
>>> In Domeo I use PAV (Provenance Authoring and Versioning ontology) [2][3]
>>> and I append to the oa:Annotation the following properties
>>>
>>> 1) pav:createdBy -> Domeo user
>>> An agent primarily responsible for encoding the digital artifact or
>>> resource representation. This creation is distinct from forming the
>>> content, which is indicated with pav:contributedBy or its subproperties.
>>> It is more specific than dct:createdBy - which might or might not be
>>> interpreted to also cover the creation of the content of the artifact.
>>>
>>> 2) pav:createdOn -> When the Domeo user created the digital object
>>> The date of creation of the digital artifact or resource representation.
>>> The agents responsible can be indicated with pav:createdBy.
>>>
>>> 3) pav:createdAt -> Where the user created the digital object
>>> The geo-location of the agent that created the annotation.
>>>
>>> 4) pav:createdWith -> In my case the Domeo tool
>>> The software/tool used by the creator (pav:createdBy) when making the
>>> digital resource, for instance a word processor or an annotation tool. A
>>> more independent software agent that creates the resource without direct
>>> interactions by a human creator should instead be indicated using
>>> pav:createdBy.
>>>
>>> 5) pav:authoredBy -> The author of the original annotation on paper
>>> Indicates an agent that originated or gave existence to the work that is
>>> expressed by the digital resource. The author of the content of a resource
>>> may be different from the creator of that resource representation
>>> (pav:createdBy), although they are often the same. The author is usually
>>> not a software agent (which would be indicated with pav:createdWith,
>>> pav:createdBy or pav:importedBy), unless the software actually authored the
>>> content itself; for instance an artificial intelligence algorithm which
>>> authored a piece of music or a machine learning algorithm that authored a
>>> classification of a tumor sample.
>>>
>>> 6) pav:authoredOn -> The date of the original annotation
>>> Indicates the date this resource was authored by the agents given by
>>> pav:authoredBy. Note that pav:authoredOn is different from pav:createdOn,
>>> although their values are often the same.
>>>
>>> In summary I have something like:
>>>
>>> <ann1> a oa:Annotation
>>>     pav:createdBy -Paolo-
>>>     pav:createdOn -today-
>>>     pav:createdWith -Domeo-
>>>     pav:createdAt -Boston location-
>>>     pav:authoredBy -Annotation’s author-
>>>     pav:authoredOn -Date of the original annotation-
>>>
>>> In other words, using PAV I can keep the distinction between the creator
>>> of the digital artifact and the author of the original content/association.
>>>
>>> However, there are possibly a couple of overlaps with the current OA
>>> properties. As I would like to provide the OA provenance as well, I am
>>> wondering which of the following applies:
>>> <ann1> a oa:Annotation ;
>>>      oa:annotatedBy <Paolo> .
>>> or
>>> <ann1> a oa:Annotation ;
>>>      oa:annotatedBy <OriginalAuthor> .
>>>
>>> Or compared to PAV:
>>> - pav:createdBy =? oa:annotatedBy --or--
>>> - pav:authoredBy =? oa:annotatedBy
>>>
>>> Looking at Darwin’s example in the specs, if the student is
>>> digitizing a note from Darwin on his own content I would say:
>>> <ann2> a oa:Annotation
>>>     pav:createdBy -Student-
>>>     pav:createdOn -2013-
>>>     pav:createdWith -Domeo-
>>>     pav:createdAt -Boston location-
>>>     pav:authoredBy -Darwin-
>>>     pav:authoredOn -Date of the original annotation-
>>>
>>> Then of course the ‘body’ of the annotation can be also authored by the
>>> original author of the annotation. But, as pointed out above, it is
>>> important for me to attribute also the association of body and target to
>>> the original author as that represents the historical provenance of it.
>>>
>>> What this comes down to is basically what an oa:Annotation really is:
>>> “an Annotation expresses the relationship between two or more resources,
>>> and their metadata, using an RDF graph”. We talked about this before - my
>>> question here becomes whether oa:annotatedBy indicates who formed the
>>> relationship (the ‘author’ of the conceptual annotation); or the person who
>>> (using some OA aware tools) formalized this as an oa:Annotation data
>>> structure (the RDF structure)?
>>>
>>> Best,
>>> Paolo
>>>
>>>
>>> [1] http://www.openannotation.org/spec/core/core.html#Provenance
>>> [2] http://arxiv.org/abs/1304.7224
>>> [3] http://code.google.com/p/pav-ontology/
>>>
>>>
>>> --
>>> Dr. Paolo Ciccarese
>>> http://www.paolociccarese.info/
>>> Biomedical Informatics Research & Development
>>> Instructor of Neurology at Harvard Medical School
>>> Assistant in Neuroscience at Mass General Hospital
>>> Member of the MGH Biomedical Informatics Core
>>> +1-857-366-1524 (mobile)   +1-617-768-8744 (office)
>>>
>>>
Received on Friday, 16 August 2013 12:39:15 UTC
