Re: OA and provenance from Paolo Ciccarese on 2013-08-19 (public-openannotation@w3.org from August 2013)

From: Paolo Ciccarese <paolo.ciccarese@gmail.com>
Date: Mon, 19 Aug 2013 11:51:44 -0400
To: Leyla Jael García Castro <leylajael@gmail.com>
Cc: Robert Sanderson <azaroth42@gmail.com>, Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>, public-openannotation <public-openannotation@w3.org>
Message-ID: <CAFPX2kCw5xAxqedcqYjRuON8uG1_Hk9-SC6aa5QqewUavfj14A@mail.gmail.com>
Rob,
let's say I go with the two annotations you propose and I have this (your
first annotation):

<anno1> a oa:Annotation ;
  oa:hasBody <uuid1> ;    # PhysicalTextWrittenByDarwin
  oa:hasTarget <uuid2> ;  # PhysicalTextWrittenByAuthorA
  oa:motivatedBy oa:commenting ;
  oa:annotatedBy <darwin> .

<uuid1> a xxx:PhysicalText ;
  dc:creator <darwin> .

<uuid2> a xxx:PhysicalText ;
  dc:creator <authorA> .

What is the provenance of the digital artifact? Darwin created the physical
representation.
Who contributed the digital one above?
In my application I would still need at least a 'curatedBy' pointing to
whoever wrote the above triples.
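
To make the gap concrete, here is a minimal Python sketch (plain tuples, no
RDF library) of the first annotation with one extra statement for the person
who wrote the triples. The <student1> identifier and the choice of
pav:curatedBy for that role are assumptions for illustration only, not
something the OA spec prescribes.

```python
# Triples as plain (subject, predicate, object) tuples; no RDF library needed.
# <student1> and the use of pav:curatedBy here are illustrative assumptions.
triples = [
    ("<anno1>", "rdf:type", "oa:Annotation"),
    ("<anno1>", "oa:hasBody", "<uuid1>"),
    ("<anno1>", "oa:hasTarget", "<uuid2>"),
    ("<anno1>", "oa:annotatedBy", "<darwin>"),   # the physical act of annotating
    ("<anno1>", "pav:curatedBy", "<student1>"),  # who wrote these triples
    ("<uuid1>", "dc:creator", "<darwin>"),
    ("<uuid2>", "dc:creator", "<authorA>"),
]


def agents(subject, predicate):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


physical_annotator = agents("<anno1>", "oa:annotatedBy")
digital_curator = agents("<anno1>", "pav:curatedBy")
print(physical_annotator, digital_curator)
```

Without the pav:curatedBy statement the second lookup comes back empty, which
is exactly the missing provenance I am asking about.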

Leyla, I think for text mining results/entity recognition you can still
have one annotation.
If I understand your scenario correctly, I would just do:

<ann1>
      oa:annotatedBy <AgentE> ;
      pav:createdBy <AgentA> ;
      # Domeo triggered the algorithm and normalized the results in my case
      pav:createdWith <Domeo> .

or

<ann1>
      oa:annotatedBy <AgentE> ;
      oa:annotatedBy <AgentA> ;
      pav:createdBy <AgentA> ;
      pav:contributedBy <AgentE> ;
      # Domeo triggered the algorithm and normalized the results in my case
      pav:createdWith <Domeo> .

I don't see the point of creating the 'canonical' annotation in this case.
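
As a rough sketch of how an application could emit either variant from one
code path, the Python below renders a single annotation with its OA and PAV
provenance as Turtle text. The function name single_annotation_turtle and its
parameters are hypothetical; this is plain string formatting, not an RDF
library and not how Domeo actually does it.

```python
def single_annotation_turtle(annotation, annotators, creator, tool, contributors=()):
    """Render one annotation with OA and PAV provenance as Turtle text.

    All arguments are bare identifiers (hypothetical names such as "AgentE");
    they are wrapped in angle brackets as relative IRIs.
    """
    pairs = [("oa:annotatedBy", a) for a in annotators]
    pairs.append(("pav:createdBy", creator))
    pairs += [("pav:contributedBy", c) for c in contributors]
    pairs.append(("pav:createdWith", tool))
    body = " ;\n".join(f"    {p} <{o}>" for p, o in pairs)
    return f"<{annotation}>\n{body} ."


# First variant: the recognition tool E is the annotator, A created the record.
first = single_annotation_turtle("ann1", ["AgentE"], "AgentA", "Domeo")

# Second variant: both agents are annotators and E is also a contributor.
second = single_annotation_turtle(
    "ann1", ["AgentE", "AgentA"], "AgentA", "Domeo", contributors=["AgentE"]
)
print(first)
print(second)
```

Either way there is still only one <ann1> resource; the variants differ only
in which agents are listed under each property.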

Paolo



On Mon, Aug 19, 2013 at 11:38 AM, Leyla Jael García Castro <leylajael@gmail.com> wrote:

> Hi Robert, all,
>
> Would you also recommend having two annotations if the annotators are
> software agents?
>
> Let me describe the scenario. An agent A takes a portion of text from
> resource R, and sends it to an entity recognition tool E so E will identify
> some terms and will associate them to a concept in an ontology. At the end
> A parses what is retrieved from E and serializes the annotation(s).
>
> Using PAV, I ended up with something similar to what Paolo proposed for
> Darwin's case, <annotation> pav:authoredBy <E>, and <annotation>
> pav:createdBy <A>. Using OA, would two annotations be the way? If possible,
> I would rather have only one annotation.
>
> Thanks,
> Leyla
>
>
>
> On Mon, Aug 19, 2013 at 4:10 PM, Robert Sanderson <azaroth42@gmail.com> wrote:
>
>> Sorry for jumping in late, I was on vacation last week and offline.
>>
>> To quickly re-express the requirement:  There is a physical object
>> with some text (by Author A), and an annotation written on the object
>> about that text (by Darwin). That physical annotation is transcribed
>> as a digital annotation (by Student 1). Maintaining all of the actors
>> and objects is important.
>>
>> To me this is multiple annotations, but slightly different from the
>> ones that Stian proposes.
>>
>> Actors: AuthorA, Darwin, Student1
>> Objects: PhysicalTextWrittenByAuthorA, PhysicalTextWrittenByDarwin,
>> DigitalTextTranscribedByStudent1, (and potentially the physical page
>> on which the physical texts were written)
>>
>> Annotation 1 records that there is some text of Author A, and some
>> text of Darwin, with a link between the two (the Annotation).
>>
>> <anno1> a oa:Annotation ;
>>   oa:hasBody <uuid1> ;    # PhysicalTextWrittenByDarwin
>>   oa:hasTarget <uuid2> ;  # PhysicalTextWrittenByAuthorA
>>   oa:motivatedBy oa:commenting ;
>>   oa:annotatedBy <darwin> .
>>
>> <uuid1> a xxx:PhysicalText ;
>>   dc:creator <darwin> .
>>
>> <uuid2> a xxx:PhysicalText ;
>>   dc:creator <authorA> .
>>
>> This is the model of the real world physical object.  Darwin wrote
>> some text about something that AuthorA wrote, and by the act of
>> writing it on the object it's an Annotation thus Darwin is the
>> annotator and the motivation is commenting (or similar).  However
>> these are /physical/ things, not the digital transcription.  As with
>> any RDF description of real world objects or concepts, there's a
>> disconnect between the description and the thing itself.
>>
>> And thus we need the transcription as a separate digital annotation:
>>
>> <anno2> a oa:Annotation ;
>>   oa:hasBody <transcription.txt> ;  # DigitalTextTranscribedByStudent1
>>   oa:hasTarget <uuid1> ;
>>   oa:motivatedBy domeo:transcribing ;
>>   oa:annotatedBy <student1> .
>>
>> <transcription.txt> a cnt:ContentAsText, dcterms:Text ;
>>   cnt:chars "... Darwin's text here ..." .
>>   # (It doesn't really matter who the dc:creator is for this content, as
>>   # all the actors are above.)
>>
>>
>> If you wanted to express it in terms of Shared Canvas, then you would
>> introduce a Canvas to explicitly represent the physical page rather
>> than just an identifier for the text itself, and the uuids would
>> become segments of it.  The only other difference would be the
>> motivation of <anno2> would be sc:painting.  Then you would associate
>> the digitized image with the Canvas as a digital representation of the
>> physical page, using another Annotation also with motivation
>> sc:painting.
>>
>> Hope that helps,
>>
>> Rob
>>
>>
>> On Thu, Aug 15, 2013 at 3:44 AM, Stian Soiland-Reyes
>> <soiland-reyes@cs.manchester.ac.uk> wrote:
>> > With my provenance hat on, I think this all depends on what is the
>> > scope of an oa:Annotation and its creation.
>> >
>> > We have the same challenge with provenance of entities and documents
>> > in general - if I write a letter in Word on Monday, and you (Paolo)
>> > print it out on paper on Tuesday, and then on Wednesday Robert puts it
>> > in an envelope and mails it, then who 'created' that thing that pops
>> > in through the mailbox at the recipient?
>> >
>> > Well it depends what you consider that thing to be - as an envelope
>> > with something inside, Robert made it, on Wednesday. As a printed
>> > letter (which happens to have an envelope in transit), Paolo made it on
>> > Tuesday, and as a conceptual letter, I wrote it on Monday. In a PROV
>> > setting, we recommend everyone to think carefully about the extent of
>> > their entity, in a way determining their life-span and what
>> > aspects/attributes can be considered mutable or fixed. If more than
>> > one kind of characterization is deemed necessary, then PROV has the
>> > concepts of specialization and alternates to relate them to
>> > each-other: http://www.w3.org/TR/prov-dm/#component5
>> >
>> > Now at first glance I think this sounds like one of those use cases
>> > where you would need multiple characterizations to model the
>> > provenance correctly. A quick go:
>> >
>> > <origAnno1> a oa:Annotation ;
>> >   oa:annotatedBy <OriginalAuthor> ;
>> >   oa:hasTarget <somebook> .
>> >
>> > <anno1> a oa:Annotation ;
>> >   oa:annotatedBy <Paolo> ;
>> >   prov:specializationOf <origAnno1> ;
>> >   oa:hasTarget <somebook> .
>> >
>> > This does seem like a bit of duplication - and also a bit strange
>> > considering both <origAnno1> and <anno1> are expressed as
>> > oa:Annotations. This kind of split-up of the annotation could however
>> > make sense in cases where the body/target are also at different
>> > specialization levels:
>> >
>> > <conceptualAnno1> a oa:Annotation ;
>> >   oa:annotatedBy <OriginalAuthor> ;
>> >   oa:hasBody <note.txt> ;
>> >   oa:hasTarget <isbn:0-85131-041-9> .
>> >
>> > <instanceAnno1> a oa:Annotation ;
>> >   oa:annotatedBy <MrLibrarian> ;
>> >   oa:hasBody <scannedNote.jpeg> ;
>> >   oa:hasTarget <redBookOnShelf5> ;
>> >   prov:specializationOf <conceptualAnno1> .
>> >
>> > <note.txt> prov:alternateOf <scannedNote.jpeg> ;
>> >     prov:wasDerivedFrom <scannedNote.jpeg> .
>> >
>> > <redBookOnShelf5> prov:specializationOf <isbn:0-85131-041-9> .
>> >
>> >
>> > (This could be expanded with the full FRBR model or equivalent)
>> >
>> >
>> > We have discussed conceptual vs representational oa:Annotations earlier:
>> >
>> >
>> > http://lists.w3.org/Archives/Public/public-openannotation/2013Jan/0051.html
>> >
>> > http://lists.w3.org/Archives/Public/public-openannotation/2013Jan/0027.html
>> >
>> > and the conclusion seemed to have been that it is simpler to merge the
>> > conceptual annotation with the formalized annotation as a
>> > datastructure.
>> >
>> > However, the discussion then did not delve into the provenance aspects
>> > - what we still need to keep somewhat clear is what the two provenance
>> > aspects we do provide cover for, annotatedBy/At and serialisedBy/At.
>> > We have a PROV unrolling of these at
>> > http://www.openannotation.org/spec/core/appendices.html#ProvMapping:
>> >
>> >> There are two Entities in the Open Annotation model, which for
>> >> expediency and simplicity are collapsed into just oa:Annotation. These are
>> >> the Annotation document, and the concept that the Annotation embodies or
>> >> describes. This is the distinction between oa:annotatedBy and
>> >> oa:annotatedAt, versus oa:serializedBy and oa:serializedAt.
>> >
>> > OK - the wording order here is wrong (annotation/document and
>> > concept/serialized) - perhaps something to fix! But basically it says
>> > that annotated* is who created it conceptually - so in your case:
>> >
>> >   <ann1>  oa:annotatedBy <OriginalAuthor> ;
>> >           oa:serializedBy <Domeo> .
>> >
>> > The reasoning being that it was OriginalAuthor who created the
>> > relation between the body (his note) and the book (where he wrote his
>> > note) - we consider the oa:Annotation as a conceptual entity that was
>> > formed all those years ago, long before RDF was invented.
>> >
>> > To record the digital formation of the oa:Annotation data structure as
>> > distinct from its 'authorship', then you would need to use other
>> > provenance properties - pav:curatedBy and pav:createdBy sound like
>> > good matches. I would not put <Paolo> as the serializer, unless he
>> > more directly typed in the RDF.
>> >
>> > (Another practical consideration - I would side with Antoine here and
>> > keep oa:serializedBy at RDF Graph level, so even if Paolo typed in
>> > Turtle and Domeo put out RDF/XML, then it would still be serializedBy
>> > <Paolo>.)
>> >
>> >
>> > This said - there should not be anything in OA that prevents my
>> > expanded form with specialization - but of course then you have to be
>> > much more careful. You might wonder what this would mean for
>> > interoperability - well, an annotation means different things in
>> > different systems and domains. For instance in my application, Wf4Ever
>> > research objects, we even have annotations where the body is just an
>> > RDF graph to declare the rdf:type of a resource - we needed something
>> > like OA to structure this, because such statements could be made by a
>> > user in the UI (and thus error-prone but more authoritative), or
>> > inferred by automatic scripts (which might be guessing wrongly).
>> >
>> >
>> >
>> > On 14 August 2013 15:00, Paolo Ciccarese <paolo.ciccarese@gmail.com> wrote:
>> >> Dear all,
>> >> I would like to share a solution that I am currently implementing in Domeo
>> >> in relation to provenance and a question related to it. Apologies in advance
>> >> for the length of the email.
>> >>
>> >> Use Case: I am dealing with an existing annotation that is written on paper.
>> >> The author of the annotation can be the author of the original manuscript or
>> >> a third party (let's assume the latter for this example). The annotation is
>> >> anchored in a specific location of the original text. My user is
>> >> transforming that annotation into an OA annotation. It is very similar to
>> >> Darwin's annotation in the specs [1] but I reached a slightly different
>> >> conclusion.
>> >>
>> >> I would like to keep track of:
>> >> - the agent that creates the OA annotation
>> >> - the application the agent used to create the annotation (could be
>> >> different than the application that serialized the annotation)
>> >> - the author of the body of the annotation (third party)
>> >> - the author of the original association of the annotation with the original
>> >> text
>> >>
>> >> In Domeo I use PAV (Provenance Authoring and Versioning ontology) [2][3] and
>> >> I append to the oa:Annotation the following properties
>> >>
>> >> 1) pav:createdBy -> Domeo user
>> >> An agent primarily responsible for encoding the digital artifact or resource
>> >> representation. This creation is distinct from forming the content, which is
>> >> indicated with pav:contributedBy or its subproperties.
>> >> It is more specific than dct:creator - which might or might not be
>> >> interpreted to also cover the creation of the content of the artifact.
>> >>
>> >> 2) pav:createdOn -> When the Domeo user created the digital object
>> >> The date of creation of the digital artifact or resource representation. The
>> >> agents responsible can be indicated with pav:createdBy.
>> >>
>> >> 3) pav:createdAt -> Where the user created the digital object
>> >> The geo-location of the agent that created the annotation.
>> >>
>> >> 4) pav:createdWith -> In my case the Domeo tool
>> >> The software/tool used by the creator (pav:createdBy) when making the
>> >> digital resource, for instance a word processor or an annotation tool. A
>> >> more independent software agent that creates the resource without direct
>> >> interactions by a human creator should instead be indicated using
>> >> pav:createdBy.
>> >>
>> >> 5) pav:authoredBy -> The author of the original annotation on paper
>> >> Indicates an agent that originated or gave existence to the work that is
>> >> expressed by the digital resource. The author of the content of a resource
>> >> may be different from the creator of that resource representation
>> >> (pav:createdBy), although they are often the same. The author is usually not
>> >> a software agent (which would be indicated with pav:createdWith,
>> >> pav:createdBy or pav:importedBy), unless the software actually authored the
>> >> content itself; for instance an artificial intelligence algorithm which
>> >> authored a piece of music or a machine learning algorithm that authored a
>> >> classification of a tumor sample.
>> >>
>> >> 6) pav:authoredOn -> The date of the original annotation
>> >> Indicates the date this resource was authored by the agents given by
>> >> pav:authoredBy. Note that pav:authoredOn is different from pav:createdOn,
>> >> although their values are often the same.
>> >>
>> >> In summary I have something like:
>> >>
>> >> <ann1> a oa:Annotation
>> >>    pav:createdBy -Paolo-
>> >>    pav:createdOn -today-
>> >>    pav:createdWith -Domeo-
>> >>    pav:createdAt -Boston location-
>> >>    pav:authoredBy -Annotation’s author-
>> >>    pav:authoredOn -Date of the original annotation-
>> >>
>> >> In other words, using PAV I can keep the distinction between the creator of
>> >> the digital artifact and the author of the original content/association.
>> >>
>> >> However, there are possibly a couple of overlaps with the current OA
>> >> properties. As I would like to provide the OA provenance as well, I am
>> >> wondering which of the following applies:
>> >> <ann1> a oa:Annotation ;
>> >>     oa:annotatedBy <Paolo> .
>> >> or
>> >> <ann1> a oa:Annotation ;
>> >>     oa:annotatedBy <OriginalAuthor> .
>> >>
>> >> Or compared to PAV:
>> >> - pav:createdBy =? oa:annotatedBy --or--
>> >> - pav:authoredBy =? oa:annotatedBy
>> >>
>> >> Looking at the Darwin’s example in the specs, if the student is digitizing a
>> >> note from Darwin on his own content I would say:
>> >> <ann2> a oa:Annotation
>> >>    pav:createdBy -Student-
>> >>    pav:createdOn -2013-
>> >>    pav:createdWith -Domeo-
>> >>    pav:createdAt -Boston location-
>> >>    pav:authoredBy -Darwin-
>> >>    pav:authoredOn -Date of the original annotation-
>> >>
>> >> Then of course the ‘body’ of the annotation can be also authored by the
>> >> original author of the annotation. But, as pointed out above, it is
>> >> important for me to attribute also the association of body and target to the
>> >> original author as that represents the historical provenance of it.
>> >>
>> >> What this comes down to is basically what an oa:Annotation really is: “an
>> >> Annotation expresses the relationship between two or more resources, and
>> >> their metadata, using an RDF graph”. We talked about this before - my
>> >> question here becomes if oa:annotatedBy indicates who formed the
>> >> relationship (the ‘author’ of the conceptual annotation); or the person who
>> >> (using some OA aware tools) formalized this as an oa:Annotation data
>> >> structure (the RDF structure)?
>> >>
>> >> Best,
>> >> Paolo
>> >>
>> >>
>> >> [1] http://www.openannotation.org/spec/core/core.html#Provenance
>> >> [2] http://arxiv.org/abs/1304.7224
>> >> [3] http://code.google.com/p/pav-ontology/
>> >>
>> >>
>> >> --
>> >> Dr. Paolo Ciccarese
>> >> http://www.paolociccarese.info/
>> >> Biomedical Informatics Research & Development
>> >> Instructor of Neurology at Harvard Medical School
>> >> Assistant in Neuroscience at Mass General Hospital
>> >> Member of the MGH Biomedical Informatics Core
>> >> +1-857-366-1524 (mobile)   +1-617-768-8744 (office)
>> >>
>> >> CONFIDENTIALITY NOTICE: This message is intended only for the addressee(s),
>> >> may contain information that is considered
>> >> to be sensitive or confidential and may not be forwarded or disclosed to any
>> >> other party without the permission of the sender.
>> >> If you have received this message in error, please notify the sender
>> >> immediately.
>> >
>> >
>> >
>> > --
>> > Stian Soiland-Reyes, myGrid team
>> > School of Computer Science
>> > The University of Manchester
>> > http://soiland-reyes.com/stian/work/ http://orcid.org/0000-0001-9842-9718
>> >
>>
>>
>


Received on Monday, 19 August 2013 15:52:13 UTC