
Re: OA and provenance

From: Leyla Jael García Castro <leylajael@gmail.com>
Date: Mon, 19 Aug 2013 16:38:14 +0100
Message-ID: <CACLxDV5DnVk26nh7a_R8Hr42abJLQJwbUENS=9yBDVbXG+Bxog@mail.gmail.com>
To: Robert Sanderson <azaroth42@gmail.com>
Cc: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>, Paolo Ciccarese <paolo.ciccarese@gmail.com>, public-openannotation <public-openannotation@w3.org>
Hi Robert, all,

Would you also recommend having two annotations if the annotators are
software agents?

Let me describe the scenario. An agent A takes a portion of text from
resource R and sends it to an entity recognition tool E, so that E
identifies some terms and associates each of them with a concept in an
ontology. At the end, A parses what E returns and serializes the
annotation(s).

Using PAV, I ended up with something similar to what Paolo proposed for
Darwin's case: <annotation> pav:authoredBy <E>, and <annotation>
pav:createdBy <A>. Using OA, would two annotations be the way to go? If
possible, I would rather have only one annotation.
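
To make the scenario concrete, here is a rough sketch in Turtle of the
single-annotation version I have in mind. All the URIs (<anno>, <A>, <E>,
<R#fragment>, <concept>) are placeholders I made up for this example, and
the sketch only shows the PAV side; which OA provenance properties should
go with it is exactly what I am asking about.

<anno> a oa:Annotation ;
  oa:hasBody <concept> ;       # ontology concept that E matched (placeholder)
  oa:hasTarget <R#fragment> ;  # portion of text taken from R (placeholder)
  pav:authoredBy <E> ;         # E proposed the term-to-concept association
  pav:createdBy <A> .          # A parsed E's output and built the annotation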

Thanks,
Leyla



On Mon, Aug 19, 2013 at 4:10 PM, Robert Sanderson <azaroth42@gmail.com> wrote:

> Sorry for jumping in late, I was on vacation last week and offline.
>
> To quickly re-express the requirement:  There is a physical object
> with some text (by Author A), and an annotation written on the object
> about that text (by Darwin). That physical annotation is transcribed
> as a digital annotation (by Student 1). Maintaining all of the actors
> and objects is important.
>
> To me this is multiple annotations, but slightly different from the
> ones that Stian proposes.
>
> Actors: AuthorA, Darwin, Student1
> Objects: PhysicalTextWrittenByAuthorA, PhysicalTextWrittenByDarwin,
> DigitalTextTranscribedByStudent1, (and potentially the physical page
> on which the physical texts were written)
>
> Annotation 1 records that there is some text of Author A, and some
> text of Darwin, with a link between the two (the Annotation).
>
> <anno1> a oa:Annotation ;
>   oa:hasBody <uuid1> ;    # PhysicalTextWrittenByDarwin
>   oa:hasTarget <uuid2> ;  # PhysicalTextWrittenByAuthorA
>   oa:motivatedBy oa:commenting ;
>   oa:annotatedBy <darwin> .
>
> <uuid1> a xxx:PhysicalText ;
>   dc:creator <darwin> .
>
> <uuid2> a xxx:PhysicalText ;
>   dc:creator <authorA> .
>
> This is the model of the real world physical object.  Darwin wrote
> some text about something that AuthorA wrote, and by the act of
> writing it on the object it's an Annotation thus Darwin is the
> annotator and the motivation is commenting (or similar).  However
> these are /physical/ things, not the digital transcription.  As with
> any RDF description of real world objects or concepts, there's a
> disconnect between the description and the thing itself.
>
> And thus we need the transcription as a separate digital annotation:
>
> <anno2> a oa:Annotation ;
>   oa:hasBody <transcription.txt> ;  # DigitalTextTranscribedByStudent1
>   oa:hasTarget <uuid1> ;
>   oa:motivatedBy domeo:transcribing ;
>   oa:annotatedBy <student1> .
>
> <transcription.txt> a cnt:ContentAsText, dcterms:Text ;
>   cnt:chars "... Darwin's text here ..." .
>   # (Doesn't really matter who the dc:creator is for this content, as all
>   # the actors are above)
>
>
> If you wanted to express it in terms of Shared Canvas, then you would
> introduce a Canvas to explicitly represent the physical page rather
> than just an identifier for the text itself, and the uuids would
> become segments of it.  The only other difference would be that the
> motivation of <anno2> would be sc:painting.  Then you would associate
> the digitized image with the Canvas as a digital representation of the
> physical page, using another Annotation also with motivation
> sc:painting.
>
> Hope that helps,
>
> Rob
>
>
> On Thu, Aug 15, 2013 at 3:44 AM, Stian Soiland-Reyes
> <soiland-reyes@cs.manchester.ac.uk> wrote:
> > With my provenance hat on, I think this all depends on what is the
> > scope of an oa:Annotation and its creation.
> >
> > We have the same challenge with provenance of entities and documents
> > in general - if I write a letter in Word on Monday, and you (Paolo)
> > print it out on paper on Tuesday, and then on Wednesday Robert puts it
> > in an envelope and mails it, then who 'created' that thing that pops
> > in through the mailbox at the recipient?
> >
> > Well it depends what you consider that thing to be - as an envelope
> > with something inside, Robert made it, on Wednesday. As a printed
> > letter (which happens to have an envelope in transit), Paolo made it on
> > Tuesday, and as a conceptual letter, I wrote it on Monday. In a PROV
> > setting, we recommend everyone to think carefully about the extent of
> > their entity, in a way determining their life-span and what
> > aspects/attributes can be considered mutable or fixed. If more than
> > one kind of characterization is deemed necessary, then PROV has the
> > concepts of specialization and alternates to relate them to
> > each other: http://www.w3.org/TR/prov-dm/#component5
> >
> > Now at first glance I think this sounds like one of those use cases
> > where you would need multiple characterizations to model the
> > provenance correctly. A quick go:
> >
> > <origAnno1> a oa:Annotation ;
> >   oa:annotatedBy <OriginalAuthor> ;
> >   oa:hasTarget <somebook> .
> >
> > <anno1> a oa:Annotation ;
> >   oa:annotatedBy <Paolo> ;
> >   prov:specializationOf <origAnno1> ;
> >   oa:hasTarget <somebook> .
> >
> > This does seem like a bit of duplication - and also a bit strange
> > considering both <origAnno1> and <anno1> are expressed as
> > oa:Annotations. This kind of split-up of the annotation could however
> > make sense in cases where the body/target are also at different
> > specialization levels:
> >
> > <conceptualAnno1> a oa:Annotation ;
> >   oa:annotatedBy <OriginalAuthor> ;
> >   oa:hasBody <note.txt> ;
> >   oa:hasTarget <isbn:0-85131-041-9> .
> >
> > <instanceAnno1> a oa:Annotation ;
> >   oa:annotatedBy <MrLibrarian> ;
> >   oa:hasBody <scannedNote.jpeg> ;
> >   oa:hasTarget <redBookOnShelf5> ;
> >   prov:specializationOf <conceptualAnno1> .
> >
> > <note.txt> prov:alternateOf <scannedNote.jpeg> ;
> >     prov:wasDerivedFrom <scannedNote.jpeg> .
> >
> > <redBookOnShelf5> prov:specializationOf <isbn:0-85131-041-9> .
> >
> >
> > (This could be expanded with the full FRBR model or equivalent)
> >
> >
> > We have discussed conceptual vs representational oa:Annotations earlier:
> >
> > http://lists.w3.org/Archives/Public/public-openannotation/2013Jan/0051.html
> > http://lists.w3.org/Archives/Public/public-openannotation/2013Jan/0027.html
> >
> > and the conclusion seemed to have been that it is simpler to merge the
> > conceptual annotation with the formalized annotation as a
> > datastructure.
> >
> > However, the discussion then did not delve into the provenance aspects
> > - what we still need to keep somewhat clear is what the two provenance
> > aspects we do provide actually cover: annotatedBy/At and serializedBy/At.
> > We have a PROV unrolling of these at
> > http://www.openannotation.org/spec/core/appendices.html#ProvMapping:
> >
> >>  There are two Entities in the Open Annotation model, which for
> >> expediency and simplicity are collapsed into just oa:Annotation. These
> >> are the Annotation document, and the concept that the Annotation
> >> embodies or describes. This is the distinction between oa:annotatedBy
> >> and oa:annotatedAt, versus oa:serializedBy and oa:serializedAt.
> >
> > OK - the wording order here is wrong (annotation/document and
> > concept/serialized) - perhaps something to fix! But basically it says
> > that annotated* is who created it conceptually - so in your case:
> >
> >   <ann1>  oa:annotatedBy <OriginalAuthor> ;
> >           oa:serializedBy <Domeo> .
> >
> > The reasoning being that it was OriginalAuthor who created the
> > relation between the body (his note) and the book (where he wrote his
> > note) - we consider the oa:Annotation as a conceptual entity that was
> > formed all those years ago, long before RDF was invented.
> >
> > To record the digital formation of the oa:Annotation data structure as
> > distinct from its 'authorship', then you would need to use other
> > provenance properties - pav:curatedBy and pav:createdBy sound like
> > good matches. I would not put <Paolo> as the serializer, unless he
> > more directly typed in the RDF.
> >
> > (Another practical consideration - I would side with Antoine here and
> > keep oa:serializedBy at RDF Graph level, so even if Paolo typed in
> > Turtle and Domeo put out RDF/XML, then it would still be serializedBy
> > <Paolo>.)
> >
> >
> > This said - there should not be anything in OA that prevents my
> > expanded form with specialization - but of course then you have to be
> > much more careful. You might wonder what this would mean for
> > interoperability - well, an annotation means different things in
> > different systems and domains. For instance in my application, Wf4Ever
> > research objects, we even have annotations where the body is just an
> > RDF graph to declare the rdf:type of a resource - we needed something
> > like OA to structure this, because such statements could be made by a
> > user in the UI (and thus error-prone but more authoritative), or
> > inferred by automatic scripts (which might be guessing wrongly).
> >
> >
> >
> > On 14 August 2013 15:00, Paolo Ciccarese <paolo.ciccarese@gmail.com> wrote:
> >> Dear all,
> >> I would like to share a solution that I am currently implementing in
> >> Domeo in relation to provenance and a question related to it. Apologies
> >> in advance for the length of the email.
> >>
> >> Use Case: I am dealing with an existing annotation that is written on
> >> paper. The author of the annotation can be the author of the original
> >> manuscript or a third party (let's assume the latter for this example).
> >> The annotation is anchored in a specific location of the original text.
> >> My user is transforming that annotation into an OA annotation. It is
> >> very similar to Darwin's annotation in the specs [1] but I came to a
> >> slightly different conclusion.
> >>
> >> I would like to keep track of:
> >> - the agent that creates the OA annotation
> >> - the application the agent used to create the annotation (could be
> >> different from the application that serialized the annotation)
> >> - the author of the body of the annotation (third party)
> >> - the author of the original association of the annotation with the
> >> original text
> >>
> >> In Domeo I use PAV (Provenance Authoring and Versioning ontology) [2][3]
> >> and I append to the oa:Annotation the following properties:
> >>
> >> 1) pav:createdBy -> Domeo user
> >> An agent primarily responsible for encoding the digital artifact or
> >> resource representation. This creation is distinct from forming the
> >> content, which is indicated with pav:contributedBy or its subproperties.
> >> It is more specific than dct:creator - which might or might not be
> >> interpreted to also cover the creation of the content of the artifact.
> >>
> >> 2) pav:createdOn -> When the Domeo user created the digital object
> >> The date of creation of the digital artifact or resource representation.
> >> The agents responsible can be indicated with pav:createdBy.
> >>
> >> 3) pav:createdAt -> Where the user created the digital object
> >> The geo-location of the agent that created the annotation.
> >>
> >> 4) pav:createdWith -> In my case the Domeo tool
> >> The software/tool used by the creator (pav:createdBy) when making the
> >> digital resource, for instance a word processor or an annotation tool. A
> >> more independent software agent that creates the resource without direct
> >> interactions by a human creator should instead be indicated using
> >> pav:createdBy.
> >>
> >> 5) pav:authoredBy -> The author of the original annotation on paper
> >> Indicates an agent that originated or gave existence to the work that is
> >> expressed by the digital resource. The author of the content of a
> >> resource may be different from the creator of that resource
> >> representation (pav:createdBy), although they are often the same. The
> >> author is usually not a software agent (which would be indicated with
> >> pav:createdWith, pav:createdBy or pav:importedBy), unless the software
> >> actually authored the content itself; for instance an artificial
> >> intelligence algorithm which authored a piece of music or a machine
> >> learning algorithm that authored a classification of a tumor sample.
> >>
> >> 6) pav:authoredOn -> The date of the original annotation
> >> Indicates the date this resource was authored by the agents given by
> >> pav:authoredBy. Note that pav:authoredOn is different from
> >> pav:createdOn, although their values are often the same.
> >>
> >> In summary I have something like:
> >>
> >> <ann1> a oa:Annotation
> >>    pav:createdBy -Paolo-
> >>    pav:createdOn -today-
> >>    pav:createdWith -Domeo-
> >>    pav:createdAt -Boston location-
> >>    pav:authoredBy -Annotation’s author-
> >>    pav:authoredOn -Date of the original annotation-
> >>
> >> In other words, using PAV I can keep the distinction between the
> >> creator of the digital artifact and the author of the original
> >> content/association.
> >>
> >> However, there are possibly a couple of overlaps with the current OA
> >> properties. As I would like to provide the OA provenance as well, I am
> >> wondering which of the following applies:
> >> <ann1> a oa:Annotation ;
> >>     oa:annotatedBy <Paolo> .
> >> or
> >> <ann1> a oa:Annotation ;
> >>     oa:annotatedBy <OriginalAuthor> .
> >>
> >> Or compared to PAV:
> >> - pav:createdBy =? oa:annotatedBy --or--
> >> - pav:authoredBy =? oa:annotatedBy
> >>
> >> Looking at Darwin’s example in the specs, if the student is digitizing
> >> a note from Darwin on his own content I would say:
> >> <ann2> a oa:Annotation
> >>    pav:createdBy -Student-
> >>    pav:createdOn -2013-
> >>    pav:createdWith -Domeo-
> >>    pav:createdAt -Boston location-
> >>    pav:authoredBy -Darwin-
> >>    pav:authoredOn -Date of the original annotation-
> >>
> >> Then of course the ‘body’ of the annotation can also be authored by the
> >> original author of the annotation. But, as pointed out above, it is
> >> important for me to also attribute the association of body and target to
> >> the original author, as that represents its historical provenance.
> >>
> >> What this comes down to is basically what an oa:Annotation really is:
> >> “an Annotation expresses the relationship between two or more resources,
> >> and their metadata, using an RDF graph”. We talked about this before - my
> >> question here becomes whether oa:annotatedBy indicates who formed the
> >> relationship (the ‘author’ of the conceptual annotation), or the person
> >> who (using some OA-aware tools) formalized this as an oa:Annotation data
> >> structure (the RDF structure)?
> >>
> >> Best,
> >> Paolo
> >>
> >>
> >> [1] http://www.openannotation.org/spec/core/core.html#Provenance
> >> [2] http://arxiv.org/abs/1304.7224
> >> [3] http://code.google.com/p/pav-ontology/
> >>
> >>
> >> --
> >> Dr. Paolo Ciccarese
> >> http://www.paolociccarese.info/
> >> Biomedical Informatics Research & Development
> >> Instructor of Neurology at Harvard Medical School
> >> Assistant in Neuroscience at Mass General Hospital
> >> Member of the MGH Biomedical Informatics Core
> >> +1-857-366-1524 (mobile)   +1-617-768-8744 (office)
> >
> >
> >
> > --
> > Stian Soiland-Reyes, myGrid team
> > School of Computer Science
> > The University of Manchester
> > http://soiland-reyes.com/stian/work/
> > http://orcid.org/0000-0001-9842-9718
> >
>
>
Received on Monday, 19 August 2013 15:39:07 UTC
