- From: Robert Sanderson <azaroth42@gmail.com>
- Date: Mon, 19 Aug 2013 10:45:15 -0600
- To: Leyla Jael García Castro <leylajael@gmail.com>
- Cc: Paolo Ciccarese <paolo.ciccarese@gmail.com>, public-openannotation <public-openannotation@w3.org>
I agree with Paolo that there's not much benefit in having two annotations in
this case. Agent A doesn't really do anything useful in the provenance chain
other than act as a workflow director, or at most as the agent that does the
serialization of the Annotation.

So:

<anno1> a oa:Annotation ;
  oa:hasTarget <target1> ;
  oa:hasBody <ontologyTerm> ;
  oa:annotatedBy <agentE> ;
  oa:serializedBy <agentA> ;
  oa:motivatedBy oa:identifying, oa:tagging .

<target1> a oa:SpecificResource ;
  oa:hasSource <resourceR> ;
  oa:hasSelector <someSelectorGeneratedByAgentA> .

Rob

On Mon, Aug 19, 2013 at 9:38 AM, Leyla Jael García Castro <leylajael@gmail.com> wrote:
> Hi Robert, all,
>
> Would you also recommend having two annotations if the annotators are
> software agents?
>
> Let me describe the scenario. An agent A takes a portion of text from
> resource R and sends it to an entity recognition tool E, so that E will
> identify some terms and associate them with a concept in an ontology. At
> the end, A parses what is retrieved from E and serializes the annotation(s).
>
> Using PAV, I ended up with something similar to what Paolo proposed for
> Darwin's case: <annotation> pav:authoredBy <E>, and <annotation>
> pav:createdBy <A>. Using OA, would two annotations be the way? If possible,
> I would rather have only one annotation.
>
> Thanks,
> Leyla
>
> On Mon, Aug 19, 2013 at 4:10 PM, Robert Sanderson <azaroth42@gmail.com>
> wrote:
>>
>> Sorry for jumping in late, I was on vacation last week and offline.
>>
>> To quickly re-express the requirement: There is a physical object
>> with some text (by Author A), and an annotation written on the object
>> about that text (by Darwin). That physical annotation is transcribed
>> as a digital annotation (by Student 1). Maintaining all of the actors
>> and objects is important.
>>
>> To me this is multiple annotations, but slightly different from the
>> ones that Stian proposes.
>>
>> Actors: AuthorA, Darwin, Student1
>> Objects: PhysicalTextWrittenByAuthorA, PhysicalTextWrittenByDarwin,
>> DigitalTextTranscribedByStudent1, (and potentially the physical page
>> on which the physical texts were written)
>>
>> Annotation 1 records that there is some text of Author A, and some
>> text of Darwin, with a link between the two (the Annotation).
>>
>> <anno1> a oa:Annotation ;
>>   oa:hasBody <uuid1> ;    # PhysicalTextWrittenByDarwin
>>   oa:hasTarget <uuid2> ;  # PhysicalTextWrittenByAuthorA
>>   oa:motivatedBy oa:commenting ;
>>   oa:annotatedBy <darwin> .
>>
>> <uuid1> a xxx:PhysicalText ;
>>   dc:creator <darwin> .
>>
>> <uuid2> a xxx:PhysicalText ;
>>   dc:creator <authorA> .
>>
>> This is the model of the real world physical object. Darwin wrote
>> some text about something that AuthorA wrote, and by the act of
>> writing it on the object it's an Annotation; thus Darwin is the
>> annotator and the motivation is commenting (or similar). However,
>> these are /physical/ things, not the digital transcription. As with
>> any RDF description of real world objects or concepts, there's a
>> disconnect between the description and the thing itself.
>>
>> And thus we need the transcription as a separate digital annotation:
>>
>> <anno2> a oa:Annotation ;
>>   oa:hasBody <transcription.txt> ;  # DigitalTextTranscribedByStudent1
>>   oa:hasTarget <uuid1> ;
>>   oa:motivatedBy domeo:transcribing ;
>>   oa:annotatedBy <student1> .
>>
>> <transcription.txt> a cnt:ContentAsText, dctypes:Text ;
>>   cnt:chars "... Darwin's text here ..." .
>> (Doesn't really matter who the dc:creator is for this content as all
>> the actors are above)
>>
>>
>> If you wanted to express it in terms of Shared Canvas, then you would
>> introduce a Canvas to explicitly represent the physical page rather
>> than just an identifier for the text itself, and the uuids would
>> become segments of it. The only other difference would be that the
>> motivation of <anno2> would be sc:painting.
>> Then you would associate the digitized image with the Canvas as a
>> digital representation of the physical page, using another Annotation
>> also with motivation sc:painting.
>>
>> Hope that helps,
>>
>> Rob
>>
>>
>> On Thu, Aug 15, 2013 at 3:44 AM, Stian Soiland-Reyes
>> <soiland-reyes@cs.manchester.ac.uk> wrote:
>> > With my provenance hat on, I think this all depends on the scope of
>> > an oa:Annotation and its creation.
>> >
>> > We have the same challenge with provenance of entities and documents
>> > in general - if I write a letter in Word on Monday, and you (Paolo)
>> > print it out on paper on Tuesday, and then on Wednesday Robert puts it
>> > in an envelope and mails it, then who 'created' that thing that pops
>> > in through the mailbox at the recipient?
>> >
>> > Well, it depends on what you consider that thing to be - as an envelope
>> > with something inside, Robert made it, on Wednesday. As a printed
>> > letter (which happens to have an envelope in transit), Paolo made it on
>> > Tuesday, and as a conceptual letter, I wrote it on Monday. In a PROV
>> > setting, we recommend that everyone think carefully about the extent of
>> > their entity, in a way determining its life-span and which
>> > aspects/attributes can be considered mutable or fixed. If more than
>> > one kind of characterization is deemed necessary, then PROV has the
>> > concepts of specialization and alternates to relate them to
>> > each other: http://www.w3.org/TR/prov-dm/#component5
>> >
>> > Now at first glance I think this sounds like one of those use cases
>> > where you would need multiple characterizations to model the
>> > provenance correctly. A quick go:
>> >
>> > <origAnno1> a oa:Annotation ;
>> >   oa:annotatedBy <OriginalAuthor> ;
>> >   oa:hasTarget <somebook> .
>> >
>> > <anno1> a oa:Annotation ;
>> >   oa:annotatedBy <Paolo> ;
>> >   prov:specializationOf <origAnno1> ;
>> >   oa:hasTarget <somebook> .
>> >
>> > This does seem like a bit of duplication - and also a bit strange
>> > considering both <origAnno1> and <anno1> are expressed as
>> > oa:Annotations. This kind of split-up of the annotation could however
>> > make sense in cases where the body/target are also at different
>> > specialization levels:
>> >
>> > <conceptualAnno1> a oa:Annotation ;
>> >   oa:annotatedBy <OriginalAuthor> ;
>> >   oa:hasBody <note.txt> ;
>> >   oa:hasTarget <isbn:0-85131-041-9> .
>> >
>> > <instanceAnno1> a oa:Annotation ;
>> >   oa:annotatedBy <MrLibrarian> ;
>> >   oa:hasBody <scannedNote.jpeg> ;
>> >   oa:hasTarget <redBookOnShelf5> ;
>> >   prov:specializationOf <conceptualAnno1> .
>> >
>> > <note.txt> prov:alternateOf <scannedNote.jpeg> ;
>> >   prov:wasDerivedFrom <scannedNote.jpeg> .
>> >
>> > <redBookOnShelf5> prov:specializationOf <isbn:0-85131-041-9> .
>> >
>> >
>> > (This could be expanded with the full FRBR model or equivalent)
>> >
>> >
>> > We have discussed conceptual vs representational oa:Annotations earlier:
>> >
>> > http://lists.w3.org/Archives/Public/public-openannotation/2013Jan/0051.html
>> >
>> > http://lists.w3.org/Archives/Public/public-openannotation/2013Jan/0027.html
>> >
>> > and the conclusion seemed to have been that it is simpler to merge the
>> > conceptual annotation with the formalized annotation as a
>> > data structure.
>> >
>> > However, the discussion then did not delve into the provenance aspects
>> > - what we still need to keep somewhat clear is what the two provenance
>> > aspects we do provide actually cover, annotatedBy/At and serializedBy/At.
>> > We have a PROV unrolling of these at
>> > http://www.openannotation.org/spec/core/appendices.html#ProvMapping:
>> >
>> >> There are two Entities in the Open Annotation model, which for
>> >> expediency and simplicity are collapsed into just oa:Annotation. These are
>> >> the Annotation document, and the concept that the Annotation embodies or
>> >> describes.
>> >> This is the distinction between oa:annotatedBy and
>> >> oa:annotatedAt, versus oa:serializedBy and oa:serializedAt.
>> >
>> > OK - the wording order here is wrong (annotation/document and
>> > concept/serialized) - perhaps something to fix! But basically it says
>> > that annotated* is who created it conceptually - so in your case:
>> >
>> > <ann1> oa:annotatedBy <OriginalAuthor> ;
>> >   oa:serializedBy <Domeo> .
>> >
>> > The reasoning being that it was OriginalAuthor who created the
>> > relation between the body (his note) and the book (where he wrote his
>> > note) - we consider the oa:Annotation as a conceptual entity that was
>> > formed all those years ago, long before RDF was invented.
>> >
>> > To record the digital formation of the oa:Annotation data structure as
>> > distinct from its 'authorship', you would need to use other
>> > provenance properties - pav:curatedBy and pav:createdBy sound like
>> > good matches. I would not put <Paolo> as the serializer, unless he
>> > more directly typed in the RDF.
>> >
>> > (Another practical consideration - I would side with Antoine here and
>> > keep oa:serializedBy at RDF Graph level, so even if Paolo typed in
>> > Turtle and Domeo put out RDF/XML, it would still be serializedBy
>> > <Paolo>.)
>> >
>> >
>> > This said - there should not be anything in OA that prevents my
>> > expanded form with specialization - but of course then you have to be
>> > much more careful. You might wonder what this would mean for
>> > interoperability - well, an annotation means different things in
>> > different systems and domains.
>> > For instance, in my application, Wf4Ever
>> > research objects, we even have annotations where the body is just an
>> > RDF graph to declare the rdf:type of a resource - we needed something
>> > like OA to structure this, because such statements could be made by a
>> > user in the UI (and thus error-prone but more authoritative), or
>> > inferred by automatic scripts (which might be guessing wrongly).
>> >
>> >
>> >
>> > On 14 August 2013 15:00, Paolo Ciccarese <paolo.ciccarese@gmail.com>
>> > wrote:
>> >> Dear all,
>> >> I would like to share a solution that I am currently implementing in
>> >> Domeo in relation to provenance and a question related to it.
>> >> Apologies in advance for the length of the email.
>> >>
>> >> Use Case: I am dealing with an existing annotation that is written on
>> >> paper. The author of the annotation can be the author of the original
>> >> manuscript or a third party (let's assume the latter for this example).
>> >> The annotation is anchored in a specific location of the original text.
>> >> My user is transforming that annotation into an OA annotation. It is
>> >> very similar to the Darwin annotation in the specs [1] but I came to a
>> >> slightly different conclusion.
>> >>
>> >> I would like to keep track of:
>> >> - the agent that creates the OA annotation
>> >> - the application the agent used to create the annotation (could be
>> >>   different from the application that serialized the annotation)
>> >> - the author of the body of the annotation (third party)
>> >> - the author of the original association of the annotation with the
>> >>   original text
>> >>
>> >> In Domeo I use PAV (Provenance Authoring and Versioning ontology)
>> >> [2][3] and I append to the oa:Annotation the following properties
>> >>
>> >> 1) pav:createdBy -> Domeo user
>> >> An agent primarily responsible for encoding the digital artifact or
>> >> resource representation.
>> >> This creation is distinct from forming the content, which is
>> >> indicated with pav:contributedBy or its subproperties.
>> >> It is more specific than dct:creator - which might or might not be
>> >> interpreted to also cover the creation of the content of the artifact.
>> >>
>> >> 2) pav:createdOn -> When the Domeo user created the digital object
>> >> The date of creation of the digital artifact or resource
>> >> representation. The agents responsible can be indicated with
>> >> pav:createdBy.
>> >>
>> >> 3) pav:createdAt -> Where the user created the digital object
>> >> The geo-location of the agent that created the annotation.
>> >>
>> >> 4) pav:createdWith -> In my case the Domeo tool
>> >> The software/tool used by the creator (pav:createdBy) when making the
>> >> digital resource, for instance a word processor or an annotation tool.
>> >> A more independent software agent that creates the resource without
>> >> direct interactions by a human creator should instead be indicated
>> >> using pav:createdBy.
>> >>
>> >> 5) pav:authoredBy -> The author of the original annotation on paper
>> >> Indicates an agent that originated or gave existence to the work that
>> >> is expressed by the digital resource. The author of the content of a
>> >> resource may be different from the creator of that resource
>> >> representation (pav:createdBy), although they are often the same. The
>> >> author is usually not a software agent (which would be indicated with
>> >> pav:createdWith, pav:createdBy or pav:importedBy), unless the software
>> >> actually authored the content itself; for instance an artificial
>> >> intelligence algorithm which authored a piece of music or a machine
>> >> learning algorithm that authored a classification of a tumor sample.
>> >>
>> >> 6) pav:authoredOn -> The date of the original annotation
>> >> Indicates the date this resource was authored by the agents given by
>> >> pav:authoredBy.
>> >> Note that pav:authoredOn is different from pav:createdOn,
>> >> although their values are often the same.
>> >>
>> >> In summary I have something like:
>> >>
>> >> <ann1> a oa:Annotation
>> >>   pav:createdBy -Paolo-
>> >>   pav:createdOn -today-
>> >>   pav:createdWith -Domeo-
>> >>   pav:createdAt -Boston location-
>> >>   pav:authoredBy -Annotation’s author-
>> >>   pav:authoredOn -Date of the original annotation-
>> >>
>> >> In other words, using PAV I can keep the distinction between the
>> >> creator of the digital artifact and the author of the original
>> >> content/association.
>> >>
>> >> However, there are possibly a couple of overlaps with the current OA
>> >> properties. As I would like to provide the OA provenance as well, I am
>> >> wondering which of the following applies:
>> >>
>> >> <ann1> a oa:Annotation ;
>> >>   oa:annotatedBy <Paolo> .
>> >>
>> >> or
>> >>
>> >> <ann1> a oa:Annotation ;
>> >>   oa:annotatedBy <OriginalAuthor> .
>> >>
>> >> Or compared to PAV:
>> >> - pav:createdBy =? oa:annotatedBy --or--
>> >> - pav:authoredBy =? oa:annotatedBy
>> >>
>> >> Looking at the Darwin example in the specs, if the student is
>> >> digitizing a note from Darwin on his own content I would say:
>> >>
>> >> <ann2> a oa:Annotation
>> >>   pav:createdBy -Student-
>> >>   pav:createdOn -2013-
>> >>   pav:createdWith -Domeo-
>> >>   pav:createdAt -Boston location-
>> >>   pav:authoredBy -Darwin-
>> >>   pav:authoredOn -Date of the original annotation-
>> >>
>> >> Then of course the ‘body’ of the annotation can also be authored by the
>> >> original author of the annotation. But, as pointed out above, it is
>> >> important for me to also attribute the association of body and target
>> >> to the original author, as that represents the historical provenance
>> >> of it.
>> >>
>> >> What this comes down to is basically what an oa:Annotation really is:
>> >> “an Annotation expresses the relationship between two or more
>> >> resources, and their metadata, using an RDF graph”.
>> >> We talked about this before - my question here becomes whether
>> >> oa:annotatedBy indicates who formed the relationship (the ‘author’ of
>> >> the conceptual annotation), or the person who (using some OA-aware
>> >> tools) formalized this as an oa:Annotation data structure (the RDF
>> >> structure).
>> >>
>> >> Best,
>> >> Paolo
>> >>
>> >>
>> >> [1] http://www.openannotation.org/spec/core/core.html#Provenance
>> >> [2] http://arxiv.org/abs/1304.7224
>> >> [3] http://code.google.com/p/pav-ontology/
>> >>
>> >>
>> >> --
>> >> Dr. Paolo Ciccarese
>> >> http://www.paolociccarese.info/
>> >> Biomedical Informatics Research & Development
>> >> Instructor of Neurology at Harvard Medical School
>> >> Assistant in Neuroscience at Mass General Hospital
>> >> Member of the MGH Biomedical Informatics Core
>> >> +1-857-366-1524 (mobile) +1-617-768-8744 (office)
>> >>
>> >> CONFIDENTIALITY NOTICE: This message is intended only for the
>> >> addressee(s), may contain information that is considered
>> >> to be sensitive or confidential and may not be forwarded or disclosed
>> >> to any other party without the permission of the sender.
>> >> If you have received this message in error, please notify the sender
>> >> immediately.
>> >
>> >
>> >
>> > --
>> > Stian Soiland-Reyes, myGrid team
>> > School of Computer Science
>> > The University of Manchester
>> > http://soiland-reyes.com/stian/work/
>> > http://orcid.org/0000-0001-9842-9718
>> >
>> >
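
For readers who want something they can paste into a Turtle parser, the single-annotation pattern from the reply at the top of this thread might be written out as below. This is only a sketch: the @base IRI and every resource name are placeholders, the prefix IRIs are the commonly published ones for OA and PAV, and placing the two pav: properties next to the oa: ones simply follows the PAV usage Leyla describes (pav:authoredBy for E, pav:createdBy for A); it is not something the thread settles.

```turtle
# Sketch only: the base IRI and all resource names are placeholders.
@base <http://example.org/> .
@prefix oa:  <http://www.w3.org/ns/oa#> .
@prefix pav: <http://purl.org/pav/> .

<anno1> a oa:Annotation ;
    oa:hasTarget <target1> ;
    oa:hasBody <ontologyTerm> ;
    oa:motivatedBy oa:identifying, oa:tagging ;
    # OA provenance, as in the reply at the top of the thread
    oa:annotatedBy <agentE> ;
    oa:serializedBy <agentA> ;
    # PAV provenance, as in Leyla's description (an assumption that both
    # vocabularies are kept side by side on the same annotation)
    pav:authoredBy <agentE> ;
    pav:createdBy <agentA> .

<target1> a oa:SpecificResource ;
    oa:hasSource <resourceR> ;
    oa:hasSelector <someSelectorGeneratedByAgentA> .
```

The selector itself is left as an opaque resource here, as it is in the thread; its type and properties would depend on how agent A picked out the text.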
Received on Monday, 19 August 2013 16:45:45 UTC