- From: Leyla Jael García Castro <leylajael@gmail.com>
- Date: Mon, 19 Aug 2013 16:38:14 +0100
- To: Robert Sanderson <azaroth42@gmail.com>
- Cc: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>, Paolo Ciccarese <paolo.ciccarese@gmail.com>, public-openannotation <public-openannotation@w3.org>
- Message-ID: <CACLxDV5DnVk26nh7a_R8Hr42abJLQJwbUENS=9yBDVbXG+Bxog@mail.gmail.com>
Hi Robert, all,

Would you also recommend having two annotations if the annotators are software agents? Let me describe the scenario.

An agent A takes a portion of text from resource R and sends it to an entity recognition tool E, so that E identifies some terms and associates them with concepts in an ontology. At the end, A parses what is retrieved from E and serializes the annotation(s).

Using PAV, I ended up with something similar to what Paolo proposed for Darwin's case: <annotation> pav:authoredBy <E>, and <annotation> pav:createdBy <A>. Using OA, would two annotations be the way? If possible, I would rather have only one annotation.

Thanks,
Leyla

On Mon, Aug 19, 2013 at 4:10 PM, Robert Sanderson <azaroth42@gmail.com> wrote:
> Sorry for jumping in late, I was on vacation last week and offline.
>
> To quickly re-express the requirement: There is a physical object
> with some text (by Author A), and an annotation written on the object
> about that text (by Darwin). That physical annotation is transcribed
> as a digital annotation (by Student 1). Maintaining all of the actors
> and objects is important.
>
> To me this is multiple annotations, but slightly different from the
> ones that Stian proposes.
>
> Actors: AuthorA, Darwin, Student1
> Objects: PhysicalTextWrittenByAuthorA, PhysicalTextWrittenByDarwin,
> DigitalTextTranscribedByStudent1, (and potentially the physical page
> on which the physical texts were written)
>
> Annotation 1 records that there is some text of Author A, and some
> text of Darwin, with a link between the two (the Annotation).
>
> <anno1> a oa:Annotation ;
>   oa:hasBody <uuid1> ;    # PhysicalTextWrittenByDarwin
>   oa:hasTarget <uuid2> ;  # PhysicalTextWrittenByAuthorA
>   oa:motivation oa:commenting ;
>   oa:annotatedBy <darwin> .
>
> <uuid1> a xxx:PhysicalText ;
>   dc:creator <darwin> .
>
> <uuid2> a xxx:PhysicalText ;
>   dc:creator <authorA> .
>
> This is the model of the real world physical object.
> Darwin wrote some text about something that AuthorA wrote, and by the
> act of writing it on the object it's an Annotation, thus Darwin is the
> annotator and the motivation is commenting (or similar). However,
> these are /physical/ things, not the digital transcription. As with
> any RDF description of real world objects or concepts, there's a
> disconnect between the description and the thing itself.
>
> And thus we need the transcription as a separate digital annotation:
>
> <anno2> a oa:Annotation ;
>   oa:hasBody <transcription.txt> ;  # DigitalTextTranscribedByStudent1
>   oa:hasTarget <uuid1> ;
>   oa:motivation domeo:transcribing ;
>   oa:annotatedBy <student1> .
>
> <transcription.txt> a cnt:ContentAsText, dcterms:Text ;
>   cnt:chars "... Darwin's text here ..." .
> (Doesn't really matter who the dc:creator is for this content as all
> the actors are above)
>
>
> If you wanted to express it in terms of Shared Canvas, then you would
> introduce a Canvas to explicitly represent the physical page rather
> than just an identifier for the text itself, and the uuids would
> become segments of it. The only other difference would be the
> motivation of <anno2> would be sc:painting. Then you would associate
> the digitized image with the Canvas as a digital representation of the
> physical page, using another Annotation also with motivation
> sc:painting.
>
> Hope that helps,
>
> Rob
>
>
> On Thu, Aug 15, 2013 at 3:44 AM, Stian Soiland-Reyes
> <soiland-reyes@cs.manchester.ac.uk> wrote:
> > With my provenance hat on, I think this all depends on what is the
> > scope of an oa:Annotation and its creation.
> >
> > We have the same challenge with provenance of entities and documents
> > in general - if I write a letter in Word on Monday, and you (Paolo)
> > print it out on paper on Tuesday, and then on Wednesday Robert puts it
> > in an envelope and mails it, then who 'created' that thing that pops
> > in through the mailbox at the recipient?
> >
> > Well it depends what you consider that thing to be - as an envelope
> > with something inside, Robert made it, on Wednesday. As a printed
> > letter (which happens to have an envelope in transit), Paolo made it on
> > Tuesday, and as a conceptual letter, I wrote it on Monday. In a PROV
> > setting, we recommend everyone to think carefully about the extent of
> > their entity, in a way determining their life-span and what
> > aspects/attributes can be considered mutable or fixed. If more than
> > one kind of characterization is deemed necessary, then PROV has the
> > concepts of specialization and alternates to relate them to
> > each-other: http://www.w3.org/TR/prov-dm/#component5
> >
> > Now at first glance I think this sounds like one of those use cases
> > where you would need multiple characterizations to model the
> > provenance correctly. A quick go:
> >
> > <origAnno1> a oa:Annotation ;
> >   oa:annotatedBy <OriginalAuthor> ;
> >   oa:hasTarget <somebook> .
> >
> > <anno1> a oa:Annotation ;
> >   oa:annotatedBy <Paolo> ;
> >   prov:specializationOf <origAnno1> ;
> >   oa:hasTarget <somebook> .
> >
> > This does seem like a bit of duplication - and also a bit strange
> > considering both <origAnno1> and <anno1> are expressed as
> > oa:Annotations. This kind of split-up of the annotation could however
> > make sense in cases where the body/target are also at different
> > specialization levels:
> >
> > <conceptualAnno1> a oa:Annotation ;
> >   oa:annotatedBy <OriginalAuthor> ;
> >   oa:hasBody <note.txt> ;
> >   oa:hasTarget <isbn:0-85131-041-9> .
> >
> > <instanceAnno1> a oa:Annotation ;
> >   oa:annotatedBy <MrLibrarian> ;
> >   oa:hasBody <scannedNote.jpeg> ;
> >   oa:hasTarget <redBookOnShelf5> ;
> >   prov:specializationOf <conceptualAnno1> .
> >
> > <note.txt> prov:alternateOf <scannedNote.jpeg> ;
> >   prov:wasDerivedFrom <scannedNote.jpeg> .
> >
> > <redBookOnShelf5> prov:specializationOf <isbn:0-85131-041-9> .
> >
> >
> > (This could be expanded with the full FRBR model or equivalent)
> >
> >
> > We have discussed conceptual vs representational oa:Annotations earlier:
> >
> > http://lists.w3.org/Archives/Public/public-openannotation/2013Jan/0051.html
> > http://lists.w3.org/Archives/Public/public-openannotation/2013Jan/0027.html
> >
> > and the conclusion seemed to have been that it is simpler to merge the
> > conceptual annotation with the formalized annotation as a
> > datastructure.
> >
> > However, the discussion then did not delve into the provenance aspects
> > - what we still need to keep somewhat clear is what the two provenance
> > aspects we do provide cover for, annotatedBy/At and serialisedBy/At.
> > We have a PROV unrolling of these at
> > http://www.openannotation.org/spec/core/appendices.html#ProvMapping:
> >
> >> There are two Entities in the Open Annotation model, which for
> >> expediency and simplicity are collapsed into just oa:Annotation. These
> >> are the Annotation document, and the concept that the Annotation
> >> embodies or describes. This is the distinction between oa:annotatedBy
> >> and oa:annotatedAt, versus oa:serializedBy and oa:serializedAt.
> >
> > OK - the wording order here is wrong (annotation/document and
> > concept/serialized) - perhaps something to fix! But basically it says
> > that annotated* is who created it conceptually - so in your case:
> >
> > <ann1> oa:annotatedBy <OriginalAuthor> ;
> >   oa:serializedBy <Domeo> .
> >
> > The reasoning being that it was OriginalAuthor who created the
> > relation between the body (his note) and the book (where he wrote his
> > note) - we consider the oa:Annotation as a conceptual entity that was
> > formed all those years ago, long time before RDF was invented.
> >
> > To record the digital formation of the oa:Annotation data structure as
> > distinct from its 'authorship', then you would need to use other
> > provenance properties - pav:curatedBy and pav:createdBy sounds like
> > good matches.
> > I would not put <Paolo> as the serializer, unless he
> > more directly typed in the RDF.
> >
> > (Another practical consideration - I would side with Antoine here and
> > keep oa:serializedBy at RDF Graph level, so even if Paolo typed in
> > Turtle and Domeo put out RDF/XML, then it would still be serializedBy
> > <Paolo>.)
> >
> >
> > This said - there should not be anything in OA that prevents my
> > expanded form with specialization - but of course then you have to be
> > much more careful. You might wonder for inter-operability measures
> > what this would mean - well, an annotation means different things in
> > different systems and domains. For instance in my application, Wf4Ever
> > research objects, we even have annotations where the body is just an
> > RDF graph to declare the rdf:type of a resource - we needed something
> > like OA to structure this, because such statements could be made by a
> > user in the UI (and thus error-prone but more authoritative), or
> > inferred by automatic scripts (which might be guessing wrongly).
> >
> >
> >
> > On 14 August 2013 15:00, Paolo Ciccarese <paolo.ciccarese@gmail.com> wrote:
> >> Dear all,
> >> I would like to share a solution that I am currently implementing in Domeo
> >> in relation to provenance and a question related to it. Apologies in advance
> >> for the length of the email.
> >>
> >> Use Case: I am dealing with an existing annotation that is written on paper.
> >> The author of the annotation can be the author of the original manuscript or
> >> a third party (let's assume the latter for this example). The annotation is
> >> anchored in a specific location of the original text. My user is
> >> transforming that annotation into an OA annotation. It is very similar to the
> >> Darwin's annotation in the specs [1] but I got to a slightly different
> >> conclusion.
> >>
> >> I would like to keep track of:
> >> - the agent that creates the OA annotation
> >> - the application the agent used to create the annotation (could be
> >>   different than the application that serialized the annotation)
> >> - the author of the body of the annotation (third party)
> >> - the author of the original association of the annotation with the original
> >>   text
> >>
> >> In Domeo I use PAV (Provenance Authoring and Versioning ontology) [2][3] and
> >> I append to the oa:Annotation the following properties
> >>
> >> 1) pav:createdBy -> Domeo user
> >> An agent primarily responsible for encoding the digital artifact or resource
> >> representation. This creation is distinct from forming the content, which is
> >> indicated with pav:contributedBy or its subproperties.
> >> It is more specific than dct:createdBy - which might or might not be
> >> interpreted to also cover the creation of the content of the artifact.
> >>
> >> 2) pav:createdOn -> When the Domeo user created the digital object
> >> The date of creation of the digital artifact or resource representation. The
> >> agents responsible can be indicated with pav:createdBy.
> >>
> >> 3) pav:createdAt -> Where the user created the digital object
> >> The geo-location of the agent that created the annotation.
> >>
> >> 4) pav:createdWith -> In my case the Domeo tool
> >> The software/tool used by the creator (pav:createdBy) when making the
> >> digital resource, for instance a word processor or an annotation tool. A
> >> more independent software agent that creates the resource without direct
> >> interactions by a human creator should instead be indicated using
> >> pav:createdBy.
> >>
> >> 5) pav:authoredBy -> The author of the original annotation on paper
> >> Indicates an agent that originated or gave existence to the work that is
> >> expressed by the digital resource.
> >> The author of the content of a resource
> >> may be different from the creator of that resource representation
> >> (pav:createdBy), although they are often the same. The author is usually not
> >> a software agent (which would be indicated with pav:createdWith,
> >> pav:createdBy or pav:importedBy), unless the software actually authored the
> >> content itself; for instance an artificial intelligence algorithm which
> >> authored a piece of music or a machine learning algorithm that authored a
> >> classification of a tumor sample
> >>
> >> 6) pav:authoredOn -> The date of the original annotation
> >> Indicates the date this resource was authored by the agents given by
> >> pav:authoredBy. Note that pav:authoredOn is different from pav:createdOn,
> >> although their values are often the same.
> >>
> >> In summary I have something like:
> >>
> >> <ann1> a oa:Annotation
> >> pav:createdBy -Paolo-
> >> pav:createdOn -today-
> >> pav:createdWith -Domeo-
> >> pav:createdAt -Boston location-
> >> pav:authoredBy -Annotation’s author-
> >> pav:authoredOn -Date of the original annotation-
> >>
> >> In other words, using PAV I can keep the distinction between the creator of
> >> the digital artifact and the author of the original content/association.
> >>
> >> However, there are possibly a couple of overlaps with the current OA
> >> properties. As I would like to provide the OA provenance as well, I am
> >> wondering which of the following applies:
> >> <ann1> a oa:Annotation ;
> >>   oa:annotatedBy <Paolo> .
> >> or
> >> <ann1> a oa:Annotation ;
> >>   oa:annotatedBy <OriginalAuthor> .
> >>
> >> Or compared to PAV:
> >> - pav:createdBy =? oa:annotatedBy --or--
> >> - pav:authoredBy =?
> >>   oa:annotatedBy
> >>
> >> Looking at Darwin’s example in the specs, if the student is digitizing a
> >> note from Darwin on his own content I would say:
> >> <ann2> a oa:Annotation
> >> pav:createdBy -Student-
> >> pav:createdOn -2013-
> >> pav:createdWith -Domeo-
> >> pav:createdAt -Boston location-
> >> pav:authoredBy -Darwin-
> >> pav:authoredOn -Date of the original annotation-
> >>
> >> Then of course the ‘body’ of the annotation can be also authored by the
> >> original author of the annotation. But, as pointed out above, it is
> >> important for me to attribute also the association of body and target to the
> >> original author as that represents the historical provenance of it.
> >>
> >> What this comes down to is basically what an oa:Annotation really is: “an
> >> Annotation expresses the relationship between two or more resources, and
> >> their metadata, using an RDF graph”. We talked about this before - my
> >> question here becomes if oa:annotatedBy indicates who formed the
> >> relationship (the ‘author’ of the conceptual annotation); or the person who
> >> (using some OA aware tools) formalized this as an oa:Annotation data
> >> structure (the RDF structure)?
> >>
> >> Best,
> >> Paolo
> >>
> >>
> >> [1] http://www.openannotation.org/spec/core/core.html#Provenance
> >> [2] http://arxiv.org/abs/1304.7224
> >> [3] http://code.google.com/p/pav-ontology/
> >>
> >>
> >> --
> >> Dr.
> >> Paolo Ciccarese
> >> http://www.paolociccarese.info/
> >> Biomedical Informatics Research & Development
> >> Instructor of Neurology at Harvard Medical School
> >> Assistant in Neuroscience at Mass General Hospital
> >> Member of the MGH Biomedical Informatics Core
> >> +1-857-366-1524 (mobile) +1-617-768-8744 (office)
> >>
> >> CONFIDENTIALITY NOTICE: This message is intended only for the addressee(s),
> >> may contain information that is considered
> >> to be sensitive or confidential and may not be forwarded or disclosed to any
> >> other party without the permission of the sender.
> >> If you have received this message in error, please notify the sender
> >> immediately.
> >
> >
> >
> > --
> > Stian Soiland-Reyes, myGrid team
> > School of Computer Science
> > The University of Manchester
> > http://soiland-reyes.com/stian/work/ http://orcid.org/0000-0001-9842-9718
> >
> >
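
For the software-agent scenario described at the top of this message (agent A serializing what tool E recognized), a single-annotation serialization could be sketched roughly as follows. This is only an illustration under stated assumptions, not a conclusion of the thread: the function name, the example IRIs and the prefix IRIs are hypothetical, and pairing pav:authoredBy with oa:annotatedBy and pav:createdBy with oa:serializedBy simply follows Stian's reading quoted above, which is exactly the point still under discussion.

```python
# Hypothetical sketch: agent A emits ONE annotation for a term found by tool E.
# The prefix IRIs and every resource IRI used here are assumptions.

PREFIXES = {
    "oa": "http://www.w3.org/ns/oa#",
    "pav": "http://purl.org/pav/",
}


def single_annotation_turtle(anno, body, target, tool_e, agent_a):
    """Return Turtle text for one oa:Annotation carrying both provenance roles."""
    lines = [f"@prefix {prefix}: <{iri}> ." for prefix, iri in PREFIXES.items()]
    lines.append("")
    statements = [
        ("a", "oa:Annotation"),
        ("oa:hasBody", f"<{body}>"),          # ontology concept identified by E
        ("oa:hasTarget", f"<{target}>"),      # portion of text taken from R
        ("pav:authoredBy", f"<{tool_e}>"),    # E originated the association
        ("pav:createdBy", f"<{agent_a}>"),    # A encoded the digital artifact
        ("oa:annotatedBy", f"<{tool_e}>"),    # assumed mapping, see lead-in
        ("oa:serializedBy", f"<{agent_a}>"),  # assumed mapping, see lead-in
    ]
    lines.append(f"<{anno}>")
    for index, (predicate, obj) in enumerate(statements):
        terminator = " ." if index == len(statements) - 1 else " ;"
        lines.append(f"    {predicate} {obj}{terminator}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(single_annotation_turtle(
        "http://example.org/anno1",
        "http://example.org/ontology/SomeConcept",
        "http://example.org/R",
        "http://example.org/E",
        "http://example.org/A",
    ))
```

If the list instead prefers the two-annotation model Robert describes, the same helper could be called twice with different subjects; the sketch only shows that both agents can be attached to one annotation resource.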
Received on Monday, 19 August 2013 15:39:07 UTC