- From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
- Date: Wed, 30 Sep 2015 15:01:19 +0100
- To: Robert Sanderson <azaroth42@gmail.com>
- Cc: Web Annotation <public-annotation@w3.org>
What is the scope of "creating an annotation"? Does this include the creation of the body?

dct:creator is (perhaps deliberately) vague about this - in that you never quite know if it's the creator of the digital resource (uploader or serializer of the file), its semantic content (structuring in its current form) or its abstract knowledge (e.g. the statements that are conveyed).

All of these statements could be seen as valid with dct:creator:

# The person that cropped and uploaded the JPEG
<https://commons.wikimedia.org/wiki/File:Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg>
  dct:creator <https://commons.wikimedia.org/wiki/User:Dcoetzee> .

# The agency that took the photo in the gallery
<https://commons.wikimedia.org/wiki/File:Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg>
  dct:creator <http://www.technologies.c2rmf.fr/> .

# The actual painter
<https://commons.wikimedia.org/wiki/File:Mona_Lisa,_by_Leonardo_da_Vinci,_from_C2RMF_retouched.jpg>
  dct:creator <http://dbpedia.org/resource/Leonardo_da_Vinci> .

But depending on which one you go for, you will get quite a different provenance trail.

In PROV you can say these are all prov:wasAttributedTo - and use a prov:wasDerivedFrom chain (and possibly prov:specializationOf) to show the detailed provenance. But it is hard to single out one of them as 'the creator' except in the basic case where they are all the same - e.g. someone typed their own words into a web form and pushed the Annotate button.

To make this more up to date, think of a YouTube re-upload, say "Mr Politician MP says something embarrassing again", where we have:

a) The politician (who said something embarrassing, no matter which upload)
b) The audience member who filmed him in public
c) The (re)uploader of the video (after the first upload obviously got deleted)

Who is the 'creator' here? Computers will almost always tell you c). Humans will tell you a).
People like me who care about attribution will think about b), who was the brave one. We should let annotation systems provide you with a) and hopefully also a bit of b) and not just be stuck with c).

Now to me, this means that dct:creator does not tell me much, because different applications have widely different interpretations of which of these kinds of creation is meant. To me it thus just says "was somewhat involved with making some part of this resource" - which is more of a contribution than a creation.

In the PAV ontology ( http://purl.org/pav/html ) we tried to clear this up for normal bibliographic usage on the web by introducing:

- pav:createdBy - who made the digital file - e.g. the bytes in the JPEG if you like (in this case the wikimedia user Dcoetzee)
- pav:authoredBy - who made the "knowledge" that is somewhat captured (Leonardo da Vinci the painter)
- pav:curatedBy - someone who helped form the knowledge into its current form, e.g. the c2rmf photographer
- pav:contributedBy - any other kind of "knowledge" contributions (including author/curator above) - e.g. someone who made a hole in the canvas [1]

[1] http://www.theguardian.com/world/2015/aug/25/boy-trips-in-museum-and-punches-hole-through-million-dollar-painting

All of these map to prov:wasAttributedTo and to dcterms:creator / dcterms:contributor.

See http://jbiomedsem.biomedcentral.com/articles/10.1186/2041-1480-4-37#Sec19 for discussion on issues with DC Terms for provenance :)

So for annotations I get similar questions. It is clear in the case of, say, tagging that the creator of the annotation didn't necessarily "create" the tag word itself - but primarily made the link between the target and the body - this is particularly the case for semantic tags from a controlled vocabulary.

If an annotation links between two standalone resources, e.g.
a blog entry and a YouTube video, then the annotation creator again might not have made either the blog or the video, but just found that the (body) blog is about the (target) YouTube video. The body and the target might therefore have their own creators - which might have been stated elsewhere.

Then there are the more compound annotations that in JSON-LD would be a larger object - like if there's a SpecificResource or an embedded textual body - in this case the creator of the annotation is most likely also the author of the textual body, and is the one who made the selection of the SpecificResource. I don't think we normally want to attach provenance to each of those - so it would be good if the 'creator' of the annotation was somewhat flexible, so that it also applies to these cases.

However on the Semantic Web we have this boring Open World Assumption - so we can't do rules like "If a dct:creator is set on the annotation but not on the body, then the annotation creator is also the body creator" - as the body resource might have other views about who its creator is.

dct:creator does have some of that ambiguity here that perhaps is needed - but I don't think it would be too helpful.

So this hints to me that we should get the annotation system to tell us instead - pav:authoredBy if it knows the agent also made the 'content of the annotation' - which we can say includes things like embedded body text or a specific resource, or just the super-property pav:contributedBy (or its superprop prov:wasAttributedTo) if the user's role is more ambiguous.

pav:createdBy can be used for the actual serialization and is usually a computer system - it is basically almost like the existing oa:serializedBy which I never saw quite the need for in the first place. :)

On 28 September 2015 at 21:54, Robert Sanderson <azaroth42@gmail.com> wrote:
>
> With the focus on making the model as approachable as possible, I'd like to
> propose that we revise the provenance model somewhat.
> In particular, while
> the distinction between creator and annotator is useful from an academic
> perspective, it seems to me to be firmly in the 0.1% of use cases.
>
> Proposal:
>
> * Replace oa:annotatedBy with dcterms:creator [creator]
> * Replace oa:annotatedAt with dcterms:created [created]
>
> * Replace oa:serializedBy with prov:generatedBy [generator]
> * Replace oa:serializedAt with prov:generated [generated]
>
> Rationale:
>
> * It's simpler, and doesn't invent new terms unnecessarily.
>
> * It solves Luc's issue with the Prov constraints as the annotator is no
> longer a generator of the annotation.
>
> * It also allows us to say that creator and created SHOULD be used with
> embedded textual bodies, rather than hand-waving like we currently do.
>
> * It avoids the "serialization" issue of whether the client that created the
> annotation is the serializer, or the service that makes it available. The
> activity that generates the annotation is clearly the user creating it,
> rather than the server serializing a graph into a particular format.
>
>
> Thoughts?
>
> Rob
>
> --
> Rob Sanderson
> Information Standards Advocate
> Digital Library Systems and Services
> Stanford, CA 94305

--
Stian Soiland-Reyes, eScience Lab
School of Computer Science
The University of Manchester
http://soiland-reyes.com/stian/work/
http://orcid.org/0000-0001-9842-9718
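
A rough sketch of the property choice suggested in the message above, assuming only the super-property links it describes (pav:authoredBy and pav:curatedBy under pav:contributedBy, and everything mapping up to prov:wasAttributedTo). The dictionary and both function names are hypothetical illustrations, not part of PAV, PROV or the Web Annotation model, and the compact names are plain strings rather than resolved IRIs.

```python
# Super-property links as described in the message (an assumption for this
# sketch, not a copy of the PAV ontology file).
SUPER_PROPERTY = {
    "pav:authoredBy": "pav:contributedBy",
    "pav:curatedBy": "pav:contributedBy",
    "pav:contributedBy": "prov:wasAttributedTo",
    "pav:createdBy": "prov:wasAttributedTo",
}


def generalisations(prop):
    """Return prop followed by its super-properties, most specific first."""
    chain = [prop]
    while chain[-1] in SUPER_PROPERTY:
        chain.append(SUPER_PROPERTY[chain[-1]])
    return chain


def attribution_property(made_content, is_serializer=False):
    """Hypothetical helper: which property an annotation system might assert.

    made_content  -- True if the system knows the agent wrote the embedded
                     body text or made the selection on the target.
    is_serializer -- True for the software that produced the serialization.
    """
    if is_serializer:
        return "pav:createdBy"
    if made_content:
        return "pav:authoredBy"
    # Role unclear: fall back to the more general property.
    return "pav:contributedBy"


if __name__ == "__main__":
    chosen = attribution_property(made_content=True)
    print(chosen, "->", " -> ".join(generalisations(chosen)[1:]))
```

A consumer that only understands PROV could walk the chain returned by generalisations() and still recover a prov:wasAttributedTo statement, whichever specific property the annotation system chose.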
Received on Wednesday, 30 September 2015 14:02:09 UTC