Re: [web-annotation] PROV-O Mapping from Stian Soiland-Reyes on 2015-09-17 (public-annotation@w3.org from September 2015)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Thu, 17 Sep 2015 13:53:10 +0100
To: Luc Moreau <l.moreau@ecs.soton.ac.uk>
Cc: Stian Soiland-Reyes via GitHub <sysbot+gh@w3.org>, Annotation WG <public-annotation@w3.org>
Message-ID: <CAPRnXtmS3RW+eP1Hv2QPvA38Qn95HZfqbX2DBWgqAJAcrQij=w@mail.gmail.com>
 Thanks, my apologies for missing this earlier.

I like and agree with your exploration in

https://lucmoreau.wordpress.com/2015/03/30/provenance-recipe-mapping-shortcuts-to-prov/


The current mapping is in
http://w3c.github.io/web-annotation/model/wd/#mapping-to-provenance-model


I think the problem here with serializedBy and annotatedBy is a kind
of httpRange-14 issue: the basic OA model assumes that the concrete
serialization and the abstract annotation concept are 'the same
resource'. It gets murkier if you want to distinguish the formation
of the oa:Annotation from the formation of the annotation
conceptually, e.g. there could be an annotation of some old markings
in a textbook from when Stian was a student in 2002, modelled as an
oa:Annotation in 2014 by an OCR bot, and serialised as JSON-LD in
2015.


Without expressing the specializations, and by letting <anno1> be the
abstract resource that is the generalization of both the
oa:Annotation and its serialization, it easily gets inconsistent (or
everything must have been made in an instant).


It's like saying the :cake and the :deliveredCake are the same thing,
which makes it harder. It only really works if you group all those
activities into a single :makingCake activity which all three agents
participate in. The :cake can then be generated by both activities
:delivering and :makingCake - which is fine as the start/end times
would then line up nicely.

In PROV we never got as far as to describe composition of activities -
but here there are simply two alternative (and compatible) world
views:


World view A:

An oa:Annotation resource (whatever that means) was generated in 2015,
attributed to Stian and the OCR bot. The generating mega-activity
finished in 2015 (and started in 2002, but this is not expressed by
PROV as oa:annotatedAt is not a subproperty of prov:generatedAtTime).
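
As a rough sketch of world view A (my own illustration only; the
example.com URIs, the :annotating activity name and the exact dates
are placeholders, not taken from any spec):

```turtle
@prefix oa:   <http://www.w3.org/ns/oa#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix :     <http://example.com/> .   # placeholder namespace

# One resource stands for concept, RDF model and serialization alike
:anno1 a oa:Annotation, prov:Entity ;
    prov:wasAttributedTo :stian, :ocrBot ;
    prov:wasGeneratedBy :annotating ;
    # placeholder date somewhere in 2015
    prov:generatedAtTime "2015-01-01T00:00:00Z"^^xsd:dateTime ;
    # placeholder date somewhere in 2002; invisible to PROV tools
    oa:annotatedAt "2002-01-01T00:00:00Z"^^xsd:dateTime .

# The single "mega-activity" that both agents take part in
:annotating a prov:Activity ;
    prov:wasAssociatedWith :stian, :ocrBot ;
    prov:endedAtTime "2015-01-01T00:00:00Z"^^xsd:dateTime .
```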


World view B:
In 2002, a study activity by Stian generated a handwriting entity in
a book manifestation.  (Let's not go into versioning of that book with
and without handwriting!)

In 2015, an OCR activity by the bot generated an oa:Annotation
resource and its serialization. It used the handwriting, and the
annotation document is derived from the handwriting.
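
Again only as a sketch, world view B could be written roughly like
this. All names under example.com (:studying, :ocr, :handwriting,
:anno1Doc) and the dates are hypothetical placeholders:

```turtle
@prefix oa:   <http://www.w3.org/ns/oa#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix :     <http://example.com/> .   # placeholder namespace

# 2002: Stian studies and leaves handwriting in the book
:studying a prov:Activity ;
    prov:wasAssociatedWith :stian ;
    prov:endedAtTime "2002-01-01T00:00:00Z"^^xsd:dateTime .  # placeholder

:handwriting a prov:Entity ;
    prov:wasGeneratedBy :studying ;
    prov:wasAttributedTo :stian .

# 2015: the OCR bot reads the handwriting
:ocr a prov:Activity ;
    prov:wasAssociatedWith :ocrBot ;
    prov:used :handwriting ;
    prov:endedAtTime "2015-01-01T00:00:00Z"^^xsd:dateTime .  # placeholder

# ... and generates the annotation and its serialization
:anno1 a oa:Annotation, prov:Entity ;
    prov:wasGeneratedBy :ocr .

:anno1Doc a prov:Entity ;
    prov:wasGeneratedBy :ocr ;
    prov:wasDerivedFrom :handwriting .
```

Here each entity has exactly one generating activity, so the 2002 and
2015 timestamps no longer have to be squeezed onto the same resource.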


Now the question is about the oa:Annotation: was the oa:Annotation an
RDF resource that was derived from the handwriting, or was it some
conceptual entity that was made in 2002 and has now just manifested
itself as RDF in an Annotation Document? I think we generally prefer
the second solution as RDF is used to describe real resources (e.g.
a foaf:Person is a person, not a person description). We use RDF
reification and such if we really need to express the provenance of
the RDF serialization.

Also we can generally keep this split if we make the httpRange-14
distinction by not giving the Annotation Document and the Annotation
the same URI (although you would still struggle to separate the RDF
model from 2014 and a particular Turtle/JSON serialization from 2015
unless you do both # and content negotiation).



But this is effectively merging the :deliveredCake and :cake - with
split entities you could rather say:

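```turtle
@prefix pav:  <http://purl.org/pav/> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix :     <http://example.com/> .   # placeholder namespace
:cake pav:hasCurrentVersion :deliveredCake ;
      pav:hasVersion :deliveredCake, :wrappedCake, :unwrappedCake ;
      prov:generalizationOf :deliveredCake, :wrappedCake, :unwrappedCake .
```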


I think your mapping examples are elegant.


Adding oa:serializedInto is a good solution for enforcing/encouraging
the httpRange-14 split, and is similar to OAI-ORE's
ore:isDescribedBy, which separates the ore:Aggregation from the
ore:ResourceMap (the RDF document that serializes the ore:Aggregation
concept). However I think it is not a good approach in general for
every vocabulary to have its own "I'm serialized in document .."
property. foaf:isPrimaryTopicOf is a more general property used for
this purpose, e.g. as used in VoID.
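
To make the comparison concrete, a small sketch (the example.com
resources are made up; oa:serializedInto is only the proposed
property and is not defined in the oa: namespace, so I left it
commented out):

```turtle
@prefix oa:   <http://www.w3.org/ns/oa#> .
@prefix ore:  <http://www.openarchives.org/ore/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix :     <http://example.com/> .   # placeholder namespace

# OAI-ORE: the aggregation concept vs. the RDF document describing it
:aggregation a ore:Aggregation ;
    ore:isDescribedBy :resourceMap .
:resourceMap a ore:ResourceMap ;
    ore:describes :aggregation .

# Generic FOAF alternative: the annotation vs. the document about it
:anno1 a oa:Annotation ;
    foaf:isPrimaryTopicOf :anno1Doc .
:anno1Doc a foaf:Document ;
    foaf:primaryTopic :anno1 .

# Proposed vocabulary-specific equivalent (hypothetical term):
# :anno1 oa:serializedInto :anno1Doc .
```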


However, do I understand your suggestion right: that you would want
to add oa:serializedInto to the Web Annotation Vocabulary (OWL
ontology) only, and to change the superproperty of oa:serializedBy to
NOT be prov:wasAttributedTo and instead be a property path?
oa:serializedInto would then be mainly for internal OWL purposes.

Or should oa:serializedInto also be added to the Web Annotation Data Model?
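
For the OWL-only variant, this is roughly how I would guess it could
be written. This is just my reading of the suggestion, not an agreed
mapping, and oa:serializedInto is still the hypothetical proposed
property. Note that OWL 2 can only say that the chain implies the
shortcut, not that the shortcut is a subproperty of the chain:

```turtle
@prefix oa:   <http://www.w3.org/ns/oa#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix prov: <http://www.w3.org/ns/prov#> .

# Hypothetical: annotation -> the document it is serialized into
oa:serializedInto a owl:ObjectProperty .

# If  ?anno oa:serializedInto ?doc  and  ?doc prov:wasAttributedTo ?agent
# then a reasoner can infer  ?anno oa:serializedBy ?agent
oa:serializedBy a owl:ObjectProperty ;
    owl:propertyChainAxiom ( oa:serializedInto prov:wasAttributedTo ) .
```

With such an axiom oa:serializedBy would no longer attribute the
annotation itself to the serializing agent, only its document.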


On 16 September 2015 at 16:21, Luc Moreau <l.moreau@ecs.soton.ac.uk> wrote:
>
> http://lists.w3.org/Archives/Public/public-annotation/2015Mar/0096.html
> http://lists.w3.org/Archives/Public/public-annotation/2015Mar/0097.html
> http://lists.w3.org/Archives/Public/public-annotation/2015Mar/0098.html
>
> On 13/09/2015 05:48, Stian Soiland-Reyes via GitHub wrote:
>>
>> @lucmoreau any links..? I can't find the proposal :(
>>
>
> --
> Professor Luc Moreau
> Head of the Web and Internet Science Group
> Electronics and Computer Science   tel:   +44 23 8059 4487
> University of Southampton          twitter: @lucmoreau
> Southampton SO17 1BJ, UK           http://www.ecs.soton.ac.uk/~lavm
>
>
>



-- 
Stian Soiland-Reyes, eScience Lab
School of Computer Science
The University of Manchester
http://soiland-reyes.com/stian/work/    http://orcid.org/0000-0001-9842-9718
Received on Thursday, 17 September 2015 12:54:01 UTC