Re: PAV (was Re: Review of future/core.html) from Stian Soiland-Reyes on 2013-01-29 (public-openannotation@w3.org from January 2013)

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Tue, 29 Jan 2013 13:59:02 +0000
To: Antoine Isaac <aisaac@few.vu.nl>
Cc: public-openannotation@w3.org
Message-ID: <CAPRnXt=fkzsUmXuJTE=ENS7UpYYTfb4-SetHuYm6Zo4qmWBCxg@mail.gmail.com>

--- this is getting off topic, but it's good to hear there is interest!

On Mon, Jan 28, 2013 at 10:13 PM, Antoine Isaac <aisaac@few.vu.nl> wrote:

> Without even criticizing the model a single second, I see indeed
> distinctions like "digital resource", "digital artifact", etc. I've fought
> with these for too long in my domain, and I can see cans of worms flashing
> around and long reading and discussions coming...

Yes, that is a big can of worms, not too dissimilar from the
httpRange-14 discussions (about whether resources and their
representations are the same 'thing' or not).

In PAV we simply try to say that authorship/contribution has to do
with the knowledge or content that is represented ("IP" if you like,
although I hate the term), and "creation" has to do with making the
digital form this takes (not necessarily the exact representation, like
RDF/XML vs Turtle). How this split is realized, if at all, is domain
and application specific.

For instance, it's quite straightforward for a Word document where I
typed in a chapter from Lord of the Rings: that Word document is
pav:authoredBy J. R. R. Tolkien and pav:createdBy Stian, and it was
pav:createdWith Word. In PROV terms, you can think of authorship as
something that belongs to a more general, abstract entity that the
"digital resource" is a prov:specializationOf.

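A rough sketch of the Word document example as plain
subject/predicate/object triples in Python. The example.org identifiers
and the objects() helper are made up for illustration, and the PAV
namespace URI is assumed; only the pav:/prov: property names come from
the paragraph above.

```python
# Namespaces: PROV is the W3C one; the PAV URI is assumed here.
PAV = "http://purl.org/pav/"
PROV = "http://www.w3.org/ns/prov#"
EX = "http://example.org/"  # hypothetical namespace for the example

doc = EX + "lotr-chapter.docx"   # the digital resource I typed in
work = EX + "lotr-chapter"       # the more general, abstract entity

triples = {
    (doc, PAV + "authoredBy", EX + "Tolkien"),  # who the content comes from
    (doc, PAV + "createdBy", EX + "Stian"),     # who made the digital form
    (doc, PAV + "createdWith", EX + "Word"),    # the tool used
    (doc, PROV + "specializationOf", work),     # link to the abstract entity
}


def objects(subject, predicate):
    """Return all objects for a subject/predicate pair, sorted."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


print(objects(doc, PAV + "authoredBy"))
print(objects(doc, PAV + "createdBy"))
```

The point of keeping authoredBy and createdBy as separate statements is
that a consumer can ask either question without guessing which kind of
"creator" was meant.
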
Similarly for annotations, if I take the author's handwritten notes in
the original Lord of the Rings manuscript and formalize them as
oa:Annotations, then those annotations are pav:authoredBy :Tolkien
and pav:createdBy :Stian.

However this gets trickier the moment the knowledge itself is a
digital thing rather than something which is merely represented with
digital concepts; for instance an ontological model, an RDF dataset, or
a spreadsheet that calculates mortgage payments. For simple cases the
creator and author are just the same person, so there is no problem,
and you might want to only represent one of those.

The distinction can come into play when one talks about
transformations of formats and similar, for which PAV provides more
specialized terms, like pav:importedFrom and pav:importedBy. So
if you made the spreadsheet in Excel and I just copy it and put it on
my website, then you are still both the author and creator, and I mark
the provenance to the original using pav:retrievedFrom and my role
using pav:retrievedBy.

If I then saved it in OpenOffice format, then you are still the author
of my OpenOffice spreadsheet, while I am now the creator (as here I
consider the workings of the spreadsheet as the 'knowledge'), and
pav:retrievedFrom changes to pav:importedFrom. However, if I also needed
to fix a formula in the spreadsheet to make it work in OpenOffice, then
I also become a curator (pav:curatedBy).
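
The copy / convert / fix progression above can be sketched as a small
Python function. The function name, the scenario labels and the ex:
identifiers are hypothetical; the mapping to PAV terms is only my
reading of the two paragraphs above, not something PAV itself
prescribes.

```python
def spreadsheet_provenance(scenario, you="ex:you", me="ex:me",
                           original="ex:original.xls"):
    """Return the PAV statements about my copy for one scenario."""
    if scenario == "copied":
        # Unchanged copy on my website: you stay author and creator.
        return {
            "pav:authoredBy": you,
            "pav:createdBy": you,
            "pav:retrievedFrom": original,
            "pav:retrievedBy": me,
        }
    if scenario not in ("converted", "converted_and_fixed"):
        raise ValueError("unknown scenario: " + scenario)
    # Saved in another format: you stay author, I become creator.
    statements = {
        "pav:authoredBy": you,
        "pav:createdBy": me,
        "pav:importedFrom": original,
        "pav:importedBy": me,
    }
    if scenario == "converted_and_fixed":
        # Fixing a formula so it works after conversion is curation.
        statements["pav:curatedBy"] = me
    return statements


for scenario in ("copied", "converted", "converted_and_fixed"):
    print(scenario, spreadsheet_provenance(scenario))
```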

(In a different domain it could be that a spreadsheet contains survey
data imported from a CSV which was extracted from a survey database;
here the authorship relates to the survey data, while creation might
deal with making it into a tabular format, no matter if it has been
converted from CSV to XLS.)

If I add a bit of new functionality, then I am a contributor
(pav:contributedBy), and the OpenOffice spreadsheet is now just
pav:derivedFrom the original rather than imported from it. If that
functionality is "significant", then I would now also be an author. If
your bit is superseded by my 3rd version, then you remain only as
an author of the spreadsheet that my spreadsheet was pav:derivedFrom.
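
The contributor / author cases could be sketched the same way in
Python. Again the function name, the flags and the ex: identifiers are
hypothetical, and treating "superseded" as dropping you from the author
list of my version is my interpretation of the last sentence above.

```python
def derived_provenance(you, me, significant=False, superseded=False,
                       original="ex:original.xls"):
    """Return PAV statements for my version after I added functionality."""
    statements = {
        "pav:derivedFrom": [original],   # no longer just imported
        "pav:createdBy": [me],
        "pav:contributedBy": [me],
        # You stay an author of my version unless your part is gone.
        "pav:authoredBy": [] if superseded else [you],
    }
    if significant or superseded:
        # A significant addition (or a full rewrite) makes me an author.
        statements["pav:authoredBy"].append(me)
    return statements


print(derived_provenance("ex:you", "ex:me"))
print(derived_provenance("ex:you", "ex:me", significant=True))
print(derived_provenance("ex:you", "ex:me", superseded=True))
```

In the superseded case your authorship is not lost; it is still stated
on the original that my version points to via pav:derivedFrom.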

.. and with that I think I explained almost the whole model... *copy to paper*.

-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester

Received on Tuesday, 29 January 2013 13:59:54 UTC