Re: PAV (was Re: Review of future/core.html)

Hi Stian,


> --- this is getting off topic, but it's good to hear there is interest!
>
> On Mon, Jan 28, 2013 at 10:13 PM, Antoine Isaac<aisaac@few.vu.nl>  wrote:
>
>> Without even criticizing the model a single second, I see indeed
>> distinctions like "digital resource", "digital artifact", etc. I've fought
>> with these for too long in my domain, and I can see cans of worms flashing
>> around and long reading and discussions coming...
>
> Yes, that is a big can of worms, not too dissimilar from the httpRange-14
> discussions (about resources and their representations being the
> same 'thing' or not).
>
> In PAV we simply try to say that authorship/contribution has to do
> with the knowledge or content that is represented ("IP" if you like,
> although I hate the term), and "creation" has to do with making the
> digital form this takes (not necessarily the exact representation like
> RDF/XML vs Turtle). How this split is realized, if at all, is domain
> and application specific.
>
> For instance it's quite straightforward for a Word document where I
> typed in a chapter from Lord of the Rings: that Word document was
> pav:authoredBy J. R. R. Tolkien and pav:createdBy Stian, and it was
> pav:createdWith Word. In PROV terms, you can think of authorship as
> something that belongs to a more general, abstract entity that the
> "digital resource" is a prov:specializationOf.
>
>
> Similarly for annotations, if I take the author's handwritten notes in
> the original Lord of the Rings manuscript and formalize them as
> oa:Annotation's, then those annotations are pav:authoredBy :Tolkien
> and pav:createdBy :Stian.
>
>
> However this gets trickier the moment the knowledge itself is a
> digital thing rather than something which is merely represented with
> digital concepts; for instance an ontological model, an RDF dataset, a
> spreadsheet that calculates mortgage payments. For simple cases the
> creator and author are just the same person, so there is no problem,
> and you might want to only represent one of those.
>
> The distinction can come into play when one talks about
> transformations of formats and similar, which PAV provides more
> specialized terms for, like pav:importedFrom and pav:importedBy.  So
> if you made the spreadsheet in Excel and I just copy it and put it on
> my website, then you are still both the author and creator, and I mark
> the provenance to the original using pav:retrievedFrom and my role
> using pav:retrievedBy.
>
> If I then saved it in OpenOffice format, then you are still the author
> of my OO spreadsheet, while I am now the creator. (as here I consider
> the workings of the spreadsheet as the 'knowledge'). retrievedFrom
> changes to importedFrom. However if I also needed to fix a formula in
> the spreadsheet to make it work in OpenOffice, then I also become a
> curator (pav:curatedBy).
>
> (In a different domain it could be that a spreadsheet contains survey
> data imported from a CSV which was extracted from a survey database;
> here the authorship relates to the survey data, while creation might
> deal with making it into a tabular format, no matter if it has been
> converted from CSV to XLS.)
>
> If I add a bit of new functionality, then I am a contributor
> (pav:contributedBy), and the OO spreadsheet is now just
> pav:derivedFrom the original rather than imported from it. If that
> functionality is "significant", then I would now also be an author. If
> your bit is superseded by my 3rd version, then now you remain only as
> an author of the spreadsheet that my spreadsheet was pav:derivedFrom.
>
> .. and with that I think I explained almost the whole model... *copy to paper*.
>
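For anyone following along, the spreadsheet walkthrough quoted above could be sketched roughly as below. This is only an illustration under stated assumptions: the ex: identifiers and the roles_of helper are made up here, the triples are plain Python tuples rather than real RDF, and "ex:Antoine" stands in for the "you" of the quoted text.

```python
# Illustrative only: the ex: identifiers are hypothetical; the pav: property
# names are the ones mentioned in the quoted text.

def roles_of(triples, agent):
    """Return the set of properties that link any resource to `agent`."""
    return {p for (_s, p, o) in triples if o == agent}

# Step 1: the Excel file is copied unchanged onto another website.
copied = [
    ("ex:copy.xls", "pav:authoredBy", "ex:Antoine"),
    ("ex:copy.xls", "pav:createdBy", "ex:Antoine"),
    ("ex:copy.xls", "pav:retrievedFrom", "ex:original.xls"),
    ("ex:copy.xls", "pav:retrievedBy", "ex:Stian"),
]

# Step 2: saved in OpenOffice format. The author stays, the creator changes,
# and retrievedFrom becomes importedFrom. (pav:importedBy could presumably
# record the importing agent; it is left out to stay close to the text.)
converted = [
    ("ex:sheet.ods", "pav:authoredBy", "ex:Antoine"),
    ("ex:sheet.ods", "pav:createdBy", "ex:Stian"),
    ("ex:sheet.ods", "pav:importedFrom", "ex:original.xls"),
]

# Step 3: a formula had to be fixed, so the converter is also a curator.
curated = converted + [
    ("ex:sheet.ods", "pav:curatedBy", "ex:Stian"),
]

# Step 4: new functionality is added. The sheet is now derived from the
# original rather than imported from it, and a contributor is recorded.
extended = [t for t in curated if t[1] != "pav:importedFrom"] + [
    ("ex:sheet.ods", "pav:contributedBy", "ex:Stian"),
    ("ex:sheet.ods", "pav:derivedFrom", "ex:original.xls"),
]

if __name__ == "__main__":
    steps = [("copied", copied), ("converted", converted),
             ("curated", curated), ("extended", extended)]
    for name, triples in steps:
        print(name, sorted(roles_of(triples, "ex:Stian")))
```

If the added functionality were "significant", one more pav:authoredBy triple pointing at ex:Stian would be appended in step 4; the sketch stops short of that because the threshold is a judgment call.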


Impressive :-)

I think I get it, but I don't have the time to investigate much these days...

From what I understand it's a bit like an anti-PROV or anti-FRBR: it lumps together resources that these frameworks try to distinguish (sometimes at too fine a grain, I agree), and tries to convey the differences by using different properties for the different "facets" of an object, hoping that choosing good labels will make it relatively clear. The interest in terms of simplicity is certain, but there's a high risk of misuse. At least that's the experience I have in my domain...

Anyway, I hope one day to have the opportunity to say more, on another forum maybe.

Cheers,

Antoine

PS: Personally I've grown allergic to anything that has "digital" in its label. When one scratches under the surface, one realizes that most people don't really know what they're talking about. *You* probably know, but that's a different story ;-)
And it's not a question for the ontologist. It often messes with the functions that are supposed to be built on top of the data...

Received on Wednesday, 30 January 2013 22:31:14 UTC