- From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
- Date: Wed, 21 Oct 2015 12:05:49 +0100
- To: public-linked-json <public-linked-json@w3.org>
Excerpts from Nico Schlömer's message of 2015-10-19 17:22:02 +0100:
> A scientific article is typically published in several revisions,
> e.g., a bunch of revisions on a preprint server like arXiv [1] plus
> possibly a version somewhere on a publisher's website [2]. The
> versions will generally be perceived as "the same article", but differ
> a little bit here and there. I'd hesitate to refer to them as the same
> document.

"The same" is almost never actually the same. It's generally a matter of different abstractions, representations and descriptions, which might have varying timespans.

As pointed out earlier, there are various vocabularies that deal with this in different ways. I'll add some more to the list... sorry!

## FaBIO

For publications I would look at the SPAR ontologies <http://www.sparontologies.net/>, which include FaBIO <http://www.sparontologies.net/ontologies/fabio>, a mapping of the well-known FRBR model (Work/Expression/Manifestation/Item) as used by libraries. A particular Work can have many Expressions, e.g. a presentation, a journal article or a poster paper. Identifying the Work is usually hard, as it is a higher-level abstraction that is often not tracked on its own.
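To make the four FRBR levels a little more concrete, here is a minimal Python sketch (standard library only) that walks a trimmed copy of the FaBIO example that follows, from the anonymous top-level node (treated as the Work) down to the library Item. It assumes the document is already in compact form with exactly the frbr:realization, frbr:embodiment and frbr:hasExemplar keys used in that example; it performs no JSON-LD expansion, so a real application would normally run a JSON-LD processor first. The names LEVEL_LINKS and frbr_levels are hypothetical.

```python
import json

# Compact keys linking one FRBR level to the next, as used in the FaBIO
# example (assumption: keys appear with exactly these prefixes).
LEVEL_LINKS = [
    ("Work", "frbr:realization"),
    ("Expression", "frbr:embodiment"),
    ("Manifestation", "frbr:hasExemplar"),
    ("Item", None),
]


def as_list(value):
    """JSON-LD allows a single object or an array; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def frbr_levels(node, depth=0):
    """Yield (level name, @id or None, @type or None), depth first."""
    level, link = LEVEL_LINKS[depth]
    yield level, node.get("@id"), node.get("@type")
    if link is None:
        return
    for child in as_list(node.get(link)):
        yield from frbr_levels(child, depth + 1)


doc = json.loads("""
{
  "@type": "fabio:ResearchPaper",
  "frbr:realization": [
    {"@id": "http://arxiv.org/abs/123.45", "@type": "fabio:Preprint"},
    {"@id": "http://dx.doi.org/10.999/1.2.3.4", "@type": "fabio:JournalArticle",
     "frbr:embodiment": [
       {"@id": "http://journal.example.com/article15.pdf",
        "@type": "fabio:DigitalManifestation"},
       {"@type": "fabio:PrintObject",
        "frbr:hasExemplar": {"@id": "http://library.example.org/physical/152"}}
     ]}
  ]
}
""")

for level, node_id, node_type in frbr_levels(doc):
    print(f"{level:<13} {node_type or '-':<30} {node_id or '(blank node)'}")
```

Each Expression is followed by its Manifestations, and each Manifestation by its Items, mirroring the nesting of the JSON-LD.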
My take on using FaBIO is with an anonymous top-level fabio:ResearchPaper:

```json
{
  "@context": {
    "frbr": "http://purl.org/vocab/frbr/core#",
    "prism": "http://prismstandard.org/namespaces/basic/2.0/",
    "fabio": "http://purl.org/spar/fabio/"
  },
  "@type": "fabio:ResearchPaper",
  "frbr:realization": [
    {
      "@id": "http://arxiv.org/abs/123.45",
      "@type": "fabio:Preprint"
    },
    {
      "@id": "http://dx.doi.org/10.999/1.2.3.4",
      "@type": "fabio:JournalArticle",
      "prism:doi": "10.999/1.2.3.4",
      "frbr:embodiment": [
        {
          "@id": "http://journal.example.com/article15.pdf",
          "@type": "fabio:DigitalManifestation"
        },
        {
          "@id": "http://journal.example.com/article15.epub",
          "@type": "fabio:DigitalManifestation"
        },
        {
          "@type": "fabio:PrintObject",
          "frbr:hasExemplar": { "@id": "http://library.example.org/physical/152" }
        }
      ]
    }
  ]
}
```

FaBIO allows you to make fine distinctions such as preprints vs. postprints, definitive versions, ordered author lists, etc., and its library background means it is also easy to relate to physical objects, like a printed journal exemplar in a particular library.

## Dublin Core Terms

I would warn about what might at first seem an obvious choice: Dublin Core Terms <http://purl.org/dc/terms/>, which has dcterms:isVersionOf and dcterms:isFormatOf (and their inverses dcterms:hasVersion / dcterms:hasFormat). These are meant for exactly this, but sadly they have been misused by users who don't read the descriptions. "hasVersion" has been misunderstood as pointing only to a versioned snapshot of the same resource (i.e. not to a variant version). Similarly, dcterms:hasFormat ("A related resource that is substantially the same as the pre-existing described resource, but in another format.") has been misunderstood as pointing to a format definition, which is what dcterms:format is for, rather than to the resource in that other format.
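A crude mechanical check for the dcterms:hasFormat confusion: in compact JSON-LD, a plain string value is read as a literal unless the @context coerces that term with "@type": "@id", so a bare string such as "application/epub+zip" under dcterms:hasFormat hints that dcterms:format was meant. The sketch below (the function name suspicious_has_format is hypothetical) only looks at keys literally spelled dcterms:hasFormat and ignores any context coercion, so treat its output as a hint rather than a verdict.

```python
import json


def as_list(value):
    """JSON-LD allows a single value or an array; normalise to a list."""
    return value if isinstance(value, list) else [value]


def suspicious_has_format(node, found=None):
    """Collect (subject @id, value) pairs where dcterms:hasFormat has a plain string.

    Without context coercion a plain JSON string is a literal, not a link to
    another resource, so it probably belongs under dcterms:format instead.
    Nested objects and arrays are searched recursively.
    """
    if found is None:
        found = []
    if isinstance(node, list):
        for item in node:
            suspicious_has_format(item, found)
    elif isinstance(node, dict):
        for value in as_list(node.get("dcterms:hasFormat", [])):
            if isinstance(value, str):
                found.append((node.get("@id"), value))
        for key, value in node.items():
            if key != "@context":  # the context holds IRIs, not data
                suspicious_has_format(value, found)
    return found


doc = json.loads("""
{
  "@context": {"dcterms": "http://purl.org/dc/terms/"},
  "@id": "http://journal.example.com/article15",
  "dcterms:hasFormat": [
    {"@id": "http://journal.example.com/article15.pdf",
     "dcterms:format": "application/pdf"},
    "application/epub+zip"
  ]
}
""")
print(suspicious_has_format(doc))
```

Here the PDF entry is a proper node reference that carries its media type via dcterms:format, while the bare "application/epub+zip" string is the one flagged.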
I would use DC Terms somewhat like this:

```json
{
  "@context": { "dcterms": "http://purl.org/dc/terms/" },
  "@id": "http://dx.doi.org/10.999/1.2.3.4",
  "dcterms:hasVersion": [
    {
      "@id": "http://journal.example.com/article15",
      "dcterms:hasFormat": [
        { "@id": "http://journal.example.com/article15.pdf" },
        { "@id": "http://journal.example.com/article15.html" },
        { "@id": "http://journal.example.com/article15.epub" }
      ],
      "dcterms:isVersionOf": { "@id": "http://dx.doi.org/10.999/1.2.3.4" }
    },
    {
      "@id": "http://arxiv.org/abs/123.45",
      "dcterms:hasFormat": { "@id": "http://arxiv.org/pdf/123.45.pdf" },
      "dcterms:isVersionOf": { "@id": "http://dx.doi.org/10.999/1.2.3.4" }
    }
  ]
}
```

Obviously you can structure this top-down or bottom-up to your liking (perhaps better reflecting the HTTP JSON-LD resource that was requested), and use dcterms:is*Of or dcterms:has* depending on the direction. But note that when DC Terms is used in what is (in my view) the intended hierarchical arrangement, there is no direct link between the arXiv preprint and the published journal paper; they are related only through the DOI they are both versions of.

Sometimes a representation and its more abstract resource have the same identifier (e.g. there is no ".html" extension), or there is no URI for the published article in any format. This is a variant of the httpRange-14 problem, which I would not delve too much into; instead I would simply drop "@id" on the nodes that do not have a URI.

The dcterms properties are still a bit too loose for my liking, as they don't really express the relationship much beyond a "kind of sameness", and they have unclear provenance directionality.

## PROV-O

The PROV-O ontology <http://www.w3.org/TR/prov-o/> is obviously relevant to the provenance aspect, e.g. the publisher's web page can be seen as prov:wasDerivedFrom the arXiv preprint, or even prov:wasRevisionOf it.
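In PROV-O, prov:wasRevisionOf is declared a subproperty of prov:wasDerivedFrom, so asserting the revision also implies the derivation. As a rough sketch of that RDFS-style subproperty entailment (not a reasoner), here it is over plain (subject, property, object) tuples in Python; the SUBPROPERTY_OF table is an assumption of this example and holds only that one axiom.

```python
# Only the single PROV-O axiom discussed above; extend for other vocabularies.
SUBPROPERTY_OF = {
    "prov:wasRevisionOf": "prov:wasDerivedFrom",
}


def with_superproperties(triples):
    """Return the triples plus those implied by rdfs:subPropertyOf (transitively)."""
    result = set(triples)
    for subject, prop, obj in triples:
        parent = SUBPROPERTY_OF.get(prop)
        while parent is not None:  # walk up the (acyclic) property hierarchy
            result.add((subject, parent, obj))
            parent = SUBPROPERTY_OF.get(parent)
    return result


asserted = {
    ("http://journal.example.com/article15", "prov:wasRevisionOf",
     "http://arxiv.org/abs/123.45"),
}
for triple in sorted(with_superproperties(asserted)):
    print(triple)
```

A consumer that only understands prov:wasDerivedFrom can still follow the link, which is the practical payoff of choosing the more specific property.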
The statements here become a bit murky provenance-wise if you probe hard enough, because the publisher version is not truly based on the arXiv preprint, but I guess you generally don't want to involve the third (evolving) copy of the article, say a .docx file on someone's laptop.

The level of detail to use here depends on what provenance you have and what is relevant; e.g. if you are detailing the article at different submission stages you might have quite a bit more information than if you only have the preprint and the published article. I'll assume the latter:

```json
{
  "@context": { "prov": "http://www.w3.org/ns/prov#" },
  "@id": "http://journal.example.com/article15",
  "prov:wasRevisionOf": { "@id": "http://arxiv.org/abs/123.45" }
}
```

One issue with prov:wasRevisionOf or prov:wasDerivedFrom is that neither is required to point to the strictly previous version; each just points at "some" older version. Thus you might find other cases where there are more intermediates:

```json
{
  "@context": { "prov": "http://www.w3.org/ns/prov#" },
  "@id": "http://journal.example.com/article15",
  "prov:wasRevisionOf": { "@id": "http://journal.example.com/submitted-for-peer-review/15" }
}
```

PROV adds another very important aspect with prov:specializationOf and prov:alternateOf, which let you relate a generic resource to a more specific one, e.g. to express the relationship between the DOI and the published article, or between the article and its PDF representation. In PROV specialisations, what is said about the more general resource should also be true of the more specific resource, e.g. the authors and title should be the same. prov:alternateOf relates a resource to another resource that has the same general resource as itself.
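The W3C PROV-CONSTRAINTS document lets specialization imply alternateOf, and treats alternateOf as symmetric and transitive, so two resources that are both specializations of the same general resource end up as alternates of each other. Below is a minimal Python sketch of just that sibling case, assuming for illustration that both the journal article and the arXiv record are declared prov:specializationOf the DOI; the function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def alternates_from_specializations(specializations):
    """Derive prov:alternateOf pairs between resources sharing a general resource.

    `specializations` is an iterable of (specific, general) pairs, each meaning
    `specific prov:specializationOf general`. Each unordered alternate pair is
    returned once, with its members in sorted order.
    """
    by_general = defaultdict(set)
    for specific, general in specializations:
        by_general[general].add(specific)
    pairs = set()
    for specifics in by_general.values():
        for a, b in combinations(sorted(specifics), 2):
            pairs.add((a, b))
    return sorted(pairs)


DOI = "http://dx.doi.org/10.999/1.2.3.4"
statements = [
    ("http://journal.example.com/article15", DOI),
    ("http://arxiv.org/abs/123.45", DOI),  # assumed for this illustration
]
print(alternates_from_specializations(statements))
```

This deliberately leaves out the alternateOf link between each specific resource and the general one, as well as chains of specializations; it only shows why a shared general resource is enough to make two resources alternates.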
```json
{
  "@context": { "prov": "http://www.w3.org/ns/prov#" },
  "@id": "http://journal.example.com/article15",
  "prov:specializationOf": { "@id": "http://dx.doi.org/10.999/1.2.3.4" },
  "prov:alternateOf": { "@id": "http://arxiv.org/abs/123.45" }
}
```

The inverse, prov:generalizationOf, can be used for the opposite direction. Here we use it to list the representation formats:

```json
{
  "@context": { "prov": "http://www.w3.org/ns/prov#" },
  "@id": "http://journal.example.com/article15",
  "prov:generalizationOf": [
    { "@id": "http://journal.example.com/article15.pdf" },
    { "@id": "http://journal.example.com/article15.html" },
    { "@id": "http://journal.example.com/article15.epub" }
  ]
}
```

## PAV

In the PAV ontology <http://purl.org/pav/> we try to make common bibliographical provenance patterns for web resources easier to express. PAV is mapped to PROV, so you can make the statements above more specific using PAV:

```json
{
  "@context": { "pav": "http://purl.org/pav/" },
  "@id": "http://journal.example.com/article15",
  "pav:previousVersion": { "@id": "http://arxiv.org/abs/123.45" }
}
```

PAV provides retrieval/import statements, which are very useful when an article has re-appeared in a different system with a different URI, and possibly in a different format. So you could add a PubMed record as:

```json
{
  "@context": {
    "pav": "http://purl.org/pav/",
    "prov": "http://www.w3.org/ns/prov#"
  },
  "@id": "http://www.ncbi.nlm.nih.gov/pubmed/1234",
  "pav:importedFrom": { "@id": "http://journal.example.com/article15" },
  "prov:specializationOf": { "@id": "http://dx.doi.org/10.999/1.2.3.4" }
}
```

Or a redistribution of the publisher's Open Access PDF as:

```json
{
  "@context": {
    "pav": "http://purl.org/pav/",
    "prov": "http://www.w3.org/ns/prov#"
  },
  "@id": "http://home.example.com/~alice/mypaper.pdf",
  "pav:retrievedFrom": { "@id": "http://journal.example.com/article15.pdf" },
  "prov:specializationOf": { "@id": "http://dx.doi.org/10.999/1.2.3.4" }
}
```

(For added fun in getting the provenance lineage straight, redistribute the publisher PDF on arXiv!
:))

### PAV versions

To counter the DC Terms confusion, PAV provides a more specific, hierarchical pav:hasVersion that is used only to relate "version-versions", e.g. a v2.1.2 release. pav:hasCurrentVersion is a way to point to the authoritative current version (at the time of writing). So if the publication system has URIs for different stages of the article, this can be used to provide a permalink to whatever version you are currently returning.

On arXiv every version of an upload is available, with different URIs for each version of both the entry (abstract) and the representation (PDF). Here combining with dcterms:hasFormat is easy, although detailing every version of course gets verbose:

```json
{
  "@context": {
    "pav": "http://purl.org/pav/",
    "prov": "http://www.w3.org/ns/prov#",
    "dcterms": "http://purl.org/dc/terms/"
  },
  "@id": "http://arxiv.org/abs/1304.7224",
  "prov:specializationOf": { "@id": "http://dx.doi.org/10.1186/2041-1480-4-37" },
  "pav:hasCurrentVersion": {
    "@id": "http://arxiv.org/abs/1304.7224v6",
    "pav:previousVersion": { "@id": "http://arxiv.org/abs/1304.7224v5" },
    "dcterms:hasFormat": { "@id": "http://arxiv.org/pdf/1304.7224v6.pdf" }
  },
  "dcterms:hasFormat": {
    "@id": "http://arxiv.org/pdf/1304.7224",
    "pav:hasCurrentVersion": { "@id": "http://arxiv.org/pdf/1304.7224v6.pdf" }
  }
}
```

The PAV version properties are subproperties of the PROV properties prov:generalizationOf, prov:alternateOf and prov:wasRevisionOf, so those can be implied.

## Summary

So exactly which properties are best for you to use depends a bit on what you mean by "version" :)

-- 
Stian Soiland-Reyes, eScience Lab
School of Computer Science
The University of Manchester
http://soiland-reyes.com/stian/work/
http://orcid.org/0000-0001-9842-9718
Received on Wednesday, 21 October 2015 11:06:27 UTC