- From: John Walker <john.walker@semaku.com>
- Date: Wed, 15 Mar 2017 14:31:30 +0000
- To: Dave Reynolds <dave.e.reynolds@gmail.com>, "public-lod@w3.org" <public-lod@w3.org>
Thanks to Pano for the extra detail on our use case, and to everyone for their contributions :)

Perhaps another way to look at it is that the RDF dataset is derived from the non-RDF dataset. One could then use PROV to describe the provenance in more detail if useful. That would certainly make sense when the RDF is a 'lossy' conversion, i.e. when not all data from the source dataset is mapped into RDF.

When the non-RDF and RDF versions are informationally equivalent, I can see that they can be considered different forms of one dataset.

John

> -----Original Message-----
> From: Dave Reynolds [mailto:dave.e.reynolds@gmail.com]
> Sent: Wednesday, March 15, 2017 12:29 PM
> To: public-lod@w3.org
> Subject: Re: Relationship of dcat:Dataset and void:Dataset
>
> For what it's worth, personally I agree with this analysis.
>
> dcat:Dataset is best regarded as an abstract thing which then gets represented/expressed as an RDF graph or a set of RDBMS tables or whatever. Each of which can then be distributed/manifested in multiple different ways.
>
> Hence for greatest cleanliness having something between a dcat:Dataset and the dcat:Distribution could make sense.
>
> However, in practice it's likely to be a complication too far for most uses.
>
> Fundamentally the notion of a "dataset" itself doesn't really work in a linked data world. Datasets have boundaries - what's in or out of the set. The point of linked data is to break down those boundaries.
>
> Dave
>
> On 15/03/17 10:59, Pano Maria wrote:
> > I am one of John Walker's colleagues, and as John says we've been having some interesting discussions on this topic. I'm partial to the first option he presents, as our situation is similar to the situation that Alasdair described.
> >
> > As an example:
> >
> > We have a collection of data pertaining to the addresses and buildings in the Netherlands that is distributed in many different ways: WFS, WMS, GML data dump, etc.
> > Our linked data version of this collection of data is actually created by transforming one of these sources to linked data and subsequently exposing this via a SPARQL endpoint and REST APIs.
> >
> > In my view a dcat:Dataset is the *abstract* representation of some collection of data. That is, I can say stuff about this dataset in an abstract sense, like who the curator is, what the accrual periodicity is, what the spatial extent is, when it was last updated, etc., without this collection of data having to have any specific concrete form. This also fits well with the situation that Alasdair describes, and the above example.
> >
> > In my opinion one resource should therefore not be an instance of both dcat:Dataset and void:Dataset, since if we consider the definition of void:Dataset: "A dataset is a set of RDF triples that are published, maintained or aggregated by a single provider" [1], we're clearly not describing an abstract collection of data.
> >
> > Now, where I agree it all becomes a bit muddy is when you think of a set of RDF triples, i.e. a void:Dataset, being distributable in several different ways (SPARQL endpoint, datadump, LDP API etc.) What does that make our void:Dataset? Note that the properties of the abstract dataset are still relevant to describe all these different forms of the collection of data.
> >
> > So, maybe what we are missing is a way to distinguish an expression of an (abstract) collection of data from the distribution of that expression. Analogous to the `work -> expression -> manifestation -> item` that the FRBR model [2] uses. That would lead to a dcat:Dataset representing the abstract dataset, one or more expressions of this dataset, e.g. an RDF expression and thus a void:Dataset, and dcat:Distributions of those expressions.
> >
> > The downside is that it becomes quite philosophical...
> >
> > As it stands currently, I'm still inclined to consider a void:Dataset a better match with dcat:Distribution than with dcat:Dataset, because of the need to use dcat:Dataset in an expression-independent way.
> >
> > Kind regards,
> >
> > Pano Maria
> >
> > [1] https://www.w3.org/TR/void/
> > [2] http://www.sparontologies.net/ontologies/fabio
> >
> > *From:* Markus Freudenberg [mailto:markus.freudenberg@gmail.com]
> > *Sent:* Wednesday 15 March 2017 11:22
> > *To:* Gray, Alasdair J G
> > *CC:* John Erickson; John Walker; public-lod@w3.org; public-dwbp-wg@w3.org
> > *Subject:* Re: Relationship of dcat:Dataset and void:Dataset
> >
> > We had a very similar discussion about how to marry DCAT with VOID (and what to do with void:Dataset) for DataID <http://dataid.dbpedia.org/ns/core.html>.
> >
> > In the end, we decided to define dataid:Dataset as a subclass of dcat:Dataset and void:Dataset for the following reasons:
> >
> > 1. their similar definitions:
> >
> > void:Dataset "[...] we think of a dataset as a meaningful collection of triples, that deal with a certain topic, originate from a certain source or process, are hosted on a certain server, or are aggregated by a certain custodian." [1]
> >
> > dcat:Dataset "[...] collection of data, published or curated by a single agent, and available for access or download in one or more formats." [2]
> >
> > It appears that all of what is stated about a dcat:Dataset is true for a void:Dataset (including the possibility of different formats).
> >
> > 2. the similarities between dcat:CatalogRecord and void:DatasetDescription:
> >
> > Both provide some form of metadata about a dataset. Both are using foaf:topic / foaf:primaryTopic to point out the (Dataset) entity of interest.
> >
> > When combining DCAT and VOID using the first option, a dcat:CatalogRecord would reference a dcat:Dataset, while a void:DatasetDescription would reference a dcat:Distribution.
> >
> > 3. void:subset
> >
> > Points out a subset of a void:Dataset. If a void:Dataset is also considered a dcat:Distribution, one would have to deal with the notion of 'sub-distributions'.
> >
> > Which is a point of contention (as far as I remember the discussion at SDSVoc). We rather use this property with DataID to provide the missing hierarchical pointers between datasets.
> >
> > 4. The definition of dcat:Distribution
> >
> > dcat:Distribution: "Represents a specific available form of a dataset."
> >
> > The definition of a void:Dataset is different since it only narrows the available formats of a dataset to RDF, not to a specific serialization. Also, VOID properties offer no further clarification on the 'specific available format' of the dataset.
> >
> > VOID Properties like:
> >
> > classes <http://vocab.deri.ie/void#classes> | distinctObjects <http://vocab.deri.ie/void#distinctObjects> | distinctSubjects <http://vocab.deri.ie/void#distinctSubjects> | documents <http://vocab.deri.ie/void#documents> | entities <http://vocab.deri.ie/void#entities> | properties <http://vocab.deri.ie/void#properties> | property <http://vocab.deri.ie/void#property> | propertyPartition <http://vocab.deri.ie/void#propertyPartition> | triples <http://vocab.deri.ie/void#triples> | vocabulary <http://vocab.deri.ie/void#vocabulary> etc.
> >
> > are all characteristics of a dataset and not just a single distribution, in my understanding.
> >
> > These were our main reasons to combine dcat:Dataset and void:Dataset into dataid:Dataset.
> >
> > Markus Freudenberg
> >
> > Release Manager, DBpedia <http://wiki.dbpedia.org>
> >
> > On Tue, Mar 14, 2017 at 5:10 PM, Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk <mailto:A.J.G.Gray@hw.ac.uk>> wrote:
> >
> > When we were considering this in the Health Care and Life Sciences Community Profile [1] we took the view that the RDF representation was one of several possible distributions for a dataset and that it would be incorrect to associate that distribution information with the notion of the dataset itself. That is, we took the first approach proposed by John.
> >
> > We specifically did this as not all HCLS datasets are made available in RDF and we did not want to make incorrect inferences.
> >
> > Best regards,
> >
> > Alasdair
> >
> > [1] https://www.w3.org/TR/hcls-dataset/
> >
> > On 14 Mar 2017, at 14:18, John Erickson <olyerickson@gmail.com <mailto:olyerickson@gmail.com>> wrote:
> >
> > John makes a great argument for the second approach. That is how we tend to think of it.
> >
> > As with most DCAT-related questions, start with "DCAT is like 'Dublin Core' for datasets." In other words, general purpose, good for starters, accommodates refinements...
> >
> > John
> >
> > On Tue, Mar 14, 2017 at 9:59 AM, John Walker <john.walker@semaku.com <mailto:john.walker@semaku.com>> wrote:
> >
> > Hello,
> >
> > Following discussion with colleagues, I would like to ask for opinions on the semantics of dcat:Dataset and void:Dataset.
> >
> > We have two points of view.
> >
> > First, the RDF version of a dcat:Dataset is a dcat:Distribution of that dataset and is itself a void:Dataset.
> >
> > That could be represented as follows:
> >
> > <my-dataset> a dcat:Dataset ;
> >     dcat:distribution <my-rdf-dataset> ;
> >     .
> >
> > <my-rdf-dataset> a dcat:Distribution , void:Dataset ;
> >     void:sparqlEndpoint <sparql> ;
> >     void:dataDump <my-dataset.rdf>, <my-dataset.ttl> ;
> >     .
> >
> > Secondly that a dcat:Dataset that is available as RDF (and possibly other forms) is also a void:Dataset.
> >
> > Or to put it another way: void:Dataset rdfs:subClassOf dcat:Dataset.
> >
> > That could be represented as follows:
> >
> > <my-dataset> a dcat:Dataset, void:Dataset ;
> >     dcat:distribution <my-sparql-distribution>, <my-rdfxml-distribution>, <my-turtle-distribution> ;
> >     void:sparqlEndpoint <sparql> ;
> >     void:dataDump <my-dataset.rdf>, <my-dataset.ttl> ;
> >     .
> >
> > <my-sparql-distribution> a dcat:Distribution ;
> >     dcat:accessURL <sparql> ;
> >     .
> >
> > <my-rdfxml-distribution> a dcat:Distribution ;
> >     dcat:downloadURL <my-dataset.rdf> ;
> >     dcat:mediaType "application/rdf+xml"
> >     .
> >
> > <my-turtle-distribution> a dcat:Distribution ;
> >     dcat:downloadURL <my-dataset.ttl> ;
> >     dcat:mediaType "text/turtle"
> >     .
> >
> > I’m trying to keep an open mind, but leaning towards the second method as thinking of the SPARQL endpoint, dumps and crawlable linked data (plus other forms such as an API or WFS endpoint) as different distributions of the same dataset seems to fit better with the spirit of DCAT (at least to my interpretation of the recommendation).
> >
> > Thoughts welcome!
> >
> > Regards,
> >
> > John
> >
> > --
> > John S. Erickson, Ph.D.
> > Director of Operations, The Rensselaer IDEA
> > Deputy Director, Web Science Research Center (RPI)
> > <http://idea.rpi.edu/> <olyerickson@gmail.com <mailto:olyerickson@gmail.com>>
> > Twitter & Skype: olyerickson
> >
> > Alasdair J G Gray
> > Fellow of the Higher Education Academy
> > Assistant Professor in Computer Science,
> > School of Mathematical and Computer Sciences
> > (Athena SWAN Bronze Award)
> > Heriot-Watt University, Edinburgh UK.
> >
> > Email: A.J.G.Gray@hw.ac.uk <mailto:A.J.G.Gray@hw.ac.uk>
> > Web: http://www.macs.hw.ac.uk/~ajg33
> > ORCID: http://orcid.org/0000-0002-5711-4872
> > Office: Earl Mountbatten Building 1.39
> > Twitter: @gray_alasdair
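
As a rough illustration of the suggestion at the top of this message (treating the RDF dataset as derived from the non-RDF dataset and describing that with PROV), a minimal Turtle sketch might look like the following. The example.org base IRI and the resource names are placeholders, prov:wasDerivedFrom is only one of several PROV terms that could fit, and typing the RDF resource as both dcat:Dataset and void:Dataset is an assumption that the thread does not settle.

```turtle
@base <http://example.org/> .                      # placeholder base IRI (assumption)
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix prov: <http://www.w3.org/ns/prov#> .

# The source (non-RDF) dataset, e.g. the one published as a GML dump.
<my-source-dataset> a dcat:Dataset .

# The RDF dataset, modelled as a dataset in its own right and linked
# back to the source it was converted from.
<my-rdf-dataset> a dcat:Dataset , void:Dataset ;
    prov:wasDerivedFrom <my-source-dataset> ;
    void:sparqlEndpoint <sparql> ;
    void:dataDump <my-dataset.ttl> .
```

With this shape the lossy and the informationally equivalent cases look the same in the data; telling them apart would need further PROV detail (for example a description of the conversion activity), which is left out here.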
Received on Wednesday, 15 March 2017 14:32:07 UTC