- From: Markus Freudenberg <markus.freudenberg@gmail.com>
- Date: Wed, 15 Mar 2017 11:21:49 +0100
- To: "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>
- Cc: John Erickson <olyerickson@gmail.com>, John Walker <john.walker@semaku.com>, "public-lod@w3.org" <public-lod@w3.org>, "public-dwbp-wg@w3.org" <public-dwbp-wg@w3.org>
- Message-ID: <CALoNf0WHV-KphqT90=ZGtcAS_bdoG9V8QrbXRR_887H5UieD0w@mail.gmail.com>
We had a very similar discussion about how to marry DCAT with VOID (and what to do with void:Dataset) for DataID <http://dataid.dbpedia.org/ns/core.html>.

In the end, we decided to define dataid:Dataset as a subclass of both dcat:Dataset and void:Dataset, for the following reasons:

1. Their similar definitions:

   void:Dataset: "[...] we think of a dataset as a meaningful collection of triples, that deal with a certain topic, originate from a certain source or process, are hosted on a certain server, or are aggregated by a certain custodian." [1]

   dcat:Dataset: "[...] collection of data, published or curated by a single agent, and available for access or download in one or more formats." [2]

   It appears that everything stated about a dcat:Dataset is also true of a void:Dataset (including the possibility of different formats).

2. The similarities between dcat:CatalogRecord and void:DatasetDescription:

   Both provide some form of metadata about a dataset. Both use foaf:topic / foaf:primaryTopic to point out the (Dataset) entity of interest. When combining DCAT and VOID using the first option, a dcat:CatalogRecord would reference a dcat:Dataset, while a void:DatasetDescription would reference a dcat:Distribution.

3. void:subset:

   This property points out a subset of a void:Dataset. If a void:Dataset is also considered a dcat:Distribution, one would have to deal with the notion of 'sub-distributions', which is a point of contention (as far as I remember the discussion at SDSVoc). We would rather use this property with DataID to provide the missing hierarchical pointers between datasets.

4. The definition of dcat:Distribution:

   dcat:Distribution: "Represents a specific available form of a dataset."

   The definition of a void:Dataset is different, since it only narrows the available formats of a dataset to RDF, not to a specific serialization. Also, no VOID properties offer further clarification on the 'specific available format' of the dataset.
VOID properties like void:classes, void:distinctObjects, void:distinctSubjects, void:documents, void:entities, void:properties, void:property, void:propertyPartition, void:triples, void:vocabulary, etc. are all characteristics of a dataset and not just of a single distribution, in my understanding.

These were our main reasons for combining dcat:Dataset and void:Dataset into dataid:Dataset.

Markus Freudenberg
Release Manager, DBpedia <http://wiki.dbpedia.org>

On Tue, Mar 14, 2017 at 5:10 PM, Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk> wrote:

> When we were considering this in the Health Care and Life Sciences
> Community Profile [1] we took the view that the RDF representation was one
> of several possible distributions for a dataset and that it would be
> incorrect to associate that distribution information with the notion of the
> dataset itself. That is, we took the first approach proposed by John.
>
> We specifically did this as not all HCLS datasets are made available in
> RDF and we did not want to make incorrect inferences.
>
> Best regards,
>
> Alasdair
>
> [1] https://www.w3.org/TR/hcls-dataset/
>
> On 14 Mar 2017, at 14:18, John Erickson <olyerickson@gmail.com> wrote:
>
> John makes a great argument for the second approach. That is how we
> tend to think of it.
>
> As with most DCAT-related questions, start with "DCAT is like 'Dublin
> Core' for datasets." In other words, general purpose, good for
> starters, accommodates refinements...
>
> John
>
> On Tue, Mar 14, 2017 at 9:59 AM, John Walker <john.walker@semaku.com>
> wrote:
> > Hello,
> >
> > Following discussion with colleagues, I would like to ask for opinions on
> > semantics of dcat:Dataset and void:Dataset.
> >
> > We have two points of view.
> >
> > First, the RDF version of a dcat:Dataset is a dcat:distribution of that
> > dataset and is itself a void:Dataset.
> > That could be represented as follows:
> >
> > <my-dataset> a dcat:Dataset ;
> >   dcat:distribution <my-rdf-dataset> ;
> >   .
> > <my-rdf-dataset> a dcat:Distribution , void:Dataset ;
> >   void:sparqlEndpoint <sparql> ;
> >   void:dataDump <my-dataset.rdf>, <my-dataset.ttl> ;
> >   .
> >
> > Secondly that a dcat:Dataset that is available as RDF (and possibly other
> > forms) is also a void:Dataset.
> > Or to put it another way: void:Dataset rdfs:subClassOf dcat:Dataset.
> > That could be represented as follows:
> >
> > <my-dataset> a dcat:Dataset, void:Dataset ;
> >   dcat:distribution <my-sparql-distribution>, <my-rdfxml-distribution>,
> >     <my-turtle-distribution>;
> >   void:sparqlEndpoint <sparql> ;
> >   void:dataDump <my-dataset.rdf>, <my-dataset.ttl> ;
> >   .
> > <my-sparql-distribution> a dcat:Distribution ;
> >   dcat:accessURL <sparql> ;
> >   .
> > <my-rdfxml-distribution> a dcat:Distribution ;
> >   dcat:downloadURL <my-dataset.rdf> ;
> >   dcat:mediaType "application/rdf+xml"
> >   .
> > <my-turtle-distribution> a dcat:Distribution ;
> >   dcat:downloadURL <my-dataset.ttl> ;
> >   dcat:mediaType "text/turtle"
> >   .
> >
> > I’m trying to keep an open mind, but leaning towards the second method as
> > thinking of the SPARQL endpoint, dumps and crawlable linked data (plus other
> > forms such as an API or WFS endpoint) as different distributions of the same
> > dataset seems to fit better with the spirit of DCAT (at least to my
> > interpretation of the recommendation).
> >
> > Thoughts welcome!
> >
> > Regards,
> > John
>
> --
> John S. Erickson, Ph.D.
> Director of Operations, The Rensselaer IDEA
> Deputy Director, Web Science Research Center (RPI)
> <http://idea.rpi.edu/> <olyerickson@gmail.com>
> Twitter & Skype: olyerickson
>
> Alasdair J G Gray
> Fellow of the Higher Education Academy
> Assistant Professor in Computer Science,
> School of Mathematical and Computer Sciences
> (Athena SWAN Bronze Award)
> Heriot-Watt University, Edinburgh UK.
>
> Email: A.J.G.Gray@hw.ac.uk
> Web: http://www.macs.hw.ac.uk/~ajg33
> ORCID: http://orcid.org/0000-0002-5711-4872
> Office: Earl Mountbatten Building 1.39
> Twitter: @gray_alasdair
Received on Wednesday, 15 March 2017 10:22:24 UTC
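
The modelling described in the message above (dataid:Dataset as a subclass of both dcat:Dataset and void:Dataset, with VOID statistics and void:subset attached to the dataset and serialization details left to the distribution) could be sketched roughly as follows. This is only an illustration under stated assumptions: the dataid: namespace IRI is assumed from the DataID documentation URL, the ex: resources and the triple count are hypothetical, and the snippet is not copied from the DataID ontology itself.

```turtle
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix void:   <http://rdfs.org/ns/void#> .
# Assumed namespace IRI, derived from the documentation link in the message.
@prefix dataid: <http://dataid.dbpedia.org/ns/core#> .
# Hypothetical example namespace.
@prefix ex:     <http://example.org/> .

# Schema-level statement described in the message.
dataid:Dataset rdfs:subClassOf dcat:Dataset , void:Dataset .

# Hypothetical dataset: VOID statistics and the subset pointer
# are stated on the dataset, not on a distribution.
ex:my-dataset a dataid:Dataset ;
    void:triples 1000 ;
    void:subset ex:my-sub-dataset ;
    dcat:distribution ex:my-turtle-distribution .

ex:my-sub-dataset a dataid:Dataset .

# The distribution carries only the serialization-specific details.
ex:my-turtle-distribution a dcat:Distribution ;
    dcat:downloadURL <http://example.org/my-dataset.ttl> ;
    dcat:mediaType "text/turtle" .
```

Under this sketch, void:subset links two datasets, so no notion of a 'sub-distribution' is needed.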