Re: Relationship of dcat:Dataset and void:Dataset from Markus Freudenberg on 2017-03-15 (public-lod@w3.org from March 2017)

From: Markus Freudenberg <markus.freudenberg@gmail.com>
Date: Wed, 15 Mar 2017 11:21:49 +0100
To: "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>
Cc: John Erickson <olyerickson@gmail.com>, John Walker <john.walker@semaku.com>, "public-lod@w3.org" <public-lod@w3.org>, "public-dwbp-wg@w3.org" <public-dwbp-wg@w3.org>
Message-ID: <CALoNf0WHV-KphqT90=ZGtcAS_bdoG9V8QrbXRR_887H5UieD0w@mail.gmail.com>
We had a very similar discussion about how to marry DCAT with VOID (and
what to do with void:Dataset) for DataID
<http://dataid.dbpedia.org/ns/core.html>.

In the end, we decided to define dataid:Dataset as a subclass of both
dcat:Dataset and void:Dataset, for the following reasons:

1. Their similar definitions:

    void:Dataset "[...] we think of a dataset as a meaningful collection of
triples, that deal with a certain topic, originate from a certain source or
process, are hosted on a certain server, or are aggregated by a certain
custodian." [1]
    dcat:Dataset "[...] collection of data, published or curated by a
single agent, and available for access or download in one or more
formats." [2]

It appears that everything stated about a dcat:Dataset also holds for a
void:Dataset (including the possibility of different formats).

2. The similarities between dcat:CatalogRecord and void:DatasetDescription:

Both provide some form of metadata about a dataset. Both use
foaf:topic / foaf:primaryTopic to point out the (Dataset) entity of interest.
When combining DCAT and VOID using the first option (a void:Dataset typed
as a dcat:Distribution), a dcat:CatalogRecord would reference a
dcat:Dataset, while a void:DatasetDescription would reference a
dcat:Distribution.

3. void:subset

This property points out a subset of a void:Dataset. If a void:Dataset is
also considered a dcat:Distribution, one would have to deal with the notion
of 'sub-distributions', which is a point of contention (as far as I remember
the discussion at SDSVoc). We prefer to use this property in DataID to
provide the missing hierarchical pointers between datasets.

4. The definition of dcat:Distribution

    dcat:Distribution: "Represents a specific available form of a dataset."

The definition of a void:Dataset is different, since it only narrows the
available formats of a dataset to RDF, not to a specific serialization.
Also, no VOID property offers further clarification on the 'specific
available form' of the dataset.

VOID properties such as:
classes <http://vocab.deri.ie/void#classes> | distinctObjects
<http://vocab.deri.ie/void#distinctObjects> | distinctSubjects
<http://vocab.deri.ie/void#distinctSubjects> | documents
<http://vocab.deri.ie/void#documents> | entities
<http://vocab.deri.ie/void#entities> | properties
<http://vocab.deri.ie/void#properties> | property
<http://vocab.deri.ie/void#property> | propertyPartition
<http://vocab.deri.ie/void#propertyPartition> | triples
<http://vocab.deri.ie/void#triples> | vocabulary
<http://vocab.deri.ie/void#vocabulary> etc.
are all characteristics of a dataset and not just a single distribution, in
my understanding.
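
As a rough sketch of point 4 (an illustration only, not taken from any
published dataset): the VOID statistics sit on the dataset, while the
serialization-specific details sit on the distribution. The example.org base
IRI and the counts are made-up placeholders; the resource names follow John's
examples quoted below.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix void: <http://rdfs.org/ns/void#> .
@base <http://example.org/> .   # hypothetical base IRI

# Statistics describe the dataset as a whole, whatever its serialization.
<my-dataset> a dcat:Dataset, void:Dataset ;
    void:triples 1000 ;              # placeholder count
    void:distinctSubjects 200 ;      # placeholder count
    void:vocabulary <http://xmlns.com/foaf/0.1/> ;
    dcat:distribution <my-turtle-distribution> .

# Only the distribution says which concrete format is on offer.
<my-turtle-distribution> a dcat:Distribution ;
    dcat:downloadURL <my-dataset.ttl> ;
    dcat:mediaType "text/turtle" .
```

A second distribution (say RDF/XML) would share the same statistics, which is
why repeating them per distribution seems redundant to me.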

These were our main reasons to combine dcat:Dataset and void:Dataset into
dataid:Dataset.
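
A minimal sketch of what this amounts to, assuming the DataID core namespace
is <http://dataid.dbpedia.org/ns/core#> (please check the ontology document
linked above for the exact IRIs and axioms). The example.org resources are
hypothetical, and the second one only illustrates the void:subset usage from
point 3.

```turtle
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix void:   <http://rdfs.org/ns/void#> .
@prefix dataid: <http://dataid.dbpedia.org/ns/core#> .   # assumed namespace IRI
@base <http://example.org/> .                            # hypothetical base IRI

# dataid:Dataset specializes both classes.
dataid:Dataset rdfs:subClassOf dcat:Dataset , void:Dataset .

# void:subset links a parent dataset to a child dataset, not to a
# 'sub-distribution'.
<my-dataset> a dataid:Dataset ;
    void:subset <my-dataset-en> .

<my-dataset-en> a dataid:Dataset .
```

Under RDFS entailment, both resources would then also be typed as
dcat:Dataset and void:Dataset.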

Markus Freudenberg

Release Manager, DBpedia <http://wiki.dbpedia.org>

On Tue, Mar 14, 2017 at 5:10 PM, Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk>
wrote:

> When we were considering this in the Health Care and Life Sciences
> Community Profile [1] we took the view that the RDF representation was one
> of several possible distributions for a dataset and that it would be
> incorrect to associate that distribution information with the notion of the
> dataset itself. That is, we took the first approach proposed by John.
>
> We specifically did this as not all HCLS datasets are made available in
> RDF and we did not want to make incorrect inferences.
>
> Best regards,
>
> Alasdair
>
> [1] https://www.w3.org/TR/hcls-dataset/
>
> On 14 Mar 2017, at 14:18, John Erickson <olyerickson@gmail.com> wrote:
>
> John makes a great argument for the second approach. That is how we
> tend to think of it.
>
> As with most DCAT-related questions, start with "DCAT is like 'Dublin
> Core' for datasets." In other words, general purpose, good for
> starters, accommodates refinements...
>
> John
>
> On Tue, Mar 14, 2017 at 9:59 AM, John Walker <john.walker@semaku.com>
> wrote:
>
> Hello,
>
>
>
> Following discussion with colleagues, I would like to ask for opinions on
> semantics of dcat:Dataset and void:Dataset.
>
>
>
> We have two points of view.
>
>
>
> First, the RDF version of a dcat:Dataset is a dcat:distribution of that
> dataset and is itself a void:Dataset.
>
> That could be represented as follows:
>
>
>
> <my-dataset> a dcat:Dataset ;
>  dcat:distribution <my-rdf-dataset> ;
>  .
>
> <my-rdf-dataset> a dcat:Distribution , void:Dataset ;
>  void:sparqlEndpoint <sparql> ;
>  void:dataDump <my-dataset.rdf>, <my-dataset.ttl> ;
>  .
>
>
>
> Secondly that a dcat:Dataset that is available as RDF (and possibly other
> forms) is also a void:Dataset.
>
> Or to put it another way: void:Dataset rdfs:subClassOf dcat:Dataset.
>
> That could be represented as follows:
>
>
>
> <my-dataset> a dcat:Dataset, void:Dataset ;
>  dcat:distribution <my-sparql-distribution>, <my-rdfxml-distribution>,
>   <my-turtle-distribution> ;
>  void:sparqlEndpoint <sparql> ;
>  void:dataDump <my-dataset.rdf>, <my-dataset.ttl> ;
>  .
>
> <my-sparql-distribution> a dcat:Distribution ;
>  dcat:accessURL <sparql> ;
>  .
>
> <my-rdfxml-distribution> a dcat:Distribution ;
>  dcat:downloadURL <my-dataset.rdf> ;
>  dcat:mediaType "application/rdf+xml"
>  .
>
> <my-turtle-distribution> a dcat:Distribution ;
>  dcat:downloadURL <my-dataset.ttl> ;
>  dcat:mediaType "text/turtle"
>  .
>
>
>
> I’m trying to keep an open mind, but leaning towards the second method as
> thinking of the SPARQL endpoint, dumps and crawlable linked data (plus
> other forms such as an API or WFS endpoint) as different distributions of
> the same dataset seems to fit better with the spirit of DCAT (at least to
> my interpretation of the recommendation).
>
>
>
> Thoughts welcome!
>
>
>
> Regards,
>
> John
>
>
>
>
> --
> John S. Erickson, Ph.D.
> Director of Operations, The Rensselaer IDEA
> Deputy Director, Web Science Research Center (RPI)
> <http://idea.rpi.edu/> <olyerickson@gmail.com>
> Twitter & Skype: olyerickson
>
>
> Alasdair J G Gray
> Fellow of the Higher Education Academy
> Assistant Professor in Computer Science,
> School of Mathematical and Computer Sciences
> (Athena SWAN Bronze Award)
> Heriot-Watt University, Edinburgh UK.
>
> Email: A.J.G.Gray@hw.ac.uk
> Web: http://www.macs.hw.ac.uk/~ajg33
> ORCID: http://orcid.org/0000-0002-5711-4872
> Office: Earl Mountbatten Building 1.39
> Twitter: @gray_alasdair
>
Received on Wednesday, 15 March 2017 10:22:24 UTC