RE: Relationship of dcat:Dataset and void:Dataset

Looks like FRBR to me :-) 

-----Original Message-----
From: Dave Reynolds [mailto:dave.e.reynolds@gmail.com] 
Sent: Wednesday, 15 March, 2017 22:29
To: public-lod@w3.org
Subject: Re: Relationship of dcat:Dataset and void:Dataset

For what it's worth, personally I agree with this analysis.

dcat:Dataset is best regarded as an abstract thing which then gets represented/expressed as an RDF graph or a set of RDBMS tables or whatever. Each of which can then be distributed/manifested in multiple different ways.

Hence, for greatest cleanliness, having something between a dcat:Dataset and the dcat:Distribution could make sense.

However, in practice it's likely to be a complication too far for most uses.

Fundamentally, the notion of a "dataset" itself doesn't really work in a linked data world. Datasets have boundaries: what's in or out of the set. The point of linked data is to break down those boundaries.

Dave

On 15/03/17 10:59, Pano Maria wrote:
> I am one of John Walker's colleagues, and as John says we've been 
> having some interesting discussions on this topic. I'm partial to the 
> first option he presents, as our situation is similar to the situation 
> that Alasdair described.
>
> As an example:
>
> We have a collection of data pertaining to the addresses and buildings
> in the Netherlands that is distributed in many different ways: WFS,
> WMS, GML data dump, etc. Our linked data version of this collection of
> data is actually created by transforming one of these sources into
> linked data and then exposing it via a SPARQL endpoint and REST APIs.
>
> In my view, a dcat:Dataset is the *abstract* representation of some
> collection of data. That is, I can say stuff about this dataset in an
> abstract sense, like who the curator is, what the accrual periodicity
> is, what the spatial extent is, when it was last updated, etc.,
> without this collection of data having to have any specific concrete
> form. This also fits well with the situation that Alasdair describes,
> and the above example.
>
> In my opinion, one resource should therefore not be an instance of both
> dcat:Dataset and void:Dataset: given the definition of void:Dataset,
> "A dataset is a set of RDF triples that are published, maintained or
> aggregated by a single provider" [1], a void:Dataset is clearly not an
> abstract collection of data.
>
> Now, where I agree it all becomes a bit muddy is when you think of a
> set of RDF triples, i.e. a void:Dataset, being distributable in
> several different ways (SPARQL endpoint, data dump, LDP API, etc.). What
> does that make our void:Dataset? Note that the properties of the
> abstract dataset are still relevant to describe all these different
> forms of the collection of data.
>
> So, maybe what we are missing is a way to distinguish an expression of
> an (abstract) collection of data from the distribution of that
> expression, analogous to the `work -> expression -> manifestation ->
> item` chain that the FRBR model [2] uses. That would lead to a
> dcat:Dataset representing the abstract dataset, one or more expressions
> of this dataset, e.g. an RDF expression and thus a void:Dataset, and
> dcat:Distributions of those expressions.
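>
> A minimal Turtle sketch of this layering, reusing the identifiers from
> John's examples. Neither DCAT nor VoID defines a property linking an
> abstract dataset to its expressions, so ex:hasExpression and the ex:
> namespace below are hypothetical placeholders:
>
> ```turtle
> @prefix dcat: <http://www.w3.org/ns/dcat#> .
> @prefix void: <http://rdfs.org/ns/void#> .
> @prefix ex:   <http://example.org/ns#> .   # hypothetical namespace
>
> # "Work": the abstract dataset (curator, accrual periodicity, spatial extent, ...)
> <my-dataset> a dcat:Dataset ;
>     ex:hasExpression <my-rdf-expression> .  # hypothetical property
>
> # "Expression": the RDF form of the dataset, i.e. a set of triples
> <my-rdf-expression> a void:Dataset ;
>     dcat:distribution <my-sparql-distribution> , <my-turtle-distribution> .
>
> # "Manifestations": concrete distributions of the RDF expression
> <my-sparql-distribution> a dcat:Distribution ;
>     dcat:accessURL <sparql> .
>
> <my-turtle-distribution> a dcat:Distribution ;
>     dcat:downloadURL <my-dataset.ttl> ;
>     dcat:mediaType "text/turtle" .
> ```
>
> One caveat with this sketch: DCAT declares dcat:Dataset as the domain
> of dcat:distribution, so an RDFS reasoner would also type
> <my-rdf-expression> as a dcat:Dataset, which is exactly the overlap
> between the two classes discussed above.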
>
> The downside is that it becomes quite philosophical...
>
> As things currently stand, I'm still inclined to consider a void:Dataset
> a better match for dcat:Distribution than for dcat:Dataset, because
> of the need to use dcat:Dataset in an expression-independent way.
>
> Kind regards,
>
> Pano Maria
>
> [1] https://www.w3.org/TR/void/
>
> [2] http://www.sparontologies.net/ontologies/fabio
>
> *From:* Markus Freudenberg [mailto:markus.freudenberg@gmail.com]
> *Sent:* Wednesday, 15 March 2017 11:22
> *To:* Gray, Alasdair J G
> *CC:* John Erickson; John Walker; public-lod@w3.org;
> public-dwbp-wg@w3.org
> *Subject:* Re: Relationship of dcat:Dataset and void:Dataset
>
> We had a very similar discussion about how to marry DCAT with VOID 
> (and what to do with void:Dataset) for DataID 
> <http://dataid.dbpedia.org/ns/core.html>.
>
> In the end, we decided to define dataid:Dataset as a subclass of both
> dcat:Dataset and void:Dataset, for the following reasons:
>
> 1. Their similar definitions:
>
>     void:Dataset "[...] we think of a dataset as a meaningful
> collection of triples, that deal with a certain topic, originate from
> a certain source or process, are hosted on a certain server, or are
> aggregated by a certain custodian." [1]
>
>     dcat:Dataset "[...] collection of data, published or curated by a 
> single agent, and available for access or download in one or more 
> formats." [2]
>
> It appears that everything stated about a dcat:Dataset is also true of
> a void:Dataset (including the possibility of different formats).
>
> 2. The similarities between dcat:CatalogRecord and void:DatasetDescription:
>
> Both provide some form of metadata about a dataset. Both use
> foaf:topic / foaf:primaryTopic to point out the (Dataset) entity of interest.
>
> When combining DCAT and VOID using the first option, a 
> dcat:CatalogRecord would reference a dcat:Dataset, while a 
> void:DatasetDescription would reference a dcat:Distribution.
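>
> For illustration, a sketch of that split under the first option; the
> record and description IRIs are hypothetical, the others are taken
> from John's example:
>
> ```turtle
> @prefix dcat: <http://www.w3.org/ns/dcat#> .
> @prefix void: <http://rdfs.org/ns/void#> .
> @prefix foaf: <http://xmlns.com/foaf/0.1/> .
>
> # DCAT catalog record: its primary topic is the abstract dataset
> <my-catalog-record> a dcat:CatalogRecord ;
>     foaf:primaryTopic <my-dataset> .
>
> # VOID description: its primary topic is the RDF distribution, which
> # under the first option is both a dcat:Distribution and a void:Dataset
> <my-void-description> a void:DatasetDescription ;
>     foaf:primaryTopic <my-rdf-dataset> .
>
> <my-dataset> a dcat:Dataset ;
>     dcat:distribution <my-rdf-dataset> .
>
> <my-rdf-dataset> a dcat:Distribution , void:Dataset .
> ```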
>
> 3. void:subset
>
> Points out a subset of a void:Dataset. If a void:Dataset is also
> considered a dcat:Distribution, one would have to deal with the notion
> of 'sub-distributions', which is a point of contention (as far as I
> remember from the discussion at SDSVoc). Instead, we use this property
> in DataID to provide the
> missing hierarchical pointers between datasets.
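>
> A sketch of that hierarchical use of void:subset. The dataset IRIs are
> hypothetical, and the dataid: namespace IRI is an assumption (see the
> DataID documentation linked above):
>
> ```turtle
> @prefix void:   <http://rdfs.org/ns/void#> .
> @prefix dataid: <http://dataid.dbpedia.org/ns/core#> .  # assumed namespace IRI
>
> # Parent dataset pointing to its child datasets
> <my-parent-dataset> a dataid:Dataset ;
>     void:subset <my-subset-1> , <my-subset-2> .
>
> <my-subset-1> a dataid:Dataset .
> <my-subset-2> a dataid:Dataset .
> ```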
>
> 4. The definition of dcat:Distribution
>
>     dcat:Distribution: "Represents a specific available form of a dataset."
>
> The definition of a void:Dataset is different, since it only narrows
> the available formats of a dataset to RDF, not to a specific serialization.
> Also, no VOID property offers further clarification on the
> 'specific available form' of the dataset.
>
> VOID properties like:
>
> void:classes | void:distinctObjects | void:distinctSubjects |
> void:documents | void:entities | void:properties | void:property |
> void:propertyPartition | void:triples | void:vocabulary etc.
>
> are, in my understanding, all characteristics of a dataset and not of
> a single distribution.
>
> These were our main reasons to combine dcat:Dataset and void:Dataset 
> into dataid:Dataset.
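>
> Written out in Turtle, that combination amounts to roughly the
> following axiom (a sketch; the dataid: namespace IRI is assumed, so
> consult the DataID ontology itself for the authoritative definition):
>
> ```turtle
> @prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
> @prefix owl:    <http://www.w3.org/2002/07/owl#> .
> @prefix dcat:   <http://www.w3.org/ns/dcat#> .
> @prefix void:   <http://rdfs.org/ns/void#> .
> @prefix dataid: <http://dataid.dbpedia.org/ns/core#> .  # assumed namespace IRI
>
> # Every dataid:Dataset is both a dcat:Dataset and a void:Dataset, so DCAT
> # metadata and VOID statistics can be attached to the same resource.
> dataid:Dataset a owl:Class ;
>     rdfs:subClassOf dcat:Dataset , void:Dataset .
> ```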
>
> Markus Freudenberg
>
> Release Manager, DBpedia <http://wiki.dbpedia.org>
>
> On Tue, Mar 14, 2017 at 5:10 PM, Gray, Alasdair J G 
> <A.J.G.Gray@hw.ac.uk <mailto:A.J.G.Gray@hw.ac.uk>> wrote:
>
>     When we were considering this in the Health Care and Life Sciences
>     Community Profile [1], we took the view that the RDF representation
>     was one of several possible distributions for a dataset and that it
>     would be incorrect to associate that distribution information with
>     the notion of the dataset itself. That is, we took the first
>     approach proposed by John.
>
>     We did this specifically because not all HCLS datasets are made
>     available in RDF, and we wanted to avoid incorrect inferences.
>
>     Best regards,
>
>     Alasdair
>
>     [1] https://www.w3.org/TR/hcls-dataset/
>
>         On 14 Mar 2017, at 14:18, John Erickson <olyerickson@gmail.com
>         <mailto:olyerickson@gmail.com>> wrote:
>
>         John makes a great argument for the second approach. That is how we
>         tend to think of it.
>
>         As with most DCAT-related questions, start with "DCAT is like
>         'Dublin Core' for datasets." In other words, general purpose,
>         good for starters, accommodates refinements...
>
>         John
>
>         On Tue, Mar 14, 2017 at 9:59 AM, John Walker
>         <john.walker@semaku.com <mailto:john.walker@semaku.com>> wrote:
>
>             Hello,
>
>             Following discussion with colleagues, I would like to ask for
>             opinions on the semantics of dcat:Dataset and void:Dataset.
>
>             We have two points of view.
>
>             First, that the RDF version of a dcat:Dataset is a
>             dcat:Distribution of that dataset and is itself a
>             void:Dataset.
>
>             That could be represented as follows:
>
>             @prefix dcat: <http://www.w3.org/ns/dcat#> .
>             @prefix void: <http://rdfs.org/ns/void#> .
>
>             <my-dataset> a dcat:Dataset ;
>                 dcat:distribution <my-rdf-dataset> .
>
>             <my-rdf-dataset> a dcat:Distribution , void:Dataset ;
>                 void:sparqlEndpoint <sparql> ;
>                 void:dataDump <my-dataset.rdf> , <my-dataset.ttl> .
>
>             Second, that a dcat:Dataset that is available as RDF (and
>             possibly other forms) is also a void:Dataset.
>
>             Or, to put it another way: void:Dataset rdfs:subClassOf
>             dcat:Dataset.
>
>             That could be represented as follows:
>
>             @prefix dcat: <http://www.w3.org/ns/dcat#> .
>             @prefix void: <http://rdfs.org/ns/void#> .
>
>             <my-dataset> a dcat:Dataset , void:Dataset ;
>                 dcat:distribution <my-sparql-distribution> ,
>                     <my-rdfxml-distribution> ,
>                     <my-turtle-distribution> ;
>                 void:sparqlEndpoint <sparql> ;
>                 void:dataDump <my-dataset.rdf> , <my-dataset.ttl> .
>
>             <my-sparql-distribution> a dcat:Distribution ;
>                 dcat:accessURL <sparql> .
>
>             <my-rdfxml-distribution> a dcat:Distribution ;
>                 dcat:downloadURL <my-dataset.rdf> ;
>                 dcat:mediaType "application/rdf+xml" .
>
>             <my-turtle-distribution> a dcat:Distribution ;
>                 dcat:downloadURL <my-dataset.ttl> ;
>                 dcat:mediaType "text/turtle" .
>
>             I’m trying to keep an open mind, but I’m leaning towards the
>             second method, as thinking of the SPARQL endpoint, dumps and
>             crawlable linked data (plus other forms such as an API or WFS
>             endpoint) as different distributions of the same dataset seems
>             to fit better with the spirit of DCAT (at least in my
>             interpretation of the recommendation).
>
>             Thoughts welcome!
>
>             Regards,
>
>             John
>
>         --
>         John S. Erickson, Ph.D.
>         Director of Operations, The Rensselaer IDEA
>         Deputy Director, Web Science Research Center (RPI)
>         <http://idea.rpi.edu/> <olyerickson@gmail.com
>         <mailto:olyerickson@gmail.com>>
>         Twitter & Skype: olyerickson
>
>     Alasdair J G Gray
>
>     Fellow of the Higher Education Academy
>     Assistant Professor in Computer Science,
>     School of Mathematical and Computer Sciences
>     (Athena SWAN Bronze Award)
>     Heriot-Watt University, Edinburgh UK.
>
>     Email: A.J.G.Gray@hw.ac.uk <mailto:A.J.G.Gray@hw.ac.uk>
>     Web: http://www.macs.hw.ac.uk/~ajg33
>     ORCID: http://orcid.org/0000-0002-5711-4872
>     Office: Earl Mountbatten Building 1.39
>     Twitter: @gray_alasdair

Received on Wednesday, 15 March 2017 21:49:48 UTC