Re: [dxwg] DCAT: Proposal for an updated definition for the concept “dataset” (#1195)

For strict semantics, the definition "a dataset is an instance of a data model" is the best I could find to date. If the data model is formally specified, one can verify whether a dataset conforms to it. For example, with RDF we are even in the fortunate position that equivalence of datasets is defined, so for two conforming datasets there is a well-defined procedure to determine equivalence.
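As a minimal sketch of such an equivalence check, assuming both datasets contain only ground triples (no blank nodes), RDF graph equivalence reduces to plain set equality. The triples and the `ex:` names below are made up for illustration; with blank nodes a proper graph-isomorphism check would be needed instead.

```python
# Triples are modelled as (subject, predicate, object) tuples of strings.
# Assumption: no blank nodes, so graph equivalence is just set equality.

def equivalent(graph_a, graph_b):
    """Return True if both collections contain exactly the same ground triples."""
    return set(graph_a) == set(graph_b)


# Hypothetical example data, not taken from any real dataset.
published_by_a = [
    ("ex:s1", "rdf:type", "ex:Thing"),
    ("ex:s1", "rdfs:label", '"one"'),
]

# Same triples, serialized in a different order and with one duplicate line.
published_by_b = [
    ("ex:s1", "rdfs:label", '"one"'),
    ("ex:s1", "rdf:type", "ex:Thing"),
    ("ex:s1", "rdf:type", "ex:Thing"),
]

print(equivalent(published_by_a, published_by_b))
```

Order and repetition in the serialization do not matter here, because a graph is a set of triples.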

The problem right now, at least to my understanding, is that dcat:Datasets cannot be linked by owl:sameAs, because the identity of a dcat:Dataset includes the authority that publishes it.
So even if the exact same sequence of bytes is published by different authorities, the results can never be the same dcat:Dataset.

A possibly better approach in the future would be to detach the content from the record that describes a dataset as published by some authority. So I suppose I am actually proposing to separate dcat:Dataset from e.g. a future dcat:Content. For example, if publisher A distributes some content (let's assume a set of triples) as a single download URL, and publisher B re-publishes the same content partitioned by RDF predicate, this could be expressed as:

#x: = Namespace for future extensions

datasetXPublishedByA a dcat:Dataset ; dct:publisher A ;
  x:content contentXByA .

datasetXPublishedByB a dcat:Dataset ; dct:publisher B ;
  x:content contentXByB .

contentXByA a x:Content ;
  dcat:distribution [ dcat:downloadURL <everything.ttl> ] .

contentXByB a x:Content ;
  dcat:distribution [
    # Merge all content of the union members according to the data model,
    # and one obtains the distribution as a single downloadURL 
    a x:UnionDistribution ;
    x:qualifiedMember [
      x:partitionPredicate rdf:type ;
      x:dataset [ a dcat:Dataset ; x:content [ dcat:distribution [ dcat:downloadURL <rdf-type.ttl> ] ] ]
    ] ;
    x:qualifiedMember [ x:partitionPredicate rdfs:label ... ]
  ] .


This allows one, for example, to safely state `contentXByA owl:sameAs contentXByB`, because each content denotes a specific instance of the data model, i.e. a specific set of triples.
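To illustrate why the partitioned re-publication denotes the same content, here is a rough sketch, again assuming ground triples represented as tuples. The function names `partition_by_predicate` and `merge` are hypothetical helpers, not part of DCAT or any library: splitting a set of triples by predicate and merging the parts again yields the original set.

```python
from collections import defaultdict


def partition_by_predicate(triples):
    """Group triples into one set per predicate (publisher B's layout)."""
    parts = defaultdict(set)
    for s, p, o in triples:
        parts[p].add((s, p, o))
    return dict(parts)


def merge(parts):
    """Union all partition members back into a single set of triples."""
    merged = set()
    for member in parts.values():
        merged |= member
    return merged


# Hypothetical content as published by A in a single file.
content_x_by_a = {
    ("ex:s1", "rdf:type", "ex:Thing"),
    ("ex:s2", "rdf:type", "ex:Thing"),
    ("ex:s1", "rdfs:label", '"one"'),
}

# Publisher B offers one download per predicate; merging them restores the whole.
parts_by_b = partition_by_predicate(content_x_by_a)
content_x_by_b = merge(parts_by_b)

print(sorted(parts_by_b))
print(content_x_by_b == content_x_by_a)
```

The dataset records stay distinct per publisher, while the comparison is made purely on the merged content.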

GitHub Notification of comment by Aklakan

Received on Thursday, 23 January 2020 20:03:56 UTC