[dxwg] Best practice for a loosely-structured catalog from Jakub Klímek via GitHub on 2018-06-11 (public-dxwg-wg@w3.org from June 2018)

From: Jakub Klímek via GitHub <sysbot+gh@w3.org>
Date: Mon, 11 Jun 2018 06:33:38 +0000
To: public-dxwg-wg@w3.org
Message-ID: <issues.opened-331060843-1528698817-sysbot+gh@w3.org>

jakubklimek has just created a new issue for https://github.com/w3c/dxwg:

== Best practice for a loosely-structured catalog ==
@dr-shorthair raised this in the mailing list:
> I’ve been doing some investigations of some local repositories and catalogues, and have uncovered that in many cases ‘datasets’ are ‘just a bag of files’. There is no distinction made between part/whole, distribution (representation), and other kinds of relationship (e.g. documentation, schema, supporting documents). So while the precision we are aiming for in DCAT is clearly valuable in terms of semantics, it is difficult to implement on these legacy systems. Mostly I see people using the Dataset-distribution-> relationship for everything … which is clearly incorrect in many cases. But I doubt if we are unusual in this.
>
> I’m thinking about how to advise on this, while not actually breaking DCAT.
> If we made `dcat:distribution` a sub-property of `dct:relation`:
>
> `dcat:distribution rdfs:subPropertyOf dct:relation .`
> 
> then I think we can have a reasonable recommendation to the simple repositories.
> We could tell repositories that use the ‘just a bag of files’ approach to say

```turtle
 :Dataset987 a dcat:Dataset ;
     dct:relation <file1> , <file2> , <file3> , <file4> , <file5> , <file6> , <file7> … .
```
> which would not be inconsistent with a later reclassification to
```turtle
  :Dataset987 a dcat:Dataset ;
              dct:hasPart <file1> , <file2> ;
              dcat:distribution <file3> , <file4> ;
              dct:conformsTo <file5> ;
              dct:requires <file6> ;
              dct:references <file7> .  
```
> If this is not all mad, I will add a new use-case - something like ‘Mapping from simple repository model’ – as justification, and propose this tiny enhancement.

I had a few concerns regarding this proposal:

1. It is not clear to me from the description what exactly the file* IRIs are. If they are actual downloadable files, i.e. something originally linked using `dcat:downloadURL`, I would object to allowing them to be linked directly from a `dcat:Dataset` record, as this would create a mess wherever a publisher is too lazy to describe the data properly.
2. Would it be possible to get a few more detailed examples of how this would work?
3. In my experience, data publishers misuse `dcat:distribution` mainly because of the lack of support for dataset series, which is being addressed in this DCAT revision. Once this support is added, publishers will be able to model many use cases correctly.
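
As a starting point for the examples requested in point 2, here is a minimal Python sketch of why the refined description would not be inconsistent with the "bag of files" one. It applies the RDFS sub-property rule (rdfs7) to the reclassified triples and checks that every coarse `dct:relation` link is entailed. Some assumptions: prefixed names are treated as plain strings, the `dcat:distribution` axiom is the proposal quoted above rather than something DCAT currently states, the other four axioms are the ones declared in DCMI Metadata Terms, the helper name `rdfs7_closure` is made up for illustration, and the elided files after `<file7>` are ignored.

```python
# Prefixed names are kept as plain strings; no RDF library is used in this sketch.
DCT_RELATION = "dct:relation"

# Sub-property axioms. The dcat:distribution entry is the proposal under
# discussion (an assumption); the other four come from DCMI Metadata Terms.
SUB_PROPERTY_OF = {
    "dcat:distribution": DCT_RELATION,
    "dct:hasPart": DCT_RELATION,
    "dct:conformsTo": DCT_RELATION,
    "dct:requires": DCT_RELATION,
    "dct:references": DCT_RELATION,
}


def rdfs7_closure(triples):
    """Return triples plus (s, q, o) for each (s, p, o) with p subPropertyOf q."""
    inferred = set(triples)
    frontier = list(triples)
    while frontier:
        s, p, o = frontier.pop()
        q = SUB_PROPERTY_OF.get(p)
        if q is not None and (s, q, o) not in inferred:
            inferred.add((s, q, o))
            frontier.append((s, q, o))
    return inferred


# The later, reclassified description from the quoted proposal.
refined = {
    (":Dataset987", "dct:hasPart", "file1"),
    (":Dataset987", "dct:hasPart", "file2"),
    (":Dataset987", "dcat:distribution", "file3"),
    (":Dataset987", "dcat:distribution", "file4"),
    (":Dataset987", "dct:conformsTo", "file5"),
    (":Dataset987", "dct:requires", "file6"),
    (":Dataset987", "dct:references", "file7"),
}

# The initial "bag of files" description, limited to file1..file7.
bag_of_files = {(":Dataset987", DCT_RELATION, f"file{i}") for i in range(1, 8)}

entailed = rdfs7_closure(refined)
print(bag_of_files <= entailed)  # True
```

Without the proposed `dcat:distribution` axiom, the links to `file3` and `file4` would not be entailed, which is the gap the proposal is meant to close. The sketch says nothing about whether the file* IRIs should be linked from a `dcat:Dataset` at all, which is the concern in point 1.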

Please view or discuss this issue at https://github.com/w3c/dxwg/issues/253 using your GitHub account

Received on Monday, 11 June 2018 06:33:42 UTC