Re: [dxwg] How to express distributions provided as compressed files

From: Jakub Klímek via GitHub <sysbot+gh@w3.org>
Date: Wed, 27 Jun 2018 07:34:44 +0000
To: public-dxwg-wg@w3.org
Message-ID: <issue_comment.created-400572793-1530084883-sysbot+gh@w3.org>
>To avoid problems, it might be better not to use existing properties dct:format and dcat:mediaType but to create new properties like dcat:containedFormat and dcat:containedMediaType so that there is no confusion with how people have been using dct:format and dcat:mediaType.

@makxdekkers I see your point about backward compatibility. The downside is that the actual data representation format (e.g. CSV, XML, JSON, RDF) would then be attached using different properties depending on whether the distribution is compressed/packaged or uncompressed/unpackaged, like this:

Uncompressed:
```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<https://data.gov.cz/zdroj/datová-sada/247025684/22> a dcat:Distribution ;
    dcat:accessURL <https://mvcr1.opendata.cz/czechpoint/2007.csv> ;
    dcat:downloadURL <https://mvcr1.opendata.cz/czechpoint/2007.csv> ;
    dct:conformsTo <https://mvcr1.opendata.cz/czechpoint/2007.json> ;
    dct:license <https://data.gov.cz/podmínky-užití/volný-přístup/> ;

    dct:format <http://publications.europa.eu/resource/authority/file-type/CSV> ;
    dcat:mediaType <http://www.iana.org/assignments/media-types/text/csv> .
```

Compressed:
```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct: <http://purl.org/dc/terms/> .

<https://data.gov.cz/zdroj/datová-sada/247025684/22> a dcat:Distribution ;
    dcat:accessURL <https://mvcr1.opendata.cz/czechpoint/2007.csv.gz> ;
    dcat:downloadURL <https://mvcr1.opendata.cz/czechpoint/2007.csv.gz> ;
    dct:conformsTo <https://mvcr1.opendata.cz/czechpoint/2007.json> ;
    dct:license <https://data.gov.cz/podmínky-užití/volný-přístup/> ;

    dcat:containedFormat <http://publications.europa.eu/resource/authority/file-type/CSV> ;
    dcat:containedMediaType <http://www.iana.org/assignments/media-types/text/csv> ;

    dcat:packageFormat <http://publications.europa.eu/resource/authority/file-type/TAR> ;
    # for TAR there is no media type, but e.g. for ZIP there is:
    # dcat:packageMediaType <http://www.iana.org/assignments/media-types/application/zip> ;
    dcat:compressionMediaType <http://www.iana.org/assignments/media-types/application/gzip> ;
    dcat:compressionFormat <http://publications.europa.eu/resource/authority/file-type/GZIP> .
```
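
To make the downside concrete, here is a rough sketch of what a consumer would have to do to find the actual data format under this split. It is only an illustration: the `data_format` function and the plain-dict representation of a distribution are hypothetical, and `dcat:containedFormat` and `dcat:compressionFormat` are just the properties proposed in this thread, not part of any published vocabulary.

```python
# Property IRIs. dct:format exists in DCAT 2014; the two dcat: properties
# below are only the ones proposed in this thread (assumption, not standard).
DCT_FORMAT = "http://purl.org/dc/terms/format"
DCAT_CONTAINED_FORMAT = "http://www.w3.org/ns/dcat#containedFormat"
DCAT_COMPRESSION_FORMAT = "http://www.w3.org/ns/dcat#compressionFormat"

FILE_TYPE = "http://publications.europa.eu/resource/authority/file-type/"


def data_format(distribution):
    """Return the IRI of the actual data format (e.g. CSV), or None.

    `distribution` is a plain dict mapping property IRIs to value IRIs,
    a hypothetical stand-in for a parsed dcat:Distribution.
    """
    # Compressed/packaged case: the data format sits in the proposed property.
    if DCAT_CONTAINED_FORMAT in distribution:
        return distribution[DCAT_CONTAINED_FORMAT]
    # Uncompressed case: the data format sits in dct:format.
    return distribution.get(DCT_FORMAT)


uncompressed = {DCT_FORMAT: FILE_TYPE + "CSV"}
compressed = {
    DCAT_CONTAINED_FORMAT: FILE_TYPE + "CSV",
    DCAT_COMPRESSION_FORMAT: FILE_TYPE + "GZIP",
}

# Both distributions carry CSV data, but it is found via different properties.
print(data_format(uncompressed) == data_format(compressed))
```

So every consumer that wants to answer "is this CSV?" needs the fallback logic above, which is the price of keeping `dct:format` and `dcat:mediaType` untouched.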

To be honest, I am not sure how publishers actually handled this under DCAT 2014, i.e. whether they specified `dct:format` as CSV or as GZIP when describing compressed files. I think both approaches were used, which causes confusion: CSV makes more sense because it is more descriptive of the data, while GZIP better describes the actual file published on the web. The original DCAT 2014 definitions did not provide any guidance on this:
- `dcat:mediaType`: `The media type of the distribution as defined by IANA.`
- `dct:format`: `The file format of the distribution.`

-- 
GitHub Notification of comment by jakubklimek
Please view or discuss this issue at https://github.com/w3c/dxwg/issues/259#issuecomment-400572793 using your GitHub account
Received on Wednesday, 27 June 2018 07:34:47 UTC
