Re: [dxwg] authenticity and integrity of dcat files and associated datasets (#1526) from Riccardo Albertoni via GitHub on 2022-11-21 (public-dxwg-wg@w3.org from November 2022)

From: Riccardo Albertoni via GitHub <sysbot+gh@w3.org>
Date: Mon, 21 Nov 2022 19:52:24 +0000
To: public-dxwg-wg@w3.org
Message-ID: <issue_comment.created-1322568624-1669060342-sysbot+gh@w3.org>

> If it's not feasible to provide standardized functionality for authenticity and integrity of DCAT files (or other distributions of the metadata) in the short term, then I think it would be reasonable to:
> 
> 1. add a warning about the security implications of checksum properties when the metadata's authenticity has not been confirmed; and
> 2. list some ways to access DCAT metadata in an authenticated, secure way (downloaded over HTTPS from the expected origin, for example); and
> 3. mark it as an issue for a future version.
> 
> Postponing features has to happen sometimes. But I would strongly recommend that there be a plan to address this in the future, rather than just postponing it as a way to avoid dealing with it. 
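Point 2 above mentions retrieving DCAT metadata over HTTPS from the expected origin. The sketch below shows the minimal URL check a consumer might run before trusting a downloaded DCAT file. It assumes the consumer already knows which host should publish the catalogue; the function name and the example host `data.example.org` are hypothetical. It does not verify TLS certificates itself, which remains the job of the HTTP client.

```python
from urllib.parse import urlsplit


def is_expected_origin(url: str, expected_host: str) -> bool:
    """Simplified same-origin check (hypothetical helper): accept `url` only if it
    uses HTTPS, points at `expected_host`, and uses the default HTTPS port."""
    parts = urlsplit(url)  # urlsplit lowercases the scheme; .hostname is lowercased
    try:
        port = parts.port
    except ValueError:  # malformed port such as "https://host:abc/"
        return False
    return (
        parts.scheme == "https"
        and parts.hostname == expected_host.lower()
        and port in (None, 443)
    )
```

For example, `is_expected_origin("https://data.example.org/catalog.ttl", "data.example.org")` is true, while a plain `http://` URL or a look-alike host such as `data.example.org.evil.test` is rejected.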

Thanks, Nick, for your suggestions. We have included them in the Security and Privacy section; see the second paragraph of https://w3c.github.io/dxwg/dcat/#security_and_privacy

Please feel free to suggest improvements to the draft. 
 
If you can live with the current draft, we will move this issue to the backlog for further consideration in the next standardization round of DCAT (e.g., DCAT 4).


> I'm not convinced that it's wholly out of scope. One of the only features being added to this version is a checksum property, which is apparently intended to provide security protections, but doesn't provide the expected security protections if there's no way to provide integrity or authenticity of the DCAT metadata.

We have acknowledged that in the new paragraph.
> 
> I'm not sure if the checksum property is fully defined enough that it can be generally interoperably used (is there implementation experience?), but that property assumes that there already exists a canonical way to refer to a distribution, if not a dataset.
> 

This solution has been adopted by DCAT-AP 2.1.0. The range of the checksum property is the <code>spdx:Checksum</code> class, which specifies the actual <code>spdx:checksumValue</code> and the <code>spdx:algorithm</code> used to produce it.
A DCAT distribution may be in many formats other than RDF. For RDF, the [RDF Dataset Canonicalization and Hash](https://www.w3.org/groups/wg/rch) Working Group is addressing this problem, and we prefer to wait for its outcomes before recommending anything in that direction.
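To illustrate how a consumer might use the <code>spdx:algorithm</code> and <code>spdx:checksumValue</code> pair described above, here is a minimal sketch that checks downloaded distribution bytes against a declared checksum. It assumes the algorithm is identified by SPDX IRIs of the form `http://spdx.org/rdf/terms#checksumAlgorithm_sha256` and that the value is a hex string; the mapping table and function name are hypothetical. As discussed in this thread, a match only shows that the bytes agree with the metadata; it says nothing about whether the metadata itself is authentic.

```python
import hashlib
import hmac

SPDX = "http://spdx.org/rdf/terms#"

# Assumed mapping from SPDX algorithm IRIs to hashlib algorithm names (hypothetical).
ALGORITHMS = {
    SPDX + "checksumAlgorithm_sha1": "sha1",
    SPDX + "checksumAlgorithm_sha256": "sha256",
    SPDX + "checksumAlgorithm_sha512": "sha512",
}


def verify_checksum(data: bytes, algorithm_iri: str, checksum_value: str) -> bool:
    """Return True if `data` hashes to the declared hex `checksum_value`."""
    name = ALGORITHMS.get(algorithm_iri)
    if name is None:
        raise ValueError(f"unsupported checksum algorithm: {algorithm_iri}")
    actual = hashlib.new(name, data).hexdigest().encode("ascii")
    expected = checksum_value.strip().lower().encode("utf-8")
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(actual, expected)
```

A consumer would call `verify_checksum(downloaded_bytes, algorithm_iri, checksum_value)` with the two values read from the distribution's <code>spdx:Checksum</code> node, and reject the file when it returns `False`.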

>Accessing datasets that could be tampered with, or not knowing the provenance or authorship or integrity of a dataset, is a real and significant threat; it affects far more than just the implementers of this spec. I don't think it can be our long-term plan that W3C Recommendations don't provide any mechanism for basic, interoperable security properties and instead rely on the hope that every individual implementation or user will figure out its own way to provide security.


We agree that this is a pervasive, cross-cutting issue that affects every vocabulary the W3C recommends, and this is the main reason why the solution should be common to all vocabularies. The [RDF Dataset Canonicalization and Hash](https://www.w3.org/groups/wg/rch) Working Group will likely provide a foundation on which RDF vocabularies can build. In any case, further input to consider in the next standardization round is more than welcome.




-- 
GitHub Notification of comment by riccardoAlbertoni
Please view or discuss this issue at https://github.com/w3c/dxwg/issues/1526#issuecomment-1322568624 using your GitHub account



Received on Monday, 21 November 2022 19:52:26 UTC