Re: [dxwg] Dcat issue 1526 bis (#1578)

The introductory paragraph is all about privacy, but the mitigation with a checksum is only about validating the integrity and authenticity of the data, not privacy. So we seem to introduce one topic and then go on to discuss how we can mitigate something entirely different.

I don't think rights statements could potentially include or reference sensitive information such as user and asset identifiers. That is, I don't think that any user identifier in the context of stating rights over a dataset would or could be a breach of privacy. (One cannot expect privacy when asserting a public right.) The real issue is how well one can secure data about people. But detailing how to secure web content and authenticate users is beyond the scope of DCAT.

The checksum is for the distribution described, not the metadata about it. But since the metadata is often offered in the same download as the distribution, a checksum carried in that metadata cannot establish authenticity. One must either provide the metadata in a secure manner and separately from the data, or also provide the checksum separately.

typo: "different checksum algorithmS might be deployed"

"It is worth noting that the associated checksum will not provide the expected security protections if ..." The checksum does not provide security, only authenticity/integrity.

DCAT providers should make DCAT distribution files (not just metadata) downloadable from authoritative origins.

I don't think we even need to mention the RDF dataset canonicalization and hash work. The case of a single entity offering distributions really only calls for a hash for each file provided, and an indication of which hash algorithm was used.
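To illustrate how little is needed for the per-file case, here is a minimal sketch in Python. The function name `file_checksum` and the choice of SHA-256 as default are my assumptions for illustration, not anything the spec prescribes; the point is only that the provider publishes the algorithm name alongside the digest.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return (algorithm, hex digest) for the file at `path`.

    `file_checksum` is a hypothetical helper, not part of DCAT or any library.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so that large distribution files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    # The algorithm name is returned with the digest so both can be published together.
    return algorithm, digest.hexdigest()
```

A provider would run this once per distribution file and record both values in the metadata.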

I think Nick's main point is that a checksum should be provided via a route that is separate from the data. It may be included in metadata that is provided with the data, but if so it should also be provided separately to prevent an attacker from manipulating it along with the data. The authenticity of a dataset cannot be assumed if the authenticity of the hash cannot be assured.
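As a rough sketch of what the consumer side of that looks like, assuming the expected digest was obtained over a separate, trusted route (how it is obtained is out of scope here), the check itself is short. `matches_trusted_checksum` is a hypothetical name of mine.

```python
import hashlib
import hmac


def matches_trusted_checksum(data, trusted_hex, algorithm="sha256"):
    """Check downloaded bytes against a digest obtained via a separate route.

    Hypothetical helper: `trusted_hex` must NOT come from the same download
    as `data`, otherwise an attacker could alter both together.
    """
    actual_hex = hashlib.new(algorithm, data).hexdigest()
    # Normalise case and surrounding whitespace of the published value before comparing.
    return hmac.compare_digest(actual_hex, trusted_hex.strip().lower())
```

If the digest travels only inside the same archive as the data, this check still passes for a tampered archive whose digest was rewritten, which is exactly the concern.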

Section 6.17: I also think that we should say something about the use of a checksum as an indicator of an update or download error rather than just for verifying integrity.
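For that second use, a small sketch of how a consumer might read the checksum as a signal rather than a security control. The function name and the three labels are assumptions of mine, purely for illustration.

```python
def classify_checksum(downloaded_hex, published_hex, last_seen_hex=None):
    """Interpret checksum comparisons for a distribution (hypothetical helper).

    downloaded_hex: digest computed over the bytes just downloaded
    published_hex:  digest currently stated by the provider
    last_seen_hex:  digest the provider stated at the previous check, if any
    """
    downloaded = downloaded_hex.strip().lower()
    published = published_hex.strip().lower()
    if downloaded != published:
        # The bytes differ from what was published: a corrupted or incomplete
        # download, or an altered file. The checksum alone cannot say which.
        return "mismatch"
    if last_seen_hex is not None and published != last_seen_hex.strip().lower():
        # The provider now states a different digest: the distribution was updated.
        return "updated"
    return "unchanged"
```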

-- 
GitHub Notification of comment by agreiner
Please view or discuss this issue at https://github.com/w3c/dxwg/pull/1578#issuecomment-1663161262 using your GitHub account


Received on Thursday, 3 August 2023 01:17:29 UTC