- From: Stian Soiland-Reyes <soiland-reyes@manchester.ac.uk>
- Date: Wed, 12 Feb 2020 14:39:26 +0000
- To: Erich Bremer <erich@ebremer.com>
- Cc: public-rosc@w3.org
On Tue, 11 Feb 2020 15:24:33 -0500, Erich Bremer <erich@ebremer.com> wrote:
> In storing the files in the RO Crate zip file, is there a preference for
> RDF property for representing an MD5 or SHA-512 hash of the file that is
> being stored in the RO crate zip file? There didn't seem to be one at
> schema.org but I did find the following:
> http://id.loc.gov/vocabulary/preservation/cryptographicHashFunctions.html

We have not listed an RDF property for hashes within the RO-Crate
manifest, as we largely considered that a "transport-level" detail that
is better covered by BagIt or Oxford Common File Layout:

https://w3id.org/ro/crate/1.0#combining-with-other-packaging-schemes

There you should probably use SHA-256 or SHA-512 so that the hash is
cryptographically strong; MD5 and SHA-1 should be avoided where
possible.

I agree that the LOC ontology you link to provides good identifiers for
the hash *functions*, but it does not provide RDF properties for linking
to a particular hash value. I guess you *could* re-purpose URIs like
<http://id.loc.gov/vocabulary/preservation/cryptographicHashFunctions/sha256>
as a property, but then why didn't LOC also declare them as such, given
that they have other vocabularies?

We should not use it blindly as a property without agreeing on what
would be a valid subject and object for its use - e.g. would these
theoretical properties expect the hash value as bytes, as a hex string
(with or without spaces? upper case, lower case or both?), or as a
separate HashValue resource?

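To make the representation question concrete, here is a minimal Python
sketch (an illustration only, not part of RO-Crate or the LOC
vocabulary) showing how one and the same SHA-256 digest can be written
in several incompatible ways. The helper name normalize_hex is
hypothetical, and the chosen normal form (lower case, no whitespace) is
just one possible convention.

```python
import base64
import hashlib

# Content of a file containing only the line "hello"
data = b"hello\n"
digest = hashlib.sha256(data).digest()  # raw bytes (32 of them)

# The same value in different textual forms
hex_lower = digest.hex()
hex_upper = hex_lower.upper()
hex_spaced = " ".join(hex_lower[i:i + 8] for i in range(0, len(hex_lower), 8))
b64url = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def normalize_hex(text: str) -> str:
    """Hypothetical convention: drop whitespace and use lower case."""
    return "".join(text.split()).lower()


for label, value in [("lower", hex_lower), ("upper", hex_upper),
                     ("spaced", hex_spaced), ("base64url", b64url)]:
    print(f"{label:10} {value}")
```

A plain string comparison between any two of these forms fails unless
both sides agree on a convention first, which is why the range of such
a property would need to be pinned down before anyone relies on it.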
One possibility, if you can avoid the insecure MD5 and SHA-1, is to use
RFC 6920 nih: URIs as identifiers https://tools.ietf.org/html/rfc6920
(or the shorter ni: URIs, which use base64 encoding), for instance:

{
  "@context": "https://w3id.org/ro/crate/1.0/context",
  "@graph": [
    {
      "@type": "CreativeWork",
      "@id": "ro-crate-metadata.jsonld",
      "conformsTo": {"@id": "https://w3id.org/ro/crate/1.0"},
      "about": {"@id": "./"},
      "description": "RO-Crate Metadata File Descriptor (this file)"
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "name": "Example RO-Crate",
      "description": "The RO-Crate Root Data Entity",
      "hasPart": [
        {"@id": "data1.txt"}
      ]
    },
    {
      "@id": "data1.txt",
      "@type": "File",
      "description": "One of hopefully many Data Entities",
      "identifier": {
        "@id": "nih:sha-256;5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03"}
    }
  ]
}

An advantage of NI URIs is that they can be rewritten to .well-known
http URIs for retrieval - you can then retrieve the content from any
supporting content-delivery platform, as you can check the hash
afterwards:

https://tools.ietf.org/html/rfc6920#section-4

For RO-Crate, a more elaborate alternative, where you won't need to
parse the URI, is to use a https://schema.org/PropertyValue similar to
https://w3id.org/ro/crate/1.0/#repository-specific-identifiers
but linking to the id.loc identifiers:

{
  "@id": "data1.txt",
  "@type": "File",
  "description": "A file containing only the line: hello",
  "identifier": {
    "@id": "nih:sha-256;5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03"}
},
{
  "@id": "nih:sha-256;5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03",
  "@type": "PropertyValue",
  "name": "sha256",
  "unitText": "hexadecimal",
  "propertyID": {
    "@id": "http://id.loc.gov/vocabulary/preservation/cryptographicHashFunctions/sha256"},
  "value": "5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03"
}

Here I used the more lightweight "propertyID", and "hexadecimal" as
unitText, so it would be more of a convention.

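The identifiers above could be generated along these lines. This is a
rough Python sketch under my own assumptions, not an RO-Crate API: the
function names (nih_uri, ni_uri, well_known_url, matches_nih,
hash_entities) are hypothetical, only sha-256 is handled, the optional
nih check digit is ignored rather than validated, and example.com is a
placeholder authority.

```python
import base64
import hashlib
from urllib.parse import urlsplit

LOC_SHA256 = ("http://id.loc.gov/vocabulary/preservation/"
              "cryptographicHashFunctions/sha256")


def sha256_digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def nih_uri(digest: bytes) -> str:
    """RFC 6920 nih: form, hex encoded, without the optional check digit."""
    return "nih:sha-256;" + digest.hex()


def ni_uri(digest: bytes, authority: str = "") -> str:
    """RFC 6920 ni: form, base64url encoded without '=' padding."""
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"ni://{authority}/sha-256;{value}"


def well_known_url(ni: str, scheme: str = "http") -> str:
    """Rewrite an ni: URI that has an authority to its .well-known URL."""
    parts = urlsplit(ni)
    if parts.scheme != "ni" or not parts.netloc:
        raise ValueError("expected an ni: URI with an authority")
    alg, _, value = parts.path.lstrip("/").partition(";")
    return f"{scheme}://{parts.netloc}/.well-known/ni/{alg}/{value}"


def matches_nih(data: bytes, nih: str) -> bool:
    """Check retrieved content against a sha-256 nih: identifier."""
    if not nih.startswith("nih:sha-256;"):
        raise ValueError("only nih:sha-256 is handled in this sketch")
    # Drop an optional trailing ';checkdigit' and any '-' separators
    hex_value = nih[len("nih:sha-256;"):].split(";")[0].replace("-", "")
    return sha256_digest(data).hex() == hex_value.lower()


def hash_entities(file_id: str, data: bytes) -> list:
    """Build the File and PropertyValue entities for the @graph."""
    digest = sha256_digest(data)
    nih = nih_uri(digest)
    return [
        {"@id": file_id, "@type": "File", "identifier": {"@id": nih}},
        {"@id": nih, "@type": "PropertyValue", "name": "sha256",
         "unitText": "hexadecimal", "propertyID": {"@id": LOC_SHA256},
         "value": digest.hex()},
    ]


if __name__ == "__main__":
    content = b"hello\n"
    print(nih_uri(sha256_digest(content)))
    print(well_known_url(ni_uri(sha256_digest(content), "example.com")))
```

The two dictionaries returned by hash_entities can be appended to the
"@graph" list and serialised with json.dumps, which also avoids
hand-written JSON slips such as trailing commas.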
The RFC 6920 URIs are far more rigorously defined and thus my
preference, but at the expense of parsing. In this combined approach you
get the best of both worlds - you have a global content-based @id URI
for the data file (content), and you have exposed the sha256 hash value
as a separate property so you don't need to parse that URI.

BTW, the RO-Crate community don't usually communicate on this list (but
perhaps we should), could you raise this as a Use Case on
https://github.com/researchobject/ro-crate/issues ?
Feel free to link to my reply!

We can then discuss it on the next RO-Crate telcon, which is scheduled
for 2020-02-27 <https://s.apache.org/ro-crate-minutes>

It might also be worth asking on the schema.org list as there might be
others there dealing with hash values.

-- 
Stian Soiland-Reyes
The University of Manchester 🐝
https://www.esciencelab.org.uk/
https://orcid.org/0000-0001-9842-9718
Received on Wednesday, 12 February 2020 14:39:47 UTC