Re: Cryptographic Hash Functions

On Tue, 11 Feb 2020 15:24:33 -0500, Erich Bremer wrote:
> In storing the files in the RO Crate zip file, is there a preference for
> RDF property for representing an MD5 or SHA-512 hash of the file that is
> being stored in the RO crate zip file?  There didn't seem to be one at
> but I did find the following:

We have not listed an RDF property for hash within the RO-Crate manifest,
as we largely considered that a "transport-level" detail that is better
covered by BagIt or Oxford Common File Layout.

There you should probably use SHA-256 or SHA-512 so the hash is
cryptographically strong; MD5 and SHA-1 should be avoided where possible.

I agree that the LOC ontology you link to shows good identifiers for the
hash *functions*, but it does not provide RDF properties for linking to a
particular hash.

I guess you *could* re-purpose URIs like
as a property, but then why didn't LOC declare them also as such, given
that they have other vocabularies?

We should not use it blindly as a property without agreeing what should
be a valid subject and object for its use - e.g. would these theoretical
properties expect the hash value as bytes, a hex string (with or without
spaces? Upper case, lower case or both?), or as a separate HashValue

One possibility, if you can avoid insecure MD5 and SHA-1, is to
use RFC6920 nih: URIs as identifiers
(or the shorter ni: URIs, which use base64url encoding)
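
As a rough sketch of how such identifiers could be derived from file
content, assuming SHA-256 and Python's standard library (the function
names nih_uri and ni_uri are mine, not from RFC6920, and the optional
nih: check digit is left out):

```python
import base64
import hashlib


def nih_uri(content: bytes) -> str:
    """Human-speakable form: algorithm name plus hex digest."""
    return "nih:sha-256;" + hashlib.sha256(content).hexdigest()


def ni_uri(content: bytes, authority: str = "") -> str:
    """Compact form: base64url digest with the '=' padding removed."""
    digest = hashlib.sha256(content).digest()
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"ni://{authority}/sha-256;{value}"


# "Hello World!" is the example input used in RFC6920 itself.
example_ni = ni_uri(b"Hello World!")
example_nih = nih_uri(b"hello\n")
```

With an empty authority the ni: form starts with "ni:///", and both forms
carry the same digest, just encoded differently.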

for instance:

{ "@context": "",
  "@graph": [
    {
      "@type": "CreativeWork",
      "@id": "ro-crate-metadata.jsonld",
      "conformsTo": {"@id": ""},
      "about": {"@id": "./"},
      "description": "RO-Crate Metadata File Descriptor (this file)"
    },
    {
      "@id": "./",
      "@type": "Dataset",
      "name": "Example RO-Crate",
      "description": "The RO-Crate Root Data Entity",
      "hasPart": [
        {"@id": "data1.txt"}
      ]
    },
    {
      "@id": "data1.txt",
      "@type": "File",
      "description": "One of hopefully many Data Entities",
      "identifier": { "@id": "nih:sha-256;5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03"}
    }
  ]
}

An advantage of ni: URIs is that they can be rewritten to .well-known HTTP
URIs for retrieval. You can then retrieve the content from any supporting
content-delivery platform, since you can check the hash afterwards.
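
A minimal sketch of that rewrite and of the check afterwards, assuming
an ni: URI that carries an authority and uses SHA-256. The function names
well_known_url and matches are hypothetical, and the actual HTTP fetch is
left out so the example stays self-contained:

```python
import base64
import hashlib
from urllib.parse import urlsplit


def well_known_url(ni: str, scheme: str = "http") -> str:
    """Map ni://authority/alg;value to its .well-known retrieval URL."""
    parts = urlsplit(ni)
    if parts.scheme != "ni" or not parts.netloc:
        raise ValueError("expected an ni: URI with an authority")
    alg, _, value = parts.path.lstrip("/").partition(";")
    return f"{scheme}://{parts.netloc}/.well-known/ni/{alg}/{value}"


def matches(ni: str, content: bytes) -> bool:
    """Check retrieved bytes against the digest embedded in the ni: URI."""
    alg, _, value = urlsplit(ni).path.lstrip("/").partition(";")
    if alg != "sha-256":
        raise ValueError("this sketch only handles sha-256")
    digest = hashlib.sha256(content).digest()
    expected = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return expected == value


ni = "ni://example.com/sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk"
url = well_known_url(ni)
```

Because the check depends only on the bytes received, it does not matter
which server delivered them.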

As for RO-Crate, a more elaborate alternative where you won't need to
parse the URI is to use a similar to
but linking to the id.loc identifiers:

    {
      "@id": "data1.txt",
      "@type": "File",
      "description": "A file containing only the line: hello",
      "identifier": { "@id": "nih:sha-256;5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03"}
    },
    { "@id": "nih:sha-256;5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03",
      "@type": "PropertyValue",
      "name": "sha256",
      "unitText": "hexadecimal",
      "propertyID": { "@id": ""},
      "value": "5891b5b522d5df086d0ff0b110fbd9d21bb4fc7163af34d08286a2e846f6be03"
    }

Here I used the more lightweight "propertyID" and "hexadecimal" as
unitText, so it would be more of a convention. The RFC6920 URIs are far
more rigorously defined and thus my preference, but at the expense of
having to parse the URI.
In this combined approach you get the best of both worlds - you have a
global content-based @id URI for the data file (content), and you have
exposed the sha256 hash value as a separate property so you don't need
to parse that URI.
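
If you generate the crate programmatically, the two linked entities could
be built along these lines. This is only a sketch: the function name
file_with_hash is made up, and the propertyID URI is passed in by the
caller (the value used below is a placeholder, not a real identifier).

```python
import hashlib
import json


def file_with_hash(file_id: str, content: bytes, property_id: str) -> list:
    """Return a File entity and the PropertyValue entity it points to."""
    hex_digest = hashlib.sha256(content).hexdigest()
    nih = f"nih:sha-256;{hex_digest}"
    file_entity = {
        "@id": file_id,
        "@type": "File",
        "identifier": {"@id": nih},
    }
    hash_entity = {
        "@id": nih,
        "@type": "PropertyValue",
        "name": "sha256",
        "unitText": "hexadecimal",
        "propertyID": {"@id": property_id},
        "value": hex_digest,
    }
    return [file_entity, hash_entity]


# Placeholder propertyID, for illustration only.
entities = file_with_hash("data1.txt", b"hello\n", "https://example.org/sha256")
snippet = json.dumps(entities, indent=2)
```

The resulting list can be appended to the "@graph" array of the metadata
file; a consumer can then read "value" directly or follow the nih: @id.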

BTW, the RO-Crate community don't usually communicate on this list (but
perhaps we should), could you raise this as a Use Case on ? 

Feel free to link to my reply!

We can then discuss it on the next RO-Crate telcon, which is
scheduled for 2020-02-27.

It might also be worth asking on the list as there might be
others there dealing with hash values.

Stian Soiland-Reyes
The University of Manchester 🐝

Received on Wednesday, 12 February 2020 14:39:47 UTC