Re: gelatinous resources and naming things with their hashes from dorian taylor on 2014-07-12 (public-openannotation@w3.org from July 2014)

From: dorian taylor <dorian.taylor@gmail.com>
Date: Fri, 11 Jul 2014 18:52:23 -0700
To: Bob Morris <morris.bob@gmail.com>
Cc: public-openannotation <public-openannotation@w3.org>
Message-ID: <CAAr_vNzzWWGJE8uh6u1jW8D5Vybe04ht8YQQ8mdcnEUxZE5VUg@mail.gmail.com>

On Sun, Jul 6, 2014 at 2:56 PM, Bob Morris <morris.bob@gmail.com> wrote:

> p.s.  I usually whine about URIs with any semantics whatsoever, so I'm not
> sure whether I'd argue against 6920 on these grounds.  And yet....
> p.p.s.  In fact has 6920 taken off anywhere at all?

I wrote an implementation for the URI scheme itself:

http://search.cpan.org/dist/URI-ni/lib/URI/ni.pm

I'm also working on a basic content-addressable storage using the
/.well-known/ idiom:

https://github.com/doriantaylor/p5-store-digest

I intend to use (read: already am using) the ni: scheme in my own
semantic web work for identifying opaque data objects (i.e. blobs,
file-like constructs).
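
For readers who have not met RFC 6920, the following is a minimal
Python sketch of how a ni: name is derived from a blob and how a name
with an authority maps onto the /.well-known/ form. It is not the
URI::ni or Store::Digest API: the function names ni_uri and
well_known_url, and the example.com authority, are made up for
illustration, and query parameters are ignored.

```python
import base64
import hashlib


def ni_uri(data: bytes, authority: str = "") -> str:
    """Name a blob by its SHA-256 digest, RFC 6920 style."""
    digest = hashlib.sha256(data).digest()
    # RFC 6920 uses base64url with the trailing "=" padding removed.
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"ni://{authority}/sha-256;{value}"


def well_known_url(ni: str, scheme: str = "http") -> str:
    """Map a ni: URI that has an authority to its /.well-known/ni/ URL."""
    if not ni.startswith("ni://"):
        raise ValueError("not a ni: URI")
    authority, _, name = ni[len("ni://"):].partition("/")
    if not authority:
        raise ValueError("a ni: URI without an authority has no HTTP location")
    algorithm, _, value = name.partition(";")
    return f"{scheme}://{authority}/.well-known/ni/{algorithm}/{value}"


blob = b"Hello World!"
print(ni_uri(blob))
print(well_known_url(ni_uri(blob, "example.com")))
```

The first line printed is
ni:///sha-256;f4OxZX_x_FO5LcGBSKHWXfwtSx-j1ncoSt3SABJtkGk, which is
the "Hello World!" example given in RFC 6920.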

The problem with identifying blobs by their cryptographic hashes is,
naturally, that informational content does not necessarily (or
rather, in the real world, is unlikely to) correspond 1:1 to a single
binary representation, which is what a cryptographic hash operates on.

As such, I'm also working on an RDF vocab for expressing permutation
functions, which operate over opaque data objects, such that it can
encode ni(x) -> ni(f(x)). This way it's at least possible to map
different permutations of the same "thing". Nevertheless, one would
still have to specify implementation-independent canonicalization
functions for virtually every file and/or message format (and then
implement them). This will be easier for some, but virtually
impossible for others (in particular JPEG, or any other lossy media
compression format).
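
To make the ni(x) -> ni(f(x)) idea concrete, here is a small Python
sketch. It does not reflect the RDF vocab itself, which is not shown
in this message: the Mapping record, the "inflate" label, and the
helper names are assumptions chosen for illustration, and a real
vocabulary would presumably identify f by URI rather than by a bare
string. Two zlib streams of the same content stand in for two
representations of the same "thing".

```python
import base64
import hashlib
import zlib
from typing import Callable, NamedTuple


class Mapping(NamedTuple):
    source: str    # ni(x)
    function: str  # label for f (hypothetical; a vocab would use a URI)
    target: str    # ni(f(x))


def ni(data: bytes) -> str:
    """RFC 6920 style name: SHA-256, base64url, no padding."""
    digest = hashlib.sha256(data).digest()
    value = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "ni:///sha-256;" + value


def record(x: bytes, name: str, f: Callable[[bytes], bytes]) -> Mapping:
    """Apply f to x and record the relation between the two names."""
    return Mapping(ni(x), name, ni(f(x)))


content = b"the same thing, stored two ways\n"
fast = zlib.compress(content, 1)   # two different byte streams...
small = zlib.compress(content, 9)  # ...that inflate to the same content

a = record(fast, "inflate", zlib.decompress)
b = record(small, "inflate", zlib.decompress)

print(a.source == b.source)  # the stored representations have different names
print(a.target == b.target)  # but both map to the name of the same content
```

Anyone holding both mappings can tell that the two blobs are
representations of one object without fetching either of them.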

In the context of Wiki content, however, one could imagine creating
one function that canonicalizes the creole source (e.g. set charset to
UTF-8, apply NFKC, remove superfluous whitespace), and a similar
function that reduces the HTML output back to canonicalized creole
source. At that point, the hashes should match.
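
As a rough illustration of the first of those two functions, here is
a Python sketch of a text canonicalizer built from exactly the steps
named above. The precise whitespace rules (collapsing runs of spaces,
trimming line ends, squeezing blank lines) are assumptions, since no
rule set is specified here, and the function that reduces HTML back
to creole is not attempted.

```python
import hashlib
import re
import unicodedata


def canonicalize(raw: bytes, charset: str) -> bytes:
    """Reduce wiki source to one byte sequence per 'same' text."""
    text = raw.decode(charset)
    # NFKC first: it also folds ligatures and no-break spaces, so the
    # whitespace pass below sees ordinary spaces.
    text = unicodedata.normalize("NFKC", text)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Collapse runs of spaces/tabs and trim each line (assumed rule).
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n")]
    text = "\n".join(lines).strip("\n")
    # Squeeze three or more newlines down to one blank line (assumed rule).
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.encode("utf-8")


# Same text: Latin-1 with CRLF and stray spaces vs. UTF-8 with a
# decomposed accent and an "fi" ligature.
a = "== Caf\u00e9 ==\r\n\r\nThe  first   draft.  \r\n".encode("latin-1")
b = "== Cafe\u0301 ==\n\nThe \ufb01rst draft.\n".encode("utf-8")

print(hashlib.sha256(a).hexdigest() == hashlib.sha256(b).hexdigest())
print(hashlib.sha256(canonicalize(a, "latin-1")).hexdigest()
      == hashlib.sha256(canonicalize(b, "utf-8")).hexdigest())
```

The raw byte strings hash differently, while their canonical forms
hash the same. Note that NFKC is lossy by design, so this is only
suitable where compatibility characters carry no meaning.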

I came across Phillip Hallam-Baker's draft of the ni: URI scheme in
2011, when it was still called di:

http://tools.ietf.org/html/draft-hallambaker-digesturi-02

I had hand-rolled my own blob identifier as urn:x-sha-256 a year
before that, so I was pleased to see that somebody else was thinking
along the same lines (and had given it considerably more thought).
Presumably he and his colleagues are using the ni: URI scheme.

-- 
Dorian Taylor
Make things. Make sense.
http://doriantaylor.com

Received on Saturday, 12 July 2014 01:52:51 UTC