Re: Cryptography In CWM: Hashes from Tim Berners-Lee on 2001-12-01 (www-archive@w3.org from December 2001)

From: Tim Berners-Lee <timbl@w3.org>
Date: Fri, 30 Nov 2001 20:09:04 -0500
To: "Sean B. Palmer" <sean@mysterylights.com>, "Dan Connolly" <connolly@w3.org>
Cc: <www-archive@w3.org>
Message-ID: <01ee01c17ad3$ddc0c210$e9001d12@CREST>
Woooo!  Yeah, spb!  Thank you!
I haven't looked at it yet  (distractions yesterday),
but comments on your mail follow.
Tim

----- Original Message -----
From: "Sean B. Palmer" <sean@mysterylights.com>
To: "Tim Berners-Lee" <timbl@w3.org>; "Dan Connolly" <connolly@w3.org>
Cc: <www-archive@w3.org>
Sent: Thursday, November 29, 2001 8:25 PM
Subject: Cryptography In CWM: Hashes


> Summary: the beginnings of a cryptography module for CWM, as
> "cwm_crypto.py", with hash-finding built-ins.
>
> I wanted to do signature validation before I shipped this off, but
> since I have to download several hundred packages to do that, I

Several hundred?  I hope that is an exaggeration.
Which crypto package are you using, I wonder?

> thought I'd just archive this first.

Good move.

> As a test of the module, I used crypto.n3:-
>
>    python cwm.py crypto.n3 -think -purge > crypto-out.n3
>
> and the file "test.txt", which simply contains the string "blargh".
> The result is:-
>
> [[[
>      @prefix : <#> .
>      @prefix crypto: <http://www.w3.org/2000/10/swap/crypto#> .
>      @prefix log: <http://www.w3.org/2000/10/swap/log#> .
>      @prefix string: <http://www.w3.org/2000/10/swap/string#> .
>
>     <file:c:\test.txt>     a :GetHashFile;
>          :content "blargh";
>          :md5 "ef15c9bd4c7836612b1567f4c8396726";
>          :sha "d1e670385f40ee942a059f949c761214872ac35f" .
> ]]] - <<crypto-out.n3>>
>
> The files are attached. The most important is <<cwm_crypto.py>>, which
> is the actual module as it stands. I also needed to modify llyn.py to
> register the built-ins, so that is attached too, as <<llyn.py>>.
> <<crypto.n3>> is the test file, and <<crypto-out.n3>> is the output.
> Also attached is the simple <<test.txt>> "blargh" test file. The paths
> should be modified appropriately. cwm_crypto.py was based on one of
> the other built-ins modules: cwm_string.py.

Great

> The properties that one can use at the moment are:-
>
>    crypto:md5 a rdf:Property; rdfs:label "md5";
>       rdfs:comment "The MD5 hash of a string";
>       rdfs:domain string:String; rdfs:range string:String .
>
>    crypto:sha a daml:UnambiguousProperty,
>       daml:UniqueProperty; rdfs:label "sha";
>       rdfs:comment "The SHA hash of a string";
>       rdfs:domain string:String; rdfs:range string:String .

I notice your higher trust of sha1!

> Upper-case property names are used in the crypto.n3 schema [1] (which
> needs chACL-ing), but I prefer to use lower-case for properties, and
> upper-case (prefixes) for classes, so I changed them.

agreed.

> To get the hash of a file, you of course have to use log:content on
> it... I did consider just putting in a built-in function that would do
> that for you, but it seems more sensible to deploy one standard
> approach.

I agree.  The string is the fundamental thing.  If you want to read or write
files or something, then that should be separate.  It is easy enough to
define content hashing functions.
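
As a rough illustration of that split (this is not the cwm_crypto.py code,
which is not reproduced in this message), a string-level hash and a separate
file-reading step could be composed as in the Python sketch below. The
function names are hypothetical, the UTF-8 encoding is an assumption, and
the standard hashlib module stands in for whatever hash modules the
attached code used.

```python
import hashlib


def md5_of_string(text: str) -> str:
    """Hex MD5 digest of a string (UTF-8 encoding is an assumption)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def sha_of_string(text: str) -> str:
    """Hex SHA-1 digest of a string (UTF-8 encoding is an assumption)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def content_of_file(path: str) -> str:
    """Separate step: read a file's content as a string, in the role of log:content."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def md5_of_file(path: str) -> str:
    """Content hashing is the composition of the two independent steps."""
    return md5_of_string(content_of_file(path))
```

Keeping the hash functions string-to-string means the same functions work on
any string, whether it came from a file, a URI, or a literal.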

One question, to be purist, is whether the hashing and the
base64 encoding should be split.

I thought that we would need separate functions to decode
and encode base64, because they are not inverse, because
the decoding is many to one, and the encoding has to be one-one!
The convention for naming them is tricky:  base64 and anyBase64?
cannonicalBase64 and base64?
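
The asymmetry can be seen with Python's standard base64 module. This is a
small sketch, not part of the attached code: encoding gives one fixed
string per input, while lenient decoding maps several different strings
back to the same bytes.

```python
import base64

# Encoding is one-to-one: each input has exactly one output.
canonical = base64.b64encode(b"foo")

# With the default validate=False, b64decode discards characters outside
# the base64 alphabet, so line breaks and spaces do not change the result.
variants = [b"Zm9v", b"Zm9v\n", b"Zm\n9v", b" Zm9v "]
decoded = [base64.b64decode(v) for v in variants]

print(canonical)  # b'Zm9v'
print(decoded)    # every entry is b'foo'
```

So a decoded value alone does not say which of the many acceptable strings
was the input, which is why the two directions are not simple inverses.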

What happens if we cheat and call base64 a function and a reverse
function, and allow it to make an arbitrary choice in the encoding
sequence?   I say   "foo"  :base64 :y, meaning "foo" has base64 encoding y,
and then it generates :y so that it is *one* of the strings which will make
"foo" :base64 :y true.  Does this lead  to any logical inconsistencies?
Yes, I think you get problems when you are, say, checking a set
of strings to see which one is the base64 encoding of foo.

{ "foo" :base64 :y;   :y a :inputString } log:implies { ...}

The query engine would generate a string for y, and then
check class membership, which would (probably) fail,
instead of  finding all members of :inputString and  then
checking each one for being a valid base64 encoding of foo.
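
The two evaluation orders can be contrasted in a short Python sketch. It is
only an illustration under stated assumptions: the function names are
hypothetical, the set input_strings stands in for the members of
:inputString, and this is not how the cwm query engine is implemented.

```python
import base64


def is_base64_of(candidate: str, plain: bytes) -> bool:
    """True if candidate leniently decodes to plain (a many-to-one check)."""
    try:
        return base64.b64decode(candidate) == plain
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False


def generate_then_check(plain: bytes, input_strings: set) -> list:
    """Generate one encoding for y, then test class membership."""
    y = base64.b64encode(plain).decode("ascii")
    return [y] if y in input_strings else []


def enumerate_then_verify(plain: bytes, input_strings: set) -> list:
    """Find all members first, then check each one against plain."""
    return sorted(s for s in input_strings if is_base64_of(s, plain))


# "Zm9v\n" is a valid but non-canonical encoding of "foo"; "YmFy" encodes "bar".
input_strings = {"Zm9v\n", "YmFy"}

print(generate_then_check(b"foo", input_strings))    # []
print(enumerate_then_verify(b"foo", input_strings))  # ['Zm9v\n']
```

The first order misses the match because the generated string is only one
of the many strings that satisfy the relation.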

I don't like verbs in properties (base64-encode, -decode, -generate etc)
Maybe
                                 :cannonicalBase64

indicating its 1-1 nature, which is the declarative aspect of the
encoding function.

> I also considered using a new "hash:" URI scheme to identify
> hashes as first class objects on the Web, but after considering it
> carefully, decided not to.

I think some of those schemes exist.
I think you made the right choice.
We can add a

{ ( "md5:"  [is crypto:md5 of [is log:content of :x]] ) string:concatenation
[is log:uri of :y ] } log:implies { :x = :y }.

rule if we need  to generate those, I suppose.
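
In procedural terms, a rule of that shape concatenates the prefix "md5:"
with the MD5 hash of a resource's content and compares the result with a
URI. A minimal Python sketch follows; the helper names are hypothetical,
the "md5:" form is only the illustrative scheme from the rule above, and
UTF-8 encoding is an assumption.

```python
import hashlib


def md5_uri(content: str) -> str:
    """Build an "md5:" identifier from the content's hex MD5 digest."""
    digest = hashlib.md5(content.encode("utf-8")).hexdigest()
    return "md5:" + digest


def names_same_resource(content_of_x: str, uri_of_y: str) -> bool:
    """Mirror of the rule's conclusion: x = y when the strings coincide."""
    return md5_uri(content_of_x) == uri_of_y
```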

> For signatures, I intend to hack on the mxCrypto stuff, but you need
> an incredible amount of stuff in order to install that, so that's on
> the TODO list :-)

I looked at the amkCrypto, which is billed as a temporary fixing fork of
mxCrypto. I couldn't figure out in the few minutes I gave it what else I
had to do to make it work.  It didn't seem too huge - maybe I was
missing masses of C code.

It is great that you have taken this on!  A todo list shared ....

Tim


> Cheers,
>
> [1] http://www.w3.org/2000/10/swap/crypto.n3
>
> --
> Kindest Regards,
> Sean B. Palmer
> @prefix : <http://webns.net/roughterms/> .
> :Sean :hasHomepage <http://purl.org/net/sbp/> .
>
Received on Saturday, 1 December 2001 20:51:40 UTC