
Re: md5sum and sha1sum functions

From: Steve Harris <steve.harris@garlik.com>
Date: Tue, 7 Dec 2010 11:41:57 +0000
Cc: Paul Gearon <gearon@ieee.org>, Andy Seaborne <andy.seaborne@epimorphics.com>, SPARQL Working Group <public-rdf-dawg@w3.org>
Message-Id: <EAE9C739-945E-42C7-8823-3957B93B9B45@garlik.com>
To: Steve Harris <steve.harris@garlik.com>
One more thing: the canonical form for hex numbers in XSD uses uppercase letters (http://www.w3.org/TR/xmlschema-2/#hexBinary-lexical-representation). On the other hand, though FOAF doesn't specify, the vast majority of foaf:mbox_sha1sum examples I've seen in the wild have been lower case.

It can be worked around with judicious application of either UCASE() or LCASE(), but we should specify one. I have no preference.

FWIW, "deadbeef"^^xsd:hexBinary = "DEADBEEF"^^xsd:hexBinary, so that would sort of be a solution, but adding a datatype to SPARQL just to handle this problem seems crazy.
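
To make the case issue concrete, here is a rough sketch in Python rather than SPARQL. The helper names (mbox_sha1sum, same_digest) and the example.org address are made up for illustration. Comparing the hex text as plain strings is case sensitive, while comparing the decoded bytes behaves the way xsd:hexBinary value equality would.

```python
import hashlib


def mbox_sha1sum(mailto_uri: str) -> str:
    """Hex SHA-1 of a mailto: URI. hashlib emits lowercase hex digits."""
    return hashlib.sha1(mailto_uri.encode("utf-8")).hexdigest()


def same_digest(a: str, b: str) -> bool:
    """Compare two hex strings by the bytes they encode, ignoring letter case."""
    return bytes.fromhex(a) == bytes.fromhex(b)


# Hypothetical address, for illustration only.
lower = mbox_sha1sum("mailto:alice@example.org")
upper = lower.upper()  # the uppercase form that XSD treats as canonical

print(len(lower))                  # 40 hex characters for SHA-1
print(same_digest(lower, upper))   # True: same bytes either way
print("deadbeef" == "DEADBEEF")    # False: plain strings differ by case
print(same_digest("deadbeef", "DEADBEEF"))  # True
```

Normalising both sides with upper() or lower() before a string comparison gets the same effect as the UCASE()/LCASE() workaround mentioned above.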

- Steve

On 2010-12-07, at 10:43, Steve Harris wrote:

> On 2010-12-06, at 17:39, Paul Gearon wrote:
> 
>> On Mon, Dec 6, 2010 at 7:44 AM, Steve Harris <steve.harris@garlik.com> wrote:
>>> If these are going to return a simple literal containing hex characters, rather than some 128 / 160 / 256 bit integer datatype, then I'd prefer MD5_HEX() etc.
>> 
>> The standard result of this algorithm is almost always presented as a
>> hex string, so that may be redundant. It might be nice to have a
>> version that returned a xsd:hexBinary. I also thought it would be nice
>> to accept an xsd:hexBinary as an alternative to a string. That said, I
>> didn't suggest anything like these since I was trying to keep it
>> simple, and not bloat the spec.
> 
> OK, the languages I'm most familiar with (C and Perl) return big integers (actually structs of ints) by default.
> 
> I guess I don't foresee any serious problems just returning hex strings in SPARQL.
> 
>>> I marginally prefer named functions, e.g. SHA256_HEX(?x), rather than SHA_HEX(?x, 256). Length might not be enough to distinguish all algorithms on its own, so we could end up with some odd cases.
>>> 
>>> BTW, hexadecimal SHA1 values are 40 characters long.
>> 
>> They are indeed. I copied/pasted from MD5 and missed that part, sorry.
> 
> I guessed that was the problem; the intent was clear enough.
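
As a quick check of the lengths discussed above, a small Python sketch (not SPARQL). The wrapper names md5_hex, sha1_hex and sha256_hex are hypothetical and only mirror the naming idea in this thread. It also shows one case where bit length alone does not identify the algorithm: SHA-256 and SHA3-256 both produce 256-bit digests.

```python
import hashlib


# Hypothetical named wrappers, one per algorithm, each returning hex text.
def md5_hex(s: str) -> str:
    return hashlib.md5(s.encode("utf-8")).hexdigest()


def sha1_hex(s: str) -> str:
    return hashlib.sha1(s.encode("utf-8")).hexdigest()


def sha256_hex(s: str) -> str:
    return hashlib.sha256(s.encode("utf-8")).hexdigest()


# Each hex character carries 4 bits, so the length is (digest bits) / 4.
print(len(md5_hex("abc")))     # 128 bits -> 32 characters
print(len(sha1_hex("abc")))    # 160 bits -> 40 characters
print(len(sha256_hex("abc")))  # 256 bits -> 64 characters

# Same length, different algorithm: a "256" parameter would be ambiguous here.
sha3 = hashlib.sha3_256("abc".encode("utf-8")).hexdigest()
print(len(sha3) == len(sha256_hex("abc")))  # True
print(sha3 == sha256_hex("abc"))            # False
```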
> 
> - Steve
> 
> -- 
> Steve Harris, CTO, Garlik Limited
> 1-3 Halford Road, Richmond, TW10 6AW, UK
> +44 20 8439 8203  http://www.garlik.com/
> Registered in England and Wales 535 7233 VAT # 849 0517 11
> Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
> 
> 

Received on Tuesday, 7 December 2010 11:42:35 GMT
