Re: hash cash and email from Al Gilman on 2001-04-28 (www-talk@w3.org from March to April 2001)

From: Al Gilman <asgilman@iamdigex.net>
Date: Sat, 28 Apr 2001 15:01:27 -0400
To: Cem Karan <Cem.Karan@usa.alcatel.com>
Cc: www-talk@w3.org
Message-Id: <200104281856.OAA12236692@smtp2.mail.iamworld.net>

On the question of venue, I believe that the standard answer on behalf of the
IETF is that if you have an idea about something like this, you ask the Area
Directors (in the case of email, those for the Applications Area) where to
raise the issue or where to ask for discussion of your Internet-Draft.

On the other hand, it is not clear that any listserv is where you will find the
right community [i.e. spectrum of participants] that would get you coverage of
all the angles and sufficiently balanced representation of interests to
understand the tradeoffs involved in the possible pattern of practice that you
are contemplating.

Possible candidates:

- CAUCE
- IMC
- SlashDot
- cypherpunks

I didn't make it to the W3C "Web Services" Workshop, but the press I read
coming out of it suggests that 'security and privacy' is still a can of worms.
Nobody has established themselves as having their hands around the problem and
having momentum that justifies treating them as a bandwagon to get on board in
the solution of this problem area.

So your guess is as good as mine as to where to nucleate a consensus for
something in this area that would stick in the end.

-- some details inline below 

At 10:40 AM 2001-04-28 -0400, Cem Karan wrote:
>This should actually be aimed towards any group writing the RFCs
>associated with email, but I don't know of a mailing list directly
>associated with them. 
>
>I know this mailing list is international, and I don't know
>if everyone knows the definition of 'spam' as used in the USA.  Spam is
>unsolicited bulk email, kind of like what this mailing list has been
>subjected to recently.  I mention this because spam has started to
>reduce the usefulness of email.  My personal inbox usually has a ratio
>of about 3:1 of spam to useful mail.  This is probably going to get worse
>as time goes on.  Currently, there are several different strategies to
>cope with it, but they all deal with the same fundamental
>problem/blessing of email: there is virtually no cost associated with
>sending a message.  Direct mail advertisers must spend money in order to
>send you mail.  This imposes a low, but not negligible, cost.  The
>cost helps limit the amount of mail that can be sent out.
>
>Email, as I said before, has a very low, almost negligible cost
>associated with it.  If we could introduce an artificial cost, one that
>is easy to implement, then there would be a disincentive to bulk emailers.
>A possible way of introducing a cost is through hash cash.  The idea is
>based on brute force cryptanalysis.  In order to break an unknown
>cryptographic message, you must try all of the possible keys against
>the message.  Eventually, you will find the key and break the message.
>However, doing so takes time, and that is what breaks a spammer.  If it
>takes time to send messages, they can't send them to everyone in the
>world.
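The cost argument can be made concrete with a simplified stamp in the spirit of hashcash. Note that hashcash itself does not break keys; it asks the sender for a partial hash collision: a string whose SHA-1 hash begins with a required number of zero bits. Finding one takes about 2**bits hash evaluations, while checking it takes one. This is a minimal sketch assuming a plain `resource:counter` string; it is not the real hashcash stamp format, and `mint`, `check` and `leading_zero_bits` are hypothetical names.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many leading bits of a digest are zero."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()  # zero bits at the top of the first nonzero byte
        break
    return bits


def mint(resource: str, bits: int) -> str:
    """Sender's cost: try counters until SHA-1(stamp) has `bits` leading zero bits.

    Expected work is roughly 2**bits hash evaluations per stamp.
    """
    for counter in count():
        stamp = f"{resource}:{counter}"
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp


def check(stamp: str, resource: str, bits: int) -> bool:
    """Recipient's cost: a single hash, plus confirming the stamp names this resource."""
    prefix, _, _ = stamp.rpartition(":")
    digest = hashlib.sha1(stamp.encode()).digest()
    return prefix == resource and leading_zero_bits(digest) >= bits
```

Each extra bit of difficulty doubles the sender's expected work but leaves the recipient's check at one hash, which is the asymmetry the scheme relies on.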
>
>Here is how the scheme works:
>As a user, you are allowed to create any number of hash keys, each of
>which can be any length that you wish.  You can invalidate and create
>new keys at any time (This would require secure validation that proves
>that only you are trying to invalidate or create new keys.  Otherwise,
>anyone can create a new simple to break key in your name, and continue
>to send you messages.)  When someone wants to send you a message, they
>need to break one of the keys before they can send you the message. 
>People you don't know will most likely try to break the shortest/weakest
>key as that will take the least amount of time.  People you know can be
>given the key to one of the longer hashes, which will allow them to
>break the hash immediately, instead of having to use brute force. 
>Anyone who doesn't have the key can still try a brute force attack, but
>if the key is long enough, then this will take an extraordinary amount
>of time, severely limiting the number of people that they can spam.  The
>result is a system that has an associated cost that is user controllable
>and easy to update.  
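The key scheme as described above can be modelled literally: the recipient picks a secret from a space of 2**bits values and publishes only its hash, strangers brute-force the weakest published key, and friends are simply handed the secret. This is a hedged sketch; SHA-256, the `PublishedKey` record and the function names are assumptions for illustration, not part of any existing mail system.

```python
import hashlib
import secrets
from dataclasses import dataclass


def _h(value: int) -> bytes:
    """Hash an integer secret (bits must be <= 256 to fit in 32 bytes)."""
    return hashlib.sha256(value.to_bytes(32, "big")).digest()


@dataclass
class PublishedKey:
    bits: int      # size of the secret's search space, chosen by the recipient
    digest: bytes  # hash of the secret; the secret itself is never published


def make_key(bits: int) -> tuple[int, PublishedKey]:
    """Recipient: pick a secret in [0, 2**bits) and publish only its hash."""
    secret = secrets.randbelow(2 ** bits)
    return secret, PublishedKey(bits, _h(secret))


def weakest(keys: list[PublishedKey]) -> PublishedKey:
    """A stranger will attack the key with the smallest search space."""
    return min(keys, key=lambda k: k.bits)


def brute_force(key: PublishedKey) -> int:
    """Stranger: try every candidate; on average about 2**(bits - 1) hashes."""
    for candidate in range(2 ** key.bits):
        if _h(candidate) == key.digest:
            return candidate
    raise ValueError("no preimage in the declared search space")


def accept(key: PublishedKey, answer: int) -> bool:
    """Recipient's filter: one hash checks the sender's answer."""
    return _h(answer) == key.digest
```

As sketched, an answer can be reused once someone has found it, so in this model the recipient has to rotate keys, as the post suggests; hashcash avoids the problem by requiring fresh work for every stamp.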
>
>See
><http://www.cypherspace.org/~adam/hashcash/>
>for more details and an implementation that you can play with.
>
>There is one major problem with this though.  It would require a major
>overhaul of the email systems that are in place.  This would not be
>negligible.  I think that it would be worth the costs, but I would like
>to know what others thought as well.
>

AG::  No change in the installed plant is required at all.  The resulting
action can be taken in your Mail User Agent as mail filtering on your personal
node.  Those messages which are from a) a known good correspondent or b)
someone who took the trouble to follow the protocol will get binned
differently and read by you in preference to the others.
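A client-side version of this binning might look like the following sketch. It assumes the stamp travels in an `X-Hashcash` header and leaves stamp verification to a caller-supplied `stamp_ok` function; both are assumptions, and `bin_message` is a hypothetical name. SMTP is untouched: every message is still delivered, only the folder changes.

```python
from email import message_from_string
from email.message import Message
from email.utils import parseaddr
from typing import Callable


def bin_message(msg: Message, known_good: set[str],
                stamp_ok: Callable[[str, str], bool]) -> str:
    """Decide which folder the Mail User Agent files a message into."""
    _, sender = parseaddr(msg.get("From", ""))
    _, recipient = parseaddr(msg.get("To", ""))
    if sender.lower() in known_good:
        return "priority"   # a) known good correspondent
    stamp = msg.get("X-Hashcash")  # assumed header carrying the sender's stamp
    if stamp and stamp_ok(stamp, recipient.lower()):
        return "priority"   # b) sender took the trouble to follow the protocol
    return "later"          # still delivered, just read after the others


if __name__ == "__main__":
    raw = "From: Ann <ann@example.org>\nTo: me@example.net\nSubject: hi\n\nbody\n"
    print(bin_message(message_from_string(raw), {"ann@example.org"},
                      lambda stamp, rcpt: False))
```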

My problem, as a self-appointed servant to the newbies of the world (the
technologically barely there), is that it is extremely unlikely that someone
with marginal Internet skills will succeed in using any special extra-effort
protocol.  Such schemes are intrinsically likely to limit your mail to a small
circle of people already intensely involved in Internet technology.  It takes
an AOL to implement this on behalf of the clueless, and for a variety of basic
Internet functions, including specifically email, the market has so far
rebuffed attempts to create sub-networks of "better" service in favor of
superior connectivity with _everybody_.

My filtering based on subject line quality is bimodal.  If the subject line is
very well written, it will relate the message to a topic I want to read about
and I will want to read it.  This can be either an individually addressed or
list message and it may come from an unknown or novel correspondent.

The messages with no subject header at all, or other totally clueless entries,
I read.  These are usually from a hapless individual who genuinely needs help,
and I want to understand what sort of help to direct them to.

It is the band in between where there is ambiguity between the mass-mailing
shill and the genuine correspondent.  Currently, I would say that you can
usually recognize a spam by its subject header, but only by dint of reading a
continuing stream of spam headers for training.  And this means opening some
fraction of the suspect spam messages to confirm your interpretation of the
header.  This process could be shared with a neural net or other learning
filter.  But the monitoring and testing can't be completely eliminated.
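As one illustration of a "learning filter" for subject lines, the sketch below swaps in a multinomial naive Bayes classifier (a different technique from a neural net) trained on subjects whose labels you confirmed by opening the message. The tokenizer, the labels `spam`/`ham` and the 0.2/0.9 thresholds are arbitrary assumptions. The middle band is handed back to the reader, since the monitoring can't be fully eliminated.

```python
import math
import re
from collections import Counter

LABELS = ("spam", "ham")


class SubjectFilter:
    """Multinomial naive Bayes over subject-line words ('ham' = wanted mail)."""

    def __init__(self) -> None:
        self.word_counts = {label: Counter() for label in LABELS}
        self.doc_counts = Counter()

    @staticmethod
    def tokens(subject: str) -> list[str]:
        return re.findall(r"[a-z0-9$!]+", subject.lower())

    def train(self, subject: str, label: str) -> None:
        """Record a subject whose label you confirmed by opening the message."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokens(subject))

    def spam_probability(self, subject: str) -> float:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in LABELS:
            # Laplace-smoothed prior and word likelihoods; the extra +1 in the
            # denominator leaves some probability for never-seen words.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            n_words = sum(self.word_counts[label].values())
            for word in self.tokens(subject):
                score += math.log((self.word_counts[label][word] + 1)
                                  / (n_words + len(vocab) + 1))
            log_score[label] = score
        # P(spam | subject) = 1 / (1 + exp(log_ham - log_spam)), clamped against overflow
        diff = max(-700.0, min(700.0, log_score["ham"] - log_score["spam"]))
        return 1.0 / (1.0 + math.exp(diff))

    def triage(self, subject: str, low: float = 0.2, high: float = 0.9) -> str:
        p = self.spam_probability(subject)
        if p >= high:
            return "probably spam"
        if p <= low:
            return "read"
        return "open to confirm"  # the ambiguous band still needs a human
```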

So for my preferences (who I want my mailbox accessible to), "using the
Internet right" is not a usable test of incoming messages.  I do want to read
the messages from people who hang on to the ability to send and receive email
by their fingernails.

Simple data-oriented techniques are likely to prove equally or more effective
in prioritizing your inbox.

If you maintain the discipline of caching the From: information of any message
where your reaction is "yes, I would be interested in hearing from them again"
in a good-contacts list, then the vast majority of good mail will be
recognizable from its From: header alone, and you can afford to get scientific
with only the new.

The remaining problem is how to get scientific about the messages which are
from novel sources.  I presume that you have already eliminated messages that
match your "known bad" list.
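The From: caching discipline and the known-bad list reduce to two sets kept by the client, which leave only the novel senders for closer scrutiny. A minimal sketch; `ContactCache` and its method names are hypothetical.

```python
from email.utils import parseaddr


class ContactCache:
    """Client-side sender lists: good contacts cached by hand, plus a known-bad list."""

    def __init__(self) -> None:
        self.good: set[str] = set()
        self.bad: set[str] = set()

    @staticmethod
    def address(from_header: str) -> str:
        """Reduce 'Name <addr>' to a normalized bare address."""
        return parseaddr(from_header)[1].lower()

    def mark_interested(self, from_header: str) -> None:
        """'Yes, I would be interested in hearing from them again.'"""
        self.good.add(self.address(from_header))

    def mark_bad(self, from_header: str) -> None:
        self.bad.add(self.address(from_header))

    def triage(self, from_header: str) -> str:
        addr = self.address(from_header)
        if addr in self.good:
            return "inbox"
        if addr in self.bad:
            return "junk"
        return "novel"  # only these need closer, more scientific treatment
```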

Here the best-prognosis form of remedy is to take the message and bounce it
through an analyst Application Service Provider service which will return you
commentary on this message.  They may have access to public commentary about
the site that the emessage links to.  They may offer remarks indicating that
the form of the headers indicates that the From: information is spoofed, or
that the nominal node of origin was hacked to send this message.  That's the
sort of test that I apply manually, and such tests could be reduced to pattern
rules.  Or it could simply be a pattern of discrimination learned by a neural
net based on what you said about messages you got before.  It could use CC/PP
for communication between your Mail User Agent and the message-critic service
as to your reading preferences.

In any case, bouncing a message to a "tell me more about this message" service
is something your Mail User Agent can always do.  It doesn't require the
introduction of filters into SMTP.

The overall point is that you will find it very, very hard to get the Internet
at large to agree to _reject_ a message for transport.  That would require a
clear and compelling, automatically discernible "open and notorious evil
liver."  So don't try to get emessages blocked at the transport gate.  If you
come up with a mail-reading prefilter that is so effective that the
preponderance of email readers use it, then the commercial users of email will
conform to its requirements and pay whatever it reasonably takes.

There is some belief on the business side that email advertising is a proven
success -- it is a going concern.  To change the rules of email transport, you
will have to overcome opposition from the business community.

So implement whatever scheme you wish as an enhancement to the mail filtering
that people already do on the client side, and you have a ready path to
adoption.  Don't try to change the transport rules until you have an open and
shut case.

Al

>Cem Karan
>
Received on Saturday, 28 April 2001 14:56:58 UTC