RE: No apologies if you receive this multiple times (was CFP: IEEE IC3N'2000)

I believe the issue here is a misunderstanding about what level of heuristics a
mailer could be expected to achieve.  I strongly suspect that it is
computationally infeasible to avoid both false positives (denial of service
attacks) and false negatives (having two versions of a message because, for
instance, one of the mailing lists adds unsubscribe information to the
body).

If I were going to design my ideal mailer, I would instead suggest that the
best user interface metaphor is a merged document, a la the Compare
Documents feature in Word or diff in Unix.  That is, it is quite unlikely to
get two exactly identical messages (at the very least, the Received headers
will probably be different).  The question is how different they are.
Rather than asking the computer to decide whether they are one message or
two, have it perform a diff and show messages with the same Message-ID as a
single merged message, with strikethrough for deleted text and underlining
for new text (or whatever syntactic elements one prefers to highlight the
differences).
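
As a rough sketch of this idea (not the behavior of any existing mailer), the
Python below groups parsed messages by Message-ID and diffs the bodies of two
copies with the standard difflib module.  The names group_by_message_id and
merged_view are hypothetical, the sample messages are made up, and the "-" and
"+" prefixes merely stand in for strikethrough and underline.  It assumes
simple, non-multipart text messages.

```python
import difflib
from collections import defaultdict
from email import message_from_string


def group_by_message_id(raw_messages):
    """Group parsed messages by their Message-ID header (hypothetical helper)."""
    groups = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_string(raw)
        groups[msg.get("Message-ID", "").strip()].append(msg)
    return groups


def merged_view(first, second):
    """Merge two bodies: '-' marks lines only in first, '+' lines only in second."""
    a = first.get_payload().splitlines()
    b = second.get_payload().splitlines()
    # ndiff can also emit '?' hint lines; drop them to keep the view readable.
    return [line for line in difflib.ndiff(a, b) if not line.startswith("?")]


# Made-up sample: a direct copy and a list copy with an added footer.
ORIGINAL = "Message-ID: <1@example.org>\nSubject: CFP\n\nCall for papers.\n"
LIST_COPY = (
    "Received: from list.example.org\n"
    "Message-ID: <1@example.org>\nSubject: CFP\n\n"
    "Call for papers.\nTo unsubscribe, mail the list owner.\n"
)

if __name__ == "__main__":
    copies = group_by_message_id([ORIGINAL, LIST_COPY])["<1@example.org>"]
    for line in merged_view(copies[0], copies[1]):
        print(line)
```

Here the list's unsubscribe footer shows up as a single "+" line while the
shared text appears once, so the reader sees one message rather than two.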

If (as is most often the case for me) the duplicates differ only in their
Received headers, then that information would be hidden unless I selected
Show Headers, and I could treat it as one message.  But I could be confident
that if, for any reason, the message had changed in any significant way, I
would see that as strikethrough and underlined new text in the main text.
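
One way to decide whether two copies differ only by Received headers is to
compare everything else.  The sketch below is an illustration under stated
assumptions: the helper names are hypothetical, the sample messages are made
up, and treating only Received as per-hop noise is my assumption (a real
mailer would likely ignore other trace headers too).

```python
from email import message_from_string

# Assumption: only Received headers are treated as per-hop noise.
TRACE_HEADERS = {"received"}


def significant_parts(raw):
    """Return (non-trace headers, body) for a raw message (hypothetical helper)."""
    msg = message_from_string(raw)
    headers = [
        (name.lower(), value)
        for name, value in msg.items()
        if name.lower() not in TRACE_HEADERS
    ]
    return headers, msg.get_payload()


def differ_only_in_trace_headers(raw_a, raw_b):
    """True if the two copies match once trace headers are ignored."""
    return significant_parts(raw_a) == significant_parts(raw_b)


# Made-up samples: two routes for the same message, and a modified copy.
DIRECT = "Received: from a.example.org\nMessage-ID: <1@example.org>\n\nHello.\n"
VIA_LIST = (
    "Received: from b.example.org\nReceived: from a.example.org\n"
    "Message-ID: <1@example.org>\n\nHello.\n"
)
MODIFIED = (
    "Received: from b.example.org\nMessage-ID: <1@example.org>\n\n"
    "Hello.\nTo unsubscribe, write to the list owner.\n"
)

if __name__ == "__main__":
    print(differ_only_in_trace_headers(DIRECT, VIA_LIST))
    print(differ_only_in_trace_headers(DIRECT, MODIFIED))
```

Copies that pass this check could be shown as one message with the differing
headers hidden; copies that fail it would go to the merged diff view instead.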

		- dan
--
Daniel Kohn <mailto:dan@dankohn.com>
tel:+1-425-602-6222  fax:+1-425-602-6223
http://www.dankohn.com

-----Original Message-----
From: Lawrence Greenfield [mailto:leg+@andrew.cmu.edu]
Sent: Tuesday, 2000-05-02 11:25
To: Tim Moors; Keith Moore
Cc: Atiquzzaman@andrew.cmu.edu, Mohammed; discuss@apps.ietf.org
Subject: Re: No apologies if you receive this multiple times (was CFP:
IEEE IC3N'2000) 


Carnegie Mellon's legacy e-mail system has been eliminating duplicates
based on message-id alone (well, with recipient envelope address) for
many years (circa 1985?), and our new system, the Cyrus IMAP server,
also does it.  We never get any user complaints except when it doesn't
work.

The denial-of-service attack is interesting, and text should probably
be added to the relevant document that message-ids should be
reasonably unpredictable if it's not there already.

Larry

   From: Keith Moore <moore@cs.utk.edu>
   Date: Tue, 02 May 2000 12:33:52 -0400

   [end2endinterest removed...not on topic for that discussion]

   in general you don't want to do duplicate suppression based on
   message-id alone, because sometimes the same message-id really
   is used for significantly different messages  (sometimes due
   to software bugs, but if duplicate suppression were widespread
   it would probably be a target of malice ... to keep someone from
   seeing a message, send them a different message using the same
   message-id).  some lists significantly modify messages without 
   modifying the message-id.  (and you probably don't want them
   to modify the message-id - it's what lets you trace a message 
   back to its source)

   you can use message-id to find potential duplicates, and then
   compare the messages themselves and use heuristics to determine
   whether they really are more-or-less the same.  or a user agent
   could remove extraneous information (e.g. received headers) from 
   every message it received, hash the result, and compare the hashes
   for duplicates.   I don't know of any user agent that does either 
   of these, and unless one gets lots of duplicate mail, it might
   not be worth the bother.

   mail delivery systems should probably not try to eliminate duplicates
   on behalf of their users. sometimes you actually want to know that
   you got a copy of the message that was sent through a list even if
   you already received a copy by other means.  so the user agent would
   need to do the duplicate suppression if it is to be done at all.

   I don't think we need to find a technical solution to every 
   social problem that exists with email.  the problem exists in other
   fields as well - people sometimes get more than one copy of the
   same mail-order catalog, for instance - and we don't lose too much
   sleep over it.  

   in general, the purpose of apologies is to avoid getting complaints
   from people who are naive enough to think that this is the sender's
   problem rather than the recipient's.

   Keith

Received on Wednesday, 3 May 2000 00:39:28 UTC