- From: Dan Kohn <dan@dankohn.com>
- Date: Tue, 2 May 2000 21:25:50 -0700
- To: discuss@apps.ietf.org
I believe the issue here is a misunderstanding on what level of heuristics a mailer could be expected to achieve. I strongly suspect that it is computationally infeasible to avoid both false positives (denial of service attacks) and false negatives (having two versions of a message because, for instance, one of the mailing lists adds unsubscribe information to the body). If I were going to design my ideal mailer, I would instead suggest that the best user interface metaphor is a merged document, a la the Compare Documents feature in Word or diff in Unix. That is, it is quite unlikely to get two exactly identical messages (at the very least, the Received headers will probably be different.). The question is how different are they. Rather than asking the computer to decide whether they are one message or two, have it perform a diff, and show messages with the same Message ID as a single merged message, including strike through and underlined for new text (or whatever syntactic elements one prefers to highlight the differences). If (as is most often the case for me), the duplicates only differ by Received headers, then that information would be hidden unless I selected Show Headers and I could treat it as one message. But, I could be confident that if, for any reason, the message had changed in any sort of significant form, I would see that as strike through and new text underlines in the main text. - dan -- Daniel Kohn <mailto:dan@dankohn.com> tel:+1-425-602-6222 fax:+1-425-602-6223 http://www.dankohn.com -----Original Message----- From: Lawrence Greenfield [mailto:leg+@andrew.cmu.edu] Sent: Tuesday, 2000-05-02 11:25 To: Tim Moors; Keith Moore Cc: Atiquzzaman@andrew.cmu.edu, Mohammed; discuss@apps.ietf.org Subject: Re: No apologies if you receive this multiple times (was CFP: IEEE IC3N'2000) Carnegie Mellon's legacy e-mail system has been eliminating duplicates based on message-id alone (well, with recepient envelope address) for many years (circa 1985?), and our new system, the Cyrus IMAP server, also does it. We never get any user complaints except when it doesn't work. The denial-of-service attack is interesting, and text should probably be added to the relevant document that message-ids should be reasonably unpredictable if it's not there already. Larry From: Keith Moore <moore@cs.utk.edu> Date: Tue, 02 May 2000 12:33:52 -0400 [end2endinterest removed...not on topic for that discussion] in general you don't want to do duplicate suppression based on message-id alone, because sometimes the same message-id really is used for significantly different messages (sometimes due to software bugs, but if duplicate supression were widespread it would probably be a target of malice ... to keep someone from seeing a message, send them a different message using the same message-id). some lists significantly modify messages without modifying the message-id. (and you probably don't want them to modify the message-id - it's what lets you trace a message back to its source) you can use message-id to find potential duplicates, and then compare the messages themselves and use heuristics to determine whether they really are more-or-less the same. or a user agent could remove extraneous information (e.g. received headers) from every message it received, hash the result, and compare the hashes for duplicates. I don't know of any user agent that does either of these, and unless one gets lots of duplicate mail, it might not be worth the bother. mail delivery systems should probably not try to eliminate duplicates on behalf of their users. sometimes you actually want to know that you got a copy of the message that was sent through a list even if you already received a copy by other means. so the user agent would need to do the duplicate suppression if it is to be done at all. I don't think we need to find a technical solution to every social problem that exists with email. the problem exists in other fields as well - people sometimes get more than one copy of the same mail-order catalog, for instance - and we don't lose too much sleep over it. in general, the purpose of apologies are to avoid getting compliants from people who are naive enough to think that this is the sender's problem rather than the recipient's. Keith
Received on Wednesday, 3 May 2000 00:39:28 UTC