RE: spoofing and IRIs from John C Klensin on 2010-03-02 (public-iri@w3.org from March 2010)

From: John C Klensin <john-ietf@jck.com>
Date: Tue, 02 Mar 2010 16:18:32 -0500
To: Larry Masinter <LMM@acm.org>
cc: public-iri@w3.org
Message-ID: <141B8C6BCF6F1A6D11805929@PST.JCK.COM>

--On Saturday, February 27, 2010 22:25 -0800 Larry Masinter
<LMM@acm.org> wrote:

> Going through the Security Considerations of
> draft-ietf-idnabis-defs-13 vs. the
> "Security Considerations" of the current IRI document
> 
> here's looking at
> http://tools.ietf.org/html/draft-ietf-idnabis-defs 
> section 4:
>...

Larry,

Suggestions, fwiw (mostly drawing comments from other notes
together):

(1) Reference that doc.  As others have pointed out, it
addresses UTR 36, but contains some material that may be more
directly relevant to IRIs generally and their domain name
components in particular.

(2) Point out that neither of those documents (...idnabis-defs
nor UTR36) really addresses "sound alike" (especially to someone
not familiar with the relevant language) issues rather than
"look alike" or "might be expected to be treated alike" ones.
In conjunction with this, note that the problem is not just with
the false positive comparisons that characterize the spoofing
problem but with perceptual false negatives:  people who are
under the delusion that IRIs (or URIs or domain names) are to be
interpreted by humans and who are not computer experts often
expect orthographic variations to compare equal.  Differences in
US and UK spelling, Simplified and Traditional Chinese and maybe
pinyin, conventions about representation of extended Latin
strings in basic Latin characters, and writing of Japanese in
either Kana or Kanji all fall into that category for at least
some populations.
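
To make the false-negative point concrete, here is a minimal sketch
in Python.  It assumes a comparison that applies only NFKC
normalization and case folding; that choice, the helper name
naive_equal, and the sample strings are all illustrative assumptions
and not taken from either document.  The point is only that
mechanical normalization of this kind does not make orthographic
variants compare equal, whatever a human reader might expect.

```python
import unicodedata


def naive_equal(a: str, b: str) -> bool:
    """Compare two strings after NFKC normalization and case folding only.

    This is an assumed, deliberately simple comparison for illustration;
    it is not the comparison defined by any IRI or IDNA specification.
    """
    def norm(s: str) -> str:
        return unicodedata.normalize("NFKC", s).casefold()

    return norm(a) == norm(b)


# Pairs that some human readers would expect to "match".
variant_pairs = [
    ("colour", "color"),    # UK vs. US spelling
    ("müller", "mueller"),  # extended Latin vs. a basic Latin transcription
    ("にほん", "日本"),      # Japanese written in Kana vs. Kanji
    ("汉字", "漢字"),        # Simplified vs. Traditional Chinese
]
variant_results = [naive_equal(a, b) for a, b in variant_pairs]

# Control: fullwidth capitals do fold to ASCII lowercase under NFKC + casefold.
control_result = naive_equal("ＥＸＡＭＰＬＥ", "example")
```

Every entry in variant_results is False, while control_result is
True: normalization handles compatibility forms but none of the
orthographic cases listed above.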

(3) Note that these are problems _both_ for humans and human
perception and for user agents that try to guess which strings
and other issues humans might have problems with, so that the
users can be warned.  You've noted the example of trying to
distinguish between familiar and unfamiliar scripts.  Others
have noted that mixed-script situations and the use of some
specific characters can be problematic.  For example, as a
problem very specific to IRIs, there are many characters in
Unicode that could plausibly be confused with forward slashes
and other reserved punctuation.
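
As a rough illustration of the slash problem, the following Python
sketch flags a few characters that can resemble "/" (U+002F).  The
set of code points is a small hand-picked sample and certainly not
exhaustive, and the function name find_slash_lookalikes and the
example host names are hypothetical.  A real user agent would need a
far more complete confusables list, and even then could not be
comprehensive.

```python
import unicodedata

# Illustrative, non-exhaustive sample of characters that can resemble "/".
SLASH_LOOKALIKES = {
    "\u2044",  # FRACTION SLASH
    "\u2215",  # DIVISION SLASH
    "\uFF0F",  # FULLWIDTH SOLIDUS
    "\u29F8",  # BIG SOLIDUS
}


def find_slash_lookalikes(iri: str) -> list[tuple[int, str, str]]:
    """Return (index, character, Unicode name) for each look-alike found."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(iri)
        if ch in SLASH_LOOKALIKES
    ]


# The character after "example.com" is U+2215, not "/", so everything up
# to the final real slash would be treated as part of the host name.
suspicious = "http://example.com\u2215login.example.net/"
hits = find_slash_lookalikes(suspicious)

# An ordinary IRI with only real slashes produces no hits.
clean_hits = find_slash_lookalikes("http://example.com/path")
```

Here hits contains a single entry for the DIVISION SLASH at index 18,
and clean_hits is empty.  Whether a user agent should warn, refuse,
or do nothing when it finds such a character is exactly the design
question raised in the next point.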

(4) Of course, we also have the human interface design question
of whether or not one should try to do anything (and possibly
create false expectations and an unreasonable sense of
confidence in being protected) when it is clear that a
comprehensive solution is impossible.  If one inspects browsers
and other IRI-using programs, the consensus seems to be "yes, do
what one can".  That is not the only plausible conclusion and
there is certainly no consensus as to what one should actually
do.   I think it would be wise for the document to say that.

best,
   john

Received on Tuesday, 2 March 2010 21:19:03 UTC