Re: Page Security Score proposal from Mary Ellen Zurko on 2007-06-19 (public-wsc-wg@w3.org from June 2007)

From: Mary Ellen Zurko <Mary_Ellen_Zurko@notesdev.ibm.com>
Date: Tue, 19 Jun 2007 12:28:46 -0400
To: rachna.public@gmail.com
Cc: public-wsc-wg@w3.org
Message-ID: <OF982B512F.DAF4F294-ON852572FF.005A7A32-852572FF.005A8410@LocalDomain>
Thanks. At a glance I didn't see anything about usability testing. Was 
there any?

          Mez





"Rachna Dhamija" <rachna.public@gmail.com> 
Sent by: public-wsc-wg-request@w3.org
06/18/2007 08:09 PM

To: michael.mccormick@wellsfargo.com
cc: public-wsc-wg@w3.org
Subject: Re: Page Security Score proposal

Related to this topic, we should be aware of SpoofGuard, an Internet
Explorer plug-in that was developed a few years ago by Boneh and
Mitchell's group at Stanford.

It analyzes web pages and collapses several heuristics into one
indicator (a green/yellow/red traffic light).  Users can set the
weights for each heuristic and a threshold at which a warning message
is displayed.  The heuristics include page visit history, image cache
history, the number of unencrypted password fields, whether the user
arrived at the page by clicking on a link in an email, and several
checks on the domain name and URLs in the page.
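
The aggregation described above can be sketched as a weighted average of
per-heuristic results mapped onto a traffic light.  This is a minimal
sketch under stated assumptions, not SpoofGuard's implementation: the
heuristic names, the normalization to [0, 1], the example weights, and
the two cutoffs (only a single warning threshold is mentioned above, so
the yellow band is an assumption) are all hypothetical.

```python
# Minimal sketch: combine weighted heuristics into one traffic-light
# indicator. Heuristic names, weights and cutoffs are hypothetical;
# they are not taken from SpoofGuard.

HEURISTICS = (
    "no_visit_history",            # page not in the user's history
    "no_image_cache_history",      # page images not previously cached
    "unencrypted_password_field",  # password field without SSL
    "arrived_from_email_link",     # user followed a link in an email
    "suspicious_domain_or_url",    # domain/URL checks failed
)


def spoof_score(results: dict[str, float], weights: dict[str, float]) -> float:
    """Return the weighted average of heuristic results, in [0, 1].

    Each result is in [0, 1], where 1 means "looks suspicious".
    Heuristics missing from `results` count as 0 (not suspicious).
    """
    total_weight = sum(weights[name] for name in HEURISTICS)
    if total_weight == 0:
        return 0.0
    weighted = sum(weights[name] * results.get(name, 0.0) for name in HEURISTICS)
    return weighted / total_weight


def traffic_light(score: float, warn_at: float = 0.3, alert_at: float = 0.6) -> str:
    """Map a score to a colour; red is where a warning would be shown."""
    if score >= alert_at:
        return "red"
    if score >= warn_at:
        return "yellow"
    return "green"


# A user who cares most about unencrypted password fields.
weights = {name: 1.0 for name in HEURISTICS}
weights["unencrypted_password_field"] = 3.0
results = {"no_visit_history": 1.0, "unencrypted_password_field": 1.0}
score = spoof_score(results, weights)
print(round(score, 3), traffic_light(score))  # prints: 0.571 yellow
```

Because the user supplies the weights, changing one weight can shift the
colour without any change in the page itself.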

http://crypto.stanford.edu/SpoofGuard/

I'll add it to the shared bookmarks.

Rachna


On Jun 18, 2007, at 3:43 PM, <michael.mccormick@wellsfargo.com> wrote:

Johnathan,

There is admittedly some arbitrariness to the weights I used in my
scoring formula, but I think if you play with it you'll start to see
the aggregate scores move up and down in a more or less reasonable
way, especially considering this was only a straw man formula designed
to enable discussion.  (Without an actual example formula, I was
concerned this proposal would be too abstract for people to fully
understand.)

Your point about brittleness is well taken.  I agree the scoring
formula will have to adapt occasionally to changing technologies as
new security indicators become available, etc.

I sympathize with your preference to keep all the SCIs separate rather
than aggregating them into a single gauge.  I'm not proposing that the
individual secondary SCIs be withheld from IT-security-savvy
users like you & me.  But I don't accept the premise that ordinary
users can make sense of them (much less sound risk decisions).

A detailed "hi fi" SCI is obviously superior for those who have the
training & expertise to use it.  But for everyone else a simple "lo
fi" SCI is better than none at all.

So the fundamental questions seem to be:
 - Are the many security & risk context indicators we've identified
(PageSecurityInfo) usable in raw form by typical web users?
 - If not, should agents attempt to distill them down to something
simple and intuitive -- even if it's low fidelity?

My own answers would be No and Yes, respectively.

Thanks for your thoughtful comments.  Mike

From: Johnathan Nightingale [mailto:johnath@mozilla.com]
Sent: Friday, June 15, 2007 10:40 AM
To: Mary Ellen Zurko
Cc: McCormick, Mike; public-wsc-wg@w3.org
Subject: Re: Page Security Score proposal

On 15-Jun-07, at 10:16 AM, Mary Ellen Zurko wrote:

I believe we're likely to achieve consensus that there should be some
primary SCI display (there are accessibility and device
size/characteristics to be accounted for orthogonally, as well as the
multicultural aspect raised by Bruno/ANEC; I assume those and do not
explicitly address them here). To the extent there is a primary SCI
display, it will have to have some sort of levels or gradations
(on/off, 3 levels as in "what is a secure page", 4 levels as this
proposal suggests, 99 levels/gradations as this proposal also
suggests). No one seems to be proposing something with no levels as a
primary SCI (that is currently relegated to secondary SCI in PageInfo,
and rightly so in my opinion). We discussed the issue of medium/high
risk situations that are pure display (no input) during one of the
lightning discussions I led, and there seemed to be consensus that
there would be pure display use cases of medium/high risk data, which
also points towards consensus around a primary SCI display. Now would
be the time for any participant to indicate that we did not have
consensus on the need for recommendations around a primary display of
SCI which reflects some level or gradation of security that is meant
to be usable for trust decisions.

So, as a meta point, it seems wrong to me to assume silence on the
wire vis-a-vis email discussion of a proposal constitutes consensus.
I don't think that's what you were doing here, Mez, because as you
mentioned, some of this has been discussed in a lightning discussion
(though I think not one I was around for - my bad) but I just wanted
to throw it out there.  I know that when running a workgroup like
this, and going through periods of frustrating silence, declaring
consensus can be a good way of stirring people to action, but I would
think that for Pass/Fail on individual recommendations, it might get
us into a situation where people withdraw because things made it
quietly into the recs that they don't believe in.  I think the
approach you began on the last call, where we dive into a specific
recommendation for more detail, and presumably where that culminates
in people discussing whether it should be in the recs Yes or No, is a
better way to go.  </meta>

This is me indicating that we do not have consensus on the need for
recommendations around a primary display of SCI which reflects some
level or gradation of security that is meant to be usable for trust
decisions.  :)

During lightning discussions, I obviously didn't want to throw up a
bunch of stop energy, but I think that trying to aggregate a
multi-dimensional space of indicators into a single
number/letter/colour is (with apologies to Yngve and Opera's
multi-level padlock) the wrong way to go.  Or at very least, I'm not
yet convinced it's the right way - I'm not shutting the door.

First of all, there's a fundamental arbitrariness to the numbers, as I
see it.  I know Mike meant his proposal to be a launching-off point,
so I don't want to start a whole debate about the math, but there's
something intrinsically confusing about the fact that "user has
visited this page in the past" is worth the same number of points as
"SSLv1" and that using a local HOSTS file + visiting this page in the
past is worth the same as a non-AES/3DES cipher suite.  What's odd
isn't the numbers, or even the equivalencies, it's that these
comparisons are just category-errors-all-over-the-place.  Even if it
were meaningful to compare choice-of-cipher-suite to history data
point for point, I don't think we can have any real confidence that
those ratios are fixed from user to user.

I also don't know how users are supposed to make decisions based on
this kind of thing.  Yes, the rec says that they should be able to see
the equation, but if the idea is to have an at-a-glance indicator, how
does it help them make better trust decisions?  Should they shop at a
65?  Should they only give their SSN to a 90?  The numbers are opaque
and in some cases totally misleading.  A site you've been to before
that uses proper SSL, and so is basically fine, will have different
numbers for different people depending on things like whether they've
bookmarked it.
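
A toy additive score shows why history-dependent inputs make the numbers
differ from user to user.  The indicator names, point values, and users
below are hypothetical and are not the weights from Mike's straw-man
formula.

```python
# Toy additive page score. Indicator names and point values are
# hypothetical, chosen only for illustration.
POINTS = {
    "valid_ssl": 40,
    "strong_cipher": 20,
    "visited_before": 10,  # depends on this user's history
    "bookmarked": 10,      # depends on this user's bookmarks
}


def page_score(indicators: set[str]) -> int:
    """Sum the points of every indicator that is present."""
    return sum(POINTS[name] for name in indicators)


# The same properly configured SSL site, seen by two hypothetical users.
site_facts = {"valid_ssl", "strong_cipher"}
alice = site_facts | {"visited_before", "bookmarked"}
bob = site_facts | {"visited_before"}

print(page_score(alice), page_score(bob))  # prints: 80 70
```

Neither number tells either user whether the site is safe for a given
transaction.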

Furthermore, this kind of scoring system is brittle in the face of a
changing security landscape.  Let's imagine we do this, and even that
browsers standardize on its use.  If a hole is found in DNSSEC next
year, or AES, we'll have to adjust the scores.  Maybe that's
containable.  What if some new technology is introduced?  A broadly
distributed social web-site-trustworthiness service, or for that
matter Google's anti-phishing/anti-malware lists.  Do we include them?
 If so, we create the possibility of scores greater than 100.  If we
tweak everything down to accommodate the new data instead, then we
implicitly downgrade the existing numbers, causing user confusion
about whether their bank has become less secure.
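
The two options for adding a new indicator can be worked through with
hypothetical numbers: suppose the existing indicators' points sum to 100,
a new indicator worth 10 points is added, and a bank scored 80 under the
old formula.  All three values are assumptions for illustration.

```python
# Hypothetical illustration of adding an indicator to a 0-100 score.
OLD_MAX = 100    # existing indicators sum to 100 points (assumed)
NEW_POINTS = 10  # points assigned to the new indicator (assumed)
bank_score = 80  # a bank's score under the old formula (assumed)


def extended_max(new_points: int) -> int:
    """Option 1: keep the old weights and add the new indicator on top."""
    return OLD_MAX + new_points


def rescaled(score: float, new_points: int) -> float:
    """Option 2: shrink the old weights so the maximum stays at OLD_MAX.

    Assumes the site does not (yet) earn the new indicator's points.
    """
    return score * OLD_MAX / (OLD_MAX + new_points)


print(extended_max(NEW_POINTS))                   # prints: 110
print(round(rescaled(bank_score, NEW_POINTS), 1))  # prints: 72.7
```

In the rescaled case the bank drops from 80 to about 72.7 although
nothing about the bank has changed.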

There are more objections I think I could offer, but this note is
already getting long, and I don't want to be a critic without anything
better to offer, so let me try to explain what I'd prefer.  We're
talking about security *context* here, about a set of cues to help
users situate themselves better online, and make better decisions as a
result.  Stores in the real world don't have scores tacked to the
outside very often, and even when they do (e.g. the health check
green/yellow/red on restaurants in many cities) they aren't your only
cue.  You make your decisions based on your own internal weighting of
the various indicators you have to work with (how full is the
restaurant, how clean does it look, have you been there before, etc).
My argument would be that the web equivalent of that is to NOT combine
multiple indicators, but rather to employ each individually as
appropriate.  Let people do the thing they do exceptionally well,
reasoning intuitively based on numerous inputs, and focus on getting
those inputs to be as meaningful and atomic as possible.  Tell people
which sites they're visiting with an identity indicator.  Use a
ritualized, chrome-supported login process.  Introduce robustness
countermeasures to prevent chrome spoofing.

Aggregating the various signals into a number/symbol/letter/colour
isn't creating context; it's lossy, and it misses the opportunity to put
that context out there for the user.  And if there are pieces of
context information that we argue can't be put in front of users (e.g.
algorithm selection for SSL) then my argument would be that they
aren't enabling better trust decisions anyhow.

Cheers,

J

---
Johnathan Nightingale
Human Shield
johnath@mozilla.com
Received on Tuesday, 19 June 2007 16:29:03 UTC