Re: Page Security Score proposal

AFAIK there was no usability testing of Spoofguard.  There are certainly
studies that explore how people interpret indicators that represent several
collapsed dimensions (e.g. in the information visualization literature).
If I dig up any that are relevant, I'll send them to the list.

Rachna

On 6/19/07, Mary Ellen Zurko <Mary_Ellen_Zurko@notesdev.ibm.com> wrote:
>
>
> Thanks. At a glance I didn't see anything about usability testing. Was
> there any?
>
>           Mez
>
>
>
>
> *"Rachna Dhamija" <rachna.public@gmail.com>*
> Sent by: public-wsc-wg-request@w3.org
>
> 06/18/2007 08:09 PM
> To: michael.mccormick@wellsfargo.com
> cc: public-wsc-wg@w3.org
> Subject: Re: Page Security Score proposal
>
>
>
>
>
>
>
> Related to this topic, we should be aware of Spoofguard, an IE plugin
> that was developed a few years ago by Boneh and Mitchell's group at
> Stanford.
>
> It analyzes web pages and collapses several heuristics into one
> indicator (a green/yellow/red traffic light).  Users can set the
> weights for each heuristic and some threshold where a warning message
> is displayed.  The heuristics include page visit history, image cache
> history, the number of unencrypted password fields, whether the user
> arrived at the page by clicking on an email link, and some checks on
> the domain name and URLs in the page.
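
To make the "collapse several heuristics into one indicator" idea concrete, here is a rough sketch of a weighted sum mapped onto a traffic light. It is not Spoofguard's actual code: the heuristic names, the weights, and the two thresholds are all assumptions picked for illustration.

```python
from typing import Dict


def traffic_light(signals: Dict[str, float], weights: Dict[str, float],
                  yellow_at: float = 0.3, red_at: float = 0.6) -> str:
    """Collapse per-heuristic suspicion values (0.0 to 1.0) into one colour.

    All names and thresholds here are hypothetical, not taken from Spoofguard.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one heuristic needs a positive weight")
    # Weighted average of the suspicion values; a missing signal counts as 0.
    score = sum(weights[name] * signals.get(name, 0.0) for name in weights) / total_weight
    if score >= red_at:
        return "red"
    if score >= yellow_at:
        return "yellow"
    return "green"


# Hypothetical user-chosen weights and one page's heuristic results.
weights = {
    "no_visit_history": 1.0,
    "unencrypted_password_field": 3.0,
    "arrived_from_email_link": 2.0,
}
page = {
    "no_visit_history": 1.0,
    "unencrypted_password_field": 1.0,
    "arrived_from_email_link": 0.0,
}
print(traffic_light(page, weights))
```

Changing the weights changes the colour for the same page, which is exactly the knob the plugin is described as exposing to users.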
>
> http://crypto.stanford.edu/SpoofGuard/
>
> I'll add it to the shared bookmarks.
>
> Rachna
>
>
> On Jun 18, 2007, at 3:43 PM, <michael.mccormick@wellsfargo.com> wrote:
>
> Johnathan,
>
> There is admittedly some arbitrariness to the weights I used in my
> scoring formula, but I think if you play with it you'll start to see
> the aggregate scores move up and down in a more or less reasonable
> way, especially considering this was only a straw man formula designed
> to enable discussion.  (Without an actual example formula I was
> concerned this proposal would be too abstract for people to fully
> understand.)
>
> Your point about brittleness is well taken.  I agree the scoring
> formula will have to adapt occasionally to changing technologies as
> new security indicators become available, etc.
>
> I sympathize with your preference to keep all the SCIs separate rather
> than aggregating them to a single gauge.  I'm not proposing the
> individual secondary SCIs shouldn't be available to IT security savvy
> users like you & me.  But I don't accept the premise that ordinary
> users can make sense of them (much less sound risk decisions).
>
> A detailed "hi fi" SCI is obviously superior for those who have the
> training & expertise to use it.  But for everyone else a simple "lo
> fi" SCI is better than none at all.
>
> So the fundamental questions seem to be:
> - Are the many security & risk context indicators we've identified
> (PageSecurityInfo) usable in raw form by typical web users?
> - If not, should agents attempt to distill them down to something
> simple and intuitive -- even if it's low fidelity?
>
> My own answers would be No and Yes, respectively.
>
> Thanks for your thoughtful comments.  Mike
>
> From: Johnathan Nightingale [mailto:johnath@mozilla.com]
> Sent: Friday, June 15, 2007 10:40 AM
> To: Mary Ellen Zurko
> Cc: McCormick, Mike; public-wsc-wg@w3.org
> Subject: Re: Page Security Score proposal
>
> On 15-Jun-07, at 10:16 AM, Mary Ellen Zurko wrote:
>
> I believe we're likely to achieve consensus that there should be some
> primary SCI display (there are accessibility and device
> size/characteristics to be accounted for orthogonally, as well as the
> multicultural aspect raised by Bruno/ANEC; I assume those and do not
> explicitly address them here). To the extent there is a primary SCI
> display, it will have to have some sort of levels or gradations
> (on/off, 3 levels as in "what is a secure page", 4 levels as this
> proposal suggests, 99 levels/gradations as this proposal also
> suggests). No one seems to be proposing something with no levels as a
> primary SCI (that is currently relegated to secondary SCI in PageInfo,
> and rightly so in my opinion). We discussed the issue of medium/high
> risk situations that are pure display (no input) during one of the
> lightning discussions I led, and there seemed to be consensus that
> there would be pure display use cases of medium/high risk data, which
> also points towards consensus around a primary SCI display. Now would
> be the time for any participant to indicate that we did not have
> consensus on the need for recommendations around a primary display of
> SCI which reflects some level or gradation of security that is meant
> to be usable for trust decisions.
>
> So, as a meta point, it seems wrong to me to assume silence on the
> wire vis-a-vis email discussion of a proposal constitutes consensus.
> I don't think that's what you were doing here Mez, because as you
> mentioned, some of this has been discussed in lightning discussion
> (though I think not one I was around for - my bad) but I just wanted
> to throw it out there.  I know that when running a workgroup like
> this, and going through periods of frustrating silence, declaring
> consensus can be a good way of stirring people to action, but I would
> think that for Pass/Fail on individual recommendations, it might get
> us into a situation where people withdraw because things made it
> quietly into the recs that they don't believe in.  I think the
> approach you began on the last call, where we dive into a specific
> recommendation for more detail, and presumably where that culminates
> in people discussing whether it should be in the recs Yes or No, is a
> better way to go.  </meta>
>
> This is me indicating that we do not have consensus on the need for
> recommendations around a primary display of SCI which reflects some
> level or gradation of security that is meant to be usable for trust
> decisions.  :)
>
> During lightning discussions, I obviously didn't want to throw up a
> bunch of stop energy, but I think that trying to aggregate a
> multi-dimensional space of indicators into a single
> number/letter/colour is (with apologies to Yngve and Opera's
> multi-level padlock) the wrong way to go.  Or at the very least, I'm not
> yet convinced it's the right way - I'm not shutting the door.
>
> First of all, there's a fundamental arbitrariness to the numbers, as I
> see it.  I know Mike meant his proposal to be a launching off point,
> so I don't want to start a whole debate about the math, but there's
> something intrinsically confusing about the fact that "user has
> visited this page in the past" is worth the same number of points as
> "SSLv1" and that using a local HOSTS file + visiting this page in the
> past is worth the same as a non-AES/3DES cipher suite.  What's odd
> isn't the numbers, or even the equivalencies, it's that these
> comparisons are just category-errors-all-over-the-place.  Even if it
> were meaningful to compare choice-of-cipher-suite to history data
> point for point, I don't think we can have any real confidence that
> those ratios are fixed from user to user.
>
> I also don't know how users are supposed to make decisions based on
> this kind of thing.  Yes, the rec says that they should be able to see
> the equation, but if the idea is to have an at a glance indicator, how
> does it help them make better trust decisions?  Should they shop at a
> 65?  Should they only give their SSN to a 90?  The numbers are opaque
> and in some cases totally misleading.  A site you've been to before,
> using proper SSL, is basically fine, yet it will have different
> numbers for different people depending on things like whether they've
> bookmarked it.
>
> Furthermore, this kind of scoring system is brittle in the face of a
> changing security landscape.  Let's imagine we do this, and even that
> browsers standardize on its use.  If a hole is found in DNSSEC next
> year, or AES, we'll have to adjust the scores.  Maybe that's
> containable.  What if some new technology is introduced?  A broadly
> distributed social web-site-trustworthiness service, or for that
> matter Google's anti-phishing/anti-malware lists.  Do we include them?
> If so, we create the possibility of scores greater than 100.  If we
> tweak everything down to accommodate the new data instead, then we
> implicitly downgrade the existing numbers, causing user confusion
> about whether their bank has become less secure.
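
The rescaling problem can be shown with a little arithmetic. This is only a sketch: the indicator names and point values below are invented and are not the weights from Mike's straw-man formula.

```python
def rescale(max_points, new_name, new_max, cap=100):
    """Shrink existing maxima so old and new indicators still sum to `cap`.

    Hypothetical helper: every name and point value is an assumption.
    """
    old_total = sum(max_points.values())
    scaled = {name: value * (cap - new_max) / old_total
              for name, value in max_points.items()}
    scaled[new_name] = new_max
    return scaled


# Invented maxima that sum to 100.
max_points = {"tls": 50, "history": 30, "dnssec": 20}

# A bank that earns full marks for TLS and history but has no DNSSEC.
earned = ["tls", "history"]
before = sum(max_points[name] for name in earned)

# Option 1: bolt a new 20-point indicator on top, so the ceiling is now 120.
ceiling_if_added = sum(max_points.values()) + 20

# Option 2: rescale so the ceiling stays at 100.
after_points = rescale(max_points, "reputation_list", 20)
after = sum(after_points[name] for name in earned)

print(before, ceiling_if_added, after)
```

With these made-up numbers the bank's score falls from 80 to 64 under the second option even though nothing about the bank changed, which is the confusion described above.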
>
> There are more objections I think I could offer, but this note is
> already getting long, and I don't want to be a critic without anything
> better to offer, so let me try to explain what I'd prefer.  We're
> talking about security *context* here, about a set of cues to help
> users situate themselves better online, and make better decisions as a
> result.  Stores in the real world don't have scores tacked to the
> outside very often, and even when they do (e.g. the health check
> green/yellow/red on restaurants in many cities) they aren't your only
> cue.  You make your decisions based on your own internal weighting of
> the various indicators you have to work with (how full is the
> restaurant, how clean does it look, have you been there before, etc).
> My argument would be that the web equivalent of that is to NOT combine
> multiple indicators, but rather to employ each individually as
> appropriate.  Let people do the thing they do exceptionally well,
> reasoning intuitively based on numerous inputs, and focus on getting
> those inputs to be as meaningful and atomic as possible.  Tell people
> which sites they're visiting with an identity indicator.  Use a
> ritualized, chrome-supported login process.  Introduce robustness
> countermeasures to prevent chrome spoofing.
>
> Aggregating the various signals into a number/symbol/letter/colour
> isn't creating context; it's lossy, and it misses the opportunity to put
> that context out there for the user.  And if there are pieces of
> context information that we argue can't be put in front of users (e.g.
> algorithm selection for SSL) then my argument would be that they
> aren't enabling better trust decisions anyhow.
>
> Cheers,
>
> J
>
> ---
> Johnathan Nightingale
> Human Shield
> johnath@mozilla.com
>
>
>
>

Received on Wednesday, 20 June 2007 05:42:58 UTC