Re: Page Security Score proposal from Johnathan Nightingale on 2007-06-15 (public-wsc-wg@w3.org from June 2007)

From: Johnathan Nightingale <johnath@mozilla.com>
Date: Fri, 15 Jun 2007 11:39:31 -0400
To: "Mary Ellen Zurko" <Mary_Ellen_Zurko@notesdev.ibm.com>
Cc: <michael.mccormick@wellsfargo.com>, public-wsc-wg@w3.org
Message-Id: <6AB2B801-33B1-4F94-8DA4-3D5A5ABDAFCD@mozilla.com>
On 15-Jun-07, at 10:16 AM, Mary Ellen Zurko wrote:

> I believe we're likely to achieve consensus that there should be
> some primary SCI display (there are accessibility and device
> size/characteristics to be accounted for orthogonally, as well as the
> multicultural aspect raised by Bruno/ANEC; I assume those and do
> not explicitly address them here). To the extent there is a primary
> SCI display, it will have to have some sort of levels or gradations
> (on/off, 3 levels as in "what is a secure page", 4 levels as this
> proposal suggests, 99 levels/gradations as this proposal also
> suggests). No one seems to be proposing something with no levels as
> a primary SCI (that is currently relegated to secondary SCI in
> PageInfo, and rightly so in my opinion). We discussed the issue of
> medium/high risk situations that are pure display (no input) during
> one of the lightning discussions I led, and there seemed to be
> consensus that there would be pure display use cases of medium/high
> risk data, which also points towards consensus around a primary SCI
> display. Now would be the time for any participant to indicate that
> we did not have consensus on the need for recommendations around a
> primary display of SCI which reflects some level or gradation of
> security that is meant to be usable for trust decisions.

So, as a meta point, it seems wrong to me to assume that silence on
the wire, in the email discussion of a proposal, constitutes consensus.
I don't think that's what you were doing here, Mez, because as you
mentioned, some of this has been discussed in lightning discussions
(though not, I think, one I was around for; my bad), but I just wanted
to throw it out there. I know that when running a workgroup like
this and going through periods of frustrating silence, declaring
consensus can be a good way of stirring people to action. For
pass/fail decisions on individual recommendations, though, I think it
could lead to people withdrawing because things they don't believe in
made it quietly into the recs. I think the approach you began on the
last call, where we dive into a specific recommendation in more
detail and, presumably, culminate in people discussing whether it
should be in the recs (yes or no), is a better way to go. </meta>

This is me indicating that we do not have consensus on the need for
recommendations around a primary display of SCI which reflects some
level or gradation of security that is meant to be usable for trust
decisions. :)

During the lightning discussions I obviously didn't want to throw up a
bunch of stop energy, but I think that trying to aggregate a
multidimensional space of indicators into a single number/letter/colour
is (with apologies to Yngve and Opera's multi-level padlock) the wrong
way to go. Or at the very least, I'm not yet convinced it's the right
way; I'm not shutting the door.

First of all, as I see it, there's a fundamental arbitrariness to the
numbers. I know Mike meant his proposal to be a jumping-off point, so
I don't want to start a whole debate about the math, but there's
something intrinsically confusing about the fact that "user has
visited this page in the past" is worth the same number of points as
"SSLv1", and that using a local HOSTS file plus having visited this
page in the past is worth the same as a non-AES/3DES cipher suite.
What's odd isn't the numbers, or even the equivalences; it's that these
comparisons are category errors all over the place. Even if it were
meaningful to compare choice of cipher suite to history data point for
point, I don't think we can have any real confidence that those ratios
stay constant from user to user.

I also don't know how users are supposed to make decisions based on
this kind of thing. Yes, the rec says that they should be able to
see the equation, but if the idea is to have an at-a-glance
indicator, how does it help them make better trust decisions? Should
they shop at a 65? Should they only give their SSN to a 90? The
numbers are opaque and in some cases totally misleading. A site
you've been to before that uses proper SSL is basically fine, yet it
will get different scores for different people depending on things
like whether they've bookmarked it.

Furthermore, this kind of scoring system is brittle in the face of a
changing security landscape. Let's imagine we do this, and even that
browsers standardize on its use. If a hole is found in DNSSEC or AES
next year, we'll have to adjust the scores. Maybe that's
containable. But what if some new technology is introduced, such as a
broadly distributed social website-trustworthiness service, or for
that matter Google's anti-phishing/anti-malware lists? Do we include
them? If so, we create the possibility of scores greater than 100.
If we instead scale everything down to accommodate the new data, then
we implicitly downgrade the existing numbers, leaving users confused
about whether their bank has become less secure.

There are more objections I think I could offer, but this note is
already getting long, and I don't want to be a critic without
anything better to offer, so let me try to explain what I'd prefer.
We're talking about security *context* here: a set of cues to help
users situate themselves better online and make better decisions as
a result. Stores in the real world rarely have scores posted
outside, and even when they do (e.g. the green/yellow/red
health-inspection ratings on restaurants in many cities) they
aren't your only cue. You make your decisions based on your own
internal weighting of the various indicators you have to work with
(how full is the restaurant, how clean does it look, have you been
there before, etc.). My argument would be that the web equivalent of
that is NOT to combine multiple indicators, but rather to employ each
individually as appropriate. Let people do the thing they do
exceptionally well, reasoning intuitively based on numerous inputs,
and focus on making those inputs as meaningful and atomic as
possible. Use an identity indicator to tell people which site they're
visiting. Use a ritualized, chrome-supported login process.
Introduce robustness countermeasures to prevent chrome spoofing.

Aggregating the various signals into a number/symbol/letter/colour
doesn't create context. It's lossy, and it misses the opportunity to
put that context out there for the user. And if there are pieces of
context information that we argue can't be put in front of users
(e.g. algorithm selection for SSL), then my argument would be that
they aren't enabling better trust decisions anyhow.

Cheers,

J

---
Johnathan Nightingale
Human Shield
johnath@mozilla.com
Received on Friday, 15 June 2007 15:39:48 UTC