RE: Some studies on the visibility of EV sites. from Hallam-Baker, Phillip on 2008-03-06 (public-wsc-wg@w3.org from March 2008)

From: Hallam-Baker, Phillip <pbaker@verisign.com>
Date: Thu, 6 Mar 2008 07:18:32 -0800
To: "Serge Egelman" <egelman@cs.cmu.edu>
Cc: <public-wsc-wg@w3.org>
Message-ID: <2788466ED3E31C418E9ACC5C3166155727CAAE@mou1wnexmb09.vcorp.ad.vrsn.com>

All studies are flawed.

The shopping cart studies are studies of real world users under field
conditions. They have very large sample sizes and they demonstrate that
the users are in fact noticing the green bar and modifying their
behavior in response.

Now that is not the same as demonstrating that the users will make the
right choice when faced with an attack. But I don't know how to measure
that accurately. I don't think anyone else does either.


The question is whether these studies are more likely to illustrate the
conditions that we care about than laboratory subjects with five test
users.

The scenario I care about is the one in which we have a user who
receives a phishing bait email in their inbox, is suspicious but follows
the link. In that scenario the user is primed to be security aware by
definition.


We do have a large number of users who are suspicious when they get
these emails. The problem is that there are two possible outcomes
(caught / not caught) and the attacker can deliberately confuse the
user.

I do not expect to get perfect results. A bank told me that 15% of
customers who are suspicious enough to call the bank then go ahead and
give their details AFTER being warned that it was a scam (!).

From a loss management point of view what we need to do here is to draw
out the state graph of the user with all the possible transitions and
assign costs and probabilities to each arc. The user getting phished is
not necessarily the highest cost outcome. Only a small percentage of
phished users result in a loss, whereas every customer who calls the
call center represents a cost.
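
The loss model described above can be sketched roughly as follows, in
Python. This is only an illustration under stated assumptions: the state
names, the arc probabilities and the dollar costs are all made-up
placeholders, with the single exception of the 15% figure quoted above.
The names GRAPH, check_probabilities and expected_cost are hypothetical,
and the graph is assumed to be acyclic.

```python
import math

# Each state maps to a list of (next_state, probability, arc_cost) arcs.
# States with no entry are terminal. Every number here is an illustrative
# placeholder, except the 0.15 on calls_bank -> phished, which is the
# figure the bank reported.
GRAPH = {
    "receives_bait": [
        ("ignores", 0.60, 0.0),
        ("follows_link", 0.30, 0.0),
        ("calls_bank", 0.10, 0.0),
    ],
    "follows_link": [
        ("phished", 0.20, 0.0),
        ("leaves_site", 0.80, 0.0),
    ],
    "calls_bank": [
        # The call-center cost (placeholder: 8.0) is paid on either arc.
        ("phished", 0.15, 8.0),
        ("safe", 0.85, 8.0),
    ],
    "phished": [
        # Only a small share of phished users results in a loss.
        ("loss", 0.05, 500.0),
        ("no_loss", 0.95, 0.0),
    ],
}


def check_probabilities(graph):
    """Raise ValueError if the arcs out of any state do not sum to 1."""
    for state, arcs in graph.items():
        total = sum(p for _, p, _ in arcs)
        if not math.isclose(total, 1.0):
            raise ValueError(f"arcs out of {state!r} sum to {total}")


def expected_cost(graph, state):
    """Expected total cost from `state` onward (graph must be acyclic)."""
    arcs = graph.get(state, [])
    return sum(p * (cost + expected_cost(graph, nxt)) for nxt, p, cost in arcs)


if __name__ == "__main__":
    check_probabilities(GRAPH)
    for s in ("receives_bait", "follows_link", "calls_bank", "phished"):
        print(s, expected_cost(GRAPH, s))
```

With these placeholder numbers the user who calls the bank carries a
higher expected cost than the user who simply follows the link, but that
is purely an artifact of the values chosen. The point of the exercise is
that real figures would have to be filled in on every arc before the
outcomes can be compared.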


I think that you draw flawed conclusions from the laboratory tests. 


Take Rachma's test on a certain bank-to-user authentication technique
with Doug Tygar (I would prefer we do not refer to it by name as I don't
want to comment on a competitor's product). The lab tests show that the
measure is ineffective under lab conditions, when the users have nothing
to lose and are primed to expect that they might be working with a
prototype that may not function correctly. We can also be pretty sure
that the banks who continue to pay to use the measure are seeing an
increase in consumer confidence.


Hypothesis 1: The user is confused about computer security

Support: Numerous. Users state that they are confused, attacks are
predicated on confusion, and the security measures presented are
confusing to security specialists.


Hypothesis 2: A user who has become confused as a result of an
inconsistent interface will not be any less confused after an hour or
even a day of use when presented with a consistent user interface.

Support: In the short term the internally consistent user interface
represents something that is different to their experience and thus
merely another data point that is inconsistent with the existing data
points. We should not expect the user to be any less confused.

Support: I am still learning how to use my MacBook after two weeks and
that is generally reckoned to be a more consistent system than average.


Hypothesis 3: Some percentage of confused users will become less
confused if presented with a consistent interface over a prolonged
period of time.

Support: None at this time. The reasoning behind the hypothesis is that
over time a user builds a mental model of the system. If the system is
inconsistent or too complex, so is the model. The user is unable to
predict the outcome of a particular interaction and this is reported as
'confusion'. If the user has experience of a consistent interface over a
prolonged period of time they are able to build a consistent mental
model which allows them to predict the outcome of interactions. The user
gains confidence.


Note that all our existing data points are essentially consistent with
these hypotheses. We would expect lab tests over short periods of time
to result in the user being more confused and more susceptible to
attack.

Such studies can still be valid. I think that Rachma's test is valid
because the system under test is essentially a bolt-on extra. It is only
applied at one Web site, so I don't expect that the user is going to be
very much better at generating a consistent mental model over time than
in the lab. There is also some pretty compelling data from phishing
attacks using some of the proposed vectors.

But even if valid, tests can still be damaging. Within a few hours of
the results being presented at a closed door meeting we started to see
live attacks using precisely the vectors described.


What I am looking for here is a way to obtain an independent test of
hypothesis 3. I suspect we are going to need a somewhat significant
quantity of money to do this and that is going to need to come from an
independent source. 


-----Original Message-----
From: Serge Egelman [mailto:egelman@cs.cmu.edu] 
Sent: Wednesday, March 05, 2008 11:54 AM
To: Hallam-Baker, Phillip
Cc: public-wsc-wg@w3.org
Subject: Re: Some studies on the visibility of EV sites.

This entire study is flawed: participants were primed for security.  Any
legitimate security study would not begin by telling participants it is
a security study, and telling them which browser features to look for!

"The simple description participants heard in this study was: 'The green
address bar in Internet Explorer 7 means that this website is an
Extended Validation website. Extended Validation, or EV, means that the
website owner has gone through extra, rigorous steps with an authorized
Certificate Authority to prove they are a secure site.'"

Of course you're going to get favorable results when you tell them ahead
of time what to look for!  In every study where participants have *not*
been told to look for SSL indicators, they *rarely* notice them.  With
the EV indicator specifically, I had 0 of 60 participants notice it.

This study has not shown that the EV indicators are effective.  It has
shown that when some schmuck calls people on the phone and tells them
what to look for, they generally follow those instructions.  Hooray,
VeriSign just paid to confirm the Hawthorne Effect.  It's too bad this
has been known for over 50 years.

serge

Hallam-Baker, Phillip wrote:
> OK, these are vendor studies, but they have much bigger sample sizes
> and are under field conditions. The Tec-Ed study is an independent
> study we commissioned. These are the only studies I am aware of that
> VeriSign has commissioned.
>
>
> I don't think that the small sample size is the real problem in lab
> tests. It's the lab itself. I have been using computers for 25 years,
> and I used a Mac every day at MIT. It has taken me over two weeks to
> get used to my MacBook Air and I am still finding things out now.
>
> Nielsen's usability tests seem to me to be exactly right if your
> objective is to design something in order to sell it. I was in the
> Apple store for a total of about 30 minutes. I did not intend to buy
> that particular model going in (I was going to buy a more expensive
> model but they didn't have it in stock - thankfully).
>
> But what matters for stopping Internet crime is the long term user
> interaction.
>
>
> http://www.verisign.com/static/040655.pdf
>
> January 2007, Tec-Ed researched usage and attitudes of 384 online
> shoppers
>
>   o Measured their responses to Web sites with and without green bars
>       # 100% of participants notice whether a site shows the green
>         EV bar
>       # 93% of participants prefer to shop on sites that show the
>         green bar
>       # 97% are likely to share their credit card information on
>         sites with the green EV bar, as opposed to only 63% with
>         non-EV sites
>       # 77% of participants report that they would hesitate to shop
>         at a site that previously showed the green EV bar and no
>         longer does so
>
> *DebtHelp: 11% increase in transactions*
> http://www.verisign.com/Resources/success-stories/SSL_and_VeriSign_Secured_Seal/debthelp.html
>
>
> *Overstock: 8.6% decrease in abandoned shopping cart rate*
> http://www.verisign.com/Resources/success-stories/SSL_and_VeriSign_Secured_Seal/overstock.html
>
>
> *Scribendi: 27% increase in transactions*
> http://www.verisign.com/Resources/success-stories/SSL_and_VeriSign_Secured_Seal/scribendi.html
> 
> 

--
/*
PhD Candidate
Carnegie Mellon University

"Whoever said there's no such thing as a free lunch was never a grad
student."

All views contained in this message, either expressed or implied, are
the views of my employer, and not my own.
*/
Received on Thursday, 6 March 2008 15:19:00 UTC