Re: first cut usability walk through from Serge Egelman on 2007-08-03 (public-wsc-wg@w3.org from August 2007)

From: Serge Egelman <egelman@cs.cmu.edu>
Date: Fri, 03 Aug 2007 15:38:42 -0400
To: W3 Work Group <public-wsc-wg@w3.org>
Message-ID: <46B38442.80800@cs.cmu.edu>

Let's look at this from a different angle: do you disagree with the
underlying assumptions, as written?  If so, how would you reword them?

More below.

Thomas Roessler wrote:
> On 2007-08-01 13:11:50 -0400, Serge Egelman wrote:
> 
>>> - Identity Signal, Page Security Score, and the EV part of the
>>>   proposals are pretty much focused on the same topic --
>>>   passive indicators, and when to show them.  However, we have
>>>   no language in the proposals so far that would usefully tell
>>>   us what these indicators would look like.
> 
>> We don't need to know what the specific indicators look like if
>> the underlying concepts are flawed.  This is what this study
>> examines.  If users ignore the most flashy passive indicators,
>> then using any type of passive indicator is a nonstarter.
> 
> I'd beg to differ.
> 
> There are at least two different concepts around:
> 
> - Teach people that a specific symbol (color, padlock, etc) has a
>   certain meaning.
> 
> - Teach people to look in a specific place for certain information,
>   e.g., jurisdiction of a company.
> 
> In the first case, we essentially teach people about new metaphors.
> This is good if we want them to act quickly, but it needs teaching
> -- see street signals.

Are you suggesting we examine whether it's possible to teach people?
I'm not entirely sure this is a realistic expectation, depending on
what's involved.  Hypothetically, if we determine that these indicators
are effective only after an hour of personal training for each
individual user, I think that would show these indicators cannot be
expected to work in the real world.

That's why indicators should be intuitive (read up on affordances).

> 
> In the second case, we teach them to find stuff that is expressed in
> their own language.  This is good if we want
> people to get information without much teaching, but might have
> different implications on their attention.

Again, read up on basic HCI principles.

> 
> I'm seeing these two concepts overlap in most of the proposals; yet,
> they're vastly different if you consider communication to humans.
> I'd therefore not be surprised if you'd find that, e.g., either
> all-text or all-symbol signals were ineffective, but could find some
> useful mix.
> 
> I actually wonder to what extent that kind of question has been
> studied for traffic signs (to give just one example), where the mix
> is actually different depending on countries and cultures.

Yes, there is plenty of literature on this in the warning sciences.

> 
>>>   Working on an editor's draft for what the rec track document
>>>   might look like, one question is what attributes about the
>>>   issuer and subject would actually be displayed in the
>>>   identity signal, and under what conditions.
> 
>> I'm not sure this matters for the purpose of testing.  If we're
>> just displaying identity information, we shouldn't see any
>> statistically significant results based on what type of
>> information is displayed. We're testing if users will notice
>> *any* information gleaned from the certificates and displayed
>> passively.
> 
> Are you testing whether they are *noticing* the information, or
> whether they are *acting* on that information in the desired way?
> These are different things.

Both.  Read up on the C-HIP model by Wogalter in the warning sciences.
Also read Wu's study (it's in the Shared Bookmarks).  If users do not
notice the indicators, then we cannot expect them to act on them.  So
far every type of passive indicator has been shown to fail at capturing
user attention.  For the sake of argument, this proposal assumes that
this is not the case and tests how they act.  However, in testing how
they act, we will also capture more data on whether they notice the
indicators.

> 
> (Think of the "the padlock is intriguing" kind of responses in some
> of the user studies.)

Which?

> 
>>> - The proposed experiment for EV doesn't actually check whether
>>>   people understand the indicator; it rather checks whether the
>>>   absence of these indicators can be used as a hook to social
>>>   engineer users into subverting the integrity of their
>>>   browser. That's a somewhat different question.
>> How is that a different question?  
> 
> The proposed experiment involved an "install a plug-in to see a
> green bar" style interaction, if I recall correctly; it therefore
> also tests malware installation paths, and creates a risk of false
> positives if, e.g., people got irritated by the plug-in part.

How is there a risk of false positives?  Can you be a bit more specific?
I'm not sure what you're saying.

> 
> Therefore, the proposal during the call to not throw malware into
> the mix for this particular experiment, but just flash up a
> JavaScript powered idiot-box style dialogue that says something like
> this...
> 
> 	To guarantee the security and safety of your online banking
> 	experience, Bank of Suburbia's web site is protected with an
> 	Extended Validation certificate from foo-bar, the most
> 	trusted name in eCommerce.
> 	
> 	To experience our Extended Validation Enhanced online
> 	banking services, please acknowledge this Security Advisory
> 	by clicking "OK".
> 
> 	Online banking with Bank of Suburbia is secure, and
> 	protected by foo-bar's Extended Validation Guarantee.
> 
> 	[ OK ]
> 
> (Replace foo-bar by some known name, maybe remove the cable news
> pun.  And yes, I do find my energy to find an effective
> social-engineering dialogue a bit scary. ;-)

Yeah, that's lovely.  We know what users do with dialog boxes,
especially when there's just an "OK" button.  This will *greatly*
confound the experiment.  It doesn't matter if it's malware,
picture-in-picture, or some other means.  In order to test this
scenario, we need to spoof the indicator, not confound the study by
introducing something completely different.

> 
> The much broader question is of course what kinds of success
> criteria for possible material we'll come up with as far as user
> studies and the like are concerned...

Success criteria?  Statistics.  If there's no significant difference
between the status quo and this, then the proposal is fatally flawed.
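
As a minimal sketch of what such a comparison could look like, here is a
two-proportion z-test in Python.  Everything specific in it is an
illustrative assumption and not taken from any study: the function name,
the participant counts, and the conventional 0.05 threshold.  A real
analysis would pick its test and threshold as part of the study design.

```python
import math


def two_proportion_z_test(fooled_a, n_a, fooled_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    fooled_a / n_a: participants fooled / total in condition A
                    (e.g. status quo).
    fooled_b / n_b: the same for condition B (e.g. the proposed indicator).
    Returns (z, p_value).
    """
    p_a = fooled_a / n_a
    p_b = fooled_b / n_b
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (fooled_a + fooled_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Everyone (or no one) was fooled in both groups: no difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, for illustration only.
z, p = two_proportion_z_test(30, 50, 10, 50)
significant = p < 0.05
```

With small groups, an exact test (such as Fisher's) may be a better fit
than this normal approximation.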


serge

> 
> Regards,

-- 
/*
Serge Egelman

PhD Candidate
Vice President for External Affairs, Graduate Student Assembly
Carnegie Mellon University

Legislative Concerns Chair
National Association of Graduate-Professional Students
*/
Received on Friday, 3 August 2007 19:38:57 UTC