
RE: ACTION-335 logotypes and ISSUE-96 discussion

From: Hallam-Baker, Phillip <pbaker@verisign.com>
Date: Wed, 14 Nov 2007 06:51:42 -0800
Message-ID: <2788466ED3E31C418E9ACC5C316615570E5460@mou1wnexmb09.vcorp.ad.vrsn.com>
To: "Ian Fette" <ifette@google.com>, "Serge Egelman" <egelman@cs.cmu.edu>
Cc: "Dan Schutzer" <dan.schutzer@fstc.org>, "W3C WSC Public" <public-wsc-wg@w3.org>

Precisely. The only tests I really care about are what the user's behavior would be in response to a phishing email attack, or when they are visiting a site with which they want to establish a trust relationship.

I simply do not believe that any of the scenarios presented to date for lab tests are realistic and I do not believe that any lab test can be realistic.  

Did the Harvard study demonstrate that SiteKey is completely ineffective, or did it demonstrate that users are tolerant of apparent software failures in a lab scenario where they understand that there is not the slightest possibility of Harvard allowing them to personally lose money?


I know that nobody looks at the padlock icon; I didn't need a study to tell me that. But Overstock certainly would not be seeing an 8.6% decrease in cart abandonment rates if nobody was noticing the indicator.

Equally, I know that merchants are very keen to put SiteSeal on their web sites. They would not do that if they did not see any change in response rates.

Perhaps what we should be measuring here is whether customers take note of the existing indicators that are not displayed in secure chrome. If consumers do notice SiteSeal and its clones, then I would argue that we can find a way to train users to notice SiteSeal and letterhead in secure chrome.

> -----Original Message-----
> From: Ian Fette [mailto:ifette@google.com] 
> Sent: Tuesday, November 13, 2007 7:22 PM
> To: Serge Egelman
> Cc: Dan Schutzer; Hallam-Baker, Phillip; W3C WSC Public
> Subject: Re: ACTION-335 logotypes and ISSUE-96 discussion
> 
> Ok, I think I see where you're coming from. I agree that if 
> something doesn't work under the best of circumstances then 
> we have a problem, however I haven't convinced myself that we 
> will be testing under the best of circumstances. The best of 
> circumstances to me implies that the user is already familiar 
> with their environment, and a short test in which many things 
> are new to them doesn't necessarily meet that criterion. I do 
> generally believe that the effect of something new (and 
> distracting, as opposed to subtle/unobtrusive) would probably 
> be most pronounced in the short test, but I'm not sure how that 
> applies in the general case though, much less the case of the 
> EV indicators.
> 
> Basically agree with the rest of your post, and you're right, 
> we do need a concrete proposal before we can start really 
> getting into this, I just wanted to raise the issue about 
> adoption rates (and also length of study) as one that we 
> should consider, and I think I have done so.
> 
> On Nov 13, 2007 3:39 PM, Serge Egelman <egelman@cs.cmu.edu> wrote:
> >
> >
> > Ian Fette wrote:
> > > Disagree.
> > >
> > > While I don't agree with all of Phil's points, there was 
> one that I 
> > > definitely agree with that Serge seems to have glossed over. That 
> > > would be the point about whether you're testing the user over a 
> > > half-hour in a lab, or a longer (30+ day) field-study in 
> their natural environment.
> > > Phil's point was that anything new and disruptive is 
> likely to show 
> > > a strong effect in the short-term, but over the long-term 
> the effect 
> > > may be drastically different (including causing people to 
> stop using 
> > > the product). This is a very good point, and I think that if 
> > > possible we should aim to do a longer field-study as 
> opposed to a 30m in-lab study.
> > >
> >
> > This is something that can be examined later.  There are 
> pros and cons 
> > for both types of study designs.  But then again, if your only 
> > argument is that the effect is stronger in the lab study, 
> I'm not sure 
> > how that's a problem.  If the users can be easily fooled in 
> the best 
> > case scenario (or they simply don't notice/trust the 
> indicators), then 
> > it's likely this effect will only be stronger in the wild.  
> Of course 
> > playing armchair quarterback here isn't going to do much good.  We 
> > need an actual study design before we can critique it.  To 
> do that we 
> > need to first figure out which questions we want the study 
> to answer.
> >
> > > As for "testing them in a perfect world" - I have no idea 
> why this 
> > > is a good experiment to run, because we know that we will 
> never be 
> > > operating in a perfect world. I'm not saying we should test in a 
> > > world with zero adoption, but rather I'm saying that we 
> should try 
> > > to figure out (guess) what /reasonable/ adoption is, and test in 
> > > that world. We already know that there are some sites 
> that are not 
> > > adopting EV because of the cost model. I'm sure someone is more 
> > > knowledgeable about the specifics than I, but my understanding is 
> > > that, for instance, Google could not buy one EV certificate for 
> > > google.com and use it across all of our numerous 
> > > servers, rather we would have to pay some increased
> > > (large) fee based on number of servers. (Also, does EV support 
> > > wildcard certs?). Given that, you can come up with a list of 
> > > companies for which EV would be very expensive and likely not 
> > > adopted (eBay?), and test with the assumption that those 
> sites won't 
> > > adopt. What does that do to the overall model?
> >
> > This is kind of a no-brainer.  The system will not work 
> when there's 
> > very low adoption.  Users will go to a website, see no EV 
> certificate 
> > indicator, and just assume that the site never had one.  That's 
> > because this is the norm.  Thus, if we assume everyone 
> adopts EV, we 
> > can test the rate of success in the best case.  If the 
> system is not 
> > successful in the best case scenario, then we know it won't be 
> > successful in conditions worse than that.  Likewise, if we test the 
> > system under "reasonable" circumstances (forgetting for a 
> second that 
> > "reasonable" is completely subjective and we're unlikely to 
> agree on 
> > that definition) and it's a failure or success, the validity of the 
> > study comes into question because someone will invariably 
> ask, "well, 
> > what happens when everyone adopts EV?"
> >
> > If we're going to agree that a minority of sites will adopt EV, I'm 
> > not convinced we should be making recommendations about it 
> to begin with.
> > Users will be required to remember which websites are 
> EV-enabled and 
> > which ones aren't, and that's a completely ridiculous assumption to 
> > make.  Even if (again, best case scenario) they can make this 
> > distinction, no doubt they'll still interact with websites that are 
> > not EV-enabled.  These websites will remain targets for attack.
> >
> > >
> > > Finally, I'm extremely concerned about the attitude of "Well, it 
> > > works in lab studies, so let's mandate it, vendors be damned." I 
> > > understand the desire not to be seen as being beholden to the 
> > > desires of browser manufacturers, but on the other hand, I have a 
> > > very real desire not to be seen as floating around in la-la land, 
> > > disconnected from reality. If something is going to cause 
> people not 
> > > to adopt a product, a vendor is not going to implement it, 
> > > regardless of any mandates from W3C. There is a very real risk of 
> > > steering ourselves towards irrelevancy. Without getting into too 
> > > many politics, that's why WHATWG was formed, and provides 
> a good bit 
> > > of background for the current HTML5 /realpolitik/. I 
> don't want to see us go the way of XForms 2.
> >
> > Who said if it works in lab studies we'll automatically 
> recommend it?  
> > I see the lab studies as a way for weeding things out.  If they 
> > perform well in lab studies, obviously we'll need to 
> conduct further 
> > tests.  Of course, this doesn't work if only three of us 
> are doing the 
> > lab studies on our own time...
> >
> > serge
> >
> > >
> > > My $0.02 x 3 (== 0.03)
> > >
> > > On Nov 13, 2007 8:51 AM, Dan Schutzer < dan.schutzer@fstc.org
> >
> > > <mailto:dan.schutzer@fstc.org>> wrote:
> > >
> > >     agreed
> > >
> > >     -----Original Message-----
> > >     From: public-wsc-wg-request@w3.org
> > >     <mailto:public-wsc-wg-request@w3.org>
> > >     [mailto:public-wsc-wg-request@w3.org
> > >     <mailto:public-wsc-wg-request@w3.org>] On
> > >     Behalf Of Serge Egelman
> > >     Sent: Tuesday, November 13, 2007 11:23 AM
> > >     To: Hallam-Baker, Phillip
> > >     Cc: Ian Fette; W3C WSC Public
> > >     Subject: Re: ACTION-335 logotypes and ISSUE-96 discussion
> > >
> > >
> > >     This is irrelevant for our purposes.  If we test them 
> and find that in a
> > >     perfect world they don't work, then this is moot.  If 
> we test them and
> > >     find that they're effective, then we make a 
> recommendation, and it's out
> > >     of our hands.  At that point the application vendors aren't in
> > >     compliance.
> > >
> > >     serge
> > >
> > >     Hallam-Baker, Phillip wrote:
> > >      > I have never had the slightest difficulty selling 
> the idea of
> > >     logotypes
> > >      > to customers. The problem is purely on the 
> application side. The
> > >     logos
> > >      > have no value unless they are displayed.
> > >      >
> > >      > So we risk a chicken and egg situation where the 
> application side
> > >     people
> > >      > refuse to do anything about implementation until 
> they are assured
> > >     that
> > >      > there will be 100% adoption by the site owners 
> which is not going to
> > >      > happen until there are applications to present the logos.
> > >      >
> > >      > Someone has to make the first move; we cannot gate 
> the scope of
> > >     what we
> > >      > will consider by requiring an assurance of total 
> adoption by any
> > >     market
> > >      > participant.
> > >      >
> > >      >
> > > ------------------------------------------------------------------------
> > >      > *From:* public-wsc-wg-request@w3.org
> > >     <mailto:public-wsc-wg-request@w3.org> on behalf of Ian Fette
> >
> > >      > *Sent:* Fri 09/11/2007 4:49 PM
> > >      > *To:* W3C WSC Public
> > >      > *Subject:* ACTION-335 logotypes and ISSUE-96 discussion
> > >      >
> > >      > This action (ACTION-335) was to provide discussion 
> topics for
> > >     ISSUE-96.
> > >      > I only really have one point, and I will try to 
> state it more clearly
> > >      > than at the meeting.
> > >      >
> > >      > To me, the effectiveness of any of the logotype 
> proposals (or the EV
> > >      > proposals, for that matter) depends greatly upon 
> the adoption of
> > >     these
> > >      > technologies by sites. We can do really cool 
> flashy things when
> > >     we get
> > >      > an EV cert, or an EV-cert with a logo, but right 
> now the only two
> > >     sites
> > >      > I can find using an EV cert are PayPal and 
> VeriSign. Therefore, I
> > >     wonder
> > >      > how habituated people would become in practice, if 
> they never (or
> > >      > rarely) saw the EV/logotype interface stuff in use.
> > >      >
> > >      > My proposal is that any usability testing of the 
> EV and/or logotype
> > >      > things in the spec not only reflect how users 
> would behave in a land
> > >      > where everyone is using EV-certs and life is 
> happy, but rather
> > >     also test
> > >      > a more realistic case. That is, look at what the 
> adoption is
> > >     presently
> > >      > and/or what we can reasonably expect it to be at 
> time of last
> > >     call, and
> > >      > do usability testing in an environment that reflects that
> > >     adoption rate
> > >      > - i.e. some percentage of sites using EV certs, 
> some percentage also
> > >      > using logos, and another percentage still using 
> "normal" SSL
> > >     certs. My
> > >      > worry is that we may be thinking "EV certs will 
> solve X,Y, and
> > >     Z", but
> > >      > that may only be the case if users are used to 
> seeing them on the
> > >      > majority of sites, and should that not end up 
> being the case, we
> > >     need to
> > >      > look at the usability and benefit in that scenario as well.
> > >      >
> > >      > I think this is what the ACTION wanted, i.e. for 
> me to state this
> > >     point
> > >      > more explicitly. I am going to therefore assume 
> that my work on this
> > >      > action is complete, unless I hear otherwise.
> > >      >
> > >      > -Ian
> > >
> > >     --
> > >     /*
> > >     PhD Candidate
> > >     Vice President for External Affairs, Graduate Student Assembly
> > >     Carnegie Mellon University
> > >
> > >     Legislative Concerns Chair
> > >     National Association of Graduate-Professional Students
> > >     */
> > >
> > >
> > >
> >
> > --
> >
> > /*
> > PhD Candidate
> > Vice President for External Affairs, Graduate Student Assembly 
> > Carnegie Mellon University
> >
> > Legislative Concerns Chair
> > National Association of Graduate-Professional Students */
> >
> 
Received on Wednesday, 14 November 2007 14:56:26 GMT
