Re: Stopping (https) phishing

> On 12 Jul 2018, at 21:04, Ryan Sleevi <ryan@sleevi.com> wrote:
> 
> 
> 
> On Thu, Jul 12, 2018 at 1:09 PM Henry Story <henry.story@bblfish.net> wrote:
> 
> 
>> On 12 Jul 2018, at 19:32, Ryan Sleevi <ryan@sleevi.com> wrote:
>> 
>> 
>> 
>> On Thu, Jul 12, 2018 at 12:06 PM Henry Story <henry.story@bblfish.net> wrote:
>> 
>> 
>>> On 12 Jul 2018, at 15:34, Dave Crocker <dcrocker@gmail.com> wrote:
>>> 
>>> On 7/12/2018 5:19 AM, Henry Story wrote:
>>>>   I have recently written up a proposal on how to stop (https) Phishing,
>>> 
>>>   http://craphound.com/spamsolutions.txt
>>> 
>>> originally written for email, but it applies here, too.
>> 
>> :D 
>> 
>> But, not really: The architectural difference between the web and e-mail are very
>> big. Furthermore the problems looked at are completely different: that questionnaire 
>> is for spam, and this is a proposal against phishing.
>> 
>> These problems are more similar than different, and what Dave linked to is just as applicable. They share complex social and political issues, and technologists that ignore that are no doubt likely to be ignored.
> 
> Except that I am not ignoring them.
> The subtitle of the post "a complete socio/technical answer" points at that.
> 
> Sure, but a subtitle without actual substance doesn’t make for a solution. I encourage you to read through that template again, and reconsider whether or not the problems are really as completely different as you suggest. While I’m aware you suggest you’re just kicking tires, the design itself is neither new nor novel, and every suggested implementation approach has suffered from those flaws mentioned - including this.

Well I'd be happy to look at this if you write a version that adapts those 
questions to the web. I am sure this group would find it very helpful too.

> 
>> Then the type of solution I provide is very unlikely to have ever been
>> thought of pre-web, given the type of technologies involved. Also I have
>> spoken to people from Symantec and presented this at the cybersecurity
>> Southampton reading group, and so it has had some initial tyre 
>> kicking already.
>> 
>> Given the lack of familiarity with Gutmann’s work, which in many ways has served as a basic reader into the PKI space, I would be careful about speculating about what ideas may or may not have been considered.
> 
> I actually refer to Gutmans 2014 book in the answer on UI here
> https://medium.com/@bblfish/response-to-remarks-on-phishing-article-c59d018324fe#1a75
> 
> Yes, while also acknowledging you devised your solution without having been familiar with those who went before and have thoroughly shown the issues with it. It would definitely be useful to read more about it, as well as the background literature around PKI user design to understand why this is not just a matter of no one having thought of it, but rather having been fairly discredited.

That happens a lot. In mathematics there are many discoveries with double-barrelled names, because mathematicians, programmers, or type theorists ended up making exactly the same discovery without ever having read each other's work. When that happens, it is considered a significant breakthrough.

However, there are other interesting cases. For example, the category of finite sets and functions
between them is the exact dual of the category of finite Boolean algebras, so that the
following equivalence holds:

   FinSet^op ≃ FinBool

(more generally, Set^op is equivalent to the category of complete atomic Boolean algebras). See:
https://math.stackexchange.com/questions/980933/what-is-the-opposite-category-of-set

So people working in one space (logic) and those working in functional programming are
quite likely to be discovering the mirror images of what the other is discovering without
recognising (until it is pointed out) the existence of the other side.

Still, in more empirical disciplines like the one under discussion here, someone can end up following a conversation by living it. I picked up a lot of the elements of this conversation by being part of it since 1993.

However, I agree this is a complex field full of snakes and subtleties, mirages, big egos, paranoid
souls, fake news, false problems, and hidden answers (sometimes hidden in full view, as illustrated
by Edgar Allan Poe's "The Purloined Letter")... there is much room for misunderstandings.

One can cut through a lot of the details with a mathematical perspective on things,
as I will try to show. 

>> Similarly, given that Symantec has left the PKI business after a series of failures, it’s unclear if you’re speaking of the current entity or the former.
> 
> I just said I have kicked the tires a bit, not that it has gone through a full review. 
> The Spamsolution questionnaire would make sense as a first mail to send someone who
> had not thought about the problem at all, as an incentive to get them to kick the tires.
> Dave Crocker does not know me, nor if  I did some initial work on the topic,
> so I am ok that he sent it out. It's quite funny actually.
> 
>> 
>> The question of an interrelated distributed set of links - and of authority for different name spaces - is not new or original in this space. It’s true whether you look at the Web of Trust or when you consider the PKI’s support for mesh overlays with expressions of degrees of reciprocality of trust.
>> 
>> Similar, the suggestion of recognizing the different “organs”, as you suggest, for different degrees of validation is also not new. You can see this from the beginning of the X.509 discussions, in which ITU-T would maintain a common naming directory from which you could further express these links in a lightweight, distributed directory access protocol.
> 
> Well there has been a lot of progress in decentralised data since ldap which is a pre-web 
> technology that dates back to the 1980ies. But you are right a good thesis would need to 
> look at what the advances in this field of linked data have been and why things could not 
> be done then that can be done now.
> 
>> 
>> There is, underneath it all, a flawed premise resting on the idea that X.509 is even relevant to this, but that criticism could easily occupy its own voluminous email detailing the ways in which certificates are not the solution here. The basic premise - that what we need is more information to present to users - is itself critically and irredeemably flawed.
> 
> X509 is not necessary at all to my argument. It is relevant only insofar as X509 certificates are currently deployed in the browser. You could move to RFC 6698 (DNS-based Authentication of Named Entities (DANE))  and you would have the same problem, which is that all you would know
> about a server is that you have connected to it. Ie you just know this:
> 
> 
> <CompaniesHouse-urlbar-reference.gif>
> 
> 
> But what people in the real world need to know is the full web of relations 
> that site has to legal institutions they can go to if they want to lodge a
> complaint.
> 
> This is a flawed premise though. Certainly, if you’re talking to CAs wishing to sell EV certificates, they would be inclined to tell you that, but this is something that’s been fairly plumbed in the spam side.
What I am proposing is not something that EV certificate authorities can offer. It is quite simple to explain why this is so.

A certificate is an algebraic construct: it is a syntactic structure that is signed, a signed
document to use ordinary language. Certificates can't change, and EV certs are expensive
(500-2000 dollars a year), so you don't want to change them every day. As a result:
1) you try to put the longest-lasting information in there that you can;
2) as it is expensive to verify information (especially if you are not the source of it), very little information is placed in the cert;
3) because CAs can certify domains the world over, the weakest CA can bring everyone down, as Dan Kaminsky has argued forcefully in many talks.

Imagine you wanted a very rich, and hence humanly interesting, certificate: one that contained not just the
location of your company headquarters, but the names and homepages of the owners, their
stock market value, the legal problems they have, what is said about them in the news, where
their local stores are, and so on.

This would require:
1) an extensible certificate format, which X.509, based as it is on ASN.1, cannot provide.
You'd need something built on RDF, which is what the Verifiable Claims Working Group is working on:
  https://www.w3.org/2017/vc/WG/
2) Even then, the more information you placed in the verifiable claim, the more often it would need to be updated, and the more expensive each update would be, as it would make the CA liable for all
the claims in it.

So it does not make sense to put much information into the X.509 certificate or the verifiable claim (which is not to say that verifiable claims will not be very useful).
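
For concreteness, here is a rough sketch, written as a Python dictionary, of what such a rich RDF/JSON-LD-based claim might look like. The field names under credentialSubject are invented for illustration (the actual data model is for the Working Group to define); the point is only that every extra statement the issuer signs is one more statement it must verify, keep up to date, and be liable for.

    # Hypothetical, illustrative claim about a company, in the spirit of the
    # W3C Verifiable Claims work. The credentialSubject field names are invented;
    # they are exactly the kind of fast-changing facts that make a long-lived
    # signed document expensive to maintain.
    rich_company_claim = {
        "@context": "https://www.w3.org/2018/credentials/v1",
        "type": ["VerifiableCredential"],
        "issuer": "https://companieshouse.gov.uk/",          # the registry as issuer
        "issuanceDate": "2018-07-12T00:00:00Z",
        "credentialSubject": {
            "id": "https://shop.example.co.uk/#company",      # hypothetical company URI
            "headquarters": "1 Example Street, London",
            "owners": ["https://shop.example.co.uk/people/jane"],  # changes with ownership
            "stockMarketValue": "...",                         # changes daily
            "pendingLegalCases": "...",                        # changes constantly
        },
        # A real credential would carry a cryptographic proof here.
        "proof": {"type": "...", "jws": "..."},
    }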

However, if instead of going through an intermediary CA one went directly to the legal institutions responsible for collecting taxes and verifying companies, institutions that already have the legal infrastructure to arrest people, take them to court, and defend them, then the cost of maintaining this information would sit in exactly the right place. The institutions designed to do this already exist, and you already pay for them through your taxes.

What I am proposing is to allow companies that so desire to tie their domain to such a
legally empowered registry, such as companieshouse.gov.uk, in a way that does not require a single central registry for everything, but instead allows browsers to follow a chain of trust to a trust anchor chosen by the user. The domain would link to the resource in
the registry that describes the company. That information is a web resource.
A web resource is coalgebraic: it lives in the exact opposite category to that of algebras.
Coalgebras describe processes, infinite lists, streams, and so on. URIs refer to resources, and resources can change: they can return different representations (which are algebraic structures)
at different points in time. Once you realise that the web is a coalgebraic system, you can see that you need not tie yourself to certificates for every part of your architecture.
You can use the web too.
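
To make that concrete, here is a minimal Python sketch of the kind of lookup a browser could perform, assuming, purely for illustration, a well-known metadata location and a "registeredAt" link term. None of these names are standardised, and a real design would of course also have to authenticate the registry itself (via TLS, DNSSEC, and the user's chosen trust anchors).

    import requests

    # Registries the user (or their browser distribution) has chosen to trust:
    # legally empowered institutions such as Companies House.
    TRUSTED_REGISTRIES = {"https://companieshouse.gov.uk/"}

    def registry_description(site_origin):
        """Fetch a site's self-description and, if it links into a trusted legal
        registry, return the registry's (live, changeable) description of the company."""
        # 1. The site publishes machine-readable metadata about itself; the
        #    well-known path and JSON-LD format are assumptions for this sketch.
        meta = requests.get(site_origin + "/.well-known/organisation",
                            headers={"Accept": "application/ld+json"}).json()

        # 2. The metadata links to the registry resource describing the company
        #    ("registeredAt" is a hypothetical term).
        registry_entry = meta.get("registeredAt")
        if not registry_entry:
            return None   # the site chose not to tie itself to any registry

        # 3. Only follow the link if it leads into a registry the user trusts.
        if not any(registry_entry.startswith(r) for r in TRUSTED_REGISTRIES):
            return None

        # 4. The registry entry is a web resource: it can change, so the browser
        #    fetches the current representation rather than a long-lived signed blob.
        return requests.get(registry_entry,
                            headers={"Accept": "application/ld+json"}).json()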

> There’s a whole host of problems with this - ranging from the Web’s distributed nature (one bad JS can compromise you),

You take into account where all the resources came from when building a UI that indicates the reliability of a page. That is what is done at present, with the minimal information available. I agree there would be extra research to be done for applications
that need to fetch data across different websites; that is undoubtedly an extra UX challenge.
But you cannot start off by saying that it is an impossible one to solve (unless you have a
mathematical proof of that).

> it’s ephemerality (are you going to record every certificate you encountered in a transaction),

You put ephemeral information on the web into resources that can change, with a short time to live on their representations. Certificates are pretty small and long-lasting in comparison, and quite easy
to store.
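
For example, the registry could express each representation's lifetime using ordinary HTTP caching; a small sketch in Python (the company path is invented):

    import requests

    # Hypothetical registry entry for a company.
    resp = requests.get("https://companieshouse.gov.uk/company/01234567",
                        headers={"Accept": "application/ld+json"})

    # The "time to live" of the representation is plain HTTP caching metadata:
    # e.g. "Cache-Control: max-age=3600" tells the browser to re-fetch after an hour.
    print(resp.headers.get("Cache-Control"))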

> it’s jurisdictional independence (what happens if Tonga says phishing isn’t a problem?

Then your government cuts the link to Tonga, and browsers that go to those websites no longer find the meta-information about the companies there. Alternatively, if a full trade war is not something
your country wants, a method could be devised for it to add a warning to any website linked to a Tonga registry, to make you aware of the problem and to let you know that if you enter into a transaction with it, you will not be covered by the diplomatic power of your state.

> You’re either trying to find political solutions or technical, and both are deeply flawed in this).

Neither technical nor political solutions alone can work, as this is a socio/technical problem.

We are proposing to build on the institutions that exist and that, whether you like it or not,
you are dependent on (for education, law, the military, etc.). We may not like the fact that the USA has the most significant military in the world (bigger in expenditure than the next 17 nations), but that is the way things are. This money is used to sponsor computer companies by
creating colossal military-based markets in which those companies can grow to sizes that give them very
substantial competitive advantages. This is then cleverly disguised in a free-market ideology that tries to hide this political reality in order to gain political advantage.

> While you suggest it’s not dependent on X.509, the entire proposal is devoted to discussing this and its relationship to EV. If your goal is merely to provide a federated identity clearing house, well, that’s been explored to - in the context of eIDAS or things like Microsoft’s Passport/Identity Cards.

For eIDAS, I doubt they have tied this into an institutional web of trust as I describe, and I am sure that this has not been implemented in the browsers. My proposal is completely web-friendly, using only W3C and IETF standards. It may be able to tie into and enrich eIDAS; I'll look into that.

As for Microsoft Passport, Wikipedia says:

"Microsoft Passport Network, and Windows Live ID) is a single sign-on web service developed and provided by Microsoft that allows users to log into websites (like Outlook.com), devices (e.g. Windows 10 computers and tablets, Windows Phones, or Xbox consoles), and applications (including Visual Studio) using one account."

The problem with Passport, as I understood it, was that it was centralised and aimed only at Microsoft
computers. The proposal we have would be operating-system and browser agnostic. As I said, it relies only on major existing W3C and IETF standards such as HTTP,
TLS, DNS(-SEC), and JSON-LD (and other Semantic Web stack components built on top of those).

> 
> And this does nothing to solve phishing, compare to real technical solutions (like U2F/FIDO or password managers).

A password manager does not stop people being misled into renewing their password and
typing it into the wrong box.

FIDO gives a client a per-website cryptographic key, but that does not tell you whether
you are on a real or a fake web site when you first get there. FIDO is about client
authentication; the blog post I wrote on phishing was about server authentication. There is
a big difference there.

If I get a mail about a great French chocolate shop from a friend of mine and
I reach their web site, I would like to know that the shop is indeed in France, and not in Australia, South America or China. If I hear of a Swiss watchmaker on a forum, I
would like to be able to verify in a click that the site I was on belonged to a registered
Swiss artisan. If I go to a news site called WashingtonToday.com, I would like to know
whether it is a US news organisation, how old it is, how it is financed, and so on.
That would allow me to know that I am not on a fake news web site that is making up stories
to make money with Google ads (or whatever the next scam is).

> Is all of this information to be presented to the user before entering any data? If so, it might just surpass Gimp as the least usable software ever. 

No, the official information would, I suppose, only be shown when dealing with
a web site that the user has never encountered before (or that has not been visited
in a while, or where some key regulatory information, such as ownership, has changed).
It could be shown while the web site is loading.

We don't know what the best way to present such information is yet, since this has
never been tried. Security UI designers have been forced to work with an amazing
poverty of information, limiting what they could do. This would be a completely new
world. You don't know any more than I do in this respect. 

In any case, the fact that Gimp has a (relatively) bad interface does not mean that other
drawing programs cannot have excellent ones.

Once the user knows that the information goes beyond a simple address and could
be useful and interesting, they will be a lot more likely to go and look at it,
and they will understand the reasons for looking this information up.
My argument was that currently this information (the static address of the company headquarters)
is nearly incomprehensible to most people.

> Is it to be recorded for the years until a user for sure isn’t going to lodge a complaint? If so, it’s a privacy disaster.

Company ownership data can be kept around for as long as legally required; this is
current practice. Most companies want to be publicly known, which is why so many spend
so much on advertising.

I think you are confusing client and server authentication here. These are two very
different things. And the fact that some web sites want to be public does not mean that
others cannot be anonymous. I love to go skiing off piste, but I know that I have
to be more careful when I do.

> 
> It further tries to enmesh the notion of legal identity as a precursor to a domain or website. While the copyright lobby has been trying to do that for decades - hence the odd ICANN policies - there are legitimate social and technical reasons that anonymity is to be valued. And if the premise is that identity defeats phishing, then either you can’t have anonymity, or you need to thoroughly penalize it, yet both are problematic.

Many web sites do not care about anonymity at all. Indeed, for most state organs, companies
and other public entities this is a non-issue. Only a cypherpunk would think
that all communication has to be anonymous.

We currently live in a space where, for all practical purposes, all sites are anonymous.
That is, there is little to tell us whether we are on a real news site or a fake one, or whether
a company is based in France, Germany, Russia, Ukraine, or elsewhere. The proposal is
to enrich the web with that information, so that we can know where to turn for legal
redress.

It may be a bit of a fantasy for Google and its investors to imagine that they could
play all these roles, but I think there are some serious political limits to how far one
can go there, because of how societies function. You need working schools, local police,
universities, small businesses, legal systems and much more for human societies to function.
No centralised agency like Google can take over all these functions without corrupting the
system. The founders of Google understood that: the PageRank algorithm is built on
using human intelligence as a base and enhancing it; if people stop putting good information
up, then PageRank also suffers. I am just proposing to continue along those lines by
integrating political institutions into the web too, in a way that is respectful of state
sovereignty, and so by extension of individual sovereignty too.

> 
> It may be that you’ve considered these things, but given the design’s similarities to past designs that failed to do so, it’s unclear if they’re being ignored, they were unfamiliar, or if there’s somehow a missing piece that isn’t documented.

I have never seen any browser do anything close to what I am proposing here. MS Passport, from
what I can tell, tried something similar in a centralised way. Something much richer can be done in a
decentralised, open-standards way that is respectful of national sovereignty.

> 
> It may be helpful to work backwards - work from a statement of the problem and what the desired end result is for how users or “systems” will solve it, before trying to design those intermediate steps. Working through those real world cases will reveal a host of issues - like those mentioned in Dave’s Earlier message - that are far more difficult to solve than just hand waving distributed federation as a solution.

There are an infinite number of applications of this. 

Here is one that goes beyond browsing and into verifiable claims territory:

Imagine I am travelling in Japan and I have an accident, and I would like to show someone my UK verifiable
insurance claim. This is going to be signed by a UK health authority. How would the Japanese authority know that the signatory of my claim had the authority to make such a claim?
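
To sketch how that check could work in terms of the institutional web of trust (all URLs and vocabulary terms below, such as "delegatesTo" and "mayIssue", are hypothetical, and the cryptographic verification of the claim's signature is assumed to be handled separately):

    import requests

    # A trust anchor chosen by the Japanese verifier (hypothetical URL).
    JAPANESE_TRUST_ANCHOR = "https://example.go.jp/registry"

    def signer_is_empowered(signer, capability, anchor, max_hops=5):
        """Follow hypothetical 'delegatesTo' links from the verifier's trust anchor,
        looking for a statement that `signer` may issue claims of type `capability`."""
        frontier = [anchor]
        seen = set()
        for _ in range(max_hops):
            next_frontier = []
            for url in frontier:
                if url in seen:
                    continue
                seen.add(url)
                doc = requests.get(url, headers={"Accept": "application/ld+json"}).json()
                for entry in doc.get("delegatesTo", []):
                    if entry.get("id") == signer and capability in entry.get("mayIssue", []):
                        return True
                    if entry.get("id"):
                        next_frontier.append(entry["id"])
            frontier = next_frontier
        return False

    # Usage, once the claim's signature has been verified cryptographically:
    # signer_is_empowered("https://health.example.uk/#authority",
    #                     "HealthInsuranceClaim", JAPANESE_TRUST_ANCHOR)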

The institutional web of trust is missing from the current web. If you open your eyes, you too will
see this: like the letter in Edgar Allan Poe's "The Purloined Letter", this secret has always been in full view.

Henry

> 
> 
> 
> 
> <WebOfNations response.png>
> 
>> 
>> 
>> Philosophically the answer presented is very different too. You can see that with 
>> the first line of that "questionnaire"
>> 
>>    Your post advocates a
>>    ( ) technical ( ) legislative ( ) market-based ( ) vigilante
>>    approach to fighting spam.
>> 
>> The approach  here is none of those: it is organological [1], in the sense that it is 
>> thinking of the problem from an approach that takes the body politic (the organs of the state), 
>> law, the individual  and technology into account as forming a whole that co-individuates itself. 
>> So to start it does not fit first choice box...
>> 
>> But you don't need to understand that philosophy to understand the proposal. You just
>> have to be open to new possibilities. I
>> 
>> Henry
>> http://bblfish.net/
>> 
>> [1] There was a conference on this here for example. 
>>  http://criticallegalthinking.com/2014/09/19/general-organology-co-individuation-minds-bodies-social-organisations-techne/
>> 
>>> 
>>> And fwiw, for any UX issue, there is no certitude in the absence of very specific testing.
>> 
>> Yes of course. I do go more carefully into the problem with the https UX here
>> 
>> https://medium.com/@bblfish/response-to-remarks-on-phishing-article-c59d018324fe#1a75
>> 
>> I argue there with pictures to go along, that the problem is that there is not enough information
>> in X509 certificates for it to make sense to users. Even in EV certs. What is needed is live
>> information. 
>> 
>>> 
>>> 
>>> d/
>>> -- 
>>> Dave Crocker
>>> Brandenburg InternetWorking
>>> bbiw.net
> 

Received on Thursday, 12 July 2018 23:46:07 UTC