
Re: Securing Password Inputs

From: Jason H <scorp1us@yahoo.com>
Date: Fri, 31 Aug 2012 12:51:10 -0700 (PDT)
Message-ID: <1346442670.55447.YahooMailNeo@web120703.mail.ne1.yahoo.com>
To: Cameron Jones <cmhjones@gmail.com>
Cc: Seth Call <sethcall@gmail.com>, "Thomas A. Fine" <fine@head.cfa.harvard.edu>, "public-html-comments@w3.org" <public-html-comments@w3.org>
1. "just generate new rainbow tables with each username based on the top passwords and have a relatively high success rate."
Well, I understand your point, but I take issue with "just" and "high success rate". It is true that cracking common passwords is always going to be easy, so password security will never be free of the "choose something uncommon" requirement. If we look at the LinkedIn breach, the top 30 passwords accounted for about 4,000 users out of 6.5 million; the 30th most common password was shared by only 24 individuals. In a short amount of time, 160,000 passwords (about 2%) were recovered by hashing weak candidates, and eventually about 5% were recovered. These came from UNSALTED SHA-1 hashes. (http://www.zdnet.com/blog/btl/linkedin-password-breach-how-to-tell-if-youre-affected/79412) The availability of unsalted SHA-1 hash databases made finding even the uncommon passwords easier.

2. "A phishing site would not use a password field, send via a form or even use HTML5." Which is exactly why we should warn the user when a page is:

1) not using a password field,
2) not submitting via a form,
3) or not using HTML5,
because those are phishing-site fingerprints.

3. A hashed password is good for storage as both you and I have shown. But this is made better if it is double-hashed and salted.

In fact, I was not ready to talk about it, but a minor incremental improvement to the algorithm is to repeatedly hash the credential some unknown number of times, and store one of those hash iterations server-side. The double-hash is the most important part, but repeated hashing before storage adds something on top. It does not increase the keyspace of the attack - if a complete keyspace were known, it would be trivial - but it does change one thing slightly: the attacker does not know how many times he'll have to un-hash the password. That is, if he successfully implements an unhash function (be it dictionary lookups or whatever), the result is just another hash. He now has a choice - move on to another user, or keep unhashing the hashes until he gets to something that looks like a password. (The funny part here is, if someone uses a hash as their password, he would loop right past it. I find that immensely humorous. Anyway...) We can easily hash 100 times. We don't even need to know exactly how many iterations were used, just that we'll eventually come across the stored value. Example: I send my double-hash to the server, and the server stores the 98th iteration of it. When I log in, the server starts with the double-hash and keeps hashing until it finds a match. It of course matches on the 98th iteration, which is good, because the 100th iteration is the cut-off for a wrong password. No two accounts need to use the same number of iterations, either.
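The scheme above can be sketched in a few lines of Python. This is a minimal illustration only - the function names, the use of SHA-256, and salting the double-hash with the domain are my assumptions for the example, not a spec:

```python
import hashlib

MAX_ITERATIONS = 100  # Z: cut-off beyond which the password is deemed wrong


def sha256_hex(data: str) -> str:
    """One hash iteration (SHA-256, hex-encoded) - assumed for illustration."""
    return hashlib.sha256(data.encode()).hexdigest()


def enroll(client_double_hash: str, iterations: int) -> str:
    """Server stores the Nth iteration of the client's double-hash.
    N itself is never stored anywhere."""
    h = client_double_hash
    for _ in range(iterations):
        h = sha256_hex(h)
    return h


def verify(client_double_hash: str, stored_hash: str) -> bool:
    """Re-hash the submitted credential until it matches or we pass the cut-off."""
    h = client_double_hash
    for _ in range(MAX_ITERATIONS):
        h = sha256_hex(h)
        if h == stored_hash:
            return True
    return False


# Example: client double-hashes (salted with the domain, per the proposal),
# the server stores the 98th iteration; login matches on the 98th hash.
double_hash = sha256_hex(sha256_hex("hunter2" + "example.com"))
stored = enroll(double_hash, 98)
assert verify(double_hash, stored)
assert not verify(sha256_hex(sha256_hex("wrong")), stored)
```

Note that `verify` never needs to know the per-account iteration count; it only needs the shared cut-off Z.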


And in the case that computing power increases, we can change the average and maximum iteration counts accordingly, WITHOUT regard to the existing user passwords. Say avg=90 and max=100; in three years we can raise that to 250 and 300, and the old user entries still work - they just get matched sooner. This introduction of an unknown hash iteration count really complicates things. True, common passwords will still be discovered, but discovery can be made X times slower, where X is the average iteration count. We can change X at any time, and tune X to the hardware available to both the application and the attackers. Say we have an average of 86,400 user logins per day and an average response time of 1 second, and we want to be really secure; then we can use one of those GPU-accelerated hash implementations that does 525 million hashes a second. We then know that an attacker will spend 1 second per password attempt, even for a common password. A simple wordlist runs about 50k entries; let's assume a complex wordlist and round that up to 86,400. So the attacker will spend all day going over a dictionary for one user - assuming he even knows to try 525 million iterations. That's a significant cost. And if he doesn't know the exact number and falls short, stopping at 500 million hashes, he will never find it. If he chooses 600 million hashes, he is penalized 75 million wasted hashes for every wrong guess - and he'll be guessing wrong the vast majority of the time. This is real mathematical control we can wield over attackers. We can even be so lazy about maintaining that variable that our software adds 1,000 iterations a day, so it just stays secure over time.
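The arithmetic above can be checked in a few lines. The GPU rate, dictionary size, and penalty figures are the ones from the text; this is a back-of-the-envelope check, not a benchmark:

```python
# Back-of-the-envelope attacker cost for the variable-iteration scheme.
GPU_RATE = 525_000_000       # hashes/second (GPU-class hardware, per the text)
AVG_ITERATIONS = GPU_RATE    # tuned so one verification takes ~1 second on that GPU
WORDLIST = 86_400            # generous dictionary size assumed in the text

seconds_per_guess = AVG_ITERATIONS / GPU_RATE        # 1.0 s per candidate password
seconds_for_wordlist = WORDLIST * seconds_per_guess  # one full day per user
print(seconds_for_wordlist / 3600)                   # 24.0 (hours)

# Over-guessing at 600M iterations wastes 75M hashes on every wrong candidate;
# under-guessing (stopping at 500M) finds nothing at all.
wasted = (600_000_000 - AVG_ITERATIONS) * (WORDLIST - 1)
print(wasted)                                        # 6479925000000 (~6.5e12 wasted hashes)
```

So at the numbers quoted, one dictionary pass against one user costs a full day of GPU time, before accounting for the over-guessing penalty.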

Once the infrastructure is installed to handle the double-hash, we only need to ever change X. We could skip the double-hash and install just the X logic if we could trust everyone, but we can't. 

One more idea I was playing with along the same lines: what if the client had its own X as well? Suppose I drop the double-hash and go X-hash on the client side too. We'd get the benefits of X on both the server and the client. The complication is that my phone, my laptops (home, work, etc.) and any borrowed computer would all have to know what X to send from the client side. While some browsers are now capable of that, I am not ready to recommend anything more than a double-hash at this time. Let me call the client's X-hash iteration count "W"; W+X is then the total iteration count for the stored hash. Now, W must be less than the invalid-password iteration threshold, so we have only one rule: W <= Z (where Z is the maximum iteration count - the invalid-password cut-off). We can come up with all kinds of ways to determine W. What this buys us is that some amount of processing is done on the client, and then some more is done on the server to match the credential. Mobile clients can choose a small W (min=2, max=100?); laptops and desktops can be much larger. The server's count must be the biggest of all, never smaller than the largest client value. People much smarter than myself can make these determinations, but for example purposes: if we assume a function of months since Jan 1, 1970, e.g. months_since^2 for the client, and a slightly faster function for the server, (months_since^2.01) + (months_since*GPU), we can be assured that W <= Z. Incidentally, for these functions today, we would be adding 1,025 client hash iterations a month to a base of 262,144, and the server would have 16,874 iterations to add on top of the client's. (I added a term for those who want to pile on more hashes if they use a GPU.) Anyway, returning from that example: we have substantially stepped up the hashing troubles for attackers. Hashes observed in transit must be matched by hashing up to the client maximum, or found first. As the function grows, they have to keep adding hardware to match it. Now the low-hanging fruit becomes the oldest, least-hashed passwords. Did I mention that under-guessing is completely penalized? :-) The cool thing that also falls out is that as Z is increased, sites can bump-hash the oldest passwords to keep them from getting too old - that is, take the stored hash and add 100,000 iterations to it monthly. Now you have a user database that can always be up-to-the-day secure.
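The W <= Z split can be sketched as follows. The SHA-256 choice, the toy W and Z values, and the function names are all assumptions for illustration; the only real rule being demonstrated is that any client W at or below Z still verifies:

```python
import hashlib

Z = 1000  # server's invalid-password cut-off; every client W must satisfy W <= Z


def iterate_hash(seed: str, n: int) -> str:
    """Apply SHA-256 n times (assumed hash for illustration)."""
    h = seed
    for _ in range(n):
        h = hashlib.sha256(h.encode()).hexdigest()
    return h


def client_submit(password: str, domain: str, w: int) -> str:
    """Client hashes W times before the credential ever leaves the device."""
    assert w <= Z
    return iterate_hash(password + domain, w)


def server_verify(submitted: str, stored: str) -> bool:
    """Server keeps hashing the submitted value until a match or the Z cut-off."""
    h = submitted
    for _ in range(Z):
        if h == stored:
            return True
        h = hashlib.sha256(h.encode()).hexdigest()
    return h == stored


# A phone picks a small W, a desktop a larger one; the server stores an
# iteration beyond the largest client W, so either device can still log in.
stored = iterate_hash("hunter2" + "example.com", 500)  # 500 total iterations on file
assert server_verify(client_submit("hunter2", "example.com", 2), stored)    # phone, W=2
assert server_verify(client_submit("hunter2", "example.com", 100), stored)  # desktop, W=100
```

Because the server only ever hashes forward from whatever the client sends, devices with different W values interoperate without coordination, as long as none exceeds Z.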


... <skipping some>


6. I wish that we could have bank-level security everywhere, for free. I think it is a mistake to assume that banks are the most valuable targets and other sites are not. Values change over time. My Facebook and Twitter accounts were relatively worthless when they started; they are somewhat meaningful now. We can't assume our valuations won't change, therefore I want to give everything the best security at the cheapest cost.


Thank you for the dialog and have a great weekend. I'll be back on Monday.







________________________________
 From: Cameron Jones <cmhjones@gmail.com>
To: Jason H <scorp1us@yahoo.com> 
Cc: Seth Call <sethcall@gmail.com>; Thomas A. Fine <fine@head.cfa.harvard.edu>; "public-html-comments@w3.org" <public-html-comments@w3.org> 
Sent: Friday, August 31, 2012 1:13 PM
Subject: Re: Securing Password Inputs
 
On Fri, Aug 31, 2012 at 3:49 PM, Jason H <scorp1us@yahoo.com> wrote:
> They might be cagey, but they are completely absent in implementation in the
> storage routines of user credentials for most sites.
>

You're attempting to paint everyone with the same brush. Those same
security folks are the ones who came up with the cryptographic
techniques and the SSL you're attempting to lean on.


On Fri, Aug 31, 2012 at 4:31 PM, Jason H <scorp1us@yahoo.com> wrote:
> 1. I don't think you understand how rainbow tables work. As I've shown, a
> salt defeats rainbow tables, meaning you have to brute-force it. Even if you
> know you're going to attack the account of alice on domain.com, you have to
> start decades earlier to discover any usable portion of the hash table for
> that domain/user. The odds are astronomically low that you'll get a hash hit.
> 15 * 10^62, being hashed (I'm using bitcoin mining numbers here) at 525
> million hashes/second comes out to be 1 * 10^46 years for the hash table to
> be complete. If you double hash it, your odds are
> 1/10,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 that in a
> year, you'll find a password. I like those odds. Even with computing
> advances, even if we divide that by 10^23, I still like those odds.
>

A salt does not "defeat" rainbow tables, it just renders the current
ones obsolete. Rainbow tables are an algorithmic cache, the values
have just already been computed.

In this case, the attacker is not attempting to target specific
individuals, they are attempting to collect username/password
combinations for use on other services because people typically reuse
the same passwords across many sites.

If the salting algorithm is known the attacker can just generate new
rainbow tables with each username based on the top passwords and have
a relatively high success rate.

If a site you use is hacked the best thing to do is change your
password, nothing will stop that. Before you even get to that point,
don't use predictable passwords and your security is up in the top
percentile.

> 2. The browser has a variety of ways to detect non-compliance, however
> phishers are crafty, and I don't feel anyone can fully anticipate the
> changes. The simplest for me would be to JS-override the keystrokes so they
> are captured and submitted separately. The current use of this is for
> password strength checking. But the double-hash renders that unnecessary. So
> we could disallow JS reading or writing of these inputs. And again, the DTD
> will be V5 so we know when to apply these semantics.
>

A phishing site would not use a password field, send via a form or
even use HTML5.

> 3. I am not trying to push these on sites, I am trying to push this in the
> browser. The sites will follow. This technique was designed to require
> minimal changes to the sites as possible, with maximal gain. It is not
> NEGATIVE. It is substantially better than what we have now. It's better for
> the companies and for the users.
>
> It's pretty cool that even with access to the system, that they cannot gain
> access to my password.
>

The concept of having a non-readable password is good and that's why
people already hash them for storage.

The "upgrade path" you suggest is based on having original and
client-hashed passwords on file, so where is the benefit?

HTML5 pages served to HTML4 clients will not pre-hash the password and
will send it in plaintext as they currently do, so this won't just
magically disappear.

Having to change its authentication system is not something which I
expect a company that already salts passwords to embrace.

You can't force good security practice; if people leave windows and
doors open then that is their prerogative, and it is your choice as a
consumer whether you want to liaise with such an organization.

> 4. A nonce is used only once, but its value is changed periodically,
> generally every time it is used. Based on the math above, I am comfortable
> with changing the nonce every 100 years.

It is changed on every use, that's the point of it.

>
> 5. And the HTTP auth proposal doesn't require more changes? It in fact
> requires *substantial* changes to applications, authentication knowledge,
> and servers. That's moving a mountain. Adoption will be glacial.
>
> SSL already prevents the hash from being snooped, and requires no
> application changes.
>
> I charge that the digest authentication is more work than the application
> level changes, and the MitM attacks are moot because we already have SSL.
> Only applications that both 1) send a new password and 2) want to
> support the double-hash need to modify the login page to the v5 DTD, and
> then tweak the routing to expect the double-hash of the password.
> Applications that just supply a "reset" link work unmodified.
>

Well, digest isn't going to do what you want anyway as it's only for
protecting transfer. At the end of the pipe the password is translated
back to plaintext, just with the knowledge that nobody read it,
changed it or replayed it along the way.


On Fri, Aug 31, 2012 at 4:39 PM, Jason H <scorp1us@yahoo.com> wrote:
> In general you are right, however the security minded people are absent in
> application programming. Are these the same people who developed HTTP
> Auth:BASIC?

There's nothing wrong with BASIC; you assume all sites need bank-level
security. For simple web forums, where the only risk of breach is
someone posting something impersonating you, that is about as bad as it
gets, and they shouldn't have to deal with unnecessary overhead.

> What we're talking about here isn't JS validation or parameter sanitation,
> it is merely that whatever password inputs you get will be pre-hashed. It is
> opaque to the server and application for the most part. The only issue are
> services that supply a new password during password reset. In these
> situations, a reset link is even easier, or the application can be modified
> to accept the double-hashed version of the password.

There are a few more issues than that, as we have discussed.

Thanks,
Cameron Jones
Received on Friday, 31 August 2012 19:51:39 GMT
