Please clarify security aspects of state finding from Thomas Roessler on 2006-06-06 (www-tag@w3.org from June 2006)

From: Thomas Roessler <tlr@w3.org>
Date: Tue, 6 Jun 2006 17:48:56 +0200
To: david.orchard@bea.com
Cc: www-tag@w3.org
Message-ID: <20060606154856.GK20104@lavazza.does-not-exist.org>
Hi David,

a couple of remarks about the state finding; I'm referring to
the 2006-04-19 version at:

  http://www.w3.org/2001/tag/doc/state-20060419

First, let me note that this is important and good work.
Congratulations on it.

I think the document would benefit from a slightly more
systematic exposition that clearly explains what kinds of
players you assume: A client, a server, some caches, the
application (e.g., "bank account").  Each of these can have
state, and can share state with any of the others.  Currently,
I think, the document talks too much about "the system",
without being clear about which of the components above are
meant.

Likewise, the document often talks about "security" without
drilling down into details.  This is unfortunate: Security
essentially means a guarantee that things behave as you want
them to behave (and an analysis of what happens if some
player you don't trust makes use of their freedom and behaves
maliciously).  The important thing, then, is to spell out what
you *really* want to achieve, and how a certain design decision
would affect this particular goal. 

It is also useful to clearly differentiate between
identification (who is this), authentication (prove that it's
them) and authorization (are they allowed to do this?).  For
instance, an authorization decision may be based on the
authorized party's identity, and on the fact that this identity
is authenticated properly. (Note the state transitions here!)
I'll come back to this later.
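
To illustrate the distinction, here is a minimal Python sketch.
The user table, the names, and the three helper functions are
invented for this illustration; none of them comes from the
finding.

```python
import hashlib
import hmac


def _hash(password, salt):
    # Derive a password hash; the iteration count is an arbitrary demo value.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


# Hypothetical user table: name -> salt, password hash, permitted actions.
USERS = {
    "alice": {
        "salt": b"fixed-demo-salt",
        "hash": _hash("wonderland", b"fixed-demo-salt"),
        "permissions": {"read"},
    }
}


def identify(claimed_name):
    """Identification: who does the requester claim to be?"""
    return claimed_name if claimed_name in USERS else None


def authenticate(username, password):
    """Authentication: can the requester prove that claim?"""
    user = USERS.get(username)
    if user is None:
        return False
    return hmac.compare_digest(_hash(password, user["salt"]), user["hash"])


def authorize(username, authenticated, action):
    """Authorization: may this authenticated identity perform the action?"""
    permissions = USERS.get(username, {}).get("permissions", set())
    return authenticated and action in permissions
```

Each step changes state ("claimed" to "proven" to "permitted"),
and each of those states has to live somewhere.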

Going through the document...

3.2 quotes a definition for a stateless server, and says "this
definition, while true, ..." -- that doesn't make sense, since
a definition doesn't hold a truth value.

4.1, second paragraph: I gather that "such as session ids" is
meant to clarify "'identifying information'"; still, that could
be phrased more clearly.  Also, it might be worth pointing out that
session IDs mean that client state consists of a reference to
server state that is sent along with every request; the "real"
state information is kept on the server.
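
A minimal sketch of that split, assuming an in-memory dictionary
as the server-side store (the store and the function names are
hypothetical, chosen only for this illustration):

```python
import secrets

# Hypothetical server-side store; a real deployment would use a database or cache.
SESSIONS = {}


def create_session(initial_state):
    """Keep the state on the server; hand the client only an opaque reference."""
    session_id = secrets.token_urlsafe(32)
    SESSIONS[session_id] = dict(initial_state)
    return session_id


def handle_request(session_id):
    """Resolve the reference the client sent back into the real state."""
    return SESSIONS.get(session_id)  # None if the ID is unknown


sid = create_session({"user": "alice", "cart": ["book"]})
state = handle_request(sid)
```

The client only ever holds `sid`; everything in `state` stays on
the server.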

You claim that cookies are easier to handle than URI rewrites,
without giving much empirical evidence.

4.1, third paragraph: I don't understand how this relates to
the topic of the finding.  Also, what's the conclusion from
your observation about QNames and URIs?  (I don't see any
relevant one.)

4.2 could use a bunch of clarifications.

It may be easier to structure this section by saying what goals
you want to achieve, how these can be achieved, what kinds of
attackers may want to subvert applications, and how that can
happen.

The discussion of storing state in cookies doesn't really
benefit from the username/password example: If a cookie is used
to store credentials, then the application can evaluate these
credentials for every single request.  This approach is
(basically) equivalent to HTTP Basic Authentication.
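
The equivalence can be sketched as follows.  The cookie name
"auth" and both helper functions are assumptions made for this
example; only the "Authorization: Basic" format is the standard
one.

```python
import base64


def _encode(username, password):
    # Basic Authentication encodes "user:password" with base64; no secrecy is added.
    return base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")


def basic_auth_header(username, password):
    """Request header as sent with HTTP Basic Authentication."""
    return f"Authorization: Basic {_encode(username, password)}"


def credential_cookie_header(username, password):
    """Request header for a hypothetical cookie that stores the same credentials."""
    return f"Cookie: auth={_encode(username, password)}"
```

Either header hands the server the full credentials on every
request, so the server can (and must) evaluate them each time.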

It seems like what you want to discuss is the approach of
storing authorization state (i.e., "this user has previously
presented credentials, and he's authorized to delete the main
database") on the client side, and the trust implications.
(I.e., either the client must be trusted [don't do that], or
some cryptography is needed to keep them from tampering with
state, replaying state from another session, etc.)
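
As a rough sketch of the cryptographic option: the server can
attach a message authentication code to the state and bind it to
a session and an expiry time.  The key, the token layout, and the
names seal/unseal are assumptions for this illustration, not a
recommendation of a particular format.

```python
import base64
import hashlib
import hmac
import json
import time

SERVER_KEY = b"hypothetical-server-secret"  # assumption: known only to the server


def seal(state, session_id, ttl_seconds=300, now=None):
    """Serialize state, bind it to a session and expiry time, and append a MAC."""
    now = time.time() if now is None else now
    payload = json.dumps(
        {"state": state, "sid": session_id, "exp": now + ttl_seconds},
        sort_keys=True,
    ).encode("utf-8")
    tag = hmac.new(SERVER_KEY, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode("ascii")
        + "."
        + base64.urlsafe_b64encode(tag).decode("ascii")
    )


def unseal(token, session_id, now=None):
    """Return the state if MAC, session binding, and expiry check out; else None."""
    now = time.time() if now is None else now
    try:
        payload_b64, tag_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        tag = base64.urlsafe_b64decode(tag_b64)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(SERVER_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # tampered with, or sealed under a different key
    data = json.loads(payload)
    if data["sid"] != session_id or data["exp"] < now:
        return None  # replayed from another session, or expired
    return data["state"]
```

Note that this only narrows the replay window: within the same
session and before expiry, an old token is still accepted unless
the server also tracks a nonce or counter.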

"One feature of session IDs is the relative difficulty of
guessing a valid session ID" is wrong in this generality -- the
point here is really that session IDs *can* be large random
numbers (or other hard-to-reproduce stuff), making it difficult
for an attacker to generate one.
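
To make the difference concrete, here is a small sketch that
contrasts a counter-based ID with a random one.  Both generator
functions are invented for this example.

```python
import itertools
import secrets

_counter = itertools.count(1000)


def sequential_session_id():
    """Guessable: anyone who sees one ID can predict its neighbours."""
    return str(next(_counter))


def random_session_id():
    """Hard to guess: 128 bits from the operating system's random source."""
    return secrets.token_hex(16)


# An attacker who was issued one sequential ID simply guesses the next one.
seen = sequential_session_id()
guess = str(int(seen) + 1)
victim = sequential_session_id()
attack_succeeds = guess == victim  # True for the counter-based scheme
```

Both schemes produce "session IDs"; only the second makes
guessing a valid one difficult.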

"which parts of the application can be signed or encrypted" --
it's not about parts of the application, but about data
exchanged by the parties to a transaction.  Your description of
SSL reads as if it wasn't covering the HTTP header (and, in
particular, the URI that is requested); that could use some
clarification.

The argument about putting user names and passwords into URIs
doesn't work, since the visibility of URIs and the other
content of an HTTP request is basically the same.  (Putting
user names and passwords into URIs is actually a bad idea
because the resource that is being accessed is mixed with the
credentials needed to access it.)

I'm a bit unhappy about the Kiosk PC argument.  Doing anything
sensitive at all using these machines is a bad idea; this
applies to *all* information that passes through them.  I do
realize, though, that it's still worth reducing the exposure by
not persistently storing confidential information on the client.

4.3, performance & scalability: Logarithmic growth is slower
than linear growth, and hence preferable.  Also, a "linear
curve" doesn't "turn logarithmic".  What you probably want to
say is that typical response times grow linearly for small loads,
and for a nice system grow logarithmically for large loads.

(Then again, this particular discussion isn't referred to
later, so it may be worth just dropping it.)

4.4, reliability: "We have made a simplifying, but erroneous
assumption that systems where the client has the state are
'stateless'." It may be a good idea to rewrite this section
without making this assumption, and simply talk about
components that hold or don't hold state, and their respective
properties.  This is one instantiation of my earlier comment
about "the system."

Section 7 uses "resource" in a way that seems inconsistent with
web architecture.

7.1, you mention "security concerns" without specifying, and
then go on to discuss encrypting parts of URIs.  This could use
a bit more discussion of what you really want to achieve.
(E.g., resource identifiers that don't let you derive the
offline account number that they are connected with?  In this
case, you could as well have a simple mapping table, etc.)
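
The mapping-table alternative could look roughly like this; the
two dictionaries, the function names, and the account number used
in the example are all hypothetical.

```python
import secrets

# Hypothetical server-side tables linking public identifiers to account numbers.
_id_to_account = {}
_account_to_id = {}


def public_id_for(account_number):
    """Return a stable random identifier that reveals nothing about the account."""
    if account_number not in _account_to_id:
        public_id = secrets.token_urlsafe(16)
        _account_to_id[account_number] = public_id
        _id_to_account[public_id] = account_number
    return _account_to_id[account_number]


def account_for(public_id):
    """Resolve an identifier taken from a URI back to the account number."""
    return _id_to_account.get(public_id)  # None if the identifier is unknown


uri = "http://example.org/accounts/" + public_id_for("0012345678")
```

The identifier in the URI is stable, so the URI stays persistent,
and no encryption is involved.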

You say "it may be easier to populate and parse the data from
someplace other than the URI, such as FORM POST or cookie
data." This strikes me as speculative ("may"), and a function
of the development environment in use.

The rest of this section seems to somehow suggest that cool and
persistent URIs aren't really worth it, and things are much
easier when all the navigation is put into cookies and POST.
May I suggest an explicit cost-benefit analysis here?

7.2 describes what web architecture calls a URI collision. Once
again, a proper cost-benefit analysis might be in order. "This
has security advantages as the information can be encrypted"
strikes me as problematic -- the advantages are over what other
approach?  The "URI for every resource" one?  If so, why can't
confidential information be encrypted with that approach?

Also, I sense that there is some confusion here about the
identity of the resource that is being accessed ("a's account")
vs. the identity of the requestor ("a" -- or maybe "b", if "b"
is authorized to access "a's account").

8.1: "and they store the state using HTTP cookies or using URL
rewriting".  It is worth being very clear about what state is
meant here, and about the possible (lack of) difference when
comparing the cookie-and-form based approaches to HTTP
Authentication, seen from a state management perspective.  "The
security timing out" probably means that you don't want the
client to persistently store credentials.  Maybe say that.

8.2, below Example 5: The "obvious security reasons" are
probably worth spelling out.  The real point here is once
again that you'll want to distinguish between the resource that
is being accessed and the accessor's credentials, that you
don't want to have to manage credentials along with the
identifier, etc.

Also, "encryption" doesn't prevent tampering or guessing by
itself; it also doesn't prevent replay attacks.
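
A toy demonstration of the tampering point, assuming a simple
XOR stream cipher built here purely for illustration (it is not
a real cipher and not what the finding describes): an attacker
who knows the plaintext format can change the message without
ever learning the key.

```python
import hashlib


def keystream(key, length):
    """Toy keystream: SHA-256 over key and counter. Illustration only."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key, plaintext):
    return xor_bytes(plaintext, keystream(key, len(plaintext)))


decrypt = encrypt  # XOR with the same keystream undoes itself

key = b"hypothetical key"
ciphertext = encrypt(key, b"amount=100")

# The attacker never sees the key, only the ciphertext and a guess at the format.
delta = xor_bytes(b"amount=100", b"amount=900")
tampered = xor_bytes(ciphertext, delta)

result = decrypt(key, tampered)  # b"amount=900"
```

Detecting the change takes an integrity check (a MAC or an
authenticated encryption mode); stopping replay additionally
takes a nonce, counter, or timestamp that the server verifies.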

Below Example 6: Why is having the session id in the URI a
"security downside"?  Once again, it would be more productive
to talk about what the actual effects are than to just invoke
"security." In the context of using session IDs, the next part
is also confusing: If I can pass on a URI that references all
session state (including whether or not I was logged in), how
is this "confusing and inefficient at worst", as opposed to
"passing on all the privileges I have"?  The "may be difficult"
text further on is, once again, speculative.

The ultimate conclusion -- that application and session state
should perhaps be kept separate -- is a good one.

8.3, the server doesn't really have control over where the
client stores cookies.  The differences between session and
persistent cookies are really about being intended per session
and persistent, not so much about being stored in memory or on
disk.
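
As a sketch of what the server actually controls, using Python's
standard http.cookies module (the two helper functions are my
own naming for this example): the only difference on the wire is
whether a lifetime is attached.

```python
from http.cookies import SimpleCookie


def session_cookie(name, value):
    """No lifetime attribute: intended to last for the current session only."""
    cookie = SimpleCookie()
    cookie[name] = value
    return cookie.output(header="Set-Cookie:")


def persistent_cookie(name, value, max_age_seconds):
    """Explicit lifetime: intended to outlive the current session."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["max-age"] = max_age_seconds
    return cookie.output(header="Set-Cookie:")
```

Neither header says anything about memory versus disk; where the
client keeps the cookie is the client's decision.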

Example 7, incidentally, doesn't really store session state
(such as, "I've authorized that guy to do X"), but credentials,
so it's a copy of HTTP Basic Authentication using cookies.

Section 9 starts: "The previous examples showed how browser
based technologies support ..." -- it may be more accurate to
characterize this as traditional HTTP or some such, since the
basic paradigms discussed aren't really specific to browsers.

Section 9.2, above example 12: This is a discussion about
whether or not people should use HTTP facilities.  I wonder if
that's appropriate in the context of this finding -- I doubt
it.  If, however, this is found appropriate, I think a deeper
cost-benefit analysis might be in order.  The current text
reads a bit like "the benefit of not actually using HTTP is
that you don't have to actually use HTTP."

Example 13 is characterized as a "good middle ground".  That
sounds like a conclusion from a somewhat deeper analysis.  It
might be worth writing up that analysis.

Section 9.3 seems to be mostly about WS-Addressing vs URIs.
This strikes me as off-topic for the scope of the suggested
finding.

I hope these are helpful.

Regards,
-- 
Thomas Roessler, W3C   <tlr@w3.org>
Received on Tuesday, 6 June 2006 15:49:10 UTC