- From: Thomas Roessler <tlr@w3.org>
- Date: Tue, 6 Jun 2006 17:48:56 +0200
- To: david.orchard@bea.com
- Cc: www-tag@w3.org
Hi David,

a couple of remarks about the state finding; I'm referring to the 2006-04-19 version at:

http://www.w3.org/2001/tag/doc/state-20060419

First, let me note that this is important and good work. Congratulations on it.

I think the document would benefit from a slightly more systematic exposition that clearly explains what kinds of players you assume: A client, a server, some caches, the application (e.g., "bank account"). Each of these can have state, and can share state with any of the others. Currently, I think, the document talks too much about "the system", without being clear about which of the components above are meant.

Likewise, the document often talks about "security" without drilling down into details. This is unfortunate: Security essentially means a guarantee that things behave as you want them to behave (and an analysis of what happens if some player you don't trust makes use of their freedom and behaves maliciously). The important thing, then, is to spell out what you *really* want to achieve, and how a certain design decision would affect this particular goal. It is also useful to clearly differentiate between identification (who is this), authentication (prove that it's them) and authorization (are they allowed to do this?). For instance, an authorization decision may be based on the authorized party's identity, and on the fact that this identity is authenticated properly. (Note the state transitions here!) I'll come back to this later.

Going through the document...

3.2 quotes a definition for a stateless server, and says "this definition, while true, ..." -- that doesn't make sense, since a definition doesn't hold a truth value.

4.1, second paragraph: I gather that "such as session ids" is meant to clarify "'identifying information'"; still, that could be phrased more clearly. Also, it might be worth pointing out that session IDs mean that client state consists of a reference to server state that is sent along with every request; the "real" state information is kept on the server. You claim that cookies are easier to handle than URI rewrites, without giving much empirical evidence.

4.1, third paragraph: I don't understand how this relates to the topic of the finding. Also, what's the conclusion from your observation about QNames and URIs? (I don't see any relevant one.)

4.2 could use a bunch of clarifications. It may be easier to structure this section by saying what goals you want to achieve, how these can be achieved, what kinds of attackers may want to subvert applications, and how that can happen.

The discussion of storing state in cookies doesn't really benefit from the username/password example: If a cookie is used to store credentials, then the application can evaluate these credentials for every single request. This approach is (basically) equivalent to HTTP Basic Authentication. It seems like what you want to discuss is the approach of storing authorization state (i.e., "this user has previously presented credentials, and he's authorized to delete the main database") on the client side, and the trust implications. (I.e., either the client must be trusted [don't do that], or some cryptography is needed to keep them from tampering with state, replaying state from another session, etc.)

"One feature of session IDs is the relative difficulty of guessing a valid session ID" is wrong in this generality -- the point here is really that session IDs *can* be large random numbers (or other hard-to-reproduce stuff), making it difficult for an attacker to generate one.

"which parts of the application can be signed or encrypted" -- it's not about parts of the application, but about data exchanged by the parties to a transaction.
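
To make the session ID points concrete, here is a minimal sketch in Python. It is an illustration only, not something taken from the finding: the SESSIONS table and both function names are hypothetical, and a real server would add expiry and persistent storage. It shows a session ID that is a large random number, and client state that is nothing more than a reference to state kept on the server.

```python
import secrets

# Hypothetical in-memory table: the "real" state lives on the server.
SESSIONS = {}

def create_session(user):
    """Create server-side state and return the reference the client holds."""
    # 32 random bytes (256 bits) from the OS CSPRNG, URL-safe encoded.
    session_id = secrets.token_urlsafe(32)
    SESSIONS[session_id] = {"user": user, "authenticated": True}
    return session_id

def lookup_session(session_id):
    """Resolve the client's reference; unknown or guessed IDs yield None."""
    return SESSIONS.get(session_id)
```

The client sends only the identifier with each request. The difficulty of guessing a valid one comes entirely from how the identifier is generated; a counter or a timestamp used as a session ID would offer no such property.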

Your description of SSL reads as if it wasn't covering the HTTP header (and, in particular, the URI that is requested); that could use some clarification. The argument about putting user names and passwords into URIs doesn't work, since the visibility of URIs and the other content of an HTTP request is basically the same. (Putting user names and passwords into URIs is actually a bad idea because the resource that is being accessed is mixed with the credentials needed to access it.)

I'm a bit unhappy about the Kiosk PC argument. Doing anything sensitive at all using these machines is a bad idea; this applies to *all* information that passes through them. I do realize, though, that it's still worth reducing the exposure by not persistently storing confidential information on the client.

4.3, performance & scalability: Logarithmic growth is slower than linear growth, and hence preferable. Also, a "linear curve" doesn't "turn logarithmic". What you probably want to say is that typical response times grow linearly for small loads, and for a nice system grow logarithmically for large loads. (Then again, this particular discussion isn't referred to later, so it may be worth just dropping it.)

4.4, reliability: "We have made a simplifying, but erroneous assumption that systems where the client has the state are 'stateless'." It may be a good idea to rewrite this section without making this assumption, and simply talk about components that hold or don't hold state, and their respective properties. This is one instantiation of my earlier comment about "the system."

Section 7 uses "resource" in a way that seems inconsistent with web architecture.

7.1, you mention "security concerns" without specifying, and then go on to discuss encrypting parts of URIs. This could use a bit more discussion of what you really want to achieve. (E.g., resource identifiers that don't let you derive the offline account number that they are connected with? In this case, you could just as well have a simple mapping table, etc.) You say "it may be easier to populate and parse the data from someplace other than the URI, such as FORM POST or cookie data." This strikes me as speculative ("may"), and a function of the development environment in use. The rest of this section seems to somehow suggest that cool and persistent URIs aren't really worth it, and things are much easier when all the navigation is put into cookies and POST. May I suggest an explicit cost-benefit analysis here?

7.2 describes what web architecture calls a URI collision. Once again, a proper cost-benefit analysis might be in order. "This has security advantages as the information can be encrypted" strikes me as problematic -- the advantages are over what other approach? The "URI for every resource" one? If so, why can't confidential information be encrypted with that approach? Also, I sense that there is some confusion here about the identity of the resource that is being accessed ("a's account") vs. the identity of the requestor ("a" -- or maybe "b", if "b" is authorized to access "a's account").

8.1: "and they store the state using HTTP cookies or using URL rewriting". It is worth being very clear about what state is meant here, and about the possible (lack of) difference when comparing the cookie-and-form based approaches to HTTP Authentication, seen from a state management perspective. "The security timing out" probably means that you don't want the client to persistently store credentials. Maybe say that.

8.2, below Example 5: The "obvious security reasons" are probably worth spelling out. The real point here is once again that you'll want to distinguish between the resource that is being accessed and the accessor's credentials, that you don't want to have to manage credentials along with the identifier, etc. Also, "encryption" doesn't prevent tampering or guessing by itself; it also doesn't prevent replay attacks.
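
As an illustration of the distinction between encryption and protection against tampering or replay, here is a minimal sketch in Python. Everything in it is an assumption made for the example and does not come from the finding: SECRET_KEY, the token layout, and the names issue_token and verify_token are hypothetical. The sketch does not encrypt anything. It authenticates client-held authorization state with an HMAC, and it limits replay by binding the state to one session and to an expiry time.

```python
import hashlib
import hmac
import json
import time

# Assumption: a key known only to the server.
SECRET_KEY = b"hypothetical-server-side-key"

def issue_token(state, session_id, lifetime_s=300, now=None):
    """Serialize state, bind it to a session and an expiry, and append a MAC."""
    now = time.time() if now is None else now
    payload = json.dumps(
        {"state": state, "sid": session_id, "exp": now + lifetime_s},
        sort_keys=True,
    ).encode()
    tag = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.hex() + "." + tag

def verify_token(token, session_id, now=None):
    """Return the state if the token is authentic, current, and for this session."""
    now = time.time() if now is None else now
    try:
        payload_hex, tag = token.split(".")
        payload = bytes.fromhex(payload_hex)
    except ValueError:
        return None  # malformed token
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag.encode(), expected.encode()):
        return None  # payload was modified, or the tag was guessed
    data = json.loads(payload)
    if data["sid"] != session_id or data["exp"] < now:
        return None  # replayed from another session, or expired
    return data["state"]
```

Note that the payload stays readable by the client, and that a token can still be replayed within the same session until it expires. Preventing that would need additional server-side state, such as a record of tokens already used.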

Below Example 6: Why is having the session id in the URI a "security downside"? Once again, it would be more productive to talk about what the actual effects are than to just invoke "security." In the context of using session IDs, the next part is also confusing: If I can pass on a URI that references all session state (including whether or not I was logged in), how is this "confusing and inefficient at worst", as opposed to "passing on all the privileges I have"? The "may be difficult" text further on is, once again, speculative. The ultimate conclusion -- that application and session state should perhaps be kept separate -- is a good one.

8.3, the server doesn't really have control over where the client stores cookies. The differences between session and persistent cookies are really about being intended per session and persistent, not so much about being stored in memory or on disk. Example 7, incidentally, doesn't really store session state (such as, "I've authorized that guy to do X"), but credentials, so it's a copy of HTTP Basic Authentication using cookies.

Section 9 starts: "The previous examples showed how browser based technologies support ..." -- it may be more accurate to characterize this as traditional HTTP or some such, since the basic paradigms discussed aren't really specific to browsers.

Section 9.2, above example 12: This is a discussion about whether or not people should use HTTP facilities. I wonder if that's appropriate in the context of this finding -- I doubt it. If, however, this is found appropriate, I think a deeper cost-benefit analysis might be in order. The current text reads a bit like "the benefit of not actually using HTTP is that you don't have to actually use HTTP."

Example 13 is characterized as a "good middle ground". That sounds like a conclusion from a somewhat deeper analysis. It might be worth writing up that analysis.

Section 9.3 seems to be mostly about WS-Addressing vs URIs. This strikes me as off-topic for the scope of the suggested finding.

I hope these are helpful.

Regards,
--
Thomas Roessler, W3C <tlr@w3.org>
Received on Tuesday, 6 June 2006 15:49:10 UTC