- From: =JeffH <Jeff.Hodges@KingsMountain.com>
- Date: Tue, 01 May 2012 17:20:26 -0700
- To: W3C WebAppSec WG <public-webappsec@w3.org>
[ this message resolves:
ACTION-51: Review CORS new sec cons language and provide editorial fixes
http://www.w3.org/2011/webappsec/track/actions/51
]
[ these comments are on <http://www.w3.org/TR/2012/WD-cors-20120403/> ]
Although the CORS spec appears to be technically solid and is implemented in
essentially all major browsers (congrats :) -- and obviously is the result of
much hard work -- it could use a thorough editorial cleanup. This reader found
it difficult to understand without having first been introduced to a fair
amount of web security folklore as well as web browser internals
colloquialisms. Given that the CORS spec is referenced by other specs, it would
seem worthwhile to smooth its rough edges for posterity.
[ Note: this review is incomplete, except nominally for the review of the
security considerations section, on which I concentrated. I'm sending this in
its present state because the bigger points are covered and we're at the end of
the Last Call period. There's more to comment on if given more time.
Also, the below includes review comments by Brad which I'm including raw rather
than smoothly incorporating due to deadline pressure here.
by the way, this is really long and gnarly. ]
bigger overall items:
---------------------
The Security Considerations section is situated ahead of the meat of the spec,
which makes it difficult to comprehend unless one skips ahead.
The spec's exposition of its functionality would be greatly enhanced by at
least one diagram, e.g. a "ladder" or "swimlane" protocol flow diagram.
The spec abstract contains a specific technical example. It should have a more
general overall description of the spec's problem space and approach, instead.
Should perhaps (boldly) define a new term for the notion of "an instance of an
application from a foreign origin, executing in the user agent"
suggestion: web application client instance
The examples are set off with the double vertical bar glyph along the left-hand
edge -- however, they are not otherwise denoted explicitly as examples. The
first time I read the spec, I hadn't before seen that example-denoting style
and so was trying to read the para beginning with "If a resource author has a
simple text resource..." as a logical continuation of the preceding paragraph
and got pretty confused. I suggest the examples have at least a single line at
the top (also set off using the double vertical bar glyph on the left) saying
at least "For example: ". E.g...
|| For example:
||
|| If a resource author has a simple text resource....
||
The spec should define, or reference a definition of..
Ambient authority
(perhaps others, tbd)
overall detailed comments/questions:
------------------------------------
The spec uses the terms "server" and "resource" almost but not quite
interchangeably. There should be an exposition of what the terms mean and what
the differences are between them, if these different terms must be used. Could
one of them, eg resource, be used consistently for both? Or is "server" needed
in some particular places? Also, the term "resource" is undefined in this spec.
(perhaps more, tbd)
editorial items
---------------
Section 5 Syntax -- perhaps should be divided up into two separate numbered
subsections, one for response headers, the other for request headers. Also
should have more whitespace in there in general, found it tough to visually parse.
Section 4 Security Considerations
---------------------------------
overall, this Sec Cons section is long and complicated. I'd divide it up into 3
numbered subsections, which will make referring to the otherwise ambiguously
numbered clauses less ambiguous.
To establish a point of reference, I'll denote this beginning parag as "1st parag":
"Security requirements and considerations are listed throughout this
specification. This section lists advice that did not fit anywhere else."
[ plus I'm not counting the numbered items as bare paragraphs ]
2nd & 3d paragraphs:
The link entitled "simple cross-origin request" leads to the "simple
cross-origin request algorithm" which doesn't actually define what denotes a
request as being a "simple cross-origin request". See below for further details
on this particular issue. Also 2nd & 3d parags should be their own numbered
subsection.
Otherwise, here's a suggested re-write for 2nd parag combined with the 3d
parag, rather than trying to comment/dissect in detail..
"Conformant resources must always be prepared to receive simple
cross-origin requests with user credentials because such requests are
commonplace. For example, such requests can be generated by user agents
processing cross-origin form element submissions or when fetching the
source of script elements, whether or not the user agent conforms to this
specification. Also, because of this, resources for which simple requests
are not idempotent must protect themselves from Cross-Site Request Forgery
(CSRF) by requiring the inclusion of an unguessable token in the explicitly
provided content of the request. [CSRF]"
[ perhaps ought to add a terminology definition for "conformant resources" ]
[[ Brad's comment on above comment:
[Hill, Brad] There is a distinction between the properties of being "safe" and
being "idempotent" that is critical here. A GET request that deletes your
account is idempotent, but it is not safe. Rather than try to over-explain
this, I chose to go with "significance other than retrieval". That may be
confusing, but to simply say idempotent is wrong.
I also don't think we need to restrict this advice to conformant resources. In
fact, every resource that has significance other than retrieval must protect
itself from CSRF, whether it expects CORS requests or not. That is a
deliberate implication of the current wording - that this is an EXISTING
problem in the Web architecture, not one created by CORS - though perhaps I was
too subtle in that.
]]
4th parag:
[note: suggest that 4th parag plus it's 3 following items be moved above
present 2nd & 3d parag, and should become a separate numbered subsection ]
suggested rewrite:
"This specification defines a mechanism by which a resource of one origin
may explicitly authorize user agents to provide the resource's
representation, as returned in an HTTP response, to requesters of other
explicitly identified origins. However, ertain types of resources should
not attempt to specify particular authorized origins, but instead either
deny or allow all origins. For example: "
Item 2 (after 4th parag), states:
"A resource that is publicly accessible, with no access control checks,
can always safely return an Access-Control-Allow-Origin header whose
value is "*". "
What is meant by "with no access control checks" is not clear. I suspect that
what is meant here is: a resource that is intentionally publicly accessible and
which uniformly processes all incoming requests without discrimination. Thus it
can reasonably be made available to any cross-origin request. This can be
signaled by returning a Access-Control-Allow-Origin header with a value of "*"
to all requests. Yes?
If so, then Item 2 could be re-written to more clearly say this. Also, whether
doing so is "safe" or not is somewhat out of scope for a specification such as
this. "Reasonable" is perhaps a better term. For example:
"A publicly accessible resource which is intended to uniformly process
all incoming requests can possibly reasonably be made available to any
cross-origin request. This implies that such a resource could always
return an Access-Control-Allow-Origin header whose value is "*"."
[[ Brad's comment on above comment:
[Hill, Brad] I'd go stronger and say:
"A publicly accessible resource which is intended to uniformly process all
incoming requests SHOULD always return an Access-Control-Allow-Origin header
whose value is "*"."
This is because: say content from origin A wants to include content from origin
B. If the resource responds uniformly to all requests, the server at A could
always act as a proxy for the user-agent. Rather than force this undesirable
pattern, uniform public content SHOULD be accessible cross-origin at the user
agent.
]]
Item 3, beginning with "A GET response whose entity body happens to parse as
ECMAScript":
I take it that the phrase "provided there are no sensitive comments" is meant
to mean that an ECMAscript resource representation could ostensibly contain
sensitive data of some form, most likely in code comments. And also "as it can
be accessed cross-origin using an HTML script element" seems to be a way of
saying that the browser's default Same Origin Policy does not apply to the
stipulated source of an explicit HTML <script src="..."> tag.
This seems to me as being essentially be a particular case of item 2, and is
arguably subsumed by item 2, although it may be useful as an example.
The ending sentence, "If needed, such resources can implement access control
and CSRF protections as described above." could ostensibly also apply to item
2, yes?
[[ Brad's comment on above comment:
[Hill, Brad] No, this is different than item 2. Item 2 specifically refers to
content that is public and with no authorization checks (and therefore proxiable).
Item 3 refers to the fact that user agents do not apply the Same Origin Policy
to the script portions (e.g. not comments) of content interpretable as
ECMAScript when included via a <script src=> tag. That does not at all imply
that the content is uniform - it might be a JSON response that contains private
data or has side-effects authorized with the user's ambient authority. It
means that applying access-control headers to this content SHOULD NOT be done
(lest it be thought effective) because such content is already accessible,
cross-origin, through means outside CORS.
]]
5th parag:
Beginning "Care must always be taken by applications when making cross-origin
requests...". -- I would add at the end of this parag a sentence of "In
particular: "
There's some confusion introduced here by this parag and item 1 when they refer
to treating the Origin header as a credential, because up in the terminology
section, the Origin header is explicitly called out as not being user
credentials. Perhaps the bare term "credentials" needs to be also defined?
Item 1:
"when relying on the Origin header as a credential" -- the "as a
credential" phrase conflicts with the definition for "user credentials",
and begs the question of the Origin header being a member of a
broader class of "credentials", but this notion isn't explicated
in the spec. Maybe this can be addressed by not trying to term the
Origin header a credential, reather indicating that the server/resource
is relying on it.
"authorizing a request" (on the part of a server) is not defined. It
appears to mean the situation where a server observes incoming HTTP
requests and decides whether to fulfill/honor the requests based upon
information (eg header fields) in the requests themselves and/or metadata
about the requester and/or overall context of the requests.
suggested rewrite:
When requests have significance other than retrieval, and when relying on
the Origin header, servers must be careful to distinguish between accepting and
processing a request, and authorizing (via the mechanism defined in this
specification) the requester to have access to the returned resource
representation.
[[ Brad's comment on above comment:
[Hill, Brad]
Both cookies and the Origin header are "credentials" or, more precisely, they
may be treated as security capabilities by certain receivers.
I wanted to distinguish for server-side application authors between the two
types of credentials:
1. Cookies, that describe who the user is, from the view of the site that set
them. (given SOP, the same server receiving them or a related administrative
domain)
2. Origin Header(s) which describe the browser's view of the origin of the
application which caused the original request to be made, plus possibly other
origins traversed in the course of HTTP redirects.
The general best practice is that whether to "accept and process" (e.g. return
200 or 400) should be based only on (1), and that (2) informs a client-side
user agent decision. Really (2) is ideally only returned to the server to
allow it to return a more efficient and privacy-preserving authorization
header, say by including only that specific origin instead of a list of
hundreds or thousands possible authorized origins.
As much as I would like to say that one MUST NOT use (2) in making
authorization decisions as to whether or not to accept and service a request, I
have to expect that application authors will do so. In the best case, only to
save the cost of producing and delivering an expensive response that will just
be discarded, but perhaps to only authorize certain origins to produce certain
responses or side-effects.
So I'm trying to walk a tightrope between recommending against something and
providing realistic details for people who ignore that advice. :P I'm not
surprised it came out a mess.
]]
Item 1.1:
"Authorization for a request" seems to be imply the decision the server
makes regarding whether to fulfill the request or respond with an
error -- this should be clarified.
"authority of the user" is undefined -- should "ambient authority" be used
here? Or "user credentials" ? Are they sometimes different things?
suggested rewrite:
Servers should authorize requests using only the intersection of the user's
credentials, any other user ambient authority, and the requesting origin(s). In
the case of redirects, the Origin header may present multiple origins, and all
must be authorized.
Item 1.2:
Is there an appropriate reference to cite for "forwarding attacks"?
"..the user agent sends the Origin header corp.example." -- is confusing.
should be: "..the user agent sends a request with an Origin header having
a value of 'corp.example'."
"If corp.invalid or the network is malicious, it may cause the request to
be delivered to corp.example,.." Not clear how 'corp.invalid' could itself
cause a request to be delivered to corp.example. Also is worth a footnote
explaining how the network might be malicious (eg DNS has been poisoned).
And an appropriate reference would be good too.
Why is it a problem for corp.example to receive a request it's client
instance intended for corp.invalid? This harks back to the question about
"forwarding attacks".
[[ Brad's comment on above comment:
[Hill, Brad] First to explain: evil.com and good.com, to make it simpler.
good.com is a IP 1.2.3.4. good.com has a naïve anti-CSRF approach that trusts
any request it receives with an Origin header with the value 'good.com'.
good.com makes a CORS request to evil.com. evil.com changes its own DNS
records to say it's IP is 1.2.3.4. As a result, the request actually gets
delivered back to good.com, with an Origin header of value 'good.com'. If
good.com honors this, without checking to see that the Host header said
'evil.com', it may be vulnerable to something like CSRF.
Perhaps "forwarding attack" is not a good term for this, but I believe it falls
within the definition as used in e.g. crypto protocol analysis: cause a message
intended for one recipient to be delivered to another, who attempts to process it.
]]
WRT: "Even better would be to exclusively use secure connections for
cross-origin requests to enable user agents to detect such
misdirections." Yes, but such advice applies equally to all connections
made by the user agent, plus being able to effectively do so is
outside of the control of the web app in question -- e.g., the target
resources may not be available over secure connections.
[[ Brad's comment on above comment:
[Hill, Brad] Yes, this is just a little banging of the HTTPS everywhere drum.
Perhaps can be removed, but it is a valid countermeasure for such attacks, if
you are trying to defend on the server. (e.g. matching the Host header doesn't
have to happen in server-side application logic if it is done by client-side
TLS connection logic)
]]
Item 1.3:
begs question of oauth relation to cors
I found this confusing: "It is often appropriate for servers to require
an authorization ceremony asking a user to consent that cross-origin
requests with credentials be honored from a given origin."
Possible rewording: " < had rewording in mind but inadvertantly deleted it.
would need to re-think & re-create > " -- i do think the above sentence can be
made more clear.
Item 2.1 - 3d parag:
"As a substitute for JSONP-style cross-origin credentialed requests, use
of this specification significantly improves the security posture of the
requesting application, as it provides cross-origin data access whereas
JSONP operates via cross-origin code-injection. The requesting application
has to validate that data received from origins that are not completely
trusted conforms to expected formats and authorized values."
It is not clear whether the last sentence applies to the JSONP case, or to when
a CORS-based approach is used.
[[ Brad's comment on above comment:
[Hill, Brad] Applies to both - very hard to do in the case of CORS, because the
remote server gets to run code!
]]
Item 2.2:
First sentence: "For resources that are safe and idempotent per HTTP,
and where the credentials are used only to provide user-specific
customization for otherwise publicly accessible information."
"safe and idempotent per HTTP" apparently refers to S9.1 of RFC2616, which is
about methods rather than resources.
Suggested rewrite:
"Safe and idempotent" HTTP methods are employed for retrieval only, and also
if the supplied user credentials are used only to provide user-specific
customization for otherwise publicly accessible information.
2nd sentence: "In this case, restricting access to certain origins may
protect user privacy by preventing customizations from being used to
identify a user, except at authorized origins."
Suggested rewrite:
In this case, only allowing certain origins to obtain user-specific data,
such as customizations, may help in protecting user privacy.
Item 3:
[ not sure I want to touch this item because of the history of how this sec
considerations section came about. there are a few relatively minor fixes that
could possibly be made, but I'm not sure these parags need full-on rewriting ]
The term "implicit credentials" isn't defined.
[[ Brad's comment on above comment:
[Hill, Brad] "implicit credentials" == "ambient authority" good substitution
to make
]]
6th parag:
"Authors of client-side Web applications are strongly encouraged to
validate content retrieved from a cross-origin resource as it might
be harmful."
Is there some reference(s) to point readers at? Also, should not Web app
client instances always be careful with data returned over the network because
any/all of it could be harmful, eg if not using verified secure connections
could be subject to DNS poisoning attacks for example.
[[ Brad's comment on above comment:
[Hill, Brad] Yes, this is another consequence of UMP - CORS disagreements.
Pro-CORS folks said "developers on the Web understand and expect this to always
be true" Pro-UMP folks didn't think so. This has some relevance in that even
cross-origin data retrieved over HTTPS should be validated to prevent
cross-origin DOM-based code injection attacks. The attack patterns here are
complex and unfortunately, you are right, the advice amounts to "untrusted data
is untrusted". :(
]]
7th parag:
"Authors of client-side Web applications using a URL of the type
people.example.org/~author-name/ are to be aware that only cross-origin
security is provided and that therefore using a distinct origin rather
than distinct path is vital for secure client-side Web applications."
This is confusing. Perhaps, if I understand correctly, this rewrite..
Web applications that are not uniquely identified by specific host names,
and/or mapped to specific ports, do not necessarily have a unique origin, and
thus will not be able to securely utilize the mechanism defined in this
specification. This is because an origin is composed of only the triple of
scheme, hostname, and port.
For example, a web application whose URL is of the type
example.org/app-name/ and the app-name portion is necessary to distinguish the
web application from other web applications also running at example.org, will
be unable to securely employ the mechanism defined in this specification.
Mapping web applications to distinct origins is vital for secure web
applications.
"simple cross-origin request"
-----------------------------
The term "simple cross-origin request" conspicuously lacks a simple, all in one
place, prose definition. Instead, the reader must click, and then the "simple
cross-origin request" algorithm is displayed, but this algorithm doesn't itself
say anything about what constitutes a "simple cross-origin request", rather one
needs to click on "make a request steps", and another algorithm is displayed,
however, the first sentence does open with "Whenever the make a request steps
are applied, fetch the request URL from origin source origin...". Thus the
reader can plausibly determine that a "simple cross-origin request" comprises
fetching a URL, however there's nothing in any of these 3 levels of ostensible
definition that indicate what makes the request "simple", or "cross-origin".
In subsequent casting about, the reader may notice that in the terminology
section there's a definition for "simple method", and also that so that plus
context elsewhere in spec seems to define "simple request" as one whose method
is either GET, HEAD, or POST.
Also, "simple cross-origin request" is effectively defined by step 2 in section
"7.1 Cross-Origin Request", because that step defines what actually constitutes
a "simple cross-origin request" (i.e. the request method is a simple method).
Perhaps the definition for "simple cross-origin request" should simply be moved
to be step 2 in section "7.1 Cross-Origin Request".
[[ Brad's comment on above comment, and on comments above in general:
[Hill, Brad] Perhaps "simple cross-origin request" is the wrong term. What is
described is hardly simple in the context of the overall model. What it really
is, is a "legacy cross-origin request" - what I said in the sec cons with:
"A simple cross-origin request has been defined as congruent with those which
may be generated by currently deployed user agents that do not conform to this
specification."
It is not defined according to a particular logic other than, "these are the
types of requests that may originate from user agents that do not implement
CORS, and CORS does not attempt to provide any access control properties for
these requests that differ from what non-CORS aware servers and clients have
always done".
That definitely should be made more clear, and would have a salubrious effect
across the specification, I expect. Unfortunately, the behavior of "legacy"
user agents in this regard is not clearly defined, except piecemeal in HTTP,
and absent of security considerations.
Overall, very good food for thought. Some good clarifications, some that I
think miss the mark, but expose important muddiness in the spec and language.
]]
---
end
Received on Wednesday, 2 May 2012 00:20:56 UTC