Re: Comments on Access Control (http://www.w3.org/TR/access-control/) from Anne van Kesteren on 2007-03-22 (public-appformats@w3.org from March 2007)

From: Anne van Kesteren <annevk@opera.com>
Date: Thu, 22 Mar 2007 10:22:19 +0100
To: "Marc Silbey" <marcsil@windows.microsoft.com>, public-appformats@w3.org
Message-ID: <op.tpk2zhn464w2qv@id-c0020>
Hi Marc,

I've marked a few comments as "Address later" for now. I hope that's ok. I  
also have some questions on some of the proposed changes. Please see below.


On Wed, 21 Mar 2007 19:22:21 +0100, Marc Silbey  
<marcsil@windows.microsoft.com> wrote:
> As promised I've included our comments and questions below on Access
> Control. The draft looks very good and the below comments are minor. I'm
> looking forward to discussing these in further detail.

Thanks!


>  1. Introduction
> COMMENT 1) Maybe change "For security reasons, web browsers typically do
> not permit a website..." to "For security reasons, web browsers
> typically do not permit client scripting code running on one website..."
>
> COMMENT 2) Maybe change "The access-control mechanism enables web
> resources to permit websites to access their content" to "The
> access-control mechanism enables web resources to permit scripts running
> from other domains to access their content"

The idea is that it can be used outside of scripting as well, such as for
cross-site XSLT. Therefore I haven't made the suggested changes.


>  1.1 Background
> COMMENT 3) Correction for our agreed scope: change "The access-control
> header allows an XML data document to..." to "The access-control header
> allows a web resource to" and change "XML document" to "resource" in "By
> specifying an access control header that "allows" example.com to read,
> that particular XML document"

Fixed.


>  1.1.1 Definition of Read Access to Web Resources
> COMMENT 4) typo remove "s" in "resources"

Fixed. (Made it "to a Web Resource".)


> COMMENT 5) Correction for scope: change "XML document" to "resource" for
> "A request made by an application to load a web resource in a manner
> that allows the application to inspect the contents of that resource"

Fixed.


>   1.2. Conformance Criteria
> COMMENT 6) typo remove "over" from "...written with more concern for
> clarity than efficiency"

Oops, fixed.


>  1.3. Security Considerations
> COMMENT 7) "User agents which implement this capability should take care
> not to expose other trusted data (cookies, HTTP header data)
> inappropriately" - we should probably provide some scenarios that we're
> trying to protect against so readers can easily understand this.

Address later.


> COMMENT 8) It may be clearer to say "Authors should take care to
> protect against exposing themselves to cross-site scripting attacks by
> rendering or executing the retrieved content directly without
> validation."

Fixed.


> 2. Access Control Read Policy
> COMMENT 9) We should define what extra safety measures are required for
> HTTP methods besides HEAD and GET. We should think again about adding
> POST because some folks will argue that it is as safe as GET and would
> be a useful addition.
>
> QUESTION: What happens in the case of trailing headers? Maybe we should
> specify that this header must appear among the headers that come before the body.

Address later.


> COMMENT 10) Proposed rewording "When access to a resource is not
> permitted by this policy, the request is said to be in error and access
> to that resource MUST be denied in such a way that the status or
> existence of the blocked resource is not revealed to the caller (to
> prevent enumeration/fingerprinting attacks)."

It's not clear to me what text this is supposed to replace.


> COMMENT 11) Proposed rewording "Resources to which the access control
> read policy applies have an associated unordered list (which can be
> empty) of access control rules. There are allow and block lists. An
> access control rule consists of an allow ruleset and optionally a deny
> ruleset to handle exceptions. Each of these rulesets is an unordered
> list of access items. How each access control rule is matched against
> the request URL to determine whether access to the resource is to be
> granted is described in the next section.

This proposed text uses inconsistent terminology. If the proposal is
simply to replace "except" with "deny", I'm not sure that's ok, as in the
currently described policy having something in the "except" list doesn't
necessarily mean that access to the resource is denied. I'm not entirely
convinced we should go through with that, though, and I've asked the person
who originally proposed it to provide use cases. (On
member-accesscontrol-tf@w3.org. I can't get a permalink right now as the  
mailing list archives appear to be offline.)


> COMMENT 12) Proposed change to EBNF:
>
> An access item MUST match the following EBNF:
> access-item     ::= scheme-specifier "://" domain-pattern ( ":"
> port-specifier )? | "*"
> domain-pattern  ::= wildcard-label | wildcard-label "." domain
> wildcard-label  ::= label | "*"
> scheme-specifier ::= scheme | "*"
> port-specifier ::= port | "*"

Since the port is still optional, what should it default to if no scheme is
provided?
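For illustration, the proposed EBNF can be checked with a regular expression. This is a minimal Python sketch, not text from the draft: it assumes "scheme" follows RFC 3986, "port" is a run of decimal digits, and "label" is any non-empty run of characters other than ".", ":", "/" and "*". It only checks syntax and does not settle the default-port question above.

```python
import re

# Assumptions (the proposal leaves these productions undefined):
# scheme per RFC 3986, port as decimal digits, and label as any
# non-empty run of characters other than ".", ":", "/" and "*".
SCHEME = r"[A-Za-z][A-Za-z0-9+.-]*"
LABEL = r"[^.:/*]+"

ACCESS_ITEM = re.compile(
    rf"(?:{SCHEME}|\*)://"            # scheme-specifier "://"
    rf"(?:{LABEL}|\*)(?:\.{LABEL})*"  # wildcard-label, then plain labels
    r"(?::(?:[0-9]+|\*))?"            # optional ":" port-specifier
)


def matches_ebnf(text: str) -> bool:
    """Return True if text matches the proposed access-item EBNF."""
    if text == "*":
        return True
    return ACCESS_ITEM.fullmatch(text) is not None


for example in ("https://*.*:80", "*://example.org",
                "http://example.org:*", "http://example.*"):
    print(example, matches_ebnf(example))
```

Under this reading, "https://*.*:80" and "http://example.*" are rejected because "*" may only stand for the leftmost label, while "*://example.org" and "http://example.org:*" still parse.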


>  We're concerned that allowing "example.*" wildcarding may be
> unnecessarily flexible and could lead to mistakes by web developers.

We agreed yesterday that requiring the TLD might be good, so that you
can't omit things like .com. But that doesn't actually solve the problem
with .co.uk, for instance (and the various hundreds, maybe more, of more
complex registration systems). Do we really want to go there?


> COMMENT 13) Proposed rewording "In addition to matching the above EBNF,
> the ToASCII algorithm MUST apply successfully (without errors) to each
> label component from the access item. If the access item doesn't match
> the EBNF or the ToASCII algorithm fails, the request is denied."

This follows from the access item being in error. I suppose we can drop
that, though, as being in error always leads to access being denied.
Address later.
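A minimal sketch of the ToASCII check from Comment 13, assuming Python's built-in `encodings.idna` module (which implements the IDNA 2003 ToASCII operation) is an acceptable stand-in; the function name `labels_pass_toascii` is hypothetical. A failure on any label means the item is in error, and therefore access is denied.

```python
import encodings.idna


def labels_pass_toascii(domain: str) -> bool:
    """Apply ToASCII to each label; any failure means access is denied."""
    for label in domain.split("."):
        if label == "*":  # wildcard labels are not IDN labels; skip them
            continue
        try:
            # Raises UnicodeError for empty or over-long labels and for
            # characters that nameprep prohibits.
            encodings.idna.ToASCII(label)
        except UnicodeError:
            return False
    return True
```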


> COMMENT 14) Proposing removal of the following examples following the
> above comments on wildcards
>  https://*.*:80
>  *://example.org
>  http://example.org:*

I'll address the examples once the above comment is addressed.


>  2.1. Content-Access-Control header
> QUESTION) Does this plus symbol mean that there are always two rules
> defined? "ruleset ::= rule (LWS? "," LWS? rule)+"

Fixed (I think).
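To illustrate the question: in EBNF, "( ... )+" means one or more repetitions of the group, so "rule (LWS? "," LWS? rule)+" requires at least two rules, whereas "*" (zero or more) would also accept a single rule. The Python sketch below is only an illustration; it simplifies "rule" to a run of non-comma, non-whitespace characters and LWS to optional spaces or tabs, which is narrower than HTTP/1.1 LWS.

```python
import re

# Simplifications (assumptions): a "rule" is any run of characters other
# than commas and whitespace; LWS is reduced to optional spaces/tabs
# (HTTP/1.1 LWS also allows folded CRLF).
RULE = r"[^,\s]+"
LWS = r"[ \t]*"

# "+": the group occurs one or more times, so at least two rules.
RULESET_PLUS = re.compile(rf"{RULE}(?:{LWS},{LWS}{RULE})+")
# "*": the group occurs zero or more times, so one rule is enough.
RULESET_STAR = re.compile(rf"{RULE}(?:{LWS},{LWS}{RULE})*")


def matches(pattern: re.Pattern, header_value: str) -> bool:
    """True if the whole header value matches the given ruleset pattern."""
    return pattern.fullmatch(header_value) is not None
```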


> COMMENT 15) Proposed rewording: "If the Content-Access-Control header
> doesn't match the specified syntax, the request is denied." If we decide
> to go with "deny" instead of "except" there are other replacements.
> Similarly we should think about changing "resource is in error" to
> "request is denied"

Address later.


>  3. Matching Algorithm
> COMMENT 16) Maybe add "It should be observed that the DENY rules take
> precedence over any ALLOW rules." after the first algorithm. We should
> think about joining the allow and deny rulesets so they operate on the
> full list together.

This is not the idea of the current algorithm. Though see above, it's not  
clear we need it. Address later.
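For comparison, the semantics proposed in Comment 16 (deny rules taking precedence over allow rules) could look like the Python sketch below. This is not the algorithm in the current draft, and `exact_or_star` is a hypothetical placeholder for the real item-matching algorithm from section 3. Because a single matching deny item always wins, the result does not depend on the order in which the items are listed, which is what makes joining the two rulesets into one list possible.

```python
from typing import Callable, Iterable

Matcher = Callable[[str, str], bool]


def exact_or_star(origin: str, item: str) -> bool:
    # Hypothetical placeholder matcher for illustration only.
    return item == "*" or origin == item


def access_granted(origin: str,
                   allow: Iterable[str],
                   deny: Iterable[str],
                   matches: Matcher = exact_or_star) -> bool:
    # Deny items take precedence: one matching deny item blocks access
    # no matter how many allow items also match.
    if any(matches(origin, item) for item in deny):
        return False
    return any(matches(origin, item) for item in allow)
```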


> COMMENT 17) Proposed changes to the second algorithm to help clarify
>
> 1. Let request URL be origin and access item be item.
> 2. If item is a single U+002A (*) there is a match. Abort this
> algorithm.
> 3. Drop the path, query, and fragment part in origin so that it
> matches the access item production.
> 4. Count the U+002E (.) characters in both origin and item. If the
> results are not equal, there is no match; abort this algorithm.
> 5. Compare the scheme from origin and item. If there's a match,
> drop the scheme from both including the :// sequence following it.
> Otherwise, abort this algorithm.
> 6. Compare the port from origin and item. If either of them doesn't
> have the port explicitly specified use the default port for the scheme.
> If there's a match, drop the port from both including the U+003A (:)
> preceding it. Otherwise, abort this algorithm.
> 7. Split origin and item on the U+002E (.) character and preserve
> the order of new set of LabelItems. In case there's no U+002E character,
> each set will have exactly one LabelItem. Now for each set of LabelItems
> (one from origin and one from item):
>  1. Let the LabelItem from origin be OriginLabel and the
> LabelItem from item be RuleLabel.
>  2. If RuleLabel is a single U+002A (*) character, then
> there is a match. Perform this sub algorithm again for the next set of
> LabelItems or abort this sub algorithm if there's no next set of
> LabelItems.
>  3. Apply the ToASCII algorithm to OriginLabel and
> RuleLabel.
>  4. Compare OriginLabel and RuleLabel. If there's a
> match, do this sub algorithm again for the next set of LabelItems or
> abort this sub algorithm if there's no next set of LabelItems.
> Otherwise, abort this algorithm.
> 8. There's a match. Abort this algorithm.

Address later.
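A rough Python sketch of the second algorithm as proposed in Comment 17, for discussion only. Several details are assumptions not stated in the proposal: the default-port table (`DEFAULT_PORTS`), treating "*" as a wildcard for the scheme and port, using the request URL's scheme default when the item omits its port (the open question raised for Comment 12), counting dots in the host part only, and comparing labels case-insensitively after ToASCII. The item is assumed to have already passed the EBNF check.

```python
import encodings.idna
import re
from urllib.parse import urlsplit

# Assumed default-port table; the proposal only says "the default port
# for the scheme".
DEFAULT_PORTS = {"http": 80, "https": 443}

# Loose split of an access item into scheme, host and optional port.
ITEM = re.compile(r"(?P<scheme>[^:/]+)://(?P<host>[^:/]+)(?::(?P<port>\d+|\*))?")


def label_key(label: str) -> bytes:
    # ToASCII, then lower-case (assumption: case-insensitive comparison).
    return encodings.idna.ToASCII(label).lower()


def item_matches(request_url: str, item: str) -> bool:
    if item == "*":                        # step 2: lone "*" matches
        return True
    m = ITEM.fullmatch(item)
    if m is None:
        return False
    origin = urlsplit(request_url)         # step 3: path/query/fragment ignored
    o_scheme, o_host = origin.scheme, origin.hostname or ""
    i_scheme, i_host, i_port = m["scheme"], m["host"], m["port"]
    if o_host.count(".") != i_host.count("."):   # step 4
        return False
    if i_scheme != "*" and i_scheme.lower() != o_scheme:   # step 5
        return False
    o_port = origin.port or DEFAULT_PORTS.get(o_scheme)    # step 6
    if i_port != "*":
        wanted = int(i_port) if i_port else DEFAULT_PORTS.get(o_scheme)
        if wanted != o_port:
            return False
    # Step 7: compare label by label; "*" matches any single label.
    for o_label, r_label in zip(o_host.split("."), i_host.split(".")):
        if r_label == "*":
            continue
        try:
            if label_key(o_label) != label_key(r_label):
                return False
        except UnicodeError:
            return False
    return True                            # step 8: there's a match
```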


> Many thanks to the IE folks that reviewed the spec and a special thanks
> to Eric Lawrence, one of our Networking gurus, for helping provide
> detailed comments.

Do you have a more complete list of names for the acknowledgements  
section? Thanks!

Cheers,


-- 
Anne van Kesteren
<http://annevankesteren.nl/>
<http://www.opera.com/>
Received on Thursday, 22 March 2007 14:35:30 UTC