More comments on access-control from Ian Hickson on 2007-11-14 (public-appformats@w3.org from November 2007)

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 14 Nov 2007 19:14:03 +0000 (UTC)
To: Anne van Kesteren <annevk@opera.com>
Cc: "WAF WG (public)" <public-appformats@w3.org>
Message-ID: <Pine.LNX.4.62.0711141807390.22952@hixie.dreamhostps.com>
On Wed, 14 Nov 2007, Anne van Kesteren wrote:
> 
>   http://dev.w3.org/2006/waf/access-control/

1.1 has an example that reads:

   Access-Control: <hello-world.invalid>

...which seems invalid.


"case-insensitive match" is defined poorly. If it is intended to _only_ be 
about swapping a-z for A-Z, it should say so explicitly, not in 
parentheses. If it is about full Unicode mapping, then it should be stated 
appropriately and the a-z part should be removed. Also, generally it is 
better to lowercase and compare than uppercase and compare, since in full 
Unicode cases the lowercase versions are more canonical iirc.
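
To make the two readings concrete, here is a minimal sketch in Python. The 
function names are hypothetical, and str.casefold() (full Unicode case 
folding) is only an assumed stand-in for "full Unicode mapping"; the spec 
itself does not say which operation it intends.

```python
def ascii_lower(s: str) -> str:
    """Map only A-Z to a-z; every other character is left untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def ascii_case_insensitive_match(a: str, b: str) -> bool:
    # Reading 1: the match is only about swapping a-z for A-Z.
    return ascii_lower(a) == ascii_lower(b)


def unicode_case_insensitive_match(a: str, b: str) -> bool:
    # Reading 2 (assumption): full Unicode case folding on both sides.
    return a.casefold() == b.casefold()
```

The two readings agree on pure ASCII input and diverge as soon as a 
non-ASCII letter appears, which is why the definition has to pick one.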


The algorithm to "obtain the values from a space-separated list" mixes its 
tenses. It starts in the simple present ("must replace"), and then 
switches to the present progressive ("dropping ... and then chopping"). 
The way it is phrased doesn't technically define how you obtain values, it 
defines how you replace characters, which for some reason involves 
chopping the string.
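
For comparison, a sketch of an algorithm that is phrased in terms of what 
it returns. The function name and the exact set of space characters are 
assumptions made for illustration, not taken from the spec.

```python
import re

# Assumed set of space characters: space, tab, LF, CR, FF.
SPACE_RUN = re.compile(r"[ \t\n\r\f]+")


def values_from_space_separated_list(s: str) -> list[str]:
    """Return the values in s, in order, with no empty entries."""
    # Splitting on runs of spaces leaves empty strings at the edges when
    # the input has leading or trailing spaces; those are filtered out.
    return [value for value in SPACE_RUN.split(s) if value]
```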


2.1 Access Item: "When the access item is used as part of the 
Access-Control HTTP header authors must specify the result of applying the 
ToASCII algorithm to the internationalized domain name as HTTP does not 
support Unicode." still doesn't make sense to me. The requirement is that 
the author provide a purely ASCII domain name, not that they take an IDN 
and apply ToASCII, IMHO.
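
The difference between the two phrasings can be sketched in Python. The 
helper names are hypothetical; the "idna" codec is Python's built-in 
IDNA 2003 implementation and is used here only to show what ToASCII 
would produce.

```python
def is_ascii_domain(domain: str) -> bool:
    # The requirement as suggested above: the header value must already
    # be a purely ASCII domain name.
    return domain.isascii()


def to_ascii(domain: str) -> str:
    # What an author could do to get from an IDN to such a value.
    return domain.encode("idna").decode("ascii")
```

A checker only ever needs is_ascii_domain(); whether the author got 
there by applying to_ascii() or by typing the ASCII form directly is not 
observable from the header.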


2.1 Access Item: Example "http://example.org:*" is said to be invalid but 
as far as I can tell it is valid.


Why is the "*." bit redundant in the domain part? How do I make sure 
something matches "livejournal.com" but not "ianhickson.livejournal.com"? 
There are numerous hosts where the subdomain space isn't trusted but where 
the hostname itself is secure, and "example.com" doesn't at all convey 
that all subdomains are also trusted. I think we should require 
"*.example.com" to indicate that subdomains are trusted.

It actually seems that even in the spec there is confusion about this, for 
example there is this example:

   Access-Control: allow <example.org> <*.example.org>
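
A minimal sketch of the matching rule proposed above, in Python. The 
function name is hypothetical, and it is an assumption that 
"*.example.com" does not also match the bare "example.com" (the spec 
example listing both suggests as much).

```python
def host_matches(pattern: str, host: str) -> bool:
    """Match a host against a domain pattern under the proposed rule."""
    pattern = pattern.lower()
    host = host.lower()
    if pattern.startswith("*."):
        suffix = pattern[1:]  # e.g. ".example.com"
        # Subdomains only: something non-empty must precede the suffix.
        return host.endswith(suffix) and len(host) > len(suffix)
    # Without "*.", only the exact hostname is trusted.
    return host == pattern
```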


2.4. Referer-Root (sic) HTTP header: Do we need to continue misspelling 
this?


3.1. Cross-site Access Request: "followed by the port (defaulting to the 
default port for the scheme) of the resource" -- it makes no sense to 
default the port in this case, since the resource had to have a port for 
the request to have been made in the first place.


3.1. Cross-site Access Request: "of the resource from which the request 
originated" -- is this true? Isn't it of the resource that the calling 
spec wants used as the origin? e.g. in XHR I would imagine that the actual 
URI used would be the origin, which (e.g. in the case of data: URIs) might 
not match the resource's own URI at all. The next paragraph seems to agree 
with me.


3.1. Cross-site Access Request: Does the referrer root URI include the 
port even if it is the default port?


3.1. Cross-site Access Request: what does "Specifications are strongly 
encouraged to define this in equivalent ways." mean?


3.1. Cross-site Access Request: "As this algorithm is used by other 
specifications, those specifications must ensure to handle all return 
values. Specifications may ignore "reason" if "error" is "true"." -- this 
paragraph makes no sense at this point. What algorithm? What return 
values? What are "reason" and "error"? I recommend, before this paragraph, 
giving an overview of what the algorithms can return.


3.1.1. Generic Cross-site Access Request Algorithms: "are same-origin" is 
not defined yet.


3.1.1. Generic Cross-site Access Request Algorithms: It's not clear which 
algorithm "this algorithm" is. The "Generic Cross-site Access Request 
Algorithms"? The "generic redirect steps"?


3.1.1. Generic Cross-site Access Request Algorithms: What does 
"transparently follow the redirect while observing the set of request 
rules" mean?


Tuples are denoted (like, this) not "like, this". (e.g. in 3.1.1. Generic 
Cross-site Access Request Algorithms.) In fact in general you seem to 
overuse quote marks -- I recommend only using them for strings, quotes, 
euphemisms, and sarcasm, not for variables and literals.


3.1.2. Cross-site GET Access Request: "Perform an access check" isn't 
defined yet nor hyperlinked. Same applies in "3.1.3. Cross-site Non-GET 
Access Request".


3.1.2. Cross-site GET Access Request: Why do you invent "current request 
URI"? It's just given the value of "request URI" and seems to only be used 
once, so why not just use "request URI"? I'm assuming this is related to 
the "macro" steps in "3.1.1. Generic Cross-site Access Request 
Algorithms", but it isn't clear to me how this all works. For example, 
those refer to "origin" but I don't know what origin that is.


3.1.3. Cross-site Non-GET Access Request: The first paragraph has the MUST 
for the list of steps, but the second paragraph confuses matters by being 
"in the way".


3.1.3. Cross-site Non-GET Access Request: What is the "target URI"?


3.1.3. Cross-site Non-GET Access Request: Again with the mention of 
"origin" -- whose origin? Where does it come from? It doesn't seem to be 
any of the arguments passed from the other spec.


"If there is a Method-Check-Expires  HTTP response headers that can be 
successfully parsed it must be honered." misspells "honored", but in any 
case it doesn't define what honoring it means. It should probably say 
instead that the entry must be removed once the current time exceeds the 
time specified by the header, or some such. I assume how to parse the 
header is defined somewhere?
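
One way to read "honoring" the header, sketched in Python. The class, its 
method names and the shape of the cache key are all hypothetical; parsing 
the header is left out, so the expiry time is passed in as a datetime.

```python
from datetime import datetime, timedelta, timezone


class AuthorizationRequestCache:
    """Hypothetical cache; entries are (key, expires) pairs."""

    def __init__(self):
        self._entries = []

    def add(self, key, expires):
        self._entries.append((key, expires))

    def purge(self, now):
        # Remove every entry once the current time exceeds its expiry.
        self._entries = [(k, e) for k, e in self._entries if now <= e]

    def allows(self, key, now):
        self.purge(now)
        return any(k == key for k, _ in self._entries)
```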


3.2. Access Control Check: "The second subsection of this section" is 
confusing. I couldn't tell if "this section" was section 3 or section 3.2, 
and whether the second subsection was 3.2, or 3.2.2. I'd just remove 
paragraphs that tell you what you're about to read, frankly.


The way you have the "temp method list" defined, you don't cache as 
much as you should. Consider a resource with the following:

   <?access-control allow="example.com" method="POST"?>
   <?access-control allow="example.com" method="PUT"?>
   <?access-control allow="example.com" method="DELETE"?>

Now imagine you do a POST followed by a PUT, followed by another POST. 
Ideally, we should send a single GET, and then the POST, and then the PUT, 
and then the final POST, because we know the PUT will succeed. However, 
instead, we will send a GET, a POST, another GET, a PUT, and then a POST.

I believe we should cache all the methods that are allowed, not just the 
methods of the access-control item that was matched.
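
The difference in traffic can be simulated with a short Python sketch. 
The RULES table mirrors the three PIs above; the simulate() function and 
its cache model are simplifying assumptions, not the spec's algorithm.

```python
RULES = [
    ("example.com", "POST"),
    ("example.com", "PUT"),
    ("example.com", "DELETE"),
]


def simulate(methods, origin, cache_all):
    """Return the list of requests sent for a sequence of non-GET methods."""
    cache = set()
    sent = []
    for method in methods:
        if method not in cache:
            sent.append("GET")  # authorization request
            allowed = {m for o, m in RULES if o == origin}
            if method not in allowed:
                continue  # access denied, the actual request is not sent
            # Either remember every allowed method, or only the one
            # belonging to the item that was matched.
            cache |= allowed if cache_all else {method}
        sent.append(method)
    return sent
```

With cache_all=False the sequence POST, PUT, POST costs two GETs; with 
cache_all=True it costs one.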


Incidentally, you should mention whether the authorization request cache 
can have multiple items with the same key. (It seems that it can.)


The rules for processing access-control PIs will drop any PI with a 
method="" pseudo-attribute at the moment. In fact the pseudo-attribute is 
generally not supported by the algorithm as far as I can tell.


The rules for processing access-control PIs look like they won't drop PIs 
with multiple pseudo-attributes of the same name other than exclude="". 
e.g. <?access-control allow="example.com" allow="example.com"?> doesn't 
get dropped by the current rules.
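
A sketch of the missing check, in Python. The regular expression is a 
simplified, assumed grammar for pseudo-attributes and the function names 
are hypothetical; the input is the PI's data, without the surrounding 
<?access-control and ?>.

```python
import re
from collections import Counter

# Simplified pseudo-attribute syntax: name="value" or name='value'.
PSEUDO_ATTR = re.compile(r"""([A-Za-z_][\w.-]*)\s*=\s*("[^"]*"|'[^']*')""")


def duplicated_pseudo_attributes(pi_data: str) -> list[str]:
    """Return the names that occur more than once, sorted."""
    names = [m.group(1) for m in PSEUDO_ATTR.finditer(pi_data)]
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


def should_drop(pi_data: str) -> bool:
    # Drop the PI if any pseudo-attribute name is repeated, whatever
    # the name is.
    return bool(duplicated_pseudo_attributes(pi_data))
```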


3.3. Access Item Check, step 1: This line is confusing. You are letting 
the algorithm's parameters be overwritten by undefined variables. I think 
you mean "let origin be..." and "let item be..." not the other way around.


3.3. Access Item Check, step 6: how can "origin" not have a scheme?


HTH,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 14 November 2007 19:14:23 UTC