Re: More comments on access-control from Anne van Kesteren on 2007-11-19 (public-appformats@w3.org from November 2007)

From: Anne van Kesteren <annevk@opera.com>
Date: Mon, 19 Nov 2007 23:28:32 +0100
To: "Ian Hickson" <ian@hixie.ch>
Cc: "WAF WG (public)" <public-appformats@w3.org>
Message-ID: <op.t118puop64w2qv@annevk-t60.oslo.opera.com>

Thanks a lot, Ian! A few questions below.


On Wed, 14 Nov 2007 20:14:03 +0100, Ian Hickson <ian@hixie.ch> wrote:
> On Wed, 14 Nov 2007, Anne van Kesteren wrote:
>>   http://dev.w3.org/2006/waf/access-control/
>
> 1.1 has an example that reads:
>
>    Access-Control: <hello-world.invalid>
>
> ...which seems invalid.

Fixed.


> "case-insensitive match" is defined poorly. If it is intended to _only_
> be about swapping a-z for A-Z, it should say so explicitly, not in
> parentheses. If it is about full Unicode mapping, then it should be
> stated appropriately and the a-z part should be removed. Also, generally
> it is better to lowercase and compare than uppercase and compare, since
> in full Unicode cases the lowercase versions are more canonical iirc.

Fixed.
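
For illustration, a minimal sketch of the a-z/A-Z-only reading of "case-insensitive match", lowercasing before comparing as suggested above. The helper names are hypothetical and not taken from the draft; whether the draft ended up with exactly this behavior is an assumption.

```python
import string

# Translation table that maps only A-Z to a-z. Every other character,
# including non-ASCII letters, is left untouched.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(value: str) -> str:
    """Lowercase the ASCII letters A-Z only (hypothetical helper)."""
    return value.translate(_ASCII_LOWER)


def ascii_case_insensitive_match(a: str, b: str) -> bool:
    """Compare two strings after lowercasing their ASCII letters."""
    return ascii_lower(a) == ascii_lower(b)
```

Under this reading "Example.ORG" matches "example.org", while "É" and "é" do not match each other, since no Unicode case mapping is applied.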


> The algorithm to "obtain the values from a space-separated list" mixes  
> its tenses. It starts in the simple present ("must replace"), and then
> switches to the present progressive ("dropping ... and then chopping").
> The way it is phrased doesn't technically define how you obtain values,  
> it defines how you replace characters, which for some reason involves
> chopping the string.

Any idea how you're going to change
http://www.w3.org/TR/2007/CR-xbl-20070316/#attributes0? That's pretty
much the text I'm reusing.


> 2.1 Access Item: "When the access item is used as part of the
> Access-Control HTTP header authors must specify the result of applying
> the ToASCII algorithm to the internationalized domain name as HTTP does
> not support Unicode." still doesn't make sense to me. The requirement is
> that the author provide a purely ASCII domain name, not that they take an
> IDN and apply ToASCII, IMHO.

Fixed.


> 2.1 Access Item: Example "http://example.org:*" is said to be invalid but
> as far as I can tell it is valid.

Fixed.


> Why is the "*." bit redundant in the domain part? How do I make sure
> something matches "livejournal.com" but not "ianhickson.livejournal.com"?

   allow <livejournal.com> exclude <ianhickson.livejournal.com>

or, more generically,

   allow <livejournal.com> exclude <*.livejournal.com>


> There are numerous hosts where the subdomain space isn't trusted but
> where the hostname itself is secure, and "example.com" doesn't at all
> convey that all subdomains are also trusted. I think we should require
> "*.example.com" to indicate that subdomains are trusted.

Writing

   allow <example.com> <*.example.com>

was expected to be the general case and therefore we previously decided to  
go for

   allow <example.com>

to address that case. I'm not really comfortable with revisiting that once  
again.
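
A rough sketch of the matching behavior these examples imply. It assumes that a bare domain such as <example.com> covers the host and all of its subdomains, that a "*." prefix covers subdomains only, and that exclude takes precedence over allow. These rules are read off the examples in this message, not quoted from the draft, and the function names are hypothetical. Scheme and port are ignored.

```python
def pattern_matches(pattern: str, host: str) -> bool:
    """Match a host against one domain pattern (assumed semantics)."""
    pattern, host = pattern.lower(), host.lower()
    if pattern.startswith("*."):
        # Assumed: "*.example.com" covers subdomains only.
        # pattern[1:] is ".example.com", which the bare host never ends with.
        return host.endswith(pattern[1:])
    # Assumed: bare "example.com" covers the host and its subdomains.
    return host == pattern or host.endswith("." + pattern)


def is_allowed(host, allow, exclude=()):
    """Return True if host matches an allow pattern and no exclude pattern."""
    if any(pattern_matches(p, host) for p in exclude):
        return False
    return any(pattern_matches(p, host) for p in allow)
```

With these assumptions, `is_allowed("livejournal.com", ["livejournal.com"], ["*.livejournal.com"])` is true while the same call for "ianhickson.livejournal.com" is false.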


> It actually seems that even in the spec there is confusion about this,  
> for example there is this example:
>
>    Access-Control: allow <example.org> <*.example.org>

Unfortunately, examples are not always updated when the normative text
is. Fixed now.


> 2.4. Referer-Root (sic) HTTP header: Do we need to continue misspelling
> this?

It seems more consistent with the existing header.


> 3.1. Cross-site Access Request: "followed by the port (defaulting to the
> default port for the scheme) of the resource" -- it makes no sense to
> default the port in this case, since the resource had to have a port for
> the request to have been made in the first place.

Fixed.


> 3.1. Cross-site Access Request: "of the resource from which the request
> originated" -- is this true? Isn't it of the resource that the calling
> spec wants used as the origin? e.g. in XHR I would imagine that the  
> actual URI used would be the origin, which (e.g. in the case of data:  
> URIs) might not match the resource's own URI at all. The next paragraph  
> seems to agree with me.

Fixed.


> 3.1. Cross-site Access Request: Does the referrer root URI include the
> port even if it is the default port?

That's what the definition says, no?
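
To make the point concrete, here is a sketch of building a referrer root URI that always carries a port, falling back to the scheme's default when the URI has none. The serialization (scheme, "://", host, ":", port) and the function name are assumptions for illustration; only http and https are handled.

```python
from urllib.parse import urlsplit

# Default ports for the schemes this sketch knows about.
DEFAULT_PORTS = {"http": 80, "https": 443}


def referrer_root_uri(uri: str) -> str:
    """Return scheme://host:port, always including the port (assumed form)."""
    parts = urlsplit(uri)
    scheme = parts.scheme.lower()
    # Use the explicit port if present, otherwise the scheme default.
    # An unknown scheme raises KeyError in this sketch.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[scheme]
    return f"{scheme}://{parts.hostname}:{port}"
```

So "http://example.org/path" yields a root that includes ":80" even though the port was never written out.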


> 3.1. Cross-site Access Request: what does "Specifications are strongly
> encouraged to define this in equivalent ways." mean?

I reworded this. The intention is that specifications base it on the same  
"source" as much as possible.


> 3.1. Cross-site Access Request: "As this algorithm is used by other
> specifications, those specifications must ensure to handle all return
> values. Specifications may ignore "reason" if "error" is "true"." -- this
> paragraph makes no sense at this point. What algorithm? What return
> values? What are "reason" and "error"? I recommend, before this  
> paragraph, giving an overview of what the algorithms can return.

Done.


> 3.1.1. Generic Cross-site Access Request Algorithms: "are same-origin" is
> not defined yet.

Defined.


> 3.1.1. Generic Cross-site Access Request Algorithms: It's not clear which
> algorithm "this algorithm" is. The "Generic Cross-site Access Request
> Algorithms"? The "generic redirect steps"?

Reworded.


> 3.1.1. Generic Cross-site Access Request Algorithms: What does
> "transparently follow the redirect while observing the set of request
> rules" mean?

It somehow needs to point back to the algorithm that invoked it where  
there is a list of "request rules" which define what to do in case of a  
network error, redirect, etc.


> Tuples are denoted (like, this) not "like, this". (e.g. in 3.1.1. Generic
> Cross-site Access Request Algorithms.)

Fixed.


> In fact in general you seem to
> overuse quote marks -- I recommend only using them for strings, quotes,
> euphemisms, and sarcasm, not for variables and literals.

If you have suggestions for what to use instead, that would be welcome. I'm
often wondering what would be best to use in a particular case.


> 3.1.2. Cross-site GET Access Request: "Perform an access check" isn't
> defined yet nor hyperlinked. Same applies in "3.1.3. Cross-site Non-GET
> Access Request".

Fixed.


> 3.1.2. Cross-site GET Access Request: Why do you invent "current request
> URI"? It's just given the value of "request URI" and seems to only be  
> used once, so why not just use "request URI"?

The idea is that "current request URI" is updated during a redirect and  
"request URI" always points to the initial starting point. I suppose we  
could just update "request URI" along the way. I wasn't sure if that would  
be confusing or not.
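
A small sketch of the two-variable approach described here: "request URI" stays fixed at the starting point while "current request URI" advances on each redirect. The `redirects` mapping stands in for the network and is purely hypothetical, as is the hop limit.

```python
def follow_redirects(request_uri, redirects, max_hops=10):
    """Follow redirects, keeping the initial URI and the current one apart.

    `redirects` is a hypothetical dict mapping a URI to its redirect target;
    a URI that is not a key is treated as the final response.
    """
    current_request_uri = request_uri
    for _ in range(max_hops):
        target = redirects.get(current_request_uri)
        if target is None:
            # No further redirect: report both the start and the end point.
            return request_uri, current_request_uri
        current_request_uri = target
    raise RuntimeError("too many redirects")
```

Updating "request URI" in place instead would lose the first element of the returned pair, which is the trade-off raised above.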


> I'm assuming this is related to
> the "macro" steps in "3.1.1. Generic Cross-site Access Request
> Algorithms", but it isn't clear to me how this all works. For example,
> those refer to "origin" but I don't know what origin that is.

That's defined at the start of the algorithm that invokes it. It's the  
referrer root URI.


> 3.1.3. Cross-site Non-GET Access Request: The first paragraph has the
> MUST for the list of steps, but the second paragraph confuses matters by
> being "in the way".

Reordered.


> 3.1.3. Cross-site Non-GET Access Request: What is the "target URI"?

A typo, I think. I can no longer find it. It probably should've been
"request URI".


> 3.1.3. Cross-site Non-GET Access Request: Again with the mention of
> "origin" -- whose origin? Where does it come from? It doesn't seem to be
> any of the arguments passed from the other spec.

It is defined at the start of the algorithm, no? "Let origin be the  
referrer root URI."


> "If there is a Method-Check-Expires  HTTP response headers that can be
> successfully parsed it must be honered." misspells "honored", but in any
> case it doesn't define what honoring it means. It should probably say
> instead that the entry must be removed once the current time exceeds the
> time specified by the header, or some such.

Tried a fix.


> I assume how to parse the header is defined somewhere?

It's no better defined than the HTTP-date production (also used in the  
Expires header). I'm afraid to look into that.
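
As a sketch of the suggested behavior, the following parses the header value as an HTTP-date and treats a cache entry as expired once the current time exceeds it. Python's `email.utils.parsedate_to_datetime` is used as a stand-in parser; it is more lenient than the HTTP-date production. Treating an unparsable value as already expired is an assumption, and the function names are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_http_date(value):
    """Parse an HTTP-date style string; return None if it cannot be parsed."""
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        # Treat a date without zone information as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def entry_expired(expires_header, now):
    """Return True once `now` is past the time given by the header."""
    expires = parse_http_date(expires_header)
    if expires is None:
        # Assumption: an unparsable header means the entry is not kept.
        return True
    return now > expires
```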


> 3.2. Access Control Check: "The second subsection of this section" is
> confusing. I couldn't tell if "this section" was section 3 or section
> 3.2, and whether the second subsection was 3.2, or 3.2.2. I'd just remove
> paragraphs that tell you what you're about to read, frankly.

Ok.


> The way you have the "temp method list" defined, you don't cache as
> much as you should. Consider a resource with the following:
>
>    <?access-control allow="example.com" method="POST"?>
>    <?access-control allow="example.com" method="PUT"?>
>    <?access-control allow="example.com" method="DELETE"?>
>
> Now imagine you do a POST followed by a PUT, followed by another POST.
> Ideally, we should send a single GET, and then the POST, and then the
> PUT, and then the final POST, because we know the PUT will succeed.
> However, instead, we will send a GET, a POST, another GET, a PUT, and
> then a POST.

Actually, the idea *was* that the PUT would simply not be allowed, with the
error flag set to "fail" and "detail" set to "network". I guess we should
revisit that. See below.


> I believe we should cache all the methods that are allowed, not just the
> methods of the access-control item that was matched.

Ok, so the idea is to keep "looping" and adding methods to the list when  
it's ok?
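
A sketch of that "keep looping" idea: walk every access-control item and collect the methods of all items that match the origin, instead of stopping at the first match. The rule representation (pairs of allowed host and method) and the exact-host comparison are simplifying assumptions, not the draft's data model.

```python
def allowed_methods(rules, origin_host):
    """Collect every method allowed for origin_host across all rules.

    `rules` is a hypothetical list of (allowed_host, method) pairs, one per
    access-control item. Matching is a plain host comparison for brevity.
    """
    methods = set()
    for allowed_host, method in rules:
        if allowed_host == origin_host:
            # Keep going after a match so later items also contribute.
            methods.add(method.upper())
    return methods
```

For the three processing instructions quoted above this yields POST, PUT and DELETE in one pass, so a single cached entry would be enough to let the later PUT through without another GET.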


> Incidentally, you should mention whether the authorization request cache
> can have multiple items with the same key. (It seems that it can.)

The idea is that you can't. When would this be possible?


> The rules for processing access-control PIs will drop any PI with a
> method="" pseudo-attribute at the moment. In fact the pseudo-attribute is
> generally not supported by the algorithm as far as I can tell.

Fixed.


> The rules for processing access-control PIs look like they won't drop PIs
> with multiple pseudo-attributes of the same name other than exclude="".
> e.g. <?access-control allow="example.com" allow="example.com"?> doesn't
> get dropped by the current rules.

Duplicate pseudo-attributes are a parse error per the <?xml-stylesheet?>  
specification.
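
For illustration, a simplified pseudo-attribute parser that rejects duplicates, in line with the statement above. It is a sketch only: it ignores character references and other details of the <?xml-stylesheet?> rules, and the function name and error type are hypothetical.

```python
import re

# name = "value" or name = 'value', with optional surrounding whitespace.
_PSEUDO_ATTR = re.compile(
    r"""\s*([A-Za-z_][\w.-]*)\s*=\s*(?:"([^"]*)"|'([^']*)')"""
)


def parse_pseudo_attributes(data):
    """Parse PI data into a dict, raising ValueError on duplicates."""
    attrs = {}
    pos = 0
    while data[pos:].strip():
        match = _PSEUDO_ATTR.match(data, pos)
        if match is None:
            raise ValueError("malformed pseudo-attribute")
        name = match.group(1)
        # Exactly one of the two value groups is set, depending on quoting.
        value = match.group(2) if match.group(2) is not None else match.group(3)
        if name in attrs:
            raise ValueError("duplicate pseudo-attribute: " + name)
        attrs[name] = value
        pos = match.end()
    return attrs
```

Given `allow="example.com" allow="example.com"`, this raises an error, so such a PI would be dropped before the access-control rules ever see it.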


> 3.3. Access Item Check, step 1: This line is confusing. You are letting
> the algorithm's parameters be overwritten by undefined variables. I think
> you mean "let origin be..." and "let item be..." not the other way around.

Fixed.


> 3.3. Access Item Check, step 6: how can "origin" not have a scheme?

Fixed.

Kind regards,


-- 
Anne van Kesteren
<http://annevankesteren.nl/>
<http://www.opera.com/>
Received on Monday, 19 November 2007 22:28:30 UTC