Re: Review of 22 Jan editor's draft from Anne van Kesteren on 2008-02-04 (public-appformats@w3.org from February 2008)

From: Anne van Kesteren <annevk@opera.com>
Date: Mon, 04 Feb 2008 13:07:35 +0100
To: "Thomas Roessler" <tlr@w3.org>
Cc: "WAF WG (public)" <public-appformats@w3.org>
Message-ID: <op.t5z1axcw64w2qv@annevk-t60.oslo.opera.com>
On Wed, 30 Jan 2008 00:46:21 +0100, Thomas Roessler <tlr@w3.org> wrote:
>>> - There's a lot of very detailed stuff about white space
>>>   separated lists going on in section 2; I'd rather see this
>>>   dropped, and grammars and useful language used closer to the
>>>   parsing steps.
>
>> I rather not try to change this at this point at the risk of
>> introducing errors.
>
> I disagree.  The over-precise description that's currently in there
> probably has a *greater* risk of leading to errors than a more
> abstract specification style.

Looking at the text in question I don't see what's problematic with it.  
It's a rather simple definition. We use similar definitions in XBL 2.0 and  
nobody has had a problem with it there either.


> The clauses that were - in the 22 Jan version - listed in the bullet
> points under 3 are, to my knowledge, not repeated in the processing
> model.
>
> Please don't mix the processing model into the security
> considerations.

Those considerations are external to the processing model.


>>> - "Authors sharing content with domains that are on shared hosting
>>>   environments" rather misses the point by just talking about ports:
>>>   Namely, that -- because we assume that the protocol / technology
>>>   that hosts the access-control framework uses a same-origin policy
>>>   -- authorizations can only be given with a granularity of origins.
>>>   Anything below that is futile.
>>
>> Origin includes ports (and schemes).
>
> We're talking past each other here.  You are calling out a specific
> way in which policy authors may extend authorizations to more
> origins than intended.  I was concerned about a section that should
> explain the limitations of what an origin is.

I have a hard time following what you're saying. Are you saying you want a  
new section that explains what "origin" means?


>>> - "Authors are encouraged to check the Referer-Root HTTP header" --
>>>   this should be somewhere in the processing model, not a side
>>>   remark in the security considerations.  It *is* an additional
>>>   policy enforcement point, and should be called out clearly.
>
>> The processing model is not for authors.
>
> Policy authors *must* be able to understand the processing model --
> otherwise, they won't understand the effects that their policies
> have, or how to deploy them.
>
> (However, your statement explains the current style, and is rather
> worrisome.)

I don't believe most authors will read the specification. And those that  
do will be able to understand what is in there.


>>> - The design seems a bit inconsistent about IDNs: The syntax permits
>>>   them, but HTTP doesn'tl the latter is called out in a note.  I'd
>>>   rather see that done consistently.  When speaking about IDNs, it
>>>   might be useful to adapt the A-label and U-label terminology from
>>>   this I-D:
>>>  http://tools.ietf.org/html/draft-klensin-idnabis-issues-05
>
>> Do you have any concrete suggestions?
>
> Have you actually followed the reference?

I couldn't really figure out what you meant with your comment because I  
believe we're not inconsistent about IDNs. And using terminology from a  
draft is not possible unless we define the terms ourselves and I don't see  
why that would be necessary.


>>> - '"method" rule' is oddly phrased.
>
>> I couldn't figure out where this was referring to.
>
> What you call "method rule" isn't actually a rule, but rather a
> phrase or a clause -- pick one, but don't call it rule.

I don't use "method rule". There is some other usage of rule in there  
though, but I believe that was taken from RFC 2616.


> "The referrer root URI is the scheme followed by ://, followed by
> the domain without any trailing ...." and so on -- that construction
> assumes that the URI syntax in use is HTTP-like.  Not every possible
> URI syntax is.
>
> So, this needs to either be generalized, or it needs to be said that
> the referrer root URI is undefined for URI schemes that different
> from http, ftp, and file.
>
> (Example: The referrer might be a URN or an about: URI. What now?)

It already mentions what to do in case the resource does not have a  
host-based authority.


> I would rather see the processing model phrased in more natural
> language than the current one.

I would rather keep it this way.


>>> - Why is the authorization request cache mandatory?
>
>> Is there a good reason to make it optional?
>
> The algorithm can function without it.  That suggests optional.
> Also, the way the cache handling is specified is suboptimal to say
> the least (see below).

Ok.


>>> - There is no discussion as to how Vary or Cache-control headers on
>>>   HTTP responses that were received are handled.  How do these
>>>   interact with the separate caching model specified here
>>
>> OPTIONS is not cached so those headers are not relevant.
>
> Consider an XML document with the processing instruction that has
> its own set of cache-control headers.  What assumptions are we
> making about the interaction between the different caches?

I don't see how an XML document is relevant here.


> Well, simply that a discussion of how a hosting protocol deals with
> these things, in generic terms, would be very useful.

Ok.


>>> - The access control check algorithm goes to an excrutiating level
>>>   of detail, while confusing the reader.  It is probably much easier
>>>   to write up how to parse the various headers into the mapping from
>>>   origins to methods, and how to deal with that.
>
>> At least for me, it wasn't.
>
> Well, there have been several proposals to phrase this differently.

That were complete and accurate? Pointer?


>>> - Once more, we have forward references to generic material,
>>>   undeclared variables used to pass around information between
>>>   different sections, and a general lack of readability.  For
>>>   example, temp method list isn't temporary, not introduced before
>>>   its first appearance, and only specified in the "allow list check"
>>>   section.
>
>> "temp method list" is introduced well in advance.
>
> It's not introduced in a useful way.  In particular, there is no
> indication in section 5.2.1 that the "allow list check" step changes
> temp method list as a side effect.  If this was code, it would be
> seriously bad programming style.  Here, it's seriously bad
> spec-writing style.

I suppose I could add a note.


>>> - "parse ... using a streaming XML parser" -- I'm pretty sure you
>>>   don't mean to prescribe use of a streaming XML parser, but rather
>>>   want to allow use of one, right?
>
>> I do mean to prescribe the use of an XML parser. Otherwise
>> results would be inconsistent with different clients picking
>> different strategies.
>
> The emphasis is on *streaming*.  Please spell out what requirements
> you specifically expect such a parser to fulfill, beyond being an
> XML Processor.
>
>   http://www.w3.org/TR/xml/#dt-xml-proc

Those requirements are mentioned.


>>> - In the allow list check, item 3 of the algorithm looks like it's
>>>   wrong.  This actually prunes the list of methods that are added to
>>>   the temp method list depending on the current request's method.
>
>> I don't think the specification is wrong here. Could you elaborate?
>
> The authorization decision cache is updated when an access check is
> performed.  This update maps an (origin, request URI) pair to a
> (method list, expiry) pair.
>
> The method list is calculated in the access control check.
>
> Consider this access control policy:
>
>  allow a.com method POST PUT  (1)
>  allow a.com method DELETE  (2)
>
> An access control check is performed for a cross-origin POST from
> a.com.  In this case, step 3 will cause rule (2) to be skipped.  The
> authorization decision cache will therefore record (1) only.

A yes. Step 4 has to occur before step 3. I'm not sure why I hadn't seen  
that before.


> A subsequent DELETE will *remove* the entry for POST and PUT and
> cause a new access control check.  Now, the policy for DELETE will
> be stored.
>
> A subsequent PUT will *remove* the entry for DELETE, and cause a new
> access control check.
>
> In short, the handling in step 3 is inconsistent with the aim of the
> cache, an actually suggests that the cache design needs some serious
> re-thinking.

I think it's solved now actually.


> E.g., if a caching model needs to be specified, then it might be
> more useful to use the triple (origin, request URI, method) as a key
> for the cache.

That doesn't seem necessary.


> I'm sorry, but that doesn't address my comment.  Please refrain from
> writing pseudo-code and write English language instead.  The
> specification's suitability for review is an important piece of
> specification quality.

We've been over this several times. Some people are not ok with how the  
specification style is done and some people are. So far we have no  
replacement text and tweaking the current text will only lead to errors.  
I've seen that happening enough with CSS 2.1 and I rather not fall in the  
same pit.


> The algorithm given re-specifies parts of the URI and DNS
> specifications, is opaque, and requires review for correctness from
> the perspective of these specifications.  A high-level description
> which says that the domain names are compared, with the "*"
> wildcard, is simply not subject to this class of errors.

It is subject to introducing a whole new class of errors that the current  
text doesn't have. And the current text has been written with URI and DNS  
specifications in mind.


>> The requirements recently got changed. Is this still an issue?
>
> Yes.

I left out replies to comments on the requirements section as they seem to  
be being addressed by separate threads. I hope that's ok. If it's not ok  
could someone else in the WG please have a look at those comments and see  
what we need to do with them?


-- 
Anne van Kesteren
<http://annevankesteren.nl/>
<http://www.opera.com/>
Received on Monday, 4 February 2008 12:04:02 UTC