Review of 22 Jan editor's draft from Thomas Roessler on 2008-01-23 (public-appformats@w3.org from January 2008)

From: Thomas Roessler <tlr@w3.org>
Date: Wed, 23 Jan 2008 19:21:10 +0100
To: public-appformats@w3.org
Message-ID: <20080123182110.GC316@iCoaster.does-not-exist.org>

My notes about last night's editor's draft...

- The abstract is somewhat hard to understand.

  Also, it mis-characterizes the document somewhat, as this is not
  about "enabling client-side cross-site requests", but about a
  mechanism by which web applications can communicate policies about
  cross-site requests and data sharing to other web applications.

- Introduction: I find much of the introduction chapter somewhat
  disorganized.  I'd like the document to start out by saying rather
  precisely what's going on, along these lines:
  
  	Web application technologies commonly apply same-origin
	restrictions to network requests.  These restrictions
	prevent a web application running from one origin from
	obtaining data retrieved from another origin, and also
	limit the number of unsafe HTTP requests that can be
	automatically launched toward destinations that differ from
	the running application's origin.
	
	In Web application technologies that follow this pattern,
	network requests typically use ambient authentication and
	session management information, including HTTP
	authentication and cookie information.
	
	This specification extends this model in several ways:
	
	* Web applications are enabled to annotate the data that are
	  returned in response to an HTTP request with a set of
	  origins that should be permitted to read that information
	  by way of the user's web browser.

	  The policy expressed through this set of origins is
	  enforced on the client.

	* Web user agents are enabled to discover whether a target
	  resource is prepared to accept cross-site HTTP requests
	  using unsafe methods (non-GET) from a set of origins.
	  
	  The policy expressed through this set of origins is
	  enforced on the client.

	* Server side applications are enabled to discover that an
	  HTTP request was deemed a cross-site request by the client
	  web user agent, through the Referer-Root HTTP header.
	  
	  This extension enables server side applications to enforce
	  limitations on the cross-site requests that they are
	  willing to service.

	This specification is a building block for other
	specifications, which will define the precise model by which
	this specification is used.  Examples for user
	specifications include extensions to XMLHttpRequest, XBL2, @@...

- Going on in the current introduction text, the specification
  doesn't define an access control policy, but an access control
  policy framework.  The use cases and requirements should move up
  into the introduction, or at least close to it.  The Note about
  form.submit() belongs somewhere in the design FAQ or security
  considerations.  The access control policy is *not* "defined in
  the resource" (except for XML documents).  "The client is trusted"
  is an awfully broad statement.  The text "The resource would look at
  follows" is followed by a snippet from an HTTP transaction.

- Conformance criteria: The document says awfully little in terms of
  "a specification that wants to use this framework needs to do the
  following things", even though section 2 claims so.
  
- "User agents MAY optimize..." is beside the point.  Instead: "User
  agents MAY employ any algorithm to implement this specification
  that leads to the exact same results as the algorithms included in
  this specification."

- There's a lot of very detailed stuff about white space separated
  lists going on in section 2; I'd rather see this dropped, and
  grammars and useful language used closer to the parsing steps.

- The security considerations should ideally be a discussion of
  security effects, i.e., "can trigger GET, and here's why this is
  harmless"; "we care about POST, because".  Instead, there's a lot
  of normative material clumped together in this section that would
  better go to the places where actual processing is described.

- "Authors sharing content with domains that are on shared hosting
  environments" rather misses the point by just talking about ports:
  Namely, that -- because we assume that the protocol / technology
  that hosts the access-control framework uses a same-origin policy
  -- authorizations can only be given with a granularity of origins.
  Anything below that is futile.

- "evil" applications don't really have a place here, maybe talk
  about an attacker.  "Authors SHOULD ensure that GET ..." is
  re-stating HTTP; that should be rephrased as an admonishment to
  adhere to the HTTP spec's semantics.

- "Authors are encouraged to check the Referer-Root HTTP header" --
  this should be somewhere in the processing model, not a side
  remark in the security considerations.  It *is* an additional
  policy enforcement point, and should be called out clearly.
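
  For illustration only, here is a rough Python sketch of what such a
  server-side enforcement point might look like.  Everything in it is
  an assumption on my part: the function name, the idea that
  Referer-Root carries a scheme://host[:port] value, and the choice to
  treat a missing header as "not flagged as cross-site" are all
  hypothetical, not taken from the draft.

```python
from typing import Mapping, Set


def is_request_allowed(headers: Mapping[str, str],
                       own_origin: str,
                       trusted_origins: Set[str]) -> bool:
    """Decide on the server whether to service a request.

    Assumption (not from the draft): Referer-Root carries the
    requesting origin as scheme://host[:port].
    """
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value.strip()
                  for name, value in headers.items()}
    referer_root = normalized.get("referer-root")
    if referer_root is None:
        # Assumption: no header means the user agent did not flag the
        # request as cross-site.
        return True
    root = referer_root.lower().rstrip("/")
    if root == own_origin.lower():
        return True
    # Cross-site request: service it only for explicitly trusted origins.
    return root in {origin.lower() for origin in trusted_origins}
```

  A deployment that wants to be stricter could just as well reject
  requests that lack the header; that is a policy choice the
  processing model would need to spell out.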

- The design seems a bit inconsistent about IDNs: The syntax permits
  them, but HTTP doesn't; the latter is called out in a note.  I'd
  rather see that done consistently.  When speaking about IDNs, it
  might be useful to adapt the A-label and U-label terminology from
  this I-D:
  
  http://tools.ietf.org/html/draft-klensin-idnabis-issues-05

- "If the scheme omitted it will match" is normative language, but
  looks as though it's formatted as a note.  Or maybe I'm just
  confused about the formatting.  Oh, and the grammar is wrong.

- The Access-Control production continues to use comma-separated
  method identifiers.  Also, shouldn't there be at least one method
  given?

- "In case resources on a domain are not in control..." mixes a use
  case and processing rules into the middle of a syntax description,
  and is generally quite a mess. Please make a pass through the
  document to give it a useful structure.

- '"allow rules" can be used to allow read access ...' sounds like a
  remnant from the old voice browser spec.  At this point, I believe
  that the syntax description should limit itself to describing a
  (multivalued) mapping from authorized origins to methods, with the
  specific exception that GET is used to generically determine
  access to the data returned, no matter what method was used to
  retrieve these data.
  
  (Incidentally, that's a point that is going to confuse policy
  authors without end.  Maybe we need something different here.)
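
  To make the mapping concrete, here is a minimal Python sketch.  The
  type alias and both function names are hypothetical, and the
  example origins are made up; the only thing taken from the text
  above is that GET doubles as the generic "may read the response"
  permission.

```python
from typing import Dict, Set

# Multivalued mapping from an authorized origin to the HTTP methods
# that origin may use (hypothetical representation).
Policy = Dict[str, Set[str]]


def may_send(policy: Policy, origin: str, method: str) -> bool:
    """True if the origin is authorized to issue this method."""
    return method in policy.get(origin, set())


def may_read_response(policy: Policy, origin: str) -> bool:
    """Read access is governed by GET, whatever method was used."""
    return "GET" in policy.get(origin, set())


policy: Policy = {
    "http://a.example": {"GET", "POST"},
    "http://b.example": {"POST"},
}
```

  With this policy, http://b.example may send a POST but may not
  read what comes back, which is exactly the kind of outcome I expect
  policy authors to trip over.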

- '"method" rule' is oddly phrased.

- 4.4 says what the syntax of the Referer-Root header is.  It would
  be useful to point out here when that header is transmitted.  In
  particular, "in case the Referer header is not included" makes it
  sound as though user agents had a choice between these headers.

- 5.1, cross-site access request.  The English grammar of the first
  paragraph needs improvement.

- The processing model confuses user agent behavior and input that
  is given to user agent behavior to be specified elsewhere.  That
  doesn't make things particularly easy to read.

- "The referrer root URI ..." assumes an HTTP-like URI syntax.
  That's not necessarily present everywhere.  Needs clean-up!
  
- Much of the processing model is phrased in terms of forward
  references to generic steps.  I find this pseudo-code like
  configuration style extremely hard to read, and suspect that it'll
  make useful security review more difficult than necessary.

- Why is the authorization request cache mandatory?

- The authorization request cache isn't actually an authorization
  request cache, but an authorization decision cache.  The current
  name is confusing at least.

- There is no discussion as to how Vary or Cache-control headers on
  HTTP responses that were received are handled.  How do these
  interact with the separate caching model specified here?

- Why does the specification follow redirects upon OPTIONS?  If I
  read RFC 2616 correctly, then redirects for HTTP methods other
  than GET and HEAD shouldn't happen without user intervention.
  
  The current specification material around redirects looks like
  it's pseudo-code ripped out of context; this needs more work to be
  comprehensible, and a clear explanation of what the expectations are
  for a hosting specification.  Either the processing model or the
  security considerations should explain very clearly what tradeoffs
  a hosting specification faces in specifying any behavior
  concerning redirects.

- The access control check algorithm goes to an excruciating level
  of detail, while confusing the reader.  It is probably much easier
  to write up how to parse the various headers into the mapping from
  origins to methods, and how to deal with that.

- Once more, we have forward references to generic material,
  undeclared variables used to pass around information between
  different sections, and a general lack of readability.  For
  example, temp method list isn't temporary, is not introduced before
  its first appearance, and only specified in the "allow list check"
  section.

- "parse ... using a streaming XML parser" -- I'm pretty sure you
  don't mean to prescribe use of a streaming XML parser, but rather
  want to allow use of one, right?

- In the allow list check, item 3 of the algorithm looks like it's
  wrong.  This actually prunes the list of methods that are added to
  the temp method list depending on the current request's method.
  Also, this item has bad grammar.

- Having atomic steps like "set the allow access flag to true"
  (point 5) might be a useful technique in programming.  In English
  text, it doesn't actually help understand the algorithm.

- Starting at item 10 of the access item check algorithm, we go into
  defining how domain names are parsed and compared.  That can be
  said in much shorter terms by referring to terms from the relevant
  specs. Roughly: Origin and item are converted to ASCII.  They are
  compared case-insensitively, with the additional property that
  the leftmost label of item might be "*", and can match an
  arbitrary number of labels.  (Or something like this.)
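
  As a rough sketch of that shorter formulation, in Python.  The
  function names are hypothetical, Python's built-in "idna" codec
  (IDNA 2003) stands in for "converted to ASCII", and I assume here
  that "*" has to match at least one label; whether it may match zero
  labels is something the spec would have to decide.

```python
from typing import List


def to_ascii_labels(name: str) -> List[str]:
    """Split a host name into lower-cased ASCII labels."""
    labels = name.rstrip(".").split(".")
    return [
        label if label == "*"
        # The "idna" codec converts a non-ASCII label to its ASCII form.
        else label.encode("idna").decode("ascii").lower()
        for label in labels
    ]


def origin_matches(origin_host: str, item: str) -> bool:
    """Compare an origin host against an item that may start with '*'."""
    origin = to_ascii_labels(origin_host)
    pattern = to_ascii_labels(item)
    if pattern and pattern[0] == "*":
        suffix = pattern[1:]
        # Assumption: "*" stands for one or more leading labels.
        return (len(origin) > len(suffix)
                and origin[len(origin) - len(suffix):] == suffix)
    return origin == pattern
```

  For example, origin_matches("a.b.example.org", "*.example.org")
  holds, while origin_matches("example.org", "*.example.org") does
  not under the one-or-more assumption.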

- The requirements tend to confuse authentication and authorization.
  E.g., under 1, you're really talking about deployments that base
  their authorization decisions exclusively on somebody being on the
  right side of a firewall.

- In the part that talks about cross-site POST, it might be useful
  to speak of UPNP as a possible target.

- "Should not fail to properly enforce security policy..." sounds
  like a copy of requirement 13 later on.

- I continue to disagree with requirement 3, "must be deployable to
  existing..."; this is highly dependent on the circumstances of a
  particular deployment.  I suggest saying clearly what is really
  meant.  E.g., what abilities should be sufficient in order to
  deploy the thing -- like, ability to write to XML files.

- requirement 4 (more "easily deploy" that I actually disagree with)
  only holds for XML content.  Please qualify this requirement.

- req 6 could use some elaboration. The current text could be
  misread to say that the authorized party should be identified with
  resource-level granularity, which we know is a bad idea.

- req 9 is somewhere between an implementation requirement and a use
  case.  Strikes me as somewhat weirdly phrased.

- req 10 effectively says "APIs for cross-site data access
  shouldn't differ from those for same-origin data access"; I'd
  suggest changing to that.

- req 12 is badly worded.  I suspect it means "shouldn't break
  HTTP".  If there's more to it, please express that more clearly.

Regards,
-- 
Thomas Roessler, W3C  <tlr@w3.org>
Received on Wednesday, 23 January 2008 18:27:00 UTC