Some thoughts on XCAP's resource architecture

During the DC IETF, I expressed some reservations about XCAP to Ted and 
Jonathan. Jonathan asked me to send a message to the SIMPLE list with 
my comments, so here it is...

Based on the traffic on the mailing list, it appears that XCAP is 
supposed to be an extension or profile of HTTP, rather than just a 
protocol that mimics the HTTP interaction style, and that as such it 
is intended to be compatible with other extensions of HTTP.  I'm 
concerned that the current architecture of XCAP makes this difficult.  
In particular, the XCAP resource ontology and the URL addressing style 
that goes with it shift the HTTP design along two major axes:

1) Resource granularity
2) Dependencies between resources

The first shift is in the size and number of resources.  Because the 
path part of the URL allows for XML node selection, there are many 
more resources for a given volume of content material.  This affects 
us in a number of ways.
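
Before going through those, here is a rough sketch of what the 
blow-up looks like.  The document, the element counts, and the 
"~~"-style separator between the document part and the node part of 
the URL are all just illustrative assumptions, not taken from the 
current draft:

    # Python sketch: count how many node-level resources one small
    # XCAP-style document implies, assuming every element and every
    # attribute is separately addressable through the node selector.
    import xml.etree.ElementTree as ET

    doc = """<resource-lists>
      <list name="friends">
        <entry uri="sip:bob@example.com"><display-name>Bob</display-name></entry>
        <entry uri="sip:joe@example.com"><display-name>Joe</display-name></entry>
      </list>
    </resource-lists>"""

    root = ET.fromstring(doc)
    elements = list(root.iter())
    attributes = sum(len(e.attrib) for e in elements)

    print("document-level resources:", 1)
    print("node-level resources:", len(elements) + attributes)   # 9 here

A tiny two-entry list already turns one document into nine addressable 
resources, and real lists are much bigger.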

1a) HTTP intermediaries and scaling: caches aren't designed to cache 
huge numbers of tiny resources.  It would probably be wise to disable 
caching on XCAP responses.

1b) HTTP servers aren't designed for that many small resources.  
There's a certain amount of overhead to maintaining the metadata (at a 
minimum, the ETag) for so many small resources.  An HTTP server might 
have to be rearchitected to do a scalable job of supporting XCAP, which 
increases the XCAP implementation costs in the long run.

1c) Performance: HTTP is designed to batch requests in a certain way, 
based on its granularity assumptions.  Recall that above a certain 
(low) bandwidth, latency is a much bigger problem than bandwidth, and 
in modern Internet applications it's usually the latency that kills 
you.  A more granular approach to resources doesn't in itself kill 
performance, but it does if you stay with HTTP's request granularity 
of roughly one request per resource.  What XCAP saves in bandwidth it 
will lose, in many use cases, in latency.
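
To put rough numbers on that (all of them invented for illustration, 
and assuming the node-level requests go out one at a time rather than 
pipelined):

    # Python sketch: syncing 50 list entries as 50 node-level requests
    # vs. one document-level request, on a link with 100 ms round trips.
    rtt = 0.100           # seconds per round trip (assumed)
    bandwidth = 1000000   # bits per second (assumed, already "plenty")
    entry_size = 200 * 8  # bits per entry (assumed)
    entries = 50

    per_node = entries * (rtt + entry_size / bandwidth)
    per_document = rtt + (entries * entry_size) / bandwidth

    print("50 node-level GETs: %.2f s" % per_node)      # about 5.1 s
    print("1 document GET:     %.2f s" % per_document)  # about 0.18 s

The bandwidth saved by fetching only the nodes you want is swamped by 
the extra round trips.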

1d) Extensions to HTTP have also been designed with HTTP's current 
granularity in mind.  RFC 2518, RFC 3253, RFC 3229, and RFC 3744 all 
extend HTTP in useful ways, and they're all written with the 
assumption that the granularity of resources is pretty much what it is 
today.  Access control, in particular, has a lot of per-resource 
overhead.

2)  Dependencies:  HTTP servers are designed such that static 
resources are handled independently of each other: their ETag 
management is stand-alone, and request/response handling and 
concurrency are designed around that independence.  By contrast, XCAP 
contemplates a large number of resources which really map to parts of 
the same underlying file.  As far as I can tell, that introduces 
dependencies between resources (for example, a PUT to one URL would 
require the ETag of another URL to change).
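
Here's a naive model of the shape of that dependency (the URLs, the 
"~~" separator, and the invalidation rule are all hypothetical):

    # Python sketch: per-URL ETags stop being independent once several
    # URLs alias parts of the same underlying document.
    import itertools

    counter = itertools.count(2)
    etags = {
        "/users/joe/index.xml": "v1",                   # whole document
        "/users/joe/index.xml/~~/list/entry[1]": "v1",  # a node inside it
    }

    def put(url, new_body):
        # Writing through *either* URL changes the underlying file, so
        # every URL that maps into that file needs a fresh ETag, not
        # just the one named in the request.
        doc = url.split("/~~/")[0]
        for u in etags:
            if u == doc or u.startswith(doc + "/~~/"):
                etags[u] = "v%d" % next(counter)

    put("/users/joe/index.xml/~~/list/entry[1]", "<entry/>")
    print(etags)   # both URLs now carry new ETags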

2a) HTTP implementation barriers.  The last HTTP server I developed 
would have to be rearchitected in several places to handle XCAP's 
interdependencies, which is work beyond what you'd expect from adding 
XCAP support.  Throughout the server, the code uses exact matching of 
URLs to figure out what to do -- not URL pattern matching.  So for 
example:
  - The way ETags were generated and stored and changed would have to be 
thrown out because ETags were generated independently for every 
resource.
  - Since resources were independent, write requests for different 
resources could be handled concurrently with ease, but that would have 
to change.
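
For the concurrency point in particular, the change is roughly from 
locking per request-URL to locking per underlying document.  A 
hypothetical sketch, not how any particular server does it:

    # Python sketch: with node-level URLs, the write lock has to cover
    # the document the URL maps into, not the URL itself.
    import threading

    doc_locks = {}

    def lock_for(url):
        # Independent resources could simply lock on the exact URL.
        # With XCAP-style aliasing, two different URLs may write the
        # same file, so key the lock on the document part instead.
        doc = url.split("/~~/")[0]
        return doc_locks.setdefault(doc, threading.Lock())

    def put(url, new_body):
        with lock_for(url):
            pass  # read the document, splice in or replace the node,
                  # write it back, and update every aliasing ETag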

2b) How interdependencies work within existing HTTP extensions: For 
one, somebody would have to write a specification for how the existing 
access control standard (RFC 3744) might work with XCAP.  Since XCAP 
can have two different URLs that point to the same underlying piece of 
data, what does it mean to apply certain access control settings to 
either or both of those URLs?
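
Concretely, the question looks something like this (hypothetical URLs 
and privileges, just to frame it):

    # Python sketch: two URLs, one underlying element.  Which ACL wins?
    acl = {
        "/users/joe/index.xml":
            {"sip:bob@example.com": "read"},
        "/users/joe/index.xml/~~/list/entry[@uri='sip:bob@example.com']":
            {"sip:bob@example.com": "write"},
    }
    # Both keys ultimately govern the same bytes in the same file.  RFC
    # 3744 treats an ACL as a property of a single resource, so someone
    # would have to specify which of these applies to the shared data,
    # and how they compose.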

I haven't examined every feature of HTTP and its extensions to see how 
well they deal with interdependencies, but that's a sampling.

So, what to do?  It doesn't seem to me that XCAP is going to go back 
to the drawing board (or needs to), but most of the above concerns 
could be addressed by simply keeping the definition of "resource" at 
the level of the root XML documents that XCAP deals with.  The 
existing extensions to HTTP work a lot better on resources of that 
size.  Part of this change involves moving the XPath-like part of the 
XCAP URL out of the path part of the URL.  It could go in a header, or 
even in the URL after the query delimiter (the question mark).  There 
is a theoretical problem with using query parameters on HTTP URLs in 
PUT requests if write-through caches don't know how to handle them, 
but there aren't a lot of write-through caches, and overall it's a 
smaller problem and less work to deal with.
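
For example, with a (made-up) "node" query parameter, the resource -- 
and everything HTTP and its extensions hang off the resource -- stays 
at document granularity, and the selector rides along separately:

    # Python sketch: the path identifies the document; the node
    # selector is carried outside the path.
    from urllib.parse import urlsplit, parse_qs

    url = "http://xcap.example.com/users/joe/index.xml?node=list/entry%5B1%5D"
    parts = urlsplit(url)

    print(parts.path)             # /users/joe/index.xml  <- the resource
    print(parse_qs(parts.query))  # {'node': ['list/entry[1]']}

    # The selector could equally ride in a (made-up) request header,
    # e.g. "Node-Selector: list/entry[1]", with the same effect.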

Full disclosure: I'm partially responsible for the current design 
because I pointed out the write-through cache problem with a previous 
design that used query params in PUT URLs. Unfortunately, I think that 
on balance the problems with the current architecture are worse.

Lisa

Received on Monday, 22 November 2004 02:59:17 UTC