Re: Some thoughts on XCAP's resource architecture

Lisa Dusseault wrote:

> ...
> 1a) HTTP intermediaries and scaling: caches aren't designed to cache 
> huge numbers of tiny resources.  It would probably be wise to disable 
> caching on XCAP responses.

I'd call that an implementation issue, not a problem with the design as 
such. You can avoid it by making responses non-cacheable, or you can 
kick the vendor of the intermediary to fix the product (for instance, by 
modifying it not to cache very small entities).
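
As a rough illustration of both options (a sketch only; the function 
names and the 1024-byte threshold are made up for this example and not 
taken from any product):

```python
def without_caching(headers):
    """Origin-server side: return a copy of the response headers
    that tells caches not to store the response."""
    out = dict(headers)
    out["Cache-Control"] = "no-store"
    return out


def intermediary_should_cache(content_length, min_bytes=1024):
    """Intermediary side: skip very small entities.
    The threshold is an arbitrary, hypothetical value."""
    return content_length >= min_bytes
```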

> 1b) HTTP servers aren't designed for that many small resources.  There's 
> a certain amount of overhead to maintaining the metadata (at a minimum, 
> the ETag) for so many small resources.  An HTTP server might have to be 
> rearchitected to do a scalable job of supporting XCAP, which increases 
> the XCAP implementation costs in the long run.

<http://www.ietf.org/internet-drafts/draft-ietf-simple-xcap-05.txt>, 
Section 8.5 seems to indicate that all subordinate resources of a 
document share the same ETag, so this wouldn't be a problem. For 
instance, if you viewed a Subversion server as an XCAP server, the 
global revision number of the repository could be used as the ETag.
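
A minimal sketch of that idea, assuming a single revision counter per 
document (the class and method names are hypothetical, and this is not 
how any particular server does it):

```python
class Document:
    """Toy document whose sub-resources all share one ETag."""

    def __init__(self):
        self.revision = 0   # plays the role of the global revision number
        self.nodes = {}     # node selector -> stored value

    def etag(self, selector):
        # The selector is deliberately ignored: every subordinate
        # resource reports the document-wide revision.
        return f'"r{self.revision}"'

    def put(self, selector, value):
        self.nodes[selector] = value
        self.revision += 1  # one write changes the ETag of all nodes
```

The only metadata kept is one integer per document, regardless of how 
many small resources the document exposes.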

> 1c) Performance: HTTP is designed to batch requests in a certain way 
> based on the granularity assumptions.  Recall that latency is a much 
> bigger problem than bandwidth above a certain (low) bandwidth, and in 
> modern Internet applications it's usually the latency that kills you.  A 
> more granular approach to resources doesn't in itself kill performance 
> but it does if you stay with HTTP's request granularity.  What XCAP is 
> saving in bandwidth it will lose, in many use cases, in latency costs.

But it enables the client to decide on its own. If latency proves to be 
a problem, it can still update larger chunks in single requests.
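
To make the trade-off concrete, here is a back-of-the-envelope model a 
client could use. It assumes sequential requests without pipelining and 
ignores header overhead; the function names and parameters are invented 
for this sketch:

```python
def estimated_seconds(requests, total_bytes, rtt, bandwidth):
    """One round trip per request plus transfer time.
    rtt is in seconds, bandwidth in bytes per second."""
    return requests * rtt + total_bytes / bandwidth


def cheaper_strategy(changes, change_bytes, document_bytes, rtt, bandwidth):
    """Compare one PUT per changed node with a single PUT of the
    whole document and return the name of the cheaper option."""
    per_node = estimated_seconds(changes, changes * change_bytes, rtt, bandwidth)
    whole = estimated_seconds(1, document_bytes, rtt, bandwidth)
    return "per-node" if per_node <= whole else "whole-document"
```

With a 100 ms round trip and 1 MB/s, a single 100-byte change to a 
50 KB document favours the per-node PUT, while ten such changes favour 
sending the whole document once.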

> 1d) Extensions to HTTP have also been designed with HTTP's current 
> granularity in mind.  RFC2518, RFC3253, RFC3229, RFC3744 all extend HTTP 
> in useful ways, and they're all written with the assumption that the 
> granularity of resources is pretty much what it is today.  Access 
> control, in particular, has a lot of overhead per resource

I disagree that these extensions have been designed with some specific 
granularity in mind (having been an active member of the WebDAV 
WG for the last 3.5 years and being one of the authors of RFC3744).

> 2)  Dependencies:  HTTP servers are designed such that static resources 
> are handled independently of each other. Their ETag management is 
> stand-alone, the request and response handling and concurrency are 
> designed for that independence.  By contrast, XCAP contemplates a large 
> number of resources which really map to parts of the same underlying 
> file.  As far as I can tell, that introduces dependencies between 
> resources (for example that a PUT to one URL would require the ETag of 
> another URL to change).

Yes. But why is that a problem? It simply means that it's hard to adapt 
*specific* implementations to XCAP.

> 2a) HTTP implementation barriers.  The last HTTP server I developed 
> would have to be rearchitected in several places to handle XCAP's 
> interdependencies, work beyond what you'd expect from adding XCAP 
> support.  Throughout the server, the code uses exact matching of URLs to 
> figure out what to do -- not URL pattern matching. So for example:
>  - The way ETags were generated and stored and changed would have to be 
> thrown out because ETags were generated independently for every resource.

That would IMHO be a problem anyway, because ETags *can't* work reliably 
if they are independent of each other if the server allows namespace 
operations.
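
One possible reading of this, as a toy example: if every resource 
carries its own version counter and a MOVE simply renames the resource, 
a URL can end up with an ETag the client has already seen for different 
content. The server model below is hypothetical and deliberately naive:

```python
class NaiveServer:
    """Toy server: each resource has its own version counter as ETag."""

    def __init__(self):
        self.store = {}  # url -> (content, version)

    def put(self, url, content):
        _, version = self.store.get(url, (None, 0))
        self.store[url] = (content, version + 1)

    def move(self, src, dst):
        # The resource keeps its own counter when it is renamed,
        # overwriting whatever was at the destination.
        self.store[dst] = self.store.pop(src)

    def etag(self, url):
        return f'"{self.store[url][1]}"'

    def content(self, url):
        return self.store[url][0]


server = NaiveServer()
server.put("/a", "old")
server.put("/b", "new")
cached_etag = server.etag("/a")   # client remembers /a at this ETag
server.move("/b", "/a")           # namespace operation replaces /a

# Same ETag at /a, different content: a conditional request would
# wrongly conclude that nothing has changed.
stale_match = (server.etag("/a") == cached_etag
               and server.content("/a") != "old")
```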

>  - Since resources were independent, write requests for different 
> resources could be handled concurrently with ease, but that would have 
> to change.

I think you're arguing from a very specific implementer's point of view. 
There are already WebDAV servers in place that handle XML in a similar 
way to XCAP (Slide/Tamino and (I think) Oracle10 come to mind).

> 2b) How interdependencies work within existing HTTP extensions: For one, 
> somebody would have to write a specification for how the existing access 
> control standard (RFC 3744) might work with XCAP.  Since XCAP can have 
> two different URLs that point to the same underlying piece of data, what 
> does it mean to apply certain access control settings to either or both 
> of those URLs?

If two URLs map to the same underlying piece of data, they identify the 
same resource. Thus they would have the same ACLs. What am I missing?

> I haven't examined every feature of HTTP and its extensions to see how 
> well it deals with interdependencies, but that's a sampling.
> 
> So, what to do? It doesn't seem to me that XCAP is going to go back to 
> the drawing board (or needs to), but it would be sufficient for most of 
> the above concerns to simply make the definition of "resource" stay with 
> the root XML documents that XCAP deals with.  The existing extensions to 
> HTTP work a lot better on that size resource.  Part of this change 
> involves putting the XPATH-like part of the XCAP URL out of the path 
> part of the URL.  It could go in a header or even in the URL after the 
> query delimiter (the question mark).  There is a theoretical problem 
> with using query parameters on HTTP URLs in PUT requests if 
> write-through caches don't know how to handle those, but there aren't a 
> lot of write-through caches and overall it's a smaller problem and less 
> work to deal with.

Putting the selector into the query part doesn't change anything at all. 
That would be merely a change in syntax with almost no impact on 
implementations (possibly except working around implementation problems 
in intermediaries).
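
A small sketch of why it is only syntax: both URL styles can be parsed 
into the same (document, selector) pair before any real work happens. 
The "/~~/" separator, the "node" parameter name and the example URLs 
are placeholders chosen for this illustration, not taken from the draft:

```python
from urllib.parse import urlsplit, parse_qs

SEPARATOR = "/~~/"  # hypothetical path separator before the selector


def parse_path_style(url):
    """Selector carried in the path part of the URL."""
    path = urlsplit(url).path
    document, _, selector = path.partition(SEPARATOR)
    return document, selector


def parse_query_style(url):
    """Selector carried in a (hypothetical) 'node' query parameter."""
    parts = urlsplit(url)
    selector = parse_qs(parts.query).get("node", [""])[0]
    return parts.path, selector
```

Everything behind the parser (storage, ETags, access control) sees the 
same pair either way.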

Putting it into separate headers would make the document fragments 
non-resources (not even second-class resources), thus I fail to see how 
that would be an improvement. You would simply have to reinvent a lot of 
syntax to do something that HTTP was already doing for you.

> Full disclosure: I'm partially responsible for the current design 
> because I pointed out the write-through cache problem with a previous 
> design that used query params in PUT URLs. Unfortunately, I think that 
> on balance the problems with the current architecture are worse.

Best regards, Julian

-- 
<green/>bytes GmbH -- http://www.greenbytes.de -- tel:+492512807760

Received on Wednesday, 24 November 2004 22:19:22 UTC