Re: [Simple] Some thoughts on XCAP's resource architecture from Jonathan Rosenberg on 2004-12-04 (ietf-http-wg@w3.org from October to December 2004)

From: Jonathan Rosenberg <jdrosen@cisco.com>
Date: Sat, 04 Dec 2004 00:53:13 +0000
To: Lisa Dusseault <lisa@osafoundation.org>
CC: HTTP working group <ietf-http-wg@w3.org>, "'simple@ietf.org'" <simple@ietf.org>
Message-ID: <41B10A54.5060209@cisco.com>
Thoughts inline.

Lisa Dusseault wrote:

> During the DC IETF, I expressed some reservations about XCAP to Ted and 
> Jonathan. Jonathan asked me to send a message to the SIMPLE list with my 
> comments, so here it is...
> 
> Based on the mailing list traffic, it appears that XCAP is 
> supposed to be an extension or profile of HTTP, rather than just a 
> protocol that mimics the HTTP interaction style, and that as such it is 
> intended to be compatible with other extensions of HTTP.

It is not an extension; it is a usage or profile. That is, an xcap
server is a compliant HTTP server. What we are doing is defining how to
map HTTP URLs to a set of resources that constitute the components of an
xml document. The nature of those resources has an impact on how you
might want to cache them and handle their etags, and that is described
in xcap.
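
To illustrate the kind of mapping meant here, a rough sketch (not text
from the xcap spec). It assumes a "/~~/" marker between the part of the
path that names the document and the part that selects a component; the
marker, the host name and the function name are all assumptions made
for illustration.

```python
from urllib.parse import urlsplit, unquote

# Hypothetical marker between the document part and the node part of the path.
NODE_SEPARATOR = "/~~/"

def split_xcap_url(url):
    """Return (document_path, node_selector or None) for an XCAP-style URL."""
    path = urlsplit(url).path
    doc, sep, node = path.partition(NODE_SEPARATOR)
    # No separator means the URL names the whole document.
    return doc, (unquote(node) if sep else None)

doc, node = split_xcap_url(
    "http://xcap.example.com/lists/users/joe/index/~~/resource-lists/list%5B1%5D"
)
# doc  is "/lists/users/joe/index"
# node is "resource-lists/list[1]"
```

Each distinct (document, node) pair is its own HTTP resource; the
server stays an ordinary HTTP server that happens to interpret the path
this way.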


> I'm concerned
> that the current architecture of XCAP makes this difficult.  In 
> particular the XCAP resource ontology and the URL addressing style that 
> goes with it shifts the HTTP design along two major axes:
> 
> 1) Resource granularity
> 2) Dependencies between resources
> 
> The first shift is in size and number of resources.  Because the path 
> part of the URL allows for XML node selection, there are many more 
> resources for a given volume of content material.  This affects us in a 
> number of ways.
> 
> 1a) HTTP intermediaries and scaling: caches aren't designed to cache 
> huge numbers of tiny resources.  It would probably be wise to disable 
> caching on XCAP responses.

This is described in section 9 of xcap, which more or less says that you 
will want to disable caching for dynamic resources - ones frequently 
written by clients or otherwise. However, there are application usages 
where the data is primarily read-only. In that case, it's reasonable to 
allow caching of those resources.

As such, there is an interaction between xcap and caching, but not because 
the resources are small - because they may be dynamic. This is an issue 
for any http resource.
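
As a minimal sketch of that distinction: a server could pick its
Cache-Control response header by how the application usage behaves,
not by resource size. The function name, the read_mostly flag and the
one-hour lifetime are assumptions for illustration only.

```python
def cache_headers(read_mostly):
    """Pick a Cache-Control header for a response (illustrative policy only)."""
    if read_mostly:
        # Data that is rarely written can be cached for a while.
        return {"Cache-Control": "max-age=3600"}
    # Frequently written data: make caches revalidate on every use.
    return {"Cache-Control": "no-cache"}
```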

> 
> 1b) HTTP servers aren't designed for that many small resources.  There's 
> a certain amount of overhead to maintaining the metadata (at a minimum, 
> the ETag) for so many small resources.  An HTTP server might have to be 
> rearchitected to do a scalable job of supporting XCAP, which increases 
> the XCAP implementation costs in the long run.

This is possibly an implementation issue, certainly not a protocol
one. As Joe pointed out in this thread, http itself says nothing of the
typical size of resources, and lots of them are these tiny little
things. In any case, I might expect xcap implementations to use database
backing stores, but it's an implementation choice and ymmv.

> 
> 1c) Performance: HTTP is designed to batch requests in a certain way 
> based on the granularity assumptions.  Recall that latency is a much 
> bigger problem than bandwidth above a certain (low) bandwidth, and in 
> modern Internet applications it's usually the latency that kills you.  A 
> more granular approach to resources doesn't in itself kill performance 
> but it does if you stay with HTTP's request granularity.  What XCAP is 
> saving in bandwidth it will lose, in many use cases, in latency costs.

I don't follow here. If you want to pipeline requests where you don't 
require conditional puts, go ahead and do so.

> 
> 1d) Extensions to HTTP have also been designed with HTTP's current 
> granularity in mind.  RFC2518, RFC3253, RFC3229, RFC3744 all extend HTTP 
> in useful ways, and they're all written with the assumption that the 
> granularity of resources is pretty much what it is today.  Access 
> control, in particular, has a lot of overhead per resource.

Again, I don't think http itself says anything about what the 
granularity of a resource is supposed to be.

> 
> 2)  Dependencies:  HTTP servers are designed such that static resources 
> are handled independently of each other. Their ETag management is 
> stand-alone, the request and response handling and concurrency are 
> designed for that independence.  By contrast, XCAP contemplates a large 
> number of resources which really map to parts of the same underlying 
> file.  As far as I can tell, that introduces dependencies between 
> resources (for example that a PUT to one URL would require the ETag of 
> another URL to change).

Yes, xcap does introduce interdependencies between resources. AFAIK, 
there is nothing in http that says this is disallowed. Indeed, one can 
very well imagine that, in many cases, a change in one resource -
effected through http directly or through back end applications - will
affect other resources too.
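
A toy model of such an interdependency, as a sketch only. It assumes
that every component of a document reports the entity tag of the whole
document, so a write to one component changes the tag seen through
every other component's URL. The class, its methods and the sample
document are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

class Document:
    """Toy model: one XML document, many addressable components, one shared ETag."""

    def __init__(self, xml_text):
        self.root = ET.fromstring(xml_text)

    def etag(self, path):
        # Assumption: a component's ETag is derived from the whole document.
        if self.root.find(path) is None:
            raise KeyError(path)
        return hashlib.sha1(ET.tostring(self.root)).hexdigest()

    def put_text(self, path, text):
        # Stand-in for a PUT to one component of the document.
        self.root.find(path).text = text

doc = Document("<list><alice>online</alice><bob>away</bob></list>")
before = doc.etag("bob")
doc.put_text("alice", "busy")   # write to one resource...
after = doc.etag("bob")         # ...and a different resource's ETag moves
```

Nothing here is outside what an HTTP server may do; the dependency
lives entirely in how the server computes its entity tags.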

> 
> 2a) HTTP implementation barriers.  The last HTTP server I developed 
> would have to be rearchitected in several places to handle XCAP's 
> interdependencies, work beyond what you'd expect from adding XCAP 
> support.  Throughout the server, the code uses exact matching of URLs to 
> figure out what to do -- not URL pattern matching. So for example:
>  - The way ETags were generated and stored and changed would have to be 
> thrown out because ETags were generated independently for every resource.
>  - Since resources were independent, write requests for different 
> resources could be handled concurrently with ease, but that would have 
> to change.

The proof is in the implementations; we have a few already, and those 
folks have posted here that they haven't seen these problems.

> 
> 2b) How interdependencies work within existing HTTP extensions: For one, 
> somebody would have to write a specification for how the existing access 
> control standard (RFC 3744) might work with XCAP.  Since XCAP can have 
> two different URLs that point to the same underlying piece of data, what 
> does it mean to apply certain access control settings to either or both 
> of those URLs?

It's a good question, but not one new to this application. Certainly
other http applications may require a client to obtain multiple locks in
order to make a change across the several resources it needs to
affect. The same would be true here.

> 
> I haven't examined every feature of HTTP and its extensions to see how 
> well it deals with interdependencies, but that's a sampling.
> 
> So, what to do? It doesn't seem to me that XCAP is going to go back to 
> the drawing board (or needs to), but it would be sufficient for most of 
> the above concerns to simply make the definition of "resource" stay with 
> the root XML documents that XCAP deals with.  The existing extensions to 
> HTTP work a lot better on that size resource.  Part of this change 
> involves putting the XPATH-like part of the XCAP URL out of the path 
> part of the URL.  It could go in a header or even in the URL after the 
> query delimiter (the question mark).  There is a theoretical problem 
> with using query parameters on HTTP URLs in PUT requests if 
> write-through caches don't know how to handle those, but there aren't a 
> lot of write-through caches and overall it's a smaller problem and less 
> work to deal with.

I think that the PUT issue is a small detail, but there is a much more 
fundamental problem with what you are proposing.

The whole idea of xcap is that you can PUT a section of the xml document
to the server, and that this makes sense because the resource you are
PUTting to *is* the section of the xml document you want to affect. The URL
referred to that resource. Thus, it is fundamental that the resources 
that we are manipulating are the various pieces of the overarching xml 
document. Your complaint is not that we are using a query string instead 
of a path separator, but that the *resource* is the document 
sub-components instead of the document itself. As such, changing xcap so 
that the only resource is the document is a fundamental architectural 
change. The fact that a PUT with query parameters might not work is 
symptomatic of the problem with this approach, and I suspect there are 
others.

Jamie Lokier wrote:
> Lisa Dusseault wrote:
> 
>>> If you have multiple changes to make to a 1 MB or smaller document, 
>>> batch them up together if possible, even if it requires uploading the 
>>> whole document afresh.  The current design of XCAP encourages changes 
>>> to be made independently, and each change will require a full 
>>> round-trip (no pipelining possible because you need to wait for the 
>>> server to respond with an ETag each time).
> 
> 
> That seems like a flaw which should be fixed.

I have to disagree with Lisa here; this is not a constraint on xcap. You 
do not have to wait for the etag to send the next request. You only have 
to wait for the etag if you wish your next change to be conditioned on 
the fact that the document hasn't been modified since your last change. 
That's true of http generally, and true here too - it's the defining
purpose of the If-* headers. The various application usages talk about
ways in which the documents are structured so that one doesn't need to
use conditional PUTs. In that case, feel free to pipeline.
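
A small sketch of the difference on the wire. It only serializes
requests and does no network I/O; the host, the paths and the
"application/xml" content type are placeholders chosen for
illustration, not values taken from the xcap spec.

```python
def build_put(path, body, host="xcap.example.com", if_match=None):
    """Serialize one HTTP/1.1 PUT; host and paths are made-up examples."""
    lines = [
        f"PUT {path} HTTP/1.1",
        f"Host: {host}",
        "Content-Type: application/xml",
        f"Content-Length: {len(body.encode())}",
    ]
    if if_match is not None:
        # Conditional PUT: the client must already hold the current ETag,
        # so it has to wait for the previous response first.
        lines.append(f'If-Match: "{if_match}"')
    return ("\r\n".join(lines) + "\r\n\r\n" + body).encode()

def pipeline(changes):
    """Concatenate unconditional PUTs so they can be sent back to back."""
    return b"".join(build_put(path, body) for path, body in changes)

batch = pipeline([
    ("/lists/users/joe/index/~~/a", "<a>1</a>"),
    ("/lists/users/joe/index/~~/b", "<b>2</b>"),
])
```

The batch carries no If-Match header, so nothing in it depends on an
earlier response; only a request built with if_match set forces the
round trip.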

Thanks,
Jonathan R.

-- 
Jonathan D. Rosenberg, Ph.D.                   600 Lanidex Plaza
Director, Service Provider VoIP Architecture   Parsippany, NJ 07054-2711
Cisco Systems
jdrosen@cisco.com                              FAX:   (973) 952-5050
http://www.jdrosen.net                         PHONE: (973) 952-5000
http://www.cisco.com
Received on Saturday, 4 December 2004 12:37:26 UTC