draft-ietf-webdav-binding-protocol-02 from fielding@ebuilt.com on 2000-01-21 (w3c-dist-auth@w3.org from January to March 2000)

From: <fielding@ebuilt.com>
Date: Thu, 20 Jan 2000 22:09:08 -0800
To: w3c-dist-auth@w3.org
Message-ID: <OFD84EAC65.2680719E-ON8825686D.000A8B5A@ebuilt.net>
Here are my detailed comments on the binding protocol specification.
Some of these have already been mentioned by Yaron, but I don't have
time to separate them into separate messages.  Sorry.

General
=======

I am a bit surprised by the size of the specification. After all, the
manpage for the Unix ln command is only 4 pages.  I think that this shows
there were far too many extremely intelligent people working on the spec.

I am going to highlight portions of the spec which are not relevant to
the Web interface, and thus shouldn't be part of the protocol. In doing
so, I am not trying to slam the authors or even the fact that many of
these reflect important implementation issues -- they simply aren't
relevant to the Web, for reasons that I will attempt to explain.

Formatting
==========

I know that the draft guidelines say that the author should paginate
the document.  The guidelines are wrong.  Pagination is only applicable
to final RFCs, and only gets in the way of draft reviews. In particular,
the RFC editor will properly paginate the document according to the rules
of nroff processing macros, so completely messing up all the figures
and examples by automated addition of header/footers in the *wrong*
places is just annoying.

Details
=======

> Abstract

The first and third paragraphs of the Abstract are irrelevant to this
memo -- such things belong in the introduction (if at all).  A simple
rewrite is:

 This specification extends the Hypertext Transfer Protocol (HTTP/1.1),
 as previously extended by the WebDAV Distributed Authoring Protocol,
 to enable clients to create new identifiers for existing resources.
 It defines the semantics for a new request method, BIND, that creates
 a new name binding within the namespace of a target DAV collection.


> Servers are required to insure the integrity of any bindings 
> that they allow to be created.

This is an implementation concern that has no relevance to the client.
Resources disappear for many reasons, and it isn't possible for a
binding to be more persistent than a resource.

> 1 Notational Conventions

This section should follow the Introduction, not precede it.

> 2 Introduction
> 
> This is one of a pair of specifications that extend the WebDAV 
> Distributed Authoring Protocol to enable clients to create new access 
> paths to existing resources.  This capability is useful for several 
> reasons:

The above paragraph can be deleted without losing any useful info.

> URIs of WebDAV-compliant resources are hierarchical and correspond to a 
> hierarchy of collections in resource space.  The WebDAV Distributed 
> Authoring Protocol makes it possible to organize these resources into 
> hierarchies, placing them into groupings, known as collections, which 
> are more easily browsed and manipulated than a single flat collection. 
> However, hierarchies require categorization decisions that locate 
> resources at a single location in the hierarchy, a drawback when a 
> resource has multiple valid categories. For example, in a hierarchy of 
> vehicle descriptions containing collections for cars and boats, a 
> description of a combination car/boat vehicle could belong in either 
> collection. Ideally, the description should be accessible from both. 
> Allowing clients to create new URIs that access the existing resource 
> lets them put that resource into multiple collections.
> 
> Hierarchies also make resource sharing more difficult, since resources 
> that have utility across many collections are still forced into a single 
> collection. For example, the mathematics department at one university 
> might create a collection of information on fractals that contains 
> bindings to some local resources, but also provides access to some 
> resources at other universities.  For many reasons, it may be 
> undesirable to make physical copies of the shared resources on the local 
> server: to conserve disk space, to respect copyright constraints, or to 
> make any changes in the shared resources visible automatically. Being 
> able to create new access paths to existing resources in other 
> collections or even on other servers is useful for this sort of case.

resources != storage objects.  What is the motivation for creating aliases
within one namespace to identifiers in some other namespace?  That is what
needs to be described in the above paragraph -- storage is irrelevant.

> The BIND method defined here provides a mechanism for allowing clients 
> to create alternative access paths to existing WebDAV resources. HTTP 
> and WebDAV methods are able to work because there are mappings between 
> URIs and resources.  A method is addressed to a URI, and the server 
> follows the mapping from that URI to a resource, applying the method to 
> that resource.  Multiple URIs may be mapped to the same resource, but 
> until now there has been no way for clients to create additional URIs 
> mapped to existing resources. 
> 
> BIND lets clients associate a new URI with an existing WebDAV resource, 
> and this URI can then be used to submit requests to the resource.  Since 
> URIs of WebDAV resources are hierarchical, and correspond to a hierarchy 
> of collections in resource space, the BIND method also has the effect of 
> adding the resource to a collection.  As new URIs are associated with 
> the resource, it appears in additional collections.

These are not WebDAV resources -- they are Web resources.  The aliased
resource doesn't even need to be in the http namespace.

> The companion specification, RFC xxxx, defines redirect reference 
> resources, a different mechanism for creating alternative access paths 
> to existing resources.  A redirect reference is a resource in one 
> collection whose purpose is to forward requests to another resource (its 
> target), usually in a different collection.  In this way, it provides 
> access to the target resource from another collection.  It redirects 
> most requests to the target resource using the HTTP 302 (Moved 
> Temporarily) status code, thereby providing a form of mediated access to 
> the target resource.

Forwarding requests is what proxies do.  This paragraph should simply say

  Another mechanism for creating alternative access paths to existing
  resources, using redirect references, will be defined in a separate
  specification.

and the three comparison paragraphs deleted.  Cross-specifying protocols
is evil.

> 3 Terminology
> 
> The terminology used here follows and extends that in the WebDAV 
> Distributed Authoring Protocol specification [WebDAV]. Definitions of 
> the terms resource, Uniform Resource Identifier (URI), and Uniform 
> Resource Locator (URL) are provided in [URI].
> 
> URI Mapping
>      A relation between an absolute URI and a resource.  For an 
>      absolute URI U and the resource it identifies R, the URI mapping 
>      can be thought of as (U => R).  Since a resource can represent 
>      items that are not network retrievable, as well as those that are, 
>      it is possible for a resource to have zero, one, or many URI 
>      mappings. Mapping a resource to an "http" scheme URL makes it 
>      possible to submit HTTP protocol requests to the resource using 
>      the URL.

Actually, it is more like ({U,t} -R-> {V1, V2, ...}), where t is the
current time, R is the resource, -R-> is a mapping function that has
been implemented according to the semantics of resource R, and the range
is a set of values representing that resource at time t.
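
To make that notation concrete, here is a minimal sketch in Python.
It is only an illustration under my own assumptions: the names
Resource and todays_paper and the example.com URI are hypothetical
and appear nowhere in the draft.

```python
from typing import Callable, FrozenSet

# Hypothetical model: a resource is a function from (URI, time) to a set
# of representation values. None of these names come from the draft.
Resource = Callable[[str, float], FrozenSet[str]]


def todays_paper(uri: str, t: float) -> FrozenSet[str]:
    """Same URI, same resource, different values as t advances."""
    day = int(t // 86400)  # whole days since the epoch
    return frozenset({f"edition-{day}.html", f"edition-{day}.txt"})


r: Resource = todays_paper
v_day0 = r("http://example.com/today", 0.0)
v_day1 = r("http://example.com/today", 86400.0)
```

The URI and the resource stay the same across both calls; only the
set of values in the range changes with t.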

I covered some of the problems with the way the binding spec tries to
redefine resources in other e-mail.  However, the real deciding issue
in my mind is that none of this is useful to the client in defining
the protocol.  It would be better to simply say:

   For the purpose of this protocol, each unique URI is considered to
   be identifying a unique resource, even when the URI corresponds to
   a binding created by BIND.  The only distinguishing characteristics
   between the two that may be discovered by the client are ...

> Path Segment
>      Informally, the characters found between slashes ("/") in a URI.
                                                              xxxxxxxxx
in the path component of a hierarchical URI.

>      Formally, as defined in section 3.3 of [URI].

> Binding

Ugh.  Sorry, but all this does is define a bunch of things which are of
no concern to the client.  The abstraction is broken.

A binding is a name within a collection namespace.  This includes "normal"
resources, redirect references, immediately subsidiary collection names,
etc.  We know this because we want a DELETE on a binding to have the effect
of deleting any of these names -- even if the server is required to respond
with an error under some conditions, since the semantics remain the same.
The only thing BIND does is define a new way to create a binding,
similar to PUT and POST.  Once the binding is created, there is no way
for the client to differentiate it from the original PUT resource.  Thus,
the protocol doesn't care either.
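
A rough sketch of that view, in Python.  The Collection class and its
method names are hypothetical, and this is one possible way to model
it, not a description of any server:

```python
class Collection:
    """Hypothetical model: a collection is just a set of names (bindings)."""

    def __init__(self):
        self.names = {}  # path segment -> whatever the name identifies

    def put(self, name, body):
        # PUT creates a name as a side effect of storing something.
        self.names[name] = body

    def bind(self, name, existing):
        # Another way to create a name; nothing records which method made it.
        self.names[name] = existing

    def delete(self, name):
        # DELETE removes the name, whatever kind of thing it identifies.
        del self.names[name]


c = Collection()
doc = {"content": "hello"}
c.put("a.html", doc)
c.bind("b.html", c.names["a.html"])
c.delete("a.html")
```

After these calls only "b.html" remains, and nothing visible through
the interface says whether it came from put() or bind().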


> Collection
>      A resource that contains, as part of its state, a set of bindings 
>      that identify member resources.

Hey, that's what I just wrote -- I should have used a larger read-ahead.
So, if a collection is a set of bindings, why is it so hard to define
bindings?  They are just names.

>      In [WebDAV], a collection is defined as containing a list of 
>      internal member URIs, where an internal member URI is the URI of 
>      the collection, plus a single path segment.  This definition 

sucks.  [Well, it would make for a shorter spec, but okay I guess you
can't just replace it like that.  Yaron's suggestion to move this
stuff to an appendix is a good one.]

> 4 Overview of Bindings
> 
> Bindings are part of the state of a collection. In general, there is a 
> one-to-many correspondence between a collection's bindings and its 
> internal member URIs, as illustrated in Figure 2 above.  The URI segment 
> associated with a resource by one of a collection's bindings is also the 
> final segment of one or more of the collection's internal member URIs. 
> The final segment of each internal member URI identifies one of the 
> bindings that is part of the collection's state.

Egads.  Bindings are the names within a collection.

> Bindings are not unique to advanced collections, although the BIND 
> method for explicitly creating bindings is introduced here.  Existing 
> methods that create resources, such as PUT, MOVE, COPY, and MKCOL, 
> implicitly create bindings.  There is no difference between implicitly 
> created bindings and bindings created with BIND.

Yes!
 
> The identity of a binding C:(S -> R) is determined by the URI segment 
> (in its collection) and the resource that the binding associates.  If 
> the resource goes out of existence (as a result of some out-of-band 
> operation), the binding also goes out of existence.  If the URI segment 
> comes to be associated with a different resource, the original binding 
> ceases to exist and another binding is created.

No!  It is fundamentally impossible to implement the above.  In order
to be a protocol requirement, we must specify it in terms of the interface.
In other words, what must be done on a PUT/POST/DELETE upon a binding,
where "binding" includes any resource in the namespace.  Once you do
that, you will discover that no one will want to implement this.

> It would be very undesirable if one binding could be destroyed as a side 
> effect of operating on the resource through a different binding.  It is 
> not acceptable for moving a resource through one binding to disrupt 
> another binding, turning that binding into a dangling path segment.  Nor 
> is it acceptable for a server, after removing one binding, to reclaim 
> the system resources associated with its resource while other bindings 
> to the resource remain.  Implementations MUST ensure the integrity of 
> bindings.

I don't need any of these requirements.  If I don't need them, then they
must be specifying something more than a binding, because I've already
implemented every semantic in this spec aside from the BIND method itself.
I am damn sure that I am not going to post-hoc implement alias integrity
within Apache just because this protocol says that a DELETE must result
in removal of all bindings to the binding being deleted.  Late binding,
where the existence or nonexistence of a resource is determined at the
time requested, is far easier to implement and corresponds to what
a user expects to happen -- no magic.
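
Late binding can be sketched in a few lines of Python.  This is a toy
model under stated assumptions: the store and aliases tables, the
paths, and the get/delete helpers are all hypothetical, and it is not
how Apache is implemented.

```python
# Hypothetical server state: stored resources and aliases, keyed by path.
store = {"/docs/a.html": "hello"}
aliases = {"/pub/a.html": "/docs/a.html"}  # alias path -> target path


def get(path):
    """Resolve the name at request time and return (status, body)."""
    target = aliases.get(path, path)
    if target in store:
        return 200, store[target]
    return 404, None


def delete(path):
    """Remove only the name that was requested; touch no other name."""
    if path in aliases:
        del aliases[path]
        return 204
    if path in store:
        del store[path]
        return 204
    return 404


before = get("/pub/a.html")
delete("/docs/a.html")
after = get("/pub/a.html")
```

The alias is never checked or repaired when its target goes away; the
server simply finds out on the next request, which is why the second
get() answers 404.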

I'm just going to skip the rest of the spec that mentions integrity.

> 5 BIND Method
> 
> 5.1 Overview of BIND
> 
> The BIND method creates a new binding between the resource identified by 
> the Request-URI and the final segment of the Destination header (minus 
> any trailing slash).  This binding is added to the collection identified 
> by the Destination header minus its trailing slash (if present) and 
> final segment.  The Destination header is defined in Section 9.3 of 
> [WebDAV].

As discussed in other mail, this is backwards because the Request-URI
needs to identify the collection that will be changed so that the right
authentication is picked up prior to other processing.  It can be
implemented in the reverse, but doing so is much less efficient for
a general-purpose HTTP server.

>...
> After successful processing of a BIND request, it MUST be possible for 
> clients to use the URI in the Destination header to submit requests to 
> the resource identified by the Request-URI.

That says nothing.  Not even what was intended.  A client can submit
anything it wants at any time.

> By default, if the Destination header identifies an existing binding, 
> the new binding replaces the existing binding. This default binding 
> replacement behavior can be overridden using the Overwrite header 
> defined in Section 9.6 of [WebDAV]. 

Yuck.  I don't like this either -- force the client to do a DELETE.

> 5.2 Bindings to Collections
> 
> Bindings to collections can result in loops.  If a server wants to 
> prevent a loop from being created, it MAY fail the BIND request with a 
> 403 (Forbidden) status code.  If a server allows a loop to be created, 
> it MUST detect the loop when processing "Depth: infinity" requests that 
> encounter the loop.  It is sometimes possible to complete an operation 
> in spite of the presence of a loop.  However, the 506 (Loop Detected) 
> status code is defined in Section 12.1 for use in contexts where an 
> operation is terminated because a loop was encountered. 

I'd prefer a new 4xx code instead of 403.  If we are going to all the
trouble of officially extending HTTP, we might as well take advantage
of the benefits.

> Creating a new binding to a collection makes each resource associated 
> with a binding in that collection accessible via a new URI, and thus 
> creates new URI mappings to those resources but no new bindings.

except for the binding created.  I think we'd all live longer if this
and the following paragraphs trying to explain it were just left to the
reader to figure out for themselves -- the description is far more
difficult to understand than just saying "you can create multiple
bindings to a collection resource".  Besides, the Note below does a
far more effective job of saying the same thing.

> 5.3 URI Mappings Created by a BIND
> 
> Suppose a BIND request causes a binding from "Binding-Name" to resource 
> R to be added to a collection, C.  Then if C-MAP is the set of URI's 
> that were mapped to C before the BIND request, then for each URI "C-URI" 
> in C-MAP, the URI "C-URI/Binding-Name" is mapped to resource R following 
> the BIND request.

Wow, that's an even more obscure way of saying that a BIND adds a name
to a collection such that the new name indirectly identifies R.
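
For what it's worth, the quoted rule fits in a few lines of Python.
The function name new_mappings and the example.com URIs are
hypothetical; only C-MAP and Binding-Name come from the quoted text.

```python
def new_mappings(c_map, binding_name):
    """URIs that identify R after binding_name is added to collection C.

    c_map is the set of URIs that already identified C before the BIND.
    """
    return {c_uri.rstrip("/") + "/" + binding_name for c_uri in c_map}


c_map = {"http://example.com/coll", "http://example.com/alias-of-coll/"}
mapped = new_mappings(c_map, "report")
```

Every existing name for C yields exactly one new name for R, which is
all the paragraph is saying.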

> Note that if R is a collection, additional URI mappings are created to 
> the descendents of R.  Also note that if a binding is made in collection 
> C to C itself (or to a parent of C), an infinite number of mappings is 
> introduced.

>...

> 5.5 BIND Status Codes
> 
> 201 (Created): The binding was successfully created.
> 
> 400 (Bad Request): The client set an invalid value for the Destination 
> header or Request-URI.
> 
> 403 (Forbidden): This server has a policy that forbids creation of 
> bindings that would result in loops.

403 means that the request is forbidden for reasons that the server
does not want to explain to the client.  You can't redefine 403 to be
this specific because its purpose is not to be specific.

> 412 (Precondition Failed): The Overwrite header is "F", and a binding 
> already exists for the Destination header.
> 
> 507 (Cross-Server Binding Forbidden): The server is unable to create the 
> requested binding because it would bind a segment in a collection on one 
> server to a resource on a different server.

I don't see why that is any different than the code formerly known as 403.

> 6 DELETE and Bindings
> 
> The DELETE method was originally defined in [HTTP]. This section 
> redefines the behavior of DELETE in terms of bindings, an abstraction 
> not available when writing [HTTP]. [HTTP] states that "the DELETE method 
> requests that the origin server delete the resource identified by the 
> Request-URI."  Because [HTTP] did not distinguish between bindings and 
> resources, the intent of its definition of DELETE is unclear.  The 
> definition presented here is a clarification of the definition in 
> [HTTP].

Much easier to just say that DELETE in HTTP/1.1 now refers to deletion
of any binding, including those created by BIND, and then add
 
> The DELETE method requests that the server remove the binding between 
> the resource identified by the Request-URI and the binding name, the 
> last path segment of the Request-URI. The binding MUST be removed from 
> its parent collection, identified by the Request-URI minus its trailing 
> slash (if present) and final segment. 
> 
> Once a resource is unreachable by any URI mapping, the server MAY 
> reclaim system resources associated with that resource. If DELETE 
> removes a binding to a resource, but there remain URI mappings to that 
> resource, the server MUST NOT reclaim system resources associated with 
> the resource.

> Although [WebDAV] allows a DELETE to be a non-atomic operation, the 
> DELETE operation defined here is atomic.  In particular, a DELETE on a 
> hierarchy of resources is simply the removal of a binding to the 
> collection identified by the Request-URI, and so is a single (and 
> therefore atomic) operation. 

Single operations do not imply atomicity.  Atomic implies that nothing
else can happen during the processing of the request, which is false
for any but the most trivial operations.  DELETE is never trivial.

> Section 8.6.1 of [WebDAV] states that during DELETE processing, a server 
> "MUST remove any URI for the resource identified by the Request-URI from 
> collections which contain it as a member."  Servers that support 
> bindings MUST NOT follow this requirement.

Huh?  For the sake of this discussion, just ignore WebDAV.  Define what
the new requirement is and note in the appendix how (and why) it differs
from RFC 2518.  In other words, the above requirement creates an infinite
loop binding between two protocol specs, rather than updating one.

> 9 Bindings and Other Methods

I agree with Yaron that static versus dynamic in this section is irrelevant.

> 10 Determining Whether Two Bindings Are to the Same Resource
> 
> It is useful to have some way of determining whether two bindings are to 
> the same resource.  Two different resources might have identical 
> contents and identical values for the properties defined in [WebDAV]. 
> Although the DAV:bindings property defined in Section 13.1 provides this 
> information, it is an optional property.
> 
> The REQUIRED DAV:resourceid property defined in Section 13.2 is a 
> resource identifier, which MUST be unique across all resources for all 
> time.  If the values of DAV:resourceid returned by PROPFIND requests 
> through two bindings are identical, the client can be assured that the 
> two bindings are to the same resource.

Whoa, where did this requirement come from?  The URI is a resource ID.
If somebody wants to create a general metadata field for some sort of
sacred-name, then go wild, but this is not needed for bindings.

A question like this should be answered by a method applied to the
collection, not to the individual bindings.

> 10.1 resourceid URI Scheme
> 
> The value of DAV:resourceid is a URI and may use any URI scheme that 
> guarantees the uniqueness of the value across all resources for all 
> time.  The resourceid URI scheme defined here is an example of an 
> acceptable URI scheme.
> 
> The resourceid URI scheme requires the use of the Universal Unique 
> Identifier (UUID) mechanism, as described in [ISO-11578].  UUID 
> generators may choose between two methods of creating the identifiers. 
> They can either generate a new UUID for every identifier they create or 
> they can create a single UUID and then add extension characters.  If the 
> second method is selected, then the program generating the extensions 
> MUST guarantee that the same extension will never be used twice with the 
> associated UUID.
> 
> resourceid-URI = "resourceid:" UUID [Extension] ; The UUID production is 
> the string representation of a UUID, as defined in [ISO-11578].  Note 
> that white space (LWS) is not allowed between elements of this 
> production.
> 
> Extension = path ; path is defined in Section 3.3 of [URI].

No.  No.  No.

There shall be no repeat of the stupid "opaquelocktoken" URI scheme
just because this is another DAV-related spec.  A URI scheme defines
a namespace, not a purpose for a namespace.  If you want to use UUIDs,
the implementer can CHOOSE to use the "uuid" scheme.  Or they can
choose to use the "mid" scheme, the "urn" scheme, or any other scheme
that fits the need at hand.  Defining a new URI scheme for each new way
that a URI can be used is CONTRARY TO THE DESIGN GOALS OF URI.
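
As an illustration of using a namespace that already exists, Python's
standard uuid module will hand back a UUID in "urn:uuid:" form.  This
is only a sketch of one choice an implementer could make; the
variable name rid is hypothetical and nothing here is taken from the
draft.

```python
import uuid

# One possible identifier an implementer might choose: a random UUID
# expressed in the existing "urn" scheme rather than a new scheme.
rid = uuid.uuid4().urn
```

The result is an ordinary URI in an existing scheme, and nothing about
the scheme says what the identifier is being used for.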

>...

> 16.3 Bindings, and Denial of Service
> 
> Denial of service attacks were already possible by posting URLs that 
> were intended for limited use at heavily used Web sites.  The 
> introduction of BIND creates a new avenue for similar denial of service 
> attacks.  If cross-server bindings are supported, clients can now create 
> bindings at heavily used sites to target locations that were not 
> designed for heavy usage.

That isn't denial of service.  There is no need for this subsection.
 
I think that's it.  I'm sorry that I didn't get a chance to review
this earlier, but I've been pretty busy the past two years.

...........................................................
Roy T. Fielding                         fielding@ebuilt.com
eBuilt, Inc.                            tel:+1.949.609.0000
2652 McGaw Ave.                         fax:+1.949.609.0001
Irvine, CA 92614-5840   USA           http://www.eBuilt.com
...........................................................
Received on Friday, 21 January 2000 01:11:17 UTC