RE: Advanced Collections-04 Review from Slein, Judith A on 1999-08-24 (w3c-dist-auth@w3.org from July to September 1999)

From: Slein, Judith A <JSlein@crt.xerox.com>
Date: Tue, 24 Aug 1999 11:10:00 -0400
To: "'jamsden@us.ibm.com'" <jamsden@us.ibm.com>, w3c-dist-auth@w3.org
Message-ID: <8E3CFBC709A8D21191A400805F15E0DBD2440C@crte147.wc.eso.mc.xerox.com>
Jim, Thanks very much for giving the 04 spec such a thorough review.

I've been working at splitting the collections spec into 3 parts -- one on
bindings, one on redirect references, and one on ordered collections.  The 3
new specs were announced this morning, as you may have noticed.  In addition
to the mechanics of splitting the spec, I've been focusing on giving better
context in the introductions and on clarifying the definitions.  This all
should help with some of your more basic questions.  Most of your issues are
still relevant to the new specs, however. We'll address them in the next
revision.

Explanations below . . .

> -----Original Message-----
> From: jamsden@us.ibm.com [mailto:jamsden@us.ibm.com]
> Sent: Monday, August 23, 1999 3:32 PM
> To: w3c-dist-auth@w3.org
> Subject: Advanced Collections-04 Review
> 
> Here's my review of the advanced collections spec 04. I tried to keep my
> comments in the order of the spec to uncover anything a new reader might
> have trouble with. This is a nice piece of work. In particular, the BIND
> method seems to be just what we need. However, redirect references still
> seem pretty complex and full of special cases. It would take an
> implementation to root out all the detailed issues, but the specification
> as it stands is pretty close to what we need in order to start
> implementing. Thanks to the authors for all the hard work they put into
> this.
> 
> Last paragraph, section 3 Introduction: It might be helpful to say that
> this specification defines client-specified, server-maintained orderings.
> It's not clear that the server is not supporting the ordering, only
> maintaining an ordering given by a client.

<js>
Yes, it needs to be made more clear what we are defining.

All the orderings we specify are on the server side, and we are most
interested in orderings that are maintained based on client requests to
insert a child at a certain position in the collection's ordering. (We call
these client-maintained orderings.)  But we do have some minimal support for
server-maintained orderings (ones where it's the server's responsibility to
insert children at the right position in the ordering just based on the
ordering identifier).  All we do for this last sort of ordering is provide a
way for clients to discover what orderings the server can support in this
way, and to pick from the list of available server-maintained orderings.
</js>

> 
> 4.1, second bullet: HTTP servers provide URI mappings, but no protocol for
> specifying them. Mappings must be set up in server-dependent configuration,
> which often requires a server restart.

<js>
This bullet is gone now.
</js>

> 
> Last bullet: making support for cross-server bindings optional does not
> eliminate the referential integrity problem caused by disconnected
> servers. A server could support cross-server bindings at some times but
> not at others, when it is disconnected. Unfortunately, this is stateful and
> depends on connectivity, not on anything to do with the bindings
> themselves. So referential integrity could never be guaranteed, and either
> it must be optional or cross-server bindings must be prohibited.
> 

<js>
</js>

> 4.2, 1st paragraph: aren't bindings a many-to-1 relationship between URI
> mappings and resources? Can't a collection have more than one binding to
> the same internal member? Note that the definition of internal member
> given in this specification implies that a resource can be an internal
> member of more than one collection.

<js>
No. No. No. (if I take you literally, but the spirit of what you say is
correct. Here's what I mean:)

A binding is a relation between the final segment of a URI relative to a
collection and a resource.  So think of the binding as the triple (Segment,
Collection, Resource).
An internal member URI is not a resource, but an absolute URI.  Any given
binding induces one or more internal member URIs.  But bindings are to
resources, not to internal member URIs.  A collection can have more than one
binding to the same resource.
What is really strictly speaking contained in a collection are bindings.  A
resource can be (loosely speaking) contained in more than one collection
because there can be bindings to it in more than one collection.
</js>
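
To make the distinction concrete, here is a minimal Python sketch of the
model described in the reply above, assuming bindings are stored per
collection as a map from final path segment to resource. The names
Resource, Collection, and internal_member_uris are hypothetical, chosen for
illustration only; they are not spec text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(eq=False)  # identity, not content, decides whether resources are the same
class Resource:
    label: str


@dataclass
class Collection:
    # What a collection strictly contains: bindings from segment to resource.
    bindings: Dict[str, Resource] = field(default_factory=dict)

    def bind(self, segment: str, resource: Resource) -> None:
        self.bindings[segment] = resource


def internal_member_uris(collection_uris: List[str], coll: Collection,
                         target: Resource) -> List[str]:
    """Internal member URIs induced by the bindings in coll to target.

    A collection reachable through several URIs induces one internal
    member URI per (collection URI, binding) pair.
    """
    return [base + segment
            for base in collection_uris
            for segment, res in coll.bindings.items()
            if res is target]


report = Resource("report")
dav, docs = Collection(), Collection()
dav.bind("a.html", report)
dav.bind("b.html", report)        # two bindings to one resource in one collection
docs.bind("report.html", report)  # and a third from a different collection

print(internal_member_uris(["http://example.com/dav/"], dav, report))
# ['http://example.com/dav/a.html', 'http://example.com/dav/b.html']
```

The resource is "in" both collections only loosely, through bindings; the
internal member URIs are derived from the bindings, not stored.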

> 
> Cross-server bindings: would it be acceptable to 1) refuse to delete a
> resource if doing so would create dangling references? 2) only delete the
> binding, and employ some garbage collection algorithm (including leaving
> the garbage)? and/or 3) delete the resource and all references if it can
> be known (through, say, reference counts) that there are no other
> references? Should these suggestions be added to the spec to indicate
> possible ways referential integrity could be maintained across servers?

<js>
1 - yes.
2 - yes, this is what we say DELETE means.
3 - I think not, though perhaps I don't understand.  You're only allowed to
delete the binding identified by the request-URL, unless the All-Bindings
header is used.
But 1 and 2 assume that the server knows what bindings there are to the
resource.  The trouble is that we don't define any server-to-server protocol
elements, which is what would be required for the server where the resource
lives to keep an accurate reference count or list of bindings.
</js>
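
Option 2 can be sketched for the single-server case with a reference count,
assuming every binding to the resource lives on the same server (the
server-to-server protocol needed to relax that assumption is exactly what
the reply says is missing). BindingTable and its methods are hypothetical;
the all_bindings flag stands in for the All-Bindings header.

```python
from collections import defaultdict
from typing import Dict, Set


class BindingTable:
    """Hypothetical single-server table of bindings (URI -> resource id)."""

    def __init__(self) -> None:
        self.uri_to_resource: Dict[str, str] = {}
        self.refcount: Dict[str, int] = defaultdict(int)
        self.live: Set[str] = set()

    def bind(self, uri: str, resource_id: str) -> None:
        if uri in self.uri_to_resource:
            raise ValueError(f"{uri} is already bound")
        self.uri_to_resource[uri] = resource_id
        self.refcount[resource_id] += 1
        self.live.add(resource_id)

    def delete(self, request_uri: str, all_bindings: bool = False) -> int:
        """Remove the binding named by request_uri (or every binding to the
        same resource when all_bindings is set); return an HTTP status."""
        resource_id = self.uri_to_resource.get(request_uri)
        if resource_id is None:
            return 404
        if all_bindings:
            doomed = [u for u, r in self.uri_to_resource.items() if r == resource_id]
        else:
            doomed = [request_uri]
        for uri in doomed:
            del self.uri_to_resource[uri]
            self.refcount[resource_id] -= 1
        if self.refcount[resource_id] == 0:
            # The last local binding is gone, so the resource can be reclaimed.
            del self.refcount[resource_id]
            self.live.discard(resource_id)
        return 204


table = BindingTable()
table.bind("/a/x.html", "R")
table.bind("/b/y.html", "R")
table.delete("/a/x.html")
print("R" in table.live)  # True: one binding to R remains
```

A binding held by another server would never show up in this count, which is
why the count alone cannot guarantee referential integrity across servers.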

> 
> 4.2.1 says that a Request-URI ending in a "/" must bind to a collection,
> but does not indicate that a Request-URI NOT ending in a "/" CAN bind to a
> collection too.
>

<js>
We do need to straighten out what we say about Request-URI that do / do not
end in a "/".
</js>
 
> 4.2.2: 506 (Loop Detected) is a server error status code, but this is
> likely a successful operation that clients mostly don't care about. Should
> it be a 2xx status code?
> 

<js>
We need to make some decisions about whether to allow loops to be created in
the first place, and if so what to do when they are encountered.  This was a
topic of discussion at Oslo.  There are also some related threads on the
list, but we need more discussion.
</js>
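
Whatever status code is chosen, a server walking a collection with Depth:
infinity has to notice when a binding leads back to a collection it is
already inside. A minimal Python sketch, assuming a hypothetical in-memory
map from collection id to its (segment, child id) bindings; it only reports
where loops occur and leaves the status-code question (506 versus a 2xx)
open.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical model: collection id -> list of (segment, child id) bindings.
# Ids that do not appear as keys are treated as non-collection resources.
Graph = Dict[str, List[Tuple[str, str]]]


def depth_infinity(graph: Graph, root: str) -> Tuple[List[str], List[str]]:
    """Walk root with Depth: infinity; return (visited paths, loop paths)."""
    visited: List[str] = []
    loops: List[str] = []

    def visit(node: str, path: str, ancestors: FrozenSet[str]) -> None:
        visited.append(path)
        for segment, child in graph.get(node, []):
            is_collection = child in graph
            child_path = path + segment + ("/" if is_collection else "")
            if child in ancestors:
                # The binding leads back to a collection we are already inside.
                loops.append(child_path)
                continue
            # Two bindings to the same non-ancestor collection are not a loop;
            # that collection is simply walked once per binding.
            visit(child, child_path, ancestors | {child})

    visit(root, "/", frozenset({root}))
    return visited, loops


graph = {"root": [("a", "A")], "A": [("doc.html", "D"), ("up", "root")]}
print(depth_infinity(graph, "root"))
# (['/', '/a/', '/a/doc.html'], ['/a/up/'])
```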

> I had trouble making sense of section 4.2.3. What's a non-WebDAV
> collection or non-WebDAV advanced collection? Where is "domain name
> variant" defined? What is the authority part of a URL? Should step 4 be
> left to right? How is S bound to R in step 4? Does this algorithm only
> apply to bindings whose destination is a collection?
> 
> The example in 4.2.4 uses the destination URL, not the request URL as
> specified in 4.2.3. Are users expected to understand and use this
> algorithm in order to know what is bound to what?

<js>
We know that sections 4.2.3 - 4.2.4 need a lot of work.  I haven't tried to
touch this in the new drafts coming out, but we will fix it.
</js>

> 
> Section 4.2.5, 409 (Conflict): so the binding is only created for the
> right-most path segment, and all other collections in the path must
> already exist? BIND doesn't create bindings for these path segments in
> their parent directories too? In example 4.2.6, collection
> /~whitehead/dav/ must already exist? The BIND operation wouldn't create a
> binding for dav if it didn't already exist?

<js>
Correct on all counts.
</js>

> 
> Example 4.2.7: the existence of the / should not be required. The binding
> should have been created, although it is probably not what the user
> intended. In all other cases, WebDAV behaves as if the / were present, and
> returns any URIs with the / added. BIND should be consistent with this
> convention. In any case, this error condition is not described in section
> 4.2.5.

<js>
We need to look at this.  It's part of the general problem of straightening
out what we say about the presence or absence of "/" and making sure we are
consistent with the rest of WebDAV.
</js>

> 
> Should section 8.6.1 of [WebDAV] be corrected so that the last paragraph
> of 4.2.8 isn't necessary?

<js>
Yes. It is our intention to try to get this changed as part of the process
of preparing [WebDAV] for draft status.  Jim Whitehead, is this on the
issues list?
</js>

> 
> 4.2.9, 2nd paragraph: if Depth is specified for the COPY method, it must be
> Depth: infinity. Depth: 0 or 1 is not allowed.

<js>
I think Depth 0 is allowed according to [WebDAV] Section 8.8.3.
</js>

> 
> Section 4.2.10: The semantics of MOVE are fine, but they are different
> from [WebDAV]. In this case, the resource is not moved at all; only
> bindings to the resource are manipulated. The implication is that the
> resource is unchanged: it's the same resource in the same physical storage
> location on the same server, etc. This is not the case with [WebDAV], and
> may not be what was desired. Perhaps there are two MOVE operations that
> need to be distinguished. Note that COPY created R' while MOVE didn't.
> Another possible outcome for the example is that a new resource R' is
> created which is a copy of R, and URI 1, URI 2, and URIX are all bound to
> R'. All bindings to R are deleted. This interpretation is consistent with
> [WebDAV]. Second to last sentence in the last paragraph: ...
> request-URI cannot be moved.

<js>
A lot of work went into trying to insure that the definition of MOVE in
advanced collections is logically equivalent to the definition in [WebDAV].
The authors of [WebDAV] were consulted.  I suppose we should claim only
equivalence from the point of view of the client, who doesn't know about
storage locations, but only about what URIs can be used to access the
resource before and after the MOVE operations.
</js>
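
The binding-level reading of MOVE can be sketched in a few lines, assuming a
flat, hypothetical map from URI to resource. From the client's side, the
observable result is that the destination URI now reaches what the source
URI used to reach, and the source URI no longer does; whether anything was
physically relocated is not visible through URIs.

```python
from typing import Dict


def move(namespace: Dict[str, object], source: str, destination: str) -> None:
    """Hypothetical binding-level MOVE over a flat URI -> resource map."""
    if source not in namespace:
        raise KeyError(source)
    if destination in namespace:
        # Overwrite handling is deliberately left out of this sketch.
        raise ValueError(f"{destination} is already bound")
    namespace[destination] = namespace[source]  # new binding, same resource
    del namespace[source]                       # remove the source binding only


resource = {"body": "hello"}
ns = {"/uri1": resource, "/uri2": resource}
move(ns, "/uri1", "/urix")
print(sorted(ns))               # ['/uri2', '/urix']
print(ns["/urix"] is resource)  # True: nothing was copied
```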

> 
> Section 4.2.10.1: OK, we're in agreement. However, it is not clear what
> server writers are required to do with implementation notes. It also
> seems the protocol should not be specifying anything about
> implementation. Perhaps these two sections can be replaced by one that
> specifies a logical move where either implementation would be valid. Or,
> these are not equivalent logical operations and clients may wish to
> distinguish them. (I don't think so, though, as the client probably
> couldn't tell the difference.)

<js>
We can change the name of this section and revise the wording if that seems
better.  But we want to keep the basic information of this section, as this
is where we were trying to explain the equivalence of our MOVE semantics to
the semantics in [WebDAV].
</js>

> 
> Section 4.2.12: This seems to assume that all bindings to the same
> resource have the same behavior. However, servers do special things with
> non-WebDAV collection names (acting as functors) such as cgi-bin, servlet,
> etc. In this case, different bindings to the same resource will behave
> very differently. The 4th paragraph on PUT may not be possible due to the
> effect of these functors.
> 

<js>
I'll add this to the issues list.
</js>

> Should we introduce a new DAV property that provides a GUID for each
> resource that can act as an identifier for that resource? Then we could
> determine whether two bindings are to the same resource by examining a
> property. Note that just because the contents and properties of a pair of
> resources are the same does not mean they are the same resource.

<js>
An interesting suggestion.  I'll add it to our issues list.
</js>

> 
> 4.2.13: If the optional DAV:bindings property exists for a resource, does
> it have to contain ALL the bindings? Even those from other servers? Is
> this optional on a server or resource basis?

<js> Well, we've said that servers are required to guarantee the integrity
of bindings.  So they have to fail BIND requests unless they can guarantee
that the server where the resource lives knows about the binding and will
honor it (won't let the resource be destroyed while the binding still
exists, etc.).  Which pretty much means that no one will support
cross-server bindings, as far as I can see.  But in any case it means that
the server maintaining DAV:bindings would have the information needed to
include all bindings, and I think it should be required to do so.
The property is optional on a resource basis, like all capabilities in
WebDAV.
</js>

> 
> 4.3, last paragraph: the semantics of the Passthrough header seem to be
> described in reverse. The last sentence of the last paragraph sounds like
> a NoPassthrough header, not Passthrough. The default should be to return
> the properties of the reference or return a 302.

<js>
We probably need to talk about the values of Passthrough (T and F) to make
it clearer what is meant here.  The default in the case of a PROPFIND is to
return a 302.  So using Passthrough: F would get the reference's own
properties.
</js>

> Why not return a 302 if the Passthrough header is not present? If it is
> present, it must be a referencing-aware client, so just do what the header
> says. "T" means pass through to the target and don't return a 302. "F"
> means operate on the reference.

<js>
We don't want to end up with a hybrid direct / redirect reference.  One of
the essential characteristics of a redirect reference is that the server
never has to resolve it.  The server never operates on the target, but only
responds with a 302 and gives the client the information it needs to send a
request to the target.  This was to insure that server implementations would
be easy and that there would be no complexity about using redirect
references to resources on a different server.
</js>

> The second paragraph of 4.3.2 indicates the problem. The response to a
> PROPFIND with Depth: infinity on a collection containing redirect
> references returns 302 for the redirect references, but also returns the
> DAV:resourcetype and DAV:location properties (described as DAV:reftarget
> in section 4.3?) for each redirect reference, and no other properties.
> That seems like a lot of special cases. Could using a three-state
> Passthrough header eliminate them?

<js>
There is only 1 special case: a redirect reference is encountered when doing
a PROPFIND with Depth > 0.  The special processing for that case is to
return a 302 with DAV:resourcetype and DAV:location, and no other
properties.

We could have said that the server must resolve the reference and return its
target's properties, but that would be to turn the redirect reference into a
direct reference, which we were not willing to do.
</js>
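
The single special case can be sketched as the per-member entries a server
might build for a PROPFIND with Depth > 0. This is a hypothetical in-memory
shape, not the Multi-Status XML; Member and propfind_entries are
illustrative names, and the 200 status for ordinary members is a
simplification.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Member:
    href: str
    props: Dict[str, str]
    location: Optional[str] = None  # set only for redirect references


def propfind_entries(members: List[Member]) -> List[dict]:
    """Per-member response entries for a PROPFIND with Depth > 0."""
    entries = []
    for m in members:
        if m.location is not None:
            # The one special case: 302 plus exactly these two properties.
            # The server reports the target; it never resolves it.
            entries.append({
                "href": m.href,
                "status": 302,
                "props": {"DAV:resourcetype": m.props["DAV:resourcetype"],
                          "DAV:location": m.location},
            })
        else:
            entries.append({"href": m.href, "status": 200,
                            "props": dict(m.props)})
    return entries
```

Any other properties stored on the reference are simply not reported, which
keeps the server from having to touch the target.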

> 
> 4.3.1: MKREF should use the Destination header like BIND. Operations
> involving redirect references use a Location header. BIND uses
> Destination. MKREF uses Ref-Target while the redirect reference has a
> DAV:reftarget property. These should be normalized.

<js>
We should definitely take another look at this, but there were reasons.
Ref-Target and DAV:reftarget differ from Destination in allowing relative
URIs as values.  People felt that in some situations it would be desirable
to use relative URIs.
The Location header and DAV:location pseudo-property are used in situations
where [HTTP] requires the Location header.  You have to use Location with a
302, and we introduced DAV:location for use inside a Multi-Status response
to convey the same information.
</js>
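
Because Ref-Target and DAV:reftarget may be relative while the Location
header and DAV:location need an absolute URI, a server has to resolve one
into the other. A minimal sketch, assuming the relative value is resolved
against the redirect reference's own URI the way ordinary relative URI
references are; location_for is a hypothetical helper built on Python's
urllib.parse.urljoin.

```python
from urllib.parse import urljoin


def location_for(reference_uri: str, ref_target: str) -> str:
    """Absolute URI to put in Location (or DAV:location) for a redirect reference.

    Assumes a relative Ref-Target / DAV:reftarget value is resolved against
    the reference's own URI.
    """
    return urljoin(reference_uri, ref_target)


print(location_for("http://example.com/dav/docs/ref", "../archive/report.html"))
# http://example.com/dav/archive/report.html
print(location_for("http://example.com/dav/docs/ref", "http://other.example.org/x"))
# http://other.example.org/x
```

One consequence of storing the relative form: moving the reference changes
where it points, which may or may not be what the author wanted.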

> 
> 4.3.1.1, 409: Examples that MAY produce a conflict include a reference to
> a target that does not exist, on a server that does not support dangling
> references. This and similar server behavior needs to be part of the
> specification in order to ensure interoperability.
> 

<js>
ok
</js>

> 4.3.3: It seems like COPY should just copy the redirect reference
> resource, just like any other resource, and there should be no special
> cases. This looks like it's attempting to mix binding and redirect
> reference semantics on a case-by-case basis, which will be too hard to
> explain and remember. So I don't agree with "For a COPY request to a
> redirect reference, the expectation would be a 203 response that the
> client could use to copy the target resource." The client is made aware
> on a GET on the redirect reference that it is a redirect reference. So on
> COPY, the client would expect to copy the redirect reference to a new
> location, but have it redirect to the same target. The behavior of the
> newly copied redirect reference is exactly the same as the old one. We
> shouldn't special-case methods based on resource type. I don't think the
> client would expect a new, independent copy of the target resource,
> because that's not what was copied. An alternative would be to go ahead
> and copy the reference but still return a 302 unless the Passthrough
> header was specified. If Passthrough is "T", copy the target. If
> Passthrough is "F", copy the reference (same target) and return no 302.
>

<js>
We can revisit this, but we've been through it so many times I don't promise
a different decision.  It's a balancing act between conflicting
considerations, and no answer seems right from all points of view.
</js>
 
> 4.3.4: Passthrough for DELETE isn't consistent with its use on other
> methods. Passthrough: T should not generate a 302; it should delete the
> target resource and the reference.

<js>
You'll be tired of hearing this by now, but one thing a redirect reference
will never do is operate on the target resource.  Passthrough: T for a
redirect reference always results in a 302, which allows the client to
submit a request to the target resource.  Passthrough: F for a redirect
reference means operate on the reference.  If we ever re-introduce direct
references, Passthrough: T for them will mean operate on the target.
</js>

> Third paragraph: this is one interpretation of COPY and MOVE. Another is
> given by GET, PUT, and DELETE semantics: COPY is a GET followed by a PUT
> to the new destination, and MOVE is a COPY followed by a DELETE of the
> source. In this interpretation, these methods require no special cases.
> The semantics of references should be independent of these implied
> implementation details.

<js>
I don't think the paragraph implies anything about implementation.  It
certainly wasn't meant to.
But I'll have to agree that this rationale has always seemed pretty shaky to
me.  What I really think people have in mind is the behavior of Unix
symbolic links (where removing the link doesn't remove the target) and the
desire to do the least-potentially-damaging thing.
</js>

> 
> 4.3.6: This is too complicated and has too many special cases. If
> redirect references are exposed as resources, then they should be treated
> like resources. LOCK on a redirect reference should lock the reference. It
> should have no effect on the target unless Passthrough is set to "T", in
> which case the target resource, not the reference, is locked. A locked
> reference can't have its properties changed; in particular, one can't
> change its target. Or consider removing redirect references from the
> spec.

<js>
Again, redirect references are not direct references and so never operate on
the target resource.  We start from the position that the default behavior
for any method for a redirect reference is to respond with a 302, and that
there has to be a good reason to deviate from that behavior for any method.
LOCK and COPY have been the difficult cases, where we've made exceptions for
one reason or another.  In the case of LOCK the decisive consideration was
that LOCK on a collection with Depth: infinity would always fail for
collections that contained redirect references if we stayed with the 302
behavior.  So we have the LOCK apply to the reference instead.
</js>

> 
> 5.2.1, last paragraph: DAV:orderingtype should be optional. A missing
> DAV:orderingtype is equivalent to DAV:unordered. This would be more
> compatible with existing servers, and simpler.

<js>
Sounds ok.
</js>

> 
> 5.4: What happens to requests that occur between changing the
> DAV:orderingtype and the ORDERPATCH method? Won't the ordering be
> incorrect?

<js>
Yes.  Maybe we should use ORDERPATCH both for changing the ordering
semantics and for manipulating the order of the members.
</js>

> 
> 6.4: Passthrough "T" should not return a 302, but instead should operate
> directly on the target resource. The client is already referencing-aware
> (or wouldn't have used the header), and has expressed his intent to
> operate on the target, not the reference. The server should do the
> operation without requiring another round trip. Also, Passthrough: T and
> no Passthrough header do the same thing.

<js>
For redirect references, Passthrough: T returns a 302.  If we ever
re-introduce direct references, Passthrough: T for them will operate on the
target resource.

Whether Passthrough: T and no Passthrough header do the same thing depends
upon the method in question.  For DELETE and MOVE the default behavior is to
operate on the reference.  So the absence of a Passthrough header for those
methods is equivalent to Passthrough: F.
</js>
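
The rules stated in these replies for redirect references can be collected
into one small decision function. This is a sketch, not spec text:
redirect_ref_outcome and Outcome are hypothetical names, LOCK is placed in
the "act on the reference" group based on the 4.3.6 reply, COPY is left
unmodeled because its behavior was still under discussion, and the PROPFIND
Depth > 0 special case is not covered.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    REDIRECT_302 = "302 with the target's URI in Location"
    ON_REFERENCE = "apply the method to the reference itself"


# Methods that, with no Passthrough header, act on the reference.
DEFAULT_ON_REFERENCE = {"DELETE", "MOVE", "LOCK"}
# COPY was still an open question, so this sketch refuses to guess.
UNSETTLED = {"COPY"}


def redirect_ref_outcome(method: str, passthrough: Optional[str] = None) -> Outcome:
    method = method.upper()
    if passthrough == "T":
        return Outcome.REDIRECT_302  # the server never resolves the reference
    if passthrough == "F":
        return Outcome.ON_REFERENCE
    if passthrough is not None:
        raise ValueError("Passthrough must be 'T' or 'F'")
    if method in UNSETTLED:
        raise NotImplementedError(f"default behavior of {method} is not settled here")
    if method in DEFAULT_ON_REFERENCE:
        return Outcome.ON_REFERENCE
    return Outcome.REDIRECT_302      # default for every other method


print(redirect_ref_outcome("GET").name)     # REDIRECT_302
print(redirect_ref_outcome("DELETE").name)  # ON_REFERENCE
```

Under these rules, omitting the header matches Passthrough: T for GET-like
methods and Passthrough: F for DELETE and MOVE, which is the point the reply
makes.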

> 
> 6.6: Perhaps it should be an error to add a binding to a collection with a
> client-maintained ordering if the Position header is not specified.
> Putting the resource at the end seems a bit arbitrary. See section 8.4: if
> the collection has a custom ordering type, how do we know any given client
> will add resources in the desired order? Requiring the Position header at
> least makes the client think about ordering. But the result looks pretty
> useless, as there is no way to ensure the client ordering semantics are
> followed.

<js>
In any case, yes, it's entirely up to the clients that add bindings to the
collection to see that the ordering semantics are followed.

I do like the idea of requiring the Position header when adding a binding to
a collection with a client-maintained ordering.  The only hesitation I have
is that someone other than the person who specified the ordering might be
trying to add a binding.  If we assume that the owner cares more about
maintaining the ordering than about allowing others to add to the
collection, we can add this constraint.  Otherwise, maybe better not.
</js>
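
The difference between the draft's behavior (no Position means "append at
the end") and the stricter rule discussed above can be sketched as a single
flag. The function, the PositionRequired error, and the ("first" | "last" |
"before" | "after", segment) tuple are all hypothetical stand-ins; they do
not reproduce the actual Position header syntax.

```python
from typing import List, Optional, Tuple


class PositionRequired(Exception):
    """Hypothetical error for the stricter rule proposed in the review."""


def add_binding(order: List[str], segment: str,
                position: Optional[Tuple[str, Optional[str]]] = None,
                require_position: bool = False) -> None:
    """Insert segment into a client-maintained ordering of member segments."""
    if position is None:
        if require_position:
            raise PositionRequired(segment)
        order.append(segment)  # draft behavior: default to the end
        return
    kind, ref = position
    if kind == "first":
        order.insert(0, segment)
    elif kind == "last":
        order.append(segment)
    elif kind in ("before", "after"):
        i = order.index(ref)  # ValueError if ref is not a member
        order.insert(i if kind == "before" else i + 1, segment)
    else:
        raise ValueError(f"unknown position {kind!r}")


order = ["a.html", "c.html"]
add_binding(order, "b.html", ("after", "a.html"))
add_binding(order, "z.html")  # no position given: appended
print(order)  # ['a.html', 'b.html', 'c.html', 'z.html']
```

Either way, the server only records where a client put each member; keeping
the ordering semantically correct remains the clients' job.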

> 
> 7.1: Loop detected should be a 2xx status code, not a 506, which
> indicates a server error. Loops aren't errors.

<js>
We'll be taking another look at this.
</js>

> 
> 11.1 indicates that a resource that provides resource sharing MUST
> support both bindings and redirect references, while the next sentence
> indicates that an OPTIONS request MUST indicate which of these
> capabilities the resource supports. Aren't bindings and redirect
> references independent? Couldn't a server support either one or both?
> 

<js>
In the split specs, we define separate compliance classes for bindings and
redirect references, so servers will be free to implement either one or both
or neither.
</js>
Received on Tuesday, 24 August 1999 11:10:36 UTC