RE: Collections Protocol Review from Yaron Goland on 1999-02-22 (w3c-dist-auth@w3.org from January to March 1999)

From: Yaron Goland <yarong@microsoft.com>
Date: Mon, 22 Feb 1999 15:07:25 -0800
To: "'Slein, Judith A'" <JSlein@crt.xerox.com>, "'ejw@ics.uci.edu'" <ejw@ics.uci.edu>, WEBDAV WG <w3c-dist-auth@w3.org>
Message-ID: <3FF8121C9B6DD111812100805F31FC0D08792F59@RED-MSG-59>
These are only my initial comments and only cover the first twenty pages or
so of the spec. They also only identify what I consider to be significant
issues. Having found a sufficient number of them I decided to not provide
the entire shopping list of minor issues.

I apologize for not being able to provide additional comments on the rest of
the paper but I am completely swamped at work and have to do this review on
my single free day (I'm currently working six days a week). I hope to be
freed up sometime in April after the WinHEC conference.

See comments below.

			Yaron


> The term Collection is intended to mean exactly the same as 
> in the WebDAV
> specification.  I think I copied the definition from the 
> WebDAV spec.  Since
> WebDAV does not specify what the results of GET on a 
> collection will be, I
> would be more inclined to define "contains" in terms of which 
> URIs will be
> included in the response to a PROPFIND with Depth = 1 on the 
> collection.
> 

I know what you meant, it just wasn't what the spec says.

> I agree that the definition of "referential resource" is 
> pretty lame, but I
> do think it should be possible to come up with a definition 
> of reference
> that encompasses both direct and redirect references.  They 
> really were
> intended to be 2 different implementations of the same basic 
> capabilities:
> one puts more burden on the client, the other puts more burden on the
> server.  They both allow users to construct new collections 
> that include
> (loosely speaking) resources that really live someplace else, 
> without making
> a physical copy of the resource in the new collection.  How about:
> 
> a resource that provides access to the content and properties 
> of another
> resource
>  

The purpose of a spec is to help implementers and we should choose
definitions that aid in that cause. Redirect references are based on 3xx and
should be defined as such. This makes the situation extremely clear to
anyone writing code.

> Good idea.  We just have to make sure that whatever we do 
> doesn't confuse
> down-level clients.  The whole point of implementing redirect 
> references
> using 302 with Location was that this would allow down-level 
> clients to use
> redirect references.

Easy enough. Continue to use Location and define an additional header
LocationEx (a nod to my MS roots =). The definition is "First use Location
and if that doesn't work then try the list of URIs in LocationEx." 100%
compatible with existing systems and better functionality for advanced
collection enhanced clients.
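
A minimal sketch of the "Location first, then LocationEx" rule above. The
wire format of LocationEx is not defined anywhere in this thread, so the
comma-separated list is an assumption, and candidate_uris, follow_redirect
and the fetch callback are hypothetical names used only for illustration.

```python
from typing import Callable, Optional


def candidate_uris(headers: dict[str, str]) -> list[str]:
    """Return the URIs to try: Location first, then each LocationEx entry."""
    candidates: list[str] = []
    location = headers.get("Location")
    if location:
        candidates.append(location.strip())
    # Assumption: LocationEx carries a comma-separated list of URIs.
    for uri in headers.get("LocationEx", "").split(","):
        uri = uri.strip()
        if uri and uri not in candidates:
            candidates.append(uri)
    return candidates


def follow_redirect(headers: dict[str, str],
                    fetch: Callable[[str], bool]) -> Optional[str]:
    """Try each candidate in order; return the first URI that fetch accepts."""
    for uri in candidate_uris(headers):
        if fetch(uri):
            return uri
    return None
```

A down-level client never looks at LocationEx and simply follows Location,
which is why the scheme stays compatible with existing systems.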


> The design team will need to revisit this issue.  I think we 
> need to take
> your argument into account.  However, the consequences of 
> following your
> suggestion might be worse than our current situation.
> 

The current situation is that this spec provides a definition whereby one
has absolutely no idea what will happen as a result of executing a request.
As a clear violation of the Hardie Rule this means you are not providing an
interoperable solution. In other words, things can't possibly get worse.
Uglier, perhaps, but not worse.

> Since we are ruling referential integrity out of scope, it is 
> true that
> clients won't know at the time they create a reference 
> whether or how its
> integrity will be enforced.  This is bad.  But they can at 
> least examine the
> value of the DAV:refintegrity property once the reference has 
> been created
> in order to find out.  (Of course, since we are also not defining any
> standard values for DAV:refintegrity except DAV:weak, the 
> client is unlikely
> to understand the value unless it has been designed to work with the
> particular server.)
> 

Given the ramifications of integrity enforcement, as so well explained by
your own paper, any situation where a client does not know whether a
reference will be integrity enforced, and thus has absolutely no idea what
the final behavior will be, is by definition non-interoperable and thus
unacceptable.
Your only options are to either define referential integrity (a hopeless
task) or to specify that the current spec will only allow for the creation
of unenforced references and that later specs can add different types of
referential integrity.

> If we take your suggestion and require clients to specify a particular
> integrity policy whenever they create a reference, but 
> decline to define
> additional values of Ref-Integrity, we prevent clients from creating
> references at all unless they have some private agreement 
> with the server
> about values or unless servers do publish values of 
> Ref-Integrity that they
> use -- so clients would have to be designed to work with 
> particular servers,
> and we get no interoperability out of this spec.
> 

The ONLY interoperability you can get without defining referential integrity
is to provide the do-not-enforce value. That is it. You can not do better.
As your own spec clearly illustrates, allowing for referential integrity
without defining it means non-interoperability by definition!

> It's not really likely that clients get to choose what policy 
> a server would
> use on a per-reference basis in any case.  The server has a 
> policy that it
> uses to enforce referential integrity.  It may allow clients to say
> "do-not-enforce", but it won't allow clients to choose how to 
> enforce.  So
> it seems reasonable to have just 2 values "do-not-enforce" 
> and "enforce".
> Clients will find out how referential integrity is being 
> enforced by trial
> and error or by examining the value of DAV:refintegrity.
> 

This sounds like the old COPY arguments that led to the definition of the
Hardie rule. You are not helping interoperability when a method can have any
number of completely unknown ramifications. Even discovering that your link
is "enforced" tells you nothing as you know nothing about the type of
enforcement policy being used.
Systems which can not support non-enforced references will be forced to
publish their own schemes and will only work with clients which support
those schemes. Hopefully this will lead implementers to be willing to make
sufficient tradeoffs that a common scheme can be created. But until then
simply saying "your mileage may vary" on the fundamental ramifications of a
command is a non-starter.

> > 
> > (Issue #5) Section 4.3.1 - Why DAV:reftarget, reftype and 
> > refintegrity are
> > stand alone properties - I do not understand why 
> > DAV:reftarget, DAV:reftype
> > and DAV:refintegrity are all defined as properties. As these 
> > values help to
> > define the nature of the resource type should they not have 
> > their values
> > included inside the DAV:reference element in the 
> > DAV:resourcetype property?
> 
> It does seem more elegant to make them part of a reference element of
> DAV:resourcetype.  A practical reason for not doing that is that DASL
> doesn't (yet) support searches on structured properties and 
> these seem like
> properties we would want to be able to search on.  Actually, 
> I see that we
> need to revisit the values of these properties in any case, 
> since they are
> all structured at present.
> 

The design rationale for including these values in resourcetype is clear and
should be followed. If DASL can not support WebDAV then it has violated its
charter and needs to be changed.

> > 
> > (Issue #6) Section 4.3.1 - MKREF and the use of bodies - Why 
> > doesn't MKREF
> > follow the same rules that MKCOL has regarding the possible 
> > inclusion of a
> > body? They provide for the inclusion of a body but specify that if the
> > content-type of the body is not understood then the request MUST be
> > rejected. I believe the same rationale that merited the 
> > inclusion of this
> > language in MKCOL's definition applies here.
> 
> I'm not sure what the rationale for allowing MKCOL to have a 
> body was.  My
> own view was that it could be used to populate the collection 
> with members
> in a single request when the collection was created.  There's nothing
> analogous to that for references.
> 
> I'd be glad to see scenarios for how a message body for MKREF 
> might be used.
> 

One of the most important rules in HTTP is to never ban anything unless you
have a damn good reason because it will always come back to bite you. Just
because you can't think of a reason to include a body does not mean that
someone far smarter than both of us won't come forward some day and come up
with a good reason. The onus is on the authors to argue why banning a
body is such a critical requirement that it is worth cutting off this
direction of extensibility.
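
A minimal sketch of what applying the MKCOL body rule to MKREF could look
like: a body is allowed, but a body whose content-type is not understood
gets rejected. RFC 2518 uses 415 (Unsupported Media Type) for this case on
MKCOL. The function name, the "understood" parameter, and the use of 201 as
the success status for MKREF are assumptions made for illustration.

```python
from typing import Optional


def mkref_body_status(content_type: Optional[str], body: bytes,
                      understood: frozenset = frozenset()) -> int:
    """Return the HTTP status an MKREF handler would pick based on its body."""
    if not body:
        # No body at all: nothing to understand, so proceed as usual.
        return 201
    # Strip parameters such as "; charset=utf-8" before comparing.
    media_type = (content_type or "").split(";")[0].strip().lower()
    if media_type in understood:
        return 201
    # Same treatment MKCOL gives a body type it does not understand.
    return 415
```

A server that understands no body types today passes an empty set and
rejects every non-empty body, yet a later extension can add a media type
without changing the method definition.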

> > 
> > (Issue #7) Section 4.4.1 - Do references to collections have 
> > to provide
> > references for all the members of the collection? - If one creates a
> > reference to a collection is one also required to create 
> > references to all
> > the members of that collection? I suspect the answer is yes 
> > but this is not
> > clear from the specification. For example, the referential 
> > resource in the
> > example is http://www.svr.com/MyCollection/tuva/ and it points to a
> > collection which has a member called history.html. Does this 
> > mean that a GET
> > on http://www.svr.com/MyCollection/tuva/history.html MUST 
> > succeed? In other
> > words, that by creating a reference to a collection one is required to
> > create references to its entire tree? Again, I suspect the 
> > answer is yes
> > (this would explain the PROPFIND response format for direct 
> > references as
> > well as the COPY behavior for direct references) but this 
> > requirement is
> > never explicitly stated in the spec.
> 
> The way I've been thinking about this is that a reference is 
> a resource.
> When you create a reference to a collection you only create one new
> resource.  You do not create a new referential resource for 
> each member of
> the target collection.  But the consequence of creating that one new
> resource is that there is a new URI that can be used to 
> access each member
> of the target collection.  And yes, a GET on any member of 
> that collection
> through the new reference MUST succeed.
> 

HTTP defines that there are HTTP URLs and that HTTP URLs point to resources.
If creating a reference called http://foo/bar that points to a collection
with a member baz means that I can perform a GET on http://foo/bar/baz, even
though the MKREF only created http://foo/bar, then there exists a resource
called http://foo/bar/baz and it is a reference. Thus the only possible
conclusion is that creating a reference to a collection causes references to
be created to all the members of that reference's entire tree. Otherwise
http://foo/bar/baz would fail.
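
A minimal sketch of the URL mapping this argument implies, assuming a server
keeps a simple table from reference URL to target URL. The function name,
the table, and the target URL in the usage note are hypothetical.

```python
from typing import Optional


def resolve_through_reference(request_url: str,
                              references: dict[str, str]) -> Optional[str]:
    """Map a request URL to a target if it is, or lies under, a reference."""
    for ref_url, target_url in references.items():
        ref_base = ref_url.rstrip("/")
        if request_url.rstrip("/") == ref_base:
            # The request names the reference itself.
            return target_url
        if request_url.startswith(ref_base + "/"):
            # The request names something below the reference, so the
            # remaining path is appended to the target collection.
            suffix = request_url[len(ref_base):]
            return target_url.rstrip("/") + suffix
    return None
```

With references = {"http://foo/bar": "http://example.org/coll/"}, a request
for http://foo/bar/baz resolves to http://example.org/coll/baz even though
MKREF only ever created http://foo/bar. Every URL that resolves this way is
behaving as a reference, which is the point being made above.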

> You can see us struggling with exactly this sort of example 
> in Section 4.16.
> 

Indeed, lack of sufficient clarity on this issue causes ramifications
throughout the entire specification.

> > 
> > (Issue #8) Section 4.5.5 - Requiring reftype and reftarget in 
> > COPY responses
> > - Why must one return the reftype and reftarget information for COPY
> > requests? As this information is available from a PROPFIND on the
> > destination requiring it to be supplied as part of the COPY 
> > response seems
> > redundant. Including this information also would most likely 
> > break RFC 2518
> > clients who would expect an empty body on a fully successful copy.
> 
> If you look at Appendix 2, you'll see that we're requiring 
> Ref-Type to come
> back in the response for any request made on a reference or through a
> reference to its target.  And we're requiring Ref-Target to 
> come back in the
> response whenever the target is affected by the request.  The 
> rationale was
> that the client should be able to tell for any request what 
> resource was
> actually affected by the request.
> 
> But you are right that the client can always find this out by 
> looking at
> properties on the reference.  And incorporating this information in
> responses to requests on collections is very messy -- LOCK is 
> even worse
> than COPY.  So we should probably revisit the issue of 
> whether to require
> any referencing headers to be included in responses, and make 
> a consistent
> decision across all cases, not just for the case of COPY on 
> collections.
> 

I understand why you wanted to do this but given that there are alternative
means and that your current proposal may or may not work with RFC 2518
clients it would seem reasonable to exclude the data.

> > 
> > (Issue #9) Section 4.6 - Banning passthrough behavior on 
> > DELETE and MOVE - I
> > have a direct reference http://foo/bar to http://bar/blaz. I 
> > can define a
> > MOVE on http://foo/bar to http://icky/bik as meaning that 
> > http://bar/blaz is
> > to be moved to http://icky/bik. Following the logic discussed 
> > in Issue #7 I
> > can even properly define how to move any children that 
> > http://bar/blaz might
> > have. With DELETE I can define a DELETE of http://foo/bar as 
> > meaning that
> > http://bar/blaz should be deleted. So there doesn't seem to 
> > be any technical
> > reason to ban passthrough behavior on MOVE and DELETE involving direct
> > references.
> > 	I suspect the real reason that passthrough behavior is 
> > disallowed on
> > direct references is because of concerns regarding interactions with
> > existing WebDAV clients. In most file systems that support 
> > links if one
> > deletes or moves a link, only the link is deleted or moved. I 
> > suspect the
> > authors are concerned that allowing passthrough on DELETE and 
> > MOVE would
> > mean that if an existing WebDAV client asks for a DELETE or 
> > MOVE then the
> > resource being pointed at would be deleted/moved as well, which wasn't
> > something the non-reference enabled client would have 
> > intended. As such I
> > think it is completely reasonable to design the protocol such 
> > that if a RFC
> > 2518 client issues a DELETE or MOVE then only the direct 
> > reference and not
> > the target is affected. In fact I would propose the design 
> > rule that when
> > moving/deleting a resource the resource should retain its type unless
> > explicit instructions to the contrary are given. That is, if you are a
> > direct reference at the source then you should be a direct reference at
> > the destination unless explicit instructions to the contrary are provided.
> > 	The way to easily implement this rule is to give 
> > no-passthrough a
> > value and define the default for that value on a method by 
> > method basis.
> > PROPFIND and COPY, for example, would default to no-passthrough: f.
> > MOVE/DELETE would default to no-passthrough: t.
> > 	The previous all applies to direct references. Redirect 
> > references
> > are obviously a different animal since one can only directly 
> > manipulate the
> > reference and never the target. Thus the only issue is - 
> > should a 302 ever
> > appear in a COPY/MOVE/DELETE response? Following the law of 
> > minimum surprise
> > an RFC 2518 client getting a 302 would be very surprised. 
> > They ordered that
> > all resources be copied, redirect references are resources, 
> > therefore they
> > should be copied. As such I propose that redirect references on
> > COPY/MOVE/DELETE always behave as if no-passthrough equals 
> > t, regardless
> > of its actual value.
> 
> We do think of the default behavior of most methods as 
> No-Passthrough: f.
> That is, if you want to operate on the reference rather than 
> its target, you
> have to use the No-Passthrough header to make that happen.  
> If you don't use
> the No-Passthrough header, the operation will get applied to 
> the target.
> 
> The exceptions are MOVE and DELETE, as you notice, and for redirect
> references also COPY, LOCK, and UNLOCK.
> 
> The stated rationale for preventing MOVE and DELETE from ever 
> getting passed
> through to the target is this: When a reference is added to a 
> collection,
> the aim is to make it look as if the target resource were a 
> member of that
> collection.  When the reference is removed from that 
> collection, the aim is
> to change the membership of that collection.  Membership of 
> the target in
> any other collections, either internally or by reference, 
> should not be
> affected.  So MOVE and DELETE are different from other 
> methods, which are
> intended to affect the resource identified by the request-URI 
> rather than
> the collection to which it belongs.
> 

I still fail to see why you are banning perfectly reasonable functionality
when there does not appear to be any technical or protocol related reason
for doing so. As my comments demonstrate it is completely possible to use
DELETE and MOVE to affect the targets and to do so in a manner which causes
the expected behavior with RFC 2518 clients. As such what possible reason
can there be for banning this logical and well defined behavior? Just
because it doesn't fit your conceptual model? This would argue for changing
your model not banning this functionality. Again, I call upon the general
HTTP rule "allow unless you have a damn good reason for not doing so."
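
A minimal sketch of the per-method defaults proposed in Issue #9: PROPFIND
and COPY default to no-passthrough: f, MOVE and DELETE default to
no-passthrough: t, and redirect references on COPY/MOVE/DELETE always behave
as t. The function name, the "direct"/"redirect" strings, and the fallback
of f for methods not listed are assumptions made for illustration.

```python
from typing import Optional

# Defaults proposed in Issue #9 (True means "t": act on the reference).
DEFAULT_NO_PASSTHROUGH = {
    "PROPFIND": False,
    "COPY": False,
    "MOVE": True,
    "DELETE": True,
}

# For redirect references these methods ignore the header value entirely.
ALWAYS_ON_REDIRECT_REFERENCE = {"COPY", "MOVE", "DELETE"}


def acts_on_reference(method: str, ref_type: str,
                      header_value: Optional[str] = None) -> bool:
    """True: operate on the reference itself. False: pass through to target."""
    method = method.upper()
    if ref_type == "redirect" and method in ALWAYS_ON_REDIRECT_REFERENCE:
        return True
    if header_value is not None:
        value = header_value.strip().lower()
        if value not in ("t", "f"):
            raise ValueError("No-Passthrough must be 't' or 'f'")
        return value == "t"
    # Assumption: methods not listed above default to pass-through ("f").
    return DEFAULT_NO_PASSTHROUGH.get(method, False)
```

An RFC 2518 client that sends a plain DELETE gets the default t and removes
only the reference, while a reference-aware client can send
No-Passthrough: f to delete the target.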

> I'm sure it was also in the back of people's minds that they 
> would like to
> keep behavior similar to file system behavior to make life easier for
> servers based on file systems.
> 
> That said, we can certainly revisit the question whether to keep
> No-Passthrough: t as the default for MOVE and DELETE, but 
> allow them to be
> passed through by saying No-Passthrough: f.
> 
> COPY, LOCK, and UNLOCK for redirect references are more 
> complicated cases.
> I think there is really no consensus within the design team 
> on how we think
> COPY should behave for redirect references.  For LOCK (and 
> therefore for
> UNLOCK) on redirect references, we didn't want to allow 
> No-Passthrough: f
> because this would result in 302 responses, and so in the case of
> collections that contain redirect references would cause the 
> entire LOCK to
> fail.
> 
> > 
> > (Issue #10) Section 4.7.1 - Passthrough locks on direct 
> > references - I'm
> > thoroughly confused regarding why a passthrough lock doesn't 
> > lock both the
> > reference and the target. I read the spec a couple of times 
> > and I still
> > don't get it. For example, imagine an RFC 2518 client issued a lock on
> > http://foo/bar which is a direct reference to
> > http://bar/blaz. To the RFC 2518 client it appears that they have 
> > locked http://foo/bar which will act exactly as http://bar/blaz. This
> > seems the much more reasonable behavior.
> > I'm sure there is a scenario here I'm missing but whatever it is, I wasn't
> > able to discern it from the spec. I apologize if the scenario is staring
> > me in the face and I am just failing to see it.
> 
> At least one member of the design team agrees with you.  We 
> all agreed that
> the most intuitive behavior when LOCKing a reference would be 
> for both the
> reference and its target to get locked.  So that would argue 
> for making it
> the default behavior, at least for direct references.
> 
> However, we are balancing intuitive behavior against various 
> consistency
> considerations, and in this case consistency with the way 
> other methods
> behave for direct references won out.  The default behavior 
> for most methods
> on direct references is to pass the request on to the target, 
> so we decided
> to do the same for LOCK.  This at least prevents non-lock-holders from
> modifying the target, while there is still a risk that 
> someone might delete
> or replace the reference while the lock was in place.  We do 
> point out that
> if reference-aware clients are good citizens and check 
> whether the target is
> locked before mucking around with a reference, all will be well.
> 

The current situation is, in my opinion, a clear violation of RFC 2518. RFC
2518 is crystal clear that locking a resource means that the resource is
locked. Your current definition does not do this. In fact someone locking a
direct reference, which actually locks the target, could end up with the
reference changed underneath them, thus leading to the violation. Whatever
solution you choose the reference MUST be locked or this specification is in
violation of RFC 2518 and must therefore choose a new method name rather
than LOCK.

Personally I think you should just accept that LOCKing references can be a
bit wacky and will result in locking both the reference and the target.
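
A minimal sketch of "LOCK on a direct reference locks both the reference and
the target". It models exclusive locks only and identifies holders by a
plain owner string rather than by lock tokens. LockTable and its methods
are hypothetical names.

```python
class LockTable:
    """Toy exclusive-lock table keyed by URL."""

    def __init__(self) -> None:
        self._owners: dict[str, str] = {}  # URL -> lock owner

    def lock(self, url: str, owner: str, references: dict[str, str]) -> bool:
        """Lock url and, if it is a direct reference, its target as well."""
        urls = [url]
        target = references.get(url)
        if target is not None:
            urls.append(target)
        # All or nothing: refuse if anyone else holds either resource.
        if any(self._owners.get(u, owner) != owner for u in urls):
            return False
        for u in urls:
            self._owners[u] = owner
        return True

    def can_modify(self, url: str, principal: str) -> bool:
        """A resource is writable if unlocked or locked by this principal."""
        return self._owners.get(url, principal) == principal
```

With this behavior nobody else can delete or replace http://foo/bar while
the lock is held, so the lock holder's reference cannot change underneath
them.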

> > Section 4.7.3 - Returning reftype and reftarget on LOCKs of redirect
> > resources - The text says that a reftype and a reftarget element is
> > returned in the result but the actual result does not contain them. I'm
> > not marking this as an issue because I assume it is just a minor
> > editorial glitch.
> 
> Right.  Good catch.
> 
> > Section 4.7.4 - While there is a title for this section, there is no
> > example. I'm not marking this as an issue because I assume it is just a
> > minor editorial glitch.
> 
> Right.  I ran out of time.
> 
> > (Issue #11) Section 4.7.5 - LOCK on a Collection that contains a direct
> > reference and a redirect reference - First off, the format is not
> > backwards compatible with RFC 2518 and thus will break existing RFC 2518
> > clients. Such clients are expecting to get back a prop response, not a
> > multistatus. They will ignore the multistatus (since they won't
> > recognize it in this context) and thus "see" an empty response body.
> > 	It would seem that returning reftype and reftarget information is
> > not necessary. A client can retrieve this information from a PROPFIND.
> > This is a similar argument to Issue #8. The best counter argument is
> > that not returning this information would cause reference enhanced
> > clients to always perform a PROPFIND after a LOCK. While this is not
> > necessarily the end of the world (LOCKs tend to be rare) it could still
> > be bad. However, if we adopt the behavior proposed in Issue #10 then
> > there won't usually even be a good reason to perform the PROPFIND. In
> > addition, this behavior is consistent with what an RFC 2518 client would
> > expect.
> 
> We definitely don't want to break existing clients, so let's 
> see where we
> end up once we resolve issues 8 and 10.
> 
> Thanks again for (as always) prompt and insightful comments.
> 
> --Judy
> 
> Judith A. Slein
> Xerox Corporation
> jslein@crt.xerox.com
> (716)422-5169
> 800 Phillips Road 105/50C
> Webster, NY 14580
> 
> 
> 
> 
> 
>
Received on Monday, 22 February 1999 18:07:31 UTC