RE: Collections Protocol Review from Slein, Judith A on 1999-02-22 (w3c-dist-auth@w3.org from January to March 1999)

From: Slein, Judith A <JSlein@crt.xerox.com>
Date: Mon, 22 Feb 1999 16:27:55 -0500
To: "'Yaron Goland'" <yarong@microsoft.com>, "'ejw@ics.uci.edu'" <ejw@ics.uci.edu>, WEBDAV WG <w3c-dist-auth@w3.org>
Message-ID: <201BB34B3A73D1118C1F00805F1582E801BA4D02@x-wb-0128-nt8.wrc.xerox.com>
Thanks for giving us these comments so quickly.  I'll take a stab at
responding here, and hope that others on the design team will add their two
cents.  I'll also add all of your issues to our issues list to make sure
they get addressed in the spec as well as in the mail.

--Judy

> -----Original Message-----
> From: Yaron Goland [mailto:yarong@microsoft.com]
> Sent: Monday, February 22, 1999 1:36 AM
> To: 'ejw@ics.uci.edu'; WEBDAV WG
> Subject: Collections Protocol Review
> 
> 
> The following comments are based on
> http://www.ics.uci.edu/pub/ietf/webdav/collection/draft-ietf-webdav-collection-protocol-03.txt:
> 
> (Issue #1) Section 2 - Definition of the term Collection - 
> What does the
> word "contains" mean in the definition of a collection? Does 
> this mean if I
> perform a GET I will get a list of them?

The term Collection is intended to mean exactly the same as in the WebDAV
specification.  I think I copied the definition from the WebDAV spec.  Since
WebDAV does not specify what the results of GET on a collection will be, I
would be more inclined to define "contains" in terms of which URIs will be
included in the response to a PROPFIND with Depth = 1 on the collection.
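
A minimal sketch of that reading, in Python: the URIs a collection
"contains" are those returned by a Depth 1 PROPFIND, other than the
collection's own URI. The sample response body (trimmed to its href
elements) and the helper name contained_uris are illustrative assumptions,
not text from either spec.

```python
import xml.etree.ElementTree as ET

DAV = "{DAV:}"  # ElementTree spelling of the "DAV:" XML namespace

# Hypothetical 207 Multi-Status body, trimmed to the href elements only.
SAMPLE_MULTISTATUS = """<D:multistatus xmlns:D="DAV:">
  <D:response><D:href>http://www.svr.com/MyCollection/</D:href></D:response>
  <D:response><D:href>http://www.svr.com/MyCollection/tuva/</D:href></D:response>
  <D:response><D:href>http://www.svr.com/MyCollection/notes.html</D:href></D:response>
</D:multistatus>"""


def contained_uris(multistatus_xml, collection_uri):
    """Return the member URIs listed in a Depth 1 PROPFIND response.

    The collection's own URI also appears in the response, so it is
    filtered out; everything else counts as "contained".
    """
    root = ET.fromstring(multistatus_xml)
    hrefs = [el.text.strip() for el in root.iter(DAV + "href") if el.text]
    own = collection_uri.rstrip("/")
    return [h for h in hrefs if h.rstrip("/") != own]


members = contained_uris(SAMPLE_MULTISTATUS, "http://www.svr.com/MyCollection/")
```

Under this definition, membership is whatever the server reports; it says
nothing about what a GET on the collection returns.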

> 
> (Issue #2) Section 2- Definition of the term Referential Resource - It
> should be explicitly pointed out that this is a new resource 
> type. I think
> this would make the definition clearer.
> 	The term "body" as used in RFC 2068 and 2518 refers exclusively to
> messages, not resources. Hence the phrase "body of its own" does not have a
> definition in either spec.
> 	In general I find this definition confusing as it 
> indicates that a
> reference is a resource but has some relationship with a 
> target somehow
> involving properties. I think the problem is that the authors 
> have cleaved
> too tightly to the file system heritage of this feature. 
> First off, I think
> we need to cease viewing direct references and redirect references as
> children of the same mother. They are two very different 
> creatures with
> largely unrelated functionality. It would probably be in 
> everyone's interest
> if they were given names which shared no common words.
> 	A direct reference is really just an HTTP object which practices
> inheritance. Its sole purpose is to pass through any methods it receives
> to its target. Thus if a PROPFIND comes the direct reference 
> will simply
> pass it through to the target. This behavior can then be stopped by
> explicitly marking the request as being addressed to the 
> reference resource
> itself. Describing direct references as "passing through" 
> methods lets us
> avoid discussions of undefined terms like "body" and 
> discussions involving
> vague references to properties.
> 	A redirect reference is just a 3xx machine. It is a 
> resource which
> always returns 3xx to all requests which are not appropriately marked as
> being directed at the reference. By describing redirect resources in such
> concrete terms we avoid the maddening ambiguities inherent to the current
> specification.

I agree that the definition of "referential resource" is pretty lame, but I
do think it should be possible to come up with a definition of reference
that encompasses both direct and redirect references.  They really were
intended to be 2 different implementations of the same basic capabilities:
one puts more burden on the client, the other puts more burden on the
server.  They both allow users to construct new collections that include
(loosely speaking) resources that really live someplace else, without making
a physical copy of the resource in the new collection.  How about:

a resource that provides access to the content and properties of another
resource
 
> Section 4.1 - I would like to congratulate the authors for 
> the section on
> why referential integrity is not supported. It is a very well written
> explanation.
 
Thanks.
 
> (Issue #3) Section 4.2 - Problems with the Location header 
> and redirect
> reference responses - I never liked the location header very 
> much because it
> only allows one to return a single URI. Thus if the target 
> resource has
> multiple names by which it is accessible one can only return 
> one of them.
> This isn't terribly robust. As such I would propose that this 
> specification
> provide an extension header to Location which allows for 
> additional URIs to
> be returned.

Good idea.  We just have to make sure that whatever we do doesn't confuse
down-level clients.  The whole point of implementing redirect references
using 302 with Location was that this would allow down-level clients to use
redirect references.
  
> (Issue #4) Section 4.3.1 - The Ref-Integrity header and its 
> support for the
> enforce value - If I send in an "enforce" value for 
> Ref-Integrity, can I
> delete a target which still has references? If I don't send in a
> Ref-Integrity header, can I be sure if I can delete a target 
> which still has
> references? As beautifully explained in the same spec, the 
> answer currently
> is "we don't know." This means that creating a reference without a
> Ref-Integrity header or with the value "enforce" is a blind 
> act. You have no
> idea what the results will be and this very clearly violates 
> the Hardie
> Rule.
> 	The naïve solution would be to just rip out the ref-integrity
> header, however the current language in the spec says that in 
> this case it
> is a gamble as to what one will get. Again, this violates the 
> Hardie Rule.
> 	As such I propose that the Ref-Integrity header MUST be 
> included in
> all requests and that the only defined value be 
> DAV:do-not-enforce. A server
> receiving this header with an unrecognized value MUST fail 
> the request. We
> do not act in the interests of interoperability by allowing 
> referential
> integrity when as this spec so elegantly argues, no one can 
> even define what
> this means. As such defining an "enforce" header does not help
> interoperability and so should not be in the spec. For now let various
> implementers put out RFCs defining their particular implementation along with
> a URL for the Ref-Integrity header that requires their particular
> implementation. Hopefully at some point in the future there will be
> convergence and a standardized URL can be specified.

The design team will need to revisit this issue.  I think we need to take
your argument into account.  However, the consequences of following your
suggestion might be worse than our current situation.

Since we are ruling referential integrity out of scope, it is true that
clients won't know at the time they create a reference whether or how its
integrity will be enforced.  This is bad.  But they can at least examine the
value of the DAV:refintegrity property once the reference has been created
in order to find out.  (Of course, since we are also not defining any
standard values for DAV:refintegrity except DAV:weak, the client is unlikely
to understand the value unless it has been designed to work with the
particular server.)

If we take your suggestion and require clients to specify a particular
integrity policy whenever they create a reference, but decline to define
additional values of Ref-Integrity, we prevent clients from creating
references at all unless they have some private agreement with the server
about values or unless servers do publish values of Ref-Integrity that they
use -- so clients would have to be designed to work with particular servers,
and we get no interoperability out of this spec.

It's not really likely that clients get to choose what policy a server would
use on a per-reference basis in any case.  The server has a policy that it
uses to enforce referential integrity.  It may allow clients to say
"do-not-enforce", but it won't allow clients to choose how to enforce.  So
it seems reasonable to have just 2 values "do-not-enforce" and "enforce".
Clients will find out how referential integrity is being enforced by trial
and error or by examining the value of DAV:refintegrity.

> 
> (Issue #5) Section 4.3.1 - Why DAV:reftarget, reftype and 
> refintegrity are
> stand alone properties - I do not understand why 
> DAV:reftarget, DAV:reftype
> and DAV:refintegrity are all defined as properties. As these 
> values help to
> define the nature of the resource type should they not have 
> their values
> included inside the DAV:reference element in the 
> DAV:resourcetype property?

It does seem more elegant to make them part of a reference element of
DAV:resourcetype.  A practical reason for not doing that is that DASL
doesn't (yet) support searches on structured properties and these seem like
properties we would want to be able to search on.  Actually, I see that we
need to revisit the values of these properties in any case, since they are
all structured at present.

> 
> (Issue #6) Section 4.3.1 - MKREF and the use of bodies - Why 
> doesn't MKREF
> follow the same rules that MKCOL has regarding the possible 
> inclusion of a
> body? They provide for the inclusion of a body but specify that if the
> content-type of the body is not understood then the request MUST be
> rejected. I believe the same rationale that merited the inclusion of this
> language in MKCOL's definition applies here.

I'm not sure what the rationale for allowing MKCOL to have a body was.  My
own view was that it could be used to populate the collection with members
in a single request when the collection was created.  There's nothing
analogous to that for references.

I'd be glad to see scenarios for how a message body for MKREF might be used.

> 
> (Issue #7) Section 4.4.1 - Do references to collections have 
> to provide
> references for all the members of the collection? - If one creates a
> reference to a collection is one also required to create 
> references to all
> the members of that collection? I suspect the answer is yes 
> but this is not
> clear from the specification. For example, the referential 
> resource in the
> example is http://www.svr.com/MyCollection/tuva/ and it points to a
> collection which has a member called history.html. Does this 
> mean that a GET
> on http://www.svr.com/MyCollection/tuva/history.html MUST 
> succeed? In other
> words, that by creating a reference to a collection one is required to
> create references to its entire tree? Again, I suspect the 
> answer is yes
> (this would explain the PROPFIND response format for direct 
> references as
> well as the COPY behavior for direct references) but this 
> requirement is
> never explicitly stated in the spec.

The way I've been thinking about this is that a reference is a resource.
When you create a reference to a collection you only create one new
resource.  You do not create a new referential resource for each member of
the target collection.  But the consequence of creating that one new
resource is that there is a new URI that can be used to access each member
of the target collection.  And yes, a GET on any member of that collection
through the new reference MUST succeed.
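
A rough Python sketch of this model, assuming a direct reference to a
collection is stored as a single entry and member URIs are resolved by
prefix substitution. The target URI http://www.example.org/asia/tuva/ and
the function name are hypothetical; the draft does not name the target in
this way.

```python
# One reference resource: reference URI -> target collection URI.
# The target URI here is a made-up placeholder.
REFERENCES = {
    "http://www.svr.com/MyCollection/tuva/": "http://www.example.org/asia/tuva/",
}


def resolve_through_reference(request_uri, references):
    """Map a request-URI to the URI of the resource it actually reaches.

    No per-member reference is stored. A URI below a reference to a
    collection is mapped onto the corresponding member of the target.
    """
    for ref_uri, target_uri in references.items():
        if request_uri.startswith(ref_uri):
            return target_uri + request_uri[len(ref_uri):]
    return request_uri  # not under any reference: unchanged


reached = resolve_through_reference(
    "http://www.svr.com/MyCollection/tuva/history.html", REFERENCES
)
```

Only one resource was created, yet every member of the target collection
gains a new URI through it.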

You can see us struggling with exactly this sort of example in Section 4.16.

> 
> (Issue #8) Section 4.5.5 - Requiring reftype and reftarget in 
> COPY responses
> - Why must one return the reftype and reftarget information for COPY
> requests? As this information is available from a PROPFIND on the
> destination requiring it to be supplied as part of the COPY 
> response seems
> redundant. Including this information also would most likely 
> break RFC 2518
> clients who would expect an empty body on a fully successful copy.

If you look at Appendix 2, you'll see that we're requiring Ref-Type to come
back in the response for any request made on a reference or through a
reference to its target.  And we're requiring Ref-Target to come back in the
response whenever the target is affected by the request.  The rationale was
that the client should be able to tell for any request what resource was
actually affected by the request.

But you are right that the client can always find this out by looking at
properties on the reference.  And incorporating this information in
responses to requests on collections is very messy -- LOCK is even worse
than COPY.  So we should probably revisit the issue of whether to require
any referencing headers to be included in responses, and make a consistent
decision across all cases, not just for the case of COPY on collections.

> 
> (Issue #9) Section 4.6 - Banning passthrough behavior on 
> DELETE and MOVE - I
> have a direct reference http://foo/bar to http://bar/blaz. I 
> can define a
> MOVE on http://foo/bar to http://icky/bik as meaning that 
> http://bar/blaz is
> to be moved to http://icky/bik. Following the logic discussed 
> in Issue #7 I
> can even properly define how to move any children that 
> http://bar/blaz might
> have. With DELETE I can define a DELETE of http://foo/bar as 
> meaning that
> http://bar/blaz should be deleted. So there doesn't seem to 
> be any technical
> reason to ban passthrough behavior on MOVE and DELETE involving direct
> references.
> 	I suspect the real reason that passthrough behavior is 
> disallowed on
> direct references is because of concerns regarding interactions with
> existing WebDAV clients. In most file systems that support 
> links if one
> deletes or moves a link, only the link is deleted or moved. I 
> suspect the
> authors are concerned that allowing passthrough on DELETE and 
> MOVE would
> mean that if an existing WebDAV client asks for a DELETE or 
> MOVE then the
> resource being pointed at would be deleted/moved as well, which wasn't
> something the non-reference enabled client would have 
> intended. As such I
> think it is completely reasonable to design the protocol such 
> that if a RFC
> 2518 client issues a DELETE or MOVE then only the direct 
> reference and not
> the target is affected. In fact I would propose the design 
> rule that when
> moving/deleting a resource the resource should retain its type unless
> explicit instructions to the contrary are given. That is, if you are a
> direct reference at the source then you should be a direct 
> reference at the
> destination unless explicit instructions to the contrary are provided.
> 	The way to easily implement this rule is to give 
> no-passthrough a
> value and define the default for that value on a method by 
> method basis.
> PROPFIND and COPY, for example, would default to no-passthrough: f.
> MOVE/DELETE would default to no-passthrough: t.
> 	The previous all applies to direct references. Redirect 
> references
> are obviously a different animal since one can only directly 
> manipulate the
> reference and never the target. Thus the only issue is - 
> should a 302 ever
> appear in a COPY/MOVE/DELETE response? Following the law of 
> minimum surprise
> an RFC 2518 client getting a 302 would be very surprised. 
> They ordered that
> all resources be copied, redirect references are resources, 
> therefore they
> should be copied. As such I propose that redirect references on
> COPY/MOVE/DELETE always behave as if no-passthrough equals t, regardless
> of its actual value.

We do think of the default behavior of most methods as No-Passthrough: f.
That is, if you want to operate on the reference rather than its target, you
have to use the No-Passthrough header to make that happen.  If you don't use
the No-Passthrough header, the operation will get applied to the target.

The exceptions are MOVE and DELETE, as you notice, and for redirect
references also COPY, LOCK, and UNLOCK.
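
The defaults just described can be sketched as a small Python lookup. This
is only an illustration of the rules as stated in this message, not the
draft's normative text; the names NEVER_PASSED_THROUGH and request_outcome
and the three outcome labels are assumptions made for the example.

```python
# Methods that always operate on the reference itself, by reference type.
NEVER_PASSED_THROUGH = {
    "direct": {"MOVE", "DELETE"},
    "redirect": {"MOVE", "DELETE", "COPY", "LOCK", "UNLOCK"},
}


def request_outcome(method, ref_type, no_passthrough=False):
    """Say what a request addressed to a reference ends up acting on.

    Returns "reference" when the reference itself is affected, "target"
    when a direct reference passes the request through, and "302" when a
    redirect reference answers with a redirect to its target.
    """
    method = method.upper()
    if no_passthrough or method in NEVER_PASSED_THROUGH[ref_type]:
        return "reference"
    return "target" if ref_type == "direct" else "302"


outcomes = [
    request_outcome("PROPFIND", "direct"),
    request_outcome("DELETE", "direct"),
    request_outcome("GET", "redirect"),
    request_outcome("LOCK", "redirect"),
]
```

Passing no_passthrough=True models a request that carries
No-Passthrough: t, which always addresses the reference.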

The stated rationale for preventing MOVE and DELETE from ever getting passed
through to the target is this: When a reference is added to a collection,
the aim is to make it look as if the target resource were a member of that
collection.  When the reference is removed from that collection, the aim is
to change the membership of that collection.  Membership of the target in
any other collections, either internally or by reference, should not be
affected.  So MOVE and DELETE are different from other methods, which are
intended to affect the resource identified by the request-URI rather than
the collection to which it belongs.

I'm sure it was also in the back of people's minds that they would like to
keep behavior similar to file system behavior to make life easier for
servers based on file systems.

That said, we can certainly revisit the question whether to keep
No-Passthrough: t as the default for MOVE and DELETE, but allow them to be
passed through by saying No-Passthrough: f.

COPY, LOCK, and UNLOCK for redirect references are more complicated cases.
I think there is really no consensus within the design team on how we think
COPY should behave for redirect references.  For LOCK (and therefore for
UNLOCK) on redirect references, we didn't want to allow No-Passthrough: f
because this would result in 302 responses, and so in the case of
collections that contain redirect references would cause the entire LOCK to
fail.

> 
> (Issue #10) Section 4.7.1 - Passthrough locks on direct 
> references - I'm
> thoroughly confused regarding why a passthrough lock doesn't 
> lock both the
> reference and the target. I read the spec a couple of times 
> and I still
> don't get it. For example, imagine an RFC 2518 client issued a lock on
> http://foo/bar which is a direct reference to
> http://bar/blaz. To the RFC 2518 client it appears that they have 
> locked http://foo/bar which will act exactly as http://bar/blaz. This seems
> the much more reasonable behavior.
> I'm sure there is a scenario here I'm missing but whatever it is, I wasn't
> able to discern it from the spec. I apologize if the scenario is staring me
> in the face and I am just failing to see it.

At least one member of the design team agrees with you.  We all agreed that
the most intuitive behavior when LOCKing a reference would be for both the
reference and its target to get locked.  So that would argue for making it
the default behavior, at least for direct references.

However, we are balancing intuitive behavior against various consistency
considerations, and in this case consistency with the way other methods
behave for direct references won out.  The default behavior for most methods
on direct references is to pass the request on to the target, so we decided
to do the same for LOCK.  This at least prevents non-lock-holders from
modifying the target, while there is still a risk that someone might delete
or replace the reference while the lock was in place.  We do point out that
if reference-aware clients are good citizens and check whether the target is
locked before mucking around with a reference, all will be well.

> Section 4.7.3 - Returning reftype and reftarget on LOCKs of redirect
> resources - The text says that a reftype and a reftarget element is 
> returned
> in the result but the actual result does not contain them. I'm not marking
> this as an issue because I assume it is just a minor editorial glitch.

Right.  Good catch.

> Section 4.7.4 - While there is a title for this section, there is no
> example. I'm not marking this as an issue because I assume it is just a
> minor editorial glitch.

Right.  I ran out of time.

> (Issue #11) Section 4.7.5 - LOCK on a Collection that contains a direct
> reference and a redirect reference - First off, the format is not backwards
> compatible with RFC 2518 and thus will break existing RFC 2518 clients. Such
> clients are expecting to get back a prop response, not a multistatus. They
> will ignore the multistatus (since they won't recognize it in this context)
> and thus "see" an empty response body.
> 	It would seem that returning reftype and reftarget information is
> not necessary. A client can retrieve this information from a PROPFIND. This
> is a similar argument to Issue #8. The best counter argument is that not
> returning this information would cause reference enhanced clients to always
> perform a PROPFIND after a LOCK. While this is not necessarily the end of
> the world (LOCKs tend to be rare) it could still be bad. However, if we
> adopt the behavior proposed in Issue #10 then there won't usually even be a
> good reason to perform the PROPFIND. In addition, this behavior is
> consistent with what an RFC 2518 client would expect.

We definitely don't want to break existing clients, so let's see where we
end up once we resolve issues 8 and 10.

Thanks again for (as always) prompt and insightful comments.

--Judy

Judith A. Slein
Xerox Corporation
jslein@crt.xerox.com
(716)422-5169
800 Phillips Road 105/50C
Webster, NY 14580
Received on Monday, 22 February 1999 16:28:42 UTC