- From: Yaron Goland <yarong@microsoft.com>
- Date: Wed, 30 Oct 1996 23:20:12 -0800
- To: "'Daniel W. Connolly'" <connolly@beach.w3.org>, "'Jim Whitehead'" <ejw@rome.ICS.UCI.EDU>
- Cc: "'w3c-dist-auth@w3.org'" <w3c-dist-auth@w3.org>
-----Original Message-----
From: Daniel W. Connolly [SMTP:connolly@beach.w3.org]
Sent: Saturday, October 26, 1996 12:16 PM
To: Jim Whitehead
Cc: w3c-dist-auth@w3.org
Subject: Re: Prelim. DAV spec.

Great stuff... comments as I read it.

In message <9610251749.aa07659@paris.ics.uci.edu>, Jim Whitehead writes:
><draft-ietf-webdav-v1-spec-00> October 25, 1996
>1.2 Terminology
>
>Unless otherwise noted below, the use of terminology in this document is
>consistent with the definitions of terms given in [HTTP11].
>
>check in
> A Check In is a declaration that the client no longer intends to edit a
> representation(s).

Of "entity" and "representation": pick one and stick with it. But in this case, I think you meant resource, i.e. the gizmo associated with a URL.

No we mean representation(s), meaning one or more than one representations of a resource. Where representation is defined as in HTTP 1.1.

representation
An entity included with a response that is subject to content negotiation, as described in section 12. There may exist multiple representations associated with a particular response status.

The key here is that we are not just talking about an entity, we are talking about a content negotiated entity. I have removed all references to entity and replaced them with representation.

Of edit and update, pick one. So I think it should say "... no longer intends to update a resource."

Update is a loaded term. I prefer edit. I have modified the draft to use edit.

>history
> The history of a URI is a list of all the versions of the URI along
> with related information.

"Versions of a URI"? URIs are immutable. They're just strings, kinda like large integers. I suggest:

The history of a URI U* is a list of URIs Ui where each Ui refers to some version of the U* resource, plus some attributes related to each version.

This goes into the same bag as Destroy which is now defined as:

destroy
To destroy a resource is to request that the resource be permanently removed from storage.
This differs from delete in that some versioning systems handle delete as a request to no longer make the specified resource editable.

I have also changed history to:

history
The history of a resource is a list of all the versions of the resource along with related information.

Of course each version is also a resource. However I don't want to define version as it would be nothing but an academic exercise.

>merge
> A merge is the process whereby a resource represented by one URI is
> combined with a resource represented by a second URI. Merges can occur
> at the client or the server.

@@hmmm...

merge
A merge is the process whereby information from one or more resources is used to produce a new resource that represents the content of the component resources. Merges can occur at the client or the server.

BTW, I know you can nit pick the merge definition. Please don't. I really don't want to spend an hour finding just the right adjective.

>no-modify lock
> A no-modify lock prevents a locked resource from being altered until
> all no-modify locks are released.

s/altered/updated/. Be consistent.

Fair enough, all the "alter" have been changed to "edit".

>notify request
> A notify request instructs the recipient to send update information
> regarding the progress of a request.

If you use update in the sense that I'm suggesting, don't use it for this purpose as well. I suggest: s/update/status/.

notify request
A notify request instructs the recipient to send information regarding the progress of a request.

>server diff
> A server diff is a mechanism whereby the server compares two or more
> representations, and sends the client a message containing a summary of
> the differences between the entities.

s/representations/entities/.

server diff
A server diff is a mechanism whereby the server compares two or more representations and sends the client a message containing the differences.

>2. Attributes
>
>It is often necessary to record meta data about a resource.
>The natural place to store such meta data is in HTTP headers however this
>presents a problem. HTTP headers, as defined in [HTTP11], are used either for
>content description or communication. While communication headers can be
>retrieved with the Option method, content description headers can not.

Content description headers can be accessed with HEAD.

Yes, but one can not specify which content headers will be transmitted. Were we to use content headers for attributes and were we to use HEAD to discover them, then we would have to transmit all the content headers with each HEAD request. As the number of attributes will be very large and as the size of each attribute is likely to also be large, this is not acceptable. I have changed the paragraph to read:

It is often necessary to record meta data about a resource. The natural place to store such meta data is in HTTP headers however this presents a problem. HTTP headers, as defined in [HTTP11], are used either for content description or communication. Neither of these header types were designed for remote setting and on-demand retrieval.

> In addition
>neither of these headers were designed for remote setting and on-demand
>retrieval.

I disagree. Jigsaw supports setting and on-demand retrieval of entity headers, for example.

How? Does it allow me to selectively specify which header I want? Will it let me perform a method on a header?

>What is needed is a third type of header, a header which provides
>information about the resource's nature, not its content or transmission
>state. This type of header will be referred to as an attribute header.

I suggest you merge the concept of attribute header into the existing HTTP entity header concept.

That is unfortunately not practical. The semi-practical solution to the attributes problem is to define a new entity-header "attributes" whose value is a URL.
That URL then points to a file which contains a whole list of attributes, each of which contains a value, which could potentially also be a URI. If I wanted to find out what the name of the author of a document is I would send a HEAD request, then do a GET on the attribute URI. I would then read the response and see if the Author attribute is present. If so I would either read the associated value or, if the value is a URI, do a GET on the URI to get the Author information. This means that at a minimum every request for attribute information would involve at least two HTTP requests and would require the downloading of a potentially very large file. In all other respects this new attribute means is as powerful as the one currently proposed. The problem is the two requests. That is too high an overhead to pay.

>2.1 Attribute Syntax
>
>To support requests for attributes the definition of a URI must be altered
>as follows:
>
>URI = ( absoluteURI | relativeURI ) ["<" Attribute ">"] ["#" fragment]
>Attribute = field-name ; See section 4.2 of [HTTP11]

Interesting... can attributes be links as well? In a draft I'm working on:

ABSOLUTELY!!! This has been one of Jim's basic requirements since day one. He wants N-Ary links. We have a small disagreement on syntax but we are in firm agreement regarding semantics.

Describing and Linking Web Resources
W3C Note @@Date
This version: (unpublished) $Id: NOTE-link.html,v 1.5 1996/10/20 20:49:49 connolly Exp $
Latest version: http://www.w3.org/pub/WWW/Architecture/NOTE-link

I use the notation R(A,B) to refer to any number of similar structures:

link-relationship(source-anchor-address, target-anchor-address)
e.g. stylesheet(http://www.w3.org/, http://www.w3.org/houststyle.css)

header-field-name(URL, header-field-value)
e.g. Expires(http://www.w3.org/TR/WD-xxx, "Wed Oct 19 15:34:54 1996")

attribute-name(URL, attribute-value)
e.g. hair-color(http://www.w3.org/People/Connolly/, "brown")

two-place-predicate(arg1, arg2)
e.g.
member(http://www.w3.org/People/Connolly/, http://www.w3.org/People/)

This URI notation combines R and A into one. So I might denote the above examples by:

http://www.w3.org/<stylesheet> = http://www.w3.org/houststyle.css
http://www.w3.org/TR/WD-xxx<Expires> = "Wed Oct 19 15:34:54 1996"
http://www.w3.org/People/Connolly/<hair-color> = "brown"
http://www.w3.org/People/Connolly/<member> = http://www.w3.org/People/

The last example reminds me of an issue: Header fields and link relationship are usually 1-1, but not always: the member relationship is obviously many-many. I think HTTP header fields are multiply valued; e.g.

    Accept: x,y

is the same as

    Accept: x
    Accept: y

I can't think of any multiply valued entity headers though.

This brings us to N-Ary Links. When one does a request on an attribute header the response is in the entity-body. As such the value can be ANYTHING. In the case of N-Ary links the response would be something like:

Random-Headers1
#(URI Random-HeadersN)

I say "like" because Jim and I still need to talk. The idea is that you would have a bunch of URIs, each of which would explain its particular relationship. If all the URIs have the same relationship then Random-Headers1 can be used to describe it. If one wants to specify special aspects of the N-Ary relationship then one can use Random-HeadersN. It is totally generic. Furthermore I added a paragraph explaining why I put in the pseudo-hierarchy:

By convention an attribute request which ends in a "_" and which does not resolve to a specific attribute name SHOULD be treated as a request for a list of all attributes in that hierarchy.

BTW, you should come out with a list of all the types of links you think we will need. That way we can include them with the document.

>Thus attributes can now be used in any context where a normal URI would be
>used.

Nifty. Hmmm... I wonder if this could be used as a general URL syntactic composition mechanism. That issue was raised at the DOMC workshop.
(see http://www.w3.org/pub/WWW/OOP/).

I read the page but I'm not sure what you mean. So I'll bite... "What is a Syntactic Composition Mechanism?"

>When a resource is copied, moved, or otherwise manipulated, its attributes
>are equally effected.

s/effected/affected/.

Does this mean I have to give back my college degree? Oh wait, I'm an engineer, we aren't supposed to be able to write well. =)

>In order to prevent name space collisions both headers and header prefixes
>should be registered with a central authority.

:-{ We hate this. But I don't see any alternative in this case, at least right now.

Oh there is an alternative but you won't like it. GUIDs!!!!!!!! Now, doesn't IANA sound downright warm and cuddly? =)

>2.2 Standard Attributes
>
>AttributeDirectory
> The attribute "AttributeDirectory" returns a list of all attribute
> headers on a resource.

Subject to access control? Seems like it could be sensitive information.

But of course. All URLs are subject to access controls and attribute-headers are retrieved using URLs.

> To retrieve a list of attribute headers
> associated with the URL http:\\foo\bar one would send a GET request

s,\\foo\bar,//foo/bar, you Windows freak! ;-)

Oh NO!!! You don't understand. I come from a long history of UNIX! I guess this means, yes.. I'm afraid it does. I HAVE BEEN ASSIMILATED!!!!!!!

> with a request-URI of \bar<AttributeDirectory>, where Host would equal
> foo.
>Link
> This attribute header contains information about resources that are
> associated with this resource. A SiteMap representation SHOULD be
> available.
> [TBD: Review the SiteMap format and figure out tag formats to define
> source links.]

Hmmm... this answers my question above: in this design, links are subordinate to attributes, rather than being at the same level. That rubs me the wrong way. For example, to get the stylesheet of a document at U, I'd rather GET U<stylesheet> than GET U<link> and parse the results, looking for a stylesheet link.
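
For illustration only: the U<stylesheet> form discussed here follows the section 2.1 grammar quoted earlier in this message (URI, then an optional "<" Attribute ">", then an optional "#" fragment). A minimal sketch of how a recipient might split such a request-URI is given below. The function name and the AttributeURI type are hypothetical, the Attribute is assumed to be an HTTP/1.1 token as the draft's "field-name" reference suggests, and nothing here comes from the draft itself.

```python
import re
from typing import NamedTuple, Optional

# Assumption: Attribute = field-name = HTTP/1.1 token, i.e. one or more
# characters that are not CTLs, space, or the HTTP "tspecials".
_TOKEN = r'[^\x00-\x20\x7f()<>@,;:\\"/\[\]?={}]+'

# base ["<" Attribute ">"] ["#" fragment], per the quoted section 2.1 grammar.
_ATTR_URI = re.compile(
    r'(?P<base>[^<>#]*)(?:<(?P<attr>' + _TOKEN + r')>)?(?:#(?P<frag>.*))?'
)


class AttributeURI(NamedTuple):
    """Hypothetical container for the three parts of an attribute URI."""
    base: str
    attribute: Optional[str]
    fragment: Optional[str]


def split_attribute_uri(uri: str) -> AttributeURI:
    """Split 'base<Attribute>#fragment' into its parts.

    Raises ValueError when the text does not fit the sketched grammar,
    for example when the attribute name contains a space.
    """
    match = _ATTR_URI.fullmatch(uri)
    if match is None:
        raise ValueError("not a valid attribute URI: %r" % uri)
    return AttributeURI(match.group("base"), match.group("attr"), match.group("frag"))


if __name__ == "__main__":
    print(split_attribute_uri("/bar<AttributeDirectory>"))
    print(split_attribute_uri("http://www.w3.org/<stylesheet>"))
```

Under this reading, a plain URI with no "<...>" part simply yields an attribute of None, so existing URIs would keep their meaning.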
What stops you from using GET U<stylesheet>? It can either return a URL which points to the style sheet or it can return the stylesheet itself (depending upon how the attribute is defined). Attributes are links, links are attributes.

>Source
> The exact contents of the resource as stored, without any processing by
> the server (e.g. without processing of server-side includes).

This should return the address of the source, if you ask me. It's just a typed link, ala stylesheet.

I agree:

Source
The URI of the resource as stored, without any processing by the server (e.g. without processing of server-side includes).

>3. Lock/Unlock
>
>Locks come in three types, write, read, and no-modify. Logically a write

s/types,/types:/

Fixed.

>lock and a read lock can co-exist on a single resource. This means that one
>set of clients can alter the resource and another set are the only ones
>allowed to read it. This may seem silly but is actually used in Orange book
>compliant environments. A write lock and a no-modify lock can not be used

Make "This may seem..." parenthetical. And do you have a citation for the orange book?

Why should it be made parenthetical? Also the best reference I currently have for the Orange book is

[ORANGE] DoD 5200.28-STD, "Department of Defense Trusted Computer System Evaluation Criteria", December, 1985.

Do government standards have authors?

>Locks are assigned to a subset of the representations available for a
>resource. If the lock only applies to a single representation then the lock
>may be further restricted to only a particular range of the representation.

This implies that there are "representations" that both the client and the server can refer to, but not by name. Seems to me that in the same way you consider attributes addressable, "representations" should be considered addressable. In fact, it seems to me that in the HTTP spec, representations _are_ considered addressable.
For example, if /foo is available in GIF and PNG, and the server ever needs to export that fact to the client (e.g. for caching purposes), it should make up names for them (for use in the Alternates: header, for example). By convention, the server would probably pick /foo.gif and /foo.png.

The bottom line: I suggest

A lock is assigned to a set of resources; one of the resources in the set is the _primary_ resource; the others are _alternates_. All resources in the set, including the primary, are _variants_ of the primary. In the case of zero alternates, the lock may be further restricted to only a particular range of the resource.

Using the above example, a lock could be assigned to the set: {/foo, /foo.gif, and /foo.png} with /foo as the representative. (Note that /foo.gif could be used as the representative as well. It doesn't matter.) This lock couldn't be a byte-range lock. But one could assign a lock to the set {/foo.gif} and restrict that lock to a byterange.

It's sensible (aka well-defined) for a client to ask for a byterange lock on {/foo}, but depending on how a server manages the content, it might result in a runtime error.

We already provide this type of semantics but instead of using the exported names we use the headers. I understand your point regarding exported headers but this isn't common practice. If it becomes common practice then the clients are free to discover the exported headers and place their locks on those. Until then we need to provide the content-negotiation mechanism.

>Locks may be taken out either in exclusive or shared mode.

Add "shared mode" to the terminology section.

SIR, YES SIR!

shared mode
Shared mode modifies a lock request such that the lock may be shared between multiple requestors.

> In shared mode
>anyone with proper access may take out a lock. In exclusive mode only the
>user(s)

What term is used in the HTTP spec for thingies that participate in authentication? They're called principals in a lot of the security literature.
But "user" is OK, as long as it's used consistently.

I don't like using the term user because sometimes the request may come from an automated process.

It seems like you use client sometimes and user others. The notion of client is definitely different from the notion of user/principal: consider the case of the AOL proxy: it's one client making requests on behalf of zillions of users/principals.

Sigh... I will change user and client to principal if you provide a definition of principal for the terminology section.

> who originally took out the lock may alter the lock. However a new
>user can be added to an exclusive lock if the addition is performed by one
>of the locks current owners.

The word "current" gives me the willies. Unless you're in the context of discussing a particular timeframe, strike it.

I changed the language to "However a new user can be added to an exclusive lock if the holder of the lock token performs the addition."

>If an entire resource is write locked and a lock owner deletes the resource
>then the write lock remains. So long as the write lock remains the URI can
>not be reused.

"reused" in what sense? I think you can be more precise by saying "can not be updated".

I changed it from reused to edited.

>Locks also have time outs associated with them. If no time out value is
>associated with a lock then the lock will never time out. Otherwise the lock
>will expire if a number of seconds equal to the time out value passes
>without the resource being accessed by a lock owner.

seconds by whose clock? We're in a distributed system here. If you stick to the time-frame-of-reference of the server, I think you'll be safe. The implication is that clients can't exactly determine when a lock times out. But I think that's just a fact of life.

Yup.

>Finally, locks may be taken out for multiple clients in a single request.

That seems fuzzy to me: a request by definition involves exactly one client. How do the other clients get involved?
Or do you mean multiple users?

>The Lock_Owners field allows for tokens to be used to identify multiple
>clients who are considered owners of the lock.

Ah... looks like you mean users. Be consistent.

As I said, I will fix client/user if you provide a definition of principal. =)

>Locks will be implemented using POST.

Nifty.

Finally, someone who agrees with this position. Please go talk to Roy Fielding for me.

>4. Name Space Manipulation
>
>4.1 Copy
>
>A copy performs a byte-for-byte duplication of a resource, making it
>available at both the original and new location in the URI namespace. There
>is no guarantee that the result of a GET on the URL of the resource copy
>will be identical to a GET on the original resource. For example, copying a
>script to a new location will often remove it from its intended environment,
>and cause it to either not work, or produce erroneous output.

It seems to me that a smart server could detect this sort of thing. If so, should it (1) copy it anyway, cuz that's in the spirit of this spec, or (2) signal an error to the client that requested the copy, in order to prevent this sort of frotzed situation. I think a smart server SHOULD do (2) if it can, and that we should give that hint in the spec.

I also think we should hint at whether a COPY is intended to be a deep copy or a shallow copy, or explicitly say that it may vary from resource to resource. In short, I think the intent is to create a new resource that acts like the old one, but doesn't share state with it (i.e. is updated independently).

Well (2) should be handled in the return message. I hereby nominate you to write that part of the spec, all in favor say "AYE". "AYE!" Well that takes care of that. As for deep copy, shallow copy, please be more specific. I am only familiar with that term in reference to recursive copying.

Gotta go... more comments when I get a chance to read the rest.

Dan

Promises promises...
=)

Thanks for the great comments, even if it did take me two hours to make it through. Please also get me the rest of your comments ASAP. I would like them to be in the next release.

Yaron
Received on Thursday, 31 October 1996 02:20:12 UTC