Re: DAV-Enabled field (was RE: A case for GETSRC) from Eric Sedlar on 2002-03-05 (w3c-dist-auth@w3.org from January to March 2002)

From: Eric Sedlar <eric.sedlar@oracle.com>
Date: Mon, 4 Mar 2002 17:37:45 -0800
To: <w3c-dist-auth@w3.org>
Message-ID: <004a01c1c3e6$59a7fb40$3bab2382@us.oracle.com>
My comments inlined, with longer response at the end...

----- Original Message -----
From: "Julian Reschke" <julian.reschke@gmx.de>
To: "Eric Sedlar" <Eric.Sedlar@oracle.com>; <w3c-dist-auth@w3.org>
Sent: Monday, March 04, 2002 2:27 PM
Subject: RE: DAV-Enabled field (was RE: A case for GETSRC)


> Eric,
>
> thanks for taking the time to write this long reply. I am putting some
> comments inline...
>
> > From: Eric Sedlar [mailto:Eric.Sedlar@oracle.com]
> > Sent: Monday, March 04, 2002 8:48 PM
> > To: Julian Reschke; CJ Holmes; w3c-dist-auth@w3.org
> > Subject: RE: DAV-Enabled field (was RE: A case for GETSRC)
> >
> >
> > It's not clear to me that the generated output from GET on executables
> > actually is a different resource.  First of all, it doesn't make sense
to
> > think of it as having properties separate from that of the
> > script.  I doubt
>
> It does. For instance, DAV:getcontenttype will almost certainly have a
> different value, so will DAV:getcontentlength.
>
> > if anyone actually doing the URL space mapping proposed would
> > keep around a
> > set of dead properties for the generated output, but the case becomes
> > clearer when looking at the live properties.
>
> Well, I wouldn't expect a server to store dead properties for a "output"
> resource, but this is really an implementation detail.
>
> > If I have a directory full of .ASPs, or .JSP files, and I do a PROPFIND
on
> > that directory, will the server actually execute all of the scripts in
the
> > directory in order, dumping their output to a temp file, to ensure that
> > dav:getcontentlength is correct?  I bet nobody actually does that
>
> A conforming server will either report these properties as empty or as
> non-existant (unless it's actually able to compute these values cheaply).
>

The point is that you will not be able to get <dav:getcontentlength> and
<dav:getetag> that have the correct values, which the spec pretty clearly
states should be returned if it is a DAV-enabled resource.

> > (it would
> > be "considered harmful" like PROPFIND depth infinity), and if you did it
> > would be stupid.  I haven't checked this, but I bet that most
> > servers return
> > the length of the underlying script as the length of the
> > generated output on
>
> IMHO this would be a bug.
>

Whatever.  They return something other than the correct value.

> > a PROPFIND depth 1.  The same difficulties apply to <get-etag> and many
> > other live properties.  The reason this doesn't matter in practice is
that
> > returning the wrong etag or wrong content length for generated output is
> > irrelevant, since in the general case it cannot be cached, and it works
>
> Why do you say that generated output can not be cached? A big part of the
> HTTP spec treats this very problem.

I say that in the GENERAL case it cannot be cached.  For example, a script
that returns the value of a row in a database table (say the current
inventory total on Amazon.com of Harry Potter books) cannot be cached, since
the return value is not simply a function of the input arguments in the
request method.  Those generated output entities that are strictly functions
of the input arguments can be cached, by using the techniques in HTTP/1.1.

>
> > functionally just as if the resource was changed before the
> > subsequent GET,
> > so it's ok if the value of <getcontentlength> is different in a
> > PROPFIND on
> > its parent and the actual content length returned when you do a GET.
> >
> > I think that the better model is if you think of the GET method more
like
> > the "OPEN" action on Windows or Macintosh, which means to execute
> > this file
> > if executable, or else display the contents of the document.  GET
> > is really
> > like a simple case of the POST method (which was originally how it was
> > conceived).  To illustrate this point, let's say that a particular
server
> > said that POST on a non-executable resource (like a static file)
> > would dump
> > its contents (like a GET).  Now we can clearly see GET on a script as a
> > simple case of POST.
>
> GET doesn't dump "the content" of a resource. GET returns an entity which
is
> one of many possible representations of the resource, which may depend on
> non-URL based information in the headers (in which case these headers
should
> be listed in the "Vary" header).
>
> > Julian's argument below gets at the heart of the problem--is each set of
> > possible output generated by a script a different resource?  First of
all,
> > there are many different things that can affect the output of a
resource,
> > and most of them are not in the URL (which is the only thing that can be
> > used to differentiate two resources).  While the URL arguments in
> > the query
> > string are one set of arguments to the script, clearly information in
the
> > header may also be used (even in a GET method).  For example, the
>
> Yes.
>
> > authentication headers may be used to choose whose shopping cart will be
> > displayed, or in POST, the form values appear in headers.  If, as Julian
> > suggests: GET /foo/my.jsp?arg1=val1  is a different resource than GET
> > /foo/my.jsp?arg1=val2, then POST /foo/my.jsp with a header Arg1:
>
> GET ... is not a resource, it's one representation of a resource.
>
> URLs (as they are URIs as defined in RFC2396) identify resources. If they
> differ, this is a good indication that the resource they identify differs
as
> well.
>
> > val1 should
> > be different than POST /foo/my.jsp with header Arg1: val2.  This
obviously
> > makes no sense.
>
> I don't understand the analogy to POST. POST is very different from GET in
> it's semantics (it's not just a distinction in marshalling).
>

Why is POST different in semantics from GET?  Since we are confining
ourselves to discussing generated output, I see them as just having
different ways of marshalling arguments.

> > So, here are my arguments for why each possible output generated
> > by a script
> > should not be a separate resource:
>
> I didn't say that. What I'm saying is that the set of outputs (that can be
> generated by a GET on a URL) is a different resource than the source
> resource.
>
> > * they may not have separate URLs, since not all of the arguments
> > that cause
> > different output may be in the URL string--they may be in headers, or
may
> > depend on other server state
>
> That's fine. So all of them are the same abstract resource.
>
> > * it is impractical to manage dead properties for each possible output
>
> That's no problem. If you don't want to, don't.
>
> > * it is computationally difficult to return dead properties (such as
> > dav:getcontentlength) for each possible output
>
> Again fine. Don't return them.
>
> > Therefore, I conclude that generated output is not a resource, and
>
> You can't conclude that, because the generated output *has* a URL, so it
> *must* be a representation of an abstract resource.
>
> > especially not a dav-enabled resource.  I would recommend that server
> > implementers NOT return script output as dav enabled.
>
> That's fine. You don't have to.
>
> > Now, on to the practical argument, which will demonstrate that
> > dav:source is
> > a lot less usable than GETSRC/Translate:
> >
> > let's say that I have a directory full of scripts and static
> > files.  I want
> > to manage them from a authoring point of view.  If I want to implement
> > dav:source, I need to create some virtual URL space that
> > corresponds to the
> > physical URL space where the actual script source is stored.  I
>
> Actually, I'd say that you should create mappings for the outputs, not for
> the sources, but this is just a matter of taste, I guess.
>

That option is covered in case #2.

> > can make the
> > virtual URL space return either the generated output, or the source, at
> > least from an implementation point of view.  For example:
> >
> > Contents of /mydir:
> >
> >    foo.jsp
> >    bar.html
> >    bar.gif
> >
> > In case 1, let's say I make /scripts/src/<real-path> a virtual URL space
> > that returns the source of any executables (for DAV clients), so where
GET
> > /mydir/foo.jsp will return the generated output,
> > /scripts/src/mydir/foo.jsp
> > will return the source.  dav:source on /mydir/foo.jsp will point to
> > /scripts/src/mydir/foo.jsp.  The DAV client doesn't actually want to
mess
> > with /mydir/foo.jsp since it is generated, has bogus live property
values,
> > doesn't respond to things like PUT, and is generally not very
> > helpful.  Now
> > I'm forced into a separate tree, which is certainly annoying from a
>
> That's something the client should do for you.
>

It can't do this easily, because it doesn't know the general rule of mapping
the generated output URL to the source, it just knows the answer on a URL by
URL basis.

> > useability point of view, and extremely annoying when doing things like
> > copying with Web Folders, since when I drag & drop my app from the local
> > filesystem to the server, my .jsp files move to another place, and when
I
> > copy the application back down to my local PC, the .jsps were
> > moved (even if
> > I was smart enough to realize where they magically went to).  So,
> > this is a
> > bad solution.
>
> Why don't you just edit *all* files in the source directory and leave the
> output resources alone?
>
See case #2, below, again.

> > In case 2, let's say I make /script/output/<real-path> a virtual URL
space
> > that returns the generated output for the executables.  Now the DAV
client
> > is happy, since the server space looks like the space on the client.
> > Unfortunately, all of the <href> tags in any HTML output are
> > screwed up vs.
> > the local copy (e.g if bar.html wanted a link to foo.jsp, or vice
versa),
> > since neither a relative nor an absolute URL will work, and the
> > URLs needed
> > are server implementation-dependent.
> >

Julian, you have suggested that case #2 is the preferred solution (creating
the virtual URL space for the generated resources), but you haven't
addressed the linking problem between the static & generated content.  You
have not given a complete suggestion.

> > The reality of the world is that people want to be able to move their
> > content creation environment back and forth between local filesystems
and
> > DAV servers.  Any usage of dav:source would break that abstraction,
since
> > local filesystems aren't capable of implementing virtual URL spaces.  I
>
> Again, that's not true. You can author the *source* URL space just like
> always. All you need the DAV:source property for is to discover where
those
> files sit.
>

Are you suggesting that when dragging & dropping some Web Folders down from
a server, the client should examine the dav:source properties, and copy down
the entities returned by GET on those URLs those represent, rather than the
entities returned by the original URLs?  While this could be done, it
certainly breaks backwards compatability with almost all existing DAV
clients.  Plus, it could result in multiple copies of the source after the
download.

> > think these are the practical reasons why nobody has implemented
> > dav:source
> > in the past 3 years.
>
> The DAV:source property is underspecified in the spec, that's the reason
> it's not implemented. That doesn't make the concept itself bad.

What part of dav:source is underspecified?  I think it is pretty clear from
the spec.  It just doesn't solve the problem in a usable way.

>
> > So, it is clear that:
> >
> > * generated output (from stuff like .jsps, .asps, cgi) does not make
sense
> > as a DAV-enabled resource, since the only methods it is likely to
> > respond to
> > intelligently is GET.
>
> I don't have any problem with this. If output resources aren't WEBDAV
> enabled, you just lose the ability to use DAV:source to discover their
> sources.
>
> > * separating the URL space of generated output and source causes
usability
> > problems, making life awkward either for the DAV client, or for the HTML
> > linking
>
> I don't agree.
>

That's because you aren't considering backwards compatability.

> > I think the argument for a GETSRC method becomes clear if you mentally
> > rename GET to LIMITED-VERSION-OF-POST.
>
> GET and POST are fundamentelly different things.
>

Why are they so fundamentally different?

> > Reponses to a couple of other points in more recent emails:
> >
> >   Alan Babich says:
> >
> > "Is the source the same entity as its output?" Julian and I agree
> > that it is
> > not. RFC2616 section 9.3 says it is not.
> >
> >   Response:  the issue is not whether or not the source and the output
are
> > the same "entity", but the same DAV resource.  Entity and resource are
>
> ...the same HTTP resource...
>
> > different things.  There are three general categories of things that can
>
> Yes.
>
> > cause the output of a generated script to vary:  URL information, e.g.
the
> > query string; header information; or server-state (like in a database).
> > Clearly the URL isn't sufficient to differentiate between all the kinds
of
> > server-generated output, so the various entities that can be
> > generated by a
> > script cannot be mapped as separate resources, since they won't have
> > separate URLs.
>
> If they don't have separate URLs, they aren't different abstract
resources.
> In the case of the shopping cart mentioned earlier, the resource
identified
> by the URL is "the shopping cart of the currently logged in user". It's no
> problem if it's different for different users.
>
> >   Roy Fielding says:
> >
> > Any server that allows clients to vary GET responses based on the
> > Translate
> > header field MUST include "Vary: Translate" in every response to the GET
> > method in order to remain compliant with HTTP/1.1.
> >
> >   Response:  Actually, RFC2616 says: "An HTTP/1.1 server SHOULD include
a
> > Vary header field with any cacheable response that is subject to
> > server-driven negotiation."  So, clearly, Microsoft is in compliance
with
> > the HTTP specification since this is not a MUST requirement.
>
> SHOULD (as defined in RFC2119):
>
> "This word, or the adjective "RECOMMENDED", mean that there
>    may exist valid reasons in particular circumstances to ignore a
>    particular item, but the full implications must be understood and
>    carefully weighed before choosing a different course."
>
> To still claim conformance, one would have to specify these valid reasons.
> Can you think of any?
>

I'm sure MSFT could come up with some valid backwards-compatability reasons
or something.  The point is that an HTTP/1.1 client cannot expect anything
with SHOULD to always be the case

> > Clearly it is
> > acceptable to use header values to vary the content of generated state,
>
> Yes, but by doing to you state that all entities that you return are
> representations of the same resource. Are they?
>
> > whether it be the Translate header or the Authentication header,
> > and whether
> > or not the Vary header is supplied is a completely orthogonal issue.
>
> Yes. But you SHOULD indicate it in the "Vary" header.
>

Great.  We can agree that the Vary header is a side issue and not relevant
for the purposes of this discussion.

> > Here's another way to look at the categorization of response entities
for
> > executables:
> >
> > 1) the source
> > 2) generated output that does not vary based on input arguments
> > from client
> > 3) generated output varying with input arguments from client encoded in
> > query string
> > 4) generated output varying with input arguments from client encoded in
> > request headers
> > 5) generated output varying with server-side state (like information in
a
> > database)
> >
> > I don't see why case #2 and case #3 (in Julian's example) should
> > be awarded
> > its own status as a "real dav resource" when cases #4 and #5 are not.
>
> Because resources are *identified* by their URLs. If the URL is the same,
> it's *by definition* the same resource.
>

Now you are arguing circularly.  You say that because the source and
generated output are different resources, they should have different URLs.
Now you are saying that because they have different URLs, they are different
resources.

> > I'd like to see someone propose a scheme for implementing dav:source
that
> > solves the usability problem when mapping to the local filesystem, if
they
> > believe in dav:source.
>
> First of all, I strongly object to this line of argument. Even *if* you
can
> demonstrate that DAV:source as it is doesn't work, this doesn't mean that
> this makes the Translate header an acceptable solution.
>
> This having said, I don't see any problem having two separate URL spaces,
> and to only author the source space. The output space *may* be DAV
enabled,
> and if it is, it should use DAV:source or something similar we'll come up
> with to indicate where the source is.
>

My line of argument is that having a separate URL space for generated output
from the underlying executable doesn't work well because:

  * the generated "resources" won't have the correct values for many live
properties (other than dav:source!!!), and will probably (your words) not
support dead properties
  * the generated "resources" will only respond to the GET method, not other
WebDAV methods
  * creating a virtual URL space for generated outputs (as you suggest)
makes the <href> linking unworkable when doing the 1-1 sitecopy semantics
between local filesystems & servers that is common practice today, thus
making the separate URL space idea unusable if backwards compatability is
maintained.
  * just because the response body is different based on the Translate:
header isn't really a different argument than having a different response
body by varying the authentication header.  By this argument, each possible
response body that can be altered by varying a header is a different
resource, which clearly does not make sense.

dav:source is an implementation suggestion that requires separate URL
spaces.  GETSRC/Translate are various ways of handling the generated output
as the same URL.  My argument is against the separate URL space, not the
dav:source header as specified by RFC2518, which most people seem to agree
is broken (you believe it is underspecified, and I believe it is the wrong
architecture).

Julian, fundamentally your response is self-contradictory.  At one point,
you say:

1) "the set of outputs (that can be generated by a GET on a URL) is a
different resource than the source resource".

but later you say that:

2) "Because resources are *identified* by their URLs. If the URL is the
same, it's *by definition* the same resource".

If a set of generated outputs is generated differently by varying parameters
in the query string, statement #1 would imply that the entire set of valid
outputs is one resource.  Statement #2 implies that this set is a bunch of
different resources, since their URL is different.  You are using a
different number of resources per executable at different points to reply to
different points in my argument.  I'm saying there is one resource, used for
the script & generated output.  At first you are proposing 2, and later you
propose n, where n is the number of possible combinations of legal input
querystring values.  Which is it?

Julian's other argument is that you might want to have existing clients be
able to edit the source just by pointing to some URL.  Well, without
understanding dav:source they won't know what the URL is to use for the next
GET method.  So, clearly clients will need to be upgraded either to handle
Translate:/GETSRC or to handle the dav:source header.  Existing clients
won't cut it, so I don't by this reasoning, anyway.

--Eric
Received on Monday, 4 March 2002 22:43:49 UTC