RE: Issue: URI_URL, proposed changes from Clemm, Geoff on 2002-07-14 (w3c-dist-auth@w3.org from July to September 2002)

From: Clemm, Geoff <gclemm@rational.com>
Date: Sun, 14 Jul 2002 11:45:26 -0400
To: "'Webdav WG (E-mail)'" <w3c-dist-auth@w3c.org>
Message-ID: <3906C56A7BD1F54593344C05BD1374B1077619AC@SUS-MA1IT01>
On the off chance that the details of this particular thread 
are probably becoming less interesting to anyone other than 
Julian and myself (:-), I'll just summarize my position:

- We should adhere to the RFC-2396 definitions of URI, URL, and URN,
since no problems with those definitions have been identified.  I do
not consider the fact that many people don't know/use those
definitions to be a problem, since the "commonly held" definitions
incorrectly consider URL and URN as partitioning the URI space
(i.e. problems have been identified with those "commonly held"
definitions).  We could try to make up yet another definition (that
is different from both RFC-2396 and the "commonly held" definitions)
but I see no point in doing so.

- We should use "URL" or "URN" in 2518bis whenever we are defining a URI
that has the semantic properties of a URL or a URN (as defined by 2396).

Specific responses to Julian's most recent comments embedded below.

Cheers,
Geoff


   From: Julian Reschke [mailto:julian.reschke@gmx.de]

   >    >    > URL-references).  We also have some URIs which name a
resource
   >    >    > (in particular, lock tokens).  These should be called URNs.
   >    >
   >    >    I disagree. URLs are just a subset of URIs.
   >    >
   >    > We don't disagree there.
   >
   >    RFC2518 has defined that lock tokens are identified by URIs with no
   >    restriction.
   >
   > That is incorrect.  The first line of section 6.4 (that defines a lock
   > token URI) states: "The opaquelocktoken URI scheme is designed to be
   > unique across all resources for all time."  That makes an
   > opaquelocktoken URI a URN, and disallows using non-URN URIs.

   Wrong. Lock tokens are defined in section 6.3.

It is true that I quoted the wrong section.  But if you read section
6.3, it says the same thing, i.e.: "Lock token URIs MUST be unique
across all resources for all time."  That requires all lock token
URIs to satisfy the definition of a URN, which defines each lock 
token URI to be a URN.

   Section 6.4 defines a particular URI scheme that *can* be used to
   identify locks. It doesn't say that you need to use it to represent
   lock tokens :"This specification provides a lock token URI scheme
   called opaquelocktoken that meets the uniqueness
   requirements. However resources are free to return any URI scheme
   so long as it meets the uniqueness requirements."

And "meeting the uniqueness requirements" makes it a URN.

   >    An attempt to restrict lock tokens to specific URI types (are
   >    you trying to do that?)
   >
   > No, I have (several times :-) emphasized that whether or not a URI
   > is a URL or URN does not require that the URI belong to a specific
   > URI type.  A URI in any scheme can be a URL, if it can be used to
   > apply a method to the resource identified by that URI.  A URI in any
   > scheme can be a URN if it satisfies the semantics requirement of a URN.

   I don't have any problem with that definition, however I'm sure
   that 99% of the readers will have the same definitions of URL/URN
   in mind when they read the spec, thus I fear unnecessary confusion.

Then all we have to do is explicitly repeat this definition in the
terminology section of the revised version of 2518, supplemented
with a reference to 2396.

   > This is an excellent example of why we should carefully use URI,
   > URL, and URN in the spec.  The statement in section 13.10 of the spec
   > is potentially ambiguous.  In particular, it states: "there is
   > typically only one destination (dst) of the link, which is the URI
   > where the unprocessed source of the resource may be accessed."  I
   > interpreted the "typically" to just be a qualifier of "only one".  It
   > sounds like you interpreted "typically" to also be a qualifier of the
   > phrase following the comma (while I interpreted it as a definition of
   > "link").  I believe my interpretation is correct, because without
   > that, the source link is basically useless ... "here's is a URI of the
   > source, but you can't use it for anything of interest".

   I have stopped thinking about the current wording for the source property
a
   long time ago. It doesn't make sense. It defines a resource property
   (DAV:source), which contains the name of the resource itself as an
element
   (DAV:src). It just doesn't make sense (one of the reasons I raised the
issue
   in the issues document).

I don't see that this needs to be a complicated issue.  The DAV:src
field allows you to indicate that multiple resources depend
on this link, to give the client an indication of all the other
resources that would be affected by authoring the DAV:dst of that
link.  For Web content management, an example of this would be a
"template" resource from which a large number of other resources are
derived.  In this particular example, the number of other resources is
so large that I think it unlikely that a server would ever enumerate
them in the DAV:src field.  Because I think this will commonly be the
case, I'd be happy to deprecate the DAV:src field in the revision of
2518.  I personally would also like to rename the DAV:dst element to
just be a DAV:href element (for interoperation with the
DAV:expand-property REPORT), but that depends on whether any clients
are really using DAV:link and therefore expects DAV:dst (current
indications are that there are no such clients, so it would be
safe to make this change).

   >    Example: I might want to have a source link from
   > 	   http://greenbytes.de/tech/webdav/rfc3253.html
   >    to
   > 	   urn:ietf:rfc:3253
   >    Do you say this is wrong?
   >
   > Yes, it is wrong if urn:ietf:rfc:3253 is not a URL (i.e. cannot be
   > used to apply methods such as GET and PUT to the resource that it
   > identifies).  But of course, urn:ietf:rfc:3253 easily could be a URL,
   > in which case it would be fine.

   I disagree. "urn:ietf:rfc:3253" is a URN for a resource. Actually,
   it is *the* URN endorsed by the IETF to identify the versioning
   spec.  If I want to signal that a document depends on a source
   which is a RFC, it completely makes sense to refer to it this way.

I believe the central value of the DAV:source field is that it allows
the client to perform methods on the resource identified by the
DAV:dst field to cause changes to the content of the resource.  This
value is not provided if the URI in the DAV:dst field is not a URL.
Having a URI that is not a URL would not provide a mechanism for
the client to effectively author the resource, which leads me to 
conclude that the spec should require the DAV:dst field to be a URL.

   BTW: Can you define what "applicable to methods such as GET and
   PUT" means?  Is "mailto:" a URL scheme?

It probably would be valuable to identify a minimal set of methods
that SHOULD be applicable to the DAV:dst URL, in order to provide for
interoperation.  Perhaps: "Either GET/PUT or PROPFIND/PROPPATCH SHOULD
be supported on the DAV:dst URL".

   >    >  "there is typically only one destination (dst) of the link,
   >    >   which is the URI where the unprocessed source of the resource
   >    >   may be accessed."

   ... it probably makes little sense to argue about this paragraph
   unless we can find somebody who can explain what was intended. The
   DAV:source property *as defined* doesn't make sense (see above).

Do you just mean the lack of utility of the DAV:src element?  That's 
easily resolved by deprecating the DAV:src element.  Then we just
rewrite that section as:
"The destination (dst) of the link is the URL where an unprocessed
source of the resource may be accessed.  Typically there is only one
DAV:dst element, but there could be multiple if the resource content
is derived from multiple inputs."

   > Whatever spec is defining the semantics of where that URI appears.  If
   > it is required to have the semantics of a URL, it is a URL.  If it is
   > required to have the semantics of a URN, it is a URN.  This gives the
   > client a vital piece of information about how it can use that URI.

   I agree that it makes sense to give the client this piece of
   information. I don't agree that just saying "URL" or "URN" will
   achieve this goal.

Think of it as a "public service" (:-).  If a large number of people
are using the terms URL and URN in confusing ways, we'll be a shining
example of useful and consistent usage of those terms.  The W3C "URI
interest group" can then focus on debating details of urn: scheme
registration (:-).

   > As Larry indicated, the concepts of a URL and URN have fuzzy
   > boundaries.  If something in theory would allow you to locate
   > something, but you don't have the locator software installed on your
   > machine, is it still a URL?  (I say, yes).  If something in theory

   OK, can you give an example for a URN-typed URI which in *theory* can't
be
   located by some dictionary mechanism? It's not globally unique by magic
--
   it usually depends on naming authorities (DNS, phone numbers),
dictionaries
   (urn scheme, ISBN) and the like.

Exactly.  As indicated earlier, something doesn't become a URN or a URL
by some syntactic magic.  It becomes one because the implemented client
and server mechanisms make it one.

   > could be located, but no software has ever been written to locate it
   > using that URI scheme, is it a URL?  (I say, no).  But most cases are
   > clearer.  The point is, what can a client assume.  Can it assume that
   > it should be able to apply methods to the resource through that URI?

   But what is this knowledge good for? Unless the client happens to
   *know* the URI scheme, it won't be able to do anything useful with
   the URI (other than use it as identifier), even if it knows it to
   be a "locator".

Of course a particular client may not have the software to access that
URL, or the server that should be responsible for applying methods to
that resource may not actually do so.  What we are trying to avoid is
having implementors misunderstand the semantics of a particular field.  This
gives the client and server writers essential guidance wrt what kind
of URI they should allow in a particular field.  With the example above,
if we decide that the DAV:dst field should contain a URL, then a server
should not store a urn: URI that they know is not commonly supported
by clients and servers to apply methods to that resource.

   > Neither RFC-2396 nor I have ever maintained that the URI space is
   > *partitioned* into "locatable" (URL) and "naming" (URN).  In each of
   > my messages in this thread, I have emphasized the opposite, i.e. that
   > a URI can be a URL, a URN, both, or neither (and quoted from 2396
   > where it explicitly states that these concepts overlap).

   Yes. However, as the ID mentioned states, people *think* that there
   is a partitioning. That's a problem that RFC2396bis should
   solve. We should stay out of this business.

I disagree.  If some people are using a common term in a confusing and
logically inconsistent way, and if there is a standard (that was
reviewed and accepted by the IETF), then we should just emphasize in
the revision of 2518 that we are using the standardized definitions
(and repeat those definitions in our spec).  An analogy is the term
"or".  In popular culture it is ambiguously used to mean either
"exclusive or" or "inclusive or".  This doesn't mean we can no longer
use the term "or" in formal documents.  It just means that we should
emphasize that we are using the standard formal meaning, not the informal
meaning.

   > I am using "locate" as it is defined for URL in 2396, and "name" as
   > it is defined for URN is 2396.  These are well defined terms in this
   > context (i.e. "locate" means "can be used to apply a method to the
   > resource identified by the URI", and "name" means "provides a globally
   > unique name for the resource identified by the URI").

   Again: what's the difference between "naming" (URN) and "identifying"
(URI).

Again: I am just using the terms as they are used in 2396.  All UR*I*s
"identify" a resource.  All UR*N*s "name" a resource.  All UR*L*s "locate"
a resource.  And of course, these are not the universal definitions
of these terms.  They are just how they are used in 2396.

   Can you give an example of a URI that identifies, but not "names" a
   resource?

Just repeating the example from the previous messsage,
http://www.webdav.org/deltav/index.html identifies a resource,
but does not name ("provide a globally unique name for") that resource,
because I have used MOVE to cause that URI to identify different
resources at different times.

   >    Resources never are *available* at a URI.
   >
   > That will come as a surprise to the authors (and reviewers) of
   > RFC 2518 (:-).  In the section we have been discussing, and which
   > I have been quoting) states: "A resource may be made available
   > through more than one URI."

   Another thing we may want to fix.

If it ain't broke, don't fix it (and in this case, if it ain't broke,
don't break it :-).

   >    *Representations* of resources may be available. I think you are
   >    saying that any URI that can be used to locate a representation of
   >    a resource is a URL.
   >
   > I am just using terminology in the same way it is used in
   > 2518.  One can certainly make the statement more detailed in
   > the way you have above, but I think the 2518 usage is fine.

   Probably not. People may get the impression that WebDAV somehow
magicallly
   allows to do something HTTP can't (and doesn't attempt to). HTTP's GET
   retrieves *representations* of resources. WebDAV doesn't change that.

But WebDAV isn't just talking about GET ... it also talking about
PROPFIND and others.  Saying that a resource is available says that
there are methods that can be applied to that URI that will not fail.
In particular, some resources that are "available" do not support
GET, so we are not talking about the "availability of the representation".

   For instance, GET/PROPFIND followed by PUT/PROPPATCH is not
   guaranteed to do the same thing as COPY (even when not taking
   versioning/acl into account).

"Availability" says nothing about the semantics of successive operations.
It just says that a method can be applied to that URI and succeed.

   >    That may be technically correct (so *I* can basically turn every
   >    URN into a URL by implementing a resolver for the scheme).
   >
   > And if such a resolver becomes so widespread that a client can
   > reasonably count on it being available, it is a URL.

   Who is the authority I can ask whether a particular URI scheme can
   "now" be considered a URL? Actually, if Microsoft decides not to
   support "gopher:" anymore -- does the gopher URI loose it's
   URL-property?

Is indicated earlier, this is not a formally demonstrable property
of a URI (such as its syntax).  It is a qualitative property, and there
will be grey areas.  And yes, if a URI scheme is no longer widely
implemented, the URIs in that scheme are no longer URLs.  And similarly,
if a urn: scheme is widely implemented to not provide globally unique
naming, then a URI in that urn: scheme would not be a URN.  The concepts
of URN and URI are worthless if they just refer to some syntactic
notion (e.g. "has a scheme name that begins with the characters 'u',
'r', and 'n').

   >    If you agree that a HTTP URL can be used to identify a lock token,
   >    thus it *is* both a URN and a URL, could you please give an example
   >    of a HTTP URL that doesn't share these properties (thus *isn't* a
   >    URN)?
   >
   > Sure.  http://www.webdav.org/deltav/protocol/index.html.  I personally
   > have associated that URL with different resources over time, because
   > of various bad interactions with locking (at least, I think that was
   > the problem).  If that resource had a URN property, such as a
   > DAV:version-history, you would have been able to notice as much.
   > The same is true for most (probably all) resources on
   > http://www.webdav.org (although there are some links to some http
   > URNs in the http://www.ietf.org/ namespace.

   Well, that's partly a question of how you define a resource. Note
   that even though a HTTP resource is implemented inside a specific
   HTTP server, that doesn't necessarily mean that it changes just
   because internally the implementation of the resource changed.

That is of course true.  Just like the concepts of a URN and URL, the
concept of "object identity" has fuzzy boundaries (in all domains,
not just IETF protocols).  As we introduce more advanced protocol
funtionality (versioning, binding), these boundaries start to get
firmed up though.  In particular, those protocol extensions make it
clear that MOVE associates an existing resource (with a given identity)
with a different URI, while COPY creates a new resource (whose identity
is different from the resource that was the source of the COPY).

Cheers,
Geoff
Received on Sunday, 14 July 2002 11:46:09 UTC