Re: now://example.org/car (was lack of consensus on httpRange-14) from Roy T. Fielding on 2002-10-11 (www-tag@w3.org from October 2002)

From: Roy T. Fielding <fielding@apache.org>
Date: Fri, 11 Oct 2002 00:27:20 -0700
To: "David Orchard" <dorchard@bea.com>
Cc: <www-tag@w3.org>
Message-Id: <E09CCAC6-DCEA-11D6-A593-000393753936@apache.org>

> I apologize in advance for being clearly dense on this subject.  It
> certainly is helpful for me to understand this area a little better.  And I
> think we're getting close to the areas of my misunderstanding.

Oh, bugger -- I guess I have to respond to this one because you obviously
spent a long time writing it while I was writing the last one saying that
I wouldn't respond any further.  The chances of me ever having time to
work on the architecture document are approaching nil.

> I know you have said it plenty of times, that identification and access are
> separate, but for the longest time http: things were access oriented.  Heck
> that's why the web is the set of network available information items.

That simply does not follow.  For the longest time, people have driven cars.
That doesn't mean that cars cease to exist outside the act of driving.
There are thousands of people who make their entire living based on
designing cars, photographing cars, talking about cars, etc.  Consider
the VIN to be a reasonable identifier for a car.  Is the fact that a VIN
identifies a car in any way restrictive of the ways that people exchange
or use VIN numbers?  No.  When you use a VIN number for the purpose of
arranging financing and registration, the car is not accessed in any way.
Likewise, when a car is parted-out and found by the police, they use the
VIN to perform a registration look-up on the owner.  The identifier just
identifies the car, but it can be used to identify the owner, the leasing
agreement, the manufacturer, the creation date, and even a police record.

I don't understand this notion of access-oriented.  "http" URIs are used
as cache keys more often than they are used to access the resource.  It
is therefore a fact of life that "http" URI are more often used for the
purpose of identification than for access.  Likewise, authors use them
as identifiers when they put them in links -- they have no way of knowing
what mechanism will actually be invoked by the browser when it is told
to traverse the link.  Is the entire mechanism access-oriented?  No,
only the part where the furthest downstream client cannot respond to
the request on its own and makes an HTTP request on the origin server
is an access.
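The cache-key point can be made concrete with a toy model. This is a minimal illustration, not how any particular cache works: it assumes a hypothetical `URICache` with no freshness or validation rules, and a stand-in `fake_origin` function in place of a real HTTP request. The URI is only ever used as a dictionary key; the origin is contacted once, on the first miss.

```python
from typing import Callable, Dict


class URICache:
    """Hypothetical shared cache keyed by URI (simplified: no expiry or validation)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._store: Dict[str, bytes] = {}
        self._fetch = fetch_from_origin
        self.origin_requests = 0  # how many times the origin was actually accessed

    def get(self, uri: str) -> bytes:
        # Here the URI is used purely as an identifier: a lookup key.
        if uri in self._store:
            return self._store[uri]
        # Only on a miss is the URI used for access to the origin.
        self.origin_requests += 1
        body = self._fetch(uri)
        self._store[uri] = body
        return body


def fake_origin(uri: str) -> bytes:
    # Stand-in for an HTTP request to the origin server (hypothetical).
    return f"representation of {uri}".encode()


cache = URICache(fake_origin)
for _ in range(5):
    cache.get("http://example.org/car")
print(cache.origin_requests)  # 1
```

Five uses of the URI, one access: the other four are identification only.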

> I really do think that somebody seeing an http: thingy gets a strong
> implication that they ought to be able to have access to a representation.

That's a separate issue.  When I see any URI, including URN, I think I
ought to have access to a representation.  Not because I'm special and
deserve it, but because I might find it useful.  That's one reason why
HTTP is a universal proxying protocol -- it can attempt to obtain a
representation of any resource, regardless of URI scheme, using the
proxy mechanism.  URN, "now", "tdb", or whatever someone comes along
with next doesn't change that; it merely makes it more expensive.
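As a rough sketch of the proxy mechanism, the hypothetical `build_proxy_request` below builds an HTTP/1.1 request whose request-target is the full URI, which is the absolute form HTTP/1.1 requires when a request is made to a proxy. Nothing in the message depends on the scheme; whether a given proxy can actually obtain a representation for a `urn:` URI is up to that proxy. Sending an empty `Host` value for a URI with no authority follows the later HTTP/1.1 messaging rules and is an assumption of this example.

```python
from urllib.parse import urlsplit


def build_proxy_request(uri: str) -> bytes:
    # Authority is empty for URIs without one, e.g. URNs.
    authority = urlsplit(uri).netloc
    lines = [
        f"GET {uri} HTTP/1.1",  # absolute-form request-target, sent to the proxy
        f"Host: {authority}",   # empty Host value when the URI has no authority
        "Accept: */*",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


for uri in ("http://example.org/car", "urn:isbn:0451450523"):
    print(build_proxy_request(uri).decode("ascii").splitlines()[0])
```

The same request shape serves both URIs; only the cost of deploying something behind the proxy that knows how to answer differs.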

> And I did read through almost all of the uri mailing list last night and
> today.  Some of the fun things I found: 1) the message from Marc Andreessen
> about wanting to wrap up URLs as back end and out of user-site mechanisms so
> he could start working on URNs; 2) the discussions about whether URLs should
> be prefixed with url: in plain text like email to indicate the presence of a
> link.  Reminds me of our 4-year-old XLink and HTML discussions on link
> identification.  3) I was briefly confused about the discussions around SOAP
> that happened in 1994.

That's good -- I encourage it of anyone who thinks that they have some new
argument about this subject.  I am not kidding when I say that this
discussion has come up many, many, many times and the result is always
the same.

> Exploring this a bit further, I'll quote a part of RFC 2396.  I know that you
> know it off by heart so I'm not trying to be cheeky, but it helps me in
> expressing my position.  "A URI can be further classified as a locator, a
> name, or both.  The term "Uniform Resource Locator" (URL) refers to the
> subset of URI that identify resources via a representation of their primary
> access mechanism (e.g., their network "location"), rather than identifying
> the resource by name or by some other attribute(s) of that resource."
>
> When I take an http: URI and then classify it as a locator, name or both, it
> turns out it is a locator because http: is the primary access mechanism,
> e.g. network location, for the URI.

Is that because of the scheme name, or because you happen to control
the naming authority and have placed a server at that location that
is able to respond to access requests?

> So saying that an http: uri does not imply
> access in any way just blows my mind.

All locators are also identifiers.  Whether something is a name or a
locator only impacts its ability to be directly used to locate the
resource, not its ability to identify the resource.

> The only reason the xmlns identifier
> doesn't imply access is because the namespace spec said so, and still people
> keep on thinking or implying it does.  Namespaces had to specifically
> mention the lack of access, and lots of people said it over and over again,
> and still it didn't stick.

What do you mean it didn't stick?  Do you have examples of XML namespace
parsers that access the namespace URI every time parsing takes place?
If not, then clearly it is being used as an identifier, and therefore
this argument is specious.  Yes, there is a small community of people
that populate the W3C mailing lists who refuse to accept NO for an
answer, but I don't think that changes the design one iota.  Whether
or not there is a representation available for a resource is a decision
of the people providing representations, not an aspect of the scheme.
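One way to check the namespace point against real software is Python's standard `xml.etree.ElementTree`, which folds the namespace URI into the element name (`{uri}local`) and never fetches it. The namespace URI below deliberately uses the reserved `.invalid` domain, so it cannot be resolved at all; parsing still succeeds because the URI is only compared as a string. The document and VIN are made-up sample data.

```python
import xml.etree.ElementTree as ET

# Sample document; the namespace URI is on a domain that can never resolve.
doc = """<car xmlns="http://example.invalid/ns/vehicle">
  <vin>1HGCM82633A004352</vin>
</car>"""

root = ET.fromstring(doc)
# The namespace URI becomes part of the element's name, nothing more.
print(root.tag)  # {http://example.invalid/ns/vehicle}car

vin = root.find("{http://example.invalid/ns/vehicle}vin")
print(vin.text)  # 1HGCM82633A004352
```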

> I want to re-emphasize that I agree totally that http: URIs do not REQUIRE
> access.  I just don't seem to understand why you don't think it implies
> access.

Because I implement HTTP client and server software and know from personal
experience that most traversals and use of http URI do not result in
access of any kind.  What you probably meant to say is that there exists
some implication that, given an http URI for identification, there is a
separate belief among users that some means of obtaining a representation
of the resource identified by that URI exists.  Damn right!  All important
resources should have URI, and all URI are dereferenceable (to varying
degrees of cost), so therefore it makes sense that people believe ANY URI
to be capable of producing a representation.  The only difference between
an http URI and a new scheme is the expense of deploying the dereference
mechanism, and people who don't understand that simply haven't worked
with HTTP proxies.

> Cuz it sure does to me and apparently a whole bunch of other
> people.  And having every spec that uses URIs have to say whether or not
> that access should be done (like namespaces), when it could be (at least
> from my pov) more easily expressed in having different schemes, seems to
> place an undue burden on developers and software.

How is having different schemes going to change the definition of a
protocol element?  Are you suggesting that these protocol elements that
are defined to be scheme-agnostic should instead be scheme-specific?
That maybe the specification would be less wordy if, instead of saying
that the element is used for identification (not dereference), that it
rather list the schemes that can or cannot be used within that element?
Or maybe you would prefer that the element not be a URI at all, since it
should be clear by now that any URI scheme is inherently dereferenceable?

> On to the first assertion.  I'm still confused about why you don't see the
> utility in changing behaviour.  I think this is my central misunderstanding
> of your position, because I guess I'm too uninformed to figure why we
> wouldn't want software to do something different.

Because that would change the definition of the protocol element, and
result in severely sucky performance.  I don't know of any example where
the decision of whether the protocol/language is using the URI for
identification or access should be subject to the URI scheme.  That's
just plain nuts -- I can't build reliable systems that way.

> I picked up the idea
> because people started talking about how great the world was going to be
> when things that were only for identifiers could now be used for access, and
> the gloating over those poor folks who were using URNs commenced.

No, no, no.  What they are saying is that, given an important resource
and an identifier for that resource in the form of a URI, people will
eventually want to use it for access to representations of the resource.
Why?  Because people are curious.  This has nothing to do with changing
the software to perform some special access magic, nor is it prevented
by creation of an obscure URI scheme.  If it is important, then somebody
will provide a means of dereferencing that URI, either by proxy or by
URI rewriting, such as was described in 1995:

http://www.apache.org/~fielding/uri/drafts/draft-ietf-uri-roy-urn-urc-00.txt
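URI rewriting of this kind can be sketched as a function that wraps any URI inside an http URI pointing at a resolver. This is a minimal illustration, not the mechanism from the draft cited above: the resolver host `resolver.example`, its `?uri=` query layout, and the name `rewrite_for_resolver` are all hypothetical. The original URI stays the identifier; the rewritten one is just one route to a representation.

```python
from urllib.parse import quote, unquote

# Hypothetical resolver base URI; not taken from the 1995 draft.
RESOLVER = "http://resolver.example/resolve?uri="


def rewrite_for_resolver(uri: str) -> str:
    """Map a URI of any scheme to an http URI on a (hypothetical) resolver."""
    # safe="" percent-encodes ':' and '/' so the whole URI fits in one query value.
    return RESOLVER + quote(uri, safe="")


print(rewrite_for_resolver("urn:isbn:0451450523"))
# http://resolver.example/resolve?uri=urn%3Aisbn%3A0451450523
```

Because the rewrite is reversible (`unquote` recovers the original string), nothing about the identifier itself has to change when someone decides to offer representations for it.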

> So perhaps you can explain to me why software wouldn't need to change when a
> URI went from having no representations available to having representations
> available.

I have an infinite number of http URLs at my disposal that already have
that property.  Do you see any software changing because of it?

My definition of resource is a discontinuous, multi-valued mapping
function of representations over time.  The presence or absence
of representations at any point along the curve(s) does not alter the
function -- only the result of evaluating the function.  A URI is an
identifier for such a function, so saying that a URI scheme implies
access is equivalent to saying that I can't talk about the Pythagorean
theorem without performing multiplication.  It just isn't true.
I don't care how many times the squares and square roots happen
in geometry classrooms around the world, it is still necessary for the
teacher's sanity to be able to identify the function separately from
using the function.
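The definition of resource above can be modelled loosely in code: a resource as a function from time to a (possibly empty) set of representations, and a URI as a key that identifies that function. This is a simplified sketch, assuming integer time steps and plain media-type strings in place of real representations; `Resource`, `car_page` and `registry` are hypothetical names. Evaluating the function at different times gives different results, while the URI-to-function mapping never changes.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

Representation = str  # simplification: a representation is just a media-type string


@dataclass(frozen=True)
class Resource:
    """A resource as a mapping from time to a (possibly empty) set of representations."""
    representations_at: Callable[[int], FrozenSet[Representation]]


def car_page(t: int) -> FrozenSet[Representation]:
    # Hypothetical: no representations before t=10, then HTML and JSON variants.
    if t < 10:
        return frozenset()
    return frozenset({"text/html", "application/json"})


# The URI identifies the function itself, not any particular result of evaluating it.
registry = {"http://example.org/car": Resource(car_page)}

r = registry["http://example.org/car"]
print(len(r.representations_at(5)), len(r.representations_at(20)))  # 0 2
```

Going from zero representations to two changed only the result of evaluation; the entry in `registry`, like the URI, stayed exactly the same.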

Can we please stick a fork in this issue?



Cheers,

Roy T. Fielding, Chief Scientist, Day Software
                  (roy.fielding@day.com) <http://www.day.com/>

                  Co-founder, The Apache Software Foundation
                  (fielding@apache.org)  <http://www.apache.org/>

Meet me at ApacheCon 2002, Nov. 18-21, Las Vegas <http://www.apachecon.com/>
Received on Friday, 11 October 2002 03:27:29 UTC