Re: Clarifying what a URL identifies (Four Uses of a URL) from Tim Berners-Lee on 2003-01-24 (www-tag@w3.org from January 2003)

From: Tim Berners-Lee <timbl@w3.org>
Date: Thu, 23 Jan 2003 22:48:51 -0500
To: "Roy T. Fielding" <fielding@apache.org>
Cc: Sandro Hawke <sandro@w3.org>, www-tag@w3.org
Message-Id: <C048C9BD-2F4E-11D7-B288-000393914268@w3.org>
>
>>> A resource, thus defined, has access mechanisms whereby you can
>>> retrieve and update representations.  This formalism is complete,
>>> consistent, and highly robust in practice, underlying the
>>> construction of the most successful information system in history.
>>
>> In fairness, I think this only applies to HTTP 1.1, not the entire 
>> web.
>
> No.  Go look at the code and see how it handles all URIs.  HTTP is
> an extension of that interface across the Internet.
>

Ok, here is one hook to a difference in the model you and I have,
Roy.  You point out that the API in libwww basically provides
the functionality of HTTP, and at the same time gives access
to FTP and so on.  You use this as an illustration of a theory that
all URIs have the same interface as HTTP, that HTTP
extends over the web the interface of libwww in a quite generic
way, while other protocols only support some of the features.
Hence the ability of HTTP proxies to provide access to FTP and
Gopher.

Which is logical. However, it does not address the range of all
URI schemes, and of course as HTTP basically doesn't play with
the fragid, it doesn't involve that at all.

It is a reasonable bit of software design for libwww to generalize
where generalization can be done, and it is not surprising that
HTTP, as a later design, "embraces and extends"  FTP.
And HTTP is in fact a good model for the Web, and the category of
URIs for which this model holds (http, https, ftp, gopher)
are important, because they form a web of network information
objects.  (I'm happy to call that the Web, and exclude "Web" Services,
by the way. We can call them "Internet Services" if you like.
I think this so far is what you call the REST model.)


But other URIs don't fall into that scheme.  mailto: URIs
identify mailboxes, and to say that you can make an HTTP proxy
represent a mailbox is a kludge.  A web site can have various
pages which give various sorts of information related to a
mailbox, but conceptually a mailbox is a delivery point
not an information object.  You could map HTTP's POST to it
but not HTTP's GET.
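
A rough sketch of the distinction drawn above, in Python. The table and
function names are assumptions made for illustration and are not taken
from libwww or any other library. The HTTP method names are borrowed
only as labels for "retrieve" and "deliver to", and the empty entry for
telnet is an assumption based on it being a session end point.

```python
from urllib.parse import urlsplit

# Schemes the message groups together as forming a web of network
# information objects.
INFORMATION_OBJECT_SCHEMES = {"http", "https", "ftp", "gopher"}

# Hypothetical capability table (illustrative, not exhaustive).
SCHEME_OPERATIONS = {
    "http": {"GET", "POST"},
    "https": {"GET", "POST"},
    "ftp": {"GET"},
    "gopher": {"GET"},
    "mailto": {"POST"},  # a delivery point: you can send to it, not fetch it
    "telnet": set(),     # an interactive session end point: neither applies
}


def supports(uri: str, method: str) -> bool:
    """Return True if the URI's scheme maps onto the given operation."""
    scheme = urlsplit(uri).scheme.lower()
    return method in SCHEME_OPERATIONS.get(scheme, set())


def is_information_object(uri: str) -> bool:
    """Return True if the URI's scheme is in the information-object group."""
    return urlsplit(uri).scheme.lower() in INFORMATION_OBJECT_SCHEMES
```

With this table, a mailto: URI accepts the "deliver to" operation but
not the "retrieve" one, which is the asymmetry described above.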

Similarly, telnet: URIs are end points for interactive sessions.
You can connect to one by a Java object in a web page, but
that doesn't mean they are like web pages any more than
a flower pressed in a book is a piece of paper.

So that is I think one way in which our formalizations of URIs
differ.

[..expletives deleted...]

>> Working *perfectly* for HTTP is not evidence that it works anywhere
>> else.   (other people have cited the parable of the blind men and the
>> elephant.)   And the success of the Web is of course due to many, many
>> factors.
>
> I have seen no evidence that it doesn't work, anywhere.  Some SW folks
> *claim* that if you allow the RDF producer to make ambiguous statements
> about both representations and the resource using only the URI as the
> target subject, then it results in ambiguity.  Well, of course that
> would cause ambiguity, which is why they are NEVER THE SAME THING on
> the Web itself.  The answer is: DON'T DO THAT.

RDF people do not in my experience use a URI to represent
both the resource and a representation. Well, I don't.
(Cwm has, for example, a relationship -- a built-in function --
log:semantics, which relates a resource to what you get from
retrieving a representation and parsing it, and another,
log:contents, which relates a resource to the bits of any
representation of it.)

If you assumed that is what people are doing, it may be because you
are mapping their words onto your concepts, not theirs. You may
forget that for me, for example, the car and the picture of the car are
distinct. It is the confusion between those which causes a problem.
Now, you don't write RDF so I am not sure how I discuss this with you.
I've written a lot in http://www.w3.org/DesignIssues/HTTP-URI
specifically about this and I don't know where to start.

I think you must agree that once my program accesses the web page
which we will say is a picture of a car, then it has a representation of
a picture on bits. It has therefore a concept of the picture.
The picture itself has important properties such as who owns it
and made it, and what its copyright information is.
You say that that is information about the representation, but I would
point out that a picture can have many representations, in JPG, PNG
and GIF at various levels of resolution. They share owner,
copyright, date of creation, creator, focal length, genre, exposure,
orientation, and so on, because they are all what I would call
representations of the same picture, the same conceptual work.
This commonality is very strong, and points to the value of
being able to identify the thing they have in common: the picture.
And normally, when I want to make a hypertext link to that
it is to the picture, not to a representation, that I want to make
the link.  So the argument that we are "just talking about
representations" doesn't fit the bill. It doesn't meet the
requirements to be able to talk about the picture as a conceptual
work.
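
The relationship between one picture and its several representations
can be sketched as a small data model. This is only an illustration:
the class names, the fields, the example.org URI and the "Alice" values
are hypothetical, and the negotiation rule is a deliberately simple
stand-in for real HTTP content negotiation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Representation:
    media_type: str  # e.g. "image/jpeg"
    width: int       # pixel width, standing in for "level of resolution"


@dataclass
class ConceptualWork:
    # Properties shared by every representation live on the work itself.
    uri: str
    owner: str
    creator: str
    representations: List[Representation] = field(default_factory=list)

    def negotiate(self, accept: List[str]) -> Optional[Representation]:
        """Return the first representation matching the client's
        preference-ordered list of media types, or None if none match."""
        for media_type in accept:
            for rep in self.representations:
                if rep.media_type == media_type:
                    return rep
        return None


# Hypothetical example data.
picture = ConceptualWork(
    uri="http://example.org/car.jpg",
    owner="Alice",
    creator="Alice",
    representations=[
        Representation("image/jpeg", 1600),
        Representation("image/png", 800),
        Representation("image/gif", 200),
    ],
)
png = picture.negotiate(["image/png", "image/jpeg"])
```

The link target here is `picture`, not any one `Representation`; owner
and creator are stated once, however many formats are served.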

Now, you say the owner of the HTTP URL can declare that it actually
identifies the car. I say that messes things up.  Suppose the owner
does that -- suppose they mark up the JPEG with a comment field
indicating that.  Now my client program has no ID for the picture.

Now here's the rub. When the URI was for the picture, then I
can indirectly identify the car with it, as "x, where <car.jpg> is a
picture of x".
In N3 that looks like "That which has picture car.jpg":

  [ has  :picture  <car.jpg> ].

That's cool.  It's what we do all the time to identify things, for
example people by SSN.  "The car whose picture hangs above your
mother's fireplace" and stuff.  KR systems thrive on it.  What doesn't
work is if we say that <car.jpg> actually is an identifier for the car.
Because "the picture of the car" doesn't identify the picture - it
identifies any picture of the car.

  [ is :picture of <car.jpg> ]

You can write it but it doesn't work.  It's not a bug in RDF.  It is a
fundamental problem with the URI system we are assuming: you don't
have an identifier for the conceptual work.

An example you give often is a robot.  To an RDF system, a robot
which can be driven by the control panel at <robot.html>
can be formally referred to in just the same way,
as  [ :controlPanel <robot.html> ].  (That which has control panel
<robot.html>.)
This works.
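
The two directions of lookup can be compared in a minimal Python
sketch. Everything here is an assumption made for illustration: the
triples are plain tuples rather than real RDF, the "_:" node names are
invented, and the two small stores are hypothetical data chosen to show
the asymmetry.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)

# Case 1: <car.jpg> and <robot.html> name the picture and the page.
# The car and the robot are unnamed nodes reached through them.
URI_NAMES_PICTURE: Set[Triple] = {
    ("_:car", "picture", "car.jpg"),
    ("_:robot", "controlPanel", "robot.html"),
}

# Case 2: <car.jpg> is declared to name the car itself. The car may
# have several pictures, and none of them has an identifier.
URI_NAMES_CAR: Set[Triple] = {
    ("car.jpg", "picture", "_:photo_front"),
    ("car.jpg", "picture", "_:photo_side"),
}


def that_which_has(prop: str, obj: str, triples: Set[Triple]) -> Set[str]:
    """Subjects s with (s, prop, obj): the pattern [ has :prop <obj> ]."""
    return {s for (s, p, o) in triples if p == prop and o == obj}


def values_of(subject: str, prop: str, triples: Set[Triple]) -> Set[str]:
    """Objects o with (subject, prop, o): the pattern
    [ is :prop of <subject> ]."""
    return {o for (s, p, o) in triples if s == subject and p == prop}
```

In the first store, `that_which_has("picture", "car.jpg", ...)` picks
out the single car node. In the second, `values_of("car.jpg",
"picture", ...)` returns every picture of the car, so no one picture is
singled out.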

Let me summarize:

- Web software needs to be able to express things about conceptual works.
  They are a big part of the web system and of our society.
- When you identify a conceptual work, you can retrieve representations
  and you can indirectly identify abstract things.
- If you say that the URI identifies an abstract thing, you cannot refer
  to the conceptual work.

Of course, to use the same identifier for two different things in a
formal system is a contradiction of the term "identifier".  The power
of URIs is that they are context-independent identifiers.

So I say that necessarily HTTP URIs directly formally identify
conceptual works.
Indirectly they are used to identify consortia and cars and things.
Mailto: URIs do not identify conceptual works. They are not part of the
REST model.

I think that you will find that the REST model is not harmed in any way
by introducing an extra concept of the conceptual work between
"representations" and what you used to call the resource.
I think you will find it has a nice consistency and solidity.

You asked for examples, by the way. I could give you some.
Some are linked from the N3 primer, and one which uses
concepts of resources and representations explicitly
to make rules for a trusted system where information is trusted
only when it is derived from a representation which is signed
is at http://www.w3.org/2000/10/swap/test/crypto




>> Once you step outside the formalism, not only do you want to know what
>> kind of thing a specific Resource is, but you notice that everyone is
>> using each URI to identify several distinct things.   So the
>> fundamental premise of 2396 breaks as soon as you step outside the
>> formalism.
>
> Nothing in 2396 breaks because of that. 2396 defines the syntax for
> identification.  It doesn't define how URIs are used.  It doesn't even
> define how they are used on the Web.  What it does define is that they
> are identifiers and they identify resources and they do so using a
> uniform syntax.  Resources in RFC 2396 are not even limited to
> information objects, since they are specifically intended to include
> the naming of physical things and do so quite well.  The scope of the
> REST model, for example, is more restricted than the scope of 2396.
>

Yes.  HTTP is basically a REST protocol.

> Regardless, how people use URIs (how a URI can be used to identify
> something indirectly, including those things other than the resource)
> is an entirely separate issue from the identity of a resource.  If the
> Semantic Web is only interested in identity, then it doesn't matter
> how many other ways that the URI is being used.

?

>  Likewise, regardless
> of how many new terms are invented to redefine the holy grail, there
> is no way to stop people on the Web from using a URI (any URI,
> regardless of scheme) in ways that the originator did not intend,
> and thus indirectly identifying things other than the originally
> intended resource.

These we call "errors".   If you own a URI you have the right
to say what it identifies. Other people can lie about it, but they are
lying if they do.

> The problem occurs when we face up to the fact that the Semantic Web
> is not just a generic KMS, and in fact is very interested in the Web
> and what people identify when they create anchors.

I think you are saying that the semantic web must use URIs
to identify exactly the same thing as anyone else. Absolutely.

> Once there, we must
> accept the fact that the Web defines URIs and methods as two separate
> protocol elements, and therefore it would be incorrect to define
> resources other than how they are defined in 2396 and used on the
> existing Web interfaces described by HTTP and implemented in dozens
> of independent open source projects that you are free to inspect.

Yes.

> URIs alone are not sufficient to target assertions about content on
> the Web, even if we restrict our discussion to resources that act
> like information repositories.

?

> It therefore behooves the Semantic Web to adapt to the Web as it
> exists and works in practice, not try to force new definitions on
> it that are misdirected and impotent.

The semantic web system I have outlined above is neither misdirected
nor impotent.  It is potent in that it can do everything it needs to:
identify what it needs to directly or indirectly.

It is not misdirected, because it fits in with the way HTTP works.
While it may not seem intuitive to you, I think you will find the
idea that the URI identifies a web page is not alien to most people.
Not that most people have to worry about the formalism, but
even then most folks understand that a URI at Amazon is
primarily that of a page at Amazon, and only identifies the
book indirectly.

People give the URI of someone's home page and know that
is what they are doing - they don't really feel they are giving the
person's URI.  I suppose one can bicker about how other people
conceptualize things, and it isn't helpful.  Sometimes people have
used that sort of argument here.

The system I have is consistent and meets the requirements.
I haven't seen an alternative system which clearly defines
what a URI identifies, allows conceptual works to be referred to
and allows arbitrary things to be given identifiers indirectly.

Tim

PS: I haven't talked about the # in this message, and the RDF fragment
ID.  That is closely related but is a different set of issues.  You do
also need RDF to be able to claim foo#bar as an arbitrary identifier of
an arbitrary thing for the semantic web to work.
Received on Thursday, 23 January 2003 22:48:38 UTC