Re: Are URI-References bound to resources? from Roy T. Fielding on 2001-05-14 (uri@w3.org from May 2001)

From: Roy T. Fielding <fielding@ebuilt.com>
Date: Mon, 14 May 2001 05:40:35 -0700
To: Dan Connolly <connolly@w3.org>
Cc: Aaron Swartz <aswartz@upclink.com>, uri@w3.org
Message-ID: <20010514054035.A1098@waka.ebuilt.net>
Getting a bit long in this response -- I'll try to trim the old bits without
losing too much of the context...

On Mon, May 14, 2001 at 01:31:18AM -0500, Dan Connolly wrote:
> "Roy T. Fielding" wrote:
[...]
> > Ooops, missed that.  RFC 2396 treats those as two separate identifiers
> > within the same syntactical protocol element.  I could have sworn that was
> > in the spec.
> 
> Sure; but now let's take the two separate identifiers and look
> at them as one. For example, In HTML 2.0, it was called an
> "anchor address" and it was said to refer to an "anchor":
> 
> [[[
> A hyperlink is a relationship between two anchors, called the head and the
> tail of the hyperlink[DEXTER]. Anchors are identified by an anchor address:
> an absolute Uniform Resource Identifier (URI), optionally followed by a '#'
> and a sequence of characters called a fragment identifier. For example: 
> 
> ]]]
> 
> --        Hypertext Markup Language - 2.0 - Hyperlinks
> http://www.w3.org/MarkUp/html-spec/html-spec_7.html#SEC7
> Wed, 18 Oct 1995 06:29:36 GMT

The hypertext research that led to the Dexter reference model
[Commun. ACM, 37(2), Feb. 1994] defines anchors as that portion of the 
node component's content that is the head or the tail of traversing the
link -- in other words, the content enclosed within the "A" element of HTML.

If you wish to argue that RFC 2396's "URI-Reference" is, in fact, an
anchor reference as described by the Dexter model, I would agree.  The
reason it isn't called that is because I thought that one BNF rule was
good enough for several purposes and so chose a less-loaded term.
There has already been several requests to separate the BNF rules along
the dimensions of (relative x absolute x allows-fragment) in the next
revision of the URI spec, since some protocol elements don't allow
relative forms, others don't allow fragments, and some disallow both.
However, this doesn't change the fact that an anchor is not a resource as
defined by the Web architecture that has actually been implemented by us
over the past ten years.

> The editors of the RDF spec chose to use the term 'resource'
> for not only those things that RFC2395 calls 'resource', but
> for the whole set of things that the HTML 2 spec
> calls 'anchor'. That's not wrong; it's just different.

The fact that the RDF authors chose to use the term 'resource' to refer
to anchors is precisely what makes it wrong.  This isn't a mere question
of terminology.  It means that when I talk about Web architecture and when
people writing RDF reason about Web architecture, we can't agree.  The
basis of their reasoning is false.  Why is that important to this forum?
Because I was authoring the communication standards for the Web that
correspond to interoperable implementations of the Web, whereas they were
authoring the Resource Description Format -- a formalism for *describing*
the very things that I was working to develop.  If our definitions differ,
then their definitions are wrong, by definition!  RDF isn't allowed to
change those definitions until they change the standards first.

> > > Leaving aside the fact that a relative URI reference needs
> > > a bit of context (a base URI) to identify a resource,
> > > I'm fine with this view of things; in fact, I think it's
> > > the view used in the RDF spec; but you seem to think
> > > otherwise; I'm confused...
> > 
> > The RDF spec refers to the representation as the resource -- in other words,
> > that the fragment itself is a resource.
> 
> No, when the RDF spec says that
> http://www.w3.org/2000/01/rdf-schema#label 
> identifies a resource, it's not talking about some particular fragment
> of a particular representation of the (state of the)
> resource identified by http://www.w3.org/2000/01/rdf-schema ,
> any more than http://www.w3.org/2000/01/rdf-schema identifies
> just one piece of content. http://www.w3.org/2000/01/rdf-schema#label
> identifies something that you can learn about using HTTP;
> you can learn that it's a relationship between something
> and a (string) label for that something.

And that something is an anchor, not a resource.  I am not saying that it
isn't as valuable to reason about anchors as it is to reason about resources.
I am saying that they are two different things, and if you attempt to reason
about both of them at the same time then your reasoning will be wrong (or,
at the very least, filled with case-by-case exceptions to exclude anchors
when what you really meant was resources, and exclude resources when what
you really meant was anchors).  In other words, changing the definitions
is only hurting RDF.

Exactly what is meant by #label on the Web depends on the media type that
is retrieved as a result of accessing the resource identified by 
<http://www.w3.org/2000/01/rdf-schema>.  Yes, there are semantics associated
with that label, and you can reason about what the author of the reference
intends by specifying that label, but it is separate from the semantics
associated with the resource and the Uniform Resource Identifier that
precedes it.

> > > The interpretation he's asking about says that
> > >   http://example.org/q#foo
> > >     and
> > >   http://example.org/q#bar
> > > identify the same resource. Are you sure that's correct?
> > > i.e. can you justify that from the text of RFC2396?
> > > As far as I can tell, RFC2396 doesn't say what resource
> > > either of them identifies; it only tells you that
> > > yes, they identify the same resource after you remove
> > > the #fragid. Duh.
> > 
> > The resource identifier is before the fragment.
> 
> No, *a* resource identifier is before the fragment.
> 
> If you extend the definition of the term resource (in a consistent way),
> another resource identifier is the whole thing: http://example.org/q#foo
> .

As I said, it isn't possible to extend the definition of the term resource
in that way and remain consistent with HTTP.  You can only become
inconsistent with the rest of the Web, which is not what I look for in
a specification that purports to describe the Web.

> >  The fragment identifies
> > a subset of the representation retrieved after a GET of the resource.
> 
> That's one way to learn something about that identifier; there
> are other ways, too; you can get statements about it from
> other documents, etc.

That can be said about anything.

> > It isn't possible to "access" the fragment on its own, and hence it is
> > not itself a resource
> 
> Well, that's a limited view of the term resource. But it can
> be extended (again, in a consistent fashion) to include more
> things, like properties, classes, numbers, people, noses,
> pink elephants, and so on.

Yes, it is a limited view -- it is precisely limited to what is identified
by a URI and the semantics of the protocols called HTTP and HTML.  HTML
doesn't consider the URI reference protocol element to be inseparable.
It defines two separate semantics for handling the URI versus handling
the fragment -- they aren't even interpreted by the same code within
the browser's resolution engine.  At no time whatsoever is the fragment
considered the same as a resource within any running code that actually
performs some useful function on what you and I both know as the
World Wide Web.  And yet that same code operates on resources through
the generic Web interface, most often through the protocol HTTP, with a
very specific set of semantics that excludes fragments.

URI can identify properties, classes, numbers, people, noses, pink elephants,
and so on.  Fragments do not.  Fragments can identify portions of any given
representation of properties, classes, numbers, people, noses, pink elephants,
and so on.  The semantics of a portion of a representation, even when that
portion is the whole, are different from the semantics of a resource
even though it is possible to define another resource that does have the
semantics of identifying a portion of a representation of the first resource.
There is a huge difference in semantics between resource identifiers
(which are capable of redirection/indirection, access control, remote
authoring, etc.) and fragment identifiers (which are not capable of any
of the above because they have no such role in the Web interface defined
by HTTP, libwww, Apache, etc.).

> > until someone provides another URI that that
> > has the semantics of mapping to that representation as a resource
> > (i.e., an indirect reference).
> 
> Er... that's like saying "42 isn't a number until somebody
> writes down 6 and 7 and multiplies them." 42 always was
> a number. http://www.w3.org/2000/01/rdf-schema#label always
> was a resource (in the RDF sense).

But not in the WEB sense.  It doesn't become a resource until it is made
accessible as a resource and can be operated upon as a resource.  We aren't
talking about "resource" as it is used in English -- we are talking about
"resource" as it is defined by the Web architecture.  RDF does not define
Web architecture -- it is supposed to describe it.

In any case, to further the Douglas Adams memorial theme, I think he would
have appreciated this analogy: "42" and "What is 6 times 7?" are not the
same resource.  While "42" may be a representation of the resource "What is
6 times 7?", the fact that two different resources may have, at a particular
instant in time, an identical representation, does not imply that the
semantics of those two resources are in any way equivalent.  If it were
that easy, we'd already know the question that prompted the answer "42".

I can easily state the above using the defintions of the Web architecture
that I used for URI and HTTP and the REST architectural style.  I'll be
surprised if you can do so in RDF, because RDF doesn't distinguish between
representation and resource.

[...]
> > > >  At no time whatsoever is the resource transferred
> > > > across the network when doing a GET.
> > >
> > > Quite; who/what suggested it was?
> > 
> > That's what the definition you quoted said when it referred to the fragment
> > as a resource.
> 
> Read it again; it doesn't say that. The XLink folks didn't go there.
> http://www.w3.org/TR/2000/PR-xlink-20001220/#N789

Ah, that's better -- the bit you included before was out of context -- the
surrounding paragraphs are defining what "link" means, not resource, and
so I definitely misinterpreted the excerpt.

> > > Yes, but again, I don't see how that's directly relevant;
> > > the question is what
> > >   http://example.org/q#foo
> > > is bound to; and in particular, whether it's reasonable
> > > to say that it's bound to a resource.
> > 
> > Now you lost me.  What does "bound" mean?
> 
> For "X is bound to Y" read: "X identifies Y".
> (in other words: "X denotes Y")

Okey-doke -- I wanted to make sure you didn't mean it in a protocol sense.

> >  I would say that the namespace
> > is bound according to the authority example.org.  Since example.org is never
> > given the opportunity to interpret #foo,
> 
> Interpret? huh? Surely they're given the opportunity to say
> what http://example.org/q#foo means by putting a document
> at http://example.org/q , no?

No.  This is the part that is really hard for me to describe to others
without repeating twenty pages from my dissertation, and even then it will
take a while for it to sink in before the difference makes sense.

The semantics of a link are determined by the author of that link
according to their own set of desires, the current representation of a
resource obtained when performing a retrieval action on an address, and
their *expectation* regarding future retrieval actions on that address.
A resource itself is lent semantics by the way it is referenced and
by the willingness of the naming authority's service to maintain that
relationship consistently over time.

A balance exists between those who provide resources (naming authorities)
and those who reference resources (authors).  A major goal of the REST
architectural style is to maintain a separation of concerns between the
implementation of a resource and those who would make use of that resource,
while at the same time minimizing the need for authors to change their
references when the content of the information changes.  Nevertheless,
there exists a tension between the desires of an author to precisely define
the target of their links and the desire of a naming authority to manage
the semantics of the resources they provide as a service to the authors.

The "resource", as defined by RFC 2396, is the balance point -- the
authors control their anchors, whereas the naming authority controls
the resource identified by the URI by way of the generic Web interface.
Because the granularity at which the naming authority wishes to maintain
their resource identifiers may not be sufficient to meet the needs of all
authors, an author is given the ability to extend the anchor reference
by way of the fragment identifer, providing "extensibility of the namespace",
but at the cost of making the reference more dependent on the resource
implementation than is otherwise protected by the abstact interface of
separation between client and server.

In other words, a URI reference that includes a fragment establishes a
dependency between the anchor and both the semantics defined by the
resource and the format+content of the representation retrieved via that
resource, whereas a URI reference not including a fragment is independent
of the format+content -- it is still dependent on the naming authority
maintaining the semantics of the resource identified by the URI, but
the social agreement to maintain that relationship is implied by making
the resource available via a URI.  They can control that interface.
Providing that interface is what makes it a "resource".

BTW, the "naming authority" is not necessarily the same as the owner of
the content.  If they do have the ability to manage sub-content regions
within a resource, then they also have the ability to make those regions
accessible via URI separate from that of the container resource.

Are your bookmarks governed by example.org?  No.  Is it within your
power to include a URI reference with a fragment in your bookmarks?  Yes.
In doing so, who defines the semantics of that link?  You (as author of
your side of the link) do.  There is no way for the naming authority to
keep track of such things.  The Dexter model defined hypermedia systems
as requiring external link managers to maintain the integrity between the
anchors of every link, but they were incapable of sustaining relationships
at the scale of the Internet and were not intended to include nodes that
are as variable as the service resources on the Web.  The Dexter model is
not capable of describing the Web interface -- it assumed that no such
interface would be needed in the first place.

> > that part of the URI reference
> > isn't within example.org's authority, and hence it is not a resource.
> 
> Er...  the question that started this thread
> is asked in the context of not only
> RFC2396, where 'resource' is only used to mean things
> identified by absolute URIs, but *also* in the context of
> the RDF 1.0 spec, where 'resource' is also used to mean things
> identified by absolute-URIs-with-fragment-identifiers.
> 
> So it's not very interesting to answer "... hence it is
> not a resource in the RFC2396 sense of the word."

At what point did RDF obtain significance, in and of itself, to the point
where it is allowed to redefine the Web?  RFC 2396 is a draft standard
agreed to by the Internet community and implemented in countless pieces
of independently developed software that works.  So is RFC 2616.  As far
as I am concerned, RDF is a completely unproven proposal that attempts to
define a formalism for describing the architecture defined by those RFCs
and HTML/XML.  How can it do so when it doesn't even use the same definitions?

[...]
> > "It is possible to address a portion of a resource."  That is wrong.
> 
> No, it's not wrong; it's just different.
> 
> > It is possible to reference a portion of a representation.
> 
> Just as it makes
> sense to say (as RFC2396 does) that an absolute
> URI identifies a resource that you don't directly observe... you
> only indirectly observe its state by bouncing messages off it;
> in the same way, it makes sense to say (RDF 1.0 does) that an
> absolute-URI-with-fragment-identifier identifies
> a resource that you don't directly observe... you only indirectly
> observe it by looking at statements that mention it.

That's only partially true.  A naming authority can make statements regarding
a resource that go beyond mere observation of state.  They can make the
same kind of statements about some kinds of fragment identifiers, but in
doing so they fix the representation to a particular form, size, or shape
(depending on the type of fragment).  Being the naming authority, it makes
far more sense for them to only make statements about those things they
identify as URI.

Authors can use fragment ids that are not known to the naming authority.
You can say that such a reference obtains its semantics from indirect
observation, but it doesn't have the same semantic properties as a resource
that is defined by a URI.  The interface simply isn't available.

The term "resource" has a very specific meaning to the interfaces
that define the Web because making it specific allows us to define those
interfaces -- their syntax, semantics, and scope -- in ways that are
consistent and meaningful to all Web developers.  In contrast, the RDF
definition serves no useful purpose and, in fact, makes it significantly
harder to construct plausible statements about the Web.

RDF could have accomplished the same function without confusion by stating
that we can make observations about resources and we can make observations
about representations of resources and we can make observations about
fragments of those representations.  In particular, it could then make useful
statements about the persistence of some fragments over time and be able
to distinguish between statements about representations versus statements
about resources.  Or, if you don't like that, then call it the Link
Description Format.  link != resource.

[...]
> [by the way... what happened to 'entity' and 'entity body'?
> Representation is pretty much synonymous with those, no?]

Only within the scope of an HTTP message.  I stopped using those terms
outside of HTTP because they don't really "say" anything about the bits.

[...]

> > Not in URI land, hopefully.  It doesn't mean that the old specifications
> > were right.  Tim's design documents still refer to resources as documents.
> > That was true prior to 1993, but not after 1993.
> 
> It didn't become false to say that http://www.w3.org/
> is a document, if you take 'document' in the smalltalk/objective-C
> sense, where it means something that has
> mutable state and knows how to 'paint' itself in multiple formats;
> that usage still makes sense; it's just not endorsed as an IETF
> draft standard.

That's a good point -- I never read those from the point of view of a
smalltalk/objective-C definition of document.  Of course, I don't think
that programming languages have any more right to redefine the meaning
of "document" than RDF does in redefining "resource", and for the same
reason.  Sensible folks shouldn't torture their readers like that.

> > > II. The original/liberal view
> > >
> > > There's a mapping
> > >       Absolute URI, with optional fragment identifier -> Resource
> > > RFC2396 only explicitly talks about the part of that
> > > mapping where the domain is an Absolute URI with no fragment
> > > identifier, but RFC1630, among other specs, talks
> > > about the whole mapping.
> > 
> > Right, and we abandoned it in 1995 because that definition caused
> > all sorts of ambiguities in the specs.
> 
> No, not ambiguities... folks were uncomfortable with
> the level of extensibility, so they took the general
> model and introduced arbitrary limitations based
> on which parts had been implemented. Somewhat
> like denying that there are arbitrarily large
> integers because we'd only seen 32bit registers
> to date. Sigh.

You are thinking of the time between the URL specification and RFC 2396.
We fixed those limitations.  There is absolutely nothing that you can do or
describe that cannot be done as well and described simpler using the
definitions we have right now.  The only impact is that we spend less time
around a whiteboard arguing about what we mean when discussing the interface.

> >  For example, how is access
> > control assigned to "things" on the Web.  By the resource.
> 
> Well, one could also say it's by the message; after all,
> a GET access policy is likely to be very different from a PUT access
> policy in most cases; and it's not uncommon to change
> the access policy of a resource over time.

Absolutely true -- but note what you said.  It's not uncommon to change the
access policy of a resource over time.  All of the operations that control
access policy are defined in terms of methods (GET, POST, etc.) and resources,
where resource is identified by a URI that cannot include a fragment.

> Neither view is in any ratified spec, as far as I know;
> i.e. the ratified definition of 'resource' doesn't include
> any notion of access control (did I miss something?)

HTTP.

> >  Is it possible
> > to define separate access control to different fragments of the same
> > resource?  No.
> 
> [not to disagree, but
> I presume you mean "is it practical to implement/deploy...?"

I meant that it is impossible for the current Web interface to describe it,
since the Web interface does not transmit the fragment as part of an access.

> I can trivially define it: all accesses are granted.]

:-p

> >  Therefore, a fragment is not a resource until it is
> > bound as some other URI by a naming authority that can control access
> > to the fragments as separate, identifiable resources.
> 
> That's one view. The RDF spec takes another view. Note that
> the view taken by TimBL's cwm.py code is this "a fragment
> is not a resource" view; he's taken the view that
> it's too confusing to have two different definitions
> of 'resource' around, and that the RDF specs (or at least: his
> RDF code) should use 'Thing' for the general case
> of something that's either an RFC2396-resource or
> something-identified-by-an-absolute-URI-with-fragment-identifier.

Well, I'd pick a better word than "Thing", but at least it will work.

I think that, if we were to reinvent the Web tomorrow, fragment ids
wouldn't even be in the href -- they'd be separate attributes of
the link specification (that's hypermedia lingo for the anchor 
definition record).  That would make it easier for contextual information
to be included, which would make it easier to "find" target anchors
when the destination content changes.  But that's a parallel universe
that I'm too tired to explore right now.

Cheers,

Roy T. Fielding, Chief Scientist, eBuilt, Inc.
                 2652 McGaw Avenue
                 Irvine, CA 92614-5840  fax:+1.949.609.0001
                 (fielding@ebuilt.com)  <http://www.eBuilt.com>

                 Chairman, The Apache Software Foundation
                 (fielding@apache.org)  <http://www.apache.org/>
Received on Monday, 14 May 2001 08:42:45 UTC