RE: URIs-Resource relationships

> There's a difference between ignoring aspects of resource space and
> providing a basic set of rules that allow generic URI comparison and
> processing.

Comparing URI is a meaningless operation.  URI are names (even those that
are known as locators).  If I tell you my name is Roy, that doesn't imply
you can compare other Roys to me for equivalence.  Even extending my
name out to the full Roy Thomas Fielding gives you no guarantees.  The only
way to truly compare my Roy with some other moniker is by having a third
party that is aware of the context and semantics of both monikers to
say "yes, they are the same dude" or "no".

USA Social Security Numbers are also a form of name. What makes it possible
to compare them for equivalence?  It isn't the syntax.  It is the rules
and procedures established by the U.S. government for assigning,
maintaining, and corroborating those numbers to individuals that make
it possible.

This isn't a technical problem. It is a fact of life for any system that
coexists well with society.  The URI spec won't try to redefine society
just to make it more convenient for the application developer.

> >RFC 2396 defines everything you need to interoperate with real URI-based
> >systems.  It doesn't have all the information that is in my dissertation,
> >but it is sufficient to develop any application I have ever seen,
> >including anything related to XML, RDF, and namespaces.
> 
> Is there a URI for your dissertation?

Not for a few more weeks, which is why I have to cut short my participation
in this discussion (last day to file is Wednesday).  However, I'll include
the relevant bits below for your entertainment.  Please don't expect me to
explain this stuff until I have a chance to breathe again (next Thursday
at the earliest).


Cheers,

Roy T. Fielding, Chief Scientist, eBuilt, Inc.
                 2652 McGaw Ave., Irvine, CA 92614-5840
                 (fielding@ebuilt.com)  <http://www.eBuilt.com>

                 Chairman, The Apache Software Foundation
                 (fielding@apache.org)  <http://www.apache.org/>

============================================================================

Extract from my dissertation [Copyright (C) 2000 Roy Thomas Fielding]


5.2 REST Architectural Elements

The Representational State Transfer (REST) style is an abstraction
of the architectural elements within a distributed hypermedia
system. REST ignores the details of implementation and
protocol syntax in order to focus on the roles of components, the
constraints upon their interaction with other components, and their
interpretation of significant data elements. It encompasses the
fundamental constraints upon components, connectors, and data that
define the basis of the Web architecture, and thus the essence of
its behavior as a network-based application.

5.2.1 Data Elements

Unlike the distributed object style [30], where all data is
encapsulated within and hidden by the processing components, the
nature and state of an architecture's data elements is a key aspect
of REST. The rationale for this design can be seen in the nature of
distributed hypermedia.

When a link is selected, information needs to be moved from the
location where it is stored to the location where it will be used
by, in most cases, a human reader. This is in distinct contrast to
most distributed processing paradigms [6, 46], where it is often
more efficient to move the "processing entity" to the data rather
than move the data to the processor. A distributed hypermedia
architect has only three fundamental options: 1) render the data
where it is located and send a fixed-format image to the recipient;
2) encapsulate the data with a rendering engine and send both to the
recipient; or, 3) send the raw data to the recipient along with
metadata that describes the data type, so that the recipient can
choose their own rendering engine.

Each option has its advantages and disadvantages. Option 1, the
traditional client-server style [30], allows all information about
the true nature of the data to remain hidden within the sender,
preventing assumptions from being made about the data structure and
making client implementation easier. However, it also severely
restricts the functionality of the recipient and places most of the
processing load on the sender, leading to scalability problems.
Option 2, the mobile object style [46], provides information hiding
while enabling specialized processing of the data via its unique
rendering engine, but limits the functionality of the recipient to
what is anticipated within that engine and may vastly increase the
amount of data transferred. Option 3 allows the sender to remain
simple and scalable while minimizing the bytes transferred, but
loses the advantages of information hiding and requires that both
sender and recipient understand the same data types.

REST provides a hybrid of all three options by focusing on a shared
understanding of data types with metadata, but limiting the scope of
what is revealed to a standardized interface. REST components
communicate by transferring a representation of the data in a format
matching one of an evolving set of standard data types, selected
dynamically based on the capabilities or desires of the recipient
and the nature of the data. Whether the representation is in the
same format as the raw source, or is derived from the source,
remains hidden behind the interface. The benefits of the mobile
object style are approximated by sending a representation that
consists of instructions in the standard data format of an
encapsulated rendering engine (e.g., Java). REST therefore gains the
separation of concerns of the client-server style without the server
scalability problem, allows information hiding through a generic
interface to enable encapsulation and evolution of services, and
provides for a diverse set of functionality through downloadable
feature-engines.

5.2.1.1 Resources and Resource Identifiers

The key abstraction of information in REST is a resource. Any
information that can be named can be a resource: a document or
image, a temporal service (e.g. "today's weather in Los Angeles"), a
collection of other resources, a non-virtual object (e.g. a person),
and so on. In other words, any concept that might
be the target of an author's hypertext reference must fit within the
definition of a resource. A resource is a conceptual mapping to a
set of entities, not the entity that corresponds to the mapping at
any particular point in time.

More precisely, a resource R is a temporally varying membership
function M_R(t), which for time t maps to a set of entities, or
values, which are equivalent. The values in the set may be resource
representations and/or resource identifiers. A resource can map to
the empty set, which allows references to be made to a concept
before any realization of that concept exists -- a notion that was
foreign to most hypertext systems prior to the Web [56]. Some
resources are static in the sense that, when examined at any time
after their creation, they always correspond to the same value set.
Others have a high degree of variance in their value over time. The
only thing that is required to be static for a resource is the
semantics of the mapping, since the semantics is what distinguishes
one resource from another.

For example, the "authors' preferred version" of an academic paper
is a mapping whose value changes over time, whereas a mapping to
"the paper published in the proceedings of conference X" is static.
These are two distinct resources, even if they both map to the same
value at some point in time. The distinction is necessary so that
both resources can be identified and referenced independently. A
similar example from software engineering is the separate
identification of a version-controlled source code file when
referring to the "latest revision", "revision number 1.2.7", or
"revision included with the Orange release."

This abstract definition of a resource enables key features of the
Web architecture. First, it provides generality by encompassing many
sources of information without artificially distinguishing them by
type or implementation. Second, it allows late binding of the
reference to a representation, enabling content negotiation to take
place based on characteristics of the request. Finally, it allows an
author to reference the concept rather than some singular
representation of that concept, thus removing the need to change all
existing links whenever the representation changes (assuming the
author used the right identifier).
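
As a rough illustration (not part of the dissertation), the membership
function and the revision example above can be sketched in Python. The
names Resource, latest_revision, and revision_1_2_7, the integer
timestamps, and the revision history are all hypothetical and chosen
only to show two resources whose value sets coincide at one moment
while their semantics stay distinct.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

# Hypothetical revision history: (time committed, revision label).
HISTORY = [(1, "1.2.5"), (5, "1.2.6"), (9, "1.2.7")]


@dataclass(frozen=True)
class Resource:
    """A resource modeled as a membership function from time to a value set."""
    semantics: str                               # the part that must stay fixed
    membership: Callable[[int], FrozenSet[str]]  # the membership function

    def values_at(self, t: int) -> FrozenSet[str]:
        return self.membership(t)


def latest_at(t: int) -> FrozenSet[str]:
    """'Latest revision': the newest revision committed by time t."""
    committed = [rev for when, rev in HISTORY if when <= t]
    return frozenset(committed[-1:])  # empty set before the first commit


def fixed_1_2_7_at(t: int) -> FrozenSet[str]:
    """'Revision 1.2.7': empty until that revision exists, then constant."""
    return frozenset({"1.2.7"}) if t >= 9 else frozenset()


latest_revision = Resource("latest revision", latest_at)
revision_1_2_7 = Resource("revision number 1.2.7", fixed_1_2_7_at)

for t in (0, 5, 9):
    print(t, sorted(latest_revision.values_at(t)),
          sorted(revision_1_2_7.values_at(t)))
```

In this sketch both resources map to the empty set at time 0, which
mirrors the point that a reference can exist before any realization of
the concept does.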

REST uses a resource identifier to identify the particular resource
involved in an interaction between components. REST connectors
provide a generic interface for accessing and manipulating the value
set of a resource, regardless of how the membership function is
defined or the type of software that is handling the request. The
naming authority that assigned the resource identifier, making it
possible to reference the resource, is responsible for maintaining
the semantic validity of the mapping over time (i.e., ensuring that
the membership function does not change).

Traditional hypertext systems [56], which typically operate in a
closed or local environment, use unique node or document identifiers
that change every time the information changes, relying on link
servers to maintain references separately from the content [126].
Since centralized link servers are anathema to the immense scale
and multi-organizational domain requirements of the Web, REST relies
instead on the author choosing a resource identifier that best fits
the nature of the concept being identified. Naturally, the quality
of an identifier is often proportional to the amount of money spent
to retain its validity, which leads to broken links as ephemeral (or
poorly supported) information moves or disappears over time.

5.2.1.2 Representations

REST components perform actions on a resource by using a
representation to capture the current or intended state of that
resource and transferring that representation between components. A
representation is a sequence of bytes, plus representation metadata
to describe those bytes. Other commonly used but less precise names
for a representation include: document, file, and HTTP message
entity, instance, or variant.

A representation consists of data, metadata describing the data,
and, on occasion, metadata to describe the metadata (usually for the
purpose of verifying message integrity). Metadata is in the form of
name-value pairs, where the name corresponds to a standard that
defines the value's structure and semantics. Response messages may
include both representation metadata and resource metadata:
information about the resource that is not specific to the supplied
representation.

Control data defines the purpose of a message between components,
such as the action being requested or the meaning of a response. It
is also used to parameterize requests and override the default
behavior of some connecting elements. For example, cache behavior
can be modified by control data included in the request or response
message.

Depending on the message control data, a given representation may
indicate the current state of the requested resource, the desired
state for the requested resource, or the value of some other
resource, such as a representation of the input data within a
client's query form, or a representation of some error condition for
a response. For example, remote authoring of a resource requires
that the author send a representation to the server, thus
establishing a value for that resource that can be retrieved by
later requests. If the value set of a resource at a given time
consists of multiple representations, content negotiation may be
used to select the best representation for inclusion in a given
message.
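
A minimal sketch of these ideas in Python follows, assuming a
deliberately simplified form of content negotiation. The Representation
class, the select_representation function, and the weather data are
hypothetical; the selection ignores quality values, wildcards, and
language preferences that a real negotiation algorithm would consider.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Representation:
    """A sequence of bytes plus name-value metadata describing those bytes."""
    data: bytes
    metadata: Dict[str, str] = field(default_factory=dict)


def select_representation(value_set: List[Representation],
                          acceptable: List[str]) -> Optional[Representation]:
    """Return the first representation matching the recipient's preferences.

    `acceptable` lists media types from most to least preferred.
    Returns None when nothing in the value set matches.
    """
    for media_type in acceptable:
        for rep in value_set:
            if rep.metadata.get("Content-Type") == media_type:
                return rep
    return None


# Hypothetical value set of a single resource at one point in time.
value_set = [
    Representation(b"<p>Sunny, 24 C</p>", {"Content-Type": "text/html"}),
    Representation(b"Sunny, 24 C", {"Content-Type": "text/plain"}),
]

chosen = select_representation(value_set, ["text/plain", "text/html"])
print(chosen.metadata["Content-Type"], chosen.data)
```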

The data format of a representation is known as a media type [97]. A
representation can be included in a message and processed by the
recipient according to the control data of the message and the
nature of the media type. Some media types are intended for
automated processing, some are intended to be rendered for viewing
by a user, and a few are capable of both. Composite media types can
be used to enclose multiple representations in a single message.

The design of a media type can directly impact the user-perceived
performance of a distributed hypermedia system. Any data that must
be received before the recipient can begin rendering the
representation adds to the latency of an interaction. A data format
that places the most important rendering information up front, such
that the initial information can be incrementally rendered while the
rest of the information is being received, results in much better
user-perceived performance than a data format that must be entirely
received before rendering can begin.

For example, a Web browser that can incrementally render a large
HTML document while it is being received provides significantly
better user-perceived performance than one that waits until the
entire document is completely received prior to rendering, even
though the network performance is the same. Note that the rendering
ability of a representation can also be impacted by the choice of
content. If the dimensions of dynamically-sized tables and embedded
objects must be determined before they can be rendered, their
occurrence within the viewing area of a hypermedia page will
increase its latency.


[...]

6.2 REST Applied to URI

Uniform Resource Identifiers (URI) are both the simplest element of
the Web architecture and the most important. URIs have been known by
many names: WWW addresses, Universal Document Identifiers, Universal
Resource Identifiers [14], and finally the combination of Uniform
Resource Locators (URL) [16] and Names (URN) [116]. Aside from its
name, the URI syntax has remained relatively unchanged since 1992.
However, the specification of Web addresses also defines the scope
and semantics of what we mean by resource, which has changed since
the early Web architecture. REST was used to define the term
resource for the URI standard [20], as well as the overall semantics
of the generic interface for manipulating resources via their
representations.

6.2.1 Redefinition of Resource

The early Web architecture defined URI as document identifiers.
Authors were instructed to define identifiers in terms of a
document's location on the network. Web protocols could then be used
to retrieve that document. However, this definition proved to be
unsatisfactory for a number of reasons. First, it suggests that the
author is identifying the content transferred, which would imply
that the identifier should change whenever the content changes.
Second, there exist many addresses that correspond to a service
rather than a document --- authors may be intending to direct readers
to that service, rather than to any specific result from a prior
access of that service. Finally, there exist addresses that do not
correspond to a document at some periods of time, such as when the
document does not yet exist or when the address is being used solely
for naming, rather than locating, information.

The definition of resource in REST is based on a simple premise:
identifiers should change as infrequently as possible. Because the
Web uses embedded identifiers rather than link servers, authors need
an identifier that closely matches the semantics they intend by a
hypermedia reference, allowing the reference to remain static even
though the result of accessing that reference may change over time.
REST accomplishes this by defining a resource to be the semantics of
what the author intends to identify, rather than the value
corresponding to those semantics at the time the reference is
created. It is then left to the author to ensure that the identifier
chosen for a reference does indeed identify the intended semantics.

6.2.2 Manipulating Shadows

Defining resource such that a URI identifies a concept rather than a
document leaves us with another question: how does a user access,
manipulate, or transfer a concept such that they can get something
useful when a hypertext link is selected? REST answers that question
by defining the things that are manipulated to be representations of
the identified resource, rather than the resource itself. An origin
server maintains a mapping from resource identifiers to the set of
representations corresponding to each resource. A resource is
therefore manipulated by transferring representations through the
generic interface defined by the resource identifier.

REST's definition of resource derives from the central requirement
of the Web: independent authoring of interconnected hypertext across
multiple trust domains.  Forcing the interface definitions to match
the interface requirements causes the protocols to seem vague, but
that is only because the interface being manipulated is only an
interface and not an implementation. The protocols are specific
about the intent of an application action, but the mechanism behind
the interface must decide how that intention affects the underlying
implementation of the resource mapping to representations.

Information hiding is one of the key software engineering principles
that motivate the uniform interface of REST. Because a client is
restricted to the manipulation of representations rather than
directly accessing the implementation of a resource, the
implementation can be constructed in whatever form is desired by the
naming authority without impacting the clients that may use its
representations. In addition, if multiple representations of the
resource exist at the time it is accessed, a content selection
algorithm can be used to dynamically select a representation that
best fits the capabilities of that client. The disadvantage, of
course, is that remote authoring of a resource is not as
straightforward as remote authoring of a file.

6.2.3 Remote Authoring

The challenge of remote authoring via the Web's uniform interface is
due to the separation between the representation that can be
retrieved by a client and the mechanism that might be used on the
server to store, generate, or retrieve the content of that
representation. An individual server may map some part of its
namespace to a filesystem, which in turn maps to the equivalent of
an inode that can be mapped into a disk location, but those
underlying mechanisms provide a means of associating a resource to a
set of representations rather than identifying the resource itself.
Many different resources could map to the same representation, while
other resources may have no representation mapped at all.

In order to author any resource, the author must first obtain the
specific source resource URI: the set of URI that bind to the
handler's underlying representation for the target resource. A
resource does not always map to a singular file, but all resources
that are not static are derived from some other resources, and by
following the derivation tree an author can eventually find all of
the source resources that must be edited in order to modify the
representation of a resource. These same principles apply to any
form of derived representation, whether it be from content
negotiation, scripts, servlets, managed configurations, versioning,
etc.

The resource is not the storage object. The resource is not a
mechanism that the server uses to handle the storage object. The
resource is a conceptual mapping --- the server receives the
identifier (which identifies the mapping) and applies it to its
current mapping implementation (usually a combination of
collection-specific deep tree traversal and/or hash tables) to find
the currently responsible handler implementation; the handler
implementation then selects the appropriate action+response based on
the request content.  All of these implementation-specific issues
are hidden behind the Web interface; their nature cannot be assumed
by a client that only has access through the Web interface.
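
The lookup described above might be sketched as follows, assuming a
plain hash table stands in for the server's mapping implementation. The
Server class, both handlers, and the "/reports/latest" identifier are
hypothetical; the status codes are ordinary HTTP ones.

```python
from typing import Callable, Dict, Optional, Tuple

# A handler takes a request method and returns (status code, body bytes).
Handler = Callable[[str], Tuple[int, bytes]]


def static_file_handler(method: str) -> Tuple[int, bytes]:
    """Hypothetical handler backed by stored bytes."""
    if method == "GET":
        return 200, b"contents read from storage"
    return 405, b"method not allowed"


def generated_handler(method: str) -> Tuple[int, bytes]:
    """Hypothetical handler that computes its response on each request."""
    if method == "GET":
        return 200, b"contents generated on demand"
    return 405, b"method not allowed"


class Server:
    """Maps each identifier to whichever handler is currently responsible."""

    def __init__(self) -> None:
        self._mapping: Dict[str, Handler] = {}

    def bind(self, identifier: str, handler: Handler) -> None:
        self._mapping[identifier] = handler

    def handle(self, method: str, identifier: str) -> Tuple[int, bytes]:
        handler: Optional[Handler] = self._mapping.get(identifier)
        if handler is None:
            return 404, b"no representation available"
        return handler(method)


server = Server()
server.bind("/reports/latest", static_file_handler)
before = server.handle("GET", "/reports/latest")

# The implementation is swapped; the identifier the client holds is not.
server.bind("/reports/latest", generated_handler)
after = server.handle("GET", "/reports/latest")

print(before[0], after[0])
```

The client in this sketch only ever supplies a method and an
identifier, so it has no way to observe which handler answered.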

For example, consider what happens when a Web site grows in user
base and decides to replace its old Brand X server, based on an XOS
platform, with a new Apache server running on FreeBSD.  The disk
storage hardware is replaced.  The operating system is replaced.
The HTTP server is replaced.  Perhaps even the method of generating
responses for all of the content is replaced. However, what doesn't
need to change is the Web interface: if designed correctly, the
namespace on the new server can mirror that of the old, meaning that
from the client's perspective, which only knows about resources and
not about how they are implemented, nothing has changed aside from
the improved robustness of the site.

6.2.4 Binding Semantics to URI

As mentioned above, a resource can have many identifiers. In other
words, there may exist two or more different URI that have
equivalent semantics when used to access a server.  It is also
possible to have two URI that result in the same mechanism being
used upon access to the server, and yet those URI identify two
different resources because they don't mean the same thing.
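
Both cases can be illustrated with a small Python sketch. The Binding
type, the example.org URI, and the table contents are hypothetical; the
table stands for knowledge held by the naming authority, which is the
assumption that makes any equivalence test possible here.

```python
from typing import Dict, NamedTuple


class Binding(NamedTuple):
    semantics: str  # what the naming authority says the URI means
    mechanism: str  # how the server happens to answer requests for it


# Hypothetical bindings known to the naming authority, not derivable
# from the syntax of the URI themselves.
BINDINGS: Dict[str, Binding] = {
    "http://example.org/paper/preferred":
        Binding("authors' preferred version", "file:v3.pdf"),
    "http://example.org/paper/current":
        Binding("authors' preferred version", "file:v3.pdf"),
    "http://example.org/paper/conf-x":
        Binding("as published at conference X", "file:v3.pdf"),
}


def same_mechanism(a: str, b: str) -> bool:
    """True when the server uses the same mechanism to answer for both URI."""
    return BINDINGS[a].mechanism == BINDINGS[b].mechanism


def same_resource(a: str, b: str) -> bool:
    """True when both URI were assigned the same semantics."""
    return BINDINGS[a].semantics == BINDINGS[b].semantics


preferred = "http://example.org/paper/preferred"
current = "http://example.org/paper/current"
conf_x = "http://example.org/paper/conf-x"

print(same_resource(preferred, current),
      same_mechanism(preferred, conf_x),
      same_resource(preferred, conf_x))
```

Neither function inspects the text of the URI, which is the point:
without the authority's table, string comparison alone cannot
separate the two cases.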

Semantics are a by-product of the act of assigning resource
identifiers and populating those resources with representations. At
no time whatsoever do the server or client software need to know or
understand the meaning of a URI --- they merely act as a conduit
through which the creator of a resource (a human naming authority)
can associate representations with the semantics identified by the
URI. In other words, there are no resources on the server; just
mechanisms that supply answers across an abstract interface defined
by resources. It may seem odd, but this is the essence of what makes
the Web work across so many different implementations.

It is the nature of every engineer to define things in terms of the
characteristics of the components that will be used to compose the
finished product.  The Web doesn't work that way.  The Web
architecture consists of constraints on the communication model
between components, based on the role of each component during an
application action. This prevents the components from assuming
anything beyond the resource abstraction, thus hiding the actual
mechanisms on either side of the abstract interface.

====================================================================

Received on Friday, 8 September 2000 00:54:02 UTC