Proposal to amend the httpRange-14 resolution from Tore Eriksson on 2012-03-25 (www-tag@w3.org from March 2012)

From: Tore Eriksson <tore.eriksson@gmail.com>
Date: Sun, 25 Mar 2012 12:44:31 +0900
To: www-tag@w3.org
Message-ID: <CAHHQWXhB=9DGQJcWdcnqK-h4b7HTAdbv-euMd-sugwdgTXN0Kg@mail.gmail.com>
==Summary==

This proposal entails a partial reversion of the httpRange-14
resolution. Specifically, it suggests that a representation retrieved
from an HTTP URI will never* be equivalent to what the URI denotes (the
resource), but will always be a description (of the state) of the
resource, eliminating the risk of confusing a resource with its
description.

As a consequence of this proposal, it will be impossible to refer to
the description (the "information resource") with a URI. The proposal
argues that this is a feature, not a bug, and that existing methods
for describing representations are sufficient.

By being very flexible at the protocol level, and strict in the
semantics, this proposal attempts to build upon the strengths of the
HTTP protocol when used in the semantic web.

* unless explicitly stated as such in RDF

==Rationale==

The discussion leading up to httpRange-14 has been described[1] as a
collision of two different conventions for URI use. In one, a URI
denotes the document-like entity (the "information resource")
retrieved through a 200 GET response. In the other, a URI denotes any
abstract concept, not necessarily a document or other media type.

One can compare these two conventions with the development of the HTTP
protocol. The former corresponds to HTTP 0.9 which is a protocol for
exchanging hypertext documents. The latter corresponds to HTTP 1.0
which introduced the concept of resources and representations, thus
adding a level of abstraction to the WWW.

This proposal argues that the protocol for URI documentation discovery
has to reflect this difference between versions of the HTTP protocol.
The current httpRange-14 resolution is closer in spirit to HTTP 0.9,
but for HTTP 1.0 and later there is a need for an amended version,
described below.

The explicitly stated problem httpRange-14 addressed was whether an
HTTP URI is constrained to denoting "documents", or whether it could
denote any possible concept. The resolution confirmed the latter
position. The discussion then changed its focus to the existence of
two distinct resources, the resource named by the “probe” URI and a
separate resource that acts as a description for this URI. The
question then became how to avoid confounding the initial URI with the
URI for its descriptions. The method proposed for this was HTTP
redirects which will involve two (or more) URIs, starting with the
probe URI and ending with the description URI. However, the overhead
of setting up redirects has been seen as a technical hurdle by some
developers. Concerns have also been raised that details of the HTTP
protocol ought to be orthogonal to semantic definitions.

One way of ameliorating this problem is to reconsider what a
descriptive resource is. The problem can be formulated as the
need to separate the abstract resource, say Paris, from a web page
about Paris. However, we can regard descriptions not as “web pages”,
but as “documents”. The description of Paris retrieved from the URI
denoting Paris is not a web page, but an HTML (or RDF) document. Thus
the description is equivalent to the retrieved representation. As
documents in HTML and other media types consist of octets, they are
resources that exist independently of their retrieval (i.e. locally
on disc). As they are unchanging they are also representations of
themselves, similarly to RDF literals.

This proposal thus replaces the distinction between "information
resources"[2] and other resources with the pre-existing distinction
between web-accessible resources and the binary resources used as
entity bodies in HTTP and other protocols. Since these two classes are
fundamentally different, methods for documentation discovery will be
discussed separately.

For web-accessible resources, the main method for documentation
discovery is to access the probe URI through the appropriate protocol.
An RDF graph or an equivalent object is then extracted from the
representation and the graph is tested for the explicit occurrence of
the probe URI.
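
As a rough illustration of the test described above, the following
Python sketch checks whether a graph explicitly mentions the probe
URI. It assumes the RDF graph has already been extracted from the
representation and models it simply as a set of (subject, predicate,
object) string triples. The function name `statements_about`, the
choice to count occurrences in both subject and object position, and
the example.org URIs are all assumptions made for illustration, not
part of the proposal.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def statements_about(graph: Iterable[Triple], probe_uri: str) -> List[Triple]:
    """Return the triples that explicitly name the probe URI.

    Assumption: an "explicit occurrence" means the URI appears as the
    subject or the object of a triple.
    """
    return [t for t in graph if t[0] == probe_uri or t[2] == probe_uri]


# Hypothetical graph extracted from a representation of the probe URI.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
graph = {
    ("http://example.org/paris", RDFS_LABEL, "Paris"),
    ("http://example.org/france", "http://example.org/capital", "http://example.org/paris"),
    ("http://example.org/berlin", RDFS_LABEL, "Berlin"),
}

found = statements_about(graph, "http://example.org/paris")
print(len(found))
```

A graph in which the probe URI never occurs yields an empty list, in
which case the representation carries no documentation for that URI.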

The URI documentation may also reside in other resources. These
resources can be found by following links in the document or HTTP
header. There exist a number of complementary methods for describing
such links.
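
One of these methods, links embedded in an HTML document, might be
sketched as follows using only the Python standard library. The
relation names checked (`alternate`, `describedby`) and the set of
media types regarded as RDF-capable are assumptions chosen for the
example; `LinkCollector` and `documentation_links` are hypothetical
names. Links carried in an HTTP Link: header would need separate
handling that is not shown here.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Assumption: media types treated here as "capable of providing RDF".
RDF_TYPES = {"application/rdf+xml", "text/turtle"}


class LinkCollector(HTMLParser):
    """Collect (rel, href) pairs for <link> elements that may lead to documentation."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # A rel attribute may carry several space-separated relation names.
        for rel in (attr.get("rel") or "").lower().split():
            is_rdf_alternate = rel == "alternate" and attr.get("type") in RDF_TYPES
            if rel == "describedby" or is_rdf_alternate:
                self.links.append((rel, href))


def documentation_links(html: str) -> List[Tuple[str, str]]:
    """Return candidate documentation links found in an HTML document."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


sample = """<html><head>
<link rel="stylesheet" href="style.css">
<link rel="describedby" href="http://example.org/paris.rdf">
<link rel="alternate" type="text/turtle" href="http://example.org/paris.ttl">
</head><body></body></html>"""

print(documentation_links(sample))
```

The stylesheet link is ignored because it describes the document
rather than pointing to documentation of the denoted resource.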

Another method for documentation discovery is content negotiation.
Since RDF is the preferred method for resource description,
negotiation for an RDF-based media type might return appropriate
information.
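
A minimal sketch of such a negotiation, assuming Python's standard
urllib and an illustrative preference order among RDF media types
(the proposal names none), could look like this. The request is only
constructed, not sent; `rdf_request` and the example.org URI are
hypothetical.

```python
import urllib.request


def rdf_request(probe_uri: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that prefers RDF media types."""
    # Assumed preference order; q-values rank the alternatives.
    accept = "text/turtle, application/rdf+xml;q=0.9, */*;q=0.1"
    return urllib.request.Request(probe_uri, headers={"Accept": accept}, method="GET")


req = rdf_request("http://example.org/paris")
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the
retrieval; whether an RDF representation comes back depends entirely
on the server.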

This proposal removes the need for a separate discussion for hash
URIs, since the methods described above work equivalently in this
case. As a bonus, the same rationale solves the problem with hash URIs
that also act as local identifiers. If an element referred to by the
hash URI exists within a document, the element is a representation of
the resource denoted by the hash URI, not the resource itself. Thus no
conflict arises when the same local identifier is used in different
media types.
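
To illustrate why hash URIs need no separate treatment, the sketch
below splits a probe URI into its stem and local identifier with the
standard library; discovery then proceeds on the stem exactly as for
any other URI. The function name `stem_and_id` and the example URI
are hypothetical.

```python
from typing import Tuple
from urllib.parse import urldefrag


def stem_and_id(probe_uri: str) -> Tuple[str, str]:
    """Split a URI of the form stem#id into (stem, id); id is '' if absent."""
    stem, local_id = urldefrag(probe_uri)
    return stem, local_id


print(stem_and_id("http://example.org/cities#paris"))
```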

For the stand-alone binary resources, a method to describe them is
needed as well. Since they exist independently of the web, as files on
disc or e-mail attachments, this method should not depend on
retrieval. In many media types this is a solved problem. HTML
documents contain document meta data in the <HEAD> element, using
<TITLE>, <META>, and <LINK> elements. PDF, JPEG, and MP3 have similar
meta data containers (XMP, EXIF, ID3), whose contents can be converted
to RDF if necessary.
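
For HTML, reading such self-contained meta data from a document held
locally, without any retrieval, might be sketched as below. Only
<TITLE> and <META name=... content=...> are handled; the class name
`HeadMetadata`, the function `head_metadata`, and the sample document
are assumptions for illustration.

```python
from html.parser import HTMLParser
from typing import Dict


class HeadMetadata(HTMLParser):
    """Collect the title and named <meta> values of an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attr.get("name") and attr.get("content") is not None:
            self.meta[attr["name"].lower()] = attr["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so append.
        if self._in_title:
            self.meta["title"] = self.meta.get("title", "") + data


def head_metadata(html: str) -> Dict[str, str]:
    """Return the meta data embedded in an HTML document given as a string."""
    parser = HeadMetadata()
    parser.feed(html)
    parser.close()
    return parser.meta


doc = (
    "<html><head><title>Paris</title>"
    '<meta name="license" content="CC BY 3.0">'
    "</head><body>...</body></html>"
)
print(head_metadata(doc))
```

Because the input is just a string of markup, the same code applies
whether the document came from disc, an e-mail attachment, or an HTTP
response body.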

For some applications though, there might be a need for a URI naming
the binary resource. Fortunately, a URI scheme for representations
already exists – the data URI scheme. This scheme, although slightly
unwieldy, can unambiguously denote a binary resource of any registered
media type.
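
As a small illustration, a data URI for a given octet sequence can be
built as below. The sketch assumes the base64 form of the scheme and
omits optional media type parameters such as charset; `to_data_uri`
and the sample payload are hypothetical.

```python
import base64


def to_data_uri(payload: bytes, media_type: str) -> str:
    """Name a binary resource with a base64-encoded data URI."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


uri = to_data_uri(b"<p>Paris</p>", "text/html")
print(uri)
```

The URI grows with the size of the payload, which is the unwieldiness
mentioned above, but identical octets and media type always yield the
same name.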

==Details==

The proposed document[1] could be replaced with a short note
explaining an algorithm to find machine-readable descriptions of a
dereferenceable resource. If a rewrite is considered necessary, some
changes are proposed below:

>>>
3 Probe URI lacking local identifier

If the URI scheme of the probe URI is 'http' or 'https', the URI has a
nominal URI documentation carrier in the following ways. The cases are
not exclusive (e.g. both GET and Link: may yield nominal URI
documentation carriers for the URI).

3.1 General case

A representation retrieved through an HTTP GET on the probe URI is a
nominal representation of this URI regardless of any intervening
redirects. If a nominal representation from the probe URI includes
documentation of this URI (through RDF, RDFa, etc.), this information
is nominal under the constraints given by the HTTP header.

3.2 Content-negotiation

Alternative representations obtained by setting the Accept HTTP header
are also valid as nominal representations.

3.3 Linked documentation

If a nominal representation from the probe URI includes a URI
documentation link in its response to the retrieval request, then
nominal representations from the link target are nominal URI
documentation carriers for the probe URI.

There are multiple ways to locate a URI documentation link in an HTTP response:

* using a <LINK> with relation “alternate” and a media type capable
of providing RDF; or other relations known to link to documentation
(e.g. “describedby”)
* using a Link: response header with content equivalent to the above
* from the object of a triple in the RDF graph extracted from the
response having the probe URI as the subject and rdfs:seeAlso or
rdfs:isDefinedBy as the predicate.

3.4 Discovery via redirection

For purposes of discovery, redirect chains are often followed. That
is, if retrieval is requested using a URI U1, and a retrieval using U1
yields a redirect to U2, and a retrieval request using U2 succeeds
with a result R, then R is consequently considered a nominal URI
documentation carrier for U1. Note that this practice creates a
reliance on U2's URI owner as well as on U1's, increasing chances of
failure at the application level.

3.5 Probe URI with local identifier

When a URI is of the form stem#id (a 'hash' URI), a nominal
representation from the stem is a nominal URI documentation carrier
for the probe URI. Documentation discovery is thus equivalent for both
types of URIs.

4.1 Documentation of document resources

As the resources used as entity bodies in HTTP are stand-alone binary
resources, documentation for them has to be self-contained. However,
adequate means are available in many machine-readable formats. Meta
data for HTML documents can be added through <META> and <LINK>
elements; formats like PDF, JPEG, and MP3 also provide containers for
file meta data. Media types lacking capabilities for machine-readable
meta data, e.g. text/raw, are unsuitable for machine consumption even
though they may contain human-readable documentation.

5.

[Add]
7. Relative URIs combined with <BASE> semantics change their absolute
meaning when the representation is obtained through contexts other
than dereferencing
[Remove 5.2]
[Rename 5.3 to 5.2]

6. Comparison with the TAG resolution

This proposal avoids any semantic implications of retrieval with the
HTTP protocol, forcing all documentation to explicitly state the
subject of each statement. Importantly, the class of “information
resources” is made obsolete, since the onus is on representations as
carriers of information.
>>>

Some other documents and specifications will also be affected by this
change. The document “Architecture of the World Wide Web”[3] needs to
be edited. Remove the discussion of information resources from section
2.2 and replace all occurrences of the term with plain “resource”.
Proposed rewrite:

>>>
2.2. URI/Resource Relationships

By design a URI identifies one resource. We do not limit the scope of
what might be a resource. The term "resource" is used in a general
sense for whatever might be identified by a URI. It is conventional on
the hypertext Web to describe Web pages, images, product catalogs,
etc. as “resources”.

This document is an example of a resource. In the case of this
document, the message payload is the HTML representation of this
document. However, our use of the term resource is intentionally more
broad. Other things, such as cars and dogs (and, if you've printed
this document on physical sheets of paper, the artifact that you are
holding in your hand), are resources too.

Constraint: URIs Identify a Single Resource
>>>

==Impact==

===Positive effects===

* The proposal conforms closely to REST principles and is thus easy to
understand for many web developers.
* The subject of meta data will always be explicitly specified and not
influenced by unexpected redirects and other retrieval level factors.
* The class of information resources can be scrapped.
* License and other meta data for documents are not lost when the
document is downloaded locally or sent via e-mail &c.
* Document-specific meta data (<LINK rel="stylesheet">) won't be
associated with the original resource.
* 303 See Other redirects are compatible with the current solution.
* Hash URIs are subsumed by the proposal and no longer treated separately.

===Negative effects===

* The model is not intuitive for laypeople who think of the HTML
document as being identical to the denoted resource.
* It will be impossible to add machine-readable meta data to simple
formats like text/raw. (You can still add human-readable data though).
* RDFa default behavior for the HTML HEAD element[4] must be amended
to account for local document meta-data.

===Conformance class changes===

As I have no idea what a conformance class is and where it is defined
I'll leave this section for others to consider.

===Risks===

The emergence of another decade-long debate on whether the change was
a good idea or not....

==References==

[1] http://www.w3.org/2001/tag/doc/uddp/
[2] http://lists.w3.org/Archives/Public/www-tag/2003Jul/0377
[3] http://www.w3.org/TR/webarch/
[4] http://www.w3.org/TR/rdfa-syntax/

End of proposal.

Tore Eriksson
Received on Sunday, 25 March 2012 03:45:01 UTC