Proposal to amend the "httpRange-14 resolution" from Mo McRoberts on 2012-03-01 (www-tag@w3.org from March 2012)

From: Mo McRoberts <mo.mcroberts@bbc.co.uk>
Date: Thu, 1 Mar 2012 01:22:11 +0000
To: www-tag@w3.org
Message-Id: <29D8EBE7-05F4-4A21-B004-631EF01C7F6A@bbc.co.uk>

Rationale

This proposal seeks to amend the W3C TAG resolution “httpRange-14” in response to the call for proposals [CPC], reflecting the lessons learned since the resolution was made in 2005.

As described in the above call for proposals, httpRange-14 sought to provide an effective mechanism for differentiating between URIs referring to documents and URIs referring to things described by those documents. The resolution itself is relatively short, but it has led to no small amount of subsequent discussion, in part because of the perception that it implicitly recommended the use of HTTP 303 status codes as the primary means of responding to requests to dereference the identifiers of non-information resources (NIRs).

The CFP is motivated by concern amongst some members of the community that “the URI issue” has hindered adoption of linked data/semantic web technologies. I strongly suspect that a number of factors have, to some extent, hindered the adoption of linked data, and that this issue is a particularly minor one among them. In my view, these include (in descending order of relative impact):

a) Documentation written in academically focussed language that is confusing to the uninitiated [RDF-PRIMER].

b) Suggestions that not insignificant technology investments are required (for example, triple- and quad-stores, SPARQL servers, RDF-aware content management systems, reasoning engines and so on), rather than simply publishing documents on a web server and consuming them in much the same way that popularised the Web of documents.

c) Disagreement within the community as to whether XML, RDF/XML, JSON, Turtle, RDFa, Microdata or Microformats is “the future”, noting that in many commercial settings the responsibility for implementing these will vary considerably: for example, RDFa, Microdata and Microformats require client-side developers to understand the technologies, while emitting data views in some non-HTML format requires the CMS developer (often a different team or individual) to have that expertise.

d) Confusion in general about what RDF actually is and does.

e) Disagreements about whether httpRange-14 is actually a solution or not [CPC], [REFUSE].

f) The structure of URIs [HTTPRANGE-14].

Nevertheless, the CFP and this proposal are focussed on (f), with a side order of (e). It's worth stressing that much of the above is in the process of being addressed, even if (e) continues unabated.

Rather than attempt to propose an alternative to the intent and mechanics of httpRange-14, this proposal instead aims to provide a starting point for simplified and considerably less ambiguous recommendations.

The constraints are as follows:

1. Differentiation

While it is often stated that the mechanisms provided by httpRange-14 allow differentiation between information and non-information resources, within linked data in general and RDF-over-HTTP in particular, there is actually a need to differentiate between three kinds of URI:

a) The URI used to refer to a document, regardless of how it is represented

b) The URI used to refer to a specific representation of a document

c) The URI used to refer to some non-information resource described by that document

It must be possible to name each of these resources unambiguously in a given context. Some properties applicable to one may not be applicable to another; conversely, some properties apply to all three but take different values for each.
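To illustrate why each kind of URI needs its own name, the Python sketch below records a few statements about the three resources, using URIs in the form given under Recommendation A. The FOAF and Dublin Core property names are illustrative choices, not part of this proposal, and `describe` is a hypothetical helper. Collapsing all three subjects onto one URI shows the problem: the document ends up typed as a person, given a media type that belongs to only one of its representations, and recorded as its own primary topic.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# URIs in the form used by Recommendation A of this proposal.
DOC = "http://example.com/path/to/resource"        # the document
REP = "http://example.com/path/to/resource.ttl"    # one representation of it
NIR = "http://example.com/path/to/resource#thing"  # the thing it describes

# Illustrative statements; the FOAF and Dublin Core terms are examples only.
TRIPLES: List[Tuple[str, str, str]] = [
    (DOC, "foaf:primaryTopic", NIR),
    (DOC, "dcterms:hasFormat", REP),
    (REP, "dcterms:format", "text/turtle"),
    (NIR, "rdf:type", "foaf:Person"),
]


def describe(triples: List[Tuple[str, str, str]],
             collapse: bool = False) -> Dict[str, Dict[str, str]]:
    """Group properties by subject; collapse=True pretends all three share one URI."""

    def name(uri: str) -> str:
        # When collapsing, any of the three URIs is replaced by the document URI.
        return DOC if collapse and uri in (DOC, REP, NIR) else uri

    grouped: Dict[str, Dict[str, str]] = defaultdict(dict)
    for subject, prop, value in triples:
        grouped[name(subject)][prop] = name(value)
    return dict(grouped)
```

With distinct URIs, `describe(TRIPLES)[REP]` holds only the media type; with `collapse=True` every statement lands on a single subject and the distinctions are lost.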

2. No prior knowledge

Beyond the scheme part, URIs are intended to be opaque to the consumer. This is obvious to many, but is worth stating.

3. No protocol changes

As enticing as it may be to make HTTP-for-linked data behave somehow differently to HTTP-for-documents, this would rather defeat the purpose of HTTP as it exists today.

4. Highly desirable: "One URI"

The ability of a user to see the URI of a resource and to pass it between pieces of software and hardware has made the WWW what it is today; HTTP and URIs were designed so that it shouldn't matter whether you're human or machine, mobile or desktop: you use a single URI to refer to a resource in all of these contexts. It is therefore undesirable for the implementation mechanisms to be such that the URI you see cannot then be used in other contexts without prior knowledge.

Details

This proposal details two recommended routes for publishers. The merits of each vary depending upon the software stack in use.

Recommendation A: Using fragment URIs

1. The URI for a non-information resource takes the form: http://example.com/path/to/resource#thing

2. The URI for a document (information) resource takes the form: http://example.com/path/to/resource

3. The URI for a specific representation of that document takes the form: http://example.com/path/to/resource.ext, where '.ext' is a well-known file extension listed in the IANA media type registration for the serialisation (for example, '.ttl' for text/turtle).

4. As per [HTTP/1.1], the fragment portion of a URI is NEVER included in requests.

5. When an agent requests the information resource, the server shall, if it is able to satisfy the request, respond with a 200 status and:

a) A 'Content-Location:' response header containing the URI of the specific representation being served

b) At least a 'Vary: Accept' response header indicating that the response body was selected through content negotiation

6. The user agent, when processing the document, can unambiguously differentiate those portions referring to the document, to the representation of the document, or to an NIR described by that document.
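The steps above can be sketched in Python as follows. This is a minimal illustration rather than a server implementation: it assumes the media type has already been chosen from the Accept header, and the EXTENSIONS table, the function names and the example URIs are hypothetical. `classify` applies this proposal's convention that a fragment URI sharing the document's base names a non-information resource.

```python
from typing import Dict, Tuple
from urllib.parse import urldefrag

# Illustrative subset of media types and the file extensions registered
# for them with IANA (point 3).
EXTENSIONS = {
    "text/turtle": ".ttl",
    "application/rdf+xml": ".rdf",
    "text/html": ".html",
}


def request_uri(uri: str) -> str:
    """Point 4: the fragment is stripped before the request is sent."""
    return urldefrag(uri).url


def respond(document_uri: str, media_type: str) -> Tuple[int, Dict[str, str]]:
    """Point 5: a 200 response naming the representation actually served.

    Assumes media_type has already been negotiated from the Accept header.
    """
    return 200, {
        "Content-Type": media_type,
        "Content-Location": document_uri + EXTENSIONS[media_type],
        "Vary": "Accept",
    }


def classify(uri: str, requested: str, content_location: str) -> str:
    """Point 6: tell the three kinds of URI apart after the response arrives."""
    if uri == content_location:
        return "representation"
    base, fragment = urldefrag(uri)
    if base == requested:
        return "non-information resource" if fragment else "document"
    return "unknown"
```

For example, dereferencing http://example.com/path/to/resource#thing sends a request for http://example.com/path/to/resource; if the server answers with Content-Location: http://example.com/path/to/resource.ttl, the client can label all three URIs without any prior knowledge of how the server is laid out.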

Recommendation B: Using “303 See Other” responses

1. The respective URIs for NIRs, documents and representations of those documents are left undefined.

2. When an agent requests the information resource, the server shall, if it is able to satisfy the request, respond with a 303 status and:

a) A 'Location:' response header containing the URI of the specific representation selected

b) At least a 'Vary: Accept' response header indicating that the redirect target was selected through content negotiation

3. The user agent, when processing the document, can unambiguously differentiate those portions referring to the document, to the representation of the document, or to an NIR described by that document.
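A minimal Python sketch of the Recommendation B exchange, assuming the media type has already been negotiated from the Accept header. Because this recommendation leaves the URIs undefined, the representation URIs in the hypothetical REPRESENTATIONS table deliberately sit on a different host and path, as they might for a legacy system. Returning 406 (Not Acceptable) when nothing matches is an assumption; the proposal does not say what happens in that case.

```python
from typing import Dict, Tuple

# Hypothetical: under Recommendation B the representation URIs are
# unconstrained, so they may live on an entirely separate (legacy) host.
REPRESENTATIONS = {
    "text/turtle": "http://data.example.com/export/42.ttl",
    "text/html": "http://www.example.com/pages/42",
}


def see_other(media_type: str) -> Tuple[int, Dict[str, str]]:
    """Redirect to the representation selected for media_type."""
    if media_type not in REPRESENTATIONS:
        # Assumed behaviour when no representation is acceptable.
        return 406, {"Vary": "Accept"}
    return 303, {
        "Location": REPRESENTATIONS[media_type],
        # The redirect target depends on Accept, so caches must key on it.
        "Vary": "Accept",
    }
```

The agent then issues a second request for the Location URI, which is why, unlike Recommendation A, the URI it finally ends up on differs from the one it started with.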

Impact

Positive effects:

Recommendation A permits the negotiated serving of documents describing NIRs using virtually any off-the-shelf web server software. For example, using the Apache web server, all that is required is to construct the representations of the document, load the “mod_negotiation” server module, and enable the “MultiViews” option. More complex configurations are, of course, perfectly possible, and configuration for other web server software will differ. Consequently, this approach is straightforward for any moderately technical individual to understand, implement and demonstrate. Moreover, because no redirections occur, a single URI is presented in all contexts, whether they be human- or machine-facing.
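To make the MultiViews behaviour concrete, here is a minimal Python sketch of the kind of lookup it performs: a request for /path/to/resource is matched against files named resource.* in the same directory, and the variant whose media type the client prefers is chosen. The file list, the TYPES table and the function names are hypothetical, and this is not how mod_negotiation is implemented; the real module also weighs wildcards, languages, encodings and server-side quality values, all of which this sketch ignores.

```python
import posixpath
from typing import List, Optional

# Hypothetical extension-to-media-type table for the variants on disk.
TYPES = {".ttl": "text/turtle", ".rdf": "application/rdf+xml", ".html": "text/html"}


def parse_accept(accept: str) -> List[str]:
    """Return media types from an Accept header, most preferred first."""
    prefs = []
    for order, part in enumerate(accept.split(",")):
        media_type, *params = [p.strip() for p in part.split(";")]
        q = 1.0
        for param in params:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        prefs.append((q, order, media_type))
    # Highest q first; ties keep the order the client listed them in.
    prefs.sort(key=lambda p: (-p[0], p[1]))
    return [media_type for q, _, media_type in prefs if q > 0]


def multiviews(path: str, files: List[str], accept: str) -> Optional[str]:
    """Pick the file that would serve a request for an extensionless path."""
    directory, name = posixpath.split(path)
    variants = {}
    for filename in files:
        stem, ext = posixpath.splitext(filename)
        if stem == name and ext in TYPES:
            variants[TYPES[ext]] = posixpath.join(directory, filename)
    for media_type in parse_accept(accept):
        if media_type in variants:
            return variants[media_type]
    return None
```

For example, `multiviews("/path/to/resource", ["resource.ttl", "resource.html"], "text/html;q=0.9, text/turtle")` selects "/path/to/resource.ttl", because Turtle carries the higher quality value. The selected file's path is also what a server would report in Content-Location.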

Recommendation B affords a great deal of flexibility in naming and integration, particularly with respect to legacy systems, as it permits easy architectural separation between human and data views. It also avoids the caching issues (see below) associated with negotiated fragment URIs by redirecting the agent to a specific addressable representation.

Negative effects:

For Recommendation A, it has been reported that content negotiation resulting in 200 responses can cause problems with intermediate caching proxies (specifically, that the response body is not cached). For this reason alone, Recommendation A may not be suitable in all circumstances.
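One plausible mechanism behind the reported caching problems is that HTTP/1.1 requires a shared cache to key each negotiated response not only on the URI but also on the request headers named in Vary; a cache that cannot do this must either decline to store the response or risk serving the wrong serialisation. The toy Python class below is entirely hypothetical and not modelled on any particular proxy. It sketches only the Vary-aware keying and ignores 'Vary: *', header-value normalisation and expiry.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, Tuple[str, ...]]


class VaryAwareCache:
    """Toy shared cache keyed on the URI plus the request headers named in Vary."""

    def __init__(self) -> None:
        self._vary: Dict[str, Tuple[str, ...]] = {}
        self._store: Dict[Key, bytes] = {}

    def _key(self, uri: str, request_headers: Dict[str, str]) -> Key:
        # Header names are case-insensitive, so compare them in lower case.
        headers = {k.lower(): v for k, v in request_headers.items()}
        names = self._vary.get(uri, ())
        return uri, tuple(headers.get(name, "") for name in names)

    def store(self, uri: str, request_headers: Dict[str, str],
              vary: str, body: bytes) -> None:
        # Remember which request headers select between variants of this URI.
        self._vary[uri] = tuple(n.strip().lower() for n in vary.split(",") if n.strip())
        self._store[self._key(uri, request_headers)] = body

    def lookup(self, uri: str, request_headers: Dict[str, str]) -> Optional[bytes]:
        return self._store.get(self._key(uri, request_headers))
```

A cache keyed on the URI alone would hand the stored Turtle body to a browser asking for HTML; keyed as above, the HTML request simply misses and goes to the origin server.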

Recommendation B requires a more complex configuration, which increases the barrier to entry should it become the sole best practice. In addition, because a redirect is a visible action, an end-user almost never sees the canonical URI for a resource, and is instead only aware of the specific representation of a document. It is difficult to escape the conclusion that this approach does amount to “exposing the mechanism of how you run your server” [COOLURIS].

References

[CPC] http://www.w3.org/2001/tag/doc/uddp/change-proposal-call.html
[REFUSE] http://www.w3.org/2001/tag/2011/09/referential-use.html
[RDF-PRIMER] http://www.w3.org/TR/rdf-primer/
[HTTPRANGE-14] http://www.w3.org/2001/tag/issues.html#httpRange-14
[HTTP/1.1] http://www.w3.org/Protocols/rfc2616/rfc2616.html
[COOLURIS] http://www.w3.org/Provider/Style/URI.html

--
Mo McRoberts - Technical Lead - The Space,
0141 422 6036 (Internal: 01-26036) - PGP key CEBCF03E,
Project Office: Room 7083, BBC Television Centre, London W12 7RJ

Received on Thursday, 1 March 2012 01:22:37 UTC