Proposal to amend the httpRange-14 resolution [revision]

After some feedback from Jonathan Rees and others, I have decided to
submit a revision of my previous proposal
(http://lists.w3.org/Archives/Public/www-tag/2012Mar/0085.html). The
revised text is below.

Tore Eriksson

===========

==Summary==

This proposal is by design an attempt to be as minimalistic as
possible. It focuses on representations of the probe URI and of some
specific link targets as the only bearers of descriptions, and the use
of URI-based data models, exemplified by RDF, to encode these
descriptions. Only information explicitly expressed in RDF is
considered.

Although it retracts the implicit semantics that arise from the
httpRange-14 resolution, it is agnostic as to whether information
resources exist or how they are defined. If specific systems need to
define the class of information resources, they are free to do so.
Such semantics are not viewed as defining ones, though.

This proposal is also concerned with how to encode documentation of
representations. This part is mostly a reminder of existing methods.

==Rationale==

For close to a decade, the discussions concerning the interface
between RDF and the World Wide Web have focused on *how* resources
act when interfaced on the web, especially whether the representations
they provide are content-based or description-based. This is a
valuable discussion per se, but attempts to reach a consensus have
failed so far. This proposal intentionally tries to circumvent this
discussion since the discussion is orthogonal to the current problem
of how descriptions are found. As RDF is based on URIs for
identification of resources, and HTTP and other web protocols are
protocols for retrieving a representation given a URI, the necessary
tools for solving the problem are readily available.

For web-accessible resources, the main method for documentation
discovery is to access the URI through the appropriate protocol. An
RDF graph or an equivalent object is then extracted from the
representation and the graph is checked for the explicit occurrence of
the probe URI. In the primary version of this proposal, any caching
and redirection that occurred is considered irrelevant. This proposal
also removes the need for a separate discussion for hash URIs and
content negotiation, since the methods described work equivalently in
those cases.
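
The check for an explicit occurrence of the probe URI can be sketched
as follows. This is only an illustration under stated assumptions: the
graph is taken to be already extracted from the representation and is
modelled as a collection of (subject, predicate, object) string
tuples; the name mentions_probe and the example.org URIs are
hypothetical and not part of the proposal.

```python
from typing import Iterable, Tuple

# Assumption: a triple is a plain (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


def mentions_probe(graph: Iterable[Triple], probe_uri: str) -> bool:
    """Return True if the probe URI occurs in any position of any triple."""
    return any(probe_uri in triple for triple in graph)


# Hypothetical graph extracted from a retrieved representation.
graph = {
    ("http://example.org/doc", "http://purl.org/dc/terms/title", "An example"),
}
```

Because only the extracted graph is inspected, the same check applies
whether the representation was reached directly, through redirects, or
from a cache.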

As many media types are unable to encode an RDF graph, a secondary
method that relies on links is also provided. These links can appear
in the HTTP response or within the retrieved representation. As it
would be premature to delimit the possible link relations at this
point, agents are allowed to add new link relations in the future.
When a link relation is encountered, the target URI is accessed and
information in the form of an RDF graph is extracted from any
representations retrieved.
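
A rough sketch of locating documentation links in a Link: response
header is given below. It is deliberately simplified and is not a
complete parser for the header syntax (for instance, commas or
semicolons inside quoted parameter values are not handled). The
function name documentation_links and the DOC_RELATIONS set are
hypothetical; "describedby" is used because it is the relation named
in this proposal, and the set is meant to be extended by agents.

```python
import re
from typing import FrozenSet, List
from urllib.parse import urljoin

# Assumption: relations treated as documentation links; extensible by agents.
DOC_RELATIONS: FrozenSet[str] = frozenset({"describedby"})

# Simplified patterns: "<target>" followed by ";"-separated parameters.
_LINK_RE = re.compile(r'<([^>]*)>\s*((?:;\s*[^;,]+)*)')
_REL_RE = re.compile(r'rel\s*=\s*"?([^";]+)"?', re.IGNORECASE)


def documentation_links(link_header: str, probe_uri: str,
                        relations: FrozenSet[str] = DOC_RELATIONS) -> List[str]:
    """Return absolute target URIs whose rel matches a documentation relation."""
    targets = []
    for target, params in _LINK_RE.findall(link_header):
        match = _REL_RE.search(params)
        if match is None:
            continue
        # A rel value may hold several space-separated relation types.
        rels = set(match.group(1).lower().split())
        if rels & relations:
            # Relative targets are resolved against the probe URI.
            targets.append(urljoin(probe_uri, target))
    return targets
```

Each returned target would then be accessed in turn and any RDF graph
found in its representations extracted as described above.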

The baseline document [1] only mentions URI documentation, but there
have also been requests to distinguish between URI documentation and
URI definitions. For completeness, a method for distinguishing these
is also provided, although this is not part of the main proposal. The
steps involved will be written within square brackets.

One aspect of resource documentation that has been absent from most of
the discussions is that of describing the representations themselves.
They are not web-accessible resources since they lack a URI, but they
play a prominent role on the web. How documentation is encoded in the
representation is largely dependent on the media type. Since
representations are binary objects - an octet stream with a media type
- they are problematic to denote in an RDF graph that is not contained
within the document itself. Fortunately, a URI scheme for
representations already exists - the data URI scheme. This scheme,
although slightly unwieldy, can unambiguously denote a binary resource
of any registered media type.
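
As a small illustration of how a representation could be denoted, the
sketch below builds a base64-encoded data URI from an octet stream and
its media type. The function name representation_data_uri is
hypothetical, and the choice of base64 for every body is an assumption
made for simplicity.

```python
import base64


def representation_data_uri(body: bytes, media_type: str) -> str:
    """Denote a representation (octets plus media type) as a data: URI."""
    encoded = base64.b64encode(body).decode("ascii")
    return f"data:{media_type};base64,{encoded}"
```

The resulting string can be used as a subject or object in an RDF
graph, at the cost of a URI that grows with the size of the body.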

==Details==

The baseline document could be replaced with a short note describing
an algorithm for locating and extracting machine-readable descriptions
of a dereferenceable resource. If a rewrite is considered necessary,
some changes to the baseline document are proposed below:

>>>
3 Documentation of web-accessible resources

URI documentation is provided as RDF graphs. These graphs are encoded
in nominal URI documentation carriers, representations retrieved
through web access. The exact method for extracting an RDF graph from a
document is dependent on the media type, and is not further discussed
in this document. If the URI scheme of the probe URI is 'http' or
'https', the URI has a nominal URI documentation carrier in the ways
described below.

The cases listed are not exclusive (e.g. both GET and Link: may yield
nominal URI documentation carriers for the URI). When multiple
documentation carriers are retrieved, the URI description is the RDF
graph obtained by merging the graphs acquired from each documentation
carrier.
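
As an illustration of the merge step, the following sketch combines
graphs from several documentation carriers. It assumes that each graph
is a collection of (subject, predicate, object) string tuples and that
blank nodes are written with a "_:" prefix; blank nodes are renamed
per carrier so that identically labelled nodes from different carriers
are not conflated. The function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

# Assumption: a triple is a plain (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]


def _rename_blank(term: str, index: int) -> str:
    """Give blank nodes ("_:" prefix) a label unique to their source graph."""
    return f"_:g{index}_{term[2:]}" if term.startswith("_:") else term


def merge_graphs(graphs: Iterable[Iterable[Triple]]) -> Set[Triple]:
    """Merge graphs from several carriers into one URI description."""
    merged: Set[Triple] = set()
    for index, graph in enumerate(graphs):
        for s, p, o in graph:
            merged.add((_rename_blank(s, index), p, _rename_blank(o, index)))
    return merged
```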

3.1 General case

A representation retrieved by initiating an HTTP GET request on the
probe URI is a nominal representation of this URI regardless of any
intervening redirects. The representation is then checked for a
contained RDF graph. If a graph can be extracted and the graph
mentions the probe URI then this information is nominal under the
constraints given by the HTTP headers (Expires: etc.). [If no
redirects are encountered, or if all intervening redirects have
status codes in the set {300, 301, 305, 307}, then the description is
considered a definition of the URI.]
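
The bracketed rule can be sketched as a simple predicate over the
status codes of the redirects that were followed. The function name
is_definition is hypothetical, and the sketch assumes the client is
able to record the redirect chain, which is not always possible.

```python
from typing import Sequence

# Status codes listed in the bracketed rule of the general case.
DEFINITION_REDIRECTS = frozenset({300, 301, 305, 307})


def is_definition(redirect_statuses: Sequence[int]) -> bool:
    """True if there were no redirects or all of them had a listed status."""
    return all(status in DEFINITION_REDIRECTS for status in redirect_statuses)
```

An empty sequence (no redirects) yields True, so a description
retrieved directly from the probe URI counts as a definition.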

3.2 Content-negotiation

Alternative representations obtained by setting the Accept HTTP header
are also valid as nominal representations.

3.3 Linked documentation

If a nominal representation from the probe URI includes a URI
documentation link in its response to the retrieval request, then
nominal representations from the link target are nominal URI
documentation carriers for the probe URI, and the representation is
treated as in the general case.

There are multiple ways to locate a URI documentation link in an HTTP response:

* using a <LINK> element with relation “alternate” and a media type
capable of encoding RDF
* using a <LINK> element with a relation known to link to
documentation (e.g. “describedby”)
* using a Link: response header with content equivalent to the
previous two cases
* if an RDF graph was extracted from the nominal representation, the
object of a triple having the probe URI as the subject and
rdfs:seeAlso or rdfs:isDefinedBy as the predicate is considered a link
target. [If the predicate is rdfs:isDefinedBy, the description
acquired from the nominal representations of the link target is
considered a definition of the URI.]
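
The last case in the list can be sketched as below. The graph is
assumed to be a collection of (subject, predicate, object) string
tuples, and graph_link_targets is a hypothetical name; the function
maps each link target to a flag saying whether the description
retrieved from it would count as a definition under the bracketed
rule.

```python
from typing import Dict, Iterable, Tuple

# Assumption: a triple is a plain (subject, predicate, object) tuple of strings.
Triple = Tuple[str, str, str]

RDFS = "http://www.w3.org/2000/01/rdf-schema#"
SEE_ALSO = RDFS + "seeAlso"
IS_DEFINED_BY = RDFS + "isDefinedBy"


def graph_link_targets(graph: Iterable[Triple], probe_uri: str) -> Dict[str, bool]:
    """Map each link target to True if reached via rdfs:isDefinedBy."""
    targets: Dict[str, bool] = {}
    for s, p, o in graph:
        if s == probe_uri and p in (SEE_ALSO, IS_DEFINED_BY):
            # A target linked by both predicates is treated as a definition.
            targets[o] = targets.get(o, False) or p == IS_DEFINED_BY
    return targets
```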

3.4 Discovery via redirection

Redirection is considered within the general case, as the redirect
chain is not accessible in all cases.
It should be noted though that redirects create a reliance on all the
intervening URIs. This makes the redirect case technically similar to
methods that use linked documentation.

3.5 Probe URI with local identifier

When a URI is of the form stem#id (a 'hash' URI), a nominal
representation from the stem is a nominal URI documentation carrier
for the probe URI. Documentation discovery is thus equivalent for both
types of URIs, and treated as in the general case.
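
A minimal sketch of deriving the URI to retrieve from a probe URI,
using the Python standard library, is shown below; the function name
retrieval_uri is hypothetical. The full probe URI, including the
fragment, is still the one looked for in the extracted graph.

```python
from urllib.parse import urldefrag


def retrieval_uri(probe_uri: str) -> str:
    """Return the stem of a hash URI; other URIs are returned unchanged."""
    stem, _fragment = urldefrag(probe_uri)
    return stem
```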

4.1 Documentation of representations

As the resources used as entity bodies in HTTP are stand-alone binary
resources, documentation for them has to be self-contained. However,
adequate means are available in many machine-readable formats. Meta
data for HTML documents can be added through <META> and <LINK>
elements; formats like IMF, PDF, JPEG, and MP3 also provide containers
for representation meta data. Media types lacking capabilities for
machine-readable meta data, e.g. text/raw, are unsuitable for machine
consumption even though they may contain human-readable documentation.
>>>

For section 5:
[Add] 7. Relative URIs relying on <BASE> semantics might change their
absolute meaning when the representation is obtained through contexts
other than dereferencing
[Remove 5.2]
[Rename 5.3 to 5.2]

>>>
6. Comparison with the TAG resolution

This document avoids any semantic implications of retrieval with the
HTTP protocol, forcing all documentation to be explicitly stated in
RDF. Importantly, the class of "information resources" is unnecessary,
since the onus is on representations as carriers of information.
However, the steps outlined above are backwards compatible with
systems that implement the httpRange-14 resolution, and allow for
continued application of httpRange-14 where it is deemed necessary.
The semantic implications are not seen as part of a URI description
though.

One effect of this proposal is that some (a vast majority at the time
of writing) web-accessible resources will be missing machine-readable
definitions. As there is no way of determining the rdf:type of these
resources, using these URIs could be seen as problematic in some
cases. It must be noted, though, that when the URI is only used to
provide information, i.e. as the object of an rdfs:seeAlso predicate,
what is important is the information retrieved by accessing the URI.
Specific information about the target resource should nevertheless be
provided, considering its importance for provenance. It is the opinion
of the author that a lack of URI definitions is not an obstacle to
using the resource in RDF, especially considering the open world
assumption.

During the preparation of this document, much of the discussion has
centered on whether the semantics of retrieval by HTTP is
content-based or description-based in nature. This distinction is not
important when
following the steps provided. It should be noted though, that some
methods provided will lead to inconsistent semantics for consumers
that still adhere to the httpRange-14 resolution.
>>>

==Impact==

===Positive effects===

* The proposal avoids the need for redirects, making it easy to implement.
* The proposal conforms closely to REST principles and is thus easy to
understand for many web developers.
* Resource descriptions can be provided directly through the
representation without the need to exist as a separate resource.
* 303 See Other redirects are compatible with the current solution.
* The subject of meta data will always be explicitly specified in RDF
and not influenced by unexpected redirects and other retrieval level
factors.
* The proposal is oblivious to a future deprecation of the class of
information resources.
* Hash URIs are subsumed by the proposal and no longer treated separately.
* Content negotiation is handled transparently.
* License and other meta data for representations are not lost when
the document is downloaded locally or sent via e-mail &c.

===Negative effects===

* This proposal doesn't mandate that URI descriptions have an explicit
URI (except for a data: URI), and this can be detrimental for
provenance.
* A large amount of best practice that focuses on the distinction
between instantiation and representation has to be rewritten.

===Conformance class changes===

[skipped]

===Risks===

As the proposal entails a major overhaul of current best practice, the
architectural stability of the semantic web might be put in doubt.

==References==

[1] http://www.w3.org/2001/tag/doc/uddp/

End of proposal.

Received on Thursday, 29 March 2012 14:28:15 UTC