- From: Tore Eriksson <tore.eriksson@gmail.com>
- Date: Sun, 25 Mar 2012 12:44:31 +0900
- To: www-tag@w3.org
==Summary==

This proposal entails a partial reversion of the httpRange-14 resolution. Specifically, it suggests that a representation retrieved from an HTTP URI will never* be equivalent to what the URI denotes (the resource), but will always be a description (of the state) of the resource, eliminating the risk of confusing a resource with its description. As a consequence of this proposal, it will be impossible to refer to the description (the "information resource") with a URI. The proposal argues that this is a feature, not a bug, and that existing methods for describing representations are sufficient. By being very flexible at the protocol level, and strict in the semantics, this proposal attempts to build upon the strengths of the HTTP protocol when used in the semantic web.

* unless explicitly stated as such in RDF

==Rationale==

The discussion leading up to httpRange-14 has been described[1] as a collision of two different conventions for URI use. In one, a URI denotes the document-like entity (the "information resource") retrieved through a 200 response to a GET request. In the other, a URI denotes any abstract concept, not necessarily a document or other entity with a media type. One can compare these two conventions with the development of the HTTP protocol. The former corresponds to HTTP 0.9, which is a protocol for exchanging hypertext documents. The latter corresponds to HTTP 1.0, which introduced the concept of resources and representations, thus adding a level of abstraction to the WWW. This proposal argues that the protocol for URI documentation discovery has to reflect this difference between versions of the HTTP protocol. The current httpRange-14 resolution is closer in spirit to HTTP 0.9, but for HTTP 1.0 and later there is a need for an amended version, described below.

The explicitly stated problem that httpRange-14 addressed was whether an HTTP URI is constrained to denoting "documents", or whether it could denote any possible concept.
The resolution confirmed the latter position. The discussion then changed its focus to the existence of two distinct resources: the resource named by the “probe” URI and a separate resource that acts as a description for this URI. The question then became how to avoid confounding the initial URI with the URI for its descriptions. The method proposed for this was HTTP redirects, which involve two (or more) URIs, starting with the probe URI and ending with the description URI. However, the overhead of setting up redirects has been seen as a technical hurdle by some developers. Concerns have also been raised that details of the HTTP protocol ought to be orthogonal to semantic definitions.

One way of ameliorating this problem is to reconsider what a descriptive resource is. The problem is usually formulated as the need to separate the abstract resource, say Paris, from a web page about Paris. However, we can regard descriptions not as “web pages”, but as “documents”. The description of Paris retrieved from the URI denoting Paris is not a web page, but an HTML (or RDF) document. Thus the description is equivalent to the retrieved representation. As documents in HTML and other media types consist of octets, they are resources that exist independently of their retrieval (i.e. locally on disc). As they are unchanging, they are also representations of themselves, similarly to RDF literals.

This proposal thus replaces the distinction between "information resources"[2] and other resources with the pre-existing distinction between web-accessible resources and the binary resources used as entity bodies in HTTP and other protocols. Since these two classes are fundamentally different, methods for documentation discovery will be discussed separately.

For web-accessible resources, the main method for documentation discovery is to access the probe URI through the appropriate protocol.
An RDF graph or an equivalent object is then extracted from the representation, and the graph is tested for the explicit occurrence of the probe URI. The URI documentation may also reside in other resources. These resources can be found by following links in the document or in the HTTP header. There exist a number of complementary methods for describing such links. Another method for documentation discovery is content negotiation. Since RDF is the preferred method for resource description, negotiation for an RDF-based media type might return appropriate information.

This proposal removes the need for a separate discussion of hash URIs, since the methods described above work equivalently in this case. As a bonus, the same rationale solves the problem with hash URIs that also act as local identifiers. If an element referred to by the hash URI exists within a document, the element is a representation of the resource denoted by the hash URI, not the resource itself. Thus no conflict arises when the same local identifier is used in different media types.

For the stand-alone binary resources, a method to describe them is needed as well. Since they exist independently of the web, as files on disc or e-mail attachments, this method should not depend on retrieval. In many media types this is a solved problem. HTML documents contain document meta data in the <HEAD> element, using <TITLE>, <META>, and <LINK> elements. PDF, JPEG, and MP3 have similar meta data containers (XMP, EXIF, ID3), whose contents can be converted to RDF if necessary.

For some applications though, there might be a need for a URI naming the binary resource. Fortunately, a URI scheme for representations already exists: the data URI scheme. This scheme, although slightly unwieldy, can unambiguously denote a binary resource of any registered media type.
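
To make the data URI point concrete, here is a minimal Python sketch of naming a binary resource by its own octets. The helper names `to_data_uri` and `from_data_uri` and the sample HTML document are hypothetical, invented for illustration; the sketch only handles the base64 form of the scheme and is not meant as a complete data URI parser.

```python
import base64


def to_data_uri(body: bytes, media_type: str) -> str:
    """Name a binary resource by embedding its octets in a data URI."""
    payload = base64.b64encode(body).decode("ascii")
    return f"data:{media_type};base64,{payload}"


def from_data_uri(uri: str):
    """Recover (media_type, octets); handles only the base64 form built above."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URI")
    media_type = header[len("data:"):-len(";base64")]
    return media_type, base64.b64decode(payload)


# Hypothetical entity body: an HTML description of Paris.
document = b"<html><head><title>Paris</title></head></html>"
uri = to_data_uri(document, "text/html")

# The URI carries the media type and the exact octets, so no retrieval is needed.
media_type, octets = from_data_uri(uri)
```

Because the URI is derived entirely from the media type and the octets, two identical copies of a document (on disc, in an e-mail attachment) get the same name, which is also why the scheme becomes unwieldy for large files.
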
==Details==

The proposed document[1] could be replaced with a short note explaining an algorithm to find machine-readable descriptions of a dereferenceable resource. If a rewrite is considered necessary, some changes are proposed below:

>>>
3 Probe URI lacking local identifier

If the URI scheme of the probe URI is 'http' or 'https', the URI has a nominal URI documentation carrier in the following ways. The cases are not exclusive (e.g. both GET and Link: may yield nominal URI documentation carriers for the URI).

3.1 General case

A representation retrieved through an HTTP GET on the probe URI is a nominal representation of this URI regardless of any intervening redirects. If a nominal representation from the probe URI includes documentation of this URI (through RDF, RDFa, etc.), this information is nominal under the constraints given by the HTTP header.

3.2 Content-negotiation

Alternative representations obtained by setting the Accept HTTP header are also valid as nominal representations.

3.3 Linked documentation

If a nominal representation from the probe URI includes a URI documentation link in its response to the retrieval request, then nominal representations from the link target are nominal URI documentation carriers for the probe URI. There are multiple ways to locate a URI documentation link in an HTTP response:

* using a <LINK> with relation “alternate” and a media type capable of providing RDF, or with other relations known to link to documentation (e.g. “describedby”)
* using a Link: response header with content equivalent to the above
* from the object of a triple in the RDF graph extracted from the response, having the probe URI as the subject and rdfs:seeAlso or rdfs:isDefinedBy as the predicate.

3.4 Discovery via redirection

For purposes of discovery, redirect chains are often followed.
That is, if retrieval is requested using a URI U1, and a retrieval using U1 yields a redirect to U2, and a retrieval request using U2 succeeds with a result R, then R is consequently considered a nominal URI documentation carrier for U1. Note that this practice creates a reliance on U2's URI owner as well as on U1's, increasing the chances of failure at the application level.

3.5 Probe URI with local identifier

When a URI is of the form stem#id (a 'hash' URI), a nominal representation from the stem is a nominal URI documentation carrier for the probe URI. Documentation discovery is thus equivalent for both types of URIs.

4.1 Documentation of document resources

As the resources used as entity bodies in HTTP are stand-alone binary resources, documentation for them has to be self-contained. However, adequate means are available in many machine-readable formats. Meta data for HTML documents can be added through <META> and <LINK> elements; formats like PDF, JPEG, and MP3 also provide containers for file meta data. Media types lacking capabilities for machine-readable meta data, e.g. text/raw, are unsuitable for machine consumption even though they may contain human-readable documentation.

5. [Add] 7. Relative URIs combined with <BASE> semantics change their absolute meaning when the representation is obtained through contexts other than dereferencing

[Remove 5.2]

[Rename 5.3 to 5.2]

6. Comparison with the TAG resolution

This proposal avoids any semantic implications of retrieval with the HTTP protocol, forcing all documentation to explicitly state the subject of each statement. Importantly, the class of “information resources” is made obsolete, since the onus is on representations as carriers of information.
>>>

Some other documents and specifications will also be affected by this change. The document “Architecture of the World Wide Web”[3] needs to be edited.
Remove the discussion of information resources from section 2.2 and replace all occurrences of the term with plain “resource”.

Proposed rewrite:

>>>
2.2. URI/Resource Relationships

By design a URI identifies one resource. We do not limit the scope of what might be a resource. The term "resource" is used in a general sense for whatever might be identified by a URI. It is conventional on the hypertext Web to describe Web pages, images, product catalogs, etc. as “resources”. This document is an example of a resource. In the case of this document, the message payload is the HTML representation of this document. However, our use of the term resource is intentionally more broad. Other things, such as cars and dogs (and, if you've printed this document on physical sheets of paper, the artifact that you are holding in your hand), are resources too.

Constraint: URIs Identify a Single Resource
>>>

==Impact==

===Positive effects===

* The proposal conforms closely to REST principles and is thus easy to understand for many web developers.
* The subject of meta data will always be explicitly specified and not influenced by unexpected redirects and other retrieval-level factors.
* The class of information resources can be scrapped.
* License and other meta data for documents are not lost when the document is downloaded locally or sent via e-mail, etc.
* Document-specific meta data (<LINK rel="stylesheet">) won't be associated with the original resource.
* 303 See Other redirects are compatible with the current solution.
* Hash URIs are subsumed by the proposal and no longer treated separately.

===Negative effects===

* The model is not intuitive for laypeople who think of the HTML document as being identical to the denoted resource.
* It will be impossible to add machine-readable meta data to simple formats like text/raw. (You can still add human-readable data though.)
* RDFa default behavior for the HTML HEAD element[4] must be amended to account for local document meta data.

===Conformance class changes===

As I have no idea what a conformance class is and where it is defined, I'll leave this section for others to consider.

===Risks===

The emergence of another decade-long debate on whether the change was a good idea or not...

==References==

[1] http://www.w3.org/2001/tag/doc/uddp/
[2] http://lists.w3.org/Archives/Public/www-tag/2003Jul/0377
[3] http://www.w3.org/TR/webarch/
[4] http://www.w3.org/TR/rdfa-syntax/

End of proposal.

Tore Eriksson
Received on Sunday, 25 March 2012 03:45:01 UTC