- From: Tore Eriksson <tore.eriksson@gmail.com>
- Date: Thu, 29 Mar 2012 23:21:23 +0900
- To: www-tag@w3.org
- Cc: Jonathan A Rees <rees@mumble.net>
After some feedback from Jonathan Rees and others, I have decided to submit a revision of my previous proposal (http://lists.w3.org/Archives/Public/www-tag/2012Mar/0085.html). The revised text is below.

Tore Eriksson

===========

==Summary==

This proposal is by design as minimal as possible. It focuses on representations of the probe URI and of some specific link targets as the only bearers of descriptions, and on the use of URI-based data models, exemplified by RDF, to encode these descriptions. Only information explicitly expressed in RDF is considered. Although the proposal retracts the implicit semantics that arise from the httpRange-14 resolution, it is agnostic as to whether information resources exist or how they are defined. If specific systems need to define the class of information resources, they are free to do so, but such semantics are not viewed as defining.

This proposal is also concerned with how to encode documentation of representations. This part is mostly a reminder of existing methods.

==Rationale==

For close to a decade, the discussions concerning the interface between RDF and the World Wide Web have been focused on *how* resources act when accessed on the web, especially whether the representations they provide are content-based or description-based. This is a valuable discussion per se, but attempts to reach a consensus have failed so far. This proposal intentionally sidesteps that discussion, since it is orthogonal to the current problem of how descriptions are found.

As RDF is based on URIs for identification of resources, and HTTP and other web protocols are protocols for retrieving a representation given a URI, the necessary tools for solving the problem are readily available. For web-accessible resources, the main method for documentation discovery is to access the URI through the appropriate protocol.
An RDF graph or an equivalent object is then extracted from the representation, and the graph is checked for the explicit occurrence of the probe URI. In the primary version of this proposal, any caching and redirection that occurred is considered irrelevant. This proposal also removes the need for a separate discussion of hash URIs and content negotiation, since the methods described work equivalently in those cases.

As many media types are unable to encode an RDF graph, a secondary method that relies on links is also provided. These links can appear in the HTTP response or within the retrieved representation. As it would be premature to delimit the possible link relations at this point, agents are allowed to add new link relations in the future. When a link relation is encountered, the target URI is accessed and information in the form of an RDF graph is extracted from any representations retrieved.

The baseline document [1] only mentions URI documentation, but there have also been requests to distinguish between URI documentation and URI definitions. For completeness, a method for distinguishing these is also provided, although this is not part of the main proposal. The steps involved are written within square brackets.

One aspect of resource documentation that has been absent from most of the discussions is that of describing the representations themselves. They are not web-accessible resources since they lack a URI, but they play a prominent role on the web. How documentation is encoded in the representation is largely dependent on the media type. Since representations are binary objects - an octet stream with a media type - they are problematic to denote in an RDF graph that is not contained within the document itself. Fortunately, a URI scheme for representations already exists - the data URI scheme. This scheme, although slightly unwieldy, can unambiguously denote a binary resource of any registered media type.
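To make the last point concrete, here is a minimal sketch of how a representation (an octet stream plus a media type) could be turned into a data: URI and back. The function names to_data_uri and from_data_uri are hypothetical helpers introduced only for this illustration, and the sketch assumes the base64 form of the scheme; it is not meant as a complete implementation of the data URI syntax.

```python
import base64
from typing import Tuple


def to_data_uri(body: bytes, media_type: str) -> str:
    """Encode an octet stream and its media type as a base64 data: URI."""
    encoded = base64.b64encode(body).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


def from_data_uri(uri: str) -> Tuple[bytes, str]:
    """Recover (octets, media type) from a URI produced by to_data_uri.

    Only the base64 form is handled; other forms of the scheme
    (percent-encoded payloads, omitted media type) are out of scope here.
    """
    header, sep, payload = uri.partition(",")
    if not (sep and header.startswith("data:") and header.endswith(";base64")):
        raise ValueError("expected a base64-encoded data: URI")
    media_type = header[len("data:"):-len(";base64")]
    return base64.b64decode(payload), media_type


if __name__ == "__main__":
    uri = to_data_uri(b"Hello", "text/plain")
    print(uri)  # data:text/plain;base64,SGVsbG8=
    print(from_data_uri(uri))
```

Because the URI embeds the octets themselves, its length grows with the size of the representation, which is the unwieldiness mentioned above.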

==Details==

The baseline document could be replaced with a short note describing an algorithm for locating and extracting machine-readable descriptions of a dereferenceable resource. If a rewrite is considered necessary, some changes to the baseline document are proposed below:

>>>

3 Documentation of web-accessible resources

URI documentation is provided as RDF graphs. These graphs are encoded in nominal URI documentation carriers, that is, representations retrieved through web access. The exact method for extracting an RDF graph from a document is dependent on the media type, and is not further discussed in this document.

If the URI scheme of the probe URI is 'http' or 'https', a nominal URI documentation carrier for the URI can be obtained in the ways described below. The cases listed are not exclusive (e.g. both GET and Link: may yield nominal URI documentation carriers for the URI). When multiple documentation carriers are retrieved, the URI description is the RDF graph obtained by merging the graphs acquired from each documentation carrier.

3.1 General case

A representation retrieved by initiating an HTTP GET request on the probe URI is a nominal representation of this URI regardless of any intervening redirects. The representation is then checked for a contained RDF graph. If a graph can be extracted and the graph mentions the probe URI, then this information is nominal under the constraints given by the HTTP headers (Expires: etc.). [If no redirects are encountered, or if all intervening redirects have a status code in the set {300, 301, 305, 307}, then the description is considered a definition of the URI.]

3.2 Content-negotiation

Alternative representations obtained by setting the Accept HTTP header are also valid as nominal representations.
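As an illustration only (not part of the proposed text), the decision in 3.1 can be sketched as a small function. The sketch assumes the RDF graph has already been extracted and is modelled as an iterable of (subject, predicate, object) string triples, and it assumes that "mentions" means the probe URI occurs in any position of some triple. The names mentions and classify, and the return labels, are hypothetical.

```python
from typing import Iterable, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]

# Redirect status codes that, per the bracketed step in 3.1, still allow
# the description to count as a definition of the probe URI.
DEFINITION_REDIRECTS = frozenset({300, 301, 305, 307})


def mentions(graph: Iterable[Triple], uri: str) -> bool:
    """True if the URI occurs as subject, predicate, or object of any triple."""
    return any(uri in triple for triple in graph)


def classify(
    probe_uri: str,
    graph: Iterable[Triple],
    redirect_statuses: Sequence[int],
) -> Optional[str]:
    """Classify an extracted graph with respect to the probe URI.

    Returns None if the graph does not mention the probe URI,
    "definition" if no redirects occurred or all were in the allowed set,
    and "documentation" otherwise.
    """
    if not mentions(graph, probe_uri):
        return None
    # all() over an empty sequence is True, covering the no-redirect case.
    if all(status in DEFINITION_REDIRECTS for status in redirect_statuses):
        return "definition"
    return "documentation"


if __name__ == "__main__":
    g = [("http://example.org/a",
          "http://www.w3.org/2000/01/rdf-schema#label",
          "A")]
    print(classify("http://example.org/a", g, []))     # definition
    print(classify("http://example.org/a", g, [303]))  # documentation
    print(classify("http://example.org/b", g, []))     # None
```

The same function applies unchanged to representations obtained through content negotiation (3.2), since only the extracted graph and the redirect chain are inspected.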

3.3 Linked documentation

If a nominal representation from the probe URI includes a URI documentation link in its response to the retrieval request, then nominal representations from the link target are nominal URI documentation carriers for the probe URI, and the representation is treated as in the general case. There are multiple ways to locate a URI documentation link in an HTTP response:

* using a <LINK> element with relation “alternate” and a media type capable of encoding RDF
* using a <LINK> element with a relation known to link to documentation (e.g. “describedby”)
* using a Link: response header with content equivalent to the previous two cases
* if an RDF graph was extracted from the nominal representation, the object of a triple having the probe URI as the subject and rdfs:seeAlso or rdfs:isDefinedBy as the predicate is considered a link target.

[If the predicate is rdfs:isDefinedBy, the description acquired from the nominal representations of the link target is considered a definition of the URI.]

3.4 Discovery via redirection

Redirection is considered within the general case, as the redirect chain is not accessible in all cases. It should be noted, though, that redirects create a reliance on all the intervening URIs. This makes the redirect case technically similar to methods that use linked documentation.

3.5 Probe URI with local identifier

When a URI is of the form stem#id (a 'hash' URI), a nominal representation from the stem is a nominal URI documentation carrier for the probe URI. Documentation discovery is thus equivalent for both types of URIs, and treated as in the general case.

4.1 Documentation of representations

As the resources used as entity bodies in HTTP are stand-alone binary resources, documentation for them has to be self-contained. However, adequate means are available in many machine-readable formats.
Meta data for HTML documents can be added through <META> and <LINK> elements; formats like IMF, PDF, JPEG, and MP3 also provide containers for representation meta data. Media types lacking capabilities for machine-readable meta data, e.g. text/raw, are unsuitable for machine consumption even though they may contain human-readable documentation.

>>>

For section 5:

[Add] 7. Relative URIs relying on <BASE> semantics might change their absolute meaning when the representation is obtained through contexts other than dereferencing.

[Remove 5.2]

[Rename 5.3 to 5.2]

>>>

6. Comparison with the TAG resolution

This document avoids any semantic implications of retrieval with the HTTP protocol, forcing all documentation to be explicitly stated in RDF. Importantly, the class of "information resources" is unnecessary, since the onus is on representations as carriers of information. However, the steps outlined above are backwards compatible with systems that implement the httpRange-14 resolution, and allow for continued application of httpRange-14 where it is deemed necessary. The semantic implications are not seen as part of a URI description, though.

One effect of this proposal is that some (a vast majority at the time of writing) web-accessible resources will be missing machine-readable definitions. As there is no way of determining the rdf:type of these resources, using these URIs could be seen as problematic in some cases. It must be noted, though, that when the URI is only used to provide information, i.e. as the object of an rdfs:seeAlso predicate, what is important is the information retrieved by accessing the URI. Specific information about the target resource should nevertheless be provided, considering its importance for provenance. It is the opinion of the author that a lack of URI definitions is not an obstacle to using the resource in RDF, especially considering the open world assumption.

While this document was being prepared, much of the discussion centered on whether the semantics of retrieval by HTTP is content-based or description-based in nature. This distinction is not important when following the steps provided. It should be noted, though, that some of the methods provided will lead to inconsistent semantics for consumers that still adhere to the httpRange-14 resolution.

>>>

==Impact==

===Positive effects===

* The proposal avoids the need for redirects, making it easy to implement.
* The proposal conforms closely to REST principles and is thus easy to understand for many web developers.
* Resource descriptions can be provided directly through the representation without the need to exist as a separate resource.
* 303 See Other redirects are compatible with the current solution.
* The subject of meta data will always be explicitly specified in RDF and not influenced by unexpected redirects and other retrieval-level factors.
* The proposal is oblivious to a future deprecation of the class of information resources.
* Hash URIs are subsumed by the proposal and no longer treated separately.
* Content negotiation is handled transparently.
* License and other meta data for representations are not lost when the document is downloaded locally or sent via e-mail etc.

===Negative effects===

* This proposal doesn't mandate that URI descriptions have an explicit URI (except for a data: URI), and this can be detrimental for provenance.
* Large amounts of best practice that focus on the distinction between instantiation and representation have to be rewritten.

===Conformance class changes===

[skipped]

===Risks===

As the proposal entails a major overhaul of current best practice, the architectural stability of the semantic web might be put in doubt.

==References==

[1] http://www.w3.org/2001/tag/doc/uddp/

End of proposal.
Received on Thursday, 29 March 2012 14:28:15 UTC