httpRange-14 Change Proposal from Jeni Tennison on 2012-03-25 (public-lod@w3.org from March 2012)

From: Jeni Tennison <jeni@jenitennison.com>
Date: Sun, 25 Mar 2012 10:47:11 +0100
To: "www-tag@w3.org List" <www-tag@w3.org>
Cc: public-lod community <public-lod@w3.org>
Message-Id: <B4FF563C-FAE4-442D-99DF-92DE5CA8441C@jenitennison.com>
Hi,

Please find below a Change Proposal for the consideration of the TAG in response to [1] on behalf of (alphabetically):

Ian Davis
Leigh Dodds
Nick Gibbins (University of Southampton)
Hugh Glaser
Steve Harris
Masahide Kanzaki
Gregg Kellogg
Niklas Lindström
Jerry Persons
Dave Reynolds
Bill Roberts
Andy Seaborne
John Sheridan
Ben O'Steen
Damian Steer
Thomas Steiner
Ed Summers
Jeni Tennison
Davy Van Deursen

and with thanks to other members of the LOD mailing list who helped identify areas that required clarification. The original version is at [2].

[1] http://www.w3.org/2001/tag/doc/uddp/change-proposal-call.html
[2] https://docs.google.com/document/d/1ognNNOIcghga9ltQdoi-CvbNS8q-dOzJjhMutJ7_vZo/edit

---

# Summary

This proposal contains two substantive changes.

First, a 200 response to a probe URI no longer by itself implies that the probe URI identifies an information resource or that the response is a representation of the resource identified by the probe URI; instead, this can only be inferred if the probe URI is the object of a ‘describedby’ relationship or the target of a 303 redirection.

Second, it enables publishers to link to URI documentation for a given probe URI by providing a 200 response to that probe URI that contains a statement including a ‘describedby’ relationship from the probe URI to the URI documentation.


# Rationale

While there are instances of linked data websites using 303 redirections, there are also many examples of people making statements about hash-less URIs (particularly using HTML link relations, RDFa, microdata, and microformats) where those statements indicate that the URI is supposed to identify a non-information resource such as a Person or Book. The Appendix provides examples of these.

Rather than simply telling these people that they are Doing It Wrong, “Understanding URI Hosting Practice as Support for URI Documentation Discovery” should ensure that:

 * applications that interpret such data do not draw wrong conclusions about these URIs simply because they return a 200 response

 * publishers of this data can easily upgrade to making the distinction between the non-information resource that the page holds information about and the information resource that is the page itself, should they discover that they need to


# Details

In section 4.1, in place of the second paragraph and following list, substitute:

  There are three ways to locate a URI documentation link in an HTTP response:

   * using the Location: response header of a 303 See Other response [httpbis-2], 
     e.g.

     303 See Other
     Location: http://example.com/uri-documentation

   * using a Link: response header with link relation 'describedby' ([rfc5988], 
     [powder]), e.g.

     200 OK
     Link: <http://example.com/uri-documentation>; rel="describedby"

   * using a ‘describedby’ ([powder]) relationship within the RDF graph created by 
     interpreting the content of a 200 response, e.g.

     200 OK
     Content-Type: text/turtle

     PREFIX :<http://www.iana.org/assignments/relation/>
     <http://example.com> 
       :describedby <http://example.com/uri-documentation> ;
       .
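
As an illustration only (not part of the proposed text), the three cases above could be checked by a consumer roughly as follows. This is a minimal sketch: the function name `documentation_links`, the lower-cased header dictionary, and the representation of the parsed content as plain (subject, predicate, object) tuples are all assumptions made for the example, and the Link header parsing is deliberately simplified. Restricting the second and third cases to a 200 status simply mirrors the examples above.

```python
import re

# Both predicate URIs are accepted as 'describedby' in this sketch.
DESCRIBEDBY = {
    "http://www.iana.org/assignments/relation/describedby",
    "http://www.w3.org/2007/05/powder-s#describedby",
}

# Matches one link-value such as: <http://example.com/doc>; rel="describedby"
# Simplification: parameter values containing commas are not handled.
LINK_VALUE = re.compile(r'<([^>]*)>([^,]*)')
REL_PARAM = re.compile(r';\s*rel\s*=\s*"?([^";]*)"?', re.IGNORECASE)


def documentation_links(probe_uri, status, headers, triples=()):
    """Return the URI documentation links found in one HTTP response.

    headers: dict keyed by lower-case header name (assumption of this sketch).
    triples: (subject, predicate, object) tuples already parsed from the body.
    """
    links = []
    # Case 1: Location header of a 303 See Other response.
    if status == 303 and "location" in headers:
        links.append(headers["location"])
    if status == 200:
        # Case 2: Link header whose rel includes 'describedby'.
        for target, params in LINK_VALUE.findall(headers.get("link", "")):
            rel = REL_PARAM.search(params)
            if rel and "describedby" in rel.group(1).lower().split():
                links.append(target)
        # Case 3: 'describedby' statement about the probe URI in the content.
        for s, p, o in triples:
            if s == probe_uri and p in DESCRIBEDBY:
                links.append(o)
    return links
```

A real consumer would obtain `triples` from a Turtle, RDFa or microdata parser; that step is left out here.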

Before the last paragraph of section 4.2 insert the following paragraph:

  In the third case, where the ‘describedby’ relationship is used,   
  <http://www.iana.org/assignments/relation/describedby> and 
  <http://www.w3.org/2007/05/powder-s#describedby> must be treated as equivalent, as 
  defined in Section 4.1.4 Semantic Linkage Using the describedby Property of the 
  POWDER Recommendation.
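
One way an application might honour this equivalence (a sketch, not part of the proposed text) is to rewrite one predicate URI to the other before comparing statements. The helper names below are hypothetical, and choosing the IANA URI as the canonical form is an arbitrary assumption of the example.

```python
IANA_DESCRIBEDBY = "http://www.iana.org/assignments/relation/describedby"
POWDER_DESCRIBEDBY = "http://www.w3.org/2007/05/powder-s#describedby"


def canonical_predicate(predicate):
    """Map the POWDER 'describedby' URI onto the IANA one; leave others alone."""
    return IANA_DESCRIBEDBY if predicate == POWDER_DESCRIBEDBY else predicate


def canonicalise(triples):
    """Rewrite predicates so that both 'describedby' URIs compare equal."""
    return [(s, canonical_predicate(p), o) for s, p, o in triples]
```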

In the last paragraph of section 4.1, for “(But see below for the case when retrieval is successful.)” substitute “The next section describes how to interpret a 200 response, and therefore applies in the last two cases described above.”

In section 4.2, in place of the first paragraph (after the Editorial Note), substitute:

  If there is a nominal representation Z from the probe URI (a 2XX response), and 
  the application is aware of a ‘describedby’ relationship of which the probe URI is 
  the object, which may be the case because

   * the probe URI is itself a URI linked to through one of the mechanisms listed in 
     Section 4.1 
   or
   * Z itself contains a statement in which the probe URI is the object of a 
     ‘describedby’ relationship

  then this is equivalent to there being a nominal URI documentation carrier for the 
  probe URI that says that Z is a current representation of the resource identified 
  by the probe URI, and, moreover, that the identified resource is an "information 
  resource" (see below). In other cases, no such inference can be made (the   
  application cannot tell whether the probe URI identifies an information resource 
  or not).
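
The rule in the substituted paragraph can be sketched as a small decision function (illustrative only, not part of the proposed text). The function name, its parameters, and the use of plain tuples for statements are assumptions of the example. It returns `None` rather than `False` when nothing can be inferred, because the text says the application cannot tell either way in that case.

```python
DESCRIBEDBY = {
    "http://www.iana.org/assignments/relation/describedby",
    "http://www.w3.org/2007/05/powder-s#describedby",
}


def identifies_information_resource(probe_uri, status, z_triples=(), known_targets=()):
    """Return True if the inference applies, or None if nothing can be inferred.

    z_triples: (subject, predicate, object) tuples parsed from representation Z.
    known_targets: URIs the application already reached through one of the
        mechanisms of Section 4.1 (303 Location, Link header, or a
        'describedby' statement found elsewhere).
    """
    if not 200 <= status < 300:
        return None  # no nominal representation Z, so the rule does not apply
    if probe_uri in known_targets:
        return True  # the probe URI was linked to as URI documentation
    for s, p, o in z_triples:
        if p in DESCRIBEDBY and o == probe_uri:
            return True  # Z itself names the probe URI as a 'describedby' object
    return None  # a bare 2XX response says nothing either way
```

Under this sketch a page that returns 200 and says nothing about itself, like those in the Appendix, yields `None`, so statements typing the URI as a Person or Book are not contradicted.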

We also recommend that a clear guide on best practices when publishing and consuming data should be written, possibly an update to [cooluris].


# Impact

## Positive Effects

 * common usage of URIs in sites supporting RDFa, microdata and microformats is no longer deemed to be Doing It Wrong, which means conformant applications can interpret this data in the way its publishers intended

 * publishers that cannot change server configuration (to use 303s or Link headers) can still use separate URIs to identify a non-information resource and the information resource that describes it

 * publishers who (through ignorance or preference) originally publish data about non-information resources without using 303s or Link headers can retain those URIs and add the ‘describedby’ statement to add separate identifiers

 * it is possible to have multiple description documents for a given URI, where a 303 response only allows one

 * it means the same method can be used to provide descriptions of non-information resources as is used for providing descriptions of information resources, which aids adoption

 * it means there is a standard method for providing links from documentation to the thing that it documents

 * it provides a standard means of explicitly including, within the data, information that would otherwise only be available if a resource is accessed by HTTP, which means that reasoning over dumps of crawls of the web (e.g. from webdatacommons.org) becomes more consistent with what could be inferred from the crawl itself

 * browser location bars don’t change when navigating to URIs that provide a 200 response, which means fewer copy/paste errors and less user confusion when trying to encourage people to use URIs for non-information resources and not those for their documentation

## Negative Effects

 * existing applications that assume that a 200 response is only given for an information resource may make false inferences about what a probe URI identifies (but this happens already, as people already publish data in this way)

 * there are more cases where applications will have to draw on reasoning from other properties (e.g. declared types of resources) to work out what a URI identifies

 * when documentation is served with a 200 response from a probe URI and does not contain a ‘describedby’ statement, some agents (including the publisher) might use the probe URI to identify the documentation while others use it to identify a non-information resource. Publishers still need to support two distinct URIs if they want the probe URI to be used more consistently; a set of best practices for linked data publishers would need to spell out what publishers should do, and how consumers should interpret both the information provided within the response and that found at the end of any ‘describedby’ links


# Conformance Classes Changes

There is no mention of conformance classes in the document.


# Risks

There are no risks.


# References

[cooluris]
Leo Sauermann and Richard Cyganiak. Cool URIs for the Semantic Web. W3C Interest Group Note, 03 December 2008. (See http://www.w3.org/TR/2008/NOTE-cooluris-20081203/.)

# Appendix: Examples of 200 Responses for NIRs

## http://www.logosportswear.com/product/1531

responds with a 301 redirection to 
http://www.logosportswear.com/product/1531/harbor-cruise-boat-tote which contains the RDFa statement

 <http://www.logosportswear.com/product/1531>
   a <http://rdf.data-vocabulary.org/#Product> ;
   .

The URI is intended to identify a product, not a web page.


## http://developer.yahoo.com/yui/docs/YAHOO.util.Dom.html

contains RDFa statements that state that this web page contains events, methods and properties:

 <http://developer.yahoo.com/yui/docs/YAHOO.util.Dom.html> 
   yui:attributes <#configattributes>;
   yui:description """
                       Provides helper methods for DOM elements.
                   """;
   yui:events <#events>;
   yui:methods <#methods>;
   yui:name "YAHOO.util.Dom";
   yui:properties <#properties> .

From the statements, the intention is for the URI to identify the (programming language) Object, not a web page (despite the .html on the end!).


## http://gondwanaland.com/mlog/2005/03/13/semweb-not-by-committee/

contains the RDFa statements

 <http://gondwanaland.com/mlog/2005/03/13/semweb-not-by-committee/>
    dcterms:publisher <http://gondwanaland.com/mlog/> ;
    sioc:has_owner <https://creativecommons.net/ml/> ;
    .

The range of dcterms:publisher is a dcterms:Agent, but http://gondwanaland.com/mlog/ returns a 200.

The range of sioc:has_owner is a sioc:UserAccount, but https://creativecommons.net/ml/ returns a 200.


## http://www.feedbooks.com/book/2679

contains microdata statements. How these should be interpreted as RDF is debatable, but the obvious reading is that an href attribute indicates the resource that it targets, so the page includes the statements

  [ a schema:Book ;
    schema:author <http://www.feedbooks.com/author/496> ; ]

The range for schema:author is intended (I think) to be a person rather than a web page about a person, but resolving http://www.feedbooks.com/author/496 gives you a 200.

(Based on the webdatacommons.org dumps, this site used to serve up RDFa that stated that <http://www.feedbooks.com/book/2679> identified a Book; books are not web pages.)


## http://www.mybanktracker.com/Citibank/Profile

contains the RDFa statements

   <http://www.mybanktracker.com/Citibank/Profile> 
     v:dtreviewed "2012-01-05 16:42:49"@en-US;
     v:itemreviewed <http://www.mybanktracker.com/Citibank/Profile>;
     v:rating "4"@en-US;
     v:reviewer <http://www.mybanktracker.com/member/lisaehrlich>;
     .

The review is clearly about Citibank and not the web page.

The object of the v:reviewer property should, I imagine, be a person but is instead a web page.


## http://www.flickr.com/photos/andreaweckerle/2559011937/

used to contain the triples (according to the webdatacommons.org data)

  <http://www.flickr.com/photos/andreaweckerle/2559011937/>  
    dcterms:creator <http://www.flickr.com/photos/andreaweckerle/> 
    .

  <http://www.flickr.com/photos/andreaweckerle/> 
    foaf:name "andreaweckerle"
    .

where http://www.flickr.com/photos/andreaweckerle/ resolves to a 200 but is plainly intended here to be a Person. Those statements don't seem to be there any more.


## http://www.businesswire.com

redirects with a 302 to http://www.businesswire.com/portal/site/home/ which gives a 200 response and contains microdata that maps to the triples

<http://www.businesswire.com> a schema:Organization;
   schema:name "BUSINESS WIRE";
   schema:url <http://www.businesswire.com> .

which says that http://www.businesswire.com is an organisation, not a web page.

-- 
Jeni Tennison
http://www.jenitennison.com
Received on Sunday, 25 March 2012 09:47:37 UTC