On "in defense of Ambiguity" (was RE: Uniform access to descriptions) from Williams, Stuart (HP Labs, Bristol) on 2008-03-27 (www-tag@w3.org from March 2008)

From: Williams, Stuart (HP Labs, Bristol) <skw@hp.com>
Date: Thu, 27 Mar 2008 13:59:27 +0000
To: Harry Halpin <hhalpin@ibiblio.org>
CC: Jonathan Rees <jar@creativecommons.org>, "www-tag@w3.org WG" <www-tag@w3.org>, Pat Hayes <phayes@ihmc.us>
Message-ID: <9674EA156DA93A4F855379AABDA4A5C611A1D57487@G5W0277.americas.hpqcorp.net>
Hello Harry,

Apologies for the delay in responding; Easter intervened, and I have also taken time to read through the whole of the pre-print that you reference... an enjoyable read, though I do have some quibbles...

Just before I leap to the defence of the TAG: I was not a member of the TAG at the time it resolved httpRange-14 with the advice at [1] (which, whilst quoted, seems to go unreferenced as a source in the paper) - so I am unable to address or articulate what the TAG's intent was wrt its resolution at the time.

That said, as an interested bystander at the time, I did lend the resolution my support (for whatever that may be worth!) [2] - which I can speak to, though I have explained that in the past [3]. A key point (from my pov) of which is picked up by John Cowan [2] (ie. the extract he quotes rather than the barrier to entry he cites); and which I (at least like to) think was influential on Pat's re-presentation of the TAG's advice at [4], which I saw at least as a level of ice melting and coin-dropping with respect to my earlier explanation - though the two may be entirely unrelated - and which I also thanked him for [5]. So... I am disappointed that little of that more positive comprehension makes it into the paper - in favor of perhaps a level of stick-wagging and maybe teasing in places - though maybe I'm feeling sensitive.

So... to repeat what I regard as the positive qualities of the resolution announced at [1], they are:

1) The resolution settles the question of whether or not an http scheme URI without fragment can be used within the articulated constraints of Web Architecture to *refer* to any kind of thing. The answer it gave was 'yes'. Prior to that answer there was at least a dispute whether http URI sans fragment could be used within the constraints of Web Architecture to refer to non-document-like things. At least one prior position was that http URI sans fragment could be used to refer only to document-like things and that references to other kinds of things (if made with http URI) should be made with fragmented http URI - and lest you think otherwise, I am speaking of reference *not* access, and I don't believe that I am confused about such matters (though you may of course beg to differ) - more below.

2) The resolution provides a strategy whereby an agent equipped with a reference incanted as an http URI can obtain either a description or a representation of the said referenced thing (though which, if any, is obtained is known only once an access using the referencing URI has been attempted).
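
As a rough illustration of the strategy in point 2, here is a minimal sketch of how an agent might read the outcome of a GET on an http URI sans fragment. The names Outcome and interpret_response are hypothetical and appear neither in the resolution nor in the paper, and the mapping (2xx gives a representation, 303 points at a description, anything else tells the agent nothing) is only my reading of the advice at [1].

```python
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    """What an agent learns from one access attempt (hypothetical type)."""
    kind: str                 # "representation", "description" or "unknown"
    next_uri: Optional[str]   # where to look next, if anywhere


def interpret_response(status: int, location: Optional[str] = None) -> Outcome:
    """Classify the response to a GET on an http URI without fragment.

    Assumption: this follows my reading of the httpRange-14 advice, and is
    not a normative algorithm.
    """
    if 200 <= status < 300:
        # Direct success: what came back is a representation of the
        # thing that the URI refers to.
        return Outcome("representation", None)
    if status == 303 and location:
        # See Other: the URI may refer to any kind of thing; the Location
        # header names a resource that may describe it.
        return Outcome("description", location)
    # Anything else tells the agent nothing about the referent.
    return Outcome("unknown", None)
```

Note that the agent cannot tell which case applies until it has attempted the access, which is the caveat in the parenthesis above.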

On Access and Reference:

A few sentence quotes from the paper:

"Web architecture does not determine what any names, including URIs, refer to. It only determines what they access. "

"The relationship between access and reference is essentially arbitrary; it is ours to decide and cannot be decided by Web architecture."

"In practice, web architecture does not determine what any names, including URIs, refer to. It only determines what they access."

The three seem roughly to restate the same point repeatedly.

Wrt http URIs at least, it is within the gift of Web Architecture to state what HTTP URIs are used to refer to - starting with the URI spec and delegating onward through registered scheme specifications and so forth, each respecting such constraints as are imposed from above. Surely, that is the nature of normative references - such as a normative reference to the URI spec. in the RDF specification - which would entail URI not simply being unconstrained free names in a logic - but taking on such constraints as are imposed by those referenced specifications (and delegated onward to scheme specifications) - operating at the intersection of URI related specs and RDF specs.

Whilst it would likely take some time for me to find adequate references to ground this, I believe that an intentional constraint of the design of Web Architecture is that: the intended referent of all references made using URI assigned (by social processes) to web accessible things (ie. things that provide a direct 200 response to an HTTP GET operation) is the said web accessible thing.

Whilst "Stuart's mother is Scottish" is a true statement about my mother, the referent of the proper name Stuart is nevertheless... me. (translate to rdf and URIs if you wish).

So... I find myself a little at odds with the Web Architecture "does not" (with a possibly implied "cannot") tone of the quoted sentences. As a set of constraints Web Architecture is a choice - and 'conforming' to them is a choice - such that Web Architecture can determine things for those that choose to live within its constraints.

Another sentence quote:

"The use of 303 redirection seems to presume that if a URI accesses something directly (not through an HTTP redirect) then the URI must refer to what it accesses. This presumes, wrongly, that the distinction between access and reference is based on the distinction between accessible and inaccessible referents"

So the first sentence captures (as a presumption) what I stated above (ok... without citations) as an intentional constraint of Web Architecture - however, I fail to understand how the second sentence follows from the first. That for accessible referents their associated URI refer to what could be accessed (ie. access does not in fact have to occur), and that reference is, well er..., less-grounded for inaccessible referents, doesn't establish any distinction between access and reference. References are made by incanting URI (referring names) in documents; access attempts are what happen when you use those referring names in protocols - they don't happen as a direct consequence of reference, something actionable intervenes. I think you set up a false basis for the distinction, which AFAICT the TAG at least has not made, for the purposes of then arguing that it is wrong...


"One could state that URIs only refer to accessible things just when the accessible thing is actually assigned that name; and assigning a name is done only by an explicit naming convention, the Web equivalent of pointing to the thing and giving it a name. There are two ways to attach a name to a thing:  by being it or by naming a URI that accesses it. The two kinds of naming convention this makes possible are similar, respectively, to wearing a name badge and to having someone point at you and say, "I will call you Spiderman."  "

A little more care is needed to distinguish a (for want of a better expression) delegated naming authority 'baptising' a thing (partially by deploying/configuring some web infrastructure that will respond to access requests made using that name) and some third party saying "I will call you (ie. Harry Halpin) "http://www.w3.org"" when making a reference using a name. ie. there are three parties (a named thing; a naming authority; and some party making a reference with a name) and you speak of two (a named thing; and one of the other two, but I can't tell which).

Later on you introduce some useful rdf properties and go on to claim that their use in some way contributes to distinguishing access from reference. I'm probably being thick, but at present I fail to see how! RDF statements are made in documents using *references* made with URIs. AFAIUI the constraints of Web Architecture as mentioned above are that all URI are taken as referring; some can be used for access; those that can be used for access (always) refer to what they access. [This is modulo domain name reallocations and resources moving - which is regarded as 'less than ideal']. Some (simple) examples of how inclusion of RDF statements (by whatever mechanism) distinguishes access and referencing usages of a URI would be helpful.

On fragID and media-types and # URIs:

You devote a paragraph or two to arguing:

 "It appears that the W3C may very well be contradicting the relevant IETF specification by supporting the hash URIs. ..."

and I can see how you reach that point. However, I'd like to offer a different way to look at it, though that too comes with a cost.

Basically, when deploying a resource/URI have an eye on your intentions - that is, you are in control of the fragids that you deploy - be intentional about what you (as resource author/owner) intend that they refer to - and (this is the cost) don't mix incompatible media-types (or at the very least ensure that there is no intersection in the fragIds used in representations with incompatible media-types).

What do I mean by incompatible media types? Well, RDF has an implicit 'thing described by' indirection in the interpretation of URIs with fragIDs - what is referenced is not the text in the RDF document at a given 'anchor' but the thing that that text describes. This is at odds with HTML and probably most */*+xml media types, where what is referenced is some fragment of the document's text. IMO, this makes content-negotiation between RDF and HTML representations of the same thing close to impossible. I think that presents significant issues for both GRDDL and RDFa that are presently unresolved - and probably in the "I hope no one notices right now" category.

But... if you take an intentional view about the arrangement of fragments/anchors/potential link targets, I think that you may find some peace wrt fragIds having a coherent meaning across multiple available representations - the person deploying the resource should ensure their coherence - the media-type induced meaning and the meaning induced by RDF assertions about the same thing seem to operate at different levels in any case - though, with intent, for a given resource affairs can be so organised that they are consistent.
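
To make the "no intersection" discipline a little more concrete, here is a minimal sketch of a check that a resource owner might run before serving both an HTML and an RDF representation from the same URI. The helper names (IdCollector, html_fragment_ids, clashing_fragments) are hypothetical, and I assume that the fragIds used in the RDF representation have already been gathered into a set by some other means; only the HTML side is parsed here, using Python's standard html.parser.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect the values of id attributes, ie. the HTML anchors."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def html_fragment_ids(html_text):
    """Return the set of fragIds that the HTML representation defines."""
    collector = IdCollector()
    collector.feed(html_text)
    collector.close()
    return collector.ids


def clashing_fragments(html_text, rdf_fragment_ids):
    """Return fragIds used in both representations.

    Assumption: rdf_fragment_ids is supplied by the caller. A non-empty
    result means one fragId would name a piece of document text in HTML
    and a described thing in RDF.
    """
    return html_fragment_ids(html_text) & set(rdf_fragment_ids)
```

For example, clashing_fragments('<h1 id="tower">Eiffel Tower</h1>', {"tower", "architect"}) reports "tower" as a fragId whose meaning would differ between the two representations. An empty result does not by itself make the fragIds coherent; it only shows that the two vocabularies have been kept apart.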

Regards

Stuart
--
[1] http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039
[2] http://lists.w3.org/Archives/Public/www-tag/2005Jun/0045
[3] http://lists.w3.org/Archives/Public/www-tag/2007Jul/0018
[4] http://lists.w3.org/Archives/Public/www-tag/2007Sep/0017
[5] http://lists.w3.org/Archives/Public/www-tag/2007Sep/0035

> -----Original Message-----
> From: Harry Halpin [mailto:hhalpin@ibiblio.org]
> Sent: 20 March 2008 20:27
> To: Williams, Stuart (HP Labs, Bristol)
> Cc: Jonathan Rees; www-tag@w3.org WG
> Subject: Re: Uniform access to descriptions
>
> Williams, Stuart (HP Labs, Bristol) wrote:
> > Hello Harry,
> >
> >
> >> We also should make sure any solution
> >> is *easy* to deploy over various levels and makes it
> perfectly clear
> >> what's going on (somewhat unlike 303, which is rather hard to
> >> deploy and minimalist).
> >>
> >
> > 303 is straight-forward and simple. If you want to use it
> to good effect to get agent to triple about things that
> aren't on the web then you can use it to good effect to do so
> for the things 'off-the-web' that you have chosen to give
> http: URI (sans frag) to.
> >
> Thanks for the reply Stuart (the rest I talk about in my response to
> Roy). I do think 303 *might* help the particular problem brought up by
> DanC [1] a while back, but it does not address the issue about
> connecting authoritative representations to URIs. Both you
> and Jonathan
> may be interested in the following pre-print of a paper by myself and
> Pat called "In Defense of Ambiguity" which comes out in the IJSWIS
> Journal 4(3), later this year [2]. The pre-print is here:
>
> http://www.ibiblio.org/hhalpin/homepage/publications/indefense
> ofambiguity.html
>
> I'm just going to cut and paste a bit from the paper here, which comes
> over the number of ways in which 303 is insufficient to
> distinguish not
> just between information resources and other types, but also between a
> rather simple relationship between access and reference (feel free to
> substitute "information resource" for "access" and "thing
> that isn't an
> information resource" for "reference" - that's close enough
> for reading
> purposes to get the general geist).
>
> Furthermore, we'll go into the numerous ways that the hash solution,
> while I think very useful - I use it myself - needs various
> standards to
> be fixed a bit to work and also doesn't really address the problem of
> attaching normative descriptions to resources. I do hope this
> helps, but
> I also think it should make us in the Web community a bit
> nervous about
> rubber-stamping any solution to both httpRange-14 and
> httpDescriptions-57 quite yet.
>
> " Pragmatically, there are problems with the TAG's suggested
> redirection. It uses a distinction in how a text is delivered (an HTTP
> code) to disambiguate the accessible Web page itself; a
> category mistake
> analogous to requiring the postman dance a jig when delivering an
> official letter. Since the vast majority of names, even on the Web,
> refer to things which are not accessible, this requires referring URIs
> to perform a act of redirection with doubtful benefit. As
> shown earlier,
> since the URI bears no trace of its delivery to the majority of human
> Web users that do not monitor or understand HTTP status codes, no
> disambiguation is achieved for the human. The TAG is correct
> in noticing
> this solution could solve the problem of inference brought up by
> Connolly (2006), but it does so in such a manner that not only makes
> normally harmless overloading illegal but that does not even make the
> distinction between access and reference clear. The
> particular solution
> requires the use of an arcane redirection technique that most people
> actually hosting URIs are not familiar with and cannot even deploy,
> since deploying 303 redirection requires access to the web server many
> users may not have. It also produces harmful effects by misusing HTTP
> codes for an alien purpose. The particular code, 303, is only
> valid for
> HTTP 1.1 and was originally introduced to solve a completely different
> problem. As put by the specification, "this method exists primarily to
> allow the output of a POST-activated script to redirect the user agent
> to a selected resource," not to distinguish access and reference
> (Fielding et al., 1999). The 303 status code was invented due to the
> over-use of the HTTP 1.0 302 status code to redirect both temporarily
> and permanently. The 307 and 303 status codes in HTTP 1.1 could
> disambiguate between the two cases of redirection, with the 303 status
> code having future requests to that URI being automatically redirected
> by the browser unlike the 307 status code, which is only a "temporary"
> redirection. Given this history, it is unclear why 303 is suitable for
> distinguishing between access and reference. Why not just invent a new
> HTTP status code? The negative effects of this redirection requirement
> will continue and achieve little in return.
>
>
> The main alternative to using HTTP 303 is to have a fragment
> identifier-the hash-attached to a URI to get redirection for free. So,
> if one wanted a URI that referred to the Eiffel Tower itself
> without the
> hassle of a 303 redirection, one would use the URI
> http://www.tour-eiffel.fr/# to refer to the Eiffel Tower and the URI
> http://www.tour-eiffel.fr/ to access a Web page about the
> Eiffel Tower.
> Since browsers think the "#" URI means a fragment of a
> document or some
> other representation, if a user tries to access via HTTP GET a "hash
> URI" it will not return a "404 Not Found" status code, but instead
> simply resolve to the URI before the hash. In this way
> machine reasoners
> can keep the URI that refers to the Eiffel Tower and a Web page about
> the Eiffel Tower separate, while a human can access the URI
> "about" the
> Eiffel Tower and receive some information about it, in
> essence by taking
> advantage of some predefined behavior in web browsers. This solution
> would solve the inference problem where monuments and Web pages are
> defined in OWL as disjoint. This is valid because according to the W3C
> TAG's "Architecture of the Web," using a fragment identifier
> technically
> also identifies a separate and distinct "secondary resource"
> (Jacobs and
> Walsh, 2004). Further, the TAG states that "primary and
> secondary simply
> indicate that there is a relationship between the resources for the
> purposes of one URI: the URI with a fragment identifier. Any resource
> can be identified as a secondary resource" (Jacobs and Walsh,
> 2004). So,
> using hash URIs has the exact same problem as 303
> redirection, since it
> doesn't normatively define any sort of relationship between the two
> URIs, much less distinguish between access and reference.
>
> It appears that the W3C may very well be contradicting the
> relevant IETF
> specification by supporting the hash URIs. The URI specification says
> "the semantics of a fragment identifier are defined by the set of
> representations that might result from a retrieval action on
> the primary
> resource. The fragment's format and resolution is therefore
> dependent on
> the media type of a potentially retrieved representation, even though
> such a retrieval is only performed if the URI is dereferenced"
> (Berners-Lee et al., 2005). If the media type explicitly defines what
> fragment identifiers do, then the user should obey the standard of the
> media type. Only "if no such representation exists, then the semantics
> of the fragment are considered unknown and are effectively
> unconstrained" (Berners-Lee et al., 2005). In other words, only if you
> get a 404 from http://www.tour-eiffel.fr/ can
> http://www.tour-eiffel.fr/# mean anything you want. However, if a Web
> page with the "text/html" media type is returned by accessing the
> primary (no hash) URI, then according to the HTML specification, "for
> documents labeled as text/html, the fragment identifier designates the
> correspondingly named element; any element may be named with the id
> attribute" (Connolly, 2000). In other words, fragment
> identifiers should
> be used for named elements in the document, not as a shortcut for
> distinguishing URIs used for reference and access. This defeats the
> entire purpose of using hash URIs, since the supposed benefit is that
> humans can "follow-their-noses" by accessing the primary URI
> and thereby
> access some human readable HTML about the URI. In the case where the
> "application/rdf+xml" media type is returned by the accessible URI,
> things are different. "In RDF, the thing identified by a URI with
> fragment identifier does not necessarily bear any particular
> relationship to the thing identified by the URI alone" so the hash
> convention can legitimately identify anything, including
> non-accessible
> resources (Schwartz, 2004). This seems to defeat the point of
> returning
> representations, since unlike rendered HTML, RDF/XML is much
> more easily
> used by machines than humans. If people accessed
> http://www.tour-eiffel.fr/ and received RDF/XML most would
> have no idea
> what to do with it. It is most useful for machine processing, not
> informing humans.
>
>
> Strangely enough, the very idea that a media type determines the
> semantics of the fragment identifier is in conflict with other
> statements from the W3C. Even if one accepted a "URI identifies one
> thing." if by using content negotiation, both a "application/rdf+xml"
> and "text/html" media type were available for a URI, then the
> meaning of
> the URI with fragment identifier would be interpreted two
> different ways
> depending on the media type received, and so the URI would
> not identify
> a single resource with a global scope. This fundamentally breaks the
> orthogonality of the specifications, as a single resource can return
> different kinds of representations, so how a "hash URI" can be used is
> dependent on media types. The URI specification explicitly says one
> should not do this, for "whatever is identified by the fragment should
> be consistent across all those representations" (Berners-Lee et al.,
> 2005). One could imagine the hash somehow being consistent across
> representations, but if the fragment identifier exists in a
> RDF document
> and in the HTML document, the meaning of the fragment
> identifier will be
> muddled since it will identify both a portion of a document
> in HTML and
> possibly some non-Web accessible thing. In cases where the fragment
> identifier exists in RDF and not in HTML, it will be a broken fragment
> identifier for an HTML document and perhaps specified by the
> RDF, and so
> inconsistent. If the fragment identifier is non-existent in
> both the RDF
> and HTML documents, in RDF the fragment identifier can identify a
> non-Web accessible resource but not so in the HTML document, where it
> will just be a broken fragment identifier for a particular document.
> Regardless, there needs to be a mechanism in HTML for saying
> that either
> the given use of a fragment identifier is for non-Web
> accessible things,
> or that fragment identifiers that are not given by the HTML
> representation can be anything, including non-Web accessible
> things. So,
> this use of fragment identifiers, while convenient and much more
> practical than 303 redirection, is as far from "a URI identifies one
> thing" as one can get. One can assume that at some point the W3C will
> fix the relevant specifications to be more inline with their proposed
> solutions, but the hash URI is no panacea for distinguishing
> access and
> reference. While easier for users to deploy than 303 redirection, it
> still does not distinguish access and reference any better than 303
> redirection.
>
> References:
>
> Berners-Lee, T., Fielding, R., and Masinter. L. (2005). IETF RFC 3986
> Uniform Resource Identifier (URI): Generic Syntax.
> http://www.ietf.org/rfc/rfc3986.txt.
>
> Connolly, D. (2000). IETF RFC (Informational) 2854 The
> 'text/html' Media
> Type. http://www.ietf.org/rfc/rfc2854.txt.
>
> Connolly, D. (2006). A Pragmatic Theory of Reference for the Web.
> Proceedings of the Identity, Reference, and the Web (IRW2006) Workshop
> at the World Wide Web Conference (WWW2006). Edinburgh, United Kingdom.
> May 22nd 2006.
>
> Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach,
> P. and Berners-Lee, T. (1999) IETF RFC 2616 - Hypertext Transfer
> Protocol - HTTP/1.1. http://www.ietf.org/rfc/rfc2616.txt.
>
> Jacobs, I. and Walsh, N. (2004). Architecture of the World
> Wide Web. W3C
> Recommendation. http://www.w3.org/TR/webarch/.
>
>
>
> Schwatrz, A. (2004). IETF RFC 3870 application/rdf+xml Media Type
> Registration. http://www.ietf.org/rfc/rfc3870.txt.
>
> [1]http://www.w3.org/2006/04/irw65/urisym.html
> [2]http://www.ibiblio.org/hhalpin/homepage/publications/indefe
> nseofambiguity.html
>
> > Does 303 guarantee to get you to triple? No - but then you
> have probably provided very little help to anyone interested
> in the URI you deployed.
> >
> > Hard to deploy? Well, yes and no depending on the server
> software you are using and your access priviledges. That's a
> pragmatic problem induced by the design of servers and the
> admin policies under which they operate. It's not a problem
> of Architecture.
> >
> > Regard
> >
> > Stuart
> > --
> > Hewlett-Packard Limited registered Office: Cain Road,
> Bracknell, Berks RG12 1HN
> > Registered No: 690597 England
> >
> >
>
>
> --
>                 -harry
>
> Harry Halpin,  University of Edinburgh
> http://www.ibiblio.org/hhalpin 6B522426
>
--
Hewlett-Packard Limited registered Office: Cain Road, Bracknell, Berks RG12 1HN
Registered No: 690597 England
Received on Thursday, 27 March 2008 14:03:48 UTC