Re: URI in a URI from Nathan on 2011-02-04 (semantic-web@w3.org from February 2011)

From: Nathan <nathan@webr3.org>
Date: Fri, 04 Feb 2011 15:41:03 +0000
To: Hugh Glaser <hg@ecs.soton.ac.uk>
CC: Vincent Huang A <vincent.a.huang@ericsson.com>, "semantic-web@w3.org" <semantic-web@w3.org>
Message-ID: <4D4C1E0F.10305@webr3.org>
Hugh Glaser wrote:
> A very useful question.

agreed

> It is an important issue for the emerging services accessing the semantic web.
> 
> Yes in general using a # is a bad thing.

It should be noted that the above is Hugh's personal opinion and is 
certainly not consensus. The truth of the matter is that, for the 
purpose of naming, the lexical form of a URI is irrelevant.

However, from a practical perspective, when dereferencing URIs and 
providing services on the web, each component of the generic URI 
syntax has a purpose which should be understood by those publishing 
data or creating services.

Everything up to, and including, the query string component of an 
http:// or https:// scheme URI is visible to HTTP servers; the fragment 
is not. This is not true of all protocols and usage though. For instance, 
if you put a URI with a fragment into a SPARQL query in order to find 
out more about it, the fragment will not be stripped and will be visible 
to the query engine.
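To make the contrast concrete, here is a small Python sketch using the standard library's urllib.parse. The helper names http_request_target and sparql_describe are hypothetical, made up for illustration; the second one only builds query text and does not contact any SPARQL endpoint.

```python
from urllib.parse import urlsplit


def http_request_target(uri: str) -> str:
    """Return the part of an http(s) URI that goes into the HTTP request line."""
    parts = urlsplit(uri)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target  # parts.fragment is deliberately never included


def sparql_describe(uri: str) -> str:
    """Hypothetical helper: build SPARQL DESCRIBE text; the whole URI, fragment included, is kept."""
    return f"DESCRIBE <{uri}>"


uri = "http://example.org/foo#bar"
print(http_request_target(uri))  # /foo
print(sparql_describe(uri))      # DESCRIBE <http://example.org/foo#bar>
```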

Choosing which components of a URI you want to make use of (query string 
and/or fragment, or neither) very much depends on your use case, and is 
a decision you're free to make yourself. Each component has its own use, 
and if you want to use a component for the purpose it was intended, 
then feel free to do so.

> It is likely the server will never see the fragment after the #.

Given an HTTP server, the fragment of the URI a client is dereferencing 
will almost certainly not be seen by that server. Obviously, if a hash 
URI is encoded in a query parameter or included in the message, then it 
will be seen.

> I don't know if it is written anywhere, but there seems to me a bit of a consensus around this.

I'm unsure what consensus, or what "this", is being referred to, but as 
above, HTTP servers will not see the fragment component of a URI that is 
being dereferenced.

That is to say, if you try to dereference http://example.org/foo#bar via 
HTTP GET then:

- the fragment (#bar) will be removed by the HTTP client,
- http://example.org/foo will be split into its component parts,
- an HTTP GET request for /foo will be sent to the server named by 
example.org.
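The three steps above can be sketched in Python as follows. This is a minimal illustration, not how any particular HTTP client library is implemented: build_get_request is a hypothetical helper, it assumes the authority contains no user info, and a real client would send further headers.

```python
from urllib.parse import urldefrag, urlsplit


def build_get_request(uri: str) -> str:
    """Hypothetical helper: the request a client would send when dereferencing uri."""
    # Step 1: the client drops the fragment (#bar); it never goes on the wire.
    without_fragment, _fragment = urldefrag(uri)
    # Step 2: split what remains into its component parts.
    parts = urlsplit(without_fragment)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # Step 3: ask the server named by the authority for the path (and query).
    # Assumes netloc is a plain host[:port] with no user info.
    return f"GET {target} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"


print(build_get_request("http://example.org/foo#bar"))
```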

Thus, if you would like the server to see "bar" then you'll need to 
construct your URIs as:

   http://example.org/foo/bar
or
   http://example.org/foo?bar
or
  any other valid URI syntax which does not include a fragment.
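A quick way to check which of these forms actually delivers "bar" to the server is to strip each URI down to its path and query, roughly what a client puts in the request line. The helper visible_to_server is hypothetical, written only for this comparison.

```python
from urllib.parse import urlsplit


def visible_to_server(uri: str) -> str:
    """Hypothetical helper: path plus query, the portion of an http URI the server receives."""
    parts = urlsplit(uri)
    return parts.path + ("?" + parts.query if parts.query else "")


for uri in ("http://example.org/foo#bar",
            "http://example.org/foo/bar",
            "http://example.org/foo?bar"):
    print(uri, "->", visible_to_server(uri))
# Only the fragment form hides "bar" from the server (it sees just /foo).
```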

If however you simply want a process running at http://example.org/foo 
to return information about one or more things, say "bar" and "baz", 
then it may be beneficial to leverage the fragment component of the URI.
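As a sketch of why that can be efficient: every hash URI that shares the same fragment-free part is answered by one document, so a client only needs a single GET to learn about both "bar" and "baz". The helper documents_to_fetch is hypothetical.

```python
from collections import defaultdict
from urllib.parse import urldefrag


def documents_to_fetch(uris):
    """Hypothetical helper: map each fragment-free document URI to the fragments it covers."""
    docs = defaultdict(list)
    for uri in uris:
        doc, fragment = urldefrag(uri)
        docs[doc].append(fragment)
    return dict(docs)


print(documents_to_fetch(["http://example.org/foo#bar", "http://example.org/foo#baz"]))
# {'http://example.org/foo': ['bar', 'baz']}  -> one GET for /foo serves both
```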

> And it is folded into the RESTful stuff.
> So for example
> http://kmi-web05.open.ac.uk/REST_API.html
> describes a typical service invocation with a URI as argument as:
> http://watson.kmi.open.ac.uk/API/semanticcontent/metadata/?uri=[docURI]

I'd be wary of terming the above a REST API, but that's orthogonal to 
this discussion. The usage pattern described by that spec is common to 
HTTP-accessible services, and is born primarily from the familiarity of 
HTML forms sending name/value pairs in a query string when the form is 
submitted via GET.

If you want to create a service that can be queried to return 
information about x or y, where x and y refer to things elsewhere on the 
web or in a database, then it often makes sense to pass x or y to the 
service so it knows what to do.

Using the query string to pass x and y when using GET is perfectly 
valid for this use case, and indeed is the primary use case for the 
query string component of a URI.
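One detail matters a great deal when the value being passed is itself a hash URI. The sketch below reuses the shape of Hugh's sensor lookup example (the service URI is hypothetical) and shows that an unencoded '#' ends up as the fragment of the service URI, so the server never sees it, whereas percent-encoding the value keeps the whole target URI inside the query string.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

service = "http://example.com/sensors/sensor1/lookup"  # hypothetical service URI
target = "http://example.org/foo#bar"

# Naive concatenation: '#bar' becomes the fragment of the *service* URI.
naive = service + "?uri=" + target
print(urlsplit(naive).query)     # uri=http://example.org/foo
print(urlsplit(naive).fragment)  # bar

# Percent-encoding the value (urlencode does this) keeps '#' as %23 in the query.
encoded = service + "?" + urlencode({"uri": target})
print(encoded)
print(parse_qs(urlsplit(encoded).query)["uri"][0])  # http://example.org/foo#bar
```

Unencoded values such as the sameas.org examples work because ':' and '/' are permitted in the query component under RFC 3986; '#' must be encoded, and '&' should be, since form-style parsers treat it as a parameter separator.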

If the information you return is more persistent in nature, then you may 
want to simply append x or y to the end of the URI instead, because 
many network caches and intermediaries by default don't cache responses 
whose target URI has a query string.

Hence some services (like those listed below) use query parameters, 
essentially implying "this is a service you query", while others, like 
URIBurner, simply append the URI, essentially implying "this is the name 
of some information about y".
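The two styles can be contrasted in a few lines. The base URIs here are hypothetical placeholders, not URIBurner's actual URI layout; the point is that the path-appended form carries no query string, which is what mattered to caches following the older HTTP/1.1 rule (RFC 2616, section 13.9) of not treating responses to URIs containing '?' as fresh unless the server gave an explicit expiration time.

```python
from urllib.parse import quote, urlsplit

target = "http://example.org/foo#bar"

# Hypothetical "append the URI" service: keep ':' and '/' readable but encode
# '#' as %23 so the target's fragment stays inside the service URI's path.
path_style = "http://example.org/about/" + quote(target, safe=":/")

# Hypothetical "query the service" style: encode everything in the value.
query_style = "http://example.org/lookup?uri=" + quote(target, safe="")

print(path_style)                        # http://example.org/about/http://example.org/foo%23bar
print(bool(urlsplit(path_style).query))  # False: no query string at all
print(bool(urlsplit(query_style).query)) # True
```

How a given server passes %23 (or repeated slashes) in a path through to the application varies, so treat the path-appended form as a sketch rather than a recipe.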

> We do something similar in http://sameas.org/
> http://sameas.org/about.php describes in detail, with things like:
> http://sameas.org/?uri=http://dbpedia.org/resource/London (the NIR)
> and
> http://sameas.org/rdf?uri=http://dbpedia.org/resource/London (an IR)
> etc.
> and also in the rkbexplorer services things like
> http://www.rkbexplorer.com/network/?uri=http://southampton.rkbexplorer.com/id/person-00021&type=person-person&format=tsv
> as well as with two URIs
> http://www.rkbexplorer.com/connections/?source=http://southampton.rkbexplorer.com/id/person-da9c463f8b783083d7d7e9003db8224f-57e2ec2d7aee429c73fef344805033e2&target=http://southampton.rkbexplorer.com/id/person-17e6d4cf4846bd195454a7c1143a20fb-32a6807d38b58d6d56e31d88f5e48de2&type=person-person
> 
> So you could use
> http://example.com/sensors/sensor1/lookup?uri=http://sweet.jpl.nasa.gov/2.1/propTemperature
> 
> Unless someone wants to tell us that is crazy?

Not at all, but Vincent, it's worth noting that for a long time there 
has been much debate about whether we should "name" "things" with what 
are referred to as slash or hash URIs. Both Hugh and I have previously 
had some strong convictions in this respect, and some of Hugh's email 
may have had a tone that reflected this, when really it's orthogonal to 
this discussion (no offence to Hugh of course; we've all been quite... 
passionate about this in the past, myself included!).

For the purpose of naming, URIs may as well be "x" or "y" with no 
distinction between them. For the purpose of dereferencing and creating 
HTTP services, however, you want to consider the bigger picture: URI 
components and the network effect of how you deploy your service or data.

> This is the sort of thing it is useful to have some best practice emerge on.

I'm unsure whether "best practice" is the right goal, but certainly 
having some information for people which points out the uses of each URI 
component, the uses of URIs for naming, and RESTful / network 
considerations would be very beneficial :)

> Or can anyone point us at where it is written?

Not as yet, but it sounds like a good idea, Hugh.

Best,

Nathan
Received on Friday, 4 February 2011 15:42:20 UTC