Hash vs. Slash: What Is Identified?

To hash or to slash... here are some things to consider. This is also
me doing my bit for rdfms-fragments [1].

Abstract: The question is whether or not "namespaces" on the Semantic
Web should end with a hash "#", or a slash "/" (or even a question
mark "?"). For example, the RDF Schema terms are all in the
"namespace":-

   http://www.w3.org/2000/01/rdf-schema#

whereas the Dublin Core terms are in:-

   http://purl.org/dc/elements/1.1/

The difference? Terms in the former namespace will necessarily be
URI-references, that is, a URI with a FragmentID on the end, whereas
terms in the latter will most likely be plain URIs.
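
As a rough illustration of that difference (a sketch only, not taken
from any of the specs quoted below), Python's standard urllib.parse
can split a term into its URI part and its FragmentID. The helper name
split_term and the two sample terms are assumptions chosen for the
example.

```python
from urllib.parse import urldefrag


def split_term(term):
    """Return (uri, fragment) for a term name.

    The fragment is the empty string when the term has no "#" part.
    split_term is a hypothetical helper, named for this example only.
    """
    uri, fragment = urldefrag(term)
    return uri, fragment


# Sample terms, one from each style of namespace (illustrative choices).
rdfs_term = "http://www.w3.org/2000/01/rdf-schema#label"
dc_term = "http://purl.org/dc/elements/1.1/title"

for term in (rdfs_term, dc_term):
    uri, fragment = split_term(term)
    kind = "URI-reference with FragmentID" if fragment else "plain URI"
    print(f"{term}\n   uri={uri!r} fragment={fragment!r} ({kind})")
```

The hash-style term splits into the namespace document's URI plus a
FragmentID, while the slash-style term is a URI in its own right.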

TimBL's view is, apparently, that of the hashist. Quotable notables:-

[[[
It is important, on the Semantic Web, to be clear about what is
identified. An http: URI (without fragment identifier) necessarily
identifies a generic document. This is because the HTTP server
response about a URI can deleiver a rendition of (or location of, or
apologies for) a document which is identified by the URI requested. A
client which understands the http: protocol can immediately conclude
that the fragementid-less URI is a generic document. This is true even
if the publisher (owner of the DNS name) has decided not to run a
server. Even if it just records the fact that the document is not
available online, still a client knows it refers to a document. This
means that identifiers for arbitrary RDF concepts should have fragment
identifiers. This, in turn, means that RDF namespaces should end with
"#".
]]] - http://www.w3.org/DesignIssues/Fragment

[[[
20:48:31 <timbl> The HTTP implicitly says that http; URIs idenifiy
works - abstract generic documents.
[...]
20:53:55 <timbl> Anything which uses the same URI for a document about
Dn and for Dan himself.
20:54:12 <timbl> You can't use an HTTP URI for Dan because HTTp can't
return you Dan.
[...]
20:54:43 <timbl> Now, if we had a  "277  Here is some stuff ABOUT what
you wanted" then we could have abstarct things.
[...]
21:12:49 <timbl> danbri: 1) if i trust any document which talks about
...#Person I will know thinsg about it and 2) if I do a HTTp GET then
I go through a process which incldues the media type and I end up with
more (rather definitive in this cae) infromation about ...#Person.
[...]
21:14:20 <timbl> To do this properly, RDF needs its won content type.
because text/xml fragids refer to parts of a document, not to an
abstract object described by th document.
[...]
21:21:15 <timbl> Now, when the document is XML, then you have to look
to the namespace to find out what the thing means.
[...]
21:22:41 <timbl> because o this limitation, publishers cannot allocate
URIs with fragment identifiers which could be construed difefrently
for documents for whcih they support content negotiation.
]]] - http://ilrt.org/discovery/chatlogs/rdfig/2001-03-31.txt

A rather clear view unfolds...

[[[
The HTTP spec provides a whole protocol for giveing representations of
documents.  You can't change a few words and make it a protocol for
getting information about abstract things described by documents.
[...]
The problem with  the worldnet "Logo" URI ( .../Logo) is that it
actually does identify, usefully, a document.  We still need the URI
for that document.

Fortunately, the fragment ID allows us to refer to something defined
or described by the document, and that can be quite abstract.
[...]
Documents are documents. They are powerful because (with HTTP and slew
of existing and future languages) we can do a whole lot with them. We
can argue about their contents logically. I don't mind the semantic
web architecture being built on a infrastructure of documents ((and
messages)).
]]] -
http://lists.w3.org/Archives/Public/www-rdf-interest/2001Nov/0182

So, from this POV, we can conclude:-

1) Resources identified with HTTP URIs are necessarily "documents"
2) Concepts can be described by these documents. Content negotiation
worries are a known issue
3) In XML, the namespace of the language is inherently bound to
defining what the FragIDs mean

DanBri, for one, is often wary of content negotiation:-

[[[
20:55:30 <danbri> Aside from that, there is the architectural quirk
that HTTP content negotiation, lang neg etc allow the same URI to be
associated (in complex ways) with 'concrette' documents at various
stages of their lifecycle.
[...]
21:12:48 <danbri> ie. when you're not in a retrieval contenxt,
_something_ else determines what #foo means.
]]] - http://ilrt.org/discovery/chatlogs/rdfig/2001-03-31.txt

[[[
'#' is a downright broken bit of web architecture. The '#'
fragment/view semantics are defined as being relative to the mime type
of the object. Since mime types can be content-negotiated, that's
hairy since a single URI plus '#' doesn't mean much without additional
assumptions about mime types. For example,
http://www.w3.org/Icons/WWW/w3c_main has both GIF and PNG mime-typed
variants. So the semantics of http://www.w3.org/Icons/WWW/w3c_main#foo
can't be considered outside the context of some HTTP transaction,
since the mime type of the resource isn't an instrinsic property of
the resource identified.
]]] -
http://lists.w3.org/Archives/Public/www-rdf-interest/2000Mar/0028
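
The worry can be sketched in a few lines of Python. This is a toy
model under stated assumptions, not how any real client is
implemented: the function interpret_fragment and its table of media
types are hypothetical, and the only claim borrowed from the quote is
that the meaning of "#foo" depends on the media type that comes back.

```python
def interpret_fragment(media_type, fragment):
    """Toy model: what "#fragment" might mean for a given media type.

    Returns a description string, or None when this sketch assumes the
    media type gives fragments no meaning. The table is illustrative.
    """
    interpretations = {
        # HTML fragments conventionally point at a named anchor/element.
        "text/html": lambda f: f"the part of the page anchored at '{f}'",
        # Assumption for this sketch: the image types define nothing.
        "image/gif": lambda f: None,
        "image/png": lambda f: None,
    }
    handler = interpretations.get(media_type)
    return handler(fragment) if handler else None


# The same URI-reference, read under different negotiated media types.
for media_type in ("text/html", "image/gif", "image/png"):
    meaning = interpret_fragment(media_type, "foo")
    print(media_type, "->", meaning)
```

Outside of an actual retrieval there is no media type to pass in, which
is exactly the gap the quote above complains about.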

Indeed, I have tried to sort this subject out myself:-

[[[
Another fact that comes into play is that persistence across FragID
space should also be maintained. When we derference a URI in one
browser, we expect the dereferenced result to be similar in
functionality to the same URI having been dereferenced in another
browser. We tend to get upset when it isn't [...]. This is similar to
the fact that if we have different ACCEPT headers in our HTTP
requests, we still expect resources that are similar in functionality,
and I suggest that that functionality extends to FragIDs representing
more or less the same "thing", notwithstanding the fact that the
interpretation of a FragID depends upon the amount of information
available, and that this clearly depends upon the method of
derferencing, and the context.
]]] - http://lists.w3.org/Archives/Public/uri/2001Oct/0019

Aaron has argued strongly on many occasions that HTTP resources can be
anything. His namespaces are steadfastly "#"-less, and he has even
argued that URI-views don't belong in RDF. I asked him about
rdfms-fragments:-

[[[
01:43:07 <AaronSw> well you know my argument... it's just not a
resource
01:43:20 <sbp> It is according to RDF M&S
01:43:26 <AaronSw> pffft
[...]
01:44:26 <sbp> if you start using HTTP resources as RDF terms, you
lose a way to address the HTTP resource as a network retrievable
entity
01:44:39 <sbp> case in point: your logicerror.com stuff
[...]
01:45:47 <AaronSw> Well that needs to be sorted out, but # is not the
solution.
01:46:04 <AaronSw> the network retrievable entity never had the URI
01:46:15 <sbp> Precisely, it needs to be sorted out. HTTP URIs can't
identify two different resources. So, either "#" or "urn"
01:46:15 <AaronSw> HTTP is very clear on this: a URI represents a
Resource, which can be anything
01:46:25 <AaronSw> the server just sends back a bag of bits which is
somehow a resource.
01:46:30 <AaronSw> err related to the resource
[...]
01:51:17 <sbp> as TimBL said, you can't ask for Dan over HTTP
01:51:51 <AaronSw> Of course not!
01:52:14 <AaronSw> what's your point
01:52:30 <sbp> you can't identify Dan in HTTP space
01:52:41 <sbp> because he's not there
01:52:45 <AaronSw> That's not true.
01:53:03 <AaronSw> HTTP identifies resources, of any sort.
01:53:12 <AaronSw> But it can't return resources, no protocol can.
01:53:31 <AaronSw> Amazon is an implementation of the isbn: scheme
[...]
02:00:02 <sbp> Point me to the acceptable return code for a query on
Dan
02:00:11 <sbp> http://www.w3.org/People/Connolly/Dan
02:00:34 <AaronSw> 200
02:00:49 <AaronSw> 200 OK
02:01:26 <sbp> 10.2.1 200 OK
02:01:27 <sbp>    The request has succeeded. The information returned
with the response
02:01:27 <sbp>    is dependent on the method used in the request, for
example:
02:01:27 <sbp>    GET    an entity corresponding to the requested
resource is sent in
02:01:27 <sbp>           the response;
02:01:33 <sbp> what entity should be returned?
02:01:49 <sbp> is has to be a representation of Dan. That's impossible
02:02:05 <AaronSw> A page that states, the URI you have requested
represents Dan Connolly and gives a description of him, etc.
02:02:08 <AaronSw> a representation?
02:02:12 <AaronSw> it doesn't say that!
02:02:21 <AaronSw> it says: "an entity corresponding to the requested
resource"
02:02:28 <AaronSw> that's what's being returned, my friend
02:02:58 <sbp> Well, I'm very unhappy about it. I'd say that it's
information *about* Dan, not Dan himself
02:03:34 <AaronSw> Well it corresponds to the resource, no?
02:03:39 <AaronSw> I mean, I don't want to give you Dan.
02:03:47 <AaronSw> Perhaps I should send you a Not Authorized?
02:03:58 <AaronSw> Don't take away my Danny-boy!
02:05:22 <sbp> I think you're twisting what HTTP should be able to
do... it's a hypertext transfer protocol: transferring data suitable
for HyperText systems. That's just data, MIME an' all
02:05:35 <AaronSw> I never contradicted that.
02:05:42 <AaronSw> I don't disagree.
02:06:00 <sbp> er... so you agree?
02:06:11 <AaronSw> Yes, I agree that's what HTTP Is supposed to do.
02:06:22 <sbp> * sbp wonders why you didn't save a few characters
02:06:26 <AaronSw> But there's also a social contract, of sorts,
involved.
02:06:36 <sbp> It's a very weak social contract
02:06:39 <AaronSw> If I request a resource, I want something related
back.
02:06:52 <AaronSw> Otherwise URIs wouldn't be very useful.
]]] - http://blogspace.com/swhack/chatlogs/2001-08-03.txt

As you can tell, my opinion switches from one way to the other
depending upon the weather... but Aaron remains quite set.

On to Roy Fielding's stuff:-

[[[
In any case, since there is nothing that cannot be identified by an
http URL, including the notion of "nothing" should someone be inclined
to dedicate an identifier to it, I just cannot understand why this
question keeps being raised.  It is an identifier, pure and simple,
and has all the mathematical properties of any other symbolic
identifier. Enumerating those properties in every spec is a hopeless
waste of time.
]]] - http://lists.w3.org/Archives/Public/uri/2001Nov/0027

[[[
> That's like saying that, because a 'mailto:' URI is a URI and
> URI's can identify anything, I can use a 'mailto:' URI to
> denote an abstract concept ...

Yes, you can.  It is just an identifier.  A variable.  A mathematical
symbol described by a sequence of characters in a syntax defined
by the first part of that string leading up to the colon character.
]]] - http://lists.w3.org/Archives/Public/uri/2001Nov/0044

Speaks for itself. Roy and Mark Baker have done a lot of work talking
about REST, which seems pertinent to the whole discussion. Mark's
view?

[[[
Earlier you suggested that "brilliance" was abstract, yet I happen to
have a URI for it here;

http://www.markbaker.ca/2001/11/Brilliance/
]]] - http://lists.w3.org/Archives/Public/uri/2001Nov/0038

At this point, we can quite easily go around in circles. The questions
still remain, and are:-

* What do HTTP URIs necessarily identify? What are the semantics of
these resources, and how do they differ from the broad set of all
resources that may be denoted by a URI?
* What do fragment IDs identify, and how do they relate to the concept
of "resource" in both the URI and RDF senses of the word?

I would argue that the definition of "resource" is consistent across
both the URI and RDF specifications. RFC 2396:-

[[[
Resource
A resource can be anything that has identity.  Familiar examples
include an electronic document, an image, a service (e.g., "today's
weather report for Los Angeles"), and a collection of other resources.
Not all resources are network "retrievable"; e.g., human beings,
corporations, and bound books in a library can also be considered
resources.

The resource is the conceptual mapping to an entity or set of
entities, not necessarily the entity which corresponds to that mapping
at any particular instance in time.  Thus, a resource can remain
constant even when its content---the entities to which it currently
corresponds---changes over time, provided that the conceptual mapping
is not changed in the process.
]]] - http://www.ietf.org/rfc/rfc2396.txt

RDF M&S:-

[[[
Resources
All things being described by RDF expressions are called resources. A
resource may be an entire Web page; such as the HTML document
"http://www.w3.org/Overview.html" for example. A resource may be a
part of a Web page; e.g. a specific HTML or XML element within the
document source. A resource may also be a whole collection of pages;
e.g. an entire Web site. A resource may also be an object that is not
directly accessible via the Web; e.g. a printed book. Resources are
always named by URIs plus optional anchor ids (see [URI]). Anything
can have a URI; the extensibility of URIs allows the introduction of
identifiers for any entity imaginable.
]]] - http://www.w3.org/TR/REC-rdf-syntax/

Because URIs can denote anything with identity, it follows that
*whatever* URI-references with a FragID (i.e. with a hash) denote, they
denote a subClassOf the resources that URIs can denote (a subclass
which may be equivalent to the whole class). It's not a difficult piece
of logic to grasp. This is what RFC 2396 has to say about URI
references:-

[[[
The semantics of a fragment identifier is a property of the data
resulting from a retrieval action, regardless of the type of URI used
in the reference.  Therefore, the format and interpretation of
fragment identifiers is dependent on the media type [RFC2046] of the
retrieval result.  The character restrictions described in Section 2
for URI also apply to the fragment in a URI-reference.  Individual
media types may define additional restrictions or structure within the
fragment for specifying different types of "partial views" that can be
identified within that media type.

A fragment identifier is only meaningful when a URI reference is
intended for retrieval and the result of that retrieval is a document
for which the identified fragment is consistently defined.
]]] - http://www.ietf.org/rfc/rfc2396.txt
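
One practical consequence of the fragment belonging to the retrieval
result is that the fragment never forms part of what the client asks
the server for. The following is a minimal sketch of that split using
urllib.parse; request_target is a hypothetical helper and the sample
terms are illustrative.

```python
from urllib.parse import urlsplit


def request_target(uri_reference):
    """Split a URI-reference into (host, request path, fragment).

    The first two are what an HTTP client would put on the wire; the
    fragment stays on the client side. Hypothetical helper.
    """
    parts = urlsplit(uri_reference)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.netloc, path, parts.fragment


for term in (
    "http://www.w3.org/2000/01/rdf-schema#Class",
    "http://purl.org/dc/elements/1.1/title",
):
    host, path, fragment = request_target(term)
    print(f"GET {path} (Host: {host}); kept by client: {fragment!r}")
```

With a hash namespace every term leads to a request for the same
namespace document; with a slash namespace each term leads to a
request of its own.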

Also of interest is the "Are URI-References bound to resources?"
thread on uri@w3.org. Here's a sample excerpt (Roy):-

[[[
[...] how is access control assigned to "things" on the Web.  By the
resource.  Is it possible to define separate access control to
different fragments of the same resource?  No.  Therefore, a fragment
is not a resource until it is bound as some other URI by a naming
authority that can control access to the fragments as separate,
identifiable resources.  Because if we decided the other way -- that a
fragment was a resource too -- then we'd have to define a new term for
that subset of "old-style resources" that were actually subject to the
Web behavioral model.

The same logic applies to many other aspects of the Web design beyond
access control.  That's why I separated the definition of resource and
representation, and hence why REST is an acronym for representational
state transfer.  I needed to do that for HTTP/1.1 because the old
model just didn't fit things like CGI, Apache modules, and URN
indirection.
]]] - http://lists.w3.org/Archives/Public/uri/2001May/0021

DanC:-

[[[
That's one view. The RDF spec takes another view. Note that the view
taken by TimBL's cwm.py code is this "a fragment is not a resource"
view; he's taken the view that it's too confusing to have two
different definitions of 'resource' around, and that the RDF specs (or
at least: his RDF code) should use 'Thing' for the general case of
something that's either an RFC2396-resource or
something-identified-by-an-absolute-URI-with-fragment-identifier.
]]] - http://lists.w3.org/Archives/Public/uri/2001May/0024

Still, the question for me remains the wholly practical one: when I
create a new namespace, should I end it with a hash or a slash? I
don't feel that we're coming any closer to answering that question,
and that is quite saddening.

Flip a coin, perhaps?

[1] http://www.w3.org/2000/03/rdf-tracking/#rdfms-fragments

--
Kindest Regards,
Sean B. Palmer
@prefix : <http://webns.net/roughterms/> .
:Sean :hasHomepage <http://purl.org/net/sbp/> .

Received on Wednesday, 28 November 2001 00:05:41 UTC