Re: naive question: why prefer absolute URIs to # URIs for linked data? from Tim Berners-Lee on 2011-09-02 (www-tag@w3.org from September 2011)

From: Tim Berners-Lee <timbl@w3.org>
Date: Fri, 2 Sep 2011 15:41:36 -0400
To: Ian Davis <me@iandavis.com>
Cc: Jonathan Rees <jar@creativecommons.org>, Harry Halpin <hhalpin@ibiblio.org>, Manu Sporny <msporny@digitalbazaar.com>, www-tag@w3.org
Message-Id: <97CDCA3F-5482-4504-A67C-C4D80B692313@w3.org>
On 2011-08-31, at 16:52, Ian Davis wrote:

> I think there are a number of contributing factors:
> 
> [... see my previous red herring message]
> 
> 2) Fragments are not sent to the server when they are dereferenced which means the server has to guess what information to send.
> 

Well, this is an architectural decision. There are two valid architectures.

1.

There is a very common architecture in which the document 
is a well-defined unit.  In business, there are documents like 
catalogs, order, delivery notes, invoices, and checks, which have
specific provenance, trust, and role in various protocols.
If I want to refer to a line on an invoice, then it is reasonable
to get back the whole invoice, as the graph in the invoice
is a considered set of triples, which were issued as  a message 
on a specific date, by specific author, and which only make sense together.
Business protocols operate in terms of these documents,
and the integrity of them, and the ability to express data about
them is crucial.

In law, similarly, an abstract concept is typically
defined by reference to a particular document -- an act or regulation.
The Act is the unit of information, it has (as invoices do) references
to others, but it has well-defined bounds, and provenance and
metadata. [1]

Many systems are built so that the documents, while they may get big,
are constrained not to be massive, as they are the units of
transport, and everything is hunky-dory.

So in these systems, it isn't a question of the server having to 
guess what information to send. It is the publisher, it knows
what information goes in a document.  It publishes
various documents, some about similar things, and when people
quote a URI they quote it to another person knowing what sort of info it will return.
So when I say "Hi, I am <http://www.w3.org/People/Berners-Lee/card#i>"
I am using an identifier which specifically refers to me as on my business card.
That's useful.   One way of looking at it is that a document-based system has
been defined, and the design of that system determines what
goes into a document, what the server sends, and what the client expects.
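
As a minimal sketch of the point Ian raises and the document-based answer to it:
a client strips the fragment before making the request, so the server only ever
sees the document URI and returns the whole document. The example uses Python's
standard urllib.parse.urldefrag; the variable names are illustrative only.

```python
from urllib.parse import urldefrag

uri = "http://www.w3.org/People/Berners-Lee/card#i"

# urldefrag splits a URI into the part that is sent in the HTTP request
# and the fragment, which the client keeps to itself.
document_uri, fragment = urldefrag(uri)

print(document_uri)  # http://www.w3.org/People/Berners-Lee/card
print(fragment)      # i
```

The client then looks for the node identified by the fragment inside the
document it got back.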

2.

There is another architecture where there is no concept of a document.
This arises in the semantic web when we have aggregated
large amounts of data and you are running a query service behind which there is
a large aggregation of data, with all manner of shapes of graphs.
We find there is no typical size of node; in fact the whole graph becomes scale-free.
There is no natural division of the data into documents.
In these systems the operation of a GET on an item is not so well defined.

Indeed, you ask, rightly, how should the server know what information to 
send if I ask not for a document, but for a node?
In these systems it is natural to not use a hash, after all, there is no document,
and so no document URI.
In these systems we currently use 303.  (I wish we had a 209 or something as 303 is a terrible
waste of roundtrips). Of course one can access these things using SPARQL, which resolves the question.
But suppose a client doesn't know a lot about the graph, and just wants to ask 
about the item itself?
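
A rough sketch of the 303 pattern mentioned above, written as a pure function
rather than a real server. The /id/ and /doc/ path layout, the function name
respond, and the text/turtle content type are assumptions made for
illustration, not something this message specifies.

```python
def respond(path, known_nodes):
    """Return (status, headers) for a GET on `path`.

    Hypothetical layout: /id/<name> identifies a node (a thing),
    /doc/<name> is a document describing that node.
    """
    if path.startswith("/id/"):
        name = path[len("/id/"):]
        if name in known_nodes:
            # The node is not a document, so point at one that describes it.
            return 303, {"Location": "/doc/" + name}
        return 404, {}
    if path.startswith("/doc/"):
        name = path[len("/doc/"):]
        if name in known_nodes:
            return 200, {"Content-Type": "text/turtle"}
        return 404, {}
    return 404, {}
```

A client asking about /id/13498579 needs two requests here, one answered with
303 and one with 200, which is the roundtrip cost complained about above.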

In these systems, though, the problem that the server doesn't know what to send has not 
gone away, it has reappeared in a different form. The server, looking at a random
node in a random graph, has to guess what the client wants to know.
This is the same as the SPARQL DESCRIBE problem of course.
In fact, in many cases, giving the client the immediate arcs but recursively
including those on unidentified bnodes will actually result in a graph of reasonable size.
This is my favorite describe algorithm.
But you probably want to have a limit on that in a real system containing arbitrary graphs.
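
The describe algorithm just outlined could be sketched roughly as follows.
This is one reading of it, under stated assumptions: triples are plain
(subject, predicate, object) string tuples, blank nodes are written "_:label",
only outgoing arcs are followed, and max_triples is a hypothetical cap standing
in for the limit mentioned above.

```python
def describe(graph, node, max_triples=1000):
    """Collect the arcs leaving `node`, following blank-node objects recursively.

    `graph` is an iterable of (subject, predicate, object) string triples.
    Blank nodes are assumed to be written "_:label". Collection stops once
    `max_triples` triples have been gathered.
    """
    by_subject = {}
    for s, p, o in graph:
        by_subject.setdefault(s, []).append((s, p, o))

    result = []
    seen = {node}
    pending = [node]
    while pending:
        subject = pending.pop()
        for triple in by_subject.get(subject, []):
            if len(result) >= max_triples:
                return result
            result.append(triple)
            obj = triple[2]
            # Only unidentified (blank) nodes are expanded; a named node
            # can be asked about separately through its own URI.
            if obj.startswith("_:") and obj not in seen:
                seen.add(obj)
                pending.append(obj)
    return result
```

Named neighbours appear in the result only as the object of an arc; their own
properties are left out, which is what keeps the result small in many cases.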

Now technically on the web, you 
can use hash URIs fine here, where they are of the form  /id13498579#it
and the document /id13498579 contains the data just about id13498579#it

So there is a virtual document, the result of doing
a SPARQL DESCRIBE id13498579#it query, whose URI is  /id13498579 .
This is actually useful as even if the server doesn't have a document concept,
other people do and someone can annotate it to say whether they trust it, etc.
I assume the only issue with that is it looks ugly. 


So *my* summary of why people don't like hash URIs would be that while 
for the first sort of system they seem natural, for the second they look ugly. 
Which is not an insignificant issue!! It makes code which generates them more complicated.
It makes them more difficult for people to remember, and so on.



> If you're storing data for that URI in a database you have to key it against the hashless version of the URI along with all other URIs that share that hashless part.

No, you can just generate the hash URI of the form table6/id-821374#it where the #it is a
constant addition. You don't have to store the relationship in your table.
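
A small sketch of that constant-suffix scheme, assuming the database key is a
(table, row id) pair; the function names are hypothetical. Because the mapping
is one-to-one in both directions, the server can recover the hash URI from the
fragment-less path it actually receives, for example when writing an access log.

```python
SUFFIX = "#it"  # the constant fragment used in this message

def thing_uri(table, row_id):
    """Build the hash URI for a row; nothing extra is stored."""
    return f"{table}/{row_id}{SUFFIX}"

def thing_from_request_path(path):
    """Recover the hash URI from the fragment-less path the server sees."""
    return path + SUFFIX

def row_key(uri):
    """Recover (table, row_id) from a hash URI built by thing_uri."""
    path, _, fragment = uri.partition("#")
    if "#" + fragment != SUFFIX:
        raise ValueError("not a URI of the expected form: " + uri)
    table, _, row_id = path.rpartition("/")
    return table, row_id
```

For example, thing_uri("table6", "id-821374") gives "table6/id-821374#it", and
row_key turns that back into ("table6", "id-821374").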
 

> Also the server can't log accesses to the full URI which means you don't get accurate analytics.
> 

With the #it method above it can as there is a 1-1 correspondence.

> 3) You can't use HTTP headers or status codes to refer to a hash URI. For example you can't 404 a hash URI or redirect it.

With the #it method above you can, as there is a 1-1 correspondence.
> 
> 4) The role of the fragment is changing in modern web development practice. Its becoming a bearer of state and/or part of the interaction architecture of an application. See #! URLs or javascript techniques for tabbed pages.

The fact that people are building on the fragid in one way doesn't mean we shouldn't 
also build on it in this way.

> 
> Ian
> 


Tim

[1] "The residential status of a person is decided under two different Acts, one under Income Tax Act, 1961, ( I.T. Act) and another under Foreign Exchange Regulation Act, 1973 (FERA). The concept of Non-Resident under FERA is different as compared to that under Income Tax Act. Under Income Tax Act, the residential status of a person is determined on the basis of number of days he stays in India whereas under FERA, it is the intention of a person to be in India or outside India would be an important factor determining his residential status." - http://www.vakilno1.com/nri/taxation/definitions.htm
Received on Friday, 2 September 2011 19:41:41 UTC