Personal comments on "Providing and Discovering Definitions of URIs"

Date: Mon, 27 Jun 2011 12:24:17 +0100
Hi Jonathan,

I'd like to submit some personal comments on the "Providing and
Discovering Definitions of URIs" document. These are comments on the
general approach and style of the document, rather than point by point
comments at this stage.

First of all I'm glad to see that the TAG is revisiting this area. I'm
also glad to see that there is effort to explore the solution space
and discuss various trade-offs. Unfortunately I don't think that this
current document achieves that.

I think my main concern with the document is that the discussion is,
deliberately I know, approached in a vague and general way. While
I can see that the intention is to remain neutral, I don't think it
helps anyone to engage with the material. I'd prefer to see a much
clearer rundown of the general issues that people are facing, along
with a much clearer set of success criteria, before a more detailed
review of the actual proposals that have been surfaced.

One item that is completely glossed over is that, outside of the
Semantic Web community, no-one cares about this issue at all. There is
plenty of structured data being published from a number of sources,
and projects like Facebook Open Graph and Schema.org show how this
trend is moving towards a URI-based approach. This is great and to be
encouraged. So how do we avoid making all of those efforts
automatically wrong?

Current issues such as problems with bookmarking; difficulty of
serving # URI based data from triple-stores; lack of support in web
applications; efficiency; etc. should have greater prominence in the

I think it'd also be useful to elaborate further on some of the
success criteria. I'm surprised by the wording of criterion 6 ("A URI
should have a single agreed meaning globally..."). I don't think that
the AWWW document states this, but instead recommends that a URI
should have a single meaning and is used consistently by its

The document notes that it may not be possible to meet all of those
criteria, but gives no indication of their relative importance. My
feeling is that the results of previous discussions on this topic have
prioritised architectural correctness over and above all of the other
issues, which has led to the current situation. This is something to

The discussion of the various approaches would benefit greatly from
reference to concrete implementations; using them as illustrations
would help ground further discussion. It would also be good to see
less bias in the discussion of alternatives, particularly with regard
to recommendation 5.3. In places the document rejects assertions, and
yet makes several sweeping claims itself. A more evidence-based
approach will lead to a clearer discussion of what, at times, is
already a heated debate.

There also seems to be an assumption in the document that a *single*
approach will win out. That's worth reflecting on in itself. We
already have a "mixed economy" in how people are publishing data.

Perhaps it would be better to look at this issue as a set of
architectural patterns, each with its own strengths and weaknesses,
and to provide guidance on how to choose between them?

Coming from a REST point of view, I don't see the web as being divided
into IRs and NIRs. I see a set of abstract resources which I can
interact with to obtain representations. A publisher decides what a
URI denotes and I am able to interoperate with them better if we agree
on those definitions. HTTP and RDF place some constraints on what, and
how, I can make statements about those resources and the
representations that are returned. Specific patterns for publishing
data can help us interoperate, others can make it more difficult.

As a publisher of data and documents, I may be willing to trade off my
ability to make statements about, e.g., the licensing and provenance of
my URI descriptions, if it becomes much easier for me to just publish
that data. There are ways that I could still surface some of that
information using other techniques (e.g. Link headers). Similarly,
as a consumer I may be willing to trade off "correctness" in my client
code in order to ensure that I can get data from as many sources as
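
To make the Link header suggestion a little more concrete, here is a
rough sketch of how a consumer might pick licensing and description
links out of a response header. It is only an illustration under some
assumptions: the example.org URL and the EXAMPLE_HEADER value are made
up, the function names are mine, and the parser is deliberately
simplified (it does not cope with commas or semicolons inside quoted
parameter values). "license" and "describedby" are registered link
relation types.

```python
import re

# Hypothetical header value a publisher might send alongside a description.
EXAMPLE_HEADER = (
    '<http://creativecommons.org/licenses/by/3.0/>; rel="license", '
    '<http://example.org/doc/about>; rel="describedby"'
)


def parse_link_header(value):
    """Parse a Link header value into a list of (target, params) pairs.

    Simplified: assumes parameter values contain no commas or semicolons.
    """
    links = []
    # Each link looks like: <target>; name="value"; name2=value2
    for match in re.finditer(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)', value):
        target, raw_params = match.group(1), match.group(2)
        params = {}
        for part in raw_params.split(';'):
            if '=' in part:
                name, _, val = part.partition('=')
                params[name.strip().lower()] = val.strip().strip('"')
        links.append((target, params))
    return links


def targets_for_rel(links, rel):
    """Return the targets of all links that carry the given relation type."""
    # A rel parameter may hold several space-separated relation types.
    return [target for target, params in links
            if rel in params.get('rel', '').split()]


links = parse_link_header(EXAMPLE_HEADER)
license_targets = targets_for_rel(links, 'license')
description_targets = targets_for_rel(links, 'describedby')
```

A publisher could attach such a header to a plain 200 response, so the
licensing statement travels with the data without requiring a separate
URI for the description.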

I think the TAG ought to be working towards providing guidance for
both publishers and consumers that illustrates the benefits of a
"higher fidelity" approach to data publishing, but without proscribing
simpler practices.



