Re: Personal comments on "Providing and Discovering Definitions of URIs" from Jonathan Rees on 2012-02-01 (www-tag@w3.org from February 2012)

From: Jonathan Rees <rees@mumble.net>
Date: Wed, 1 Feb 2012 12:10:28 -0500
To: Leigh Dodds <leigh.dodds@talis.com>
Cc: www-tag@w3.org
Message-Id: <0E6CF0B9-B1EE-426C-B8DB-26DD90BEDA5A@mumble.net>
Thanks again for your review. I've revised the document based on Dave Reynolds's comments, and am now turning to yours. I hope the latest draft has started to address some of your concerns, but if the document is going to get better I will need some more help from you. Comments inline.

Latest ongoing: http://www.w3.org/2001/tag/awwsw/issue57/latest/
Latest dated version as of this email: http://www.w3.org/2001/tag/awwsw/issue57/20120130/ 
Tracker: ISSUE-57

On Jun 27, 2011, at 7:24 AM, Leigh Dodds wrote:

> Hi Jonathan,
> 
> I'd like to submit some personal comments on the "Providing and
> Discovering Definitions of URIs" document. These are comments on the
> general approach and style of the document, rather than point by point
> comments at this stage.
> 
> First of all I'm glad to see that the TAG is revisiting this area. I'm
> also glad to see that there is effort to explore the solution space
> and discuss various trade-offs. Unfortunately I don't think that this
> current document achieves that.
> 
> I think my main concern with the document is that the discussion
> is, deliberately I know, approached in a vague and general way. While
> I can see that the intention is to remain neutral I don't think it
> helps anyone to engage with the material. I'd prefer to see a much
> clearer run down of the general issues that people are facing, along
> with a much clearer set of success criteria, before a more detailed
> review of the actual proposals that have been surfaced.

I've expanded the list, but I can't say whether it will be engaging enough. What, in your opinion, are the biggest issues and success criteria?

> One item that is completely glossed over is that, outside of the
> Semantic Web community, no-one cares about this issue at all. There is
> plenty of structured data being published from a number of sources,
> and projects like Facebook Open Graph and Schema.org show how this
> trend is moving towards a URI based approach. This is great and to be
> encouraged. So how do we avoid making all of those efforts
> automatically wrong?

Through a consensus process, I would say. I hope to get someone involved who is watching out for these interests.

I have made an effort to remove bias toward the status quo, and would appreciate hearing if it remains.

> Current issues such as problems with bookmarking; difficulty of
> serving # URI based data from triple-stores;

What are you referring to re triple stores? I've been trying to understand the case against hash URIs, and have come up with very little (assuming you observe the few-fragids-per-stem pattern). Since I'm not immersed in the pragmatics I really need to hear from other people on this (and thanks to Ian Davis, Harry, Manu, et al. for what they've provided so far, which has made the presentation more fair I think).
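
To make the few-fragids-per-stem pattern concrete, here is a minimal sketch, assuming Python and made-up example.org URIs (none of these names come from the draft). It splits each hash URI into its stem and fragment id and groups the fragment ids by stem, so that "few fragids per stem" can be checked by counting.

```python
from collections import defaultdict
from urllib.parse import urldefrag


def group_by_stem(uris):
    """Map each stem (the URI minus its fragment) to the fragment ids used with it."""
    groups = defaultdict(list)
    for uri in uris:
        # urldefrag returns (url_without_fragment, fragment)
        stem, fragid = urldefrag(uri)
        groups[stem].append(fragid)
    return dict(groups)


# Hypothetical URIs, for illustration only.
uris = [
    "http://example.org/people/alice#me",
    "http://example.org/people/alice#card",
    "http://example.org/vocab#Person",
]
stems = group_by_stem(uris)
for stem, fragids in stems.items():
    print(stem, len(fragids), fragids)
```

A client dereferences only the stem, so one retrieval of a stem document covers every fragid defined in it; the pattern keeps that document small.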

> lack of support in web
> applications;

Examples? Again, I have no concrete information on this, and could really use it, since otherwise it's hard to make a balanced presentation.

> efficiency; etc. should have greater prominence in the
> document
> 
> I think it'd also be useful to elaborate further on some of the
> success criteria. I'm surprised by the wording of criterion 6 ("A URI
> should have a single agreed meaning globally..."). I don't think that
> the AWWW document states this, but instead recommends that a URI
> should have a single meaning and is used consistently by its
> publisher?

Since no one can agree on what the success criteria are I have renamed them "desiderata".

You're right about the "uniformity" point being incompletely articulated and attributed; I will try to expand on it (the primary material is RFC 3986, not AWWW). But it doesn't help if a publisher uses a URI uniformly, if others don't use it the same way, i.e. interoperably. This is why 'uniformity' across contexts and communities is sought. One way to get this is by saying the publisher always gets some kind of "authority", but then you have to be able to agree on that, on when they're saying it and how, on what are acceptable channels, and so on - so some kind of consensus is still essential.

I will at least interject the word "interoperability" into the discussion of uniformity. Anyhow, this is a desideratum that is expressed by only some parties, just like all the others, and if it's unachievable, or we decide not to go for it, so be it.

> The document notes that it may not be possible to meet all of those
> criteria, but gives no indication of their relative importance. My
> feeling is that the results of previous discussions on this topic have
> prioritised architectural correctness over and above all of the other
> issues, which has led to the current situation. This is something to
> revisit.

I think the important thing is for parties to know what the rules are, so that they can communicate clearly, but planning for the future (which is what I see as the role of architecture) also has some weight. But I don't see much conflict in this case, since the received architecture is based on pragmatics - we may just end up deciding that we need a different (correct) architecture. Not sure how to treat this in the document, so unless you can be more specific I think I can't act on this advice through a change in the document. Hopefully this issue, of which architecture is best, will come out as alternatives are weighed.

> The discussion of various approaches would benefit greatly from
> reference to actual concrete implementations. Using actual
> implementations as illustrations might help ground further discussion.
> It would be nice to see less bias in discussion of alternatives,
> particularly to recommendation 5.3. In places the document rejects
> assertions, and yet makes several sweeping claims itself. A more
> evidence based approach will lead to clearer discussion of what, at
> times, is already a heated debate.

OK - can you suggest some things I should reference? I've linked to one of Ian's blog posts more prominently this time around. I have made an effort to reduce the number of sweeping claims, and have removed many, so if you find any in more recent versions (check the 'latest' link), I would love to hear from you.

> There also seems to be an assumption in the document that a *single*
> approach will win out. That's worth reflecting on in itself. We
> already have a "mixed economy" in how people are publishing data.

I have tried to clarify the difference between a "method" and an overall "solution", i.e. that approaches are not exclusive unless they conflict. Thanks.

> Perhaps it would be better looking at this issue as a set of
> architectural patterns, with their own strengths and weaknesses, and
> instead provide guidance on how to choose an approach?

The important point is interoperability, so any consensus statement (solution) will list all methods that we have all decided are to be used going forward. If others arise in the future that's fine as long as they don't conflict - I have said in the UDDP draft that there is no assumption either of completeness or of authority.

> Coming from a REST point of view, I don't see the web as being divided
> into IRs and NIRs.

I don't either, and I (and TimBL, and others) have repeatedly emphasized this point. Remember that 303 does *not* mean "not an information resource" (NIR), and the Crossref example exploits this to great success. But let's not get into this again now... the documents ought to speak for themselves.

> I see a set of abstract resources which I can
> interact with to obtain representations. A publisher decides what a
> URI denotes and I am able to interoperate with them better if we agree
> on those definitions. HTTP and RDF place some constraints on what, and
> how, I can make statements about those resources and the
> representations that are returned. Specific patterns for publishing
> data can help us interoperate, others can make it more difficult.
> 
> As a publisher of data and documents, I may be willing to trade-off my
> ability to make statements about, e.g. licensing and provenance, of my
> URI descriptions, if it becomes much easier for me to just publish
> that data. There are ways that I could still surface some of that
> information using other techniques (e.g. link headers, etc). Similarly
> as a consumer I may be willing to trade-off "correctness" in my client
> code in order to ensure that I can get data from as many sources as
> possible.
> 
> I think the TAG ought to be working towards providing guidance for
> both publishers and consumers that illustrates the benefits of a
> "higher fidelity" approach to data publishing, but without proscribing
> simpler practices.

This isn't about publishers talking to themselves and their friends, it's about consensus agreement on patterns of communication, which means a user who has no previous relationship with the publisher (OR someone else who happens to be using their URIs) ought to be able to encounter their RDF (or others') and have a way to understand it. If the same behavior means different things to different publishers and clients there will be confusion. I have no particular interest in any particular solution at this point, I just want to end the pain, and the only way to do that is through a consensus statement.

I think the conflict is mainly about how to preserve the ability to write metadata and be generally understood, and I have said this elsewhere; if no one used URIs for metadata (for describing documents, images, etc. that are on the web) there would not be as much of a problem. I have been trying to split the coverage of the issue across separate documents in a suite, and perhaps this is not working so well. I'll see what I can do to copy some of this motivation into this document, or at least leave pointers behind.

Sorry to push so much back at you, but my time on this is limited and I can use all the help I can get.

Best
Jonathan

> Cheers,
> 
> L.
> 
> -- 
> Leigh Dodds
> Programme Manager, Talis Platform
> Mobile: 07850 928381
> http://kasabi.com
> http://talis.com
> 
> Talis Systems Ltd
> 43 Temple Row
> Birmingham
> B2 5LS
>
Received on Wednesday, 1 February 2012 17:11:12 UTC