Re: URIs in data primer draft updated & httpRange-14 background from David Booth on 2013-03-20 (www-tag@w3.org from March 2013)

From: David Booth <david@dbooth.org>
Date: Tue, 19 Mar 2013 23:19:46 -0400
To: Jeni Tennison <jeni@jenitennison.com>
CC: "www-tag@w3.org List" <www-tag@w3.org>
Message-ID: <51492AD2.3000101@dbooth.org>

Hi Jeni,

I just noticed that this is on the agenda for tomorrow's TAG meeting, so 
I thought I had better make comments now on
http://www.w3.org/2001/tag/doc/uris-in-data-2013-03-07/

1.  This is nicely written and presents a very good example.  Kudos!

2.  It serves a good purpose in suggesting ways that people can indicate 
a distinction between a thing and its description.

3. The document tries to do two different things:

  - It suggests certain RDF properties that data publishers can use to 
indicate to data consumers whether a URI in that data will dereference 
to the thing that the URI denotes, versus dereferencing to a description 
of the thing that it denotes.

  - It vaguely describes a protocol between data authors/publishers, URI 
owners and data consumers, for coordinating the provision and use of URI 
definitions, roughly along the lines of
http://www.w3.org/wiki/UriDefinitionDiscoveryProtocol

The first of these goals is achieved very nicely, and I believe the 
document should be slightly retitled to more tightly convey this goal, 
and should focus only on this goal.

There are multiple problems with attempting to achieve the second goal 
in this document:

  - It is an independent goal, and would be better addressed in its own 
document.

  - It is very mushy as written.  It does not rise to the level of 
precision that a protocol specification needs.  For this reason, it 
would be harmful to publish as is, because it would create more 
confusion rather than adding clarity.  I *do* think it is a worthy goal 
(and I would be happy to help work on it; as I am sure you and others 
are aware, I have devoted a great deal of time and thought to 
figuring out these issues), but it belongs in its own document.

I imagine that there are those who would claim that it is okay for this 
part of the document to be mushy, in the belief that such a protocol is 
impossible or impractical or whatever.  I firmly disagree.  But 
regardless, that is a question that should be decided on its own merits, 
rather than by publishing a mushy spec under the *assumption* that 
nothing better could be done.

4. There is an important omission in the Note of Section 3, which 
reminds the user of the importance of using different URIs for 
different things:
[[
If the URI http://photo.example.com/psd/12345 supported content 
negotiation such that a request with Accept: text/html provided an HTML 
page but a request with Accept: image/jpeg returned the image, the URI 
is being used to identify two distinct resources: the image and the 
landing page. As discussed in The Architecture of the World Wide Web 
[WEBARCH], this pattern should be avoided: different resources should be 
named with different URIs.
]]

What this Note fails to acknowledge is that it is useful, and even 
*recommended*, to use a single URI for different resources when those 
resources have something in common at a more abstract level.  This is 
indeed the purpose of content negotiation: it allows a single URI to 
abstractly identify a work that may be rendered and served in different 
languages or different media types.
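To make the mechanism concrete, here is a minimal sketch of how a 
server might choose, for the single URI from the Note, between an HTML 
page and a JPEG image based on the client's Accept header.  The 
function names, the AVAILABLE table and the simplified Accept parsing 
are illustrative assumptions, not anything the primer or AWWW specifies.

```python
# Hypothetical sketch: one URI, several representations, chosen by the
# client's Accept header.  Simplified: media-type parameters other than
# q are ignored, and this is not a complete HTTP implementation.

URI = "http://photo.example.com/psd/12345"  # the single URI under discussion

# Media types the (hypothetical) server can produce for URI, in server
# preference order; the earlier one wins when qualities tie.
AVAILABLE = ("text/html", "image/jpeg")


def parse_accept(header):
    """Return (media_range, q) pairs from an Accept header value."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((media_range, q))
    return ranges


def specificity(media_range):
    """Rank ranges: */* (0) < type/* (1) < type/subtype (2)."""
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*"):
        return 1
    return 2


def matches(media_range, media_type):
    """True if media_range (e.g. 'image/*') covers media_type."""
    if media_range == "*/*":
        return True
    rtype, _, rsub = media_range.partition("/")
    mtype, _, msub = media_type.partition("/")
    return rtype == mtype and rsub in ("*", msub)


def negotiate(accept="*/*"):
    """Pick the media type to serve for URI, or None (HTTP 406)."""
    ranges = parse_accept(accept)
    best_type, best_q = None, 0.0
    for media_type in AVAILABLE:
        hits = [(specificity(r), q) for r, q in ranges if matches(r, media_type)]
        # The most specific matching range determines the quality.
        q = max(hits)[1] if hits else 0.0
        if q > best_q:
            best_type, best_q = media_type, q
    return best_type
```

For example, negotiate("text/html") selects the landing page and 
negotiate("image/jpeg") selects the image, both for the same URI.  
Nothing in the mechanism decides whether those two responses are one 
abstract resource or two; that judgement lies outside it.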

The problem with this omission is that it covers up an inconvenient 
iceberg of truth that underlies this issue: there is no objective way to 
say whether two resources should or should not be viewed as the same 
resource at a more abstract level.  Indeed, Markus Lanthaler touched on 
this when he noted that a JPEG image and an HTML document could indeed 
identify the same resource:
http://www.w3.org/mid/007701ce1dbe$64ccd3e0$2e667ba0$@lanthaler@gmx.net
As humans, we often naively assume that our own subjective notions of 
what should or should not be considered different resources are based on 
objective, absolute distinctions.  After all, as Markus notes, "it would 
be insane" to use a differently colored div for each pixel in an HTML 
document.  But those notions are not absolute.  They are based on 
implicit, subjective assumptions that may not hold for a particular URI 
owner.

Of course, the flip side of this iceberg is the problem of ambiguity: it 
is generally impossible to be completely unambiguous about the resource 
that a URI identifies.  Fortunately, URIs can be unambiguous enough to 
be useful for many applications, even while being ambiguous for other 
applications.  As you know, I've discussed this quite a lot over the 
years, so I won't go further into that at present.

Returning to the Note in Section 3, I would suggest explicitly 
acknowledging that there can be different viewpoints about what 
constitutes the same or different resources, and architecturally it is 
up to the URI owner to decide whether two resources are the same 
resource (at a more abstract level) or different.  This would align with 
the AWWW's existing guidance on the meaning of a URI containing a 
fragment identifier, when different media types are served via content 
negotiation.  As AWWW section 3.2.2 states:
http://www.w3.org/TR/webarch/#p137
"The representation provider decides when definitions of fragment 
identifier semantics are sufficiently consistent."
Ultimately, the decision about whether two resources need to be 
considered the same or different depends on the applications that will 
use them, as different distinctions matter to different applications.

Perhaps it would be enough to add a sentence at the end of the Note in 
Section 3, roughly along these lines: "However, it is up to the data 
publisher to decide whether these resources are similar enough to be 
considered the same at a more abstract level, in which case the URI is 
really identifying that abstract resource."

5.  Regarding the JSON examples, although the JSON and Turtle examples 
are asserted to be equivalent, JSON, when interpreted as a serialization 
of RDF, is not self-describing
http://www.w3.org/2001/tag/doc/selfDescribingDocuments.html
because a recipient knowing only that it was JSON would not know the 
conventions for interpreting it as an RDF serialization.  Section 2 does 
mention JSON-LD, which hopefully will become a full-fledged RDF 
serialization (though last I knew it was at risk of becoming a 
competing language), in which case JSON-LD would be self-describing 
because of its JSON-LD media type.  My suggestion: make clear that the 
examples are JSON-LD (and pray that JSON-LD is standardized as an RDF 
serialization), and thus that they should be delivered with a JSON-LD 
media type, not merely as generic JSON.
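A minimal sketch of the self-description point, assuming a consumer 
that looks only at the Content-Type header.  The set of RDF media types 
and the function names are illustrative assumptions; application/ld+json 
is the media type proposed for JSON-LD, and the check deliberately 
ignores any other mechanisms (such as HTTP Link headers) that JSON-LD 
may define for plain JSON.

```python
# Hypothetical sketch: can a consumer that knows ONLY the Content-Type
# header tell that a payload is meant to be read as RDF?

# Media types assumed to name RDF serializations.  "application/ld+json"
# is the media type proposed for JSON-LD; this table is illustrative only.
RDF_MEDIA_TYPES = {
    "text/turtle",
    "application/rdf+xml",
    "application/ld+json",
}


def media_type_of(content_type):
    """Strip parameters, e.g. 'text/turtle; charset=utf-8' -> 'text/turtle'."""
    return content_type.split(";")[0].strip().lower()


def is_self_describing_rdf(content_type):
    """True if the media type alone says 'interpret this as RDF'.

    Plain application/json returns False: the recipient knows the bytes
    are JSON, but not the conventions for mapping them to RDF triples.
    """
    return media_type_of(content_type) in RDF_MEDIA_TYPES
```

Serving the primer's examples as application/json would fall into the 
False case, which is the gap described above.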

6. It may be helpful to have the examples in both JSON-LD and Turtle 
throughout, because of the different audiences for this document.  I 
don't feel strongly about this though.  It's a judgement call.

Finally, on the meta level, I would politely suggest that you and other 
members of the TAG be more active in attempting to include me in such 
work in the future.  Given how much time and thought I have put into 
these topics over the years -- as I am sure you are aware -- and given 
how important it is to reach community consensus, it is disappointing 
that you and others in the TAG did not reach out to include me.

Best wishes,
David

On 03/07/2013 06:01 PM, Jeni Tennison wrote:
> Hi,
>
> I have updated the draft of the "URIs in Data Primer" which is here:
>
> http://www.w3.org/2001/tag/doc/uris-in-data-2013-03-07/
>
> I have included within a "Background" Appendix a summary of where we
> are with this work and a bit of how we got here:
>
> http://www.w3.org/2001/tag/doc/uris-in-data-2013-03-07/#background
>
> Noah, I have also updated the product page and there is a new version
> at:
>
> http://www.w3.org/2001/tag/products/defininguris-2013-03-07.html
>
> Cheers,
>
> Jeni
>
Received on Wednesday, 20 March 2013 03:20:14 UTC