Re: [Fwd: Reversing HTTP Range 14 and SemWeb Cool URIs decision] from Jonathan Rees on 2011-01-20 (public-awwsw@w3.org from January 2011)

From: Jonathan Rees <jar@creativecommons.org>
Date: Thu, 20 Jan 2011 18:42:03 -0500
To: Harry Halpin <hhalpin@w3.org>
Cc: nathan@webr3.org, AWWSW TF <public-awwsw@w3.org>
Message-ID: <AANLkTi=DMfPzk-LbMLLzVJu7nQ0rWDqOO26i_wkSKZjz@mail.gmail.com>

On Thu, Jan 20, 2011 at 5:54 PM, Harry Halpin <hhalpin@w3.org> wrote:
>> This is at least the third time the suggestion to change the
>> convention has come up in recent months. There's clearly a disconnect
>> of some kind.
>>
>> The issue is interoperability. If different agents interpret a URI in
>> incompatible ways, confusion will result. The httpRange-14 rule nudges
>> the community to interpret certain URIs in a particular way, removing
>> choice in the interest of consistency.
>
> Actually, I'm on the side against 303s. It's a smart hack, but
> nonetheless a hack, and IMHO likely hurting adoption of Linked Data. I'm
> happy for people not to discuss this here, but I do think we should be
> worried about having an over-engineered and metaphysically murky solution
> when Microsoft is now out with a direct RDF competitor in the form of
> oData that requires no 303s or metaphysics.
>
> Why 303?
>
> 1) It clearly separates URIs for "things" on the semantic web from their
> descriptions.
> 2) It's a common convention

and don't forget about web architecture.

> But...
>
> 1) It's doing in a status code that which could easily be done in some
> triples themselves accessed from the URI.
>
> 2) It causes excessive round-trips/redirects for no good reasons, which
> irritates webmasters and bots.

.host-meta ?

> 3) It is non-trivial to set up via Apache

Hmm.  More details please. (Not pushing back, really want to know.)
This has not been my experience.
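
For concreteness, here is a minimal sketch of the 303 pattern under
discussion. It is written with Python's standard http.server rather than
Apache, so it says nothing about how hard the Apache setup is. The
/id/ and /doc/ path layout, the respond() helper and the port are
assumptions made up for this illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical URI layout: /id/<name> names a thing, /doc/<name> describes it.
THING_PREFIX = "/id/"
DOC_PREFIX = "/doc/"


def respond(path):
    """Return (status, headers, body) for a GET on the given request path."""
    if path.startswith(THING_PREFIX):
        name = path[len(THING_PREFIX):]
        # The thing itself is never served; redirect to its description.
        return 303, {"Location": DOC_PREFIX + name}, b""
    if path.startswith(DOC_PREFIX):
        name = path[len(DOC_PREFIX):]
        # One Turtle triple whose subject is the thing, not the document.
        turtle = '<%s%s> <http://www.w3.org/2000/01/rdf-schema#label> "%s" .\n' % (
            THING_PREFIX, name, name)
        return 200, {"Content-Type": "text/turtle"}, turtle.encode("utf-8")
    return 404, {"Content-Type": "text/plain"}, b"not found\n"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers, body = respond(self.path)
        self.send_response(status)
        for key, value in headers.items():
            self.send_header(key, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the sketch locally; call serve() by hand to try it out."""
    HTTPServer(("localhost", port), Handler).serve_forever()
```

A GET on /id/bob costs two round trips here (the 303, then the 200 on
/doc/bob), which is the overhead objection 2) refers to.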

> 4) And most webmasters can't even do that, so it makes just putting an RDF
> file on the Web a "bad thing" without access. Great for adoption :)
>
> 5) It's not a standard. It's on no W3C Recs. It's a W3C TAG finding, and
> primarily something Tim wrote in a design note that got put forth by SWEO
> as Note. So there's no process to go through to change it to begin with.

Yes there is. The TAG is obligated by charter to turn its findings and
resolutions into architectural recommendations. It could be on rec
track tomorrow, if anyone cared. It ought to be on the TAG's radar,
but I can't do it alone. Bring us a petition!

> 6) The only practical example of code failing by lack of using it involves
> "owl:sameAs" and OWL reasoning, and owl:sameAs is demonstrably mis-used
> anyways (see paper in ISWC 2010) and OWL reasoning of any kind is not used
> often with Linked Data to begin with.

So you're not worried about Dublin Core and CC metadata getting
attached to the wrong subject?

> I think 1) would win the game by itself. That's why we made the IRW
> ontology, to show that it could be done outside status codes, and
> therefore one does not have to put metaphysics in HTTP status codes where
> it doesn't belong. But 1) combined with practical reasons like (2->4) and
> the fact that it's not a standard (5), makes me think that just like
> namespaces eventually got dropped due to lack of support by all but the
> few in the HTML5 DOM, 303s will eventually be dropped if open linked data
> wants to actually get used. It's unfortunate but likely. But I'd rather
> have that happen than the world not use RDF.
>
> And, I might add,  I _like_ 303. It's a very creative hack. I think it
> should be an *option*, but we need easier options to distinguish what we
> mean by URIs.
>
> And "#" is not much better. To be blunt, people forget to put it there
> when writing and cut and pasting URIs, and it's applied inconsistently
> across the XML and RDF universes. It puts a magic redirect in where one
> isn't necessary to begin with.

I hadn't heard these objections before.  Will put them in my pipe and
smoke them.  More concrete information would help.

> In essence: distinctions that can be done in formats should probably not
> be made on the protocol level. The protocol should tell you how to
> interpret the format.

I think the argument is that HREF="xxx" is a reference to xxx and that
it is a reference to a document (hypertext node, web page, pick one),
and similarly the HTTP protocol is a language and its URIs have no
more or less or different 'meaning' than URIs anywhere else. But this
only makes sense from the point of view of the specs; it will hardly
have any force for someone who doesn't already believe.

My question, nonpolemically, is what exactly is the follow-your-nose
story under any alternative to the httpRange-14 rule? httpRange-14 has
all sorts of flaws, but I'm not sure that the alternatives make any
more sense than it does. And I would think that a different rule would
lead to misinterpretation of large numbers of CC license
assertions, something I have an interest in. That seems like a
problem.

And if those URIs did refer to something other than the web page, how
would you refer to the web page?... maybe IRW solves that.
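
To make the rule being debated concrete, here is a rough sketch of how
an agent might apply it when following its nose. The interpret()
function and its return strings are invented for this illustration; the
status-code cases follow the TAG's httpRange-14 resolution as I
understand it, and the hash case reflects the usual convention rather
than anything in that resolution.

```python
from urllib.parse import urldefrag


def interpret(uri, status):
    """Say what a URI may refer to, given the status of a GET on it.

    `status` is the HTTP status code returned for the URI with any
    fragment removed (fragments are not sent to the server).
    """
    _, fragment = urldefrag(uri)
    if fragment:
        # Hash URI: the retrieved document decides what the fragment names.
        return "anything"
    if 200 <= status < 300:
        # 2xx: the URI identifies an information resource (e.g. a web page).
        return "information resource"
    if status == 303:
        # 303: no constraint; read the redirect target to learn more.
        return "anything"
    # 4xx and everything else: the nature of the resource is unknown.
    return "unknown"
```

Under this rule a CC license assertion about a 200-responding URI is
read as being about the page. An alternative rule would have to return
something other than "information resource" in that branch, and would
then need some other way to name the page.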

I wish I could keep up with the mailing lists and IRC. Is anyone among
the semweb masses talking about using tdb: ?

Thanks
Jonathan

> Anyways, my two cents, and I'm happy to stop by and discuss IRW at some
> point.
>
>   cheers,
>         harry
>
>
>>
>> One answer to the engineers is: If you're doing this in the privacy of
>> your own world, it doesn't matter whether you use a 200 or 303 or 627
>> or # or ?. But for the semantic web you're advised to use # (or 303)
>> because that's the way the semantic web is designed. If you write RDF
>> that uses a 200-responding URI in some creative way, someone else
>> (e.g. an agent that either doesn't read, or doesn't believe, or
>> doesn't know how to process the content) could use the URI to refer to
>> the web page, and friction will result.
>>
>> I sort of understand why people think 303 is hard (but I have to take
>> it on faith since with the software I use it's easy). I really don't
>> understand why people think using # is a burden. (I understand Alan
>> R's objections to #, but they're of a very different nature.)
>>
>> As I've recently said to Ed Summers, if someone really thinks that
>> changing the convention is possible and necessary, they ought to go
>> through process, and talk to the groups responsible for the technology
>> that they are enjoying: the TAG, SWIG, SWCG, etc. Whinging about this
>> on some semantic web list undermines the idea of standards. Otherwise
>> - just deal with it.
>>
>> Complaining that httpRange-14 has never gone rec track (to receive
>> review) is absolutely legitimate - but no one has come to the TAG and
>> asked for it to. I've never heard anyone question the legitimacy of
>> the decision, only the wisdom of it.
>>
>> Anyhow... sorry, we shouldn't take up the debate on this list - we're
>> all fatigued by it. I think AWWSW's job might be:
>> 1. how would you refer to a web page, defensively, if you didn't know
>> how the receiving agent was going to interpret 200-yielding URIs
>> (httpRange-14 vs. whatever)?  (using terms from the awwsw vocabulary,
>> I would presume.)
>> 2. is there any practical reason to choose the httpRange-14 rule over
>> some other rule? (I believe there is - e.g. it gives an easy way to
>> write metadata and metadata is good.)
>> 3. advise the TAG on PR or other followup regarding the httpRange-14 rule
>>
>> Jonathan
>>
>> On Sun, Jan 16, 2011 at 6:18 PM, Nathan <nathan@webr3.org> wrote:
>>> fwd: A real world use case from Manu Sporny in reply to my earlier mail:
>>>
>>> -------- Original Message --------
>>> Subject: Reversing HTTP Range 14 and SemWeb Cool URIs decision
>>> Date: Sun, 16 Jan 2011 18:06:29 -0500
>>> From: Manu Sporny <msporny@digitalbazaar.com>
>>> To: Nathan Rixham <nathan@webr3.org>
>>>
>>> Hi all,
>>>
>>> Just wanted to follow up Nathan's post with a few more points. I pinged
>>> him about this very issue yesterday because I wanted to see if he had
>>> come to the same conclusion that we had after a number of years of
>>> dealing with the repercussions of the HTTP Range 14 decision and the
>>> solution offered by the Cool URIs for the Semantic Web[1] document.
>>>
>>> In general, I agree with Nathan's argumentation. The concepts outlined
>>> in [1] do make sense in many cases, but when implemented in a business
>>> environment, they lead to conceptual and engineering overhead that our
>>> developers are just not willing to accept. The Cool URIs for the
>>> Semantic Web document is needlessly restrictive. I'll explain the reason
>>> below.
>>>
>>> Here is some background on why this matters to our company, in
>>> particular:
>>>
>>> We are working on a world standard for Universal Web Payment that we
>>> hope to send through the W3C or IETF process at some point. It's called
>>> PaySwarm and you can find out more about it here:
>>>
>>> http://payswarm.com/
>>>
>>> PaySwarm is a decentralized payment system with multiple transaction
>>> processors (effectively, banks) that are capable of executing commercial
>>> transactions (think buying and selling digital goods, like webapp stores
>>> or access to blog content/entertainment). These transaction processors
>>> create legally enforceable digital contracts between people and
>>> transfer money between financial accounts. A demonstration of the
>>> purchasing process can be found here:
>>>
>>> http://digitalbazaar.com/2010/09/12/payswarm-api/
>>>
>>> The end result of a purchase between a website operator and a customer
>>> is a Contract. The Contract contains a number of digital signatures,
>>> Assets (a description of what was purchased), a License, financial
>>> accounts, the asset provider, the purchaser and a number of other
>>> "things" that are all identified via URIs.
>>>
>>> We want these URIs to be very long lived - 10, 20 or even 50+ years.
>>>
>>> We are using HTML+RDFa to express semantics via the pages and JSON-LD[2]
>>> for the messaging between all the distributed parts of the system.
>>>
>>> You can see an example of what a financial transfer looks like in this
>>> system here (skip down to the Example):
>>>
>>> http://purl.org/commerce#transfer
>>>
>>> In other words, we have completely bought into the benefits that the
>>> Semantic Web can offer and we are building a decentralized financial
>>> system that takes advantage of those benefits.
>>>
>>> This brings us to the end of Nathan's postal code example:
>>>
>>> Nathan Rixham wrote:
>>>>
>>>> I don't see a problem with any of the above, neither does anybody in
>>>> this scenario afaict - can you see a problem?
>>>>
>>>> Now, this example isn't constrained technically, and nothing breaks -
>>>> however it does break current web arch rules because:
>>>>  - it doesn't use the 303 pattern
>>>>  - it fails to make any distinction between IR and NIR
>>>>  - it fails to make any distinction between primary and secondary
>>>>    resources
>>>>  - it names the same concept in one case as an IR, and in another as
>>>>    an NIR, although they're both the same thing
>>>>  - the cool uris, change (they're uncool)
>>>
>>> Currently, our identifiers for long-lived resources look like this:
>>>
>>> https://mybank.example.com/deposits/7f7f62ab53cc623cd28#deposit
>>> https://mybank.example.com/accounts/3879279824#account
>>> https://payswarm.example.com/people/john-doe#public-key-5
>>>
>>> However, when these identifiers were presented to our development team,
>>> the first question that arose was:
>>>
>>> "Why do we need to have #deposit at the end of the deposit URL? The same
>>> for account? We've designed the system to be RESTful, why are we
>>> duplicating information in the URL?"
>>>
>>> That is, instead of this:
>>>
>>> https://mybank.example.com/deposits/7f7f62ab53cc623cd28#deposit
>>>
>>> Why not this?
>>>
>>> https://mybank.example.com/deposits/7f7f62ab53cc623cd28
>>>
>>> We are designing the system to return an HTML+RDFa representation of the
>>> resource above when a browser hits the resource. It will contain both
>>> human-readable information and semantic data that is specific to the
>>> deposit. We are not talking about a concept that doesn't exist on the
>>> web. The concept only exists on the web and it's at the URL listed
>>> above.
>>>
>>> Our engineers see no reason to perform a 303 re-direct and maintain 3
>>> URLs (identifier, HTML, and RDF) for each identified resource in our
>>> system. That is, implementing the solution doesn't seem to prevent any
>>> nasty technical issues from arising, except perhaps in reasoning agents
>>> that assume certain things about what's at the end of a URL (it's not
>>> always a document). Why the implementation burden when there isn't a
>>> clear technical benefit?
>>>
>>> We don't need to re-hash all of the HTTP Range 14 discussions, nor any
>>> of the many e-mails and blog posts discussing the topic at length. We've
>>> read many of those documents and don't see why Semantic Web applications
>>> can't be implemented with non-hash, non-303 URLs if they are designed
>>> properly. That is to say that hash URLs and 303 results are reasonable
>>> mechanisms for semantic web identification, but so is a clean URL that
>>> returns HTML+RDFa, like:
>>>
>>> https://mybank.example.com/deposits/7f7f62ab53cc623cd28
>>>
>>> More to the point:
>>>
>>>>  - it doesn't use the 303 pattern
>>>
>>> The 303 pattern is very difficult for non-experts to implement correctly
>>> and many budgets do not allow for separate RDF documents and separate
>>> HTML documents. Our developers groan every time this solution is brought
>>> up, and for good reason. I can't find a good argument for why our
>>> company should spend the money to manage a URL space that is 3x as big
>>> for no clear technical benefit.
>>>
>>>>  - it fails to make any distinction between IR and NIR
>>>
>>> I continue to fail to see the relevant technical reason for the
>>> distinction. I understand the argumentation, but it is but one world
>>> view, and is largely philosophical. If one takes the position that
>>> "There is no such thing as a non-information resource on the Semantic
>>> Web", does a technical issue arise? It's all information and information
>>> is contextual, right?
>>>
>>>>  - it fails to make any distinction between primary and secondary
>>>>    resources
>>>
>>> In this particular case, it doesn't matter and largely simplifies the
>>> concepts put forth on the semantic web. This was a big point of
>>> contention among our developers - understanding that #fragment
>>> identifiers referred to one thing in a graphical browser (the @id
>>> attribute) and another thing in a semantic web application was a painful
>>> jump for them to make. It's not that they didn't get it - they just
>>> thought it was an awful hack and goes against the concepts of REST that
>>> they use when developing our systems. That is:
>>>
>>> You can't do this via HTTP 1.1:
>>>
>>> DELETE http://example.org/foo#bar
>>>
>>> and hope to only delete the information identified by #bar... you will
>>> most likely delete the entire foo resource instead. Contrast that with
>>> this:
>>>
>>> DELETE http://example.org/people/bob/public-keys/1
>>>
>>> You could delete just the information about key 1 using HTTP semantics
>>> above... it's much cleaner.
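
The reason the first DELETE cannot work is that a client drops the
fragment before it builds the request, so the server never sees #bar.
A small sketch of that step, using Python's urllib.parse; the
request_target() helper is a name made up for this illustration:

```python
from urllib.parse import urlsplit


def request_target(uri):
    """Return the path-and-query part of a URI that goes on the HTTP request line.

    The fragment is deliberately left out: it is handled by the client
    and is never transmitted to the server.
    """
    parts = urlsplit(uri)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


# Both hash and non-hash forms of /foo produce the same request line.
print("DELETE", request_target("http://example.org/foo#bar"))
print("DELETE", request_target("http://example.org/people/bob/public-keys/1"))
```

The first call yields /foo, the whole resource, while the second yields
a path that addresses only the key.
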
>>>
>>>>  - it names the same concept in one case as an IR, and in another as
>>>>    an NIR, although they're both the same thing
>>>
>>> Again, I hold that we shouldn't be making a distinction between an IR
>>> and an NIR when choosing semantic web identifiers. While there may be
>>> good uses for NIR (I honestly don't know of any), I don't think that the
>>> semantic web should use them as a part of its core architecture. If you
>>> use an identifier on the semantic web, it should be an IR, redirect to
>>> an IR (an alias, but only if the IR moved), or not exist at all.
>>>
>>>>  - the cool uris, change (they're uncool)
>>>
>>> Yes, we do all want Cool URIs, but there are cases where it is
>>> impossible to keep the URIs around forever. As we are all painfully
>>> aware of these days, banks fail - and merge. Just because my account is:
>>>
>>> http://blue-bank.example.com/accounts/28394723984
>>>
>>> today, doesn't mean that the website will exist tomorrow. Or that after
>>> the merge, it won't be:
>>>
>>> http://red-bank.example.com/accounts/afd-akop
>>>
>>> with a semantic web re-direct/alias from the first to the second. So,
>>> while we all want URIs that don't change - they will, and more often
>>> than we'd like.
>>>
>>> For financial systems like PaySwarm, especially because it will become
>>> an open standard, it is important that we have a clear solution on how
>>> to deal with these issues.
>>>
>>> My fear is that if we continue pushing [1] and the HTTP Range 14
>>> decision, developers will likely continue to view the Semantic Web
>>> as an over-engineered solution to a problem that they rarely care
>>> about... and I wouldn't blame them if they were to do that. We can't
>>> change developer workflow as drastically as the SemWeb Cool URIs
>>> document proposes - especially when it is unnecessary to do so.
>>>
>>> Thoughts?
>>>
>>> -- manu
>>>
>>> [1] http://www.w3.org/TR/cooluris/
>>> [2] http://json-ld.org/
>>>
>>> --
>>> Manu Sporny (skype: msporny, twitter: manusporny)
>>> President/CEO - Digital Bazaar, Inc.
>>> blog: Linked Data in JSON
>>> http://digitalbazaar.com/2010/10/30/json-ld/
>>>
>>>
>>>
>>>
>>
>>
>
>
Received on Thursday, 20 January 2011 23:42:34 UTC