[Fwd: Reversing HTTP Range 14 and SemWeb Cool URIs decision] from Nathan on 2011-01-16 (public-awwsw@w3.org from January 2011)

From: Nathan <nathan@webr3.org>
Date: Sun, 16 Jan 2011 23:18:31 +0000
To: AWWSW TF <public-awwsw@w3.org>
Message-ID: <4D337CC7.4030401@webr3.org>
fwd: A real world use case from Manu Sporny in reply to my earlier mail:

-------- Original Message --------
Subject: Reversing HTTP Range 14 and SemWeb Cool URIs decision
Date: Sun, 16 Jan 2011 18:06:29 -0500
From: Manu Sporny <msporny@digitalbazaar.com>
To: Nathan Rixham <nathan@webr3.org>

Hi all,

Just wanted to follow up Nathan's post with a few more points. I pinged
him about this very issue yesterday because I wanted to see if he had
come to the same conclusion that we had after a number of years of
dealing with the repercussions of the HTTP Range 14 decision and the
solution offered by the Cool URIs for the Semantic Web[1] document.

In general, I agree with Nathan's argumentation. The concepts outlined
in [1] do make sense in many cases, but when implemented in a business
environment, they lead to conceptual and engineering overhead that our
developers are just not willing to accept. The Cool URIs for the
Semantic Web document is needlessly restrictive. I'll explain the reason
below.

Here is some background on why this matters to our company, in particular:

We are working on a world standard for Universal Web Payment that we
hope to send through the W3C or IETF process at some point. It's called
PaySwarm, and you can find out more about it here:

http://payswarm.com/

PaySwarm is a decentralized payment system with multiple transaction
processors (effectively, banks) that are capable of executing commercial
transactions (think buying and selling digital goods, like webapp stores
or access to blog content/entertainment). These transaction processors
create legally enforceable digital contracts between people and
transfer money between financial accounts. A demonstration of the
purchasing process can be found here:

http://digitalbazaar.com/2010/09/12/payswarm-api/

The end result of a purchase between a website operator and a customer
is a Contract. The Contract contains a number of digital signatures,
Assets (a description of what was purchased), a License, financial
accounts, the asset provider, the purchaser and a number of other
"things" that are all identified via URIs.

We want these URIs to be very long lived - 10, 20 or even 50+ years.

We are using HTML+RDFa to express semantics via the pages and JSON-LD[2]
for the messaging between all the distributed parts of the system.

You can see an example of what a financial transfer looks like in this
system here (skip down to the Example):

http://purl.org/commerce#transfer

In other words, we have completely bought into the benefits that the
Semantic Web can offer and we are building a decentralized financial
system that takes advantage of those benefits.

This brings us to the end of Nathan's postal code example:

Nathan Rixham wrote:
> I don't see a problem with any of the above, neither does anybody in
> this scenario afaict - can you see a problem?
>
> Now, this example isn't constrained technically, and nothing breaks -
> however it does break current web arch rules because:
>   - it doesn't use the 303 pattern
>   - it fails to make any distinction between IR and NIR
>   - it fails to make any distinction between primary and secondary
>     resources
>   - it names the same concept in one case as an IR, and in another as
>     an NIR, although they're both the same thing
>   - the cool uris, change (they're uncool)

Currently, our identifiers for long-lived resources look like this:

https://mybank.example.com/deposits/7f7f62ab53cc623cd28#deposit
https://mybank.example.com/accounts/3879279824#account
https://payswarm.example.com/people/john-doe#public-key-5

However, when these identifiers were presented to our development team,
the first question that arose was:

"Why do we need to have #deposit at the end of the deposit URL? The same
for account? We've designed the system to be RESTful, why are we
duplicating information in the URL?"

That is, instead of this:

https://mybank.example.com/deposits/7f7f62ab53cc623cd28#deposit

Why not this?

https://mybank.example.com/deposits/7f7f62ab53cc623cd28

We are designing the system to return an HTML+RDFa representation of the
resource above when a browser hits the resource. It will contain both
human-readable information and semantic data that is specific to the
deposit. We are not talking about a concept that doesn't exist on the
web. The concept only exists on the web and it's at the URL listed above.

Our engineers see no reason to perform a 303 re-direct and maintain 3
URLs (identifier, HTML, and RDF) for each identified resource in our
system. That is, implementing the solution doesn't seem to prevent any
nasty technical issues from arising, except perhaps in reasoning agents
that assume certain things about what's at the end of a URL (it's not
always a document). Why the implementation burden when there isn't a
clear technical benefit?
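
To make the URL-count argument concrete, here is a minimal Python sketch
of the two routing styles. It is not PaySwarm code: the function names
and the /id/, /page/ and /data/ path layout are assumptions made up for
illustration, and the Accept handling is deliberately crude.

```python
def route_303(path, accept):
    """303 style: the identifier URL redirects to one of two document URLs."""
    if path.startswith("/id/"):
        name = path[len("/id/"):]
        # Crude Accept check for illustration only (no q-value parsing).
        prefix = "/data/" if "application/rdf+xml" in accept else "/page/"
        return 303, {"Location": prefix + name}
    if path.startswith("/data/"):
        return 200, {"Content-Type": "application/rdf+xml"}
    if path.startswith("/page/"):
        return 200, {"Content-Type": "text/html"}
    return 404, {}


def route_single(path, accept):
    """Single-URL style: the identifier itself answers with HTML+RDFa."""
    if path.startswith("/deposits/"):
        return 200, {"Content-Type": "text/html"}
    return 404, {}
```

In the first function every deposit needs three routable URLs (the
identifier plus two documents); in the second, one URL serves both
people and machines.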

We don't need to re-hash all of the HTTP Range 14 discussions, nor any
of the many e-mails and blog posts discussing the topic at length. We've
read many of those documents and don't see why Semantic Web applications
can't be implemented with non-hash, non-303 URLs if they are designed
properly. That is to say that hash URLs and 303 results are reasonable
mechanisms for semantic web identification, but so is a clean URL that
returns HTML+RDFa, like:

https://mybank.example.com/deposits/7f7f62ab53cc623cd28

More to the point:

>   - it doesn't use the 303 pattern

The 303 pattern is very difficult for non-experts to implement correctly
and many budgets do not allow for separate RDF documents and separate
HTML documents. Our developers groan every time this solution is brought
up, and for good reason. I can't find a good argument for why our
company should spend the money to manage a URL space that is 3x as big
for no clear technical benefit.

>   - it fails to make any distinction between IR and NIR

I continue to fail to see the relevant technical reason for the
distinction. I understand the argumentation, but it is but one world
view, and is largely philosophical. If one takes the position that
"There is no such thing as a non-information resource on the Semantic
Web", does a technical issue arise? It's all information and information
is contextual, right?

>   - it fails to make any distinction between primary and secondary
>     resources

In this particular case, the distinction doesn't matter, and dropping it
largely simplifies the concepts put forth on the semantic web. This was
a big point of contention among our developers - understanding that
#fragment identifiers referred to one thing in a graphical browser (the
@id attribute) and another thing in a semantic web application was a
painful jump for them to make. It's not that they didn't get it - they
just thought it was an awful hack that goes against the concepts of REST
that they use when developing our systems. That is:

You can't do this via HTTP/1.1:

DELETE http://example.org/foo#bar

and hope to only delete the information identified by #bar... you will
most likely delete the entire foo resource instead. Contrast that with this:

DELETE http://example.org/people/bob/public-keys/1

You could delete just the information about key 1 using HTTP semantics
above... it's much cleaner.
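
The reason is that the fragment never reaches the server: a client
strips it before building the request line. A small Python sketch using
the standard urllib.parse module shows what the server would see for
each of the two URLs (request_target is a hypothetical helper name, not
part of any library):

```python
from urllib.parse import urlsplit


def request_target(url):
    """Return the path (plus query) that a client puts on the request line."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # parts.fragment is deliberately left out: it is never sent to the server.
    return target


print(request_target("http://example.org/foo#bar"))
# prints: /foo
print(request_target("http://example.org/people/bob/public-keys/1"))
# prints: /people/bob/public-keys/1
```

So a DELETE on the first URL can only ever address /foo as a whole,
while the second URL addresses the key on its own.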

>   - it names the same concept in one case as an IR, and in another as
>     an NIR, although they're both the same thing

Again, I hold that we shouldn't be making a distinction between an IR
and a NIR when choosing semantic web identifiers. While there may be
good uses for NIR (I honestly don't know of any), I don't think that the
semantic web should use them as a part of its core architecture. If you
use an identifier on the semantic web, it should be an IR, redirect to
an IR (an alias, but only if the IR moved), or not exist at all.

>   - the cool uris, change (they're uncool)

Yes, we do all want Cool URIs, but there are cases where it is
impossible to keep the URIs around forever. As we are all painfully
aware these days, banks fail - and merge. Just because my account is:

http://blue-bank.example.com/accounts/28394723984

today, doesn't mean that the website will exist tomorrow. Or that after
the merge, it won't be:

http://red-bank.example.com/accounts/afd-akop

with a semantic web re-direct/alias from the first to the second. So,
while we all want URIs that don't change - they will, and more often
than we'd like.
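
As a rough sketch of what following such an alias could look like on the
consuming side, assuming the aliases have already been collected into a
plain dictionary (the resolve function and the max_hops limit are
illustrative assumptions, not part of PaySwarm or any standard):

```python
def resolve(uri, aliases, max_hops=10):
    """Follow alias entries until reaching a URI that has no further alias."""
    seen = set()
    while uri in aliases:
        # Guard against alias loops and unreasonably long chains.
        if uri in seen or len(seen) >= max_hops:
            raise ValueError("alias loop or chain too long: " + uri)
        seen.add(uri)
        uri = aliases[uri]
    return uri


aliases = {
    "http://blue-bank.example.com/accounts/28394723984":
        "http://red-bank.example.com/accounts/afd-akop",
}

print(resolve("http://blue-bank.example.com/accounts/28394723984", aliases))
# prints: http://red-bank.example.com/accounts/afd-akop
```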

For financial systems like PaySwarm, especially because it will become
an open standard, it is important that we have a clear solution on how
to deal with these issues.

My fear is that if we continue pushing [1] and the HTTP Range 14
decision, developers will likely continue to view the Semantic Web
as an over-engineered solution to a problem that they rarely care
about... and I wouldn't blame them if they were to do that. We can't
change developer workflow as drastically as the SemWeb Cool URIs
document proposes - especially when it is unnecessary to do so.

Thoughts?

-- manu

[1] http://www.w3.org/TR/cooluris/
[2] http://json-ld.org/

-- 
Manu Sporny (skype: msporny, twitter: manusporny)
President/CEO - Digital Bazaar, Inc.
blog: Linked Data in JSON
http://digitalbazaar.com/2010/10/30/json-ld/
Received on Sunday, 16 January 2011 23:19:15 UTC