Re: [Fwd: Reversing HTTP Range 14 and SemWeb Cool URIs decision] from Harry Halpin on 2011-01-20 (public-awwsw@w3.org from January 2011)

From: Harry Halpin <hhalpin@w3.org>
Date: Thu, 20 Jan 2011 22:54:23 -0000 (GMT)
To: "Jonathan Rees" <jar@creativecommons.org>
Cc: nathan@webr3.org, "AWWSW TF" <public-awwsw@w3.org>
Message-ID: <a409ad8fb4a66a04bece871db91da793.squirrel@webmail-mit.w3.org>
> This is at least the third time the suggestion to change the
> convention has come up in recent months. There's clearly a disconnect
> of some kind.
>
> The issue is interoperability. If different agents interpret a URI in
> incompatible ways, confusion will result. The httpRange-14 rule nudges
> the community to interpret certain URIs in a particular way, removing
> choice in the interest of consistency.

Actually, I'm on the side against 303s. It's a smart hack, but
nonetheless a hack, and IMHO likely hurting adoption of Linked Data. I'm
happy for people not to discuss this here, but I do think we should be
worried about having an over-engineered and metaphysically murky solution
when Microsoft is now out with a direct RDF competitor in the form of
OData that requires no 303s or metaphysics.

Why 303?

1) It clearly separates URIs for "things" on the semantic web from their
descriptions.
2) It's a common convention.

But...

1) It's doing in a status code that which could easily be done in some
triples themselves accessed from the URI.

2) It causes excessive round-trips/redirects for no good reasons, which
irritates webmasters and bots.

3) It is non-trivial to set up via Apache.

4) And most webmasters can't even do that, so it makes just putting an RDF
file on the Web a "bad thing" without server access. Great for adoption :)

5) It's not a standard. It's on no W3C Recs. It's a W3C TAG finding, and
primarily something Tim wrote in a design note that got put forth by SWEO
as a Note. So there's no process to go through to change it to begin with.

6) The only practical example of code failing for lack of it involves
"owl:sameAs" and OWL reasoning, and owl:sameAs is demonstrably mis-used
anyways (see paper in ISWC 2010) and OWL reasoning of any kind is not used
often with Linked Data to begin with.

I think 1) would win the game by itself. That's why we made the IRW
ontology, to show that it could be done outside status codes, and
therefore one does not have to put metaphysics in HTTP status codes where
it doesn't belong. But 1) combined with practical reasons like (2-4) and
the fact that it's not a standard (5), makes me think that just like
namespaces eventually got dropped due to lack of support by all but the
few in the HTML5 DOM, 303s will eventually be dropped if open linked data
wants to actually get used. It's unfortunate but likely. But I'd rather
have that happen than the world not use RDF.
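
As a rough illustration of the round-trip point in 2), here is a minimal
Python sketch of a client dereferencing a URI under the 303 pattern versus
a plain 200. The in-memory RESPONSES table and the example.org URIs are
hypothetical stand-ins for a real server, not taken from any deployment.

```python
# Hypothetical in-memory "server": URI -> (status, Location target or body).
RESPONSES = {
    "http://example.org/id/alice": (303, "http://example.org/doc/alice"),
    "http://example.org/doc/alice": (200, "<RDF describing alice>"),
    "http://example.org/alice": (200, "<RDF describing alice>"),
}


def dereference(uri, max_hops=5):
    """Follow 303 redirects; return (body, number_of_requests_made)."""
    requests_made = 0
    for _ in range(max_hops):
        status, payload = RESPONSES[uri]
        requests_made += 1
        if status == 200:
            return payload, requests_made
        if status == 303:
            uri = payload  # payload holds the redirect target
            continue
        raise ValueError(f"unexpected status {status}")
    raise RuntimeError("too many redirects")


body_303, hops_303 = dereference("http://example.org/id/alice")
body_200, hops_200 = dereference("http://example.org/alice")
print(hops_303, hops_200)
```

Under these assumptions the 303-style URI costs one extra request to reach
the same description.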

And, I might add, I _like_ 303. It's a very creative hack. I think it
should be an *option*, but we need easier options to distinguish what we
mean by URIs.

And "#" is not much better. To be blunt, people forget to put it there
when writing and cutting and pasting URIs, and it's applied inconsistently
across the XML and RDF universes. It puts a magic redirect in where one
isn't necessary to begin with.
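
One possible reading of the "#" problem, sketched in Python with the
standard library: a client strips the fragment before making the request,
so the hash URI and the bare URI are different identifiers that hit the
same document. The example.org URI is only illustrative.

```python
from urllib.parse import urldefrag

# A hash URI as it might appear in RDF.
uri = "http://example.org/foo#bar"

# Split into the part that is actually requested and the fragment,
# which stays on the client side.
document_url, fragment = urldefrag(uri)
print(document_url)
print(fragment)

# Forgetting the "#bar" yields a different identifier, yet the request
# target is identical.
forgotten = "http://example.org/foo"
same_request_target = urldefrag(forgotten).url == document_url
different_identifier = forgotten != uri
print(same_request_target, different_identifier)
```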

In essence: distinctions that can be done in formats should probably not
be made on the protocol level. The protocol should tell you how to
interpret the format.
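
A minimal sketch of what "doing it in the format" could look like, using
plain Python tuples as triples. The class names ex:InformationResource and
ex:NonInformationResource are hypothetical placeholders; the actual IRW
terms may differ, and kind_of is an illustrative helper, not part of any
library.

```python
RDF_TYPE = "rdf:type"

# Hypothetical class names (assumption, not the real IRW vocabulary).
INFO = "ex:InformationResource"
NON_INFO = "ex:NonInformationResource"

# Triples a server might return with a plain 200 for either URI.
triples = {
    ("http://example.org/alice", RDF_TYPE, NON_INFO),
    ("http://example.org/alice.rdf", RDF_TYPE, INFO),
}


def kind_of(uri, graph):
    """Return the declared kind of `uri`, or None if the graph is silent."""
    for subject, predicate, obj in graph:
        if subject == uri and predicate == RDF_TYPE and obj in (INFO, NON_INFO):
            return obj
    return None


thing_kind = kind_of("http://example.org/alice", triples)
doc_kind = kind_of("http://example.org/alice.rdf", triples)
unknown_kind = kind_of("http://example.org/bob", triples)
print(thing_kind, doc_kind, unknown_kind)
```

Here the thing/description distinction is read from the data itself, with
no dependence on the status code that delivered it.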

Anyways, my two cents, and I'm happy to stop by and discuss IRW at some
point.

   cheers,
         harry


>
> One answer to the engineers is: If you're doing this in the privacy of
> your own world, it doesn't matter whether you use a 200 or 303 or 627
> or # or ?. But for the semantic web you're advised to use # (or 303)
> because that's the way the semantic web is designed. If you write RDF
> that uses a 200-responding URI in some creative way, someone else
> (e.g. an agent that either doesn't read, or doesn't believe, or
> doesn't know how to process the content) could use the URI to refer to
> the web page, and friction will result.
>
> I sort of understand why people think 303 is hard (but I have to take
> it on faith since with the software I use it's easy). I really don't
> understand why people think using # is a burden. (I understand Alan
> R's objections to #, but they're of a very different nature.)
>
> As I've recently said to Ed Summers, if someone really thinks that
> changing the convention is possible and necessary, they ought to go
> through process, and talk to the groups responsible for the technology
> that they are enjoying: the TAG, SWIG, SWCG, etc. Whinging about this
> on some semantic web list undermines the idea of standards. Otherwise
> - just deal with it.
>
> Complaining that httpRange-14 has never gone rec track (to receive
> review) is absolutely legitimate - but no one has come to the TAG and
> asked for it to. I've never heard anyone question the legitimacy of
> the decision, only the wisdom of it.
>
> Anyhow... sorry, we shouldn't take up the debate on this list - we're
> all fatigued by it. I think AWWSW's job might be:
> 1. how would you refer to a web page, defensively, if you didn't know
> how the receiving agent was going to interpret 200-yielding URIs
> (httpRange-14 vs. whatever)?  (using terms from the awwsw vocabulary,
> I would presume.)
> 2. is there any practical reason to choose the httpRange-14 rule over
> some other rule? (I believe there is - e.g. it gives an easy way to
> write metadata and metadata is good.)
> 3. advise the TAG on PR or other followup regarding the httpRange-14 rule
>
> Jonathan
>
> On Sun, Jan 16, 2011 at 6:18 PM, Nathan <nathan@webr3.org> wrote:
>> fwd: A real world use case from Manu Sporny in reply to my earlier mail:
>>
>> -------- Original Message --------
>> Subject: Reversing HTTP Range 14 and SemWeb Cool URIs decision
>> Date: Sun, 16 Jan 2011 18:06:29 -0500
>> From: Manu Sporny <msporny@digitalbazaar.com>
>> To: Nathan Rixham <nathan@webr3.org>
>>
>> Hi all,
>>
>> Just wanted to follow up Nathan's post with a few more points. I pinged
>> him about this very issue yesterday because I wanted to see if he had
>> come to the same conclusion that we had after a number of years of
>> dealing with the repercussions of the HTTP Range 14 decision and the
>> solution offered by the Cool URIs for the Semantic Web[1] document.
>>
>> In general, I agree with Nathan's argumentation. The concepts outlined
>> in [1] do make sense in many cases, but when implemented in a business
>> environment, they lead to conceptual and engineering overhead that our
>> developers are just not willing to accept. The Cool URIs for the
>> Semantic Web document is needlessly restrictive. I'll explain the reason
>> below.
>>
>> Here is some background on why this matters to our company, in
>> particular:
>>
>> We are working on a world standard for Universal Web Payment that we
>> hope to send through the W3C or IETF process at some point. It's called
>> PaySwarm, and you can find out more about it here:
>>
>> http://payswarm.com/
>>
>> PaySwarm is a decentralized payment system with multiple transaction
>> processors (effectively, banks) that are capable of executing commercial
>> transactions (think buying and selling digital goods, like webapp stores
>> or access to blog content/entertainment). These transaction processors
>> create legally enforceable digital contracts between people and
>> transfer money between financial accounts. A demonstration of the
>> purchasing process can be found here:
>>
>> http://digitalbazaar.com/2010/09/12/payswarm-api/
>>
>> The end result of a purchase between a website operator and a customer
>> is a Contract. The Contract contains a number of digital signatures,
>> Assets (a description of what was purchased), a License, financial
>> accounts, the asset provider, the purchaser and a number of other
>> "things" that are all identified via URIs.
>>
>> We want these URIs to be very long lived - 10, 20 or even 50+ years.
>>
>> We are using HTML+RDFa to express semantics via the pages and JSON-LD[2]
>> for the messaging between all the distributed parts of the system.
>>
>> You can see an example of what a financial transfer looks like in this
>> system here (skip down to the Example):
>>
>> http://purl.org/commerce#transfer
>>
>> In other words, we have completely bought into the benefits that the
>> Semantic Web can offer and we are building a decentralized financial
>> system that takes advantage of those benefits.
>>
>> This brings us to the end of Nathan's postal code example:
>>
>> Nathan Rixham wrote:
>>>
>>> I don't see a problem with any of the above, neither does anybody in
>>> this scenario afaict - can you see a problem?
>>>
>>> Now, this example isn't constrained technically, and nothing breaks -
>>> however it does break current web arch rules because:
>>>  - it doesn't use the 303 pattern
>>>  - it fails to make any distinction between IR and NIR
>>>  - it fails to make any distinction between primary and secondary
>>>    resources
>>>  - it names the same concept in one case as an IR, and in another as
>>>    an NIR, although they're both the same thing
>>>  - the cool uris, change (they're uncool)
>>
>> Currently, our identifiers for long-lived resources look like this:
>>
>> https://mybank.example.com/deposits/7f7f62ab53cc623cd28#deposit
>> https://mybank.example.com/accounts/3879279824#account
>> https://payswarm.example.com/people/john-doe#public-key-5
>>
>> However, when these identifiers were presented to our development team,
>> the first question that arose was:
>>
>> "Why do we need to have #deposit at the end of the deposit URL? The same
>> for account? We've designed the system to be RESTful, why are we
>> duplicating information in the URL?"
>>
>> That is, instead of this:
>>
>> https://mybank.example.com/deposits/7f7f62ab53cc623cd28#deposit
>>
>> Why not this?
>>
>> https://mybank.example.com/deposits/7f7f62ab53cc623cd28
>>
>> We are designing the system to return an HTML+RDFa representation of the
>> resource above when a browser hits the resource. It will contain both
>> human-readable information and semantic data that is specific to the
>> deposit. We are not talking about a concept that doesn't exist on the
>> web. The concept only exists on the web and it's at the URL listed
>> above.
>>
>> Our engineers see no reason to perform a 303 re-direct and maintain 3
>> URLs (identifier, HTML, and RDF) for each identified resource in our
>> system. That is, implementing the solution doesn't seem to prevent any
>> nasty technical issues from arising, except perhaps in reasoning agents
>> that assume certain things about what's at the end of a URL (it's not
>> always a document). Why the implementation burden when there isn't a
>> clear technical benefit?
>>
>> We don't need to re-hash all of the HTTP Range 14 discussions, nor any
>> of the many e-mails and blog posts discussing the topic at length. We've
>> read many of those documents and don't see why Semantic Web applications
>> can't be implemented with non-hash, non-303 URLs if they are designed
>> properly. That is to say that hash URLs and 303 results are reasonable
>> mechanisms for semantic web identification, but so is a clean URL that
>> returns HTML+RDFa, like:
>>
>> https://mybank.example.com/deposits/7f7f62ab53cc623cd28
>>
>> More to the point:
>>
>>>  - it doesn't use the 303 pattern
>>
>> The 303 pattern is very difficult for non-experts to implement correctly
>> and many budgets do not allow for separate RDF documents and separate
>> HTML documents. Our developers groan every time this solution is brought
>> up, and for good reason. I can't find a good argument for why our
>> company should spend the money to manage a URL space that is 3x as big
>> for no clear technical benefit.
>>
>>>  - it fails to make any distinction between IR and NIR
>>
>> I continue to fail to see the relevant technical reason for the
>> distinction. I understand the argumentation, but it is but one world
>> view, and is largely philosophical. If one takes the position that
>> "There is no such thing as a non-information resource on the Semantic
>> Web", does a technical issue arise? It's all information and information
>> is contextual, right?
>>
>>>  - it fails to make any distinction between primary and secondary
>>>    resources
>>
>> In this particular case, it doesn't matter and largely simplifies the
>> concepts put forth on the semantic web. This was a big point of
>> contention among our developers - understanding that #fragment
>> identifiers referred to one thing in a graphical browser (the @id
>> attribute) and another thing in a semantic web application was a painful
>> jump for them to make. It's not that they didn't get it - they just
>> thought it was an awful hack and goes against the concepts of REST that
>> they use when developing our systems. That is:
>>
>> You can't do this via HTTP 1.1:
>>
>> DELETE http://example.org/foo#bar
>>
>> and hope to only delete the information identified by #bar... you will
>> most likely delete the entire foo resource instead. Contrast that with
>> this:
>>
>> DELETE http://example.org/people/bob/public-keys/1
>>
>> You could delete just the information about key 1 using HTTP semantics
>> above... it's much cleaner.
>>
>>>  - it names the same concept in one case as an IR, and in another as
>>>    an NIR, although they're both the same thing
>>
>> Again, I hold that we shouldn't be making a distinction between an IR
>> and an NIR when choosing semantic web identifiers. While there may be
>> good uses for NIR (I honestly don't know of any), I don't think that the
>> semantic web should use them as a part of its core architecture. If you
>> use an identifier on the semantic web, it should be an IR, redirect to
>> an IR (an alias, but only if the IR moved), or not exist at all.
>>
>>>  - the cool uris, change (they're uncool)
>>
>> Yes, we do all want Cool URIs, but there are cases where it is
>> impossible to keep the URIs around forever. As we are all painfully
>> aware of these days, banks fail - and merge. Just because my account is:
>>
>> http://blue-bank.example.com/accounts/28394723984
>>
>> today, doesn't mean that the website will exist tomorrow. Or that after
>> the merge, it won't be:
>>
>> http://red-bank.example.com/accounts/afd-akop
>>
>> with a semantic web re-direct/alias from the first to the second. So,
>> while we all want URIs that don't change - they will, and more often
>> than we'd like.
>>
>> For financial systems like PaySwarm, especially because it will become
>> an open standard, it is important that we have a clear solution on how
>> to deal with these issues.
>>
>> My fear is that if we continue pushing [1] and the HTTP Range 14
>> decision, developers will likely continue to view the Semantic Web
>> as an over-engineered solution to a problem that they rarely care
>> about... and I wouldn't blame them if they were to do that. We can't
>> change developer workflow as drastically as the SemWeb Cool URIs
>> document proposes - especially when it is unnecessary to do so.
>>
>> Thoughts?
>>
>> -- manu
>>
>> [1] http://www.w3.org/TR/cooluris/
>> [2] http://json-ld.org/
>>
>> --
>> Manu Sporny (skype: msporny, twitter: manusporny)
>> President/CEO - Digital Bazaar, Inc.
>> blog: Linked Data in JSON
>> http://digitalbazaar.com/2010/10/30/json-ld/
>>
>>
>>
>>
>
>
Received on Thursday, 20 January 2011 22:54:27 UTC