Re: hash in WebID - cachability problem

On 4 Dec 2012, at 13:13, Kingsley Idehen <kidehen@openlinksw.com> wrote:

> On 12/4/12 6:00 AM, Henry Story wrote:
>> On 3 Dec 2012, at 18:27, Kingsley Idehen <kidehen@openlinksw.com> wrote:
>> 
>>> On 12/3/12 7:58 AM, Nathan wrote:
>>>> Kingsley Idehen wrote:
>>>>> On 12/3/12 5:56 AM, Henry Story wrote:
>>>>>> We still have to see what the issues are here. It seems that the cacheability problem pointed out by Nathan
>>>>>> would affect all JavaScript applications, not just tabulator
>>>>>> 
>>>>>> [[
>>>>>> 
>>>>>> RFC 2616 says:
>>>>>> "The 303 response MUST NOT be cached, but the response to the second (redirected) request might be cacheable."
>>>>>> 
>>>>>> HTTPBis says:
>>>>>> "A 303 response SHOULD NOT be cached unless it is indicated as cacheable by Cache-Control or Expires header fields."
>>>>>> 
>>>>>> Of course most tooling will not cache 303s as it's been a MUST NOT for 13+ years.
>>>>>> ]]
>>>>>> 
>>>>> You cache data.
>>>>> 
>>>>> User agents don't cache entity names.
>>>>> 
>>>>> Using DBpedia as an example, you don't cache <http://dbpedia.org/resource/Linked_Data>, you cache: <http://dbpedia.org/page/Linked_Data>, assuming the HTML representation of the entity description is what you are interested in. Same applies to all the other representations of the description of: <http://dbpedia.org/resource/Linked_Data> .
>>>> Correct, the issue people note, is that you still have to do a GET on <http://dbpedia.org/resource/Linked_Data> in order to get the 303 URI, that second URI can then be loaded from cache, so it involves at least one HTTP GET each time, even with caching.
>>>> 
>>>> 
>>> Depends on the app's Linked Data exploitation logic :-)
>> No it does not depend on the logic, this is an engineering reality, as I show below.
>> 
>>> An Entity Name in this case is just an indirect route to its description, same applies to the Entity Description Document Address. Thus, an application can make a decision about which "route to data" it works with.
>> Of course we know this. But that does not remove the problem that a TCP connection is expensive to set up, and even a request on an open connection requires a packet to go back and forth between the client and the server, which can require a few packets to go once around the world and back. So logically you may be right, but we are engineers here, and we have to take things like the speed of light into account.
> 
> As has been discussed many times, architecture and engineering are slightly different specializations. A spec is more about architecture. Implementation is about engineering. A spec should be about making optimization choices, that's for engineers to deal with.

You mean "A spec should not be about making optimisation choices..." presumably.

I am just pointing out the issues here, so that we can be clear about them. I am 
not taking sides on the spec. 

> 
>> 
>> The 303 solution requires each URI to be dereferenced once, even if they all end up being located in the same document.
> 
> Same thing with a # URI.

Not really, since you can have all the #uris in one document, and then you can do 1 GET and have all the definitions. If foaf had put them all in one document, that would have meant an 85x improvement in TCP connections.
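
To make the arithmetic concrete, here is a minimal Python sketch that counts how many distinct documents a client has to GET for a set of term URIs. It is only an illustration: the example.org URIs and the function name documents_to_fetch are made up, and the one assumption is that a client strips the fragment before issuing a request, so every #uri in the same document collapses into a single GET.

```python
from urllib.parse import urldefrag


def documents_to_fetch(term_uris):
    """Return the set of distinct documents a client must GET.

    The fragment is never sent to the server, so it is removed first.
    """
    return {urldefrag(uri).url for uri in term_uris}


# Hypothetical vocabulary with 85 terms, published in two styles.
hash_terms = [f"http://example.org/vocab#term{i}" for i in range(85)]
slash_terms = [f"http://example.org/vocab/term{i}" for i in range(85)]

hash_gets = len(documents_to_fetch(hash_terms))    # one shared document
slash_gets = len(documents_to_fetch(slash_terms))  # one document per term
```

With these inputs hash_gets is 1 and slash_gets is 85, and the slash count is before following any 303 redirect that each of those requests may return.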

This does not necessarily have an impact on the #debate btw, which is why I am surprised you keep denying the obvious.

> 
>>  The foaf ontology for example has close to 85 definitions, which means one has to do 85 requests if one wants to be sure that each of these is really defined in the foaf spec. Now technologies such as SPDY might help a lot here, but those are in development.
>> 
>> In the case of the WebID protocol you have to dereference each URI to GET its meaning. For WebID it is most likely that each WebID is uniquely associated with the Profile, so that you are requiring 2 GETs where one would do, since the second is usually not going to be cached. This slows down authentication for large services quite dramatically, in a space where timing is of the essence.
>> 
>> Now as far as dbpedia goes amazingly enough they don't even cache the 303s. HttpBis requires cache control headers. I don't see any there:
>> 
>> $ curl -i http://dbpedia.org/resource/Berlin
>> HTTP/1.1 303 See Other
>> Date: Tue, 04 Dec 2012 10:40:23 GMT
>> Content-Type: text/html; charset=UTF-8
>> Content-Length: 0
>> Connection: keep-alive
>> Server: Virtuoso/06.04.3132 (Linux) x86_64-generic-linux-glibc212-64  VDB
>> Accept-Ranges: bytes
>> Location: http://dbpedia.org/page/Berlin
>> 
>> 
>> If the dbpedia folks can't even get this basic engineering right, then what should one expect of most end users?
> 
> As per usual, no idea about the tone of your response, is it so difficult to stick with the thrust of the debate?

See below.

> Read: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
> 
> "The 303 response MUST NOT be cached, but the response to the second (redirected) request might be cacheable."
> 
> Hence:
> 
> curl -I http://dbpedia.org/page/Berlin
> HTTP/1.1 200 OK
> Date: Tue, 04 Dec 2012 12:08:00 GMT
> Content-Type: text/html; charset=UTF-8
> Content-Length: 957677
> Connection: keep-alive
> Vary: Accept-Encoding
> Server: Virtuoso/06.04.3132 (Linux) x86_64-generic-linux-glibc212-64  VDB
> Accept-Ranges: bytes
> Expires: Tue, 11 Dec 2012 12:07:58 GMT


I know those URIs are cacheable. That was not the point of the discussion. We were speaking about the cacheability of what you call the entity URIs (e.g. http://dbpedia.org/resource/Berlin ), which HTTPBis allows:

HTTPBis says:
"A 303 response SHOULD NOT be cached unless it is indicated as cacheable by Cache-Control or Expires header fields."

I was just pointing out that DBPedia does not take that possibility into account, which it could. 
If you look at what I am saying, I am in fact giving you plenty of good ways to improve your arguments.

> 
>> 
>> And even if you did get it up to HTTPBis standards, we are still fighting two issues
>>  - browsers and all proxies need to cache these correctly
>>  - javascript fetching those resources needs to make these extra requests
>> 
>> Finally I think HTTPBis is still not a standard. So if we go by current standards, then we have
>> to accept that the current state of specs is that
>> 
>> "The 303 response MUST NOT be cached, but the response to the second (redirected) request might be cacheable."
> 
> You cache: http://dbpedia.org/page/Berlin . It's the address of the data. Hence the response header:
> 
> Location: http://dbpedia.org/page/Berlin

Again, you don't seem to be reading what I am writing. Let me repeat:

HTTPBis says:
"A 303 response SHOULD NOT be cached unless it is indicated as cacheable by Cache-Control or Expires header fields."

Your Berlin entity does not have a Cache-Control header. 
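
As a rough illustration of that rule, here is a small Python sketch that applies the HTTPBis wording quoted above to a set of response headers. The function name redirect_303_is_cacheable is made up, the check is deliberately simplified (it only looks for the presence of the two fields and ignores directives such as no-store), and the second header set is a hypothetical variant, not something DBpedia sends.

```python
def redirect_303_is_cacheable(headers):
    """Apply the HTTPBis wording to a 303 response's headers.

    `headers` maps header names to values. Only the presence of
    Cache-Control or Expires is checked; directives are not parsed.
    """
    names = {name.lower() for name in headers}
    return "cache-control" in names or "expires" in names


# Headers copied from the 303 response for http://dbpedia.org/resource/Berlin
berlin_303 = {
    "Date": "Tue, 04 Dec 2012 10:40:23 GMT",
    "Content-Type": "text/html; charset=UTF-8",
    "Content-Length": "0",
    "Connection": "keep-alive",
    "Server": "Virtuoso/06.04.3132 (Linux) x86_64-generic-linux-glibc212-64  VDB",
    "Accept-Ranges": "bytes",
    "Location": "http://dbpedia.org/page/Berlin",
}

# Hypothetical variant: the same redirect with an explicit freshness lifetime.
berlin_303_with_cache_control = {**berlin_303, "Cache-Control": "max-age=3600"}

as_served = redirect_303_is_cacheable(berlin_303)
with_header = redirect_303_is_cacheable(berlin_303_with_cache_control)
```

Here as_served comes out False and with_header comes out True, which is the whole difference one extra header would make under the HTTPBis text.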


>> 
>>> The same issue arises in the ODBC or JDBC realms, I can use a connection string or a data source name to access data in an RDBMS via a compliant driver/provider/cartridge. My ODBC/JDBC application decides how this affects the kind of user interaction it seeks to deliver.
>> In JDBC and ODBC realms the database is usually on the same machine, or not far away from the machine that makes the query. On the Web requests can come from everywhere in the world. We are dealing here with a completely different space.
> 
> So you might think, and that's part of what you are failing to understand. They are conceptually similar, since you have data source connection strings (locators) and data source names both leading you to the same data (Tables, Views, or even Procedures masquerading as Views) in an RDBMS that might be anywhere on a LAN or WAN.

A bicycle and a car are conceptually similar: they are both means of transportation. Engineering-wise they are very different tools though. What applies to one does not apply to the other.

>> 
>>> This is one of those "horses for courses" issues that AWWW handles well, once application developers tap into it etc..
>> Well perhaps the DBPedia people could start by adding Cache Control headers and show the example. Also perhaps you could add SPDY to your service, and we could then see if that helps.
> 
> See my comment above. You don't seem to understand the difference between <http://dbpedia.org/resource/Linked_Data> and <http://dbpedia.org/page/Linked_Data> etc..

I understand it very well. Try to distinguish your interlocutors: I have been working in this
space for 8 years, I give courses on this, I write applications that deal with this. 

>> 
>> In any case we are dealing with large legacy issues of deployed browsers. If you think that Apple's KeyChain UI is a major issue worth mentioning on the wiki, then we have to take it that this is even a larger issue. So you have to choose here. If the Keychain is a major stumbling block, then even more so are existing and deployed caches and web browsers.
> 
> I am moving on, we aren't going to make any progress. This response is more to do with closing a loop.
> 
> Bye!

Instead of getting stuck on the Keychain argument - which any other standards group would throw out immediately - read my email more carefully, and you'll see I am giving you a lot of exits which can make your arguments a lot better, and also allow us here to make a lot better progress. You seem to be blocked because you think I am on one side of the #debate, which I am not. I would just like some consistency in your arguments. You cannot both argue that the Keychain is important because of legacy issues, and not take browsers, javascript and proxies into account. This *is* just simple logic.

If you want good arguments for your case follow my hint above on SPDY. We could make a lot more interesting progress there.

> 
> Kingsley
>> 
>>> 
>>> -- 
>>> 
>>> Regards,
>>> 
>>> Kingsley Idehen	
>>> Founder & CEO
>>> OpenLink Software
>>> Company Web: http://www.openlinksw.com
>>> Personal Weblog: http://www.openlinksw.com/blog/~kidehen
>>> Twitter/Identi.ca handle: @kidehen
>>> Google+ Profile: https://plus.google.com/112399767740508618350/about
>>> LinkedIn Profile: http://www.linkedin.com/in/kidehen
>>> 
>>> 
>>> 
>>> 
>>> 
>> Social Web Architect
>> http://bblfish.net/
>> 
> 
> 

Social Web Architect
http://bblfish.net/

Received on Tuesday, 4 December 2012 13:06:44 UTC