Re: hash in WebID - cachability problem

On 12/4/12 6:00 AM, Henry Story wrote:
> On 3 Dec 2012, at 18:27, Kingsley Idehen <kidehen@openlinksw.com> wrote:
>
>> On 12/3/12 7:58 AM, Nathan wrote:
>>> Kingsley Idehen wrote:
>>>> On 12/3/12 5:56 AM, Henry Story wrote:
>>>>> We still have to see what the issues are here. It seems that the cacheability problem pointed out by Nathan
>>>>> would affect all JavaScript applications, not just tabulator
>>>>>
>>>>> [[
>>>>>
>>>>> RFC 2616 says:
>>>>> "The 303 response MUST NOT be cached, but the response to the second (redirected) request might be cacheable."
>>>>>
>>>>> HTTPBis says:
>>>>> "A 303 response SHOULD NOT be cached unless it is indicated as cacheable by Cache-Control or Expires header fields."
>>>>>
>>>>> Of course most tooling will not cache 303s as it's been a MUST NOT for 13+ years.
>>>>> ]]
>>>>>
>>>> You cache data.
>>>>
>>>> User agents don't cache entity names.
>>>>
>>>> Using DBpedia as an example, you don't cache <http://dbpedia.org/resource/Linked_Data>, you cache: <http://dbpedia.org/page/Linked_Data>, assuming the HTML representation of the entity description is what you are interested in. Same applies to all the other representations of the description of: <http://dbpedia.org/resource/Linked_Data> .
>>> Correct; the issue people note is that you still have to do a GET on <http://dbpedia.org/resource/Linked_Data> in order to get the 303 URI. That second URI can then be loaded from cache, so it involves at least one HTTP GET each time, even with caching.
>>>
>>>
>> Depends on the apps Linked Data exploitation logic :-)
> No, it does not depend on the logic; this is an engineering reality, as I show below.
>
>> An Entity Name in this case is just an indirect route to its description, same applies to the Entity Description Document Address. Thus, an application can make a decision about which "route to data" it works with.
> Of course we know this. But that does not remove the problem that a TCP connection is expensive to build up, and even making a request on an open connection requires a packet to go back and forth from the client to the server, which can require a few packets to go once around the world and back. So logically you may be right, but we are engineers here, and we have to take things like the speed of light into account.

As has been discussed many times, architecture and engineering are 
slightly different specializations. A spec is more about architecture; 
implementation is about engineering. A spec shouldn't be about making 
optimization choices; that's for engineers to deal with.

>
> The 303 solution requires each URI to be dereferenced once, even if they all end up being located in the same document.

Same thing with a # URI.
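To make the round-trip accounting concrete, here is a minimal sketch (Python, against an in-memory fake "web", so the URIs and cache policy are illustrative assumptions, not any deployed client) of why a 303-based name costs a network round trip on every dereference, even once the redirect target is cached:

```python
# Sketch: 303 dereferencing with a cache that, per RFC 2616, MUST NOT
# store the 303 response itself but MAY store the redirect target.
# The "web" below is an in-memory fake; URIs and policy are illustrative.
FAKE_WEB = {
    "http://example.org/resource/Berlin":
        (303, {"Location": "http://example.org/page/Berlin"}, None),
    "http://example.org/page/Berlin":
        (200, {"Expires": "Tue, 11 Dec 2012 12:07:58 GMT"}, "<html>...</html>"),
}

class Client:
    def __init__(self):
        self.cache = {}        # URI -> body, for cacheable 200s only
        self.round_trips = 0   # network round trips actually made

    def get(self, uri):
        if uri in self.cache:          # cache hit: no network traffic
            return self.cache[uri]
        self.round_trips += 1          # cache miss: one round trip
        status, headers, body = FAKE_WEB[uri]
        if status == 303:
            # "The 303 response MUST NOT be cached": the redirect is
            # re-fetched on *every* dereference of the resource URI.
            return self.get(headers["Location"])
        if "Expires" in headers:       # cacheable representation: store it
            self.cache[uri] = body
        return body

c = Client()
c.get("http://example.org/resource/Berlin")
first = c.round_trips                  # 2: the 303, then the page
c.get("http://example.org/resource/Berlin")
second = c.round_trips - first         # 1: the 303 again; page from cache
print(first, second)                   # 2 1
```

The point of contention is exactly the `status == 303` branch: under RFC 2616 the redirect itself may never be stored, so the cache only ever saves the second request.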

>   The foaf ontology for example has close to 85 definitions, which means one has to do 85 requests if one wants to be sure that each of these is really defined in the foaf spec. Now technologies such as SPDY might help a lot here, but those are in development.
>
> In the case of the WebID protocol you have to dereference each URI to GET its meaning. For WebID it is most likely that each WebID is uniquely associated with its Profile, so you are requiring two GETs where one would do, since the second is usually not going to be cached. That slows down authentication for large services quite dramatically, in a space where timing is of the essence.
>
> Now, as far as DBpedia goes, amazingly enough they don't even make the 303s cacheable. HTTPbis requires cache control headers, and I don't see any there:
>
> $ curl -i http://dbpedia.org/resource/Berlin
> HTTP/1.1 303 See Other
> Date: Tue, 04 Dec 2012 10:40:23 GMT
> Content-Type: text/html; charset=UTF-8
> Content-Length: 0
> Connection: keep-alive
> Server: Virtuoso/06.04.3132 (Linux) x86_64-generic-linux-glibc212-64  VDB
> Accept-Ranges: bytes
> Location: http://dbpedia.org/page/Berlin
>
>
> If the DBpedia folks can't even get this basic engineering right, then what should one expect of most end users?

As per usual, I have no idea what to make of the tone of your response; 
is it so difficult to stick with the thrust of the debate?

Read: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html

"The 303 response MUST NOT be cached, but the response to the second 
(redirected) request might be cacheable."

Hence:

curl -I http://dbpedia.org/page/Berlin
HTTP/1.1 200 OK
Date: Tue, 04 Dec 2012 12:08:00 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 957677
Connection: keep-alive
Vary: Accept-Encoding
Server: Virtuoso/06.04.3132 (Linux) x86_64-generic-linux-glibc212-64  VDB
Accept-Ranges: bytes
Expires: Tue, 11 Dec 2012 12:07:58 GMT
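(A client that honors that Expires header only needs a date comparison. A rough sketch in Python, reusing the Date/Expires values from the response above:)

```python
from email.utils import parsedate_to_datetime
from datetime import datetime, timezone

def is_fresh(expires_header, now=None):
    """True if a cached response is still fresh per its Expires header."""
    expires = parsedate_to_datetime(expires_header)
    return (now or datetime.now(timezone.utc)) < expires

# Date/Expires taken from the curl output above: the cached copy of
# http://dbpedia.org/page/Berlin stays fresh for a week after being served.
served = parsedate_to_datetime("Tue, 04 Dec 2012 12:08:00 GMT")
print(is_fresh("Tue, 11 Dec 2012 12:07:58 GMT", now=served))  # True
```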


>
> And even if you did get it up to HTTPBis standards, we are still fighting two issues
>   - browsers and all proxies need to cache these correctly
>   - javascript fetching those resources needs to make these extra requests
>
> Finally, I think HTTPbis is still not a standard. So if we go by current standards, then we have
> to accept that the current state of the specs is that
>
> "The 303 response MUST NOT be cached, but the response to the second (redirected) request might be cacheable."

You cache: http://dbpedia.org/page/Berlin . It's the address of the 
data. Hence the response header:

Location: http://dbpedia.org/page/Berlin
>
>> The same issue arises in the ODBC or JDBC realms, I can use a connection string or a data source name to access data in an RDBMS via a compliant driver/provider/cartridge. My ODBC/JDBC application decides how this affects the kind of user interaction it seeks to deliver.
> In JDBC and ODBC realms the database is usually on the same machine, or not far away from the machine that makes the query. On the Web requests can come from everywhere in the world. We are dealing here with a completely different space.

So you might think, and that's part of what you are failing to 
understand. They are conceptually similar: you have data source 
connection strings (locators) and data source names, both leading you to 
the same data (Tables, Views, or even Procedures masquerading as Views) 
in an RDBMS that might be anywhere on a LAN or WAN.
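To illustrate the analogy (with made-up names; this is just the indirection pattern, not any real driver API): a data source name is one lookup away from the connection string, just as an entity name is one redirect away from the address of its description document.

```python
# Illustrative sketch of ODBC/JDBC-style indirection: a data source
# *name* resolves to a connection *string* (a locator); both routes lead
# to the same data. All names and values here are made up.
DSN_REGISTRY = {"Sales": "Driver=Example;Server=db.example.com;Database=sales"}

def resolve(route):
    """A known name costs one extra lookup; a locator is used as-is."""
    return DSN_REGISTRY.get(route, route)

print(resolve("Sales"))  # the registered connection string
print(resolve("Driver=Example;Server=db.example.com;Database=sales"))  # unchanged
```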
>
>> This is one of those "horses for courses" issues that AWWW handles well, once application developers tap into it etc..
> Well, perhaps the DBpedia people could start by adding Cache-Control headers and set an example. Also, perhaps you could add SPDY to your service, and we could then see if that helps.

See my comment above. You don't seem to understand the difference 
between <http://dbpedia.org/resource/Linked_Data> and 
<http://dbpedia.org/page/Linked_Data> etc..
>
> In any case we are dealing with large legacy issues of deployed browsers. If you think that Apple's KeyChain UI is a major issue worth mentioning on the wiki, then we have to take it that this is even a larger issue. So you have to choose here. If the Keychain is a major stumbling block, then even more so are existing and deployed caches and web browsers.

I am moving on, we aren't going to make any progress. This response is 
more to do with closing a loop.

Bye!

Kingsley
> Social Web Architect
> http://bblfish.net/
>


-- 

Regards,

Kingsley Idehen	
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen

Received on Tuesday, 4 December 2012 12:15:33 UTC