[Bug 12569] "Resource" Package Support from bugzilla@jessica.w3.org on 2011-04-30 (public-html-bugzilla@w3.org from April 2011)

From: <bugzilla@jessica.w3.org>
Date: Sat, 30 Apr 2011 06:12:11 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1QG3PX-00012P-A0@jessica.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=12569

--- Comment #5 from Wojciech Hlibowicki <wojjie2nd@gmail.com> 2011-04-30 06:12:09 UTC ---
(In reply to comment #4)
> This doesn't at all address the fact that the ZIP file has to be transferred
> serially, byte-by-byte, meaning the files are in fact NOT downloaded in
> parallel, which creates a situation where the page will load MUCH slower than
> it normally would, by disabling all parallel downloading.

Parallel downloading is more of a workaround than a solution. Ideally, it would
be better for the end user and the servers to minimize connections and simply
achieve higher overall throughput. The problem is that every requested file
carries the overhead of sending a request with headers; when requests are sent
in parallel, that sending delay becomes negligible because both directions of
the connection are being used more fully.

The other problem parallel downloading solves is that, on systems set up to
divide network resources evenly per connection, it gives you a larger share of
the bandwidth on either side: more connections mean a larger overall share for
the application in question.
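To put a number on the per-connection argument, here is a minimal sketch assuming an idealized link that is split exactly evenly per connection (real TCP sharing is messier). The function name and the "4 other connections" figure are hypothetical, chosen only for illustration.

```python
def app_bandwidth_share(app_connections: int, other_connections: int) -> float:
    """Fraction of a link an application receives when the link is divided
    evenly per connection (an idealized assumption, not real TCP behavior)."""
    total = app_connections + other_connections
    if total == 0:
        raise ValueError("no connections on the link")
    return app_connections / total


# Hypothetical example: the link already carries 4 other connections.
for n in (1, 2, 6):
    print(n, "connections ->", round(app_bandwidth_share(n, 4), 3))
```

Under this assumption, going from 1 to 6 connections raises the application's share from 20% to 60% of the link, which is the "bigger slice" effect described above.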

Since not all of these issues can be solved by other means, a smart webmaster
can still divide the resources into multiple packages, adding the
highest-priority items to the packages first. This would also reduce some of
the 'waste' you mentioned when a package is modified for a tiny image. To be
honest, most people will figure out how to set up these 'packages' and won't
merge everything under the moon, especially if they anticipate changes.
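One way to read "highest priority items being added first" is sketched below: sort resources by a hypothetical priority value and deal them round-robin across several packages, so every package starts with the most urgent items and the packages can still be fetched in parallel. The `Resource` type, the priority scale, and the file names are illustrative assumptions, not part of any proposal text.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    path: str
    priority: int  # hypothetical scale: higher means needed sooner


def split_into_packages(resources: list[Resource], num_packages: int) -> list[list[str]]:
    """Deal resources round-robin, highest priority first, so each package
    begins with the most urgent items."""
    if num_packages < 1:
        raise ValueError("need at least one package")
    packages: list[list[str]] = [[] for _ in range(num_packages)]
    ordered = sorted(resources, key=lambda r: r.priority, reverse=True)
    for i, res in enumerate(ordered):
        packages[i % num_packages].append(res.path)
    return packages


site = [Resource("logo.png", 5), Resource("main.css", 9),
        Resource("app.js", 8), Resource("footer.png", 1)]
print(split_into_packages(site, 2))
```

Round-robin dealing is just one simple choice; a webmaster could equally balance packages by byte size.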

> As for your idea of having the files download in parallel somehow... the whole
> point of having a single file is to prevent multiple files from being
> delivered. I don't think you can have it both ways.

It was an alternative suggestion. The main issue here is not multiple file
downloads; it is the overhead of making the requests and the amount of data
sent for each file requested from the server. That is not much per request,
but on a home connection that is already in use it adds up, and taken together
it can delay loading all resources even when they are fetched in parallel.
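A rough way to see how per-request overhead adds up is to count only the time spent pushing request bytes upstream. This is an idealized sketch: it ignores latency, TCP/TLS framing and compression, and the 700-byte request size and 32,000 bytes/s uplink are assumed figures, not measurements.

```python
def request_upload_seconds(num_requests: int, bytes_per_request: int,
                           upload_bytes_per_sec: float) -> float:
    """Time spent only on sending request bytes upstream (idealized:
    no latency, framing, or header compression)."""
    return num_requests * bytes_per_request / upload_bytes_per_sec


# Assumed figures: 40 resources, ~700 bytes of request headers each,
# and a 32,000 bytes/s uplink.
separate = request_upload_seconds(40, 700, 32_000)  # 0.875 s
packaged = request_upload_seconds(1, 700, 32_000)   # a single package request
print(separate, packaged)
```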


> Now, if you're suggesting that some sort of "manifest" file could be delivered
> to the browser to tell it what all the files in the ZIP are, sure... but how
> will that help at all, if the browser still has to wait for each file to show
> up as part of the single ZIP file stream?
> 
> What we'd REALLY need for an idea like this to not hamper performance is for
> the browser (and server) to do some sort of parallel bit downloading of a
> single file, similar to bit-torrent kind of thing, where a single giant ZIP
> file could be delivered in simultaneous/parallel chunks, bit-by-bit. If you
> wanna argue for THAT, sure, go ahead. That'd be awesome. But it's a LONG way
> from being possible, as all web servers and all web browsers would have to
> support that. If either a webserver or a browser didn't, and the HTML served up
> suggested that single manifest.ZIP file, then this view of the site would be
> excruciatingly slow because all resources would default to the serial loading,
> worse than like IE3.0 kinda days.

I would not argue for that, as it would be more complex for browsers to
implement and would bring more problems to iron out, along with the fact that
a heavily used website will quickly become bogged down as the number of
connections increases. It would be more cost-efficient to minimize the number
of connections.

> Moreover, the separate cacheability is a concern. If I have a big manifest.ZIP
> file with all my resources in it, and I change one tiny file in that manifest,
> then doesn't the entire ZIP file change (its file signature/size/timestamp
> certainly does). So, the browser has to re-download the entire ZIP file. Lots
> and lots of wasted download of stuff that didn't change.

Well, you can always include version numbers or hashes within the ZIP file and
have the browser download only the difference between its version and the
server's; perhaps you could build on the technology from bit torrent and
download only the diff of the two ZIP files. Or just have faith that a
webmaster will be competent enough not to make one big manifest with
everything under the moon. Ideally, the packages should be split into
resources used across the whole site (main JavaScript libraries,
layout-specific images, and global CSS files); then check how many resources
each individual page requires, and either load them the way you would now or
put them into another package or two.
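The "hashes within the ZIP file" idea could work roughly like this: the package carries a manifest mapping each entry to a content hash, and the browser compares its cached manifest with the server's to decide which entries to fetch and which to drop. The manifest format and function names are hypothetical; neither ZIP nor HTML defines such a mechanism.

```python
import hashlib


def build_manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Hypothetical manifest: map each packaged path to the SHA-256 hex
    digest of its contents."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}


def changed_entries(cached: dict[str, str],
                    current: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare the browser's cached manifest with the server's current one.
    Returns (paths to download, paths to evict from the cache)."""
    to_fetch = sorted(p for p, h in current.items() if cached.get(p) != h)
    to_evict = sorted(p for p in cached if p not in current)
    return to_fetch, to_evict


old = build_manifest({"site.css": b"body{}", "logo.png": b"PNGDATA", "old.js": b"x"})
new = build_manifest({"site.css": b"body{margin:0}", "logo.png": b"PNGDATA", "app.js": b"y"})
print(changed_entries(old, new))
```

Only the edited stylesheet and the new script would be re-downloaded here; the unchanged image stays cached.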


The other ideas are not meant to remove parallel downloading, but to optimize
it further by allowing the browser to request multiple resources serially over
one connection, and then do the same across multiple parallel connections. The
main goal is to reduce the amount of header data sent by sharing headers
between requests. Before you point out that headers can change from request to
request: the mechanism could let a page mark which elements can be transferred
this way, stating which can be grouped and which might change cookies or be
affected by data changing during a load.
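A minimal sketch of "sharing headers", assuming a hypothetical batch format in which headers identical across every request in a group are sent once and each request carries only the headers that differ. Sizes are approximated as `Name: value\r\n` lines; the format and names are illustrative only.

```python
def header_bytes(headers: dict[str, str]) -> int:
    """Approximate size of headers serialized as 'Name: value\\r\\n' lines."""
    return sum(len(f"{k}: {v}\r\n".encode()) for k, v in headers.items())


def split_shared(requests: list[dict[str, str]]) -> tuple[dict[str, str], list[dict[str, str]]]:
    """Factor out headers identical across every request in a batch;
    each request keeps only the headers that differ."""
    if not requests:
        return {}, []
    shared = {k: v for k, v in requests[0].items()
              if all(r.get(k) == v for r in requests[1:])}
    deltas = [{k: v for k, v in r.items() if shared.get(k) != v} for r in requests]
    return shared, deltas


batch = [
    {"User-Agent": "ExampleBrowser/1.0", "Cookie": "session=abc", "Accept": "image/png"},
    {"User-Agent": "ExampleBrowser/1.0", "Cookie": "session=abc", "Accept": "text/css"},
]
shared, deltas = split_shared(batch)
before = sum(header_bytes(r) for r in batch)
after = header_bytes(shared) + sum(header_bytes(d) for d in deltas)
print(before, after)
```

Requests the page marks as possibly changing cookies would simply be kept out of the batch.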

Perhaps another, simpler solution would be to have a way to mark up certain
elements to state that minimal headers can be sent to request them, e.g. no
referrer, cookies, user agent, etc.
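On the browser side, honoring such a markup hint could be as simple as filtering the request headers. This sketch assumes a hypothetical "minimal headers" flag on the element; note that the actual HTTP header name is spelled `Referer`.

```python
# Headers the comment suggests omitting for resources marked as needing
# only minimal headers (the marking mechanism itself is hypothetical).
MINIMAL_DROP = {"referer", "cookie", "user-agent"}


def minimal_headers(headers: dict[str, str]) -> dict[str, str]:
    """Drop Referer, Cookie and User-Agent (case-insensitively) from a
    request for a resource flagged as needing only minimal headers."""
    return {k: v for k, v in headers.items() if k.lower() not in MINIMAL_DROP}


print(minimal_headers({"Host": "example.com", "Cookie": "session=abc",
                       "Referer": "http://example.com/", "User-Agent": "ExampleBrowser/1.0",
                       "Accept": "*/*"}))
```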

> All in all, I think this idea is flawed from the start. I think it has no
> chance of actually working to improve performance in the real world. There's
> just too many fundamental paradigm problems of packaging files up together that
> loses the huge performance ability of files to be separately loaded in parallel
> and separately cacheable. Any paradigm where you lose those two things is just
> not going to work.

The main problem I want to solve here is the amount of data sent with each
request and the round-trip time required to make a request. Since most
internet connections typically have an upstream speed of about 1/10th of the
download speed, the typical 300-1000 bytes required to make a request
translate to 3kB-10kB of data we could be downloading instead, in ideal
conditions. Even in less ideal conditions you can typically download more
than you can send in the same time period, so any savings in the amount sent
mean large gains in the amount you can receive. This is already well
understood: most people have been merging JavaScript and CSS files and
creating sprite images to reduce the overall number of requests.
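The 300-1000 bytes to 3kB-10kB conversion above can be checked directly. This sketch assumes the 10:1 downlink/uplink ratio from the text and an idealized link running at full rate in both directions with no latency; the 32,000 / 320,000 bytes/s rates are illustrative.

```python
def download_equivalent(request_bytes: int, upload_bytes_per_sec: float,
                        download_bytes_per_sec: float) -> float:
    """Bytes the downlink could carry in the time the uplink needs to send
    `request_bytes` (idealized: full rate both ways, no latency)."""
    return request_bytes * download_bytes_per_sec / upload_bytes_per_sec


# Assumed 10:1 asymmetry: 32,000 bytes/s up, 320,000 bytes/s down.
for size in (300, 1000):
    print(size, "request bytes ~", download_equivalent(size, 32_000, 320_000), "download bytes")
```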



> NOW, if we're merely talking about saving the TCP overhead of establishing a
> new connection for each file (but that still files would be requested
> individually, and in parallel), then that IS something valuable. And it already
> exists and is in wide-spread use. It's called "Keep-Alive".

I am not sure whether you are being sarcastic or not, but I will give you the
benefit of the doubt and assume you are being genuine. My main concern is not
connection overhead; it is the overhead involved in making each request to
the server.

To be honest, you can point out flaws in every idea and system and always
attribute them to some theoretical incompetent person out there, and you will
always have those people, but should we really stifle progress over it? A
competent webmaster would use a package system to greatly increase the
efficiency of a site, not to mention the savings from not always having to
sprite/merge images, along with the extra CSS rules and bytes you no longer
need when all the images go into one package instead. Ideally a webmaster
would take this idea and create 2-3 packages optimized for the overall loading
speed of the site, which could easily double the speed of a page compared with
the best and most extreme optimization techniques currently available.
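A 2-3 package layout following the split suggested earlier (site-wide resources versus per-page resources) could be planned roughly as below. This is a sketch under simple assumptions: a resource counts as site-wide only if every page uses it, and the page paths and file names are hypothetical.

```python
def plan_packages(pages: dict[str, set[str]]) -> tuple[list[str], dict[str, list[str]]]:
    """Put resources used by every page into one site-wide package and give
    each page its remaining resources as a page-specific package."""
    if not pages:
        return [], {}
    site_wide = set.intersection(*pages.values())
    per_page = {page: sorted(res - site_wide) for page, res in pages.items()}
    return sorted(site_wide), per_page


pages = {
    "/": {"jquery.js", "site.css", "layout.png", "home-hero.jpg"},
    "/about": {"jquery.js", "site.css", "layout.png", "team.jpg"},
}
print(plan_packages(pages))
```

A looser threshold (for example, "used on most pages") would move more into the site-wide package at the cost of some unused bytes per page.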

Received on Saturday, 30 April 2011 06:12:13 UTC