- From: Willy Tarreau <w@1wt.eu>
- Date: Tue, 21 Oct 2014 16:24:51 +0200
- To: Julian Reschke <julian.reschke@gmx.de>
- Cc: Poul-Henning Kamp <phk@phk.freebsd.dk>, Roberto Peon <grmocg@gmail.com>, Mark Nottingham <mnot@mnot.net>, RUELLAN Herve <Herve.Ruellan@crf.canon.fr>, Amos Jeffries <squid3@treenet.co.nz>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
On Tue, Oct 21, 2014 at 12:20:16PM +0200, Willy Tarreau wrote: > On Tue, Oct 21, 2014 at 12:02:39PM +0200, Julian Reschke wrote: > > On 2014-10-21 11:44, Willy Tarreau wrote: > > >... > > >I know, and you remember, that was one of the basic points of our > > >proposal 2 years ago. I think we'll hardly propose this here since > > >it changes the ability to pass certain invalid values. However, I > > > ... > > > > Right. But maybe we could try something the represents valid values > > well, but still allows transporting invalid values? > > I hadn't thought about that option honnestly and I find that it could be > very efficient and even save a lot of CPU by avoiding to process anything > complicated or out of range. I mean, if the date respects a very precise > format, we encode it and transport it in binary. Otherwise we leave it > in plain text mode. We can have an encoding starting with something > impossible in text mode (eg: LF byte) to mark that what follows is a > 32-bit binary encoding of a timestamp. A variable-length encoding would > allow to pass end of the current epoch and to use less bytes currently. > This method would even be compatible with existing encoding and parsers. I just experimented with this : variable length encoding of the date using 255/256 possible values for each byte, so that it remains compatible with string processing in programs (no zero byte is produced). The example code below converts valid strings to their compacted form and leaves the non-convertable ones intact. The converted string's format indicates what format it's in (ie: if it begins with 0xA its a compact one). The gains are interesting, just tested on example.com, there are 3 dates there, 24 bytes each, and reduced to 5 bytes each, that's 57 bytes saved in one response header (19 bytes saved per valid header). The encoding function works this way : it tries to parse a valid HTTP date and returns either the original string or the compacted one : const char *http_date_encode(const char *date) { struct tm tm; static char buf[10]; if (strptime(date, "%a, %d %b %Y %H:%M:%S %Z", &tm) == NULL) return date; if (!nz_encode(buf + 1, sizeof(buf) - 1, mktime(&tm))) return date; buf[0] = '\n'; return buf; } The decoder does the opposite, it checks whether the date string starts with an LF character and returns either its expansion or the string itself : const char *http_date_decode(const char *date) { struct tm *tm; static char buf[200]; time_t t; if (*date != '\n') return date; t = nz_decode(date + 1); tm = gmtime(&t); strftime(buf, sizeof(buf), "%a, %d %b %Y %H:%M:%S %Z", tm); return buf; } The non-zero encoding/decoding functions are simple base255 encoders which ensure that we won't add an extra byte for a long time, but if someone doesn't feel at ease with these, it's possible to use another format (and waste more space) : /* decode a non-zero encoded string into a long int */ unsigned long nz_decode(const char *buf) { unsigned long v; unsigned long mul = 1; for (v = 0; *buf; buf++) { v += mul * (unsigned char)(*buf - 1); mul *= 255; } return v; } /* encode value <v> into a non-zero string */ const char *nz_encode(char *buf, int size, unsigned long v) { char *out; for (out = buf; size >= 1;) { if (!v) { *out = 0; return buf; } *(out++) = 1 + (v % 255); v /= 255; } return NULL; } Then the calling code which converts the date both ways and compares outputs. int main(int argc, char **argv) { const char *in; const char *enc; const char *dec; if (argc < 2) return 1; in = argv[1]; enc = http_date_encode(argv[1]); dec = http_date_decode(enc); printf("in =<%s> (%d chars)\n", in, strlen(in)); printf("enc=<%s> (%d chars) savings=%d\n", enc, strlen(enc), strlen(in)-strlen(enc)); printf("dec=<%s> (%d chars) strcmp=%d\n", dec, strlen(dec), strcmp(in, dec)); return 0; } $ curl -I example.com HTTP/1.1 200 OK Accept-Ranges: bytes Cache-Control: max-age=604800 Content-Type: text/html Date: Tue, 21 Oct 2014 14:20:53 GMT Etag: "359670651" Expires: Tue, 28 Oct 2014 14:20:53 GMT Last-Modified: Fri, 09 Aug 2013 23:54:35 GMT Server: ECS (iad/182A) X-Cache: HIT x-ec-custom-error: 1 Content-Length: 1270 These 3 dates (Date, Expires, Last-Modified) can then be shrunk by 57 bytes total without the risk of altering an invalid content. What's nice with this method is that only the producer needs to know what headers it wants to compact, the recipient doesn't need to know them since it can use the LF byte to know if a value needs to be expanded or not. An if-modified-since in a request could be reduced by 19 bytes, which is nice as well. Thoughts ? Willy
Received on Tuesday, 21 October 2014 14:26:27 UTC