Re: Delta Compression and UTF-8 Header Values from Poul-Henning Kamp on 2013-02-09 (ietf-http-wg@w3.org from January to March 2013)

From: Poul-Henning Kamp <phk@phk.freebsd.dk>
Date: Sat, 09 Feb 2013 22:09:31 +0000
To: "Adrien W. de Croy" <adrien@qbik.com>
cc: "Willy Tarreau" <w@1wt.eu>, "Martin Nilsson" <nilsson@opera.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Message-ID: <77593.1360447771@critter.freebsd.dk>

Content-Type: text/plain; charset=ISO-8859-1
--------
In message <emf3ec6632-2c7a-491d-9db4-ad235e66f19c@bombed>, "Adrien W. de Croy"
 writes:

>>This is exactly what you want to avoid when comparing with lots of strings.
>>It's generally more efficient to first compare lengths, then byte per byte
>>only if lengths match.

>I guess we're talking about length-prefixed data in this context though 
>so it would be an available optimisation.

The optimization is usually even greater than that[1].

The str{n}cmp() family is forced to do byte-by-byte comparisons,
because the terminating NUL may be the last accessible byte in
virtual memory, so use of any wider type might trigger an "unwarranted"
page fault or segmentation violation.

If you know the strings to be long, you can sometimes amortize the
cost of checking if you are safely inside a VM page, but it is rarely
a useful general optimization.

Memcmp() having been told up front how many bytes it can assume are
valid to access, can use wider types to reduce number of instructions
speeding things up.  The memory system usually has byte-steering
logic which means that the number of memory operations for data is
the same in either case.

Compare for instance:
	http://svnweb.freebsd.org/base/head/lib/libc/arm/string/strncmp.S?revision=194585&view=markup

to:
	http://svnweb.freebsd.org/base/head/lib/libc/arm/string/memcmp.S?revision=137464&view=markup

The "identical alignment" requirement is caused by the ARM architectures
strong alignment, but it is worth keeping in mind also in protocol design:
It may be very profitable in speed to align and pad all strings to 16 or
even 32 bits.

Poul-Henning

[1] I wrote an article related to this some time ago:
	http://queue.acm.org/detail.cfm?id=2010365

[2] One protocol I helped improve originally had a string representation
    which paid attention to this:

	{
	uint16_t	length;
	uint16_t	__pad;
	uint32_t	data[];
	}

    But we still shaved about 20% CPU time, by turning the '__pad' word
    into a simple hash of the string:  It eliminated almost all full
    string comparisons.

    As much as I like this trick, I'm not sure I see many places in
    HTTP/2 where it would make much of a difference.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

Received on Saturday, 9 February 2013 22:09:56 UTC