- From: Poul-Henning Kamp <phk@phk.freebsd.dk>
- Date: Sat, 09 Feb 2013 22:09:31 +0000
- To: "Adrien W. de Croy" <adrien@qbik.com>
- cc: "Willy Tarreau" <w@1wt.eu>, "Martin Nilsson" <nilsson@opera.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Content-Type: text/plain; charset=ISO-8859-1
--------
In message <emf3ec6632-2c7a-491d-9db4-ad235e66f19c@bombed>, "Adrien W. de Croy"
writes:
>>This is exactly what you want to avoid when comparing with lots of strings.
>>It's generally more efficient to first compare lengths, then byte per byte
>>only if lengths match.
>I guess we're talking about length-prefixed data in this context though
>so it would be an available optimisation.
The optimization is usually even greater than that[1].
The str{n}cmp() family is forced to do byte-by-byte comparisons,
because the terminating NUL may be the last accessible byte in
virtual memory, so use of any wider type might trigger an "unwarranted"
page fault or segmentation violation.
If you know the strings to be long, you can sometimes amortize the
cost of checking if you are safely inside a VM page, but it is rarely
a useful general optimization.
Memcmp() having been told up front how many bytes it can assume are
valid to access, can use wider types to reduce number of instructions
speeding things up. The memory system usually has byte-steering
logic which means that the number of memory operations for data is
the same in either case.
Compare for instance:
http://svnweb.freebsd.org/base/head/lib/libc/arm/string/strncmp.S?revision=194585&view=markup
to:
http://svnweb.freebsd.org/base/head/lib/libc/arm/string/memcmp.S?revision=137464&view=markup
The "identical alignment" requirement is caused by the ARM architectures
strong alignment, but it is worth keeping in mind also in protocol design:
It may be very profitable in speed to align and pad all strings to 16 or
even 32 bits.
Poul-Henning
[1] I wrote an article related to this some time ago:
http://queue.acm.org/detail.cfm?id=2010365
[2] One protocol I helped improve originally had a string representation
which paid attention to this:
{
uint16_t length;
uint16_t __pad;
uint32_t data[];
}
But we still shaved about 20% CPU time, by turning the '__pad' word
into a simple hash of the string: It eliminated almost all full
string comparisons.
As much as I like this trick, I'm not sure I see many places in
HTTP/2 where it would make much of a difference.
--
Poul-Henning Kamp | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG | TCP/IP since RFC 956
FreeBSD committer | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
Received on Saturday, 9 February 2013 22:09:56 UTC