- From: Poul-Henning Kamp <phk@phk.freebsd.dk>
- Date: Sat, 09 Feb 2013 22:09:31 +0000
- To: "Adrien W. de Croy" <adrien@qbik.com>
- cc: "Willy Tarreau" <w@1wt.eu>, "Martin Nilsson" <nilsson@opera.com>, "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>
Content-Type: text/plain; charset=ISO-8859-1 -------- In message <emf3ec6632-2c7a-491d-9db4-ad235e66f19c@bombed>, "Adrien W. de Croy" writes: >>This is exactly what you want to avoid when comparing with lots of strings. >>It's generally more efficient to first compare lengths, then byte per byte >>only if lengths match. >I guess we're talking about length-prefixed data in this context though >so it would be an available optimisation. The optimization is usually even greater than that[1]. The str{n}cmp() family is forced to do byte-by-byte comparisons, because the terminating NUL may be the last accessible byte in virtual memory, so use of any wider type might trigger an "unwarranted" page fault or segmentation violation. If you know the strings to be long, you can sometimes amortize the cost of checking if you are safely inside a VM page, but it is rarely a useful general optimization. Memcmp() having been told up front how many bytes it can assume are valid to access, can use wider types to reduce number of instructions speeding things up. The memory system usually has byte-steering logic which means that the number of memory operations for data is the same in either case. Compare for instance: http://svnweb.freebsd.org/base/head/lib/libc/arm/string/strncmp.S?revision=194585&view=markup to: http://svnweb.freebsd.org/base/head/lib/libc/arm/string/memcmp.S?revision=137464&view=markup The "identical alignment" requirement is caused by the ARM architectures strong alignment, but it is worth keeping in mind also in protocol design: It may be very profitable in speed to align and pad all strings to 16 or even 32 bits. Poul-Henning [1] I wrote an article related to this some time ago: http://queue.acm.org/detail.cfm?id=2010365 [2] One protocol I helped improve originally had a string representation which paid attention to this: { uint16_t length; uint16_t __pad; uint32_t data[]; } But we still shaved about 20% CPU time, by turning the '__pad' word into a simple hash of the string: It eliminated almost all full string comparisons. As much as I like this trick, I'm not sure I see many places in HTTP/2 where it would make much of a difference. -- Poul-Henning Kamp | UNIX since Zilog Zeus 3.20 phk@FreeBSD.ORG | TCP/IP since RFC 956 FreeBSD committer | BSD since 4.3-tahoe Never attribute to malice what can adequately be explained by incompetence.
Received on Saturday, 9 February 2013 22:09:56 UTC