- From: Roberto Polli <robipolli@gmail.com>
- Date: Mon, 13 Jul 2020 16:49:22 +0200
- To: Sergey Ponomarev <stokito@gmail.com>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
- Message-ID: <CAP9qbHX4ety3zZBYFZb5dvWinpP-z=L0sp0hzDofjLFe_=qmdw@mail.gmail.com>
Hi Sergey,

The Digest header was introduced long ago in RFC 3230. We are just updating it...

Its goal is different from ETag's, though you can use the digest-algorithms to compute strong ETags.
Keep in mind, however, that the digest changes when a resource is localised (e.g. via Content-Language), while weak ETags probably won't.

If someone thinks we should describe the relationship between Digest and ETags in the new spec, we can do it.

Have a nice day,
R

On Mon, 13 Jul 2020, 02:28 Sergey Ponomarev <stokito@gmail.com> wrote:

> Hi,
>
> I just implemented ETag caching for BusyBox httpd, which is an HTTP server
> for embedded devices like WiFi routers.
> While implementing it I had to choose what exactly should be generated as
> the ETag.
> ETag is specified in https://tools.ietf.org/html/rfc2616#section-14.19 as
> an opaque value, and a server is free to generate it as it needs.
> https://httpwg.org/specs/rfc7232.html#rfc.section.2.3 (Conditional
> Requests) explains strategies to generate and compare ETags better.
> But even the upcoming HTTP Caching draft-ietf-httpbis-cache-09 has no
> practical details about ETag generation.
>
> I did some research and found out that all web servers do it in their own
> way, and this causes several problems:
> 1. An ETag may be badly or even wrongly generated.
> 2. When two different servers, e.g. Apache and Nginx, are behind a load
> balancer, their ETags will always be discarded because they are
> generated differently. That's why some sysadmins disable ETag on one of the
> servers.
> These problems can be easily fixed if the HTTP specification provides a
> recommended way to generate ETags while keeping freedom of choice.
>
> A typical ETag is based on the file's last modification time and size, which
> can be easily retrieved from the file system, but it can also be a stricter
> hash or checksum, and sometimes a semantic version.
>
> Just a quick overview of typical algorithms used in web servers.
> Consider we have a file with:
> * Size 1047, i.e. 417 in hex.
> * MTime, i.e. last modification on Mon, 06 Jan 2020 12:54:56 GMT, which
> is 1578315296 seconds in Unix time, or 1578315296666771000 nanoseconds.
> * Inode, which is the physical file number, 66, i.e. 42 in hex.
>
> Different web servers return ETags like:
> Nginx: "5e132e20-417", i.e. "hex(MTime)-hex(Size)". Not configurable.
> Apache/2.2: "42-417-59b782a99f493", i.e. "hex(INode)-hex(Size)-hex(MTime
> in microseconds)". Can be configured, but MTime will be in microseconds
> anyway: http://httpd.apache.org/docs/2.4/mod/core.html#fileetag
> Apache/2.4: "417-59b782a99f493", i.e. "hex(Size)-hex(MTime in
> microseconds)", i.e. without INode, which is friendly to load balancing when
> an identical file has different INodes on different servers.
> OpenWrt uhttpd: "42-417-5e132e20", i.e.
> "hex(INode)-hex(Size)-hex(MTime)". Not configurable.
> Tomcat 9: W/"1047-1578315296666", i.e. Weak "Size-MTime in milliseconds".
> This is an incorrect ETag because it should be strong for a static file,
> i.e. octet compatibility.
> lighttpd: the weirdest: "hashcode(42-1047-1578315296666771000)", i.e.
> INode-Size-MTime but then reduced to a simple integer by hashcode. Can be
> configured, but you can only disable one part (etag.use-inode = "disabled").
>
> Hex numbers are used here so often because it's cheap to convert a decimal
> number to a shorter hex string.
> Inode, while adding more guarantees, makes load balancing impossible and
> is very fragile if you simply copy the file during an application redeploy.
> Sub-second MTime is not available on all platforms, and we don't need
> such granularity. Apache has reported bugs about this, like
> https://bz.apache.org/bugzilla/show_bug.cgi?id=55573
> The order MTime-Size or Size-MTime also matters, because MTime is more
> likely to change, so comparing ETag strings may be faster by a dozen
> CPU cycles.
> Even though this is not a full checksum hash, it is definitely not a weak
> ETag. This is enough to show that we expect octet compatibility for Range
> requests.
> Apache and Nginx share almost all traffic on the Internet, but most static
> files are served via Nginx, and it is not configurable.
>
> If I am not missing anything, then it looks like Nginx uses the most
> reasonable scheme, and I used it for BusyBox httpd.
> The whole ETag is generated by printf("\"%" PRIx64 "-%" PRIx64 "\"",
> last_mod, file_size)
>
> My proposal is to take the Nginx scheme and make it the recommended
> ETag algorithm, or at least to mention it in rfc7232 as an example.
> Other servers should at least make it possible to configure this ETag
> form.
> I'll try to engage other web server teams in the discussion and I'll try
> to create patches for them.
>
> While the simple MTime-Size ETag algorithm solves a bunch of
> problems, some systems want more guarantees and need hash-based
> ETags.
> Any hash, even MD5 or CRC32, is great to use as an ETag.
>
> There is a draft of Digest Headers:
> https://github.com/httpwg/http-extensions/blob/master/draft-ietf-httpbis-digest-headers.md
> Its idea is similar to Subresource Integrity (SRI).
> And in fact, instead of introducing the new Digest header, we can just
> reuse the ETag header with a prefix.
>
> That is, instead of:
>
> Digest: sha-256=4REjxQ4yrqUVicfSKYNO/cF9zNj5ANbzgDZt3/h3Qxo=
>
> we can use:
>
> ETag: "sha-256=4REjxQ4yrqUVicfSKYNO/cF9zNj5ANbzgDZt3/h3Qxo="
>
> A client can easily parse the ETag header and determine by the prefix how
> to validate it.
> We'll have "structured ETags", and they are already supported by proxies.
>
> For the same file a server can send two comma-separated ETags: one
> MTime-Size based and an additional digest-based one. Old clients just resend
> them via If-None-Match. If a server like BusyBox can only validate the
> MTime-Size ETag, it will validate it and ignore the sha-256 based ETag.
>
> BTW, the file hashes can be stored in ext4 extended attributes to avoid
> recalculating them.
>
> Please share your thoughts and opinions and best practices for ETags.
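To make the MTime-Size scheme quoted above concrete, here is a minimal sketch in C. It is not the actual BusyBox or Nginx code; the function names are made up for illustration, and the second function only mirrors the Apache/2.4-style layout as it appears in the example values of the message.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: writes a quoted ETag "hex(mtime)-hex(size)",
 * the layout the message attributes to Nginx.
 * Returns the number of characters written, or -1 if buf is too small. */
static int format_mtime_size_etag(char *buf, size_t buflen,
                                  uint64_t mtime_sec, uint64_t size)
{
    int n = snprintf(buf, buflen, "\"%" PRIx64 "-%" PRIx64 "\"",
                     mtime_sec, size);
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return n;
}

/* Hypothetical helper: writes "hex(size)-hex(mtime in microseconds)",
 * the layout of the Apache/2.4 example value in the message. */
static int format_size_mtime_usec_etag(char *buf, size_t buflen,
                                       uint64_t size, uint64_t mtime_usec)
{
    int n = snprintf(buf, buflen, "\"%" PRIx64 "-%" PRIx64 "\"",
                     size, mtime_usec);
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return n;
}
```

With the example file from the message (MTime 1578315296 seconds, size 1047), the first function writes "5e132e20-417"; the second, given 1578315296666771 microseconds, writes "417-59b782a99f493". The two strings never compare equal, which is the load-balancer problem described above.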
>
> See also:
> Apache code to generate ETag
> https://searchcode.com/codesearch/view/28934406/
> lighttpd
> https://git.lighttpd.net/lighttpd/lighttpd1.4/src/branch/master/src/etag.c
>
> --
> Sergey Ponomarev <https://linkedin.com/in/stokito>, skype:stokito
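The two-ETag idea in the quoted message depends on a server being able to pick the one tag it understands out of If-None-Match and ignore the rest. A rough sketch of that lookup in C follows. The helper name `if_none_match_contains` is hypothetical, the server's own tag is assumed to be passed with its surrounding quotes, and the comparison is the weak one (a `W/` prefix in the header is skipped). Note that HTTP currently defines ETag as a single value, so sending two tags is part of the proposal, not existing behaviour.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if the If-None-Match header value
 * lists `etag` (given with its quotes, e.g. "\"5e132e20-417\"") or is "*".
 * Tags the server does not recognise, such as "sha-256=...", are skipped. */
static bool if_none_match_contains(const char *header, const char *etag)
{
    size_t etag_len = strlen(etag);
    const char *p = header;

    while (*p) {
        /* Skip list separators and whitespace between entity-tags. */
        while (*p == ' ' || *p == '\t' || *p == ',')
            p++;
        if (*p == '*')
            return true;            /* "*" matches any current ETag */
        if (p[0] == 'W' && p[1] == '/')
            p += 2;                 /* weak comparison: ignore the prefix */
        if (*p != '"')
            return false;           /* end of input or malformed list */

        const char *end = strchr(p + 1, '"');
        if (end == NULL)
            return false;           /* unterminated entity-tag */

        size_t len = (size_t)(end - p) + 1;   /* length including quotes */
        if (len == etag_len && memcmp(p, etag, len) == 0)
            return true;
        p = end + 1;
    }
    return false;
}
```

A server that only knows the MTime-Size form would call this with its freshly computed tag and answer 304 Not Modified on a match; a digest-based tag in the same header is simply stepped over.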
Received on Monday, 13 July 2020 14:49:48 UTC