- From: Roberto Polli <robipolli@gmail.com>
- Date: Mon, 13 Jul 2020 16:49:22 +0200
- To: Sergey Ponomarev <stokito@gmail.com>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
- Message-ID: <CAP9qbHX4ety3zZBYFZb5dvWinpP-z=L0sp0hzDofjLFe_=qmdw@mail.gmail.com>
Hi Sergey,
The Digest header was introduced long ago in RFC 3230. We are just updating
it.
Its goal is different from that of ETag, though you can use digest algorithms
to compute strong ETags. Keep in mind that the digest changes when a
resource is localised (e.g. via Content-Language), while a weak ETag
probably won't.
If someone thinks we should describe the relationship between Digest and
ETags in the new spec, we can do it.
Have a nice day,
R
On Mon, 13 Jul 2020, 02:28 Sergey Ponomarev <stokito@gmail.com> wrote:
> Hi,
>
> I just implemented ETag caching for BusyBox httpd, which is an HTTP server
> for embedded devices like WiFi routers.
> While implementing it I had to choose what exactly should be generated as
> the ETag.
> ETag is specified in https://tools.ietf.org/html/rfc2616#section-14.19 as
> an opaque value, and a server is free to generate it as it needs.
> https://httpwg.org/specs/rfc7232.html#rfc.section.2.3 (Conditional
> Requests) explains strategies to generate and compare ETags better.
> But even the upcoming HTTP Caching draft-ietf-httpbis-cache-09 gives no
> practical details about ETag generation.
>
> I did some research and found that every web server does it in its own
> way, and this causes several problems:
> 1. An ETag may be badly or even wrongly generated.
> 2. When two different servers, e.g. Apache and Nginx, are behind a load
> balancer, their ETags will always be discarded because they are
> generated differently. That's why some sysadmins disable ETag on one of the
> servers.
> These problems can be easily fixed if the HTTP specification provides a
> recommended way to generate ETags while keeping freedom of choice.
>
> A typical ETag is based on the file's last modification time and size,
> which can be easily retrieved from the file system, but it can also be a
> stricter hash or checksum, and sometimes a semantic version.
>
> Here is a quick overview of typical algorithms used in web servers.
> Consider a file with:
> * Size 1047, i.e. 417 in hex.
> * MTime, i.e. last modification, on Mon, 06 Jan 2020 12:54:56 GMT, which
> is 1578315296 seconds in Unix time or 1578315296666771000 nanoseconds.
> * Inode, which is the physical file number, 66, i.e. 42 in hex.
>
> Different web servers return ETags like:
> Nginx: "5e132e20-417" i.e.
> "hex(MTime)-hex(Size)". Not configurable.
> Apache/2.2: "42-417-59b782a99f493" i.e. "hex(INode)-hex(Size)-hex(MTime
> in microseconds)". Can be configured, but MTime will be in microseconds
> anyway: http://httpd.apache.org/docs/2.4/mod/core.html#fileetag
> Apache/2.4: "417-59b782a99f493" i.e. "hex(Size)-hex(MTime in
> microseconds)", i.e. without INode, which is friendly for load balancing
> when identical files have different INodes on different servers.
> OpenWrt uhttpd: "42-417-5e132e20" i.e.
> "hex(INode)-hex(Size)-hex(MTime)". Not configurable.
> Tomcat 9: W/"1047-1578315296666" i.e. Weak"Size-MTime in milliseconds".
> This is an incorrect ETag because for a static file it should be strong,
> i.e. guarantee octet-for-octet equality.
> lighttpd: the weirdest: "hashcode(42-1047-1578315296666771000)" i.e.
> INode-Size-MTime, but then reduced to a simple integer by hashcode. Can be
> configured, but you can only disable one part (etag.use-inode = "disabled").
>
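As a worked check of the formats listed above, here is a minimal C sketch
that rebuilds four of the sample values from the example file's attributes.
The struct and function names are hypothetical and are not taken from any of
the servers' sources. The time units are inferred from the sample values
themselves: 0x59b782a99f493 is 1578315296666771, i.e. microseconds since the
epoch, and 1578315296666 is the same instant in milliseconds.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the file attributes used in the overview. */
struct file_meta {
    uint64_t inode;      /* 66 in the example */
    uint64_t size;       /* 1047 bytes in the example */
    uint64_t mtime_sec;  /* 1578315296 in the example */
    uint32_t mtime_nsec; /* 666771000 in the example */
};

/* Nginx style: "hex(MTime in seconds)-hex(Size)". */
int etag_nginx(char *buf, size_t len, const struct file_meta *m)
{
    assert(buf != NULL && m != NULL);
    return snprintf(buf, len, "\"%" PRIx64 "-%" PRIx64 "\"",
                    m->mtime_sec, m->size);
}

/* Apache 2.4 style: "hex(Size)-hex(MTime in microseconds)". */
int etag_apache24(char *buf, size_t len, const struct file_meta *m)
{
    assert(buf != NULL && m != NULL);
    uint64_t usec = m->mtime_sec * 1000000u + m->mtime_nsec / 1000u;
    return snprintf(buf, len, "\"%" PRIx64 "-%" PRIx64 "\"", m->size, usec);
}

/* uhttpd style: "hex(INode)-hex(Size)-hex(MTime in seconds)". */
int etag_uhttpd(char *buf, size_t len, const struct file_meta *m)
{
    assert(buf != NULL && m != NULL);
    return snprintf(buf, len, "\"%" PRIx64 "-%" PRIx64 "-%" PRIx64 "\"",
                    m->inode, m->size, m->mtime_sec);
}

/* Tomcat 9 style: weak, decimal "Size-MTime in milliseconds". */
int etag_tomcat(char *buf, size_t len, const struct file_meta *m)
{
    assert(buf != NULL && m != NULL);
    uint64_t msec = m->mtime_sec * 1000u + m->mtime_nsec / 1000000u;
    return snprintf(buf, len, "W/\"%" PRIu64 "-%" PRIu64 "\"", m->size, msec);
}
```

Called with inode 66, size 1047 and MTime 1578315296 s plus 666771000 ns,
these yield "5e132e20-417", "417-59b782a99f493", "42-417-5e132e20" and
W/"1047-1578315296666" respectively.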
> Hex numbers are used so often here because it's cheap to convert a
> number to a hex string, which is shorter than the decimal one.
> The inode, while adding more guarantees, makes load balancing impossible
> and is very fragile if you simply copy the file during an application
> redeploy.
> MTime with sub-second precision is not available on all platforms, and we
> don't need such granularity. Apache has reported bugs on this, like
> https://bz.apache.org/bugzilla/show_bug.cgi?id=55573
> The order, MTime-Size or Size-MTime, also matters: MTime is more likely
> to change, so comparing ETag strings may be faster by a dozen CPU cycles
> when it comes first.
> Even though this is not a full checksum or hash, it is definitely not a
> weak ETag. It is enough to show that we expect octet equality for Range
> requests.
> Apache and Nginx serve almost all traffic on the Internet, but most static
> files are served by Nginx, and its ETag format is not configurable.
>
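The weak/strong distinction matters in practice because Range-related checks
such as If-Range use the strong comparison, while If-None-Match uses the weak
one. Below is a minimal sketch of the two comparison functions as RFC 7232
section 2.3.2 describes them. The function names are made up for
illustration, and the inputs are assumed to be well-formed entity-tags
(an optional W/ prefix followed by a quoted string).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True if the entity-tag carries the weakness indicator "W/". */
bool etag_is_weak(const char *etag)
{
    assert(etag != NULL);
    return strncmp(etag, "W/", 2) == 0;
}

/* Weak comparison: opaque-tags match, weakness is ignored. */
bool etag_weak_eq(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    if (etag_is_weak(a))
        a += 2;
    if (etag_is_weak(b))
        b += 2;
    return strcmp(a, b) == 0;
}

/* Strong comparison: neither tag is weak and the opaque-tags match. */
bool etag_strong_eq(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return !etag_is_weak(a) && !etag_is_weak(b) && strcmp(a, b) == 0;
}
```

Under this reading, the Tomcat value W/"1047-1578315296666" can still
satisfy If-None-Match, but it can never pass the strong comparison.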
> If I am not missing anything, it looks like Nginx uses the most
> reasonable scheme, and I used it for BusyBox httpd.
> The whole ETag is generated by printf("\"%" PRIx64 "-%" PRIx64 "\"",
> last_mod, file_size)
>
> My proposal is to take the Nginx scheme and make it the recommended
> ETag algorithm, or at least to mention it in rfc7232 as an example.
> Other servers should at least make it possible to configure this ETag
> form.
> I'll try to engage other web server teams in the discussion and will try
> to create patches for them.
>
> While the simple MTime-Size ETag algorithm solves a bunch of
> problems, some systems want more guarantees and need hash-based
> ETags.
> Any hash, even MD5 or CRC32, is great to use as an ETag.
>
> There is a draft of Digest Headers
> https://github.com/httpwg/http-extensions/blob/master/draft-ietf-httpbis-digest-headers.md .
> Its idea is similar to Subresource Integrity (SRI).
> In fact, instead of introducing the new Digest header we could just reuse
> the ETag header with a prefix.
>
> That is, instead of:
>
> Digest: sha-256=4REjxQ4yrqUVicfSKYNO/cF9zNj5ANbzgDZt3/h3Qxo=
>
> We can use
>
> ETag: "sha-256=4REjxQ4yrqUVicfSKYNO/cF9zNj5ANbzgDZt3/h3Qxo="
>
> A client can easily parse the ETag header and determine from the prefix
> how to validate.
> We'll have "structured ETags", and they are already supported by proxies.
>
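A rough sketch of how a client might recognise such a prefixed ETag. The
"algorithm=value" layout inside the quotes is the one proposed above, not
anything standardised, and etag_digest_parts is a hypothetical helper. It
splits on the first "=" because base64 values may end with "=" padding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * If etag looks like "<algo>=<value>" (quotes included), copy the two parts
 * into the caller's buffers and return true. Weak tags, tags without "=",
 * and parts that do not fit into the buffers yield false.
 */
bool etag_digest_parts(const char *etag, char *algo, size_t algo_size,
                       char *digest, size_t digest_size)
{
    assert(etag != NULL && algo != NULL && digest != NULL);

    size_t len = strlen(etag);
    if (len < 2 || etag[0] != '"' || etag[len - 1] != '"')
        return false;               /* not a quoted strong entity-tag */

    const char *body = etag + 1;    /* text between the quotes */
    size_t body_len = len - 2;

    const char *eq = memchr(body, '=', body_len);
    if (eq == NULL || eq == body)
        return false;               /* no prefix, e.g. an MTime-Size tag */

    size_t algo_len = (size_t)(eq - body);
    size_t digest_len = body_len - algo_len - 1;
    if (digest_len == 0 || algo_len >= algo_size || digest_len >= digest_size)
        return false;

    memcpy(algo, body, algo_len);
    algo[algo_len] = '\0';
    memcpy(digest, eq + 1, digest_len);
    digest[digest_len] = '\0';
    return true;
}
```

A plain MTime-Size tag such as "5e132e20-417" contains no "=", so the helper
returns false and the client would treat it as an ordinary opaque ETag.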
> For the same file a server can send two comma-separated ETags: one
> MTime-Size based and an additional digest-based one. Old clients just
> resend them via If-None-Match. If a server like BusyBox can only validate
> the MTime-Size ETag, it will validate that one and ignore the sha-256
> based ETag.
>
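The server side of this idea could look roughly like the sketch below: walk
the If-None-Match list and report a hit if any listed entity-tag equals the
one the server can compute, skipping tags it does not understand. The
function name is hypothetical and this is not BusyBox code. It uses the weak
comparison, which is what RFC 7232 prescribes for If-None-Match, and it scans
by quotes rather than splitting on commas because a comma is a legal
character inside an entity-tag. Sending two tags in the ETag response header
itself is part of the proposal above; RFC 7232 defines that header as a
single entity-tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if the If-None-Match value lists "*" or an entity-tag whose
 * opaque-tag equals that of etag (given with its quotes).
 */
bool if_none_match_hits(const char *header, const char *etag)
{
    assert(header != NULL && etag != NULL);

    if (strncmp(etag, "W/", 2) == 0)
        etag += 2;                  /* weak comparison ignores weakness */
    size_t etag_len = strlen(etag);
    const char *p = header;

    while (*p != '\0') {
        /* Skip list separators and optional whitespace. */
        while (*p == ' ' || *p == '\t' || *p == ',')
            p++;
        if (*p == '*')
            return true;            /* "*" matches any representation */
        if (p[0] == 'W' && p[1] == '/')
            p += 2;
        if (*p != '"')
            break;                  /* end of input or malformed list */

        const char *end = strchr(p + 1, '"');
        if (end == NULL)
            break;                  /* unterminated entity-tag */

        size_t len = (size_t)(end - p) + 1;   /* both quotes included */
        if (len == etag_len && memcmp(p, etag, len) == 0)
            return true;
        p = end + 1;                /* unknown tag: skip it */
    }
    return false;
}
```

With the header value "5e132e20-417", "sha-256=4REj..." and a server that
only knows "5e132e20-417", the first tag matches and the digest-based tag is
never inspected.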
> BTW the file hashes can be stored in ext4 extended attributes to avoid
> recalculating them.
>
> Please share your thoughts, opinions and best practices for ETags.
>
> See also:
> Apache code to generate ETag
> https://searchcode.com/codesearch/view/28934406/
> LightHTTPD
> https://git.lighttpd.net/lighttpd/lighttpd1.4/src/branch/master/src/etag.c
>
> --
> Sergey Ponomarev <https://linkedin.com/in/stokito>, skype:stokito
>
>
>
Received on Monday, 13 July 2020 14:49:48 UTC