Re: JPEG-XL as Content-Encoding? from Alex Deymo on 2020-08-21 (ietf-http-wg@w3.org from July to September 2020)

From: Alex Deymo <deymo@google.com>
Date: Fri, 21 Aug 2020 16:36:46 +0200
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <CAGd9gwjWriKCRNNDkjfx0ME0L8v5qT3mO=6X1U2tDNYyXLtZ9w@mail.gmail.com>
On Fri, 21 Aug 2020 at 14:27, Julian Reschke <julian.reschke@gmx.de>
wrote:

> > However, on top of that, the lossless recompression of JPEG files allows
> > you to get this ~20% gain for existing files. When you deploy a new
> > lossy codec there is the question of what to do with the existing
> > images. If you have a website with photos and want to convert your
> > already lossy JPEG files to a new codec to save storage and bandwidth
> > and you decide to decode them to pixels and encode them back to the new
> > format you will end up with more artifacts or worse compression density
> > trying to accurately represent the JPEG artifacts in the new codec,
> > whatever the codec is. It's impractical to do this lossy transcoding to
> > a new codec at large scale on existing images, each application would
> > need to evaluate whether they want to do this for existing images. This
> > story is different if you start with a large and high quality image
> > (like a JPEG from a camera) and want to encode in a smaller form for the
> > web, since there you already have a high quality file.
>
> That makes it sound a bit as if a losslessly-re-encoded JPG file is not
> a valid JXL file. Is that the case?
>

A losslessly recompressed JPEG is a valid JXL file. There is also value in
keeping your JPEG files as losslessly recompressed versions outside the
Content-Encoding world (for example, converting the existing library on
your hard drive).

What I meant is this: if you started with a large, high-quality image
(JPEG or RAW) and, at some point in the past, encoded it to a
lower-resolution or lower-quality JPEG for your web application, you
introduced specific JPEG artifacts and discarded information about the
original file. In some sense, the damage to the image is done. Your options
for compressing this file further are then limited: you don't know what the
original looked like, so a new codec might end up accurately reproducing
JPEG artifacts instead of accurately reproducing the original image
features. This is where lossless recompression is a good idea.
If instead you still have the original file, you can produce a
lower-quality or lower-resolution JXL that is visually similar to the
original file (rather than to the low-resolution JPEG of the previous
case). This would give you a better compression ratio for a given visual
quality, but it would not give you a JPEG file right away.

What is not true is the converse, and maybe that's where the confusion is:
not every JXL is a losslessly re-encoded JPEG. You can always decode any
JXL to pixels and encode the result as JPEG, but the file you end up with
largely depends on how you do that encoding. The lossless recompression
feature restricts the options available when encoding the JXL and adds
extra information so that a specific JPEG file can be reproduced
deterministically.


> ...
> > I think the only shocking thing about a content-encoding for JPEGs is
> > that it can't encode any arbitrary file only JPEGs, but if you look at
> > "general purpose" compressors like Brotli they still can't compress to a
> > smaller file every file; many binary files that are already compressed
> > like .zip or even a JPEG files (unless they have a large ICC) won't
> > compress to a smaller file so you just don't do it even if Brotli is
> > able to compress them to a ~similar size file.
> > ...
>
> That's indeed a concern. For the other currently registered encodings,
> you *can* apply them, but they do not necessarily help.
>
> This one can't be applied to any file type. One way to address this
> would be to tune the format that it *can* handle any file type (by just
> adding a tiny wrapper around it and preserving the actual octet stream
> within).


Yes, you could add a tiny frame to signal whether the payload was
losslessly recompressed or not (maybe paying ~1 more byte), but isn't that
basically what the Content-Encoding header in the response is for anyway? I
don't see an application where this frame would help. The server is not
forced to use the content encoding, and sending a file wrapped in another
format that adds no benefit would be a bit of a waste. Compare the two
cases:
1. If we don't have this frame, you call the function that does the
lossless encoding; if it returns an error (for example because the file is
not a JPEG), you don't set Content-Encoding to jxl.
2. If we do have this frame, you always set Content-Encoding to jxl, and
the function that does the encoding follows exactly the same logic but
stores the "jxl or raw" bit of information in the first byte, depending on
whether it was able to re-encode the file.
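
To make the comparison concrete, here is a minimal Python sketch of the two
options. It is only an illustration: recompress_jpeg is a hypothetical
stand-in (it checks the JPEG start marker and uses zlib so the example
runs), not a real JPEG XL encoder, and the one-byte flag values are
assumptions.

```python
import zlib
from typing import Dict, Tuple

JPEG_SOI = b"\xff\xd8"  # a JPEG file starts with the Start Of Image marker


def recompress_jpeg(data: bytes) -> bytes:
    """Hypothetical stand-in for a lossless JPEG -> JXL recompressor.

    A real encoder would parse the JPEG; this placeholder only checks the
    SOI marker and uses zlib so that the example is runnable.
    """
    if not data.startswith(JPEG_SOI):
        raise ValueError("not a JPEG file")
    return zlib.compress(data)


def respond_without_frame(body: bytes) -> Tuple[Dict[str, str], bytes]:
    """Option 1: only set Content-Encoding when recompression worked."""
    try:
        return {"Content-Encoding": "jxl"}, recompress_jpeg(body)
    except ValueError:
        return {}, body  # original octets, no Content-Encoding header


def respond_with_frame(body: bytes) -> Tuple[Dict[str, str], bytes]:
    """Option 2: always set Content-Encoding; a leading byte says which."""
    headers = {"Content-Encoding": "jxl"}
    try:
        return headers, b"\x01" + recompress_jpeg(body)  # assumed flag: jxl
    except ValueError:
        return headers, b"\x00" + body  # assumed flag: raw payload
```

Both functions make the same decision at the same point; the only
difference is whether the result is reported in the header or in the first
byte of the body.
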
There's very little difference between the two cases in how much you can
send to the client before the encoding is done, and in general you know
very quickly whether the file can be encoded or not. Maybe all we need is a
function that tells us very quickly whether we *can* encode it; I think
this is possible and relatively easy. I understand, though, that this
limitation may require changes in how your server integrates a new content
encoding, since it would not be integrated the same way as Brotli, for
example. That can be addressed when support for this content encoding is
implemented in your server-side software.
My idea of how this would be implemented is more along the lines of keeping
the losslessly recompressed jxl file for static content and serving it
directly on request, or decoding it before serving for clients that don't
support it. You get a significant saving in storage size for static content
this way (similar to the brotli_static setting in the Nginx Brotli module).
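
A rough Python sketch of that static-serving idea follows. The names
accepts, decode_stored and serve_static are hypothetical, the
Accept-Encoding parsing is deliberately simplistic, and zlib again stands
in for a real JPEG XL decoder so that the example runs.

```python
import zlib
from typing import Dict, Tuple


def accepts(accept_encoding: str, token: str) -> bool:
    """Rough Accept-Encoding check: token listed and not given q=0."""
    for item in accept_encoding.split(","):
        name, _, params = item.strip().partition(";")
        if name.strip().lower() != token:
            continue
        q = params.strip().lower()
        if q.startswith("q="):
            try:
                return float(q[2:]) > 0
            except ValueError:
                return False
        return True
    return False


def decode_stored(stored: bytes) -> bytes:
    """Hypothetical stand-in for reconstructing the original JPEG."""
    return zlib.decompress(stored)


def serve_static(stored: bytes,
                 accept_encoding: str) -> Tuple[Dict[str, str], bytes]:
    """Serve the stored recompressed file as-is, or decode it first."""
    headers = {"Vary": "Accept-Encoding"}
    if accepts(accept_encoding, "jxl"):
        headers["Content-Encoding"] = "jxl"
        return headers, stored
    return headers, decode_stored(stored)  # client gets the plain JPEG
```

Only the recompressed file is kept on disk; clients that don't list jxl in
Accept-Encoding receive the reconstructed JPEG octets.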

That said, I should probably mention that, according to the spec draft, a
valid JPEG-1 file is also a valid JXL file. So it is really only the
non-JPEG files that you can't re-encode, and you can identify those by
looking at the first few bytes. In that sense we already have this frame
information for distinguishing old JPEG-1 files from JXL files.
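
As an illustration of telling the cases apart from the first few bytes,
here is a small Python sketch. The signature values are the commonly
documented ones for JPEG and for the published JPEG XL format (bare
codestream and container), which is an assumption relative to the draft
discussed here; sniff is a hypothetical helper name.

```python
# Signatures as commonly documented; the draft may have differed in detail.
JPEG_SOI = bytes.fromhex("ffd8ff")        # JPEG SOI followed by a marker
JXL_CODESTREAM = bytes.fromhex("ff0a")    # bare JPEG XL codestream
JXL_CONTAINER = bytes.fromhex("0000000c4a584c200d0a870a")  # "JXL " box


def sniff(data: bytes) -> str:
    """Classify a payload by looking only at its leading bytes."""
    if data.startswith(JXL_CONTAINER):
        return "jxl-container"
    if data.startswith(JXL_CODESTREAM):
        return "jxl-codestream"
    if data.startswith(JPEG_SOI):
        return "jpeg"
    return "other"  # cannot be losslessly recompressed as a JPEG
```

A check like this only inspects the signature; a file can still fail to
recompress later if it turns out to be truncated or malformed.
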
Received on Friday, 21 August 2020 14:38:03 UTC