Abbreviation form for HTTP JSON Header Field Values? from Kazuho Oku on 2016-01-22 (ietf-http-wg@w3.org from January to March 2016)

From: Kazuho Oku <kazuhooku@gmail.com>
Date: Fri, 22 Jan 2016 16:27:38 +0900
To: Julian Reschke <julian.reschke@gmx.de>
Cc: HTTP Working Group <ietf-http-wg@w3.org>, Stefan Eissing <stefan.eissing@greenbytes.de>
Message-ID: <CANatvzwXfz5vYi-QAdExuQJTAa5zyXE5CTQ7WgHy1RgUn-StzA@mail.gmail.com>
Dear Mr. Julian F. Reschke,

Thank you for writing the HTTP JFV draft.

I love the concept, and would love to see it being used in all the
future header definitions once the JFV draft gets standardized.

And regarding the draft, is there any work to introduce an abbreviation form?

I assume the biggest argument against JFV is that it cannot encode
simple things (i.e. objects mostly conveying default values) as simply
as a tailor-made ABNF syntax can, and I think that having an
abbreviation form defined in JFV (either as a requirement or as an
optional feature) would be a good thing.

Specifically, I would like to see the following transformations defined:

* rule 1) a single-element hash MAY be transformed to the value of the
single element if all of the following conditions are met:
 * the semantics state that the element is the only required element
 * the type of the element is not a hash

* rule 2) a single-element array MAY be represented with the single
element if the following condition is met:
 * the type of the element is not an array
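
To make the two rules concrete, here is a minimal encoder-side sketch in
Python. The function names and the `required_key` parameter are
hypothetical (they are not part of the JFV draft); the values are
assumed to be already-built JSON-compatible Python objects.

```python
def abbreviate_hash(value, required_key):
    """Rule 1: collapse {required_key: v} to v.

    Applies only when the hash has exactly one element, that element is
    the one the header's semantics declare as the only required element,
    and its value is not itself a hash.
    """
    if (isinstance(value, dict) and len(value) == 1
            and required_key in value
            and not isinstance(value[required_key], dict)):
        return value[required_key]
    return value


def abbreviate_array(value):
    """Rule 2: collapse [v] to v, unless v is itself an array."""
    if (isinstance(value, list) and len(value) == 1
            and not isinstance(value[0], list)):
        return value[0]
    return value
```

The two exclusions (no hash directly inside a collapsed hash, no array
directly inside a collapsed array) appear to be what lets a decoder
recognize the abbreviated form from the type of the value alone.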

With these rules, I believe it is possible to reduce the redundancy
imposed by using JSON, while preserving the good aspects of JFV.

In the rest of the document, I will describe what made me believe such
an abbreviation form should be defined, and the impact on the decoder
of having the abbreviation form defined within the spec.  Examples using
popular HTTP headers are also provided.


My Use-Case
---

The reason I would like to see abbreviation forms in JFV comes from
the discussion with Stefan on how to define the `cache-digest` header.
Now, the disagreement between Stefan and me (please refer to
https://lists.w3.org/Archives/Public/ietf-http-wg/2016JanMar/0154.html)
is whether we should encode a required component (in our case, the
digest value) outside of the attribute key-value pairs (option A), or
if we should define the component as a required element of the
attributes (option B).

The examples below show the headers encoded using the two options.  The
three headers under option A correspond semantically, in order, to the
three under option B.

```
Option A:
  cache-digest: base64encodedgcs
  cache-digest: base64encodedgcs; path="/foo"
  cache-digest: base64encodedgcs; path="/foo", anothergcs; path="/foo"; type="if-modified-since"

Option B:
  cache-digest: fresh="base64encodedgcs"
  cache-digest: fresh="base64encodedgcs"; path="/foo"
  cache-digest: fresh="base64encodedgcs"; if-modified-since="anothergcs"; path="/foo"
```

As shown in the examples, in the case of the `cache-digest` header,
option A yields more concise output in simple cases, while option B
yields smaller output in complex cases, because a single set of
attributes can contain more than one GCS.

In other words, this is a trade-off issue; and per my understanding
the current draft of JFV does not address the problem.
The draft always enforces the use of key-value pairs in this case.
Therefore, I anticipate that in the future we might see similar
arguments for not using JFV when we are to define a new header.

Going back to the case of the cache-digest header, ideally I would like
to see an entry of the header have the following characteristics:

* a cache-digest entry conveys one or more digests, together with
attributes that limit the scope of the contained digests (e.g. domain,
path)
* a digest is a base64-encoded bit-field produced by one of various
algorithms (e.g. GCS, Bloom filter), representing cached resources that
fall into a certain category (e.g. fresh,
stale-with-if-modified-since-header)

These characteristics lead to a header like the following when JFV
is used, which looks even more redundant for the simple cases.

```
cache-digest: {
                "digest" : [
                  // omitted defaults: category=fresh, encoding=gcs
                  { "value" : "base64encodedgcs" }
                ]
              }

cache-digest: {
                "digest" : [
                  // omitted defaults: category=fresh, encoding=gcs
                  { "value" : "base64encodedgcs" }
                ],
                "path"   : "/foo"
              }

cache-digest: {
                "digest" : [
                  // omitted defaults: category=fresh, encoding=gcs
                  { "value" : "base64encodedgcs" },
                  // omitted defaults: encoding=gcs
                  { "value" : "anothergcs", "category" : "if-modified-since" }
                ],
                "path"   : "/foo"
              }
```

But if the aforementioned transformations were permitted within the
JFV spec, the headers would become much simpler with the
transformations applied:

```
cache-digest: "base64encodedgcs"
cache-digest: {
                "digest": "base64encodedgcs",
                "path"  : "/foo"
              }
cache-digest: {
                "digest": [
                  "base64encodedgcs",
                  { "value" : "anothergcs", "category" : "if-modified-since" }
                ],
                "path"  : "/foo"
              }
```

In this example, the transformations yield a header representation
comparable to tailor-made ABNF (option A) for the simplest cases (as
shown in the first of the three headers).


Impact on the Decoding-side
---

Now that it has been shown that (at least in our case) defining the
transformations yields more concise output for simple use-cases,
let's move on to consider how large the impact of implementing such
transformations will be on the decoding side.

Actually, the application-specific part of the decoder does not become
complex at all by adding support for the abbreviation form.

This is because the reverse transformations can be implemented at the
points where type checks are performed.  All that the decoder needs to
do to support the abbreviation form is, when a value is not of the
expected non-abbreviated type, to convert it to that type instead of
throwing a decoding error.  As an example, the diff that adds support
for the abbreviation form to the decoder that handles the
previously-defined cache-digest header can be found at
https://gist.github.com/kazuho/c84fa23b26c606e55533/revisions#diff-b071c075a9788be737d99e9159092db8.


Other Examples
---

Consider using JFV for encoding the `Content-Type` header.
Without abbreviation form, it would look like:

  Content-Type: { "type": "text/html" }
  Content-Type: { "type": "text/html", "charset": "utf-8" }

With the abbreviation form, it can be like:

  Content-Type: "text/html"
  Content-Type: { "type": "text/html", "charset": "utf-8" }

Consider using JFV for encoding the `Accept-Encoding` header.
Without abbreviation form, it would look like:

  Accept-Encoding: { "encoding": "compress" }, { "encoding": "gzip" }
  Accept-Encoding: { "encoding": "gzip" }, { "encoding": "identity", "q": 0.5 }, { "encoding": "*", "q": 0 }

With the abbreviation form, it can be like:

  Accept-Encoding: "compress", "gzip"
  Accept-Encoding: "gzip", { "encoding": "identity", "q": 0.5 }, { "encoding": "*", "q": 0 }

As can be seen from the examples, if we support an abbreviation form in
JFV, it is possible to encode simple headers as simply as they are
encoded now.
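
As a rough illustration of how a receiver might handle the
`Accept-Encoding` examples above, here is a short Python sketch.  It
assumes that a JFV field value is the content of a JSON array with the
enclosing brackets removed, and that `encoding` is the only required
member; the function name is hypothetical.

```python
import json


def decode_accept_encoding(field_value):
    """Parse a JFV-style Accept-Encoding value, accepting abbreviated items."""
    # Assumption: the field value is a JSON array without its brackets.
    items = json.loads("[" + field_value + "]")
    # Reverse of rule 1: a bare value stands for {"encoding": value}.
    return [item if isinstance(item, dict) else {"encoding": item}
            for item in items]
```

Under these assumptions, the abbreviated value `"compress", "gzip"`
decodes to the same list as
`{ "encoding": "compress" }, { "encoding": "gzip" }`.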


Conclusion
---

Please consider supporting some kind of abbreviation form in JFV; I
believe that it would make JFV more attractive to the users, since
with abbreviation it is possible to offer both simplicity (for simple
cases) and extensibility (of JSON) at the same time.

-- 
Kazuho Oku
Received on Friday, 22 January 2016 07:28:08 UTC