Re: [w3ctag/design-reviews] `sec-metadata` (#280)

Regarding the header's size, we've made a few tweaks to the format in the last few weeks: we dropped `target`, added a new value to `destination` (assuming that whatwg/fetch#755 is accepted), and dropped `cause` from non-navigation requests, and @arturjanc wants to add a new 5-value enum for `mode` (in mikewest/sec-metadata#5). Given those changes, the longest navigation request header would be 82 characters (68 of them in the value):

```http
Sec-Metadata: cause=user-activated, destination=nested-document, site=cross-origin
```

The longest subresource request would be 76 characters (62 of them in the value):

```http
Sec-Metadata: destination=serviceworker, site=cross-origin, mode=same-origin
```
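
To make the counting explicit, here is a minimal Python sketch that measures both examples. The helper name `header_sizes` is mine, not part of the proposal; it reports the length of the whole header line and of the value alone, since both ways of counting are in play here.

```python
def header_sizes(line: str) -> tuple:
    """Return (length of the full header line, length of the value alone)."""
    _name, _sep, value = line.partition(": ")
    return len(line), len(value)


navigation = "Sec-Metadata: cause=user-activated, destination=nested-document, site=cross-origin"
subresource = "Sec-Metadata: destination=serviceworker, site=cross-origin, mode=same-origin"

print(header_sizes(navigation))   # (82, 68)
print(header_sizes(subresource))  # (76, 62)
```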

Many requests will be shorter (e.g. `Sec-Metadata: destination=empty, site=same-site, mode=cors`), but let's take those as our baseline. They don't seem terrible to me. But, let's assume that they are, in fact, terrible. We have some options to make them shorter if we throw legibility out the window.

* `cause` can shift from a named [identifier](https://tools.ietf.org/html/draft-ietf-httpbis-header-structure-08#section-3.8) model to a [boolean](https://tools.ietf.org/html/draft-ietf-httpbis-header-structure-08#section-3.10): `forced=!T` or `forced=!F`. That drops the maximum navigation request value's size down to 57 characters, and it's not too terrible to read.
* `site` can shift from a three-value enum to the numbers 0-2: `site=0`. That drops the maximum navigation request value's size down to 46 characters.
* `destination` can shift from a ~20 value enum to the numbers 0-20: `destination=10`. That drops the maximum navigation request value's size down to 33 characters.

```http
Sec-Metadata: forced=!T, destination=10, site=0
```

This isn't terribly legible, and basically requires a lookup table for `destination`. At that point, since we're throwing legibility out the window, we may as well also set it on fire before its defenestration by encoding the data as a bitfield.

We've got a boolean, a 20-value enum, a 3-value enum, and a 5-value enum. Let's give ourselves a whole byte for `destination`, and pad the other values with a bit each, because who knows how much they'll change in the misty future, and we end up with:

<table>
  <tr>
    <th colspan="2">Cause</th>
    <th colspan="8">Destination</th>
    <th colspan="3">Site</th>
    <th colspan="4">Mode</th>
  </tr>
  <tr>
     <td>0</td>
     <td>0</td>
     <td>0</td>
     <td>0</td>
     <td>0</td>
     <td>0</td>
     <td>0</td>
     <td>1</td>
     <td>1</td>
     <td>1</td>
     <td>0</td>
     <td>0</td>
     <td>1</td>
     <td>0</td>
     <td>0</td>
     <td>1</td>
     <td>0</td>
  </tr>
</table>

Which encodes as a binary structured header value in 6 characters as:

```http
Sec-Metadata: *AADq*
```
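
For anyone who wants to play with the bitfield idea, here is a rough Python sketch of the packing. The field widths come from the table above, and the example values (0, 7, 1, 2) are read off its bit row. Everything else is my assumption: the names `FIELDS`, `pack`, `unpack` and `to_header_value` are hypothetical, fields are concatenated most-significant first, and the result is zero-padded on the left to whole bytes. A different bit order or padding rule produces different base64 characters, so the sketch only checks the length and the round trip, not the exact string.

```python
import base64

# Field widths in bits, taken from the table (leftmost field first).
FIELDS = [("cause", 2), ("destination", 8), ("site", 3), ("mode", 4)]


def pack(values: dict) -> bytes:
    """Concatenate the fields into one integer and emit it as whole bytes."""
    bits = 0
    total = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        bits = (bits << width) | value
        total += width
    # Assumption: zero-pad on the left up to a whole number of bytes.
    return bits.to_bytes((total + 7) // 8, "big")


def unpack(data: bytes) -> dict:
    """Inverse of pack(): peel the fields off from the least significant end."""
    bits = int.from_bytes(data, "big")
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = bits & ((1 << width) - 1)
        bits >>= width
    return out


def to_header_value(data: bytes) -> str:
    """Wrap base64 in asterisks, as in the binary example above."""
    return "*" + base64.b64encode(data).decode("ascii") + "*"


example = {"cause": 0, "destination": 7, "site": 1, "mode": 2}
encoded = to_header_value(pack(example))
print(len(encoded))  # 6
```

The 17 bits fit in 3 bytes, which base64 turns into exactly 4 characters with no `=` padding, so the value stays at 6 characters until the fields grow past 24 bits.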

That's a direction we could go. Is it better? I'm not sure.

Let's look at @mnot's suggestions:

*   > Have the server opt into it. The usual mechanisms, usual problems. If only there were a metadata file that contained the server's preferences for browsers!

    The usual mechanisms (Client Hints, for example) might not be appropriate here, as they're moving to a delegation model that I think makes a lot of sense (see https://github.com/WICG/feature-policy/issues/129). That model would remove basically all value from this proposal, as the decision to send the header would be attacker-controlled (and it seems unlikely that we can rely on attackers setting the evil bit for us).

    Chrome's actively developing [Origin Policy](https://wicg.github.io/origin-policy/), and I've heard rumblings that Firefox is interested in doing the same. I'm not particularly enthusiastic about holding this header back until those mechanisms are finished, given that it would allow us to address concrete threats today. I'm equally unenthusiastic about inventing a new opt-in mechanism alongside all the others we've invented recently.

    IMO, the value of the header is high. Its cost seems bearable, while the cost of another site-wide opt-in mechanism is not. But of course I'd think that. :)

*   > Split into multiple headers.

    If we binary-encode the data as above, we wouldn't want to split the fields up (I assume?).

    If we don't binary-encode the data, we'd shift from:

    ```http
    Sec-Metadata: cause=user-activated, destination=document, site=same-origin
    ```

    To

    ```http
    Sec-Metadata-Cause: user-activated
    Sec-Metadata-Destination: document
    Sec-Metadata-Site: same-origin
    ```

    If that helps compression, great. It's not clear to me that it does? But you know more about the algorithm than I do.

*   > Don't make the directives so verbose, while still trying to maintain readability.
    
    As above, `cause` seems simple to model as a boolean. I don't know how to shorten the `destination` values in a way that maintains readability. `document` could be `doc`, I guess. And `nested-document` could be `iframe`. But `serviceworker` can't be `sw` because it needs to be distinguished from `sharedworker`. `style` can't be `s` because it needs to be distinguished from `script`. Are `sc` and `st` legible? I'm not sure. Likewise, `site` could be `so`/`ss`/`cs`. That seems neither legible nor efficient.
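
To illustrate the abbreviation problem, here is a small Python sketch that finds the shortest prefix that tells each `destination` value apart from the rest. It uses only the values mentioned in this thread, not the full Fetch list, and prefix-based shortening is just one scheme I picked for illustration; the function name `shortest_unique_prefixes` is hypothetical.

```python
def shortest_unique_prefixes(names):
    """Map each name to its shortest prefix that no other name shares."""
    result = {}
    for name in names:
        others = [n for n in names if n != name]
        for length in range(1, len(name) + 1):
            prefix = name[:length]
            if not any(o.startswith(prefix) for o in others):
                result[name] = prefix
                break
        else:
            # The whole name is a prefix of another name; keep it unshortened.
            result[name] = name
    return result


# Only the destination values named in this thread (an illustrative subset).
destinations = ["document", "nested-document", "script", "style",
                "serviceworker", "sharedworker", "empty"]

for name, prefix in shortest_unique_prefixes(destinations).items():
    print(f"{prefix:>3}  {name}")
```

The four `s` values all need a second letter (`sc`, `st`, `se`, `sh`), which is roughly where the readability question starts.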

With the exception of shifting `cause` to a boolean (which seems fine either way?), I think I would prefer either the existing, legible format, or a binary-encoded bitfield. The middle ground doesn't seem like it addresses either human comprehension or efficiency.

What do y'all think?

-- 
Reply to this email directly or view it on GitHub:
https://github.com/w3ctag/design-reviews/issues/280#issuecomment-438573887

Received on Wednesday, 14 November 2018 08:10:05 UTC