Re: Benjamin Kaduk's No Objection on draft-ietf-httpbis-client-hints-14: (with COMMENT)

On Wed, Jun 17, 2020 at 8:41 PM Benjamin Kaduk <kaduk@mit.edu> wrote:

> On Wed, Jun 17, 2020 at 10:47:34AM +0200, Yoav Weiss wrote:
> > Thanks for reviewing and apologies for the delayed reply :/
> >
> > Comments addressed below and incorporated into
> > https://github.com/httpwg/http-extensions/pull/1220
> > Your review would be appreciated :)
> >
> > On Tue, May 19, 2020 at 10:56 PM Benjamin Kaduk via Datatracker <
> > noreply@ietf.org> wrote:
> >
> > > Benjamin Kaduk has entered the following ballot position for
> > > draft-ietf-httpbis-client-hints-14: No Objection
> > >
> > > When responding, please keep the subject line intact and reply to all
> > > email addresses included in the To and CC lines. (Feel free to cut this
> > > introductory paragraph, however.)
> > >
> > >
> > > Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> > > for more information about IESG DISCUSS and COMMENT positions.
> > >
> > >
> > > The document, along with other ballot positions, can be found here:
> > > https://datatracker.ietf.org/doc/draft-ietf-httpbis-client-hints/
> > >
> > >
> > >
> > > ----------------------------------------------------------------------
> > > COMMENT:
> > > ----------------------------------------------------------------------
> > >
> > > Section 1
> > >
> > >    There are thousands of different devices accessing the web, each with
> > >    different device capabilities and preference information.  These
> > >    device capabilities include hardware and software characteristics, as
> > >    well as dynamic user and user agent preferences.  Historically,
> > >
> > > nit: should "user-agent" be hyphenated?
> > >
> >
> > In web specifications it typically isn't
> > <https://infra.spec.whatwg.org/#user-agent>. RFC 7231
> > <https://tools.ietf.org/html/rfc7231> also doesn't seem to hyphenate it.
>
> [I guess I should have mentioned "compound adjective" here rather than
> below, whoops.]
>
> >
> > >    applications that wanted to allow the server to optimize content
> > >    delivery and user experience based on such capabilities had to rely
> > >    on passive identification (e.g., by matching the User-Agent header
> > >
> > > nit: it feels like "allow the server" would be something that involves
> > > granting permission or the client sending an active signal (as proposed
> > > by this document), as opposed to just the application that "wanted the
> > > server to optimize" and had to make do with such limited signal as was
> > > already available.
> > >
> >
> > OK. Removing "allow the".
> >
> >
> > >
> > >    field (Section 5.5.3 of [RFC7231]) against an established database of
> > >    user agent signatures), use HTTP cookies [RFC6265] and URL
> > >
> > > nit: hyphenate user-agent again, used as an adjective.
> > >
> >
> > TIL: compound adjective
> > <https://www.grammarbook.com/punctuation/hyphens.asp#:~:text=Rule%201.,is%20called%20a%20compound%20adjective.&text=When%20a%20compound%20adjective%20follows,hyphen%20is%20usually%20not%20necessary.>
> > Done!
> >
> > >
> > >    o  User agent detection cannot reliably identify all static
> > >       variables, cannot infer dynamic user agent preferences, requires
> > >       external device database, is not cache friendly, and is reliant on
> > >
> > > nit: singular/plural mismatch ("an external device database" or
> > > "external device databases")
> > >
> >
> > Done
> >
> > >
> > >    o  Cookie-based approaches are not portable across applications and
> > >       servers, impose additional client-side latency by requiring
> > >       JavaScript execution, and are not cache friendly.
> > >
> > > (I think I missed a step in why a cookie-based approach inherently
> > > requires javascript execution, though maybe it doesn't matter.)
> > >
> >
> > Essentially, if you want to dynamically set your cookies based on
> > client-side information, you need javascript to do that.
>
> Ah, I think I am starting to see, now.  I had in my head a more simplistic
> model where "user-agent sends a bunch of headers to the server, and the
> server puts the result of its analysis in a cookie", which doesn't really
> stand up to detailed scrutiny.
>
> >
> > >    Proactive content negotiation (Section 3.4.1 of [RFC7231]) offers an
> > >    alternative approach; user agents use specified, well-defined request
> > >    headers to advertise their capabilities and characteristics, so that
> > >
> > > Chasing the reference, it's not clear that it supports quite this strong
> > > of a statement: in addition to the explicit negotiation fields, it also
> > > allows using implicit characteristics such as client IP address and
> > > User-Agent.
> > >
> >
> > Would ending that section with the following work?
> > ", so that servers can select (or formulate) an appropriate response, based
> > on those request headers (or on other, implicit characteristics)."
>
> Yes, that would help, thanks.
>
> >
> > > Section 2.1
> > >
> > >    access of third parties to those same header fields.  Without such an
> > >    opt-in, user agents SHOULD NOT send high-entropy hints, but MAY send
> > >    low-entropy ones [CLIENT-HINTS-INFRASTRUCTURE].
> > >
> > > It looks like the reference only defines a registry for low-entropy
> > > hints, and we are inferring that any hints not listed in that table are
> > > to be treated as "high-entropy".  Perhaps we could reword both
> > > directions of this directive to refer only to the registry of
> > > low-entropy hints (e.g., "SHOULD NOT send hints that are not listed in
> > > [registry]")?
> > >
> >
> > Makes sense.
> >
> >
> > >
> > >    Implementers need to be aware of the passive fingerprinting
> > >    implications when implementing support for Client Hints, and follow
> > >    the considerations outlined in the Security Considerations
> > >    (Section 4) section of this document.
> > >
> > > side note: in some sense the Accept-CH mechanism transforms it from a
> > > passive to an active fingerprinting mechanism.
> > >
> >
> > Good point! Removed "passive" here.
> >
> >
> > >
> > > Section 2.2
> > >
> > >    information in them.  When doing so, and if the resource is
> > >    cacheable, the server MUST also generate a Vary response header field
> > >    (Section 7.1.4 of [RFC7231]) to indicate which hints can affect the
> > >    selected response and whether the selected response is appropriate
> > >    for a later request.
> > >
> > > side note: I suspect the answer I want is already present with a
> > > detailed reading of RFC 7231, but I wonder if it's worth saying
> > > something here about whether the Vary response header could/should
> > > include registered client hint header field names that were not present
> > > in the request in question.
> > >
> >
> > https://tools.ietf.org/html/rfc7231#section-7.1.4 implies that Vary can be
> > set to header names that are missing from the request. ("or lack thereof")
> > I'm not sure we should mention that explicitly here.
>
> Ah, thanks.
>
> >
> > > Section 3.1
> > >
> > >    Based on the Accept-CH example above, which is received in response
> > >    to a user agent navigating to "https://example.com", and delivered
> > >    over a secure transport, a user agent will have to persist an Accept-CH
> > >    preference bound to "https://example.com".  It will then use it
> > >
> > > What level of requirement is implied by "will have to" here?  IIUC, it's
> > > just that "if anything is persisted, it must be keyed on" but with no
> > > obligation to do any persistence.  If so, perhaps a wording like "any
> > > persisted Accept-CH preference will be bound to" would be better?
> > >
> >
> > The normative requirement in the paragraph above it is SHOULD.
> > I'll modify the wording to your suggested one.
> >
> >
> > >
> > >    for navigations to e.g. "https://example.com/foobar.html", but not to
> > >    e.g. "https://foobar.example.com/".  It will similarly use the
> > >    preference for any same-origin resource requests (e.g. to
> > >
> > > nit: comma after "e.g." (throughout).
> > >
> >
> > OK
> >
> >
> > >
> > >    "https://example.com/image.jpg") initiated by the page constructed
> > >    from the navigation's response, but not to cross-origin resource
> > >    requests (e.g. "https://thirdparty.com/resource.js").  This
> > >    preference will not extend to resource requests initiated to
> > >    "https://example.com" from other origins (e.g. from navigations to
> > >    "https://other-example.com/").
> > >
> > > Perhaps thirdparty.example and other.example, to stay within the BCP32
> > > space?
> > >
> >
> > Done
> >
> >
> > >
> > > Section 3.2
> > >
> > >    When selecting a response based on one or more Client Hints, and if
> > >    the resource is cacheable, the server needs to generate a Vary
> > >    response header field ([RFC7234]) to indicate which hints can affect
> > >    the selected response and whether the selected response is
> > >    appropriate for a later request.
> > >
> > > Is BCP 14 language appropriate here?
> > >
> >
> > Indeed. Changed to SHOULD.
> >
> >
> > >    Above example indicates that the cache key needs to include the
> > >    Sec-CH-Example header field.
> > >
> > > nit: please add the article "the" to make this a complete sentence.
> > >
> >
> > Yup
> >
> >
> > >
> > > Section 4
> > >
> > > While I don't expect that I can tell the major browser vendors anything
> > > new about the privacy considerations to client hints, I do think that we
> > > should give some guidance to implementors of other HTTP clients, who may
> > > not have such extensive depth of knowledge, on the general landscape in
> > > which this mechanism is set.  The subsections hereof do a great job
> > > covering a lot of relevant details and specific factors to consider;
> > > thank you!  I think it may also be appropriate to have some more generic
> > > lead-in text, noting that in the worst case, merely converting a passive
> > > fingerprinting mechanism to an active fingerprinting mechanism with
> > > server opt-in does not actually provide any privacy benefit (the worst
> > > case being when all servers ask for all the data and clients accede)!
> > > While we might hope that the need to jump through an extra hoop to
> > > access fingerprinting information might dissuade some servers from
> > > asking for it, it seems imprudent to assume that it will happen, so in
> > > order to obtain real privacy benefit there needs to be some additional
> > > policy controls in the client and in what hints are defined/implemented.
> > > As I mentioned already, we already have a lot of the details for how to
> > > apply such policy controls, and limitations to only define hints that
> > > expose information already available in other means; what I'd like to
> > > see is the high-level picture that ties them together.
> > >
> > >
> > OK. Added something. I'd appreciate your review to see if it matches what
> > you had in mind.
> >
> >
> > > Section 4.1
> > >
> > >    upon it.  The header-based opt-in means that we can remove passive
> > >    fingerprinting vectors, such as the User-Agent string (enabling
> > >    active access to that information through User-Agent Client Hints
> > >    [4]), or otherwise expose information already available through
> > >
> > > I think this [4] is the same as [UA-CH].
> > >
> >
> > It's pointing to a specific section of UA-CH. I'm not sure if this is
> > critical.
>
> I'm not, either; let's leave it to the RFC Editor.
>
> >
> > >
> > > Also, use of the first person ("we") is somewhat unusual in RFC style.
> > >
> >
> > Changed.
> >
> >
> > >
> > >    Therefore, features relying on this document to define Client Hint
> > >    headers MUST NOT provide new information that is otherwise not
> > >    available to the application via other means, such as existing
> > >    request headers, HTML, CSS, or JavaScript.
> > >
> > > As written, this is a fairly weird condition.  What constitutes
> > > "available to the application via other means"?  Does "put up an
> > > interstitial until the user provides the information in question" count?
> > >
> >
> > Changed to "not made available to the application by the user agent"
> >
> >
> > >
> > >    o  Entropy - Exposing highly granular data can be used to help
> > >       identify users across multiple requests to different origins.
> > >       Reducing the set of header field values that can be expressed, or
> > >       restricting them to an enumerated range where the advertised value
> > >       is close but is not an exact representation of the current value,
> > >
> > > nit: "close to" seems like it would scan better.
> > >
> >
> > Yup
> >
> >
> > >
> > >    Different features will be positioned in different points in the
> > >    space between low-entropy, non-sensitive and static information (e.g.
> > >    user agent information), and high-entropy, sensitive and dynamic
> > >    information (e.g. geolocation).  User agents need to consider the
> > >    value provided by a particular feature vs these considerations, and
> > >    MAY have different policies regarding that tradeoff on a per-feature
> > >    basis.
> > >
> > > How about on a per-origin basis (and, e.g., domain reputation)?  An
> > > "entropy budget" where an origin that asks for too many distinct hints
> > > won't get all of them?
> > >
> >
> > Those are definitely policies that user agents can apply (e.g. one concrete
> > proposal that looks a lot like your "entropy budget" is
> > https://github.com/bslassey/privacy-budget)
>
> Maybe "per-feature or other fine-grained basis"?  Just a thought, and I
> don't mind leaving it as-is.
>

Makes sense. Added!
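
For illustration only (this is not text from the draft or the PR): a minimal Python sketch of what a per-origin "entropy budget" policy like the one discussed above could look like. The class name, the per-hint bit estimates, and every hint name other than Sec-CH-Example are hypothetical, and a real user agent would need far more careful entropy accounting.

```python
from collections import defaultdict

# Hypothetical per-hint entropy estimates, in bits. Illustrative numbers only.
HINT_BITS = {
    "Sec-CH-Example": 1.0,
    "Sec-CH-Other": 4.0,
    "Sec-CH-Precise": 10.0,
}


class EntropyBudget:
    """Caps the total estimated entropy an origin can obtain via hints."""

    def __init__(self, budget_bits):
        self.budget_bits = budget_bits
        self.spent = defaultdict(float)   # origin -> bits already granted
        self.granted = defaultdict(set)   # origin -> hints already granted

    def filter_hints(self, origin, requested):
        """Return the subset of requested hints the origin may receive."""
        allowed = []
        for hint in requested:
            if hint in self.granted[origin]:
                # Already paid for; re-sending reveals nothing new.
                allowed.append(hint)
                continue
            cost = HINT_BITS.get(hint)
            if cost is None:
                # Unknown hints are never sent in this sketch.
                continue
            if self.spent[origin] + cost <= self.budget_bits:
                self.spent[origin] += cost
                self.granted[origin].add(hint)
                allowed.append(hint)
        return allowed
```

With a 6-bit budget, an origin asking for all three hints above would get the first two and be refused the third, which is roughly the "won't get all of them" behaviour Ben describes.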


>
> >
> > > (I also wonder if a descriptive "may wish to have" is better than the
> > > normative "MAY", here.)
> > >
> >
> > Sure.
> >
> > >
> > >    o  Implementers SHOULD restrict delivery of some or all Client Hints
> > >       header fields to the opt-in origin only, unless the opt-in origin
> > >       has explicitly delegated permission to another origin to request
> > >       Client Hints header fields.
> > >
> > > Am I reading things right that this document does not define any such
> > > delegation mechanisms but is just admitting the possibility of such
> > > mechanisms being defined in the future?  I'd suggest clarifying up in
> > > §2.1 with a parenthetical (akin to the "outlined below" note about the
> > > opt-in mechanism).
> > >
> >
> > Added an "(as outlined in {{CLIENT-HINTS-INFRASTRUCTURE}})" clarification
> > to 2.1
> >
> >
> > >    Implementers SHOULD support Client Hints opt-in mechanisms and MUST
> > >    clear persisted opt-in preferences when any one of site data,
> > >    browsing history, browsing cache, cookies, or similar, are cleared.
> > >
> > > Who is the target audience for this SHOULD?  If it's just "people
> > > implementing this document", it seems ineffectual, and if it's any
> > > broader scope it seems unenforceable.
> > >
> >
> > Removed the SHOULD here as it's already defined elsewhere that high entropy
> > hints require an opt-in.
> > Also changed "implementers" to "user agents".
> >
> >
> > > Section 4.3
> > >
> > >    Research into abuse of Client Hints might look at how HTTP responses
> > >    that contain Client Hints differ from those with different values,
> > >
> > > nit: what are "responses that contain Client Hints"?  We have discussed
> > > Accept-CH header fields in responses, and client hints in requests, but
> > > the only mention I recall of hints in responses was in the Vary header
> > > field, and it's not clear that that is what was intended.
> > >
> >
> > Good catch! Changed to "responses to requests that contain Client Hints".
> >
> >
> > > Section 5
> > >
> > >    While HTTP header compression schemes reduce the cost of adding HTTP
> > >    header fields, sending Client Hints to the server incurs an increase
> > >    in request byte size.  Servers SHOULD take that into account when
> > >
> > > nit: I wonder if this would be more clear as:
> > >
> > > % Sending Client Hints to the server incurs an increase in request byte
> > > % size.  Some of this increase can be mitigated by HTTP header
> > > % compression schemes, but each new hint will still lead to some
> > > % increased bandwidth usage.  Servers SHOULD [...]
> > >
> >
> > Changed.
> >
> > >
> > > Section 7.1
> > >
> > > I'm not sure I understand why [FETCH] is listed as a normative
> > > reference.
> > >
> >
> > Moved it to be informative.
> >
> >
> > >
> > > I find it amusing that we reference both 7231 and 7234 for Vary, though
> > > to my untrained eye the current references both seem appropriate in
> > > their respective locations.
> > >
> > > Section 7.2
> > >
> > > If [CLIENT-HINTS-INFRASTRUCTURE] is to be the source of truth for
> > > low-entropy (and, by deduction) high-entropy hints, it seems like it
> > > should be normative.
> > >
> >
> > Moved.
>
> Thanks for the updates!
> I will take a look at the github PR now.
>
> -Ben
>
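
On the Section 2.2 Vary question above (header names listed in Vary but absent from the request): a rough Python sketch of how a cache might fold that into its key, treating "absent" as its own distinct value. The function name is made up for this example, and the sketch deliberately ignores "Vary: *" and header value normalization.

```python
def cache_key(url, request_headers, vary_value):
    """Build a cache key from the URL and the request headers named in Vary.

    A header listed in Vary but missing from the request maps to None,
    so "absent" is distinct from every concrete value.
    """
    # Field names are case-insensitive, so compare them in lowercase.
    names = sorted(
        {name.strip().lower() for name in vary_value.split(",") if name.strip()}
    )
    lowered = {key.lower(): value for key, value in request_headers.items()}
    return (url, tuple((name, lowered.get(name)) for name in names))
```

Under these assumptions a request carrying Sec-CH-Example and one without it produce different keys, so a response selected for one is not reused for the other.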

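And for the Section 3.1 wording ("any persisted Accept-CH preference will be bound to" the origin): a small Python sketch of the scoping rules quoted above, as I read them. AcceptCHStore and its methods are hypothetical names, the comma split is a simplification of real Accept-CH parsing, and persistence here is just an in-memory dict.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url):
    """Return a (scheme, host, port) tuple identifying the URL's origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


class AcceptCHStore:
    """Keeps Accept-CH preferences keyed by the origin that sent them."""

    def __init__(self):
        self._prefs = {}

    def persist(self, response_url, accept_ch_value):
        # Only responses delivered over a secure transport are considered.
        if urlsplit(response_url).scheme != "https":
            return
        hints = {h.strip() for h in accept_ch_value.split(",") if h.strip()}
        self._prefs[origin_of(response_url)] = hints

    def hints_for(self, request_url, top_level_url):
        """Hints to send on a request made in the context of top_level_url.

        For a navigation, pass the navigation URL as both arguments.
        """
        target = origin_of(request_url)
        # Cross-origin requests never inherit the preference, in either
        # direction (third-party resources, or first-party resources
        # requested from another origin's page).
        if target != origin_of(top_level_url):
            return set()
        return set(self._prefs.get(target, set()))
```

Usage mirrors the examples in the draft: after persisting a preference for "https://example.com/", a navigation to "https://example.com/foobar.html" gets the hints, while "https://foobar.example.com/" and "https://thirdparty.example/resource.js" do not.
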
Received on Thursday, 18 June 2020 10:48:03 UTC