- From: Willy Tarreau <w@1wt.eu>
- Date: Mon, 1 Aug 2016 12:44:05 +0200
- To: Poul-Henning Kamp <phk@phk.freebsd.dk>
- Cc: HTTP Working Group <ietf-http-wg@w3.org>
On Mon, Aug 01, 2016 at 09:57:25AM +0000, Poul-Henning Kamp wrote:
> --------
> In message <20160801085743.GB22715@1wt.eu>, Willy Tarreau writes:
>
> >That made me think that most of the header fields I'm seeing do not use
> >non-ascii characters at all, I'd even say non-printable-ascii. Most of
> >them contain :
> >  - host names (Host)
> >  - uris (Referer, Location)
> >  - user-agent strings (UA)
> >  - tokens (Connection, Accept, ...)
> >  - numbers
> >
> >Thus in fact I'm wondering if it's really worth focusing the efforts on
> >non-ascii strings instead.
>
> My take is that the data-model and serialization should be general
> and unconstrained, and the constraints be applied in a/the schema
> for each individual header.

But we're talking about protocol efficiency as well, which passes via
taking into account what we have. We could for example consider the
notion of "extended strings" which are only used for header fields which
are not relevant to the protocol itself (eg: not used in
accept/range/connection/...) and which would allow unicode to be safely
transmitted. It might be used for user-agent if needed.

Willy
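
To make the "extended strings" idea above a bit more concrete, here is a
minimal sketch in Python. It is only an illustration under stated
assumptions, not a proposed wire format: the function names, the
"plain"/"extended" labels and the set of protocol-relevant fields (taken
from the accept/range/connection examples in the message) are all
hypothetical.

```python
# Hypothetical sketch: field names that the protocol itself interprets.
# The set below is an assumption based on the examples in the message.
PROTOCOL_FIELDS = {"accept", "range", "connection"}


def is_printable_ascii(value: str) -> bool:
    """True if every character is in the printable ASCII range 0x20..0x7E."""
    return all(0x20 <= ord(ch) <= 0x7E for ch in value)


def classify_field(name: str, value: str) -> str:
    """Return "plain" for printable-ASCII values and "extended" otherwise.

    Protocol-relevant fields are never allowed to carry an extended
    string, so a non-ASCII value there raises ValueError.
    """
    if is_printable_ascii(value):
        return "plain"
    if name.lower() in PROTOCOL_FIELDS:
        raise ValueError(f"{name}: extended strings not allowed here")
    return "extended"
```

With this split, a parser handling Connection or Accept only ever sees
printable ASCII, while a field such as User-Agent may carry unicode
without affecting protocol processing.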
Received on Monday, 1 August 2016 10:46:46 UTC