Re: consensus on :query ?

On 24 July 2014 11:19, Adrien de Croy <adrien@qbik.com> wrote:

>
> But it shouldn't require encoding to fit into a URI.  It's not just / and
> ? (rare in a real path since it's usually a wild-card character anyway)...
> it's more often unicode and spaces that had to be encoded.
>

HTTP defines URI schemes, which contain ASCII, not Unicode. We've never
mentioned HTTP IRIs, and I don't think it's reasonable at this stage to
unilaterally decide that all HTTP URIs must map to IRIs, via
percent-encoded UTF-8.

If you happen to map U+00E4 to "%C3%A4" when generating your URI that's
fine, but there's no mention in the HTTP spec (where it defines the URI
schemes) of un-percent-encoding such a URI, or that "%C3%A4" should map to
a UTF-8 "ä" (as opposed to, say, Windows-1252 "Ã¤", or EBCDIC "Cu"). RFC
3986 says "a new URI scheme ... should first be encoded as octets according
to the UTF-8 character encoding; then ... percent-encoded" but I don't
think that applies here -- the http URI scheme pre-dates RFC 3986, even if
RFC 7230 doesn't.
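
To make the ambiguity concrete, here is a minimal Python sketch (an
illustration only, not something taken from any HTTP spec). It un-escapes
"%C3%A4" to its two raw octets and then interprets those octets under three
different character encodings. The choice of cp037 as the EBCDIC variant is
an assumption; other EBCDIC code pages may differ.

```python
from urllib.parse import unquote_to_bytes

# Percent-decoding only yields octets; it says nothing about characters.
raw = unquote_to_bytes("%C3%A4")   # b'\xc3\xa4'

# The same two octets read under three different encodings.
as_utf8 = raw.decode("utf-8")      # 'ä'
as_cp1252 = raw.decode("cp1252")   # 'Ã¤'
as_ebcdic = raw.decode("cp037")    # 'Cu' (cp037 is an assumed EBCDIC variant)

print(as_utf8, as_cp1252, as_ebcdic)
```

Nothing in the octets themselves tells a recipient which of these readings
the sender intended.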

Similarly, not all implementations equate "+" and "%20"; I recall even
seeing some where they are interpreted as different classes of space
delimiters.
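
As a small illustration, even a single standard library can offer both
behaviours. The sketch below uses Python's urllib.parse; the sample string
is made up for the example.

```python
from urllib.parse import unquote, unquote_plus

sample = "a+b%20c"  # hypothetical query fragment

# Plain percent-decoding leaves "+" alone.
plain = unquote(sample)        # 'a+b c'

# Form-style decoding also turns "+" into a space.
form = unquote_plus(sample)    # 'a b c'

print(repr(plain), repr(form))
```

Which of the two a given server applies depends on the implementation, so
an intermediary can't safely rewrite one form into the other.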

If there were separate :path and :query headers, they'd have to both
contain ASCII strings, including any percent-encoded reserved or non-ASCII
octets. There's not much more we could do to binarise them. (Maybe decoding
non-reserved "%XX" triplets into single octets, although I think that's
worse than just leaving them as they are, especially with our Huffman codes
that do good things with % and 0-9, A-F.)
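
For what it's worth, that partial decoding could look roughly like the
Python sketch below. It is only a sketch under assumptions: the function
name decode_non_reserved is hypothetical, "reserved" is taken to mean the
RFC 3986 gen-delims and sub-delims, and "%25" is also left alone so the
result stays unambiguous.

```python
import re

# RFC 3986 reserved characters (gen-delims + sub-delims), as octets.
RESERVED = b":/?#[]@!$&'()*+,;="
_TRIPLET = re.compile(rb"%([0-9A-Fa-f]{2})")


def decode_non_reserved(value: bytes) -> bytes:
    """Replace each %XX triplet with its raw octet, unless that octet
    is a reserved character or '%' itself (hypothetical helper)."""

    def repl(match: "re.Match[bytes]") -> bytes:
        octet = bytes([int(match.group(1).decode("ascii"), 16)])
        if octet in RESERVED or octet == b"%":
            return match.group(0)  # keep the triplet as written
        return octet

    return _TRIPLET.sub(repl, value)


example = decode_non_reserved(b"/caf%C3%A9%2Fx%25")
print(example)  # b'/caf\xc3\xa9%2Fx%25'
```

The result is shorter in raw octets, but it now carries non-ASCII octets
instead of "%", digits and A-F.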


-- 
  Matthew Kerwin
  http://matthew.kerwin.net.au/

Received on Thursday, 24 July 2014 02:04:41 UTC