
Is the ABNF for <request-target> in HTTPbis too general?

From: Adrian Custer <acuster@gmail.com>
Date: Fri, 14 Oct 2011 16:45:47 +0000
To: ietf-http-wg@w3.org
Message-ID: <1318610706.6696.430.camel@driftaway>
Editors of the HTTPbis specification,


While looking at the <request-target> element, whose ABNF definition
spans both HTTPbis Part 1 and RFC 3986, I found that the ABNF appears to
allow <Request-Line> elements of the illegal form
    GET http://server:80some/non/rooted/path?andquery
or
    GET http://server:80?query
both of which are missing the leading "/" character in the path.

This arises because the ABNF reuses the <hier-part> from the <URI>
definition, which is too flexible when defining an <absolute-URI> element
for a <request-target>. Indeed, the only difference between the <URI>
and the <absolute-URI> elements in RFC 3986 is the presence of the
fragment, which does not appear related to any notion of 'absolute.'




Starting with the definition of the <request-target>

     ;HTTPbis, section 4.1
  Request-Line   = Method SP request-target SP HTTP-Version CRLF

    ;HTTPbis, section 4.1.2
  request-target = "*"
                    / absolute-URI
                    / ( path-absolute [ "?" query ] )
                    / authority

we consider only those built with <absolute-URI>

    ;HTTPbis, Appendix B
  absolute-URI = <absolute-URI, defined in [RFC3986], Section 4.3>

so moving to RFC 3986, we have

    ;RFC 3986, section 4.3
  absolute-URI  = scheme ":" hier-part [ "?" query ]

    ;RFC 3986, Appendix A
  hier-part     = "//" authority path-abempty
                 / path-absolute
                 / path-rootless
                 / path-empty

these last two elements of <hier-part> seem to me to be illegal in HTTP
request messages, giving rise to my two original examples. (In passing,
clarifying parenthesization would have been useful in this definition.)




Merely for completeness in this discussion:

    ;RFC 3986, Appendix A
  authority     = [ userinfo "@" ] host [ ":" port ]
  path-abempty  = *( "/" segment )
  path-absolute = "/" [ segment-nz *( "/" segment ) ]
  path-rootless = segment-nz *( "/" segment )
  path-empty    = 0<pchar>
  segment       = *pchar
  segment-nz    = 1*pchar
  pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
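
For experimenting with these path rules, here is a rough Python sketch
that transcribes them into regular expressions and reports which rules a
given path string satisfies. The names PATH_RULES and
matching_path_rules are hypothetical, chosen only for this illustration,
and the character classes for unreserved, pct-encoded and sub-delims are
taken from RFC 3986, Appendix A.

```python
import re

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = r"(?:[A-Za-z0-9\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})"
SEGMENT = PCHAR + "*"      # segment    = *pchar
SEGMENT_NZ = PCHAR + "+"   # segment-nz = 1*pchar

# One regular expression per path rule quoted above (hypothetical table).
PATH_RULES = {
    "path-abempty": rf"(?:/{SEGMENT})*",
    "path-absolute": rf"/(?:{SEGMENT_NZ}(?:/{SEGMENT})*)?",
    "path-rootless": rf"{SEGMENT_NZ}(?:/{SEGMENT})*",
    "path-empty": r"",
}


def matching_path_rules(path):
    """Return the names of every path rule that matches the whole string."""
    return [
        name
        for name, pattern in PATH_RULES.items()
        if re.fullmatch(pattern, path)
    ]


if __name__ == "__main__":
    for candidate in ["", "/", "/a/b", "a/b", "//a"]:
        print(repr(candidate), matching_path_rules(candidate))
```

Note that the rules overlap: the empty string satisfies both
<path-abempty> and <path-empty>, and a rooted path such as "/a/b"
satisfies both <path-abempty> and <path-absolute>.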




This definition of <hier-part> surely arises to enable the definition of
the general URI

    ;RFC 3986, Appendix A
  URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

where it works since URIs are quite general. However, having
<path-rootless> and <path-empty> as options of the <absolute-URI> seems
to violate the notion of 'absolute paths' in absolute URIs and to violate
the spirit, if not the letter, of the discussion of absolute paths in
HTTPbis Part 1, section 4.1.2. It also seems to mean that the only
difference between the absolute and general URI elements is the
optional [ "#" fragment ], which has nothing to do with the absolute
path that the 'absolute' in the element name implies. Was this,
perhaps, part of the motivation in HTTPbis for defining its own
<http-URI> and <https-URI> without using the <hier-part> element?




As a quick guess, it would seem that <request-target> could be redefined
to:

  request-target = "*"
                    / http-URI
                    / https-URI
                    / ( path-absolute [ "?" query ] )
                    / authority

thereby solving the issue. Alternatively, we would need an <abs-hier-part>
and a <new-absolute-URI> of the form:

  new-absolute-URI  = scheme ":" abs-hier-part [ "?" query ]

  abs-hier-part  = "//" authority path-abempty
                 / path-absolute

The element names here are arbitrary, picked only for this discussion.
However, since I do not know of any schemes other than 'http' and
'https' being allowed in the HTTP request message <Request-Line>, the
former seems more appropriate, if less flexible going forward.
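
To compare the RFC 3986 <absolute-URI> with the <new-absolute-URI>
alternative on concrete strings, a rough Python sketch follows. It is an
approximation under stated assumptions, not a full RFC 3986 parser: the
host is simplified to reg-name (IP literals are omitted), concatenation
is read as binding tighter than alternation as RFC 5234 specifies, and
the names ABSOLUTE_URI and NEW_ABSOLUTE_URI are hypothetical.

```python
import re

# Character classes from RFC 3986, Appendix A.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCT = r"%[0-9A-Fa-f]{2}"
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|{PCT})"

SEGMENT = rf"{PCHAR}*"
SEGMENT_NZ = rf"{PCHAR}+"
PATH_ABEMPTY = rf"(?:/{SEGMENT})*"
PATH_ABSOLUTE = rf"/(?:{SEGMENT_NZ}(?:/{SEGMENT})*)?"
PATH_ROOTLESS = rf"{SEGMENT_NZ}(?:/{SEGMENT})*"

# Assumption: host is reduced to reg-name; IPv4/IPv6 literals are left out.
USERINFO = rf"(?:[{UNRESERVED}{SUB_DELIMS}:]|{PCT})*"
REG_NAME = rf"(?:[{UNRESERVED}{SUB_DELIMS}]|{PCT})*"
AUTHORITY = rf"(?:{USERINFO}@)?{REG_NAME}(?::[0-9]*)?"

SCHEME = r"[A-Za-z][A-Za-z0-9+\-.]*"
QUERY = rf"(?:{PCHAR}|[/?])*"

# hier-part as in RFC 3986; the final empty alternative is path-empty.
HIER_PART = (
    rf"(?://{AUTHORITY}{PATH_ABEMPTY}|{PATH_ABSOLUTE}|{PATH_ROOTLESS}|)"
)
# abs-hier-part as proposed above: the last two alternatives are dropped.
ABS_HIER_PART = rf"(?://{AUTHORITY}{PATH_ABEMPTY}|{PATH_ABSOLUTE})"

ABSOLUTE_URI = re.compile(rf"{SCHEME}:{HIER_PART}(?:\?{QUERY})?")
NEW_ABSOLUTE_URI = re.compile(rf"{SCHEME}:{ABS_HIER_PART}(?:\?{QUERY})?")

if __name__ == "__main__":
    candidates = [
        "http://server:80/rooted/path?andquery",
        "http://server:80?query",
        "http:/rooted/path",
        "http:some/non/rooted/path?andquery",
        "http:?query",
    ]
    for uri in candidates:
        old = bool(ABSOLUTE_URI.fullmatch(uri))
        new = bool(NEW_ABSOLUTE_URI.fullmatch(uri))
        print(f"{uri!r}: absolute-URI={old} new-absolute-URI={new}")
```

With these patterns, strings such as "http:some/non/rooted/path" and
"http:?query" are accepted by ABSOLUTE_URI and rejected by
NEW_ABSOLUTE_URI, while strings whose <hier-part> begins with "//"
behave the same under both, including those with an empty path.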




While this issue seems valid to me, it is quite possible that I am
confused: my head is spinning from a long-winded effort perusing a whole
stack of Internet standards, and the hour is late. Now on to sorting out
the various definitions of <query>, which is what I am really after.


cheers and thanks,
~adrian


P.S. I'll try to keep an eye on the list archive but am not on the list.
Received on Tuesday, 18 October 2011 10:32:11 GMT
