Re: Declarative HTTP Spec Test Suite

Mohammed,

I am the author of the https://github.com/ibnesayeed/webserver-tester repo,
which is one of the HTTP testing frameworks listed on the wiki page
discussed here. I created it to automate testing of student projects for a
web server design course (https://cs531-f22.github.io/lectures/) that I have
offered a few times, in which my students were tasked with implementing a
spec-compliant HTTP/1.1 web server. The repo has not been updated in a while
because it works well for the task.

While the repo contains a handful of test suites targeted at course
assignments, the framework is generic enough to support many kinds of
HTTP request/response scenarios, including pipelines and auth workflows.
However, one needs to write the necessary test cases in Python, so it is not
a language-independent option as such. That said, the DSL for HTTP requests
is based on plain files with <PLACEHOLDERS> that are populated before the
request is made.
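
As a rough illustration of that placeholder idea (a minimal sketch, not the
framework's actual code; the template text, the placeholder names, and the
populate() helper are all assumptions made up for this example):

```python
import re

# Hypothetical request template; in practice this would live in a plain file.
TEMPLATE = (
    "GET <PATH> HTTP/1.1\r\n"
    "Host: <HOST>:<PORT>\r\n"
    "Connection: close\r\n"
    "\r\n"
)

def populate(template: str, values: dict) -> bytes:
    """Replace each <NAME> token with its value; fail on unknown names."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for placeholder <{name}>")
        return str(values[name])
    # Placeholders are assumed to be upper-case names in angle brackets.
    return re.sub(r"<([A-Z_]+)>", substitute, template).encode("ascii")

request = populate(
    TEMPLATE, {"PATH": "/index.html", "HOST": "localhost", "PORT": 8080}
)
```

The resulting bytes can then be written to a socket as-is, which keeps the
request text fully under the test author's control.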

The system has both a CLI and an HTTP API (and a web UI) to perform tests
and see reports. While the test cases are written in Python, the framework
can be used to test a running instance of a web server written in any
language (my students were free to implement their web servers in the
language of their choice). It reports common issues like byte-size
mismatches, non-standard line endings for headers, and spurious spaces where
they are not supposed to be.
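
To make those checks concrete, here is a minimal sketch of how such issues
could be detected on a raw HTTP/1.1 response. This is not the framework's
actual code: check_response() and its messages are hypothetical, and the
sketch assumes the whole response is already in memory, is not chunked, and
is expected to carry a body (so not a HEAD, 204, or 304 response).

```python
def check_response(raw: bytes) -> list:
    """Return human-readable issues found in a raw HTTP/1.1 response."""
    issues = []
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        return ["header section is not terminated by CRLF CRLF"]

    # Any CR or LF left after removing CRLF pairs is a non-standard ending.
    leftover = head.replace(b"\r\n", b"")
    if b"\n" in leftover or b"\r" in leftover:
        issues.append("non-standard line ending in header section")

    # Normalize line endings so the remaining checks still see every line.
    lines = head.replace(b"\r\n", b"\n").replace(b"\r", b"\n").split(b"\n")

    content_length = None
    for line in lines[1:]:  # lines[0] is the status line
        name, colon, value = line.partition(b":")
        if not colon:
            issues.append(f"malformed header line: {line!r}")
            continue
        if name != name.strip():
            issues.append(f"whitespace around field name: {line!r}")
        if name.strip().lower() == b"content-length":
            if value.strip().isdigit():
                content_length = int(value.strip())
            else:
                issues.append("Content-Length is not a non-negative integer")

    # Byte-size mismatch between the declared and the received body length.
    if content_length is not None and content_length != len(body):
        issues.append(
            f"Content-Length is {content_length} but body has {len(body)} bytes"
        )
    return issues
```

For example, a response declaring "Content-Length: 5" with a 3-byte body
yields a single byte-size mismatch entry, while a well-formed response
yields an empty list.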

It has some limitations that may reduce its usefulness for the purpose
you described: 1) it only supports text-based protocols (i.e., HTTP/1.1
and earlier), and 2) it cannot test HTTP clients, as it is an HTTP client
itself.


Regards,

--
Dr. Sawood Alam
Research Lead, Wayback Machine
Internet Archive



On Wed, May 29, 2024 at 5:02 PM Mohammed Al Sahaf <
mohammed@caffeinatedwonders.com> wrote:

> Good catch! I did a retest with HTTP/1 and HTTP/2 split. I excluded h2c
> because setting it up is a hassle for me given my convenient tools; plus
> it's unsupported by browsers, so perhaps only curl will have results on
> that. The result appears more consistent now.
>
> +---------+---------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------+
> |         |                                                           HTTP/1                                                          |                                      HTTP/2                                        |
> |         +-------------------------------------------------------------+-------------------------------------------------------------+------------+-----------------------------------------------------------------------+
> |         |                             HTTP                            |                            HTTPS                            | HTTP (h2c) |                                HTTPS                                  |
> +=========+=============================================================+=============================================================+============+=======================================================================+
> | curl    | `curl: (18) transfer closed with 2 bytes remaining to read` | `curl: (18) transfer closed with 2 bytes remaining to read` |     N/A    | `(92) HTTP/2 stream 1 was not closed cleanly: PROTOCOL_ERROR (err 1)` |
> |         |                                                             |                                                             |            |                                                                       |
> |         | response payload displayed                                  | response payload displayed                                  |            | response payload displayed                                            |
> +---------+-------------------------------------------------------------+-------------------------------------------------------------+------------+-----------------------------------------------------------------------+
> | Firefox | `NS_ERROR_NET_PARTIAL_TRANSFER`                             | `NS_ERROR_NET_PARTIAL_TRANSFER`                             |     N/A    | displays the full payload without reporting any errors                |
> |         |                                                             |                                                             |            |                                                                       |
> |         | response payload displayed                                  | response payload displayed                                  |            |                                                                       |
> +---------+-------------------------------------------------------------+-------------------------------------------------------------+------------+-----------------------------------------------------------------------+
> | Chrome  | `(failed)net::ERR_CONTENT_LENGTH_MISMATCH`                  | `(failed)net::ERR_CONTENT_LENGTH_MISMATCH`                  |     N/A    | displays the full payload without reporting any errors                |
> |         |                                                             |                                                             |            |                                                                       |
> |         | nothing displayed                                           | nothing displayed                                           |            |                                                                       |
> +---------+-------------------------------------------------------------+-------------------------------------------------------------+------------+-----------------------------------------------------------------------+
>
> Given the feedback on the other branches of this thread, I think it's best to scope my proposals to servers. As mentioned elsewhere, clients are harder to test. To quote Willy Tarreau:
>
> > testing clients is very difficult because contrary to
> > servers which just have to respond to solicitations, someone has to act
> > on the client to run the desired tests, so the approach is different
> > (and different between various clients), and I'm not convinced that a
> > same test collection would work for all implementations due to this.
>
>
> All the best,
> Mohammed
>
> On Tuesday, May 28th, 2024 at 4:47 PM, David Benjamin <
> davidben@chromium.org> wrote:
>
> The results in scenario 2 sound off. Chrome shouldn't treat
> ERR_CONTENT_LENGTH_MISMATCH differently between HTTP and HTTPS. I'm not
> familiar with their implementation, but the Firefox results similarly don't
> make sense. Indeed it's quite important to enforce this over HTTPS, for
> HTTP/1.1, because that is what defends against truncation attacks. (In
> principle, TLS has the close_notify alert, but close_notify is, in
> practice, a fiction for HTTPS. Instead we must rely on in-protocol
> termination signals. For HTTP/1.1, one of those signals is Content-Length.)
> Also it's generally on HTTPS that one can be more strict, not less, because
> there are fewer intermediaries to worry about.
>
> Given some of your errors mention HTTP/2, I suspect you are comparing
> apples to oranges, and your HTTPS tests are testing HTTP/2. You mention the
> Go standard library, but keep in mind that Go automatically enables HTTP/2.
> The Content-Length header means very different things between HTTP/1.1 and
> HTTP/2. In HTTP/1.1, it is a critical part of framing and needs to be
> checked at that layer. (HTTP/1.1's framing is incredibly fragile. "Text"
> protocols are wonderful.) In HTTP/2, it has no impact on framing and was
> historically[0] considered advisory. The spec now considers it invalid,
> otherwise an h2-to-h1 intermediary will have problems. [1] discusses this.
> But that's where this mess with receivers enforcing dates to. Provided it
> doesn't cause you to turn around and send mis-framed HTTP/1.1, it is
> more-or-less safe, if sloppy, to accept it in HTTP/2.
>
> David
>
> [0]
> https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-http2-00#section-3.2.2
> [1] https://www.rfc-editor.org/rfc/rfc9110#section-8.6-11
>
> On Mon, May 27, 2024 at 7:30 AM Mohammed Al Sahaf <
> mohammed@caffeinatedwonders.com> wrote:
>
>> Hello,
>>
>> This is a proposal that is triggered by some of my involvement with the Caddy
>> web server <https://caddyserver.com/> project. We (Caddy team) have been
>> working towards developing a declarative test suite for the Caddy server.
>> The discussions (particularly a comment
>> <https://github.com/caddyserver/caddy/pull/6255#issuecomment-2088632219>
>> by a user) led me to believe it's best to bring up the HTTP spec compliance
>> parts with the HTTP WG for better insight and to have a common well-being
>> check for all members of the HTTP community.
>>
>> There are numerous RFCs governing HTTP and the behavior of its citizens.
>> Compliance with the RFCs is only validated through interoperability or
>> manual comparison of the RFCs against the implementation. The RFCs, for
>> good reasons, are walls of text and are akin to legalese when it comes to
>> interpretation. Consequently, nuanced sections may be missed without
>> visible failures because they cover edge cases. Having a specification
>> defined as a test suite in a declarative language removes much of the
>> ambiguity and enables validation of conformance by the HTTP citizens. I hope to turn this
>> into an official proposal, but I'd like to put the draft forward for
>> discussion to solidify the approach and the scope first.
>>
>> *Motivation*:
>>
>> Conformance is an assurance of compatibility across the various
>> components of the web and gives confidence that breakage will be caught if
>> any of them were to change behavior or if the HTTP semantics were to change.
>> Conformity can also assist in optimization efforts. If the behavior is known
>> for sure in advance, certain optimizations can be applied.
>>
>> Secondly, it unifies the expectations of the community. Let's take for
>> example the HTTP semantics of the Content-Length header as defined in RFC
>> 9110 <https://www.rfc-editor.org/rfc/rfc9110.html#name-content-length>.
>> The RFC states when servers and user-agents SHOULD, SHOULD NOT, MAY, MAY
>> NOT, and MUST NOT send the Content-Length header, but it does not
>> specify how either of them (server and user-agent) should handle a
>> mismatch between the Content-Length header value and the actual length of
>> the payload. I have run an unscientific poll on Twitter about the assumed
>> ideal behavior of a client if the Content-Length value does not match
>> the actual length of the body.
>>
>> First poll <https://twitter.com/MohammedSahaf/status/1792267681032253683>:
>> What's the ideal HTTP client (e.g. curl, browser) behavior when the
>> server includes more bytes in response body than stated in the
>> content-length header? e.g. "Content-Length: 2", actual body length: 3.
>>
>> *Responses* (19 responses):
>>
>>    - Ignore header; read fully (4 votes; 21.1%)
>>
>>
>>    - Read till content-length value (6 votes; 31.6%)
>>    - *Abort/reject (9 votes; 47.4%)*
>>
>>
>> *Reality*:
>> When testing this scenario, I found the following:
>>
>>
>>    - curl aborts the connection, reporting "(18) transfer closed with 1
>>    bytes remaining to read" for *plaintext HTTP* connection, and "(92)
>>    HTTP/2 stream 1 was not closed cleanly: PROTOCOL_ERROR (err 1)" for
>>    *HTTPS* connections.
>>
>>
>>
>>    - Firefox fails the transfer on *plaintext HTTP* with
>>    "NS_ERROR_NET_PARTIAL_TRANSFER"; but with *HTTPS* connection, it only
>>    reads and displays payload per the number of bytes stated in the header
>>    value.
>>
>>
>>
>>    - Chrome fails the transfer on *plaintext HTTP* with
>>    "(failed)net::ERR_CONTENT_LENGTH_MISMATCH"; but with *HTTPS*
>>    connection, it ignores the header value and displays the full payload.
>>
>>
>> Second Poll
>> <https://twitter.com/MohammedSahaf/status/1792267687411831063>: What's
>> the ideal HTTP client (e.g. curl, browser) behavior when the server
>> includes fewer bytes in response body than stated in the content-length
>> header? e.g. "Content-Length: 5", actual body length: 3
>>
>> *Responses* (18 responses):
>>
>>    - Ignore header; read 3 (7 votes; 38.9%)
>>
>>
>>    - Pad; with what? (0 votes; 0%)
>>
>>
>>    - *Reject/abort (9 votes; 50%)*
>>
>>
>>    - Other; comment (2 votes; 11.1%, none of them elaborated)
>>
>>
>> *Reality*:
>> When testing this scenario, I found the following:
>>
>>    - curl aborts the connection, reporting "(18) transfer closed with 1
>>    bytes remaining to read" for *plaintext HTTP* connection; *for HTTPS*,
>>    it prints the payload in full preceded by the message "(92) HTTP/2
>>    stream 1 was not closed cleanly: PROTOCOL_ERROR (err 1)".
>>
>>
>>
>>    - Firefox displays the full payload *for HTTPS* connections without
>>    reporting any errors. For *plaintext HTTP,* it displays the full
>>    payload but reports an error "NS_ERROR_NET_PARTIAL_TRANSFER"
>>
>>
>>
>>    - Chrome displays the full payload *for HTTPS* connections without
>>    reporting any errors. For *plaintext HTTP,* it fails to load the
>>    content and reports the error
>>    "(failed)net::ERR_CONTENT_LENGTH_MISMATCH".
>>
>>
>> Third Poll <https://twitter.com/MohammedSahaf/status/1792267692700827955>:
>> Assuming it's possible... What's the ideal HTTP client (e.g. curl,
>> browser) behavior when the server includes negative value in the
>> content-length header? e.g. "Content-Length: -2"
>>
>> *Responses* (21 responses):
>>
>>    - Ignore value (7 votes; 33.3%)
>>
>>
>>    - *Reject/abort (14 votes; 66.7%)*
>>
>>
>> *Reality*:
>> I couldn't in effect test the scenario. I'm using Caddy for all the
>> scenarios, and the Go standard library doesn't set the Content-Length
>> header if it's set to a value of less than 0 (source
>> <https://github.com/golang/go/blob/377646589d5fb0224014683e0d1f1db35e60c3ac/src/net/http/server.go#L1201>
>> )
>>
>> Conclusion: The variation observed in the user agents and the, albeit
>> unscientific, poll responses show a lack of consensus on the expected
>> behavior. Each agent (human or machine) applies its own interpretations and
>> assumptions to the protocol. The disagreement makes the evolution of the
>> protocol difficult because of the varying expectations.
>>
>> *Method*:
>>
>> The test suite should be defined in a declarative format that can be easily
>> composed by humans and read by machines. The declarative,
>> programming-language-agnostic format allows developers (RFC developers and
>> software developers) of all backgrounds to contribute without a
>> programming-language-based gate. In my research, I have found the
>> open-source tool hurl <https://hurl.dev/> to be suitable for
>> defining an HTTP server specification and testing against it. It defines its
>> own DSL for the request/response patterns and may be run with the --test
>> flag along with --report-{json,html,tap} to produce test results.
>>
>> The testing effort may be implemented in phases. The first phase is to
>> author the test suite in a common, public repository. This makes it
>> accessible for web server developers to clone it and run the test suite
>> against their own software. The second phase is to provide an interface for
>> automated testing and a UI to display the conformance summary of each web
>> server submitted to the list.
>>
>> *Challenges*:
>>
>> *Agnostic Tooling*: The first challenge is to find an HTTP client that
>> implements the HTTP Semantics RFCs perfectly; otherwise its own
>> idiosyncrasies will get in the way of the validation. One would be tempted
>> to default to curl, especially since hurl uses curl under the hood. However,
>> there is a chance curl may have its own set of HTTP idiosyncrasies that may
>> affect the results of the test execution. Changes to curl are probably not
>> desired unless the subject behavior is confirmed to be a defect.
>> Involvement in this initiative is voluntary for the curl team, who may be
>> looped in for comments if desired.
>>
>> *Suitable DSL (Domain Specific Language)*: The hurl DSL is decent. In my
>> experiment for a proof-of-concept, I found it lacking a few functions or
>> operations to be perfectly suitable, e.g. indicating optionality. hurl
>> has the advantage of its DSL grammar being defined as a spec with
>> deterministic parsing. Extensions and/or changes to hurl to accommodate
>> this effort are up to the hurl developers.
>>
>> *Prior Art*:
>>
>> There's a wiki page titled HTTP Testing Resources
>> <https://github.com/httpwg/wiki/wiki/HTTP-Testing-Resources> under the
>> github.com/httpwg/wiki repository. The page contains the following note:
>>
>>    *Note that there is no official conformance test suite. These tests
>>    have not been vetted for correctness by the HTTP Working Group, and the
>>    authority for conformance is always the relevant specification.*
>>
>> Indeed some of the listed projects have not been updated for a while.
>> Worthy of note on the page are the cache-tests.fyi project, REDBot, and
>> httplint (which REDBot uses).
>>
>> The REDBot project (by Mark Nottingham) was used by one of Caddy's users to
>> report a gap <https://github.com/caddyserver/caddy/issues/5849> in the
>> conformance, which was subsequently fixed. Using REDBot requires pointing
>> at a particular resource.
>>
>> The cache-tests.fyi project is of keen interest as design inspiration.
>> Its test suite is close in essence to this proposal: it is a declarative
>> suite that can be run by cache system developers to validate their
>> conformance to the cache-related RFCs. The Souin
>> <https://github.com/darkweak/souin> caching system runs the
>> cache-tests.fyi test suite on every pull request to note its conformance
>> level and to watch for variations. The display of each system, the use
>> case, and its conformance status allows the users and the developers to
>> take appropriate actions. The website and the UI can be a phase-2
>> (long-term goal) of this proposal, but the details of how to set up the
>> system and run the tests can be postponed until more information is known
>> about the test suite itself.
>>
>> *Humble Attempt*:
>>
>> To test the idea and develop a proof-of-concept, I have managed to
>> convert 4 tests from REDBot (procedural, Python-based) to Hurl (declarative
>> format). The test suite contains a collection of test sets in a nested
>> directory structure. The suite declares the URL it will call for each test
>> case so web servers can be configured accordingly for the subject URL. The
>> GitHub repository for the PoC is here:
>> github.com/mohammed90/http-semantics-test-suite. The repository currently
>> does not have a license applied as it's only for display of the PoC, though
>> I am inclined to apply Apache-2 or any other open-source-compatible license
>> once the approach is agreed and finalized.
>>
>> All the best,
>> Mohammed
>> Blog <https://www.caffeinatedwonders.com/> | LinkedIn
>> <https://www.linkedin.com/in/mohammedalsahaf/>
>>
>
>

Received on Wednesday, 29 May 2024 21:56:29 UTC