Declarative HTTP Spec Test Suite

Hello,

This is a proposal triggered by some of my involvement with the [Caddy web server](https://caddyserver.com/) project. We (the Caddy team) have been working toward a declarative test suite for the Caddy server. The discussions (particularly a [comment](https://github.com/caddyserver/caddy/pull/6255#issuecomment-2088632219) by a user) led me to believe it's best to bring the HTTP spec compliance parts to the HTTP WG for better insight and to have a common health check for all members of the HTTP community.

There are numerous RFCs governing HTTP and the behavior of its citizens. Compliance with the RFCs is only validated through interoperability or manual comparison of the implementation against the RFCs. The RFCs, for good reasons, are walls of text and are akin to legalese when it comes to interpretation. Consequently, nuanced sections can be missed without any visible failure because they only surface in edge cases. Having a specification defined as a test suite in a declarative language removes much of the ambiguity and enables HTTP citizens to validate their conformance. I hope to turn this into an official proposal, but I'd like to put the draft forward for discussion first to solidify the approach and the scope.

Motivation:

Conformance is an assurance of compatibility across the various components of the web and gives confidence that breakage will be noticed if any of them changes behavior or if the HTTP semantics themselves change. Conformance can also assist optimization efforts: if the behavior is known for certain in advance, certain optimizations can be applied.

Secondly, it unifies the expectations of the community. Take for example the HTTP semantics of the `Content-Length` header as defined in [RFC 9110](https://www.rfc-editor.org/rfc/rfc9110.html#name-content-length). The RFC states when servers and user agents SHOULD, SHOULD NOT, MAY, and MUST NOT send the `Content-Length` header, but it does not specify how either of them (server and user agent) should handle a mismatch between the `Content-Length` header value and the actual length of the payload. I ran an unscientific poll on Twitter about the assumed ideal behavior of a client when the `Content-Length` value does not match the actual length of the body.

[First poll](https://twitter.com/MohammedSahaf/status/1792267681032253683): What's the ideal HTTP client (e.g. curl, browser) behavior when the server includes more bytes in response body than stated in the content-length header? e.g. "Content-Length: 2", actual body length: 3.

Responses (19 total):

- Ignore header; read fully (4 votes; 21.1%)

- Read till content-length value (6 votes; 31.6%)

- Abort/reject (9 votes; 47.4%)

Reality:
When testing this scenario, I found the following:

- `curl` aborts the connection, reporting "(18) transfer closed with 1 bytes remaining to read" for plaintext HTTP connections, and "(92) HTTP/2 stream 1 was not closed cleanly: PROTOCOL_ERROR (err 1)" for HTTPS connections.

- Firefox fails the transfer on plaintext HTTP with "NS_ERROR_NET_PARTIAL_TRANSFER"; but with an HTTPS connection, it only reads and displays the payload up to the number of bytes stated in the header value.

- Chrome fails the transfer on plaintext HTTP with "(failed)net::ERR_CONTENT_LENGTH_MISMATCH"; but with an HTTPS connection, it ignores the header value and displays the full payload.

[Second Poll](https://twitter.com/MohammedSahaf/status/1792267687411831063): What's the ideal HTTP client (e.g. curl, browser) behavior when the server includes fewer bytes in response body than stated in the content-length header? e.g. "Content-Length: 5", actual body length: 3

Responses (18 total):

- Ignore header; read 3 (7 votes; 38.9%)

- Pad; with what? (0 votes; 0%)

- Reject/abort (9 votes; 50%)

- Other; comment (2 votes; 11.1%, none of them elaborated)

Reality:
When testing this scenario, I found the following:

- `curl` aborts the connection, reporting "(18) transfer closed with 1 bytes remaining to read" for plaintext HTTP connections; for HTTPS, it prints the payload in full, preceded by the message "(92) HTTP/2 stream 1 was not closed cleanly: PROTOCOL_ERROR (err 1)".

- Firefox displays the full payload for HTTPS connections without reporting any errors. For plaintext HTTP, it displays the full payload but reports the error "NS_ERROR_NET_PARTIAL_TRANSFER".

- Chrome displays the full payload for HTTPS connections without reporting any errors. For plaintext HTTP, it fails to load the content and reports the error "(failed)net::ERR_CONTENT_LENGTH_MISMATCH".

[Third Poll](https://twitter.com/MohammedSahaf/status/1792267692700827955): Assuming it's possible... What's the ideal HTTP client (e.g. curl, browser) behavior when the server includes negative value in the content-length header? e.g. "Content-Length: -2"

Responses (21 total):

- Ignore value (7 votes; 33.3%)

- Reject/abort (14 votes; 66.7%)

Reality:
I could not actually test this scenario. I'm using Caddy for all the scenarios, and the Go standard library doesn't set the `Content-Length` header if it's set to a value less than 0 ([source](https://github.com/golang/go/blob/377646589d5fb0224014683e0d1f1db35e60c3ac/src/net/http/server.go#L1201)).

Conclusion: The variation observed in the user agents and in the, albeit unscientific, poll responses shows a lack of consensus on the expected behavior. Each agent (human or machine) applies its own interpretations and assumptions to the protocol. The disagreement makes the evolution of the protocol difficult because of the varying expectations.

Method:

The test suite should be defined in a declarative format that can be easily composed by humans and read by machines. The declarative, programming-language-agnostic format allows developers (RFC authors and software developers) of all backgrounds to contribute without being gated on a particular programming language. In my research, I have found the open-source tool [hurl](https://hurl.dev/) to be suitable for defining an HTTP server specification and testing against it. It defines its own DSL for request/response patterns and may be run with the `--test` flag along with `--report-{json,html,tap}` to produce test results.
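
For illustration, here is a minimal sketch of what one test case could look like in the Hurl format. The host, port, and path are placeholders of my own choosing, and the asserted requirement is RFC 9110's rule that an origin server with a clock MUST generate a Date header field in 2xx responses:

```hurl
# RFC 9110, Section 6.6.1: an origin server with a clock MUST generate
# a Date header field in all 2xx, 3xx, and 4xx responses.
# The URL below is a placeholder; an implementation under test would be
# configured to serve some resource at this path.
GET http://localhost:8080/http-semantics/date/basic
HTTP 200
[Asserts]
header "Date" exists
header "Date" endsWith "GMT"
```

A directory of such files can then be executed with something like `hurl --test --report-html report/ tests/*.hurl` to produce a pass/fail summary per test file.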

The testing effort may be implemented in phases. The first phase is to author the test suite in a common, public repository, making it accessible for web server developers to clone and run against their own software. The second phase is to provide an interface for automated testing and a UI displaying the conformance summary of each web server submitted to the list.

Challenges:

Agnostic Tooling: The first challenge is to find an HTTP client that implements the HTTP Semantics RFCs perfectly; otherwise, its own idiosyncrasies will get in the way of the validation. One would be tempted to default to `curl`, especially since `hurl` uses `curl` under the hood. However, there is a chance `curl` has its own set of HTTP idiosyncrasies that may affect the results of test execution. Changes to `curl` are probably not desired unless the behavior in question is confirmed to be a defect. Involvement is, of course, voluntary for the `curl` team, and they may be looped into this initiative for comments if desired.

Suitable DSL (Domain-Specific Language): The `hurl` DSL is decent. In my proof-of-concept experiment, I found it lacking a few functions or operations needed to be perfectly suitable, e.g. indicating optionality. `hurl` has the advantage of its DSL grammar being defined as a spec with deterministic parsing. Extensions and/or changes to `hurl` to accommodate this effort are up to the `hurl` developers.
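
To make the optionality gap concrete, here is a sketch (again with a placeholder URL of my own) of a SHOULD-level requirement. Today it can only be written as a hard, failing assert; an advisory or "warning-only" severity for asserts is not part of the Hurl grammar and would have to be an extension:

```hurl
# RFC 9110, Section 8.8.2: an origin server SHOULD send Last-Modified
# when a modification date can be reasonably and consistently determined.
# A failure here indicates only a SHOULD-level deviation, but Hurl
# currently offers no way to mark the assert as advisory rather than fatal.
GET http://localhost:8080/http-semantics/last-modified/basic
HTTP 200
[Asserts]
header "Last-Modified" exists
```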

Prior Art:

There's a wiki page titled [HTTP Testing Resources](https://github.com/httpwg/wiki/wiki/HTTP-Testing-Resources) in the github.com/httpwg/wiki repository. The page contains the following note:

> Note that there is no official conformance test suite. These tests have not been vetted for correctness by the HTTP Working Group, and the authority for conformance is always the relevant specification.

Indeed, some of the listed projects have not been updated for a while. Worthy of note on the page are the cache-tests.fyi project, REDBot, and httplint (which REDBot uses).

The REDBot project (by Mark Nottingham) was used by one of Caddy's users [to report a gap](https://github.com/caddyserver/caddy/issues/5849) in conformance, which was subsequently fixed. Using REDBot requires pointing it at a particular resource.

The [cache-tests.fyi](https://cache-tests.fyi/) project is of keen interest as design inspiration. The test suite is close in essence to this proposal: a declarative suite that cache system developers can run to validate their conformance to the cache-related RFCs. The [Souin](https://github.com/darkweak/souin) caching system runs the cache-tests.fyi test suite on every pull request to note its conformance level and to watch for variations. Displaying each system, each use case, and its conformance status allows users and developers to take appropriate action. The website and the UI can be a phase-2 (long-term) goal of this proposal, but the details of how to set up the system and run the tests can be postponed until more information is known about the test suite itself.

Humble Attempt:

To test the idea and develop a proof-of-concept, I converted 4 tests from REDBot (procedural, Python-based) to Hurl (a declarative format). The test suite contains a collection of test sets in a nested directory structure. The suite declares the URL it will call for each test case so web servers can be configured accordingly for the subject URL. The GitHub repository for the PoC is here: github.com/mohammed90/http-semantics-test-suite. The repository currently does not have a license applied as it is only a display of the PoC, though I am inclined to apply Apache-2.0 or another open-source-compatible license once the approach is agreed upon and finalized.
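
As a rough illustration of the shape of such a test case (the directory layout and URL here are placeholders of my own, not necessarily what the PoC repository uses), a test covering RFC 9110's rule that a server MUST NOT send Content-Length in a 204 response could declare its own URL like this:

```hurl
# Illustrative sketch, e.g. a file at content-length/204-no-content.hurl.
# RFC 9110, Section 8.6: a server MUST NOT send a Content-Length header
# field in any response with a status code of 1xx (Informational) or
# 204 (No Content). The test declares the URL it calls, so the server
# under test must be configured to answer this path with a 204.
GET http://localhost:8080/content-length/204-no-content
HTTP 204
[Asserts]
header "Content-Length" not exists
```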

All the best,
Mohammed
[Blog](https://www.caffeinatedwonders.com/) | [LinkedIn](https://www.linkedin.com/in/mohammedalsahaf/)
