- From: Alex Rousskov <rousskov@measurement-factory.com>
- Date: Mon, 1 May 2017 14:49:42 -0600
- To: HTTP working group mailing list <ietf-http-wg@w3.org>

On 05/01/2017 11:41 AM, Poul-Henning Kamp wrote:
> Alex Rousskov writes:

>>>> [...] and we should focus on a small set of unambiguous elements [...]

>>> That is simply not possible for existing headers.

>> Can we define all existing headers using a small set of unambiguous
>> elements? Probably not, but that should not be our goal. We only need to
>> cover common usage of common fields plus (nearly) all future ones.

> We can come very close, but you will always need to know what
> header you are parsing to know how to parse it.
>
> This is the "bespoke parsers" the draft talks about.

It is not an all-or-nothing choice. For example, if the draft grammar is
good enough, I can write an efficient generic key=value parser that
handles arbitrary key=value pairs and then reuse that general parser in
20 bespoke field parsers "as is", without letting it know which field is
being parsed. Will that approach cover all legacy headers? No. However,
it may cover the vast majority of the headers my agent needs.

On the other hand, if the draft grammar is not good enough, then I will
not be able to create and reuse an efficient generic key=value parser
(without redefining the grammar so that it becomes usable for that
purpose) and may end up with either a slow "everything is a token"
generic parser or end up with 20 bespoke field parsers, each parsing its
own key=value flavor. Neither is terrible, but we should be able to do
better.

Also, if the grammar says that everything that looks like a number is a
number and nothing else is, then I _can_ correctly (and efficiently!)
parse a number field without knowing that it is a number field. Will I
occasionally parse an extension field as a number while the sender was
sending what is an identifier to her? Yes! Should the draft care about
such rare cases that hurt nobody? No, it should not (as long as those
cases are indeed rare and hurt nobody).
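
To make the reuse argument above concrete, here is a minimal Python sketch. The grammar in it is an assumption made purely for illustration and is not taken from the draft: a value is a number exactly when it matches the hypothetical NUMBER pattern, and anything else is an identifier. The names parse_value, parse_pairs, parse_accept_params and parse_cache_directives are likewise hypothetical.

```python
import re

# Assumed, illustrative grammar (not the draft's): a value is a number
# if and only if it matches NUMBER; otherwise it is an identifier.
NUMBER = re.compile(r"-?\d+(\.\d+)?")


def parse_value(text):
    """Classify a value without knowing which field it came from."""
    if NUMBER.fullmatch(text):
        return float(text) if "." in text else int(text)
    return text  # identifier


def parse_pairs(text, sep=";"):
    """Generic key=value parser; it is never told which field it parses."""
    pairs = {}
    for item in text.split(sep):
        item = item.strip()
        if not item:
            continue
        key, eq, value = item.partition("=")
        # A bare key (no "=") is recorded with the value None.
        pairs[key.strip()] = parse_value(value.strip()) if eq else None
    return pairs


# Two "bespoke" field parsers that reuse the generic parser as is.
def parse_accept_params(text):
    params = parse_pairs(text)
    params.setdefault("q", 1)  # field-specific knowledge stays out here
    return params


def parse_cache_directives(text):
    return parse_pairs(text, sep=",")
```

Under these assumptions, an extension parameter such as "id=12345" comes back as the number 12345 even if its sender thought of it as an identifier, which is the rare and harmless misclassification described above.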

> It would be a giant leap forward, if we could respecify the
> existing headers in a format which would allow those parsers
> to be machine generated directly from the spec.

I still think it is possible to define a grammar like that.

Side note: I do not think we should limit ourselves to grammars that
some existing parser generator tool(s) can grok without any human
involvement. That 100% automation goal is likely to limit a
human-friendly grammar too much, making it less useful in practice.

>>> Current header definitions make 3.14159265 a valid "token":
>>
>> Yes, but _we_ do not have to define 3.14159265 as both "number" and
>> "identifier".

> For existing headers: No, we don't get to narrow their definition
> that way.

Why not?! For example, I do not understand why you insist that the draft
cannot write "q=" number when defining an Accept field _and_ exclude
"number" from "identifier". I do not see why doing something like that
will break or prohibit something _important_. Please give an example of
such breakage.

>> Yes, if I am building a generic parser that wants to understand senders
>> intent in _all_ legacy cases, then my parse tree cannot use some of the
>> draft elements "as is", but since understanding senders intent in all
>> cases is an unsolvable problem in legacy HTTP, we should not focus on
>> that esoteric use case too much!

> I don't think it is unsolveable, after all, the current headers have
> ABNF definitions and they may be flawed, but they are not random.

It is unsolvable because an extension header does not have a (known to a
generic parser) ABNF. Sure, I can find the right ABNF if I want to
support that extension in my agent and add a custom parser for it
(perhaps even generating one from that ABNF!), but a generic parser can
say nothing about the sender's intent in this case. The extension field
is just an opaque blob (or a list of blobs, but even that is a risky
step!) to such a parser.
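
The opaque-blob point can be sketched in a few lines of Python. This is only an illustration under assumed names: KNOWN_FIELDS is a hypothetical registry of field-specific parsers, and parse_field and parse_content_length are made up for the example.

```python
def parse_content_length(raw):
    """Stand-in for a bespoke parser built from a known ABNF."""
    return int(raw.strip())


# Hypothetical registry: fields whose grammar this agent knows.
KNOWN_FIELDS = {"content-length": parse_content_length}


def parse_field(name, raw):
    parser = KNOWN_FIELDS.get(name.lower())
    if parser is None:
        # Unknown extension field: no ABNF is available here, so the
        # value is kept untouched. Even splitting it on "," would be a
        # guess about the sender's intent.
        return ("opaque", raw)
    return ("parsed", parser(raw))
```

A call such as parse_field("X-Example", "a, b=3.14") keeps the whole value as one opaque string, while parse_field("Content-Length", " 42 ") goes through the bespoke parser.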

The legacy grammar does not allow a generic parser to discover the
extension field structure. We can use stricter grammar elements to
provide building blocks for the interesting/important cases.

Alex.

Received on Monday, 1 May 2017 20:50:12 UTC