Re: identifier in draft-ietf-httpbis-header-structure-01.txt from Alex Rousskov on 2017-05-01 (ietf-http-wg@w3.org from April to June 2017)

From: Alex Rousskov <rousskov@measurement-factory.com>
Date: Mon, 1 May 2017 14:49:42 -0600
To: HTTP working group mailing list <ietf-http-wg@w3.org>
Message-ID: <80607bb9-4e60-b4a3-c709-7c876556fff6@measurement-factory.com>
On 05/01/2017 11:41 AM, Poul-Henning Kamp wrote:
> Alex Rousskov writes:
>>>> [...] and we should focus on a small set of unambiguous elements [...]

>>> That is simply not possible for existing headers.

>> Can we define all existing headers using a small set of unambiguous
>> elements? Probably not, but that should not be our goal. We only need to
>> cover common usage of common fields plus (nearly) all future ones.

> We can come very close, but you will always need to know what
> header you are parsing to know how to parse it.
> 
> This is the "bespoke parsers" the draft talks about.

It is not an all-or-nothing choice. For example, if the draft grammar is
good enough, I can write an efficient generic key=value parser that
handles arbitrary key=value pairs and then reuse that general parser in
20 bespoke field parsers "as is", without letting it know which field is
being parsed. Will that approach cover all legacy headers? No. However,
it may cover the vast majority of the headers my agent needs.
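To make the reuse idea concrete, here is a minimal Python sketch of a
field-agnostic key=value parser. It assumes "key=value" pairs separated
by ";", with values that are either HTTP/1.1 tokens or quoted strings.
That syntax is my assumption for illustration, not the draft's grammar,
and parse_pairs() is a hypothetical name.

```python
import re
from typing import List, Tuple

# Token characters follow the HTTP/1.1 "tchar" set (RFC 7230).
TOKEN = r"[!#$%&'*+.^_`|~0-9A-Za-z-]+"
# Quoted-string with backslash escapes (simplified from RFC 7230).
QUOTED = r'"(?:[^"\\]|\\.)*"'
# One pair, with optional spaces/tabs around the key, "=", and value.
PAIR = re.compile(rf"[ \t]*({TOKEN})[ \t]*=[ \t]*({TOKEN}|{QUOTED})[ \t]*")


def parse_pairs(text: str, sep: str = ";") -> List[Tuple[str, str]]:
    """Generic key=value parser; it never knows which field it is parsing."""
    pairs: List[Tuple[str, str]] = []
    pos = 0
    while pos < len(text):
        m = PAIR.match(text, pos)
        if not m:
            raise ValueError(f"malformed pair at offset {pos}: {text[pos:]!r}")
        # Assumption: keys are case-insensitive, so normalize to lowercase.
        key, value = m.group(1).lower(), m.group(2)
        if value.startswith('"'):
            # Strip the surrounding quotes and undo backslash escapes.
            value = re.sub(r"\\(.)", r"\1", value[1:-1])
        pairs.append((key, value))
        pos = m.end()
        if pos < len(text):
            if text[pos] != sep:
                raise ValueError(f"expected {sep!r} at offset {pos}")
            pos += 1  # skip the separator
    return pairs
```

Each bespoke field parser can then call parse_pairs() on its parameter
section and interpret the resulting pairs itself; for example,
dict(parse_pairs('a=1; b="x y"')) yields {'a': '1', 'b': 'x y'}.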

On the other hand, if the draft grammar is not good enough, then I will
not be able to create and reuse an efficient generic key=value parser
(without redefining the grammar so that it becomes usable for that
purpose) and may end up with either a slow "everything is a token"
generic parser or 20 bespoke field parsers, each parsing its
own key=value flavor. Neither is terrible, but we should be able to do
better.

Also, if the grammar says that everything that looks like a number is a
number and nothing else is, then I _can_ correctly (and efficiently!)
parse a number field without knowing that it is a number field. Will I
occasionally parse an extension field as a number when its sender meant
it as an identifier? Yes! Should the draft care about
such rare cases that hurt nobody? No, it should not (as long as those
cases are indeed rare and hurt nobody).
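A minimal sketch of typing items by appearance alone, assuming a
hypothetical number syntax (optional minus sign, digits, optional
fraction) and a hypothetical identifier syntax that must start with a
letter, so the two can never overlap. Neither rule comes from the draft,
and classify() is a hypothetical helper.

```python
import re
from typing import Tuple, Union

# Hypothetical number syntax: optional "-", digits, optional ".digits".
NUMBER = re.compile(r"-?[0-9]+(?:\.[0-9]+)?")
# Hypothetical identifier syntax: must start with a letter, so it can
# never match anything that NUMBER matches.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_.*/-]*")


def classify(item: str) -> Tuple[str, Union[int, float, str]]:
    """Type a bare item by its appearance, without knowing the field."""
    if NUMBER.fullmatch(item):
        return ("number", float(item) if "." in item else int(item))
    if IDENTIFIER.fullmatch(item):
        return ("identifier", item)
    raise ValueError(f"neither a number nor an identifier: {item!r}")
```

With such a split, classify("3.14159265") returns ("number", 3.14159265)
no matter which field it came from; an extension whose sender thought of
that value as an identifier simply gets a number.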


> It would be a giant leap forward, if we could respecify the
> existing headers in a format which would allow those parsers
> to be machine generated directly from the spec.

I still think it is possible to define a grammar like that. Side note: I
do not think we should limit ourselves to grammars that some existing
parser generator tool(s) can grok without any human involvement. That
100% automation goal is likely to limit a human-friendly grammar too
much, making it less useful in practice.


>>> Current header definitions make 3.14159265 a valid "token":
>>
>> Yes, but _we_ do not have to define 3.14159265 as both "number" and
>> "identifier".

> For existing headers: No, we don't get to narrow their definition
> that way.

Why not?! For example, I do not understand why you insist that the draft
cannot write

  "q=" number

when defining an Accept field _and_ exclude "number" from "identifier".
I do not see why doing something like that will break or prohibit
something _important_. Please give an example of such breakage.
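For example, a bespoke Accept parser could rely on that split. This
Python sketch assumes a simplified Accept syntax (no quoted-string
parameters, and any number accepted as a weight rather than only the
RFC 7231 qvalue range of 0 to 1) plus a hypothetical identifier rule
that cannot produce a number; parse_accept() is a hypothetical name.

```python
import re
from typing import List, Tuple

# Hypothetical, simplified grammar (an assumption, not the draft's):
#   number     = 1*DIGIT [ "." 1*DIGIT ]
#   identifier = ALPHA *( ALPHA / DIGIT / "+" / "." / "-" )  ; never a number
#   media      = ( identifier / "*" ) "/" ( identifier / "*" )
NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?")
PART = r"(?:[A-Za-z][A-Za-z0-9+.-]*|\*)"
MEDIA = re.compile(rf"{PART}/{PART}")


def parse_accept(value: str) -> List[Tuple[str, float]]:
    """Parse a simplified Accept value into (media-range, q) pairs."""
    result: List[Tuple[str, float]] = []
    for element in value.split(","):  # safe only because quotes are excluded
        media, *params = [p.strip() for p in element.split(";")]
        if not MEDIA.fullmatch(media):
            raise ValueError(f"bad media range: {media!r}")
        q = 1.0  # default weight when no q parameter is present
        for param in params:
            name, _, val = param.partition("=")
            if name.strip().lower() == "q":
                val = val.strip()
                if not NUMBER.fullmatch(val):
                    raise ValueError(f"q is not a number: {val!r}")
                q = float(val)
            # Other parameters (e.g. level=1) are ignored in this sketch.
        result.append((media.lower(), q))
    return result
```

Here parse_accept("text/html, application/json;q=0.9, */*;q=0.1")
returns [("text/html", 1.0), ("application/json", 0.9), ("*/*", 0.1)],
while "text/html;q=high" is rejected because "high" is not a number.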


>> Yes, if I am building a generic parser that wants to understand senders'
>> intent in _all_ legacy cases, then my parse tree cannot use some of the
>> draft elements "as is", but since understanding senders' intent in all
>> cases is an unsolvable problem in legacy HTTP, we should not focus on
>> that esoteric use case too much!

> I don't think it is unsolvable, after all, the current headers have
> ABNF definitions and they may be flawed, but they are not random.

It is unsolvable because an extension header does not have an ABNF that
is known to a generic parser.

Sure, I can find the right ABNF if I want to support that extension in
my agent and add a custom parser for it (perhaps even generating one
from that ABNF!), but a generic parser can say nothing about the sender's
intent in this case. To such a parser, the extension field is just an
opaque blob (or a list of blobs, but even that is a risky step!). The
legacy grammar does not allow a generic parser to discover the extension
field's structure. We can use stricter grammar elements to provide
building blocks for the interesting/important cases.
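To illustrate why even treating a field as a list of blobs is risky,
here is a field-agnostic, quote-aware comma splitter. It is a Python
sketch; split_list() is a hypothetical helper, not part of any library.

```python
from typing import List


def split_list(value: str) -> List[str]:
    """Split a field value on commas that are outside quoted strings.

    Field-agnostic: without the field's ABNF it cannot tell a list
    separator from a comma that belongs to a single value.
    """
    items: List[str] = []
    current: List[str] = []
    in_quotes = False
    escaped = False
    for ch in value:
        current.append(ch)
        if escaped:
            escaped = False  # this character was escaped; keep it as-is
        elif in_quotes and ch == "\\":
            escaped = True  # the next character is escaped
        elif ch == '"':
            in_quotes = not in_quotes
        elif ch == "," and not in_quotes:
            current.pop()  # drop the separator itself
            items.append("".join(current).strip())
            current = []
    items.append("".join(current).strip())
    return items
```

split_list('a, "b,c", d') returns ['a', '"b,c"', 'd'] as intended, but
an Expires value such as "Thu, 01 Dec 1994 16:00:00 GMT" comes back as
['Thu', '01 Dec 1994 16:00:00 GMT'] even though it is a single HTTP-date.
Only a parser that knows the field's ABNF can avoid that mistake.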


Alex.
Received on Monday, 1 May 2017 20:50:12 UTC