Re: identifier in draft-ietf-httpbis-header-structure-01.txt from Alex Rousskov on 2017-05-01 (ietf-http-wg@w3.org from April to June 2017)

From: Alex Rousskov <rousskov@measurement-factory.com>
Date: Mon, 1 May 2017 09:44:59 -0600
To: HTTP working group mailing list <ietf-http-wg@w3.org>
Message-ID: <59f3d7c9-ecbc-6f32-0bad-985095f80bba@measurement-factory.com>
On 04/30/2017 12:50 PM, Poul-Henning Kamp wrote:
>>> Header Structure is not a syntactical specification, 

>> You have fooled several people into thinking that the draft specifies
>> syntax rules (among other things). 
>> [...]
>> IMHO, you have to decide where to stop:

> Exactly!

We may be talking past each other: You are discussing whether defining a
sequence of named dictionaries (in addition to the sequence building
blocks) is a good idea. I am discussing ambiguous grammar used to define
those building blocks. Both problems/questions are important and
answering one does not answer the other.


> The only syntax ("parser level") rule is "The data model of Common
> Structure is an ordered sequence of named dictionaries."

Perhaps our "parser" and "syntax" definitions differ, but the draft has
many other syntax ("parser level") rules IMO. The rule you cite as the
only one is a parser-level rule only if it is based on syntax rules
that allow the parser (including the lexer/tokenizer, if any) to extract
the building blocks like "identifiers", "numbers", and "integers". I was
talking about needless ambiguity in that extraction.
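
To illustrate the kind of ambiguity meant here, a minimal Python sketch
follows. Both rule sets are hypothetical and are NOT copied from the
draft; the names OVERLAPPING, DISJOINT, and candidate_kinds are made up
for this example. It only shows how overlapping token rules leave the
classification of a value to rule ordering, while disjoint rules do not.

```python
import re

# Hypothetical overlapping rules, for illustration only (not the draft's
# grammar): the same text can satisfy several of them.
OVERLAPPING = {
    "identifier": re.compile(r"[a-z0-9][a-z0-9_.-]*"),
    "number": re.compile(r"[0-9]+(\.[0-9]+)?"),
    "integer": re.compile(r"[0-9]+"),
}

# Hypothetical disjoint rules: an identifier must start with a letter and
# a decimal must contain a dot, so at most one rule matches any token.
DISJOINT = {
    "identifier": re.compile(r"[a-z][a-z0-9_.-]*"),
    "integer": re.compile(r"[0-9]+"),
    "decimal": re.compile(r"[0-9]+\.[0-9]+"),
}


def candidate_kinds(token, rules):
    """Return the sorted names of all rules that match the whole token."""
    return sorted(kind for kind, pattern in rules.items()
                  if pattern.fullmatch(token))


print(candidate_kinds("42", OVERLAPPING))
print(candidate_kinds("42", DISJOINT))
```

With the overlapping set, a parser has to pick a winner for "42" by
convention; with the disjoint set, there is nothing to pick.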


> Maybe it should stop instead at the "lexical" level, and only define
> a family of "tokens", and leave it to the individual header-specifying
> documents to define their precise order?

A. If the goal is defining as many existing fields using as few rules as
possible, then forcing all fields into h1-common-structure is probably
the right approach. I do not think this should be the goal though (more
on that below), and having One Rule For All is irrelevant when it comes
to writing efficient HTTP parsers: If I want efficiency, I am not going
to define Date NG as a list of ambiguous elements!

B. If the goal is limiting needless token inventions while allowing
efficient parser designs, then h1-common-structure (as defined now) is
irrelevant and we should focus on a small set of unambiguous elements
(i.e., defining and optimizing syntactical vocabulary).

IMHO, goal A is invalid as stated. We should not pursue it! Instead, we
should turn it inside out and use h1-common-structure not as a rigid
structure for all future fields, but as an "understanding tool" of past
ones. Yes, we can describe many old fields (and many future ones!) as
h1-common-structure, and that insight _is_ important, but it is not very
useful for building efficient production parsers or for defining new
fields (that can be efficiently parsed and correctly interpreted).

After we define a small set of unambiguous elements (goal B) for
building field grammars and define popular fields using our elements, we
can still limit needless structure inventions by adopting something like
the following (unpolished!) rules for any future field grammar:

* MUST be unambiguous
* MUST use flat comma-separated lists for any repetition
* SHOULD be derived from the grammar elements defined in the draft
* SHOULD avoid repetition
* SHOULD reuse one of the existing field grammars
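
As a rough sketch of what the first two rules could buy a parser writer,
here is a small Python example. The three element kinds (integer,
identifier, quoted string) and all function names are assumptions made
for this illustration, not elements defined in the draft. The first
character of each item decides its kind, so the parser never backtracks,
and repetition is a flat comma-separated list with no nesting.

```python
import string

# Hypothetical building blocks (not from the draft). Their first
# characters are disjoint: digit, lowercase letter, or double quote.
DIGITS = set(string.digits)
ALPHA = set(string.ascii_lowercase)
IDENT_TAIL = ALPHA | DIGITS | {"_", "-"}
WS = " \t"


def skip_ws(value, pos):
    """Advance past optional spaces and tabs."""
    while pos < len(value) and value[pos] in WS:
        pos += 1
    return pos


def parse_element(value, pos):
    """Parse one element at pos; return ((kind, parsed), new_pos)."""
    if pos == len(value):
        raise ValueError("empty list item")
    ch = value[pos]
    start = pos
    if ch in DIGITS:
        while pos < len(value) and value[pos] in DIGITS:
            pos += 1
        return ("integer", int(value[start:pos])), pos
    if ch in ALPHA:
        while pos < len(value) and value[pos] in IDENT_TAIL:
            pos += 1
        return ("identifier", value[start:pos]), pos
    if ch == '"':
        # Simplification: no escape sequences inside quoted strings.
        end = value.find('"', pos + 1)
        if end < 0:
            raise ValueError("unterminated string")
        return ("string", value[pos + 1:end]), end + 1
    raise ValueError(f"unexpected character {ch!r} at offset {pos}")


def parse_flat_list(value):
    """Parse a flat comma-separated list of elements (no nesting)."""
    items = []
    pos = 0
    while True:
        pos = skip_ws(value, pos)
        item, pos = parse_element(value, pos)
        items.append(item)
        pos = skip_ws(value, pos)
        if pos == len(value):
            return items
        if value[pos] != ",":
            raise ValueError(f"expected ',' at offset {pos}")
        pos += 1


print(parse_flat_list('42, gzip, "a, b"'))
```

Input such as "42abc" is rejected rather than guessed at, because no
element may start where the integer ended.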

As an HTTP agent developer, I do not really care much if a new field I
need to interpret uses one of the N supported field grammars or adds an
N+1st one: It is easy for me to add efficient support for another
(draft-compliant!) field grammar to my software because all the building
blocks are there, and I naturally have to add code to store that field's
parsed/interpreted value anyway.
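
A minimal Python sketch of that claim, under stated assumptions: the
field names (example-tags, example-limits, example-quota), the element
parsers, and the FIELD_GRAMMARS table are all hypothetical and exist
only for this illustration. The point is that an N+1st grammar is a few
lines composed from existing building blocks plus one table entry.

```python
import re


# Hypothetical building blocks shared by every field grammar.
def integer(token):
    if not re.fullmatch(r"[0-9]+", token):
        raise ValueError(f"not an integer: {token!r}")
    return int(token)


def identifier(token):
    if not re.fullmatch(r"[a-z][a-z0-9_-]*", token):
        raise ValueError(f"not an identifier: {token!r}")
    return token


def flat_list(element):
    """Build a parser for a flat comma-separated list of one element kind."""
    def parse(value):
        return [element(item.strip(" \t")) for item in value.split(",")]
    return parse


# The N field grammars the agent already supports (names are made up).
FIELD_GRAMMARS = {
    "example-tags": flat_list(identifier),
    "example-limits": flat_list(integer),
}


def parse_field(name, value):
    """Look up the grammar by case-insensitive field name and apply it."""
    return FIELD_GRAMMARS[name.lower()](value)


# Adding an N+1st grammar: "identifier = integer", built from the same
# blocks and registered with a single table entry.
def quota(value):
    key, sep, number = value.partition("=")
    if not sep:
        raise ValueError("expected '=' in quota field")
    return identifier(key.strip(" \t")), integer(number.strip(" \t"))


FIELD_GRAMMARS["example-quota"] = quota

print(parse_field("Example-Tags", "a, b-c"))
print(parse_field("example-quota", "disk=10"))
```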

In summary: Keep something like h1-common-structure as an insight but
remove it from the syntax-level rules. The syntax-level rules in the
draft should not attempt to unify everything under One Rule but should
provide excellent building blocks for handling past and future fields. N
good rules are much better than one unusable rule! For future fields, focus on
unambiguous/efficient/correct parsing/interpretation, using new
constructs (like the inverted ">...<" bracketing you invented) as needed.


Cheers,

Alex.
Received on Monday, 1 May 2017 15:45:31 UTC