- From: Alex Rousskov <rousskov@measurement-factory.com>
- Date: Mon, 1 May 2017 09:44:59 -0600
- To: HTTP working group mailing list <ietf-http-wg@w3.org>
On 04/30/2017 12:50 PM, Poul-Henning Kamp wrote:

>>> Header Structure is not a syntactical specification,

>> You have fooled several people into thinking that the draft specifies
>> syntax rules (among other things).
>> [...]
>> IMHO, you have to decide where to stop:

> Exactly!

We may be talking past each other: You are discussing whether defining a sequence of named dictionaries (in addition to the sequence building blocks) is a good idea. I am discussing ambiguous grammar used to define those building blocks. Both problems/questions are important and answering one does not answer the other.

> The only syntax ("parser level") rule is "The data model of Common
> Structure is an ordered sequence of named dictionaries."

Perhaps our "parser" and "syntax" definitions differ, but the draft has many other syntax ("parser level") rules IMO. The rule you cite as the only one, is a parser-level rule only if it is based on syntax rules that allow the parser (including the lexer/tokenizer, if any) to extract the building blocks like "identifiers", "numbers", and "integers". I was talking about needless ambiguity in that extraction.

> Maybe it should stop instead at the "lexical" level, and only define
> a family of "tokens", and leave it to the individual header-specifying
> documents to define their precise order?

A. If the goal is defining as many existing fields using as few rules as possible, then forcing all fields into h1-common-structure is probably the right approach. I do not think this should be the goal though (more on that below), and having One Rule For All is irrelevant when it comes to writing efficient HTTP parsers: If I want efficiency, I am not going to define Date NG as a list of ambiguous elements!

B. If the goal is limiting needless token inventions while allowing efficient parser designs, then h1-common-structure (as defined now) is irrelevant and we should focus on a small set of unambiguous elements (i.e., defining and optimizing syntactical vocabulary).

IMHO, goal A is invalid as stated. We should not pursue it! Instead, we should turn it inside out and use h1-common-structure not as a rigid structure for all future fields, but as an "understanding tool" of past ones. Yes, we can describe many old fields (and many future ones!), as h1-common-structure, and that insight _is_ important, but it is not that useful for building efficient production parsers and defining new fields (that can be efficiently parsed and correctly interpreted).

After we define a small set of unambiguous elements (goal B) for building field grammars and define popular fields using our elements, we can still limit needless structure inventions by adopting something like the following (unpolished!) rules for any future field grammar:

* MUST be unambiguous
* MUST use flat comma-separated lists for any repetition
* SHOULD be derived from the grammar elements defined in the draft
* SHOULD avoid repetition
* SHOULD reuse one of the existing field grammars

As an HTTP agent developer, I do not really care much if a new field I need to interpret uses one of the N supported field grammars or adds an N+1st one: It is easy for me to add efficient support for another (draft-compliant!) field grammar to my software because all the building blocks are there, and I naturally have to add code to store that field parsed/interpreted value anyway.

In summary: Keep something like h1-common-structure as an insight but remove it from the syntax-level rules. The syntax level rules in the draft should not attempt to unify everything under One Rule but provide excellent building blocks for handling past and future fields. N good rules is much better than one unusable rule!
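
To make the first two rules in the list above concrete, here is a minimal sketch of a flat comma-separated list built from unambiguous elements. It is an illustration only: the two-element vocabulary (integers and identifiers), the regular expressions, and the function names are all hypothetical and are not taken from the draft. The point is that the first character of each element selects exactly one rule, so the parser never has to guess or backtrack.

```python
import re

# Hypothetical element vocabulary (an assumption, NOT the draft's grammar).
# An element starting with a digit can only be an integer; an element
# starting with a lowercase letter can only be an identifier.
INTEGER = re.compile(r"[0-9]+")
IDENTIFIER = re.compile(r"[a-z][a-z0-9_-]*")


def parse_element(text):
    """Parse one element; its first character alone selects the rule."""
    if not text:
        raise ValueError("empty element")
    if text[0] in "0123456789":
        if not INTEGER.fullmatch(text):
            raise ValueError(f"bad integer: {text!r}")
        return ("integer", int(text))
    if IDENTIFIER.fullmatch(text):
        return ("identifier", text)
    raise ValueError(f"unrecognized element: {text!r}")


def parse_flat_list(value):
    """Parse a flat comma-separated list: no nesting, no quoting."""
    return [parse_element(item.strip(" \t")) for item in value.split(",")]
```

With this vocabulary, `parse_flat_list("max-age, 3600, public")` yields two identifiers around one integer, while an element such as `12ab` is rejected outright instead of being reinterpreted as some other type.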
For future fields, focus on unambiguous/efficient/correct parsing/interpretation, using new constructs (like the inverted ">...<" bracketing you invented) as needed.

Cheers,

Alex.
Received on Monday, 1 May 2017 15:45:31 UTC