- From: Poul-Henning Kamp <phk@phk.freebsd.dk>
- Date: Fri, 11 Nov 2016 13:42:54 +0000
- To: Julian Reschke <julian.reschke@gmx.de>
- cc: HTTP Working Group <ietf-http-wg@w3.org>
--------
In message <f8b3b877-6b6e-f002-f237-311e91b86d82@gmx.de>, Julian Reschke writes:

> ascii_string = * %x20-7e
>     # This is a "safe" string in the sense that it
>     # contains no control characters or multi-byte
>     # sequences. If that is not fancy enough, use
>     # unicode_string.
>
> unicode_string = * unicode_codepoint
>     # XXX: Is there a place to import this from ?
>     # Unrestricted unicode, because there is no sane
>     # way to restrict or otherwise make unicode "safe".
>
>It's not clear why there's even a distinction...

To give designers of HTTP headers a trivial way to define strings
which are "safe" and free from needless complexity vs. strings where
the recipient should be prepared to deal with BOM, RLM and "MAN IN
BUSINESS SUIT LEVITATING".

>Also, it needs to be stated whether the grammar is octet or character
>based. For an abstract datamodel, the latter probably makes more sense.

The abstract data model is abstract, so it is obviously neither.

For the h1 serialization, I don't see how it makes a difference, unless
somebody is running HTTP/1 in EBCDIC or Morse-code ?

> h1_common-structure-header =
>     ( field-name ":" OWS ">" h1_common_structure "<" )
>         # Self-identifying HTTP headers
>     ( field-name ":" OWS h1_common_structure ) /
>         # legacy HTTP headers on white-list, see {{iana}}
>
>Do not mix message block ABNF with field value ABNF. Just define what's
>inside the field value.
>
> h1_element = identifier * (";" identifier ["=" h1_value])
>
>Shouldn't the second "identifier" be "token"?

yes, probably.

>How would a generic recipient decide whether it needs to handle "\u"?
>What's the point of having different ABNF productions?

Based on the definition/data-dictionary of the header in question.

Remember: This is only the data-model/h1-serialization, for each
HTTP header, it will (still) be necessary to define what the data
is/can be.

>Also: this puts raw non-ASCII UTF-8 in the string value.
>It's not clear that this is a good idea for HTTP/1,

Neither is it obvious that it is going to cause any problems.

For reasons of transmission efficiency, I'm not keen on mandating
\uXXXX for all non-ascii unicode unless experimentation on the live
indicates that we have to, or if we decide for reasons of purity
that HTTP/1 can never have the high bit set.

Either way, another good reason to keep the "safe" string type.

>Introduction.
>
> h1_common_structure = ">" h1_common_structure "<"
>
>That's a bit too recursive

No, that is deliberately making recursion possible, as the simplest
possible way to define complex datastructures.

>(speaking of which: "_" isn't allowed in ABNF names)

Noted.

--
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe
Never attribute to malice what can adequately be explained by incompetence.
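To make the "safe" vs. unrestricted string distinction and the transmission-efficiency argument concrete, here is a minimal Python sketch. The function names `is_ascii_string`, `escape_non_ascii` and `wire_sizes` are hypothetical. `is_ascii_string` follows the quoted `ascii_string = * %x20-7e` production; the `\uXXXX` escaping (JSON-style, with code points above U+FFFF written as a UTF-16 surrogate pair) is an assumption, not something the draft defines.

```python
def is_ascii_string(value: str) -> bool:
    """True if every character is in %x20-7E, as in the quoted ascii_string rule."""
    return all(0x20 <= ord(ch) <= 0x7E for ch in value)


def escape_non_ascii(value: str) -> str:
    """Hypothetical escaping: replace each non-ASCII code point with \\uXXXX.

    Code points above U+FFFF become a UTF-16 surrogate pair (two escapes),
    as JSON does. This scheme is an assumption for illustration only.
    """
    out = []
    for ch in value:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                      # ASCII passes through as-is
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04x}")          # one escape, 6 bytes
        else:
            cp -= 0x10000                       # split into a surrogate pair
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append(f"\\u{high:04x}\\u{low:04x}")  # 12 bytes
    return "".join(out)


def wire_sizes(value: str) -> tuple[int, int]:
    """Return (bytes as raw UTF-8, bytes after \\uXXXX escaping)."""
    return len(value.encode("utf-8")), len(escape_non_ascii(value).encode("ascii"))


if __name__ == "__main__":
    print(is_ascii_string("max-age=60"))   # True
    print(is_ascii_string("caf\u00e9"))     # False
    print(wire_sizes("caf\u00e9"))          # (5, 9)
    print(wire_sizes("\U0001F574"))         # (4, 12): MAN IN BUSINESS SUIT LEVITATING
```

Under these assumptions, escaping costs three times the raw UTF-8 size for a character such as U+1F574, which is the kind of overhead the efficiency argument is about.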
Received on Friday, 11 November 2016 13:43:25 UTC
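The remark that "_" is not allowed in ABNF names follows from RFC 5234, which defines `rulename = ALPHA *(ALPHA / DIGIT / "-")`. A small Python check shows which of the draft's rule names would need renaming. The helper names `is_abnf_rulename` and `to_abnf_name` are hypothetical, and swapping "_" for "-" is just one possible renaming convention, not one the thread settles on.

```python
import re

# RFC 5234: rulename = ALPHA *(ALPHA / DIGIT / "-"), ASCII letters only.
_RULENAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def is_abnf_rulename(name: str) -> bool:
    """True if name is a valid RFC 5234 rulename."""
    return _RULENAME.fullmatch(name) is not None


def to_abnf_name(name: str) -> str:
    """Hypothetical convention: replace "_" with "-"."""
    return name.replace("_", "-")


if __name__ == "__main__":
    for name in ["ascii_string", "unicode_string", "h1_common_structure",
                 "h1_element", "field-name"]:
        print(name, is_abnf_rulename(name), to_abnf_name(name))
```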