- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Fri, 11 Nov 2016 13:45:44 +0100
- To: Poul-Henning Kamp <phk@critter.freebsd.dk>, HTTP Working Group <ietf-http-wg@w3.org>
On 2016-10-30 19:59, Poul-Henning Kamp wrote:
> Updated in preparation for WG discussion in Seoul.
>
> Minor changes only.
>
> Github repo: https://github.com/bsdphk/id_common_structure

Thanks!

Find below some (unstructured) feedback:

    For textual formats, such as JSON, the format must first be neutered
    to not violate field-value's ABNF, and then workarounds added to
    reintroduce the features just lost, for instance UNICODE strings, and
    suddenly it is no longer JSON anymore.

That's misleading.

a) yes, it needs to be "neutered", but that's a simple character replacement operation, so it can be run on an existing JSON text.

b) the features that are lost are essentially line breaks and raw non-ASCII characters; however, the transmitted string would *still* be JSON.

    ascii_string = * %x20-7e
    # This is a "safe" string in the sense that it
    # contains no control characters or multi-byte
    # sequences. If that is not fancy enough, use
    # unicode_string.

    unicode_string = * unicode_codepoint
    # XXX: Is there a place to import this from ?
    # Unrestricted unicode, because there is no sane
    # way to restrict or otherwise make unicode "safe".

It's not clear why there's even a distinction...

Also, it needs to be stated whether the grammar is octet or character based. For an abstract datamodel, the latter probably makes more sense.

    h1_common-structure-header =
        ( field-name ":" OWS ">" h1_common_structure "<" )
            # Self-identifying HTTP headers
        ( field-name ":" OWS h1_common_structure ) /
            # legacy HTTP headers on white-list, see {{iana}}

Do not mix message block ABNF with field value ABNF. Just define what's inside the field value.

    h1_element = identifier * (";" identifier ["=" h1_value])

Shouldn't the second "identifier" be "token"?
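To illustrate points a) and b) above, here is a minimal sketch of the character replacement, assuming the input is already valid JSON. The function name `neuter_json_text` is made up for this example and does not come from the draft. It relies on two properties of JSON: raw line breaks and tabs can only occur as whitespace between tokens, and raw non-ASCII characters can only occur inside strings, where a `\uXXXX` escape is equivalent.

```python
import json


def neuter_json_text(text: str) -> str:
    """Rewrite valid JSON text as a single line of printable ASCII.

    Hypothetical helper for illustration only; assumes `text` is valid JSON.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in "\r\n\t":
            # In valid JSON these occur only between tokens, so a plain
            # space is an equivalent separator.
            out.append(" ")
        elif 0x20 <= cp <= 0x7E:
            # Printable ASCII passes through unchanged.
            out.append(ch)
        elif cp > 0xFFFF:
            # Outside the BMP: JSON escapes use a UTF-16 surrogate pair.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append("\\u%04x\\u%04x" % (high, low))
        else:
            # DEL and BMP non-ASCII characters; in valid JSON these can
            # only appear inside strings, where \uXXXX is equivalent.
            out.append("\\u%04x" % cp)
    return "".join(out)


# Example input with line breaks, a non-ASCII letter and a non-BMP character.
original = '{\n  "name": "Zo\u00eb",\n  "emoji": "\U0001F600"\n}'
wire = neuter_json_text(original)

# The rewritten text is still JSON and decodes to the same value.
decoded = json.loads(wire)
```

The sketch only covers the character repertoire (no control characters, no raw non-ASCII). Whether the result needs further treatment for a particular header field is a separate question that it does not address.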
    h1_ascii_string = DQUOTE *(
        ( "\" DQUOTE ) /
        ( "\" "\" ) /
        0x20-21 /
        0x23-5B /
        0x5D-7E
        ) DQUOTE
    # This is a proper subset of h1_unicode_string
    # NB only allowed backslash escapes are \" and \\

    h1_unicode_string = DQUOTE *(
        ( "\" DQUOTE )
        ( "\" "\" ) /
        ( "\" "u" 4*HEXDIG ) /
        0x20-21 /
        0x23-5B /
        0x5D-7E /
        UTF8-2 /
        UTF8-3 /
        UTF8-4
        ) DQUOTE
    # This is UTF8 with HTTP1 unfriendly codepoints
    # (00-1f, 7f) neutered with \uXXXX escapes.

How would a generic recipient decide whether it needs to handle "\u"? What's the point of having different ABNF productions?

Also: this puts raw non-ASCII UTF-8 in the string value. It's not clear that this is a good idea for HTTP/1, but if it is, it could be done in JSON as well, which would eliminate one of the counter arguments in the Introduction.

    h1_common_structure = ">" h1_common_structure "<"

That's a bit too recursive (speaking of which: "_" isn't allowed in ABNF names)

Best regards, Julian
Received on Friday, 11 November 2016 12:46:19 UTC