Re: New Version Notification for draft-kamp-httpbis-structure-01.txt (fwd)

In message <>, Julian Reschke writes:

>        ascii_string = * %x20-7e
>                # This is a "safe" string in the sense that it
>                # contains no control characters or multi-byte
>                # sequences.  If that is not fancy enough, use
>                # unicode_string.
>        unicode_string = * unicode_codepoint
>                # XXX: Is there a place to import this from ?
>                # Unrestricted unicode, because there is no sane
>                # way to restrict or otherwise make unicode "safe".
>It's not clear why there's even a distinction...

To give designers of HTTP headers a trivial way to define strings
which are "safe" and free from needless complexity vs. strings where
the recipient should be prepared to deal with BOM, RLM and "MAN
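
For illustration only, the "safe" property of ascii_string can be
checked in one line. This is a minimal sketch; the function name
is_ascii_string is hypothetical and not taken from the draft. It only
mirrors the quoted %x20-7e range.

```python
def is_ascii_string(value: str) -> bool:
    """Return True if every character lies in %x20-7e.

    The quoted production is "* %x20-7e", so the empty string is
    accepted. The function name is hypothetical, not from the draft.
    """
    return all(0x20 <= ord(ch) <= 0x7E for ch in value)


if __name__ == "__main__":
    for sample in ["plain text", "tab\there", "\ufeffwith BOM", "caf\u00e9"]:
        print(repr(sample), is_ascii_string(sample))
```

Anything that fails this check (control characters, a BOM, any
non-ASCII code point) would have to be declared as unicode_string by
the header's definition.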

>Also, it needs to be stated whether the grammar is octet or character 
>based. For an abstract datamodel, the latter probably makes more sense.

The abstract data model is abstract, so it is obviously neither.

For the h1 serialization, I don't see how it makes a difference, unless
somebody is running HTTP/1 in EBCDIC or Morse code?

>        h1_common-structure-header =
>                ( field-name ":" OWS ">" h1_common_structure "<" )
>                        # Self-identifying HTTP headers
>                ( field-name ":" OWS h1_common_structure ) /
>                        # legacy HTTP headers on white-list, see {{iana}}
>Do not mix message block ABNF with field value ABNF. Just define what's 
>inside the field value.
>        h1_element = identifier * (";" identifier ["=" h1_value])
>Shouldn't the second "identifier" be "token"?

Yes, probably.

>How would a generic recipient decide whether it needs to handle "\u"? 
>What's the point of having different ABNF productions?

Based on the definition/data-dictionary of the header in question.

Remember: this is only the data model and its h1 serialization.  For
each HTTP header, it will (still) be necessary to define what the
data is or can be.

>Also: this puts raw non-ASCII UTF-8 in the string value. It's not clear 
>that this is a good idea for HTTP/1, 

Neither is it obvious that it is going to cause any problems.

For reasons of transmission efficiency, I'm not keen on mandating
\uXXXX for all non-ASCII Unicode unless experimentation on the live
network indicates that we have to, or unless we decide for reasons of
purity that HTTP/1 can never have the high bit set.

Either way, another good reason to keep the "safe" string type.
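
To put rough numbers on the transmission-efficiency point, here is a
small sketch comparing raw UTF-8 with escaping every non-ASCII code
point.  The exact escape syntax is an assumption: I use JSON-style
\uXXXX with surrogate pairs above U+FFFF, which may not match what the
draft ends up specifying.  The function name escape_non_ascii is
hypothetical.

```python
def escape_non_ascii(text: str) -> str:
    """Escape every non-ASCII code point as \\uXXXX.

    Assumption: JSON-style escapes, with UTF-16 surrogate pairs for
    code points above U+FFFF. The draft may define this differently.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)  # ASCII passes through unchanged
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04x}")
        else:
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append(f"\\u{high:04x}\\u{low:04x}")
    return "".join(out)


if __name__ == "__main__":
    for sample in ["caf\u00e9", "\u65e5\u672c\u8a9e", "\U0001f600"]:
        raw = len(sample.encode("utf-8"))
        escaped = len(escape_non_ascii(sample).encode("ascii"))
        print(f"{sample!r}: raw UTF-8 {raw} bytes, escaped {escaped} bytes")
```

Under these assumptions each escaped code point costs 6 or 12 bytes,
against 2 to 4 bytes in raw UTF-8.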

>        h1_common_structure = ">" h1_common_structure "<"
>That's a bit too recursive

No, that is deliberately making recursion possible, as the simplest
possible way to define complex data structures.
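
As a rough illustration of how little code the recursion costs a
parser, here is a sketch for a deliberately simplified toy grammar.
Only the ">" ... "<" nesting comes from the quoted production; the
comma-separated items, the bare tokens, the depth limit and all names
are my assumptions for the example, not the draft's grammar.

```python
def parse_structure(text, pos=0, depth=0, max_depth=8):
    """Parse: structure = item *("," item); item = token / ">" structure "<".

    Toy grammar for illustration only. Returns (items, next_position).
    The depth limit is an assumed safeguard, not part of the draft.
    """
    if depth > max_depth:
        raise ValueError("nesting too deep")
    items = []
    while True:
        if pos < len(text) and text[pos] == ">":
            # ">" opens a nested structure, "<" closes it.
            inner, pos = parse_structure(text, pos + 1, depth + 1, max_depth)
            if pos >= len(text) or text[pos] != "<":
                raise ValueError("missing closing '<'")
            pos += 1
            items.append(inner)
        else:
            start = pos
            while pos < len(text) and text[pos] not in ",<>":
                pos += 1
            if pos == start:
                raise ValueError(f"expected token at offset {pos}")
            items.append(text[start:pos])
        if pos < len(text) and text[pos] == ",":
            pos += 1
            continue
        return items, pos


def parse(text, max_depth=8):
    """Parse a whole field value; reject trailing garbage."""
    items, pos = parse_structure(text, max_depth=max_depth)
    if pos != len(text):
        raise ValueError(f"unexpected character at offset {pos}")
    return items


if __name__ == "__main__":
    print(parse("a,>b,>c<<,d"))
```

A recipient that worries about unbounded nesting can cap the depth, as
the max_depth parameter does here.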

>(speaking of which: "_" isn't allowed in ABNF names)


Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

Received on Friday, 11 November 2016 13:43:25 UTC