Re: New Version Notification for draft-kamp-httpbis-structure-01.txt (fwd) from Poul-Henning Kamp on 2016-11-11 (ietf-http-wg@w3.org from October to December 2016)

From: Poul-Henning Kamp <phk@phk.freebsd.dk>
Date: Fri, 11 Nov 2016 13:42:54 +0000
To: Julian Reschke <julian.reschke@gmx.de>
cc: HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <34041.1478871774@critter.freebsd.dk>

--------
In message <f8b3b877-6b6e-f002-f237-311e91b86d82@gmx.de>, Julian Reschke writes:

>        ascii_string = * %x20-7e
>                # This is a "safe" string in the sense that it
>                # contains no control characters or multi-byte
>                # sequences.  If that is not fancy enough, use
>                # unicode_string.
>
>        unicode_string = * unicode_codepoint
>                # XXX: Is there a place to import this from ?
>                # Unrestricted unicode, because there is no sane
>                # way to restrict or otherwise make unicode "safe".
>
>It's not clear why there's even a distinction...

To give designers of HTTP headers a trivial way to define strings
which are "safe" and free from needless complexity vs. strings where
the recipient should be prepared to deal with BOM, RLM and "MAN
IN BUSINESS SUIT LEVITATING".
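
As a rough illustration of how cheap the "safe" case is for a
recipient, here is a minimal Python sketch of a check against the
ascii_string production quoted above (* %x20-7e).  The function name
is_ascii_string is made up for this example and is not part of the
draft.

```python
def is_ascii_string(value: str) -> bool:
    """Return True if every character lies in %x20-7e (printable ASCII).

    The production is "* %x20-7e", so the empty string is accepted.
    """
    return all(0x20 <= ord(ch) <= 0x7E for ch in value)


# Hypothetical sample values, chosen only for illustration.
print(is_ascii_string("max-age=3600"))   # plain printable ASCII
print(is_ascii_string("\ufeffvalue"))    # starts with a BOM (U+FEFF)
print(is_ascii_string("\U0001F574"))     # MAN IN BUSINESS SUIT LEVITATING
```

A header defined as ascii_string needs nothing beyond this range
check; a header defined as unicode_string gets no such guarantee.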

>Also, it needs to be stated whether the grammar is octet or character 
>based. For an abstract datamodel, the latter probably makes more sense.

The abstract data model is abstract, so it is obviously neither.

For the h1 serialization, I don't see how it makes a difference, unless
somebody is running HTTP/1 in EBCDIC or Morse code?

>        h1_common-structure-header =
>                ( field-name ":" OWS ">" h1_common_structure "<" )
>                        # Self-identifying HTTP headers
>                ( field-name ":" OWS h1_common_structure ) /
>                        # legacy HTTP headers on white-list, see {{iana}}
>
>Do not mix message block ABNF with field value ABNF. Just define what's 
>inside the field value.
>
>        h1_element = identifier * (";" identifier ["=" h1_value])
>
>Shouldn't the second "identifier" be "token"?

Yes, probably.

>How would a generic recipient decide whether it needs to handle "\u"? 
>What's the point of having different ABNF productions?

Based on the definition/data-dictionary of the header in question.

Remember: this is only the data model and its h1 serialization.  For
each HTTP header, it will (still) be necessary to define what the data
is/can be.

>Also: this puts raw non-ASCII UTF-8 in the string value. It's not clear 
>that this is a good idea for HTTP/1, 

Neither is it obvious that it is going to cause any problems.

For reasons of transmission efficiency, I'm not keen on mandating
\uXXXX for all non-ASCII Unicode unless live experimentation
indicates that we have to, or unless we decide for reasons of purity
that HTTP/1 can never have the high bit set.

Either way, another good reason to keep the "safe" string type.
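
To put rough numbers on the efficiency concern, the following Python
sketch compares the size of raw UTF-8 with a \uXXXX-escaped form.  It
assumes JSON-style escaping (surrogate pairs for code points above
U+FFFF) as a stand-in, since the escaping rules for h1 are not settled
here; the helper name wire_sizes is hypothetical.

```python
import json
from typing import Tuple


def wire_sizes(text: str) -> Tuple[int, int]:
    """Return (raw UTF-8 byte count, \\uXXXX-escaped byte count).

    Assumption: escaping follows JSON conventions, so json.dumps with
    ensure_ascii=True is used and its surrounding quotes are stripped.
    """
    raw = len(text.encode("utf-8"))
    escaped = len(json.dumps(text, ensure_ascii=True)[1:-1])
    return raw, escaped


print(wire_sizes("abc"))                 # (3, 3)
print(wire_sizes("\u00e9"))              # (2, 6)
print(wire_sizes("\u65e5\u672c\u8a9e"))  # (9, 18)
print(wire_sizes("\U0001F574"))          # (4, 12)
```

Under that assumption, pure ASCII costs the same either way, while
every non-ASCII code point gets two to three times larger when
escaped.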

>Introduction.
>
>        h1_common_structure = ">" h1_common_structure "<"
>
>That's a bit too recursive

No, that is deliberately making recursion possible, as the simplest
possible way to define complex data structures.
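
A minimal Python sketch of why the recursive rule is cheap to
implement: one function that calls itself on ">" and returns on "<".
Only the ">" ... "<" nesting comes from the rule quoted above; the
comma separator, the whitespace trimming and the name parse_nested
are assumptions made up for this example, not the draft's grammar.

```python
def parse_nested(text: str) -> list:
    """Parse ">" ... "<" nesting into nested Python lists."""

    def parse(pos: int, depth: int):
        items, token = [], ""

        def flush():
            # Store the pending token, if any (assumed: trim whitespace).
            nonlocal token
            if token.strip():
                items.append(token.strip())
            token = ""

        while pos < len(text):
            ch = text[pos]
            if ch == ">":            # open a nested structure: recurse
                flush()
                nested, pos = parse(pos + 1, depth + 1)
                items.append(nested)
            elif ch == "<":          # close the current structure
                if depth == 0:
                    raise ValueError("unbalanced '<'")
                flush()
                return items, pos + 1
            elif ch == ",":          # assumed item separator
                flush()
                pos += 1
            else:
                token += ch
                pos += 1
        if depth > 0:
            raise ValueError("missing '<'")
        flush()
        return items, pos

    return parse(0, 0)[0]


print(parse_nested("a,>b,c<,d"))   # ['a', ['b', 'c'], 'd']
print(parse_nested(">>a<<"))       # [[['a']]]
```

The same dozen lines handle any nesting depth, which is the point of
allowing the production to refer to itself.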

>(speaking of which: "_" isn't allowed in ABNF names)

Noted.

-- 
Poul-Henning Kamp       | UNIX since Zilog Zeus 3.20
phk@FreeBSD.ORG         | TCP/IP since RFC 956
FreeBSD committer       | BSD since 4.3-tahoe    
Never attribute to malice what can adequately be explained by incompetence.

Received on Friday, 11 November 2016 13:43:25 UTC