Re: New Version Notification for draft-kamp-httpbis-structure-01.txt (fwd) from Julian Reschke on 2016-11-11 (ietf-http-wg@w3.org from October to December 2016)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Fri, 11 Nov 2016 13:45:44 +0100
To: Poul-Henning Kamp <phk@critter.freebsd.dk>, HTTP Working Group <ietf-http-wg@w3.org>
Message-ID: <f8b3b877-6b6e-f002-f237-311e91b86d82@gmx.de>

On 2016-10-30 19:59, Poul-Henning Kamp wrote:
> Updated in preparation for WG discussion in Seoul.
>
> Minor changes only.
>
> Github repo: 	https://github.com/bsdphk/id_common_structure

Thanks!

Find below some (unstructured) feedback:

    For textual formats, such as JSON, the format must first be neutered
    to not violate field-value's ABNF, and then workarounds added to
    reintroduce the features just lost, for instance UNICODE strings, and
    suddenly it is no longer JSON anymore.

That's misleading.

a) Yes, it needs to be "neutered", but that's a simple character 
replacement operation, so it can be run on an existing JSON text.

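b) The features that are lost are essentially raw line breaks and raw 
non-ASCII characters. In JSON, line breaks only ever occur as 
insignificant whitespace, and non-ASCII characters can be written as 
\uXXXX escapes, so the transmitted string would *still* be JSON.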
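Points (a) and (b) can be illustrated with a minimal Python sketch. It assumes the input is already valid JSON (RFC 7159). In valid JSON, raw TAB, LF and CR can only be insignificant whitespace, and any other character outside %x20-7E can only appear inside a string, so a per-character replacement is enough. The function name neuter_json is made up for this example.

```python
import json


def neuter_json(text: str) -> str:
    """Rewrite an existing JSON text so every character is in %x20-7E.

    Assumes `text` is valid JSON: raw TAB, LF and CR can then only be
    insignificant whitespace between tokens, and any other character
    outside %x20-7E can only occur inside a string, where a \\uXXXX
    escape means the same thing.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in "\t\n\r":
            out.append(" ")  # whitespace between tokens
        elif 0x20 <= cp <= 0x7E:
            out.append(ch)  # already safe in an HTTP/1 field-value
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)  # DEL or a BMP character inside a string
        else:
            # Characters outside the BMP become a UTF-16 surrogate pair.
            v = cp - 0x10000
            out.append("\\u%04x\\u%04x" % (0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)))
    return "".join(out)


original = '{\n  "name": "J\u00fcrgen",\n  "smiley": "\U0001F600"\n}'
neutered = neuter_json(original)
assert all(0x20 <= ord(c) <= 0x7E for c in neutered)
assert json.loads(neutered) == json.loads(original)  # still the same JSON value
print(neutered)
```

The decoded value is unchanged, which is the sense in which the neutered text is still JSON.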

        ascii_string = * %x20-7e
                # This is a "safe" string in the sense that it
                # contains no control characters or multi-byte
                # sequences.  If that is not fancy enough, use
                # unicode_string.

        unicode_string = * unicode_codepoint
                # XXX: Is there a place to import this from ?
                # Unrestricted unicode, because there is no sane
                # way to restrict or otherwise make unicode "safe".

It's not clear why there's even a distinction between ascii_string and 
unicode_string...

Also, it needs to be stated whether the grammar is octet-based or 
character-based. For an abstract data model, the latter probably makes 
more sense.
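A small Python sketch shows why this matters. The same %x range means different things when it is applied to characters (code points) than when it is applied to UTF-8 octets. The rule used here, *( %x20-7E / %xE9 ), is hypothetical and serves only as an illustration.

```python
def matches(units, ranges):
    """True if every unit (an integer) lies in one of the inclusive ranges."""
    return all(any(lo <= u <= hi for lo, hi in ranges) for u in units)


# Hypothetical rule: *( %x20-7E / %xE9 ), i.e. printable ASCII plus U+00E9.
RULE = [(0x20, 0x7E), (0xE9, 0xE9)]

value = "caf\u00e9"                      # "café"
as_chars = [ord(c) for c in value]       # 0x63 0x61 0x66 0xE9
as_octets = list(value.encode("utf-8"))  # 0x63 0x61 0x66 0xC3 0xA9

print(len(as_chars), len(as_octets))                      # 4 5
print(matches(as_chars, RULE), matches(as_octets, RULE))  # True False
```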

        h1_common-structure-header =
                ( field-name ":" OWS ">" h1_common_structure "<" )
                        # Self-identifying HTTP headers
                ( field-name ":" OWS h1_common_structure ) /
                        # legacy HTTP headers on white-list, see {{iana}}

Do not mix message block ABNF with field value ABNF. Just define what's 
inside the field value.
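To illustrate the separation, here is a rough Python sketch. A recipient first splits a header field into field-name and field-value as described in RFC 7230 (section 3.2), then hands only the field-value to a structure parser. That means the ">" ... "<" marker could be defined on the field-value alone. The helper names and the example header are hypothetical, and obs-fold handling is omitted.

```python
def split_header_field(line: str):
    """Split an HTTP/1 header line into (field-name, field-value).

    Per RFC 7230, no whitespace is allowed between the field-name and
    the colon, and OWS around the field-value is not part of the value.
    """
    name, sep, rest = line.partition(":")
    if not sep or not name or name != name.strip(" \t"):
        raise ValueError("malformed header field: %r" % line)
    return name, rest.strip(" \t")


def is_self_identifying(field_value: str) -> bool:
    """Hypothetical check for the draft's ">" ... "<" wrapper."""
    return len(field_value) >= 2 and field_value[0] == ">" and field_value[-1] == "<"


name, value = split_header_field("Example-Header: >foo;bar=1<")
print(name, repr(value), is_self_identifying(value))
```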

        h1_element = identifier * (";" identifier ["=" h1_value])

Shouldn't the second "identifier" be "token"?
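If parameter names are RFC 7230 tokens, parsing h1_element is straightforward. The Python sketch below also assumes, purely for illustration, that identifier and h1_value are tokens too. Quoted-string values are not handled.

```python
import re

# tchar / token as defined in RFC 7230, section 3.2.6
TOKEN = r"[!#$%&'*+.^_`|~0-9A-Za-z-]+"
# element = token *( ";" token [ "=" token ] )   (simplifying assumption)
ELEMENT = re.compile(rf"({TOKEN})((?:;{TOKEN}(?:={TOKEN})?)*)")


def parse_element(text: str):
    """Return (identifier, {param: value-or-None}) for a simplified h1_element."""
    m = ELEMENT.fullmatch(text)
    if m is None:
        raise ValueError("not a valid element: %r" % text)
    params = {}
    for part in m.group(2).split(";")[1:]:
        name, _, value = part.partition("=")
        params[name] = value if value else None
    return m.group(1), params


print(parse_element("foo;bar=1;baz"))  # ('foo', {'bar': '1', 'baz': None})
```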

        h1_ascii_string = DQUOTE *(
                        ( "\" DQUOTE ) /
                        ( "\" "\" ) /
                        0x20-21 /
                        0x23-5B /
                        0x5D-7E
                        ) DQUOTE
                # This is a proper subset of h1_unicode_string
                # NB only allowed backslash escapes are \" and \\

        h1_unicode_string = DQUOTE *(
                        ( "\" DQUOTE )
                        ( "\" "\" ) /
                        ( "\" "u" 4*HEXDIG ) /
                        0x20-21 /
                        0x23-5B /
                        0x5D-7E /
                        UTF8-2 /
                        UTF8-3 /
                        UTF8-4
                        ) DQUOTE
                # This is UTF8 with HTTP1 unfriendly codepoints
                # (00-1f, 7f) neutered with \uXXXX escapes.

How would a generic recipient decide whether it needs to handle "\u"? 
What's the point of having different ABNF productions?
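One reading of this point: h1_ascii_string is a proper subset of h1_unicode_string, so a generic recipient could always run a single decoder that accepts both. The minimal Python sketch below assumes that \u is followed by exactly four hex digits (the draft's 4*HEXDIG would allow more). It does not combine surrogate pairs.

```python
HEX = set("0123456789abcdefABCDEF")


def decode_h1_string(s: str) -> str:
    """Decode a quoted string with the escapes allowed by h1_unicode_string.

    Every h1_ascii_string value is accepted as well, so one decoder would
    serve both productions.
    """
    if len(s) < 2 or s[0] != '"' or s[-1] != '"':
        raise ValueError("missing surrounding DQUOTEs")
    body, out, i = s[1:-1], [], 0
    while i < len(body):
        ch = body[i]
        if ord(ch) < 0x20 or ord(ch) == 0x7F or ch == '"':
            raise ValueError("character not allowed unescaped: %r" % ch)
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        nxt = body[i + 1:i + 2]
        hexpart = body[i + 2:i + 6]
        if nxt in ('"', "\\"):
            out.append(nxt)  # backslash-DQUOTE or backslash-backslash
            i += 2
        elif nxt == "u" and len(hexpart) == 4 and set(hexpart) <= HEX:
            out.append(chr(int(hexpart, 16)))  # \uXXXX escape
            i += 6
        else:
            raise ValueError("invalid escape at offset %d" % i)
    return "".join(out)


assert decode_h1_string('"plain ascii"') == "plain ascii"
assert decode_h1_string('"caf\\u00e9"') == "caf\u00e9"
```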

Also: this puts raw non-ASCII UTF-8 in the string value. It's not clear 
that this is a good idea for HTTP/1, but if it is, it could be done in 
JSON as well, which would eliminate one of the counterarguments in the 
Introduction.
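For comparison, Python's standard json module can already produce such a form. With the default indent=None the output contains no line breaks, and ensure_ascii=False leaves non-ASCII characters raw. The sketch below checks the resulting UTF-8 octets against the octet ranges that the RFC 7230 field-value grammar permits.

```python
import json

value = {"name": "J\u00fcrgen", "city": "Z\u00fcrich", "tags": ["a", "b"]}

# No indent -> no line breaks; ensure_ascii=False -> raw non-ASCII characters.
text = json.dumps(value, ensure_ascii=False, separators=(",", ":"))
octets = text.encode("utf-8")

# RFC 7230 field-value octets: SP, HTAB, VCHAR (%x21-7E) and obs-text (%x80-FF).
# json.dumps escapes control characters below U+0020 but not DEL (%x7F),
# so input containing DEL would still need extra handling.
assert all(o in (0x20, 0x09) or 0x21 <= o <= 0x7E or o >= 0x80 for o in octets)
assert json.loads(octets.decode("utf-8")) == value
print(octets)
```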

        h1_common_structure = ">" h1_common_structure "<"

That's a bit too recursive: as quoted, the rule has no non-recursive 
alternative, so it cannot match any finite string. (Speaking of which: 
"_" isn't allowed in ABNF rule names.)
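For reference, RFC 5234 (section 4) defines rulename = ALPHA *(ALPHA / DIGIT / "-"). Below is a trivial Python check. Swapping "_" for "-" is just one possible fix.

```python
import re

# RFC 5234, section 4: rulename = ALPHA *(ALPHA / DIGIT / "-")
RULENAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def is_valid_rulename(name: str) -> bool:
    return RULENAME.fullmatch(name) is not None


for name in ("h1_common_structure", "h1-common-structure"):
    print(name, is_valid_rulename(name))  # False, then True
```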

Best regards, Julian

Received on Friday, 11 November 2016 12:46:19 UTC