Re: WiSH: A General Purpose Message Framing over Byte-Stream Oriented Wire Protocols (HTTP) from Loïc Hoguin on 2016-10-21 (ietf-http-wg@w3.org from October to December 2016)

From: Loïc Hoguin <essen@ninenines.eu>
Date: Fri, 21 Oct 2016 18:34:00 +0200
To: Takeshi Yoshino <tyoshino@google.com>
Cc: "ietf-http-wg@w3.org Group" <ietf-http-wg@w3.org>, Wenbo Zhu <wenboz@google.com>
Message-ID: <5541be74-afcc-6aef-404e-63acb2f608eb@ninenines.eu>

On 10/21/2016 03:04 PM, Takeshi Yoshino wrote:
> <snip>
> Good point. I was thinking about using the media type suffix convention
> like +json, +xml, etc. (RFC 6839, RFC 7303) when we need to represent
> the type of the messages. As the WebSocket subprotocol is required to
> conform to the token ABNF, they can be embedded into the subtype part.

The problem with "+json" and "+xml" is that they do not represent the 
type of the messages, just their encoding.

This is why we have media types like "application/problem+json" instead 
of using "application/json" for everything, where "application/problem" 
indicates what the representation contains, and "+json" indicates its 
encoding.

Going back to WiSH, having application/webstream+json would only give
information about the encoding of the response and frame bodies, not
what they actually contain or how an endpoint should process the frames.
We would still need more info to process it.
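
To illustrate the split described above, here is a minimal sketch in
Python. The function name split_media_type is made up for this example,
and the parsing is deliberately naive (parameters are dropped, no ABNF
validation). It only shows that the "+json" suffix carries the encoding
while the rest of the subtype carries the meaning.

```python
def split_media_type(media_type):
    """Split 'type/subtype+suffix' into its parts (hypothetical helper)."""
    # Drop any parameters such as '; charset=utf-8'.
    essence = media_type.split(";", 1)[0].strip().lower()
    top, _, subtype = essence.partition("/")
    # The structured syntax suffix (RFC 6839) follows the last '+'.
    base, plus, suffix = subtype.rpartition("+")
    if not plus:
        # No '+' present: the whole subtype is the base, there is no suffix.
        base, suffix = subtype, None
    return {"type": top, "subtype": base, "suffix": suffix}


problem = split_media_type("application/problem+json")
stream = split_media_type("application/webstream+json")
plain = split_media_type("application/json")
```

Here "problem" says what the representation contains and "json" says
how it is encoded, whereas for application/webstream+json the suffix
adds nothing about what the frames mean.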

> Regarding negotiation, my impression to the subprotocol mechanism of RFC
> 6455 has been that it's not really necessary thing. The server and JS
> (or native application) may negotiate application level sub-protocol
> using e.g. URL (some parameters in the query part or even the path part
> can encode such info) and the initial message sent following the
> handshake response immediately without any latency loss (this topic was
> discussed at the HyBi WG sometimes e.g.
> https://www.ietf.org/mail-archive/web/hybi/current/msg10347.html).

The good thing is that it's standard. There's only one way to select a 
sub-protocol, and you are guaranteed failure (or a normal HTTP response) 
if the endpoint doesn't speak the sub-protocol you want.

The use of a media type is precious to me as it makes it possible to
write an autonomous client. Making the sub-protocol part of the media
type simplifies things greatly.

There are potential issues though and a number of things need to be defined:

* The method used to perform the request is important. A client will not 
be able to speak to the server if the method is GET. Similarly, no 
communication can occur if the method is HEAD. I would advise defining 
behavior for both GET and POST methods.

* The GET method would be used to enable a server->client stream,
similar to text/event-stream, except with binary frames supported. The
difference is the lack of event IDs and of the Last-Event-ID header,
which in some cases doesn't matter.

* The POST method would be used to enable a bidirectional stream. But
this implies that the client uses Content-Type: application/webstream in
the request, along with the Accept header. Otherwise the server has no
way to know what the request body will contain. Let content negotiation
deal with both headers; it's already well defined.

* Technically this would allow client->server and server->client streams 
to select and use a different sub-protocol. I'm not sure it's worth 
preventing this in the spec; instead let the servers decide if they want 
to allow this. But we probably need to mention it.

* By the way, I don't know if consistency is desirable, but maybe
calling it application/web-stream is better. Maybe not.

* The HEAD method behaves as usual. The PUT method is probably not 
compatible with doing this. PATCH and DELETE are not compatible AFAIK.
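
The method rules in the list above could be sketched roughly as
follows. This is only an illustration under stated assumptions:
stream_mode and _media_types are hypothetical names, header names are
assumed to be lowercased already, the Accept parsing ignores q-values
and wildcards, and sub-protocol suffixes are not handled. Nothing here
is defined by the WiSH draft.

```python
WEBSTREAM = "application/webstream"  # media type name used in this thread


def _media_types(header_value):
    """Naively extract media types from a header value (no q-values)."""
    return {
        part.split(";", 1)[0].strip().lower()
        for part in header_value.split(",")
        if part.strip()
    }


def stream_mode(method, headers):
    """Return the stream direction a request asks for, or None.

    `headers` is assumed to be a dict with lowercased header names.
    """
    method = method.upper()
    accepts_stream = WEBSTREAM in _media_types(headers.get("accept", ""))
    sends_stream = WEBSTREAM in _media_types(headers.get("content-type", ""))

    if method == "GET" and accepts_stream:
        # No request body, so only the server can send frames.
        return "server-to-client"
    if method == "POST" and accepts_stream and sends_stream:
        # Both the request body and the response body carry frames.
        return "bidirectional"
    # HEAD, PUT, PATCH, DELETE, or missing headers: no stream is set up.
    return None
```

A real server would leave the header matching to its existing content
negotiation code rather than reimplement it like this.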

> <snip>
> Oh, interesting. Does this mean that on the browser a JSON codec that
> takes / produces ArrayBuffers is used?

I have no idea about browsers, sorry. :-) Not dealing with them often. 
But as far as I understand it involves using a different type to get 
binaries, yes. I have no idea how this is done.

>     Which brings me to my question: do you think it could be worth
>     adding a note to implementers that perhaps they should consider
>     optionally disabling the UTF-8 check when JSON is expected for text
>     frames?
>
>
> We could have removed the valid UTF-8 requirement of RFC 6455. This is
> something need to be enforced at the binding between the Web API and a
> WebSocket protocol stack, but not at the protocol level.
>
> Until
> https://tools.ietf.org/html/draft-ietf-hybi-thewebsocketprotocol-00,
> text frames had to be a valid UTF-8 (strictly speaking, data had to be
> 0xFF-free), but not from the version 01.
>
> We can choose to omit the valid UTF-8 requirement from the WiSH spec and
> instead have it in the spec for gluing WiSH with the WebSocket API in
> the future. Then, server implementors won't be explicitly encouraged to
> check validness when implementing WiSH.

That sounds like a good idea. I think it should still be mentioned in
the protocol spec (basically explaining what you said and referring to
another document) so that implementors are aware of the UTF-8 check and
can decide whether to implement it. If it were only written in a browser
RFC I would most likely never have seen it, personally.
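
As a sketch of what an optional check might look like on the server
side when JSON is expected in text frames, assuming a hypothetical
handle_text_frame function and a validate_utf8 flag (neither comes from
any spec or library):

```python
import json


def handle_text_frame(payload, validate_utf8=True):
    """Parse a text frame payload (bytes) that is expected to hold JSON."""
    if validate_utf8:
        # Frame-level check: raises UnicodeDecodeError on invalid UTF-8.
        payload.decode("utf-8", errors="strict")
    # json.loads accepts bytes and decodes them itself, so for JSON
    # payloads the separate check above is largely redundant.
    return json.loads(payload)


def is_rejected(payload, validate_utf8):
    """Return True if the payload is refused in the given mode."""
    try:
        handle_text_frame(payload, validate_utf8=validate_utf8)
    except ValueError:  # UnicodeDecodeError and JSONDecodeError are subclasses
        return True
    return False
```

With the check disabled, a payload containing a stray 0xFF byte is still
refused, only later, by the JSON decoder instead of the framing layer.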

-- 
Loïc Hoguin
https://ninenines.eu
Received on Friday, 21 October 2016 16:34:42 UTC