[whatwg] WebSocket bufferedAmount includes overhead or not from Niklas Beischer on 2010-03-31 (public-whatwg-archive@w3.org from March 2010)

From: Niklas Beischer <no@opera.com>
Date: Wed, 31 Mar 2010 09:06:47 +0300
Message-ID: <op.vae7dlezdopkj4@fozzie.gothenburg.osa>
On Tue, 30 Mar 2010 17:22:07 +0300, Jonas Sicking <jonas at sicking.cc> wrote:

> On Tue, Mar 30, 2010 at 1:51 AM, Niklas Beischer <no at opera.com> wrote:
>> On Tue, 30 Mar 2010 09:19:33 +0300, Jonas Sicking <jonas at sicking.cc> wrote:
>>>
>>> On Wed, Mar 24, 2010 at 2:33 PM, Ian Hickson <ian at hixie.ch> wrote:
>>>>
>>>> On Tue, 23 Mar 2010, Anne van Kesteren wrote:
>>>>>
>>>>> We (Opera) would prefer this too. I.e. to not impose details of the
>>>>> protocol on the API.
>>>>
>>>> If we're exposing nothing from the protocol, does that mean we shouldn't
>>>> be exposing that the string converts to UTF-8 either?
>>>
>>> While exposing the fact that strings are sent as UTF-8 does say
>>> something about the protocol, I think it's still much more protocol
>>> independent than including the message headers. The string has to be
>>> serialized in some way, and it seems unlikely that any newly developed
>>> protocol in the foreseeable future would use anything other than UTF-8
>>> as serialization.
>>
>> True, but if bufferedAmount does not byte for byte (or character for
>> character) match what is fed to the API, what does a few bytes
>> representing the current overhead matter? IIRC EcmaScript uses UTF-16,
>> which means that serialization will in most cases make the number of
>> actually buffered bytes differ from the number of bytes in the original
>> message buffer.
>
> EcmaScript doesn't do any serialization so I'm not sure what you mean here?

I meant the serialization in the WebSocket. Unless the protocol
implementation keeps track of exactly how its serialized buffer differs
from the original buffer, it will not be able to give a correct answer to
how much of the original buffer is left to transfer.
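
To illustrate the mismatch, here is a minimal sketch comparing the length a
script sees (UTF-16 code units) with the number of bytes after UTF-8
serialization. It assumes an environment that provides TextEncoder; the
helper name utf8ByteLength and the sample strings are made up for this
illustration.

```javascript
// Illustration only: compare the length a script sees with the UTF-8 byte count.
const encoder = new TextEncoder(); // TextEncoder always encodes to UTF-8

// Hypothetical helper: number of bytes the string occupies once serialized as UTF-8.
function utf8ByteLength(message) {
  return encoder.encode(message).length;
}

// Sample strings: plain ASCII, a Latin-1 letter, the euro sign, and an emoji
// (the emoji is stored as a surrogate pair, i.e. two UTF-16 code units).
const samples = ["hello", "h\u00e9llo", "\u20ac", "\ud83d\ude00"];

for (const s of samples) {
  console.log(JSON.stringify(s),
              "code units:", s.length,
              "UTF-8 bytes:", utf8ByteLength(s));
}
```

Only the pure ASCII string has the same count in both columns, so a byte
count of the serialized buffer cannot be mapped back to a position in the
original string without extra bookkeeping.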


>> And just because we currently use UTF-8 for
>> serialization doesn't mean that will always be the case. Thus API users
>> cannot rely on performing their own conversion to UTF-8 just to find out
>> exactly how many characters in their message have been sent.
>
> My point was that using anything but UTF-8 is unlikely. So the
> opposite of what you're saying here.

So you're saying binary is out of the question?


>> The fact remains that, unless we want to force implementors of the
>> protocol to match each byte sent over the network with a specific
>> character in the original message handed to the API, bufferedAmount cannot
>> represent something unaffected by the protocol. And if we allow
>> bufferedAmount to be affected by the protocol, why not let it be decided
>> by the implementation whether or not to include protocol overhead?
>
> Making it implementation dependent is likely to lead to website
> incompatibilities. Such as:
>
> ws = new WebSocket(...);
> ws.onopen = function() {
>   ws.send(someString);
>   if (ws.bufferedAmount > X) {
>     doStuff();
>   }
> };
>
> If this is implementation dependent then the above could reliably call
> doStuff in one implementation, but reliably not another.

No, not reliably. The value of bufferedAmount may also depend on the
system the implementation is running on, as well as on its current network
connection. One combination might yield completely different results from
another. If the developer is not aware of that, his code may generate
website incompatibilities between two instances of the same implementation
running in different environments, or even in the same environment at
different times because the network conditions differ. Yes, my suggestion
adds another variable to the equation. And yes, I admit, that is a really
bad idea most of the time. But I maintain that it should make little
difference in this case.
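
As a rough sketch of that argument, the toy model below computes what
bufferedAmount might report for the threshold check quoted above. It is not
a real WebSocket: the function modelBufferedAmount, the 2-byte framing
overhead, the 100-byte payload and the "bytes already sent" figures are all
assumptions chosen for illustration.

```javascript
// Hypothetical model, not a real WebSocket: bytes still queued after the
// network layer has already sent some of them.
function modelBufferedAmount(payloadBytes, overheadBytes, bytesAlreadySent) {
  const queued = payloadBytes + overheadBytes;
  return Math.max(0, queued - bytesAlreadySent);
}

const X = 100;       // threshold, as in the quoted "bufferedAmount > X" check
const payload = 100; // assumed size of the serialized message in bytes

// Two assumed implementations (overhead counted or not) combined with
// two assumed environments (nothing sent yet, or 50 bytes already sent).
const results = [];
for (const overhead of [0, 2]) {
  for (const alreadySent of [0, 50]) {
    const amount = modelBufferedAmount(payload, overhead, alreadySent);
    results.push({ overhead, alreadySent, amount, callsDoStuff: amount > X });
  }
}

for (const r of results) {
  console.log(r);
}
```

In this model the check passes only when the overhead is counted and
nothing has been sent yet; changing either variable flips the outcome.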

  /niklas


-- 
Niklas Beischer
Software Developer
Opera Software
Received on Tuesday, 30 March 2010 23:06:47 UTC