Re: Overlap between StreamReader and FileReader from Aymeric Vitte on 2013-07-31 (public-webapps@w3.org from July to September 2013)

From: Aymeric Vitte <vitteaymeric@gmail.com>
Date: Thu, 01 Aug 2013 00:27:50 +0200
To: Jonas Sicking <jonas@sicking.cc>
CC: Domenic Denicola <domenic@domenicdenicola.com>, Anne van Kesteren <annevk@annevk.nl>, Takeshi Yoshino <tyoshino@google.com>, Feras Moussa <feras.moussa@hotmail.com>, Travis Leithead <travis.leithead@microsoft.com>, Alex Russell <slightlyoff@google.com>, "Web Applications Working Group WG (public-webapps@w3.org)" <public-webapps@w3.org>, "i@izs.me" <i@izs.me>
Message-ID: <51F98F66.7080904@gmail.com>

I read the thread quickly, but this seems to be exactly the issue I 
ran into while working on [1].

The use case was simply decoding chunked UTF-8 HTML buffers and modifying 
the content on the fly to stream it somewhere else.

It had to work inside browsers and with node (which, as far as I know, 
does not handle this case correctly, though I have not checked recent versions).

The solution was [2], TextEncoder/Decoder with a very useful streaming 
option.
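
A minimal sketch of what that streaming option does, assuming the TextDecoder interface as defined in the WHATWG Encoding Standard (the polyfill in [2] may differ in details). decodeChunks is a hypothetical helper, not part of any library; the sample input is made up for illustration.

```typescript
// Decode UTF-8 chunks incrementally. With `stream: true` the decoder keeps an
// incomplete trailing byte sequence buffered until the next chunk arrives,
// instead of emitting a replacement character for it.
function decodeChunks(chunks: Uint8Array[]): string {
  const decoder = new TextDecoder("utf-8");
  let text = "";
  for (const chunk of chunks) {
    text += decoder.decode(chunk, { stream: true });
  }
  // A final call without `stream` flushes anything still buffered.
  text += decoder.decode();
  return text;
}

// "é" is 0xC3 0xA9 in UTF-8; split the buffer between those two bytes.
const bytes: Uint8Array = new TextEncoder().encode("<p>caf\u00e9</p>");
const split: number = bytes.indexOf(0xc3) + 1;
const streamed: string = decodeChunks([
  bytes.subarray(0, split),
  bytes.subarray(split),
]);
```

Decoding each chunk with a fresh decoder (or without `stream: true`) would instead turn the split character into replacement characters.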

[1] https://www.github.com/Ayms/node-Tor
[2] http://code.google.com/p/stringencoding/

Regards

Aymeric

On 31/07/2013 21:20, Jonas Sicking wrote:
> On Wed, Jul 31, 2013 at 10:17 AM, Domenic Denicola
> <domenic@domenicdenicola.com> wrote:
>> From: Anne van Kesteren [annevk@annevk.nl]
>>
>>> It seems though that if you can change the way bytes are consumed while reading a stream you will end up with problematic scenarios. E.g. you consume 2 bytes of a 4-byte utf-8 sequence. Then switch to reading code points... Instantiating a ByteStream or TextStream in advance would address that.
>> Yes, and I think I would actually prefer such an API honestly. But IIRC Jonas earlier wanted to be able to do both binary and text in the same stream (did he have a specific use case?), and presumably that motivated Node's design as well.
> I don't have very concrete use-cases in mind. But basically
> consumption of any format that contains both textual and binary data.
> If we don't think the world contains enough such formats to worry
> about, then maybe my use case isn't strong enough.
>
> I think both pdf and various microsoft document formats fall into this
> category though.
>
>> I guess you can just say that if you're in binary mode, you should know what you're doing, and know precisely when is the correct time to switch to string mode. If you switch in the middle of a four-byte sequence, you presumably meant to do so, and deserve to get back the mangled characters that result.
>>
>> To make this work might require some kind of "put the bytes back" primitive, to avoid a situation where you read "too far" in binary mode and want to back up a bit before you engage string mode. I guess this is Node.js's [unshift][1].
> Note that the "read too far" issue isn't text specific. When consuming
> any format which uses a terminator (null or any more complicated
> pattern) you will have to consume in minimal chunks, often
> byte-by-byte, to make sure you don't go past that terminator.
>
>> It would be cool to avoid all this though and just read either bytes or strings, without allowing switching. (Maybe, feed the byte stream into a string decoder transform, and get back a string stream?)
> Being able to convert between text and binary streams does work well
> when the whole stream is either textual or binary. It's not clear to
> me how to do it if you are dealing with a stream that contains both.
> Though I'd be interested to see proposals.
>
> / Jonas
>
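
The mid-sequence switch discussed in the quoted text can be sketched as follows, assuming the standard TextDecoder with its default non-fatal error handling. The variable names are hypothetical, and the array copy only stands in for a "put the bytes back" primitive such as Node.js's unshift; it is not how any particular stream implementation does it.

```typescript
// U+1F600 encodes to four UTF-8 bytes: F0 9F 98 80.
const emojiBytes: Uint8Array = new TextEncoder().encode("\ud83d\ude00");

// Binary mode has consumed the first two bytes; text mode only sees the tail.
const consumed: Uint8Array = emojiBytes.subarray(0, 2);
const tail: Uint8Array = emojiBytes.subarray(2);

// The two orphaned continuation bytes each decode to U+FFFD.
const mangled: string = new TextDecoder("utf-8").decode(tail);

// Putting the consumed bytes back in front of the tail restores the
// complete sequence before switching to text mode.
const restoredBytes = new Uint8Array(consumed.length + tail.length);
restoredBytes.set(consumed, 0);
restoredBytes.set(tail, consumed.length);
const restored: string = new TextDecoder("utf-8").decode(restoredBytes);
```

Here mangled holds two replacement characters, while restored is the original code point again.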

-- 
jCore
Email :  avitte@jcore.fr
iAnonym : http://www.ianonym.com
node-Tor : https://www.github.com/Ayms/node-Tor
GitHub : https://www.github.com/Ayms
Web :    www.jcore.fr
Extract Widget Mobile : www.extractwidget.com
BlimpMe! : www.blimpme.com

Received on Wednesday, 31 July 2013 22:28:34 UTC