Re: Bin Module does not work well with Streams

Hi Adam,

For me, it would be fine if the current constraints were relaxed.

Regarding errors and streamed data, we encountered similar challenges
with other functions (e.g. with file:read-text), and we are passing on
error codes to streaming functions. I don’t know if that makes sense
for eXist-db (our implementation of bin:part does not support streaming).
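
To make the deferred check concrete, here is a minimal Java sketch
(Java only because eXist-db is written in it). The class name
BoundedRegionStream and the error text are hypothetical, not taken
from eXist-db or BaseX, and it assumes the underlying stream is
already positioned at $offset. It hands out at most $size bytes and
only fails when the source ends early, i.e. at the moment the stream
is actually consumed:

```java
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical wrapper (not eXist-db or BaseX code): exposes exactly
 * `size` bytes of a source stream that is assumed to be positioned at
 * the requested offset already. Nothing is checked up front; the
 * out-of-range condition surfaces lazily, while the region is read.
 */
class BoundedRegionStream extends FilterInputStream {
    private long remaining; // bytes of the region not yet delivered

    BoundedRegionStream(InputStream source, long size) {
        super(source);
        if (size < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        this.remaining = size;
    }

    @Override
    public int read() throws IOException {
        if (remaining == 0) {
            return -1; // region fully delivered
        }
        int b = super.read();
        if (b < 0) {
            throw shortRead();
        }
        remaining--;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (remaining == 0) {
            return -1; // region fully delivered
        }
        int n = super.read(buf, off, (int) Math.min(len, remaining));
        if (n < 0) {
            throw shortRead();
        }
        remaining -= n;
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        // Never skip past the end of the region.
        long skipped = super.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }

    @Override
    public boolean markSupported() {
        return false; // mark/reset would desynchronise `remaining`
    }

    // The source ended before the region did: the deferred
    // equivalent of bin:index-out-of-range.
    private EOFException shortRead() {
        return new EOFException(
            "bin:index-out-of-range: " + remaining + " byte(s) missing");
    }
}
```

Mapping that EOFException back to bin:index-out-of-range for the
right expression is the part this sketch leaves open.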

All the best,
Christian



On Wed, Jun 7, 2017 at 5:09 PM, Adam Retter <adam.retter@googlemail.com> wrote:
> Okay, so if I understand correctly, you are saying: don't do the check
> up front; do the check afterwards and report an error if fewer than
> $size bytes were read?
>
> I can see the argument that there is value there for the user,
> however, it is very hard to implement for us because of the streaming
> nature.
>
> If we consider bin:part#3, it takes an xs:base64Binary and returns an
> xs:base64Binary. Internally, for us it takes a stream and returns a
> stream. We also don't do anything with the stream until it is
> actually realised, which makes tracking the error very hard in the
> face of nested functions on xs:base64Binary. I will give some thought
> to how we can catch the underlying IOException and relate it to the
> correct expression; it's tricky because effectively the `stream`
> escapes the scope of the enclosing expression.
>
> On 7 June 2017 at 10:39, Michael Kay <mike@saxonica.com> wrote:
>> array:subarray#3 has the same problem.
>>
>> I would have thought bin:part#3 is usually going to be used to read a chunk of say 4 or 8 bytes, in which case you want to know if it's reading off the end. I guess there's a scenario where you're reading TLV data and L is long. You still want an error if it takes you off the end. I don't think anyone's going to complain much if the error is deferred, but if they wanted to just read to the end of the stream, they would have used bin:part#2.
>>
>> Michael Kay
>> Saxonica
>>
>>
>>> On 7 Jun 2017, at 13:05, Adam Retter <adam.retter@googlemail.com> wrote:
>>>
>>> Hi there,
>>>
>>> I am at present implementing the bin module in eXist-db. However there
>>> are a few things in the spec which do not play nice when working with
>>> streams.
>>>
>>> In eXist an xs:base64Binary or xs:hexBinary is represented internally
>>> by a stream. We do this because binary values can be very large, for
>>> example when working with digital video or similar, so it is
>>> undesirable to have to load all the binary data into memory to be able
>>> to work with it.
>>>
>>> My main issue is with the definitions of when bin:index-out-of-range
>>> should be thrown.
>>>
>>> If we consider just one definition of bin:index-out-of-range, the
>>> function bin:decode-string states:
>>>
>>> [bin:index-out-of-range] is raised if $offset is negative or $offset +
>>> $size is larger than the size of the binary data of $in.
>>>
>>> The problem with this is that we cannot perform the second check
>>> ($offset + $size <= bin:length($in)) up front without reading the
>>> entire data stream of $in. Reading the entire data stream of $in is
>>> undesirable, as our streams also have efficient random positioning
>>> features, which otherwise allow us to read just a region of the
>>> stream efficiently.
>>>
>>> May I suggest that this constraint would be better relaxed, so that
>>> the definition for that function would be like:
>>>
>>> [bin:index-out-of-range] is raised if $offset is negative.
>>>
>>> If $offset + $size is greater than the size of $in, I think it is fine
>>> to just return data of length bin:length($in) - $offset.
>>>
>>> How does that sound?
>>>
>>>
>>>
>>>
>>> --
>>> Adam Retter
>>>
>>> skype: adam.retter
>>> tweet: adamretter
>>> http://www.adamretter.org.uk
>>>
>>
>

Received on Wednesday, 7 June 2017 15:32:54 UTC