
Re: Bin Module does not work well with Streams

From: Christian Grün <christian.gruen@gmail.com>
Date: Wed, 7 Jun 2017 17:32:00 +0200
Message-ID: <CAP94bnNuONYAERVNoFWxG0v8E+8Xr_7KXwiu5=KgqM7xnNGucA@mail.gmail.com>
To: Adam Retter <adam.retter@googlemail.com>
Cc: Michael Kay <mike@saxonica.com>, EXPath ML <public-expath@w3.org>
Hi Adam,

For me, it would be fine if the current constraints were relaxed.

Regarding errors and streamed data, we encountered similar challenges
with other functions (e.g. with file:read-text), and we are passing on
error codes to streaming functions. I don’t know if that makes sense
for eXist-db (our implementation of bin:part supports no streaming).

All the best,
Christian

On Wed, Jun 7, 2017 at 5:09 PM, Adam Retter <adam.retter@googlemail.com> wrote:
> Okay so if I understand, you are saying, don't do the check up front,
> do the check afterwards and report if less than size was read?
> I can see the argument that there is value there for the user,
> however, it is very hard to implement for us because of the streaming
> nature.
> If we consider bin:part#3, it takes an xs:base64Binary and returns an
> xs:base64Binary. Internally, for us, it takes a stream and returns a
> stream. We also don't do anything with the stream until it is
> actually realised, which makes tracking the error very hard in the
> face of nested functions on xs:base64Binary. I will give some thought
> to how we can catch the underlying IOException and relate it to the
> correct expression; it's tricky because effectively the `stream`
> escapes the scope of the enclosing expression.
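
To make the deferred check concrete, here is a minimal sketch in Java, assuming a plain InputStream as the internal representation. The names LazyPart and BinIndexOutOfRangeException are hypothetical and are not taken from eXist-db. The wrapper reads nothing until a consumer pulls bytes, and it carries a label for the originating expression so that a short read can still be attributed to it after the stream has escaped that expression's scope.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical error type standing in for bin:index-out-of-range. */
final class BinIndexOutOfRangeException extends IOException {
    BinIndexOutOfRangeException(String message) {
        super(message);
    }
}

/** Hypothetical lazy view of `size` bytes starting at `offset` in a source stream. */
final class LazyPart extends InputStream {
    private final InputStream source;
    private final String expression; // label of the originating expression, kept for error reporting
    private long toSkip;             // bytes still to skip before the region starts
    private long remaining;          // bytes of the region not yet delivered

    LazyPart(InputStream source, long offset, long size, String expression)
            throws BinIndexOutOfRangeException {
        // The negative-offset check is cheap, so it can stay eager.
        if (offset < 0 || size < 0) {
            throw new BinIndexOutOfRangeException(expression + ": negative offset or size");
        }
        this.source = source;
        this.expression = expression;
        this.toSkip = offset;
        this.remaining = size;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        return read(one, 0, 1) == -1 ? -1 : (one[0] & 0xFF);
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        // Position lazily: the source is untouched until the first real read.
        while (toSkip > 0) {
            long skipped = source.skip(toSkip);
            if (skipped <= 0) {
                // skip() may legally return 0, so probe one byte to detect the end.
                if (source.read() == -1) {
                    throw new BinIndexOutOfRangeException(expression + ": offset is past the end");
                }
                skipped = 1;
            }
            toSkip -= skipped;
        }
        if (remaining == 0) {
            return -1; // the whole region has been delivered
        }
        int n = source.read(buf, off, (int) Math.min(len, remaining));
        if (n == -1) {
            // Deferred check: the source ended before `size` bytes were delivered.
            throw new BinIndexOutOfRangeException(
                    expression + ": source ended " + remaining + " byte(s) short");
        }
        remaining -= n;
        return n;
    }
}
```

With an 8-byte source, a LazyPart at offset 6 with size 4 is constructed without error; the exception only surfaces once a consumer reads past the second byte, and its message names the expression label that was passed in.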
> On 7 June 2017 at 10:39, Michael Kay <mike@saxonica.com> wrote:
>> array:subarray#3 has the same problem.
>> I would have thought bin:part#3 is usually going to be used to read a chunk of say 4 or 8 bytes, in which case you want to know if it's reading off the end. I guess there's a scenario where you're reading TLV data and L is long. You still want an error if it takes you off the end. I don't think anyone's going to complain much if the error is deferred, but if they wanted to just read to the end of the stream, they would have used bin:part#2.
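
The TLV scenario can be sketched as follows in Java. The record layout (a 1-byte tag, a 4-byte big-endian length, then the value) and the class name TlvReader are assumptions made for illustration only. DataInputStream.readFully raises EOFException when the stream ends before the requested bytes arrive, which is the "error if it takes you off the end" behaviour, reported at the point of reading rather than up front.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical reader for one tag-length-value record. */
final class TlvReader {
    private TlvReader() {
    }

    /** Reads one record and returns its value bytes. */
    static byte[] readValue(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int tag = data.readUnsignedByte(); // assumed layout: 1-byte tag
        int length = data.readInt();       // assumed layout: 4-byte big-endian length
        if (length < 0) {
            throw new IOException("negative length in record with tag " + tag);
        }
        // A sketch only: real code would cap `length` before allocating.
        byte[] value = new byte[length];
        // Throws EOFException if the stream ends before `length` bytes are read.
        data.readFully(value);
        return value;
    }
}
```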
>> Michael Kay
>> Saxonica
>>> On 7 Jun 2017, at 13:05, Adam Retter <adam.retter@googlemail.com> wrote:
>>> Hi there,
>>> I am at present implementing the bin module in eXist-db. However there
>>> are a few things in the spec which do not play nice when working with
>>> streams.
>>> In eXist an xs:base64Binary or xs:hexBinary is represented internally
>>> by a stream. We do this because binary values can be very large, for
>>> example when working with digital video or similar; as such, it is
>>> undesirable to have to load all the binary data into memory to be able
>>> to work with it.
>>> My main issue is with the definitions of when bin:index-out-of-range
>>> should be thrown.
>>> If we consider just one definition of bin:index-out-of-range, the
>>> function bin:decode-string states:
>>> [bin:index-out-of-range] is raised if $offset is negative or $offset +
>>> $size is larger than the size of the binary data of $in.
>>> The problem with this is that we cannot perform the second check
>>> ($offset + $size <= bin:length($in)) up-front without reading the
>>> entire data stream of $in. Reading the entire data stream of $in is
>>> undesirable, as our streams also have efficient random positioning
>>> features, which otherwise allow us to efficiently just read a region
>>> of the stream.
>>> May I suggest that this constraint would be better relaxed, so that
>>> the definition for that function would be like:
>>> [bin:index-out-of-range] is raised if $offset is negative.
>>> If $offset + $size is greater than the size of $in, I think it is fine
>>> to just return data of length bin:length($in) - $offset.
>>> How does that sound?
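
The relaxed semantics proposed above might look like this in Java. This is a sketch under stated assumptions: RegionReader and readLenient are hypothetical names, the source is a plain InputStream whose length is not known in advance, and returning an empty result when the offset lies at or beyond the end is an assumption of this example, since the proposal does not cover that case.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper illustrating the relaxed bounds rule. */
final class RegionReader {
    private RegionReader() {
    }

    /** Returns up to `size` bytes starting at `offset`; fewer if the source ends early. */
    static byte[] readLenient(InputStream in, long offset, int size) throws IOException {
        // Only the cheap check is done eagerly, as in the relaxed definition.
        if (offset < 0 || size < 0) {
            throw new IndexOutOfBoundsException("negative offset or size");
        }
        long toSkip = offset;
        while (toSkip > 0) {
            long skipped = in.skip(toSkip);
            if (skipped <= 0) {
                // skip() may legally return 0, so probe one byte to detect the end.
                if (in.read() == -1) {
                    return new byte[0]; // assumption: offset at or past the end yields no data
                }
                skipped = 1;
            }
            toSkip -= skipped;
        }
        byte[] buf = new byte[size];
        int filled = 0;
        while (filled < size) {
            int n = in.read(buf, filled, size - filled);
            if (n == -1) {
                break; // source ended early: keep what was read instead of raising an error
            }
            filled += n;
        }
        return filled == size ? buf : Arrays.copyOf(buf, filled);
    }
}
```

The trade-off is visible in the last line: a caller that asked for `size` bytes has to compare the returned length against the request to notice that the region ran off the end.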
>>> --
>>> Adam Retter
>>> skype: adam.retter
>>> tweet: adamretter
>>> http://www.adamretter.org.uk
> --
> Adam Retter
> skype: adam.retter
> tweet: adamretter
> http://www.adamretter.org.uk
Received on Wednesday, 7 June 2017 15:32:54 UTC
