
Re: Draft of Binary module

From: Adam Retter <adam@exist-db.org>
Date: Wed, 13 Mar 2013 14:29:03 +0000
Message-ID: <CAJKLP9ana-gpOCJ3=1t_G-v4sWRBkhcOc=7_HEc++Qhda7NksQ@mail.gmail.com>
To: Michael Kay <mike@saxonica.com>
Cc: public-expath@w3.org

>> Wow, that's quite comprehensive :-)
>> I still need to digest it fully, but I have a few initial questions:
>>
>> 1) Why the use of xs:hexBinary, when most other EXPath function
>> libraries (and in fact most third-party XQuery functions) I have seen
>> use xs:base64Binary? Converting from one to the other is something
>> that you *really* don't want to have to do, especially for large files!
>
> I think converting between hexBinary and base64Binary should be pretty much
> a no-op for most processors: the internal representation of the value is
> likely to be an immutable byte array, and conversion just means creating a
> new wrapper around the byte array. But it's a user inconvenience. Actually
> for input parameters, I don't see why we shouldn't accept either form.
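A minimal Java sketch of the wrapper idea described above, assuming both schema types are modelled as thin wrappers over one immutable byte array. The names `BinaryValue`, `HexBinary`, `Base64Binary` and `BinaryFunctions` are hypothetical and do not come from Saxon, eXist or the Binary module draft. Conversion creates a new wrapper without copying, and typing an input parameter as the common supertype is one way to accept either form.

```java
// Hypothetical model: xs:hexBinary and xs:base64Binary share one immutable
// octet array and differ only in their lexical (string) form.
abstract class BinaryValue {
    final byte[] octets; // shared between wrappers, never mutated after construction

    BinaryValue(byte[] octets) { this.octets = octets; }
}

final class HexBinary extends BinaryValue {
    private HexBinary(byte[] octets) { super(octets); }

    // Copies once on the way in, so outside code cannot mutate the octets later.
    static HexBinary of(byte[] source) { return new HexBinary(source.clone()); }

    // Wraps an array already owned by another wrapper (no copy).
    static HexBinary wrap(byte[] shared) { return new HexBinary(shared); }

    // O(1) conversion: a new wrapper around the same array; nothing is re-encoded.
    Base64Binary toBase64() { return Base64Binary.wrap(octets); }
}

final class Base64Binary extends BinaryValue {
    private Base64Binary(byte[] octets) { super(octets); }

    static Base64Binary of(byte[] source) { return new Base64Binary(source.clone()); }

    static Base64Binary wrap(byte[] shared) { return new Base64Binary(shared); }

    HexBinary toHex() { return HexBinary.wrap(octets); }
}

final class BinaryFunctions {
    // Typing the parameter as the common supertype accepts either form.
    static int length(BinaryValue value) { return value.octets.length; }
}
```

The cost of such a conversion is one small object allocation, independent of the size of the value.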

Hehe, well of course it depends on the implementation. From experience, I
would advise against keeping the data in RAM as an immutable byte array.
That is certainly what we used to do in eXist, and as soon as you have a
few large files you quickly run out of memory and crash your processor.
What we do now instead is use an InputStream with some clever machinery
to make it re-readable at any time, keeping a minimal amount in RAM and
the rest on disk or elsewhere. This is all open source of course, so if
you're interested I can point you at the code, which should be reusable
outside eXist too.
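A rough Java sketch of a re-readable value along those lines, assuming a simple size threshold decides between keeping the octets in memory and spilling them to a temporary file. `ReReadableBinary` is a hypothetical name; this illustrates the general technique and is not the eXist code referred to above.

```java
import java.io.*;
import java.nio.file.*;

// Hypothetical sketch: consume a one-shot InputStream once, keep small values
// in memory, spill large ones to a temp file, and hand out a fresh stream on
// every call to open().
final class ReReadableBinary implements Closeable {
    private final byte[] inMemory; // non-null when the value fitted under the threshold
    private final Path spillFile;  // non-null when the value was spilled to disk

    ReReadableBinary(InputStream source, int threshold) throws IOException {
        ByteArrayOutputStream head = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        // Buffer in memory until the threshold is exceeded or the source ends.
        while ((n = source.read(buf)) != -1) {
            head.write(buf, 0, n);
            if (head.size() > threshold) break;
        }
        if (n == -1) {
            inMemory = head.toByteArray();
            spillFile = null;
        } else {
            spillFile = Files.createTempFile("binary-", ".bin");
            try (OutputStream out = Files.newOutputStream(spillFile)) {
                head.writeTo(out);      // what was already buffered
                source.transferTo(out); // stream the remainder straight to disk
            }
            inMemory = null;
        }
    }

    // Each call returns an independent stream positioned at the start.
    InputStream open() throws IOException {
        return inMemory != null ? new ByteArrayInputStream(inMemory)
                                : Files.newInputStream(spillFile);
    }

    boolean isSpilled() { return spillFile != null; }

    @Override
    public void close() throws IOException {
        if (spillFile != null) Files.deleteIfExists(spillFile);
    }
}
```

Only the first chunk beyond the threshold is ever held in memory; everything after it is copied to disk as it arrives.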

My concern regarding the conversion was rather that if a user puts an
intermediate step between xs:hexBinary and xs:base64Binary, perhaps
xs:string, it may not be clear to them that they will pay a heavy price:
the raw binary is encoded as hexBinary, decoded again, and then
re-encoded as base64Binary. I really wish we could settle on a single
binary type in XQuery. Mike, I would be interested to know a little of
the history: why do we have two of them?
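To make that cost concrete, here is a small Java sketch of the round trip through a string, assuming Java 17+ for `java.util.HexFormat`. `ConversionCost` is a hypothetical helper. Each commented step is a full pass over the data, and the intermediate hex string holds two characters per octet, whereas a direct conversion could simply rewrap the octets.

```java
import java.util.Base64;
import java.util.HexFormat;

// Hypothetical helper showing the hexBinary -> xs:string -> base64Binary detour.
final class ConversionCost {
    record Result(String base64, int hexChars) {}

    static Result viaString(byte[] raw) {
        // Pass 1: encode the raw octets in hexBinary's lexical form (2 chars per octet).
        String hex = HexFormat.of().formatHex(raw);
        // Pass 2: decode that text back into octets.
        byte[] decoded = HexFormat.of().parseHex(hex);
        // Pass 3: re-encode the octets in base64Binary's lexical form.
        String base64 = Base64.getEncoder().encodeToString(decoded);
        return new Result(base64, hex.length());
    }
}
```

For the four octets `DE AD BE EF` the detour materializes an 8-character hex string before producing `3q2+7w==`; for a multi-gigabyte value the same passes apply to every octet.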

>> I'm just reading through the rest now; my main concern is that these
>> operations can be done efficiently. I have been reworking the
>> implementation of the common Java code for the EXPath HTTP module to
>> support streaming of large binary values and large string values. We
>> have customers who want to work with binary and text documents that
>> are several gigabytes each from XQuery.
>>
>>
> Interesting question. I don't know how efficient direct access to binary
> files is; if it's OK, then one could easily have an internal implementation
> of a base64Binary value that's mapped directly to a file rather than to
> memory, and perform all the operations directly on the file.
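A minimal Java sketch of a file-backed value, assuming the octets live in an ordinary file and each operation reads only the range it needs. `FileBackedBinary` and its `part` method are hypothetical names chosen for illustration, not functions taken from the Binary module draft. `FileChannel.read(ByteBuffer, long)` reads at an explicit position without moving the channel's own position, so no cursor state leaks out of the value.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical file-backed binary value: the octets stay on disk and each
// operation reads only the range it needs.
final class FileBackedBinary {
    private final Path file;

    FileBackedBinary(Path file) { this.file = file; }

    long length() throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            return ch.size();
        }
    }

    // Returns `size` octets starting at `offset` without loading the rest of the file.
    byte[] part(long offset, int size) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            if (offset < 0 || size < 0 || offset + size > ch.size())
                throw new IndexOutOfBoundsException("range outside value");
            ByteBuffer buf = ByteBuffer.allocate(size);
            long pos = offset;
            // Positional reads leave the channel's own position untouched.
            while (buf.hasRemaining()) {
                int n = ch.read(buf, pos);
                if (n < 0) throw new IOException("unexpected end of file");
                pos += n;
            }
            return buf.array();
        }
    }
}
```

Opening a channel per call keeps the sketch stateless; a real implementation would likely cache the channel.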

Yup. See above.

> But if
> efficiency means maintaining a current position in the file and reading
> what's at the current position, then that complicates the interface
> considerably. It could be done using higher-order functions, but would be a
> bit mind-blowing. Although we've got functions with side-effects in the File
> module, they are external side-effects, and I'd be reluctant to design
> anything with internal side-effects, e.g. on the current position of a file
> handle.
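One way the higher-order approach could look, sketched in Java as a left fold over fixed-size chunks. This is a swapped-in illustration, not a proposal from the draft, and `BinaryFold.foldChunks` is a hypothetical name. The read position exists only inside the method, so the caller's function never sees or mutates a file handle.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.function.BiFunction;

// Hypothetical sketch: expose a left fold instead of a handle with a mutable
// "current position". The step function receives (accumulator, next chunk)
// and returns a new accumulator.
final class BinaryFold {
    static <A> A foldChunks(Path file, int chunkSize, A initial,
                            BiFunction<A, byte[], A> step) throws IOException {
        A acc = initial;
        byte[] buf = new byte[chunkSize];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            // readNBytes returns 0 once the end of the stream is reached.
            while ((n = in.readNBytes(buf, 0, chunkSize)) > 0) {
                // Hand the step function a copy sized to what was actually read.
                acc = step.apply(acc, Arrays.copyOf(buf, n));
            }
        }
        return acc;
    }
}
```

For example, folding with `(sum, chunk) -> ...` over 64 KiB chunks would total the octets of a very large file while holding only one chunk in memory at a time.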

Well, with the EXPath HTTP Client module I did not need to redesign the
function signatures; I merely adapted the implementation to stream. So my
comment was more that I would have a look and see whether there is
anything in the proposed binary function signatures that we could not
implement in a streaming manner...
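As an illustration of keeping a signature and changing only the implementation, here is a Java sketch of a hypothetical `StreamingFind.indexOf` that returns the offset of the first occurrence of a byte pattern. Its shape (value in, offset out) would be the same for an in-memory value, but it reads the value as a stream and keeps only a pattern-sized window in memory. The naive comparison at every offset is O(n*m); a real implementation might prefer a linear-time matcher such as Knuth-Morris-Pratt.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: find a byte pattern in a value read as a stream,
// holding only the last pattern.length octets in a ring buffer.
final class StreamingFind {
    /** Returns the 0-based offset of the first match, or -1 if there is none. */
    static long indexOf(InputStream value, byte[] pattern) throws IOException {
        int m = pattern.length;
        if (m == 0) return 0;
        byte[] window = new byte[m]; // ring buffer of the last m octets read
        long read = 0;               // number of octets consumed so far
        int b;
        // Reads one octet at a time; wrap the source in a BufferedInputStream.
        while ((b = value.read()) != -1) {
            window[(int) (read % m)] = (byte) b;
            read++;
            // After the increment, slot (read % m) holds the oldest octet.
            if (read >= m && matchesAt(window, (int) (read % m), pattern)) {
                return read - m;
            }
        }
        return -1;
    }

    // Compares the ring buffer, starting at its oldest slot, with the pattern.
    private static boolean matchesAt(byte[] window, int oldest, byte[] pattern) {
        for (int i = 0; i < pattern.length; i++) {
            if (window[(oldest + i) % window.length] != pattern[i]) return false;
        }
        return true;
    }
}
```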

> Michael Kay
> Saxonica
>
>



-- 
Adam Retter

eXist Developer
{ United Kingdom }
adam@exist-db.org
irc://irc.freenode.net/existdb
Received on Wednesday, 13 March 2013 14:29:34 GMT
