- From: Adam Retter <adam@exist-db.org>
- Date: Wed, 13 Mar 2013 17:20:34 +0000
- To: Michael Kay <mike@saxonica.com>
- Cc: public-expath@w3.org
A second stream of consciousness:

I feel very uneasy about representing octets using integers. So I ask myself why? Is it history? i.e. eXist and Zorba (maybe others) represent octets as octal strings and allow for the conversion of octal to/from decimal with functions.

An octet is implicitly in Base 8, so why would I then want to manipulate it as though it were Base 10 (but without converting it to Base 10)? This just doesn't make sense to me. If I understand correctly:

bin:binary-to-octets(xs:hexBinary("FFFF")) would give me (255, 255)

The problem is that I now have two Base 8 values in a Base 10 data type, and all of the operators I have to work with this stuff (xs:integer) operate on Base 10 and so will not understand what I am trying to do with my Base 2 stuff.

Rather, if you want to go to Base 10 then:

bin:binary-to-radix(xs:hexBinary("FFFF"), 10) could give me 65535

At least when it's in Base 10 I can do math with it using the standard XQuery operators and then convert it back to xs:hexBinary if I wish to. I do not like values in disguise!

All of the bitwise operators seem to operate on xs:hexBinary, so how would I even work with these "octets" as integers? Perhaps we don't even need a binary-to-octets function? Is there a use-case?

If we need it, perhaps a better representation of 'octets' is one which is in the correct base, i.e. Base 8, and so which should use an octal string (which could start with a 0 or similar), e.g.:

bin:binary-to-octets(xs:hexBinary("FFFF")) would give me "0177777"

I recognise that the underlying problem is that XQuery does not have a 'byte' type, however using xs:integer for this feels wrong to me somehow (at least if you don't convert to the correct base).

Also from a selfish implementation point of view - using an xs:integer for representing a single byte is very wasteful in terms of memory.
xs:integer is unbounded in scope, and will typically be at least 32 bits, whereas a byte is just 8 bits and a character in a string can be just 8 or 16 bits.

On 13 March 2013 14:29, Adam Retter <adam@exist-db.org> wrote:
>>> Wow that's quite comprehensive :-)
>>> I will need to digest it fully yet, but I have a few initial questions -
>>>
>>> 1) Why the use of xs:hexBinary when most other EXPath function
>>> libraries (and in fact most 3rd party XQuery functions) I have seen
>>> use xs:base64Binary? Converting from one to the other is something
>>> that you *really* don't want to have to do, especially for large files!
>>
>> I think converting between hexBinary and base64Binary should be pretty much
>> a no-op for most processors: the internal representation of the value is
>> likely to be an immutable byte array, and conversion just means creating a
>> new wrapper around the byte array. But it's a user inconvenience. Actually
>> for input parameters, I don't see why we shouldn't accept either form.
>
> Hehe well of course it depends on implementation. I would advise from
> experience that you should not keep the data in RAM as an immutable
> byte array. Certainly in eXist this is what we used to do; as soon as
> you have a few large files you will quickly run out of memory and
> crash your processor. So what we rather do now, is in fact use an
> InputStream with some clever stuff to make it re-readable at any time,
> keeping a minimal amount in RAM and the rest either on disk or
> elsewhere - this is all open source of course, so if you're interested I
> can point you at the code which should be re-useable outside eXist
> too.
>
> My concern regarding the conversion was rather that if a user puts an
> intermediate step between xs:hexBinary and xs:base64Binary, perhaps
> xs:string - it may not be clear to them but they will pay a heavy
> price to encode the raw binary to hexBinary, decode it and then
> re-encode it as base64Binary.
> I really wish we could just settle on
> one binary type in XQuery. I would be interested to know a little
> history about why we have two of them, Mike?
>
>>> I'm just reading through the rest now, my main concern is that these
>>> operations can be done efficiently. I have been re-working the
>>> implementation of the common Java code for the EXPath http module to
>>> support streaming of large binary values and large string values. We
>>> have customers that want to work with binary and text documents that
>>> are several gigabytes each from XQuery.
>>>
>> Interesting question. I don't know how efficient direct access to binary
>> files is; if it's OK, then one could easily have an internal implementation
>> of a base64Binary value that's mapped directly to a file rather than to
>> memory, and perform all the operations directly on the file.
>
> Yup. See above.
>
>> But if
>> efficiency means maintaining a current position in the file and reading
>> what's at the current position, then that complicates the interface
>> considerably. It could be done using higher-order functions, but would be a
>> bit mind-blowing. Although we've got functions with side-effects in the File
>> module, they are external side-effects, and I'd be reluctant to design
>> anything with internal side-effects, e.g. on the current position of a file
>> handle.
>
> Well with the EXPath HTTP Client module, I did not need to re-design
> the function signatures, merely I adapted the implementation to do
> streaming. So my comment was more that I would have a look and see if
> there was anything we could not do using the proposed binary function
> signatures in a streaming manner...
>
>> Michael Kay
>> Saxonica

--
Adam Retter

eXist Developer
{ United Kingdom }
adam@exist-db.org
irc://irc.freenode.net/existdb
Received on Wednesday, 13 March 2013 17:21:05 UTC
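
The three readings of xs:hexBinary("FFFF") discussed in the message, (255, 255), 65535 and "0177777", can be checked with a short sketch. This is plain Python standing in for the proposed bin:* functions, not XQuery; the helper names below are hypothetical and are not part of any EXPath specification.

```python
# Illustrative Python only. The bin:* functions in the message were a proposal;
# these helper names are hypothetical stand-ins, not the EXPath API.

def binary_to_octets(hex_text: str) -> list:
    """One integer in the range 0..255 per byte, as in the (255, 255) example."""
    return list(bytes.fromhex(hex_text))

def binary_to_decimal(hex_text: str) -> int:
    """The whole value read as one unsigned big-endian integer (radix 10)."""
    return int.from_bytes(bytes.fromhex(hex_text), byteorder="big")

def binary_to_octal_string(hex_text: str) -> str:
    """The whole value as an octal string with a leading 0, as in "0177777"."""
    return "0" + format(binary_to_decimal(hex_text), "o")

def decimal_to_hex(value: int, length: int) -> str:
    """Convert an integer back to upper-case hex text that is `length` bytes long."""
    return value.to_bytes(length, byteorder="big").hex().upper()

print(binary_to_octets("FFFF"))        # [255, 255]
print(binary_to_decimal("FFFF"))       # 65535
print(binary_to_octal_string("FFFF"))  # 0177777
print(decimal_to_hex(65535, 2))        # FFFF
```

In this sketch the per-byte list and the single integer are produced from the same two bytes; the difference is only whether the value is split at byte boundaries or read as a whole before being printed in a given radix.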