Comments on binary specification

Excellent work!

Comments:

4.1 bin:unparsed-binary, rules para 1 "string representation" -> "binary 
representation".

This function seems to duplicate file:read-binary() in the File module. 
Let's reuse that instead. Also, since the file module has chosen 
base64Binary rather than hexBinary as its binary data type, let's use 
that (it's unlikely to affect the implementation, since the value space 
of both types is the same and conversion is probably a no-op except for 
the type label).

6.2 bin:unpack-string. This function is largely the composition of 
binary-subsequence and decode-string, which makes it largely a 
convenience function. The exception is that it can extract a subsequence 
based on the presence of a terminator, which suggests the need for a 
primitive such as bin:terminated-subsequence($in, $offset, $terminator), 
or perhaps more primitive still bin:find($in, $offset, $search) which 
returns the (relative?) offset of the first occurrence of $search after 
the specified $offset.

With the function as specified, you can get a zero-terminated string, 
but it's hard to tell how many bytes you have read, which makes it 
difficult to move the read position forward to get the next string after 
the terminator.
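
To illustrate why a find-style primitive helps here, a rough Python sketch 
follows. The names find and read_terminated are hypothetical stand-ins, not 
anything in the draft, and I have assumed that the returned offset is 
absolute and that the search starts at $offset itself; the "(relative?)" 
question above is still open.

```python
from typing import Optional, Tuple


def find(data: bytes, offset: int, search: bytes) -> Optional[int]:
    """Hypothetical analogue of bin:find.

    Returns the absolute offset of the first occurrence of `search`
    at or after `offset`, or None if there is no occurrence.
    """
    pos = data.find(search, offset)
    return None if pos < 0 else pos


def read_terminated(
    data: bytes, offset: int, terminator: bytes = b"\x00"
) -> Tuple[bytes, int]:
    """Return (payload, next_offset).

    The terminator is excluded from the payload (an assumption) and is
    skipped when computing next_offset, so the caller can keep reading.
    """
    end = find(data, offset, terminator)
    if end is None:
        raise ValueError("terminator not found")
    return data[offset:end], end + len(terminator)


# Two consecutive zero-terminated strings.
buf = b"alpha\x00beta\x00"
first, pos = read_terminated(buf, 0)
second, pos = read_terminated(buf, pos)
```

Because the caller gets the new read position back, moving on to the next 
string no longer requires guessing how many bytes were consumed.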

If the function is retained, then in the $size=0 case it needs to say 
whether the 0 octet is included in the result.

7 - numeric data.

I wish there were a way of doing this with fewer functions. I think I'd 
be inclined to keep float and double, and replace all the integer ones with

unpack-signed-integer($in, $offset, $length) => integer
unpack-unsigned-integer($in, $offset, $length) => integer
pack-integer($in, $length)

Apart from anything else, this allows you to read for example a 3-byte 
integer, which isn't that uncommon in binary formats. Also, looking at 
Saxon's PTree binary format as a use-case, there are integers whose size 
isn't known statically, which would make the supplied functions very 
unwieldy. We would have to say that handling $length>8 is 
implementation-dependent.
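
As a sketch of what the three proposed functions might do, here is a 
Python approximation. The byteorder parameter and its big-endian default 
are my assumption (the signatures above do not address byte order), as is 
the use of two's complement for negative values. Python integers are 
unbounded, so this sketch has no $length>8 limit; a real implementation 
might.

```python
def unpack_unsigned_integer(
    data: bytes, offset: int, length: int, byteorder: str = "big"
) -> int:
    """Read `length` octets at `offset` as an unsigned integer."""
    if offset < 0 or length < 0 or offset + length > len(data):
        raise ValueError("offset/length outside the binary data")
    return int.from_bytes(data[offset:offset + length], byteorder, signed=False)


def unpack_signed_integer(
    data: bytes, offset: int, length: int, byteorder: str = "big"
) -> int:
    """Read `length` octets at `offset` as a two's-complement integer."""
    if offset < 0 or length < 0 or offset + length > len(data):
        raise ValueError("offset/length outside the binary data")
    return int.from_bytes(data[offset:offset + length], byteorder, signed=True)


def pack_integer(value: int, length: int, byteorder: str = "big") -> bytes:
    """Encode `value` in exactly `length` octets.

    Negative values use two's complement; OverflowError is raised if
    the value does not fit in `length` octets.
    """
    return value.to_bytes(length, byteorder, signed=value < 0)


# A 3-byte integer read from the middle of a buffer.
three_byte = unpack_unsigned_integer(b"\x00\x01\x02\x03", 1, 3)
negative = unpack_signed_integer(b"\xff\xff\xfe", 0, 3)
packed = pack_integer(-2, 3)
```

Since $length is an ordinary argument, a size that is only known at run 
time (as in the PTree case) needs no dispatch over a family of fixed-width 
functions.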

8 - bitwise

You say the shorter operand is padded, but you don't say whether the 
padding is on the left or the right. I can see use cases for both. 
Perhaps we should require both to be the same length, and provide 
pad-left($in, $size) and pad-right($in, $size).
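
A small Python sketch of that alternative, to show how the two paddings 
give different answers. I am assuming here that $size is the number of 
octets to add and that the padding octet is zero; neither is settled 
above, and binary_and is only a stand-in for the bitwise functions in 
section 8.

```python
def pad_left(data: bytes, size: int) -> bytes:
    """Prepend `size` zero octets (assumed meaning of $size)."""
    if size < 0:
        raise ValueError("size must be non-negative")
    return b"\x00" * size + data


def pad_right(data: bytes, size: int) -> bytes:
    """Append `size` zero octets (assumed meaning of $size)."""
    if size < 0:
        raise ValueError("size must be non-negative")
    return data + b"\x00" * size


def binary_and(a: bytes, b: bytes) -> bytes:
    """Bitwise AND that requires operands of equal length."""
    if len(a) != len(b):
        raise ValueError("operands must be the same length")
    return bytes(x & y for x, y in zip(a, b))


value = b"\x0f\xf0\xff"
mask = b"\xff"

# Mask aligned with the low-order (last) octet.
low = binary_and(value, pad_left(mask, 2))
# Mask aligned with the high-order (first) octet.
high = binary_and(value, pad_right(mask, 2))
```

With explicit padding the caller states which alignment is intended, 
rather than relying on an unstated default.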

In 8.4 binary-not the signature incorrectly gives the function name as 
binary-and.

9 - serialization

I'm not happy that this departs from the spec of xsl:result-document 
which states:

The xsl:result-document instruction is used to create a final result 
tree. The content of the xsl:result-document element is a sequence 
constructor for the children of the document node of the tree. A 
document node is created, and the sequence obtained by evaluating the 
sequence constructor is used to construct the content of the document, 
as described in 5.7.1 Constructing Complex Content.

I think the EXPath file module already gives us file:write-binary; let's 
just re-use that. It has the advantage of working in XQuery as well (and 
it's free of the paternalistic XSLT rules which stop you writing twice 
to the same URI, etc).

Michael Kay
Saxonica

Received on Wednesday, 13 March 2013 10:02:19 UTC