Numeric 'strings' to binary forms

Part of the EXPath-binary discussion around xs:base64Binary extensions
concerns how to define simple 'binary' constants, both for generating
binary forms and for comparison, especially when used with other
functions in the package. Whilst extending the XPath literal grammar to
accept forms such as 0x45 might be nice, we know that's a VERY big task.
For the purposes of binary manipulation, most of what we may want to do
involves building small sections of binary data, either to add to an
output or to compare against some input, using the library of functions
being proposed.

For example, in decoding a JPEG file by sectors, you need to check that 
the first octet of the sector is '0xFF', and then decode the meaning of 
that sector from the second octet. In XSLT you want to say something like:

<xsl:when test="bin:subsequence($stream,$pointer,1) = bin:hex('FF')">
     <xsl:variable name="type"
         select="bin:subsequence($stream,$pointer+1,1)" as="xs:base64Binary"/>
     <xsl:choose>
         <xsl:when test="$type = bin:hex('E0')"> .. JFIF header....
         <xsl:when test="$type = bin:hex('DB')">... Quantisation table....
etc.

Since we can compare xs:base64Binary values for equality directly, such
cases need no conversion to integers or strings, provided the simple
'binary string constructors' (e.g. bin:hex()) generate xs:base64Binary.
Sub-octet comparisons can be done by suitable masking, e.g.
bin:and($comp,bin:binary('111')) to examine only the bottom three bits.
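
As a rough illustration of the masking idea, here is a Python model of an
octet-wise AND. The name bin_and and the rule that both operands must have
the same length are assumptions made for this sketch; the semantics of the
proposed bin:and() are not settled here.

```python
def bin_and(a: bytes, b: bytes) -> bytes:
    """Octet-wise AND of two equal-length byte strings (hypothetical model)."""
    if len(a) != len(b):
        # Assumption: operands of different lengths are rejected, not padded.
        raise ValueError("operands must have the same length")
    return bytes(x & y for x, y in zip(a, b))


# bin:binary('111') would be left-padded to the single octet 00000111.
MASK_LOW_3 = bytes([0b00000111])

# 0xE5 is 11100101, so only the bottom three bits (101) survive the mask.
low_bits = bin_and(bytes([0xE5]), MASK_LOW_3)
print(low_bits.hex())  # prints "05"
```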

We might care to consider whether we need any more than just three forms 
(as originally mooted by Michael Kay):

  * bin:hex('11223F4E')  =>  "ESI/Tg==" (when serialised base64)
  * bin:binary('1000111010101') => "EdU="
  * bin:octal('11223047') => "JSYn"

All have the signature bin:xxx(/xs:string?/) as /xs:base64Binary?/ and
pad with zero bits on the left up to the next octet boundary. The
'numbers' in the string are considered big-endian, i.e. the earlier
digits occupy earlier locations in the resultant byte stream. (Those who
want to live with octal have to suffer the consequences of their 3-bit
length, but there again without octal we'd never have had the delight of
the PDP-11 ;-))
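
To make the padding rule concrete, the following Python sketch models the
three constructors and reproduces the base64 values listed above. The
function names (bin_hex, bin_binary, bin_octal), the shared helper and the
choice to raise ValueError on a bad digit are assumptions of this sketch,
not part of the proposal.

```python
import base64
from typing import Optional

_ALPHABET = "0123456789abcdef"


def _from_digits(digits: str, bits_per_digit: int) -> bytes:
    """Pack big-endian digits into octets, zero-padding on the left."""
    allowed = _ALPHABET[: 2 ** bits_per_digit]
    if any(ch not in allowed for ch in digits.lower()):
        raise ValueError("invalid digit for this base")  # assumed behaviour
    value = int(digits, 2 ** bits_per_digit) if digits else 0
    # Round the total bit count up to a whole number of octets.
    n_octets = (len(digits) * bits_per_digit + 7) // 8
    return value.to_bytes(n_octets, "big")


def bin_hex(s: Optional[str]) -> Optional[bytes]:
    return None if s is None else _from_digits(s, 4)


def bin_binary(s: Optional[str]) -> Optional[bytes]:
    return None if s is None else _from_digits(s, 1)


def bin_octal(s: Optional[str]) -> Optional[bytes]:
    return None if s is None else _from_digits(s, 3)


def as_base64(data: bytes) -> str:
    """Serialise the octets the way xs:base64Binary would appear."""
    return base64.b64encode(data).decode("ascii")


print(as_base64(bin_hex("11223F4E")))          # prints "ESI/Tg=="
print(as_base64(bin_binary("1000111010101")))  # prints "EdU="
print(as_base64(bin_octal("11223047")))        # prints "JSYn"
```

The 13-bit binary example shows the left padding: three zero bits are
added in front to reach 16 bits, giving the octets 0x11 0xD5.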

A more general function which detects the 'base' from the start of the
string (e.g. 0x, 0...) could easily be written as a compound of these,
but I would expect that in most cases you'd know which base you wanted
to use, as these functions are mostly expected to be used to generate
constants.
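
Such a compound might look roughly like the Python sketch below. The name
bin_auto and the prefix set (0x for hex, 0o for octal, 0b for binary) are
assumptions chosen for illustration; the text above does not fix which
prefixes would be recognised.

```python
from typing import Optional

# Assumed prefixes, mapped to the number of bits each digit contributes.
_PREFIX_BITS = {"0x": 4, "0o": 3, "0b": 1}
_ALPHABET = "0123456789abcdef"


def bin_auto(s: Optional[str]) -> Optional[bytes]:
    """Choose the base from the prefix, then pack the digits big-endian."""
    if s is None:
        return None
    prefix, digits = s[:2].lower(), s[2:]
    if prefix not in _PREFIX_BITS:
        raise ValueError("unrecognised base prefix")
    bits = _PREFIX_BITS[prefix]
    allowed = _ALPHABET[: 2 ** bits]
    if any(ch not in allowed for ch in digits.lower()):
        raise ValueError("invalid digit for this base")
    value = int(digits, 2 ** bits) if digits else 0
    # Zero-pad on the left up to the next octet boundary.
    n_octets = (len(digits) * bits + 7) // 8
    return value.to_bytes(n_octets, "big")


print(bin_auto("0xFFDB").hex())  # prints "ffdb"
print(bin_auto("0b111").hex())   # prints "07"
```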


And such constants would not be limited to short lengths either:

    bin:find(value as /xs:base64Binary?/,  search as /xs:base64Binary/,
    offset as /xs:integer/) as /xs:integer?/

has been proposed to search in binary forms. So to find the first 
quantisation table in a JPEG:

    bin:find($jpeg,bin:hex('FFDB'),0)

should do the trick.
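
A minimal Python model of that search is sketched here. It assumes a
zero-based offset (as the 0 in the call above suggests), returns None
where the proposed function would return the empty sequence, and rejects
out-of-range offsets; all three points are assumptions. The sample bytes
are made up for the example and are not a real JPEG.

```python
from typing import Optional


def bin_find(value: Optional[bytes], search: bytes, offset: int) -> Optional[int]:
    """Return the index of the first match at or after offset, else None."""
    if value is None:
        return None
    if not 0 <= offset <= len(value):
        raise ValueError("offset out of range")  # assumed behaviour
    pos = value.find(search, offset)
    return None if pos < 0 else pos


# Made-up fragment: FFD8, then an FFE0 sector, then an FFDB sector.
jpeg = bytes.fromhex("FFD8FFE000104A464946FFDB0043")

print(bin_find(jpeg, bytes.fromhex("FFDB"), 0))  # prints 10
```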

Notes:

  * hexBinary forms can be obtained in XSLT by applying the wrapper
    functions bin:to-hexBinary(in /as xs:base64Binary*/) /as xs:hexBinary?/
    and bin:from-hexBinary(in /as xs:hexBinary*/) /as xs:base64Binary?/.
    (Separate discussion on their semantics.)
  * I'm also advocating against names of the form
    bin:binary-/functionName/(): with the bin: prefix the 'binary' is
    redundant, and it makes the function call too long to read
    comfortably.
  * bin:binary() isn't very comfortable as a (very overloaded) name, but
    I can't think of anything else.


-- 
*John Lumley* MA PhD CEng FIEE
john@saxonica.com
on behalf of Saxonica Ltd

Received on Tuesday, 9 July 2013 16:18:34 UTC