- From: John Lumley <john@saxonica.com>
- Date: Tue, 09 Jul 2013 17:18:07 +0100
- To: public-expath@w3.org
- Message-ID: <51DC37BF.6040402@saxonica.com>
Part of the EXPath-binary discussion around xs:base64Binary extensions
involves the issue of defining simple 'binary' constants for generation
of binary forms and comparison, especially when used with other functions in
the package. Whilst extending the XPath constant grammar to include 0x45
might be nice, we know that's a VERY big task, and for the purposes of
binary manipulation most of what we may want to do involves building
small sections of binary data either to add to an output or compare
against some input, using the library of functions being proposed.
For example, in decoding a JPEG file by sectors, you need to check that
the first octet of the sector is '0xFF', and then decode the meaning of
that sector from the second octet. In XSLT you want to say something like:
    <xsl:when test="bin:subsequence($stream,$pointer,1) = bin:hex('FF')">
      <xsl:variable name="type"
          select="bin:subsequence($stream,$pointer+1,1)" as="xs:base64Binary"/>
      <xsl:choose>
        <xsl:when test="$type = bin:hex('E0')"> .. JFIF header....
        <xsl:when test="$type = bin:hex('DB')">... Quantisation table....
        etc.
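
For readers who want to experiment outside an XSLT processor, the same
dispatch can be sketched in Python. This is only a model of the logic
above, not the proposed bin: library: the names SECTOR_TYPES and
sector_type are made up for the illustration, and the pointer is assumed
to be zero-based.

```python
# Hypothetical lookup table; only the two sector types named above are listed.
SECTOR_TYPES = {
    0xE0: "JFIF header",
    0xDB: "Quantisation table",
}

def sector_type(stream: bytes, pointer: int):
    """Return a label for the sector starting at `pointer`, or None if the
    first octet is not 0xFF (i.e. this is not the start of a sector)."""
    if stream[pointer] != 0xFF:
        return None
    # The second octet carries the meaning of the sector.
    return SECTOR_TYPES.get(stream[pointer + 1], "other")
```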
Since we can compare xs:base64Binary values for equality directly, such
cases need no conversion to integers or strings, provided the simple
'binary string constructors' (e.g. bin:hex()) generate xs:base64Binary.
Sub-octet comparisons can be done by suitable masking, e.g.
bin:and($comp,bin:binary('111')) to examine only the bottom three
bits...
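
A minimal Python sketch of that masking step, assuming bin:and is a
plain octet-by-octet AND over operands of equal length (how operands of
different lengths should be treated is not settled here, so the sketch
simply rejects them). The name bin_and is illustrative only.

```python
def bin_and(a: bytes, b: bytes) -> bytes:
    """Octet-wise AND of two equal-length binary values (assumed semantics)."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length in this sketch")
    return bytes(x & y for x, y in zip(a, b))

# bin:binary('111') padded from the left to one octet is 0x07.
mask = bytes([0b00000111])
comp = bytes([0xE5])            # 1110 0101
low_bits = bin_and(comp, mask)  # keeps only the bottom three bits: 0x05
```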
We might care to consider whether we need any more than just three forms
(as originally mooted by Michael Kay):
* bin:hex('11223F4E') => "ESI/Tg==" (when serialised base64)
* bin:binary('1000111010101') => "EdU="
* bin:octal('11223047') => "JSYn"
All have signature bin:xxx(/xs:string?/) as /xs:base64Binary?/ and pad
(effectively with zeroes) from the left up to the required octet boundary.
The 'numbers' in the string are considered big-endian, i.e. the earlier
digits occupy earlier locations in the resultant byte stream. (Those who want
to live with octal have to suffer the consequences of their 3-bit length
- but there again without octal we'd never have had the delight of the
PDP-11 ;-))
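
To check the three examples above, the padding rule can be modelled in a
few lines of Python. This is a sketch of the semantics as described in
this message, not an implementation of the proposed functions: the names
bin_hex, bin_binary, bin_octal and b64 are invented for the
illustration, None stands in for the empty sequence, and no validation
of the digit characters is attempted.

```python
import base64

def _from_digits(digits, bits_per_digit, base):
    """Pack a digit string big-endian, zero-padded on the left to an octet boundary."""
    if digits is None:          # models the empty sequence
        return None
    if digits == "":
        return b""
    total_bits = len(digits) * bits_per_digit
    n_octets = (total_bits + 7) // 8
    return int(digits, base).to_bytes(n_octets, "big")

def bin_hex(s):
    return _from_digits(s, 4, 16)

def bin_binary(s):
    return _from_digits(s, 1, 2)

def bin_octal(s):
    return _from_digits(s, 3, 8)

def b64(data):
    """Serialise as base64 text, as an xs:base64Binary would be."""
    return base64.b64encode(data).decode("ascii")

print(b64(bin_hex("11223F4E")))          # ESI/Tg==
print(b64(bin_binary("1000111010101")))  # EdU=
print(b64(bin_octal("11223047")))        # JSYn
```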
A more general function which detected the 'base' from the start of the
string (e.g. 0x, 0...) can be easily written as a compound, but I would
expect that in most cases you'd know what base you wanted to use, as
they are mostly expected to be used to generate constants.
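
One possible shape for such a compound, sketched in Python. The prefix
conventions are assumptions made for the example: '0x' for hex and a
bare leading '0' for octal follow the C habit hinted at above, while
'0b' for binary is an extra guess. The names bin_auto and _from_digits
are hypothetical.

```python
def _from_digits(digits, bits_per_digit, base):
    """Pack a digit string big-endian, zero-padded on the left to an octet boundary."""
    if digits == "":
        return b""
    n_octets = (len(digits) * bits_per_digit + 7) // 8
    return int(digits, base).to_bytes(n_octets, "big")

def bin_auto(s):
    """Choose the base from the start of the string (assumed prefix conventions)."""
    prefix = s[:2].lower()
    if prefix == "0x":                       # hex, e.g. 0xFFDB
        return _from_digits(s[2:], 4, 16)
    if prefix == "0b":                       # binary, e.g. 0b111 (assumed)
        return _from_digits(s[2:], 1, 2)
    if s.startswith("0") and len(s) > 1:     # octal, e.g. 011223047
        return _from_digits(s[1:], 3, 8)
    raise ValueError("no recognised base prefix: %r" % s)
```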
And such constants would not be limited to short lengths either:
bin:find(value as /xs:base64Binary?/, search as /xs:base64Binary/,
offset as /xs:integer/) as /xs:integer?/
has been proposed to search in binary forms. So to find the first
quantisation table in a JPEG:
bin:find($jpeg,bin:hex('FFDB'),0)
should do the trick.
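
The intended behaviour can be modelled in Python with bytes.find. The
sketch assumes a zero-based offset (as the 0 in the call above
suggests), a zero-based result, and None standing in for the empty
sequence when nothing is found or the value is empty; none of that is
settled by this message. The sample bytes are hand-made for the
illustration and are not a valid JPEG.

```python
def bin_find(value, search, offset):
    """Return the index of the first occurrence of `search` in `value`
    at or after `offset`, or None if there is none (assumed semantics)."""
    if value is None:               # models the empty sequence
        return None
    index = value.find(search, offset)
    return None if index < 0 else index

# Hand-made fragment: SOI marker, start of a JFIF sector, then FFDB.
jpeg = bytes.fromhex("FFD8FFE000104A464946FFDB0043")
first_qt = bin_find(jpeg, bytes.fromhex("FFDB"), 0)   # 10
```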
Notes:
* hexBinary forms can be defined in XSLT by applying
bin:to-hexBinary(in /as xs:base64Binary*/) /as xs:hexBinary?/ and
bin:from-hexBinary(in /as xs:hexBinary*/) /as xs:base64Binary?/
wrapper functions. (Separate discussion on their semantics.)
* I'm also advocating against the bin:binary-/functionName/() naming
form, as 'binary-' is redundant after the bin: prefix and makes the
function call too long to read comfortably.
* bin:binary() isn't very comfortable as a (very overloaded) name, but
I can't think of anything else.
--
*John Lumley* MA PhD CEng FIEE
john@saxonica.com
on behalf of Saxonica Ltd
Received on Tuesday, 9 July 2013 16:18:34 UTC