- From: John Lumley <john@saxonica.com>
- Date: Tue, 09 Jul 2013 17:18:07 +0100
- To: public-expath@w3.org
- Message-ID: <51DC37BF.6040402@saxonica.com>
Part of the EXPath-binary discussion around xs:base64Binary extensions involves the issue of defining simple 'binary' constants for generation of binary forms and comparison, especially used with other functions in the package.

Whilst extending the XPath constant grammar to include 0x45 might be nice, we know that's a VERY big task, and for the purposes of binary manipulation most of what we may want to do involves building small sections of binary data either to add to an output or compare against some input, using the library of functions being proposed.

For example, in decoding a JPEG file by sectors, you need to check that the first octet of the sector is '0xFF', and then decode the meaning of that sector from the second octet. In XSLT you want to say something like:

    <xsl:when test="bin:subsequence($stream, $pointer, 1) = bin:hex('FF')">
      <xsl:variable name="type" select="bin:subsequence($stream, $pointer + 1, 1)" as="xs:base64Binary"/>
      <xsl:choose>
        <xsl:when test="$type = bin:hex('E0')"> .. JFIF header.... </xsl:when>
        <xsl:when test="$type = bin:hex('DB')"> ... Quantisation table.... </xsl:when>
        etc.
      </xsl:choose>
    </xsl:when>

Since we can compare xs:base64Binary values for equality directly, such cases need no conversion to integers or strings, provided the simple 'binary string constructors' (e.g. bin:hex()) generate xs:base64Binary. Sub-octet comparisons can be done by suitable masking with e.g. bin:and($comp, bin:binary('111')) to examine only the bottom three bits...

We might care to consider whether we need any more than just three forms (as originally mooted by Michael Kay):

  * bin:hex('11223F4E') => "ESI/Tg==" (when serialised base64)
  * bin:binary('1000111010101') => "EdU="
  * bin:octal('11223047') => "JSYn"

All have the signature bin:xxx(xs:string?) as xs:base64Binary? and pad with zero bits on the left up to the required octet boundary. The 'numbers' in the string are considered big-endian, i.e. the earlier digits occupy earlier locations in the resultant byte stream.
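To make the left-padding and big-endian rules concrete, here is a minimal Python sketch of how the three constructors and the masking step might behave. It is only a model of the semantics described above, not the proposed XPath functions themselves: the names bin_hex, bin_binary, bin_octal, bin_and and b64 are hypothetical stand-ins, and requiring equal-length operands in bin_and is an assumption, since the semantics of bin:and are not fixed here.

```python
import base64
from typing import Optional


def _from_digits(digits: Optional[str], bits_per_digit: int) -> Optional[bytes]:
    """Pack a big-endian digit string into octets, zero-padding on the left."""
    if digits is None:          # models the empty sequence in xs:string?
        return None
    if digits == "":
        return b""
    value = int(digits, 1 << bits_per_digit)        # base 2, 8 or 16
    total_bits = len(digits) * bits_per_digit       # leading zeros count
    n_octets = (total_bits + 7) // 8                # round up to octet boundary
    return value.to_bytes(n_octets, "big")


def bin_hex(s: Optional[str]) -> Optional[bytes]:
    return _from_digits(s, 4)


def bin_binary(s: Optional[str]) -> Optional[bytes]:
    return _from_digits(s, 1)


def bin_octal(s: Optional[str]) -> Optional[bytes]:
    return _from_digits(s, 3)


def bin_and(a: bytes, b: bytes) -> bytes:
    """Octet-wise AND; equal lengths are an assumption of this sketch."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    return bytes(x & y for x, y in zip(a, b))


def b64(data: bytes) -> str:
    """Serialise octets the way an xs:base64Binary value would print."""
    return base64.b64encode(data).decode("ascii")


print(b64(bin_hex("11223F4E")))            # ESI/Tg==
print(b64(bin_binary("1000111010101")))    # EdU=
print(b64(bin_octal("11223047")))          # JSYn
print(bin_and(bin_hex("FD"), bin_binary("111")).hex())  # 05
```

The three printed base64 strings match the examples listed above, and the last line shows the 'bottom three bits' mask applied to a single octet.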
(Those who want to live with octal have to suffer the consequences of their 3-bit length - but there again without octal we'd never have had the delight of the PDP-11 ;-))

A more general function which detected the 'base' from the start of the string (e.g. 0x, 0...) can be easily written as a compound, but I would expect that in most cases you'd know what base you wanted to use, as these functions are mostly expected to be used to generate constants.

And such constants would not be limited to short lengths either:

    bin:find(value as xs:base64Binary?, search as xs:base64Binary, offset as xs:integer) as xs:integer?

has been proposed to search in binary forms. So to find the first quantisation table in a JPEG:

    bin:find($jpeg, bin:hex('FFDB'), 0)

should do the trick.

Notes:

  * hexBinary forms can be defined in XSLT by applying bin:to-hexBinary(in as xs:base64Binary*) as xs:hexBinary? and bin:from-hexBinary(in as xs:hexBinary*) as xs:base64Binary? wrapper functions. (Separate discussion on their semantics.)
  * I'm also advocating bin:functionName() rather than bin:binary-functionName(): with the bin: prefix the 'binary-' is redundant, and it makes the function call too long to read comfortably.
  * bin:binary() isn't very comfortable as a (very overloaded) name, but I can't think of anything else.

--
John Lumley MA PhD CEng FIEE
john@saxonica.com
on behalf of Saxonica Ltd
Received on Tuesday, 9 July 2013 16:18:34 UTC