- From: John Lumley <john@saxonica.com>
- Date: Mon, 15 Jul 2013 17:19:14 +0100
- To: public-expath@w3.org
- Message-ID: <51E42102.8000408@saxonica.com>
*Warning - heated arguments may ensue...*

In the first draft of the Binary Module <http://expath.org/spec/binary>, conversion between binary and numeric (integer, float, double) forms was proposed to default to little-endian, with an extended form of each conversion function (an additional argument) supporting big-endian storage in the binary form, e.g. |bin:unpack-float($bin)| is little-endian and |bin:unpack-float($bin,true())| is big-endian. As far as I can find, no comments have been raised about this.

Given that the choice of default influences code length quite considerably (or we could define suitably named wrappers), there should be some discussion and consensus on which default to choose.

One way to think about it is where the number that has to be packed or unpacked arose from, or is going to. One can argue that in the environment of XSLT/XQuery/XPath execution, any endianness in the machine numbers defined as constants or computed as |xs:number| types is irrelevant, as they have no sense of 'address' accessible through the X* execution model. Whilst the x86 architecture is little-endian, and ARM is now bi-endian, does this have any significant impact on performance at the higher levels of the execution models we anticipate for this module?

If the binary data is being consumed from some outside source, or written to one, then we have a variety of different contexts:

* Numbers in network data (RFC 1700 <http://www.ietf.org/rfc/rfc1700.txt>) are usually expected to be big-endian.
* Image formats vary: JPEG and PNG <http://www.w3.org/TR/2003/REC-PNG-20031110/#7Integers-and-byte-order> are big-endian, BMP and GIF are little-endian, and TIFF can be either (it indicates which with a specific palindromic marker).
* Audio and video formats vary considerably.
* Some formats that appear binary(ish) don't have endian issues: e.g. PostScript and PDF have (uncompressed) numbers encoded as ASCII (decimal) strings.
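To make the two byte orders concrete, here is a minimal sketch in Python, outside the X* environment and purely as an illustration. It uses Python's standard struct module rather than anything in the proposed Binary Module, and the helper names tiff_byte_order and unpack_uint32 are hypothetical. It packs one 32-bit unsigned integer both ways and selects the byte order from a TIFF-style two-byte marker ('MM' = 0x4D4D for big-endian, 'II' = 0x4949 for little-endian):

```python
import struct


def tiff_byte_order(header: bytes) -> str:
    """Return the struct byte-order prefix implied by a TIFF-style marker.

    Hypothetical helper for illustration: 'MM' (0x4D4D) means big-endian,
    'II' (0x4949) means little-endian.
    """
    marker = header[:2]
    if marker == b"MM":
        return ">"
    if marker == b"II":
        return "<"
    raise ValueError("not a TIFF byte-order marker")


def unpack_uint32(data: bytes, offset: int, order: str) -> int:
    """Read a 32-bit unsigned integer at `offset` using the given byte order."""
    return struct.unpack_from(order + "I", data, offset)[0]


value = 0xFACE9D78

big = struct.pack(">I", value)     # big-endian (network order)
little = struct.pack("<I", value)  # little-endian (x86 order)

print(big.hex().upper())     # FACE9D78
print(little.hex().upper())  # 789DCEFA

# The same four bytes decode to different numbers under each byte order,
# so the marker has to be consulted before unpacking.
sample = b"II" + little
order = tiff_byte_order(sample)
print(hex(unpack_uint32(sample, 2, order)))  # 0xface9d78
```

Only the big-endian form matches the digit order of the hex string as written. The little-endian form reverses the bytes.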
Other notes:

In cases where endianness can vary between data instances, such as TIFF, some global or tunnelled variable (XSLT) could be set and referenced, e.g.:

    <xsl:variable name="BIG" select="bin:subsequence($tiff, location, 2) = bin:hex('4D4D')"/>
    ...
    bin:unpack-unsigned-integer($tiff, $loc, $len, $BIG)
    ...

or in XPath 3.0, curried functions could be used:

    ...
    <xsl:variable name="bin:unpack-uint" select="bin:unpack-unsigned-integer(?,?,?,$BIG)"/>
    ...
    $bin:unpack-uint($tiff, $loc, $len)
    ...

I assume byte-order-mark (BOM) labelling of encoded XML is not relevant to this issue, as such XML won't be generated or consumed in a binary manner. Unless, of course, a multi-encoding data source is encountered, in which case binary pre-splitting may be required.

Question: what are the use cases that would make /extensive/ use of numeric packing and unpacking into binary file forms, presumably for interfacing with other (non-network?) applications? And what are the endianness defaults for such applications?

[My preference would be big-endian, if only for network-order compatibility and also given that the proposed 'string-constant' functions, |bin:hex('FACE9D78')| etc., will treat their numbers as big-endian...]

*John Lumley* MA PhD CEng FIEE
john@saxonica.com <mailto:john@saxonica.com>
on behalf of Saxonica Ltd
Received on Monday, 15 July 2013 16:19:36 UTC