Binary module - find()

In commenting on the original draft proposal, Mike Kay suggested that a
find function, something like:

    bin:find($in as xs:base64Binary?, $offset as xs:integer,
             $pattern as xs:base64Binary) as xs:integer?

would be useful for searching through binary data, and as a complement
to bin:decode-string(). While I was putting together some proposed spec
changes and a trial implementation for this, a few points arose that may
be worth a little discussion.

At first I thought: this is an equivalent of fn:index-of($seq,$search), 
which returns a sequence of the indices of the members of $seq that are 
equal to $search. But what we are proposing is /not quite the same/: 
we're looking for occurrences of a sequence of pattern bytes in the input 
byte sequence, whereas fn:index-of matches single items only. In our 
case, if we decided to return all the 'matches', we could have overlap:

    bin:find((3,4,4,4,4,5),0,(4,4)) => (1,2,3)

(I have used the octet representation, as I think most of us can't read 
base64 directly ;-). $offset is zero-based.)
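
As a rough illustration of that overlap, here is a minimal sketch in Python. It is not part of the proposal: the name find_all_overlapping is hypothetical, the data is modelled as a Python bytes value rather than xs:base64Binary, and it assumes a non-negative, zero-based $offset and that an empty pattern yields no matches.

```python
from typing import List

def find_all_overlapping(data: bytes, offset: int, pattern: bytes) -> List[int]:
    """Return every zero-based index >= offset at which pattern starts in data."""
    if not pattern:
        return []  # assumption: an empty pattern yields no matches
    # Last index at which the whole pattern can still fit inside data.
    last_start = len(data) - len(pattern)
    # Test every candidate start position, so overlapping hits are all reported.
    return [i for i in range(offset, last_start + 1)
            if data[i:i + len(pattern)] == pattern]

print(find_all_overlapping(bytes([3, 4, 4, 4, 4, 5]), 0, bytes([4, 4])))  # [1, 2, 3]
```

Because every start position is tested independently, the three hits at 1, 2 and 3 share bytes with one another, which is exactly what fn:index-of can never produce.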

That led me to think that it had more in common with substring 
matching, but then I realised there isn't any fn:find-substring() 
function - the closest would be to build some compound function using 
fn:substring-before() or fn:tokenize($in,$pattern) and examine the 
return values.
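
To illustrate the kind of compound function meant here, a minimal sketch in Python, where str.partition stands in for fn:substring-before(); the name index_via_substring_before is hypothetical. One wrinkle it has to handle: fn:substring-before() returns a zero-length string both when the pattern is absent and when it matches at the very start, so a separate containment test is needed to tell the two apart. The sketch uses the separator returned by partition for that.

```python
from typing import Optional

def index_via_substring_before(text: str, pattern: str) -> Optional[int]:
    """Zero-based index of the first occurrence of pattern in text, or None."""
    # str.partition raises ValueError for an empty pattern, so reject it first
    # (assumption: an empty pattern counts as "no match").
    if not pattern:
        return None
    before, sep, _after = text.partition(pattern)
    # sep is empty when the pattern was not found; otherwise the length of
    # the part before the match is the index of the match.
    return len(before) if sep else None

print(index_via_substring_before("abcabc", "ca"))  # 2
print(index_via_substring_before("abc", "abc"))    # 0
print(index_via_substring_before("abc", "x"))      # None
```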

My suggestion is that we stay with the bin:find() function as declared 
above, just returning the index of the /first/ occurrence, or the empty 
sequence if there is none. Those who want /all/ can build a compound 
iterative/recursive function using bin:find():

    <xsl:function name="bin:find-all" as="xs:integer*">
        <xsl:param name="data" as="xs:base64Binary?"/>
        <xsl:param name="offset" as="xs:integer"/>
        <xsl:param name="pattern" as="xs:base64Binary"/>
        <!-- Test with exists(): a match at index 0 has a false effective
             boolean value, so a bare if($found) would miss it. -->
        <xsl:sequence
            select="let $found := bin:find($data, $offset, $pattern)
                    return
                        if (exists($found)) then
                            ($found,
                             if ($found + 1 lt bin:length($data)) then
                                 bin:find-all($data, $found + 1, $pattern)
                             else ())
                        else ()"/>
    </xsl:function>

John
-- 
*John Lumley* MA PhD CEng FIEE
john@saxonica.com
on behalf of Saxonica Ltd

Received on Friday, 19 July 2013 09:47:20 UTC