findword.py

Disasterously slow. I was going to use this perhaps as a bit of base
code for a Busy Spunge/Blogspace/Memesh style application--one of
those things that I've been wanting to do for years--but on second
thoughts I don't even like the approach. I found that "grep" is many
times faster, even though it searches by line and not by file. And the
main problem that I'm having at the moment is not with the datamodel,
per se, but the interface: I've no idea how to get the sort of GUI
that I want at yet keep it portable and so forth. And I'd rather work
on a line-mode style interface...

http://www.w3.org/DesignIssues/Conversations

...since we've already good information management tools that use
that. Hmm... that reminds me of the little `u note` hack that I set up
not so long ago:-

#!/usr/bin/sh
echo $* \(`u timenow`\) >> notes

(u timenow just outputs a YYYYMMDD-HHmmSS string).

I've been thinking about JavaScript and SVG: it's fairly well
supported, and one can do great things with the combination if Jim
Ley's stuff is anything to go by. But I haven't really got time to
spend on this: I should just collect all my snippets of notes that
I've made over the months, and dump them into www-archive.

Anyway, here's findword.py. GPL 2: share and enjoy (ha).

<findword.py>: [[
#!/usr/bin/python
"""
findword.py, by Sean B. Palmer

find words, much like a search engine
"""

import sys, re

# { 'word': [list of items] }

scanWords = re.compile(r'[A-Za-z0-9]+').findall

def build(filenames):
   mappings = {}

   for fn in filenames:
      s = open(fn).read()
      for word in scanWords(s):
         if mappings.has_key(word):
            mappings[word][fn] = 0
         else: mappings[word] = {fn:0}

   return mappings

def common(mappings, words):
   assert len(words) > 0

   lengths = {}
   for word in words:
      length = len(mappings[word].keys())
      if not lengths.has_key(length):
         lengths[length] = [word]
      else: lengths[length].append(word)
   lens = lengths.keys()
   lens.sort()
   swords = []
   for x in lens:
      swords.extend(lengths[x])

   candidates = {}
   first, rest = swords[0], swords[1:]
   for item in mappings[first].keys():
      candidates[item] = 0

   while 1:
      if len(rest) == 0: break
      first, rest = rest[0], rest[1:]
      for item in candidates.keys():
         if not mappings[first].has_key(item):
            del candidates[item]

   return candidates.keys()

def test():
   x = { 'm': {'x':0, 'y':0, 'z':0, 'p':0},
         'n': {'p':0, 'q':0, 'r':0, 'x':0},
         'o': {'y':0, 'p':0, 's':0, 't':0, 'w':0, 'z':0} }
   print common(x, ['m', 'n'])
   print common(x, ['m', 'n', 'o'])
   print common(x, ['m'])
   print common(x, ['n', 'o'])

def main(argv):
   if argv[0].endswith('make'):
      print `build(argv[1:])`
   elif argv[0].endswith('find'):
      mappings = eval(open(argv[1]).read())
      words = argv[2:]
      print common(mappings, words)
   else: print __doc__

if __name__=="__main__":
   main(sys.argv[1:])
]]

--
Sean B. Palmer, <http://purl.org/net/sbp/>
"phenomicity by the bucketful" - http://miscoranda.com/

Received on Thursday, 13 February 2003 10:34:19 UTC