Re: Is it just me or does this seem incredibly slow? from Garrett Wollman on 2006-02-13 (semantic-web@w3.org from February 2006)

From: Garrett Wollman <wollman+semantic-web@bimajority.org>
Date: Mon, 13 Feb 2006 14:07:07 -0500
To: semantic-web@w3.org
Message-ID: <17392.55515.737953.627143@khavrinen.csail.mit.edu>

So here's what I ended up doing....

My photo gallery application, as I described a month or so ago, now
generates an RDF index file in addition to the HTML index file.  I use
cwm to apply hoisting rules based on the OWL definitions of the
ontologies that I'm using and some domain-specific knowledge.  The
resulting index pages are then squished together (also using cwm), and
then imported into a Redland hash store.  This part of the process is
automated using a Makefile:

------------------------------------------------------------------------
SITE=           /home/gallery/data
URL=            http://gallery.bostonradio.org
CWM=            /usr/local/bin/cwm.py
CWMFLAGS=       --closure=e
RDFPROC_ENV=    RDFPROC_STORAGE_OPTIONS="hash-type='bdb',dir='.',index-predicates='yes'"
ONTOLOGIES=     http://xmlns.com/foaf/0.1/index.rdf \
                http://meta.bostonradio.org/radio \
                http://meta.bostonradio.org/fcc \
                http://www.holygoat.co.uk/owl/2005/05/photo/ \
                http://daml.umbc.edu/ontologies/ittalks/address

INDEX_FILES!=   cd $(SITE); find * -name index.rdf
DB_FILES=       p2so.db po2s.db so2p.db sp2o.db

all:            stamp-db-loaded

.for index in $(INDEX_FILES)
all-triples.rdf: ${.OBJDIR}/${index:.rdf=.n3}

${.OBJDIR}/${index:.rdf=.n3}:   ${SITE}/${index} ${.CURDIR}/owl.n3 ${.CURDIR}/photo.n3
        mkdir -p $$(dirname ${.OBJDIR}/${index})
        ${CWM} ${CWMFLAGS} --rdf ${ONTOLOGIES} ${URL}/${index} \
                --n3=r ${.CURDIR}/owl.n3 ${.CURDIR}/photo.n3 \
                --think >${.OBJDIR}/${index:.rdf=.n3}.new
        mv -f ${.OBJDIR}/${index:.rdf=.n3}.new ${.OBJDIR}/${index:.rdf=.n3}
.endfor

all-triples.rdf:
        ${CWM} ${CWMFLAGS} --n3 ${.ALLSRC} --rules --data --rdf >${.TARGET}.new
        mv -f ${.TARGET}.new ${.TARGET}

stamp-db-loaded: all-triples.rdf
        ${RDFPROC_ENV} rdfproc -n loading parse all-triples.rdf
.for db in ${DB_FILES}
        mv -f loading-${db} production-${db}
.endfor
        touch stamp-db-loaded
------------------------------------------------------------------------

This whole process, with five photo galleries currently indexed, takes
about an hour (nearly all of which is cwm crunching on the data).  The
hoisting rules are things like this:

{ ?x photo:concreteRepresentationOf ?y .
  ?y ?p ?sub } => { ?x ?p ?sub }.
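
To make the effect of such a rule concrete, here is a minimal Ruby sketch that applies the same hoisting to an in-memory list of triples, repeating until nothing new is derived. It is only an illustration and not how cwm works internally; the triple representation (three-element arrays), the `hoist` function, and all node names and sample data are assumptions made up for the example.

```ruby
require 'set'

# Hypothetical prefixed name standing in for the real predicate URI.
CONCRETE = 'photo:concreteRepresentationOf'

# Apply the rule
#   { ?x photo:concreteRepresentationOf ?y . ?y ?p ?sub } => { ?x ?p ?sub }
# to a list of [subject, predicate, object] arrays until no new triples appear.
def hoist(triples)
  known = Set.new(triples)
  loop do
    derived = []
    known.each do |x, link, y|
      next unless link == CONCRETE
      # Copy every statement about the abstract thing ?y onto ?x.
      known.each do |s, p, o|
        derived << [x, p, o] if s == y
      end
    end
    before = known.size
    known.merge(derived)
    break if known.size == before  # fixed point reached
  end
  known.to_a
end

# Made-up sample data: one image file representing one abstract photo.
triples = [
  ['_:file1', CONCRETE, '_:photo1'],
  ['_:photo1', 'foaf:depicts', '_:person1'],
  ['_:photo1', 'dc:title', 'Tower at dusk']
]

hoisted = hoist(triples)
hoisted.each { |t| puts t.join(' ') }
```

After hoisting, the image file carries the `foaf:depicts` and `dc:title` statements of the abstract photo directly, so a query can go straight from a depicted thing to the files that show it.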

After creating the full store, there then came the question of
building systems to query it, which led very quickly (or not) to my
original message in this thread, wondering why my SPARQL queries were
running so slowly.  Faced with the prospect of debugging the query
engine, I decided instead that it would be more fruitful to use some
of my domain-specific knowledge to write an explicit query function in
a procedural language.  After casting around somewhat, I found the
(undocumented) medium-level Ruby bindings for Redland, and with some
amount of hacking came up with a process that generates results at a
reasonable speed:

------------------------------------------------------------------------
require 'rdf/redland'
require 'rdf/redland/store'
require 'namespaces'

module GalleryMetadata
  include Namespaces

  Store = Redland::HashStore.new('bdb', 'production', '/home/gallery/index', false,
                                 false, false)
  Model = Redland::Model.new(Store)

# stuff elided

  # Find things (normally blank nodes) of type TYPE.  If a block is given,
  # include only those things for which the block evaluates to true.
  def things_of_type(type)
    things = []
    subjects(RDF['type'], type) do |node|
      if block_given?
        if yield node
          things << node
        end
      else
        things << node
      end
    end
    things.uniq!
    return things
  end

  # Find photos depicting DEPICTED_THINGS.  For each thing given, find
  # all of the photographs which show it.  Yields the smallest non-thumbnail
  # image file for each photo.
  def find_photos_for(depicted_things)
    # For each depicted_thing, find all the photos which aren't
    # thumbnails but do depict the thing.
    abstract_photos = {}
    depicted_things.each do |object|
      subjects(FOAF['depicts'], object) do |photo|
        if include?(photo, RDF['type'], PHOTO['ImageFile'])
          photo = ImageFile.new(photo)
          objects(photo.node, PHOTO['concreteRepresentationOf']) do |abstract|
            unless photo.thumbnail?
              # The same blank node may appear as multiple Ruby
              # objects, so we have to make sure to use the name
              # and not the object when using blank nodes as hash
              # keys.
              abstract = abstract.to_s
              unless abstract_photos.has_key?(abstract)
                abstract_photos[abstract] = []
              end
              abstract_photos[abstract] << photo
            end
          end
        end
      end
    end

    result_set = []
    abstract_photos.each do |photo, imageset|
      if imageset.length > 0
        smallest = imageset.min do |a, b|
          a.width * a.height <=> b.width * b.height
        end
        if block_given?
          yield smallest
        end
        result_set << smallest
      end
    end

    return result_set
  end
end
------------------------------------------------------------------------
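
The grouping step in find_photos_for can be tried without a Redland store. The sketch below uses stand-in classes (`BlankNode` and `Image` are hypothetical and are not the Redland or gallery classes) and assumes, as the comment in the code above says, that the same blank node can come back as distinct Ruby objects. Under that assumption, keying the hash on the node's name rather than on the object is what keeps all image files of one abstract photo together.

```ruby
# Stand-in for a node wrapper that does not define its own equality,
# so two wrappers for the same blank node are different hash keys.
class BlankNode
  def initialize(name)
    @name = name
  end

  def to_s
    @name
  end
end

# Stand-in for an image file with pixel dimensions.
Image = Struct.new(:file, :width, :height)

# Made-up data: (abstract photo, concrete image file) pairs.
pairs = [
  [BlankNode.new('_:photo1'), Image.new('a-large.jpg', 1600, 1200)],
  [BlankNode.new('_:photo1'), Image.new('a-medium.jpg', 800, 600)],
  [BlankNode.new('_:photo2'), Image.new('b-medium.jpg', 640, 480)]
]

by_object = Hash.new { |h, k| h[k] = [] }
by_name = Hash.new { |h, k| h[k] = [] }
pairs.each do |node, image|
  by_object[node] << image     # keyed on object identity
  by_name[node.to_s] << image  # keyed on the blank node's name
end

# Pick the image with the smallest pixel area for each abstract photo.
smallest = by_name.values.map do |images|
  images.min_by { |image| image.width * image.height }
end

puts "groups by object: #{by_object.size}, groups by name: #{by_name.size}"
smallest.each { |image| puts image.file }
```

Keyed on the object, the two files of `_:photo1` land in separate groups and both would be reported as "smallest"; keyed on the name, they are compared against each other as intended.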

Then a simple query application, like
<http://gallery.bostonradio.org/cgi-bin/person.cgi>, boils down quite
nicely to:

------------------------------------------------------------------------
  people = things_of_type(FOAF['Person']) do |person|
    include?(person, FOAF['name'], cgi['q'])
  end

  items = []
  find_photos_for(people) do |image|
    thumb = image.thumbnail
    items << {
      :desc => Amrita.a({:href => image.description}) do
        image.title
      end,
      :thumb => Amrita.a({:href => image.description}) do
        Amrita.e(:img, :src => thumb.node.uri.to_s,
                 :width => thumb.width,
                 :height => thumb.height,
                 :alt => "A photo depicting #{cgi['q']}")
      end
    }
  end

  do_template(cgi, "OK", "images.html",
              { :name => cgi['q'], :items => items, :count => items.length })
------------------------------------------------------------------------
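
The query above depends on things_of_type accepting an optional block as a filter. For readers without a Redland store at hand, here is a small in-memory sketch of that pattern. `TinyStore` and its sample triples are hypothetical; its `subjects` and `include?` methods only mimic the shape of the helpers elided from the module above and say nothing about how those are actually implemented.

```ruby
# Hypothetical in-memory stand-in for the model and its helper methods.
class TinyStore
  def initialize(triples)
    @triples = triples
  end

  # Yield each subject that has the given predicate and object.
  def subjects(predicate, object)
    @triples.each { |s, p, o| yield s if p == predicate && o == object }
  end

  # True if exactly this triple is in the store.
  def include?(subject, predicate, object)
    @triples.include?([subject, predicate, object])
  end

  # Things of the given type; with a block, only those the block accepts.
  def things_of_type(type)
    things = []
    subjects('rdf:type', type) do |node|
      things << node if !block_given? || yield(node)
    end
    things.uniq
  end
end

# Made-up sample data.
store = TinyStore.new([
  ['_:p1', 'rdf:type', 'foaf:Person'],
  ['_:p1', 'foaf:name', 'Alice Example'],
  ['_:p2', 'rdf:type', 'foaf:Person'],
  ['_:p2', 'foaf:name', 'Bob Example'],
  ['_:s1', 'rdf:type', 'radio:Station']
])

everyone = store.things_of_type('foaf:Person')
alices = store.things_of_type('foaf:Person') do |person|
  store.include?(person, 'foaf:name', 'Alice Example')
end

puts everyone.inspect
puts alices.inspect
```

The sketch returns `things.uniq` where the module calls `things.uniq!` and then returns `things`; the two are equivalent here, but `uniq!` returns nil when there is nothing to remove, so its result should not be returned directly.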

-GAWollman
Received on Monday, 13 February 2006 19:07:18 UTC