Re: Is it just me or does this seem incredibly slow?

From: Garrett Wollman <wollman+semantic-web@bimajority.org>
Date: Mon, 13 Feb 2006 14:07:07 -0500
Message-ID: <17392.55515.737953.627143@khavrinen.csail.mit.edu>
To: semantic-web@w3.org

So here's what I ended up doing....

My photo gallery application, as I described a month or so ago, now
generates an RDF index file in addition to the HTML index file.  I use
cwm to apply hoisting rules based on the OWL definitions of the
ontologies that I'm using and some domain-specific knowledge.  The
resulting index pages are then squished together (also using cwm), and
then imported into a Redland hash store.  This part of the process is
automated using a Makefile:

SITE=           /home/gallery/data
URL=            http://gallery.bostonradio.org
CWM=            /usr/local/bin/cwm.py
CWMFLAGS=       --closure=e
RDFPROC_ENV=    RDFPROC_STORAGE_OPTIONS="hash-type='bdb',dir='.',index-predicates='yes'"
ONTOLOGIES=     http://xmlns.com/foaf/0.1/index.rdf \
                http://meta.bostonradio.org/radio \
                http://meta.bostonradio.org/fcc \
                http://www.holygoat.co.uk/owl/2005/05/photo/ \

INDEX_FILES!=   cd $(SITE); find * -name index.rdf
DB_FILES=       p2so.db po2s.db so2p.db sp2o.db

all:            stamp-db-loaded

.for index in $(INDEX_FILES)
all-triples.rdf: ${.OBJDIR}/${index:.rdf=.n3}

${.OBJDIR}/${index:.rdf=.n3}:   ${SITE}/${index} ${.CURDIR}/owl.n3 ${.CURDIR}/photo.n3
        mkdir -p $$(dirname ${.OBJDIR}/${index})
        ${CWM} ${CWMFLAGS} --rdf ${ONTOLOGIES} ${URL}/${index} \
                --n3=r ${.CURDIR}/owl.n3 ${.CURDIR}/photo.n3 \
                --think >${.OBJDIR}/${index:.rdf=.n3}.new
        mv -f ${.OBJDIR}/${index:.rdf=.n3}.new ${.OBJDIR}/${index:.rdf=.n3}
.endfor

all-triples.rdf:
        ${CWM} ${CWMFLAGS} -n3 ${.ALLSRC} --rules --data --rdf >${.TARGET}.new
        mv -f ${.TARGET}.new ${.TARGET}

stamp-db-loaded: all-triples.rdf
        ${RDFPROC_ENV} rdfproc -n loading parse all-triples.rdf
.for db in ${DB_FILES}
        mv -f loading-${db} production-${db}
.endfor
        touch stamp-db-loaded

This whole process, with five photo galleries currently indexed, takes
about an hour (nearly all of which is cwm crunching on the data).  The
hoisting rules are things like this:

{ ?x photo:concreteRepresentationOf ?y .
  ?y ?p ?sub } => { ?x ?p ?sub }.
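
To make the effect of that rule concrete, here is a minimal sketch in plain Ruby rather than cwm. It assumes triples are simple [subject, predicate, object] arrays and uses made-up blank-node names; the `hoist` helper is hypothetical and only mimics what `--think` does for this one rule.

```ruby
# Illustrative only: sample identifiers, not data from the real gallery.
CONCRETE = 'photo:concreteRepresentationOf'

# Forward-chain the hoisting rule until no new triples appear:
#   { ?x CONCRETE ?y . ?y ?p ?sub } => { ?x ?p ?sub }
def hoist(triples)
  known = triples.dup
  loop do
    added = []
    known.each do |x, p1, y|
      next unless p1 == CONCRETE
      known.each do |s, p, o|
        next unless s == y
        candidate = [x, p, o]
        added << candidate unless known.include?(candidate) || added.include?(candidate)
      end
    end
    break if added.empty?
    known.concat(added)
  end
  known
end

triples = [
  ['_:file1', CONCRETE, '_:photo1'],
  ['_:photo1', 'foaf:depicts', '_:person1'],
]
hoisted = hoist(triples)
hoisted.each { |t| puts t.join(' ') }
```

Every property of the abstract photo is copied onto each concrete image file, which is why the query code can ask an image file directly what it depicts.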

After creating the full store, there then came the question of
building systems to query it, which led very quickly (or not) to my
original message in this thread, wondering why my SPARQL queries were
running so slowly.  Faced with the prospect of debugging the query
engine, I decided instead that it would be more fruitful to use some
of my domain-specific knowledge to write an explicit query function in
a procedural language.  After casting around somewhat, I found the
(undocumented) medium-level Ruby bindings for Redland, and with some
amount of hacking came up with a process that generates results at a
reasonable speed:

require 'rdf/redland'
require 'rdf/redland/store'
require 'namespaces'

module GalleryMetadata
  include Namespaces

  Store = Redland::HashStore.new('bdb', 'production', '/home/gallery/index', false,
                                 false, false)
  Model = Redland::Model.new(Store)

# stuff elided

  # Find things (normally blank nodes) of type TYPE.  If a block is given,
  # include only those things for which the block evaluates to true.
  def things_of_type(type)
    things = []
    subjects(RDF['type'], type) do |node|
      if block_given?
        if yield node
          things << node
        end
      else
        things << node
      end
    end
    return things
  end

  # Find photos depicting DEPICTED_THINGS.  For each thing given, find
  # all of the photographs which show it.  Yields the smallest non-thumbnail
  # image file for each photo.
  def find_photos_for(depicted_things)
    # For each depicted_thing, find all the photos which aren't
    # thumbnails but do depict the 
    abstract_photos = {}
    depicted_things.each do |object|
      subjects(FOAF['depicts'], object) do |photo|
        if include?(photo, RDF['type'], PHOTO['ImageFile'])
          photo = ImageFile.new(photo)
          objects(photo.node, PHOTO['concreteRepresentationOf']) do |abstract|
            unless photo.thumbnail?
              # The same blank node may appear as multiple Ruby
              # objects, so we have to make sure to use the name
              # and not the object when using blank nodes as hash
              # keys.
              abstract = abstract.to_s
              unless abstract_photos.has_key?(abstract)
                abstract_photos[abstract] = []
              end
              abstract_photos[abstract] << photo
            end
          end
        end
      end
    end

    result_set = []
    abstract_photos.each do |photo, imageset|
      if imageset.length > 0
        smallest = imageset.min do |a, b|
          a.width * a.height <=> b.width * b.height
        end
        if block_given?
          yield smallest
        end
        result_set << smallest
      end
    end

    return result_set
  end
end
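
The two ideas in find_photos_for that are easiest to get wrong are keying the hash on the blank node's name rather than the node object, and picking the smallest image by pixel area. The following is a rough stand-alone sketch of both, assuming nothing about Redland: `Node` and `Image` are hypothetical stand-ins for the binding's node class and the ImageFile wrapper, and the file names are sample data.

```ruby
# Hypothetical stand-in for a blank node: two objects can carry the same
# name but are distinct Ruby objects (and therefore distinct hash keys).
class Node
  def initialize(name)
    @name = name
  end

  def to_s
    @name
  end
end

# Hypothetical stand-in for ImageFile; only the fields needed here.
Image = Struct.new(:abstract, :uri, :width, :height)

# Group images by the *name* of their abstract photo, then keep the
# smallest image (by pixel area) from each group.
def smallest_per_photo(images)
  images.group_by { |img| img.abstract.to_s }
        .map { |_name, set| set.min_by { |img| img.width * img.height } }
end

images = [
  Image.new(Node.new('_:p1'), 'a-large.jpg', 1600, 1200),
  Image.new(Node.new('_:p1'), 'a-medium.jpg', 800, 600),
  Image.new(Node.new('_:p2'), 'b-medium.jpg', 640, 480),
]

by_object = images.group_by(&:abstract)   # keyed on the objects themselves
smallest  = smallest_per_photo(images)    # keyed on the node names
puts by_object.length
puts smallest.map(&:uri).inspect
```

Grouping on the node objects splits the two `_:p1` images into separate groups, which is the problem the `to_s` call in the real code avoids.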

Then a simple query application, like
<http://gallery.bostonradio.org/cgi-bin/person.cgi>, boils down quite
nicely to:

  people = things_of_type(FOAF['Person']) do |person|
    include?(person, FOAF['name'], cgi['q'])
  end

  items = []
  find_photos_for(people) do |image|
    thumb = image.thumbnail
    items << {
      :desc => Amrita.a({:href => image.description}) do
        # ...
      end,
      :thumb => Amrita.a({:href => image.description}) do
        Amrita.e(:img, :src => thumb.node.uri.to_s,
                 :width => thumb.width,
                 :height => thumb.height,
                 :alt => "A photo depicting #{cgi['q']}")
      end
    }
  end

  do_template(cgi, "OK", "images.html",
              { :name => cgi['q'], :items => items, :count => items.length })
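
For anyone who wants to try the type-plus-filter pattern without a Redland store, here is a small sketch over an in-memory array of triples. `TinyStore` is a hypothetical class written for this example; its `subjects` and `include?` methods only imitate the shape of the helpers used above and are not the Redland API.

```ruby
# Hypothetical in-memory store: triples are [subject, predicate, object].
class TinyStore
  def initialize(triples)
    @triples = triples
  end

  # Yield every subject that has the given predicate and object.
  def subjects(predicate, object)
    @triples.each { |s, p, o| yield s if p == predicate && o == object }
  end

  # True if the exact triple is present.
  def include?(subject, predicate, object)
    @triples.include?([subject, predicate, object])
  end

  # Collect subjects of the given type; an optional block filters them.
  def things_of_type(type)
    things = []
    subjects('rdf:type', type) do |node|
      things << node if !block_given? || yield(node)
    end
    things
  end
end

# Sample data only.
store = TinyStore.new([
  ['_:a', 'rdf:type', 'foaf:Person'],
  ['_:a', 'foaf:name', 'Alice'],
  ['_:b', 'rdf:type', 'foaf:Person'],
  ['_:b', 'foaf:name', 'Bob'],
  ['_:c', 'rdf:type', 'photo:ImageFile'],
])

everyone = store.things_of_type('foaf:Person')
alices = store.things_of_type('foaf:Person') do |person|
  store.include?(person, 'foaf:name', 'Alice')
end
puts everyone.inspect
puts alices.inspect
```

The filter block plays the role of the `include?(person, FOAF['name'], cgi['q'])` test in person.cgi: candidates come from one cheap type lookup, and each one is then checked with an exact-triple probe.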

Received on Monday, 13 February 2006 19:07:18 UTC
