- From: Garrett Wollman <wollman+semantic-web@bimajority.org>
- Date: Mon, 13 Feb 2006 14:07:07 -0500
- To: semantic-web@w3.org
So here's what I ended up doing....

My photo gallery application, as I described a month or so ago, now generates an RDF index file in addition to the HTML index file. I use cwm to apply hoisting rules based on the OWL definitions of the ontologies that I'm using and some domain-specific knowledge. The resulting index pages are then squished together (also using cwm), and then imported into a Redland hash store. This part of the process is automated using a Makefile:

------------------------------------------------------------------------
SITE=		/home/gallery/data
URL=		http://gallery.bostonradio.org
CWM=		/usr/local/bin/cwm.py
CWMFLAGS=	--closure=e
RDFPROC_ENV=	RDFPROC_STORAGE_OPTIONS="hash-type='bdb',dir='.',index-predicates='yes'"
ONTOLOGIES=	http://xmlns.com/foaf/0.1/index.rdf \
		http://meta.bostonradio.org/radio \
		http://meta.bostonradio.org/fcc \
		http://www.holygoat.co.uk/owl/2005/05/photo/ \
		http://daml.umbc.edu/ontologies/ittalks/address

INDEX_FILES!=	cd $(SITE); find * -name index.rdf
DB_FILES=	p2so.db po2s.db so2p.db sp2o.db

all: stamp-db-loaded

.for index in $(INDEX_FILES)
all-triples.rdf: ${.OBJDIR}/${index:.rdf=.n3}
${.OBJDIR}/${index:.rdf=.n3}: ${SITE}/${index} ${.CURDIR}/owl.n3 ${.CURDIR}/photo.n3
	mkdir -p $$(dirname ${.OBJDIR}/${index})
	${CWM} ${CWMFLAGS} --rdf ${ONTOLOGIES} ${URL}/${index} \
		--n3=r ${.CURDIR}/owl.n3 ${.CURDIR}/photo.n3 \
		--think >${.OBJDIR}/${index:.rdf=.n3}.new
	mv -f ${.OBJDIR}/${index:.rdf=.n3}.new ${.OBJDIR}/${index:.rdf=.n3}
.endfor

all-triples.rdf:
	${CWM} ${CWMFLAGS} -n3 ${.ALLSRC} --rules --data --rdf >${.TARGET}.new
	mv -f ${.TARGET}.new ${.TARGET}

stamp-db-loaded: all-triples.rdf
	${RDFPROC_ENV} rdfproc -n loading parse all-triples.rdf
.for db in ${DB_FILES}
	mv -f loading-${db} production-${db}
.endfor
	touch stamp-db-loaded
------------------------------------------------------------------------

This whole process, with five photo galleries currently indexed, takes about an hour (nearly all of which is cwm crunching on the data).
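
One detail of the Makefile worth spelling out: every cwm invocation writes to a ".new" file and only then renames it over the real target. The message does not say why, but a common reason for this idiom is that a failed or interrupted run then cannot leave a truncated target behind with a fresh timestamp. A minimal Ruby sketch of the same pattern follows; the helper name replace_atomically is hypothetical and is not part of the gallery code.

```ruby
require 'tmpdir'

# Hypothetical helper, for illustration only: write the new contents to
# "<path>.new" and rename it over <path> once the block has finished.
# If the block raises, <path> is left exactly as it was (the ".new" file
# stays behind, as it would after a failed recipe in the Makefile).
def replace_atomically(path)
  tmp = "#{path}.new"
  File.open(tmp, 'w') { |f| yield f }
  File.rename(tmp, path)   # like "mv -f" within one filesystem
  path
end

# Usage: a scratch directory stands in for ${.OBJDIR}.
Dir.mktmpdir do |dir|
  target = File.join(dir, 'all-triples.rdf')
  File.write(target, "old contents\n")
  replace_atomically(target) { |f| f.write("new contents\n") }
  puts File.read(target)
end
```

The rename only replaces the target in one step when both names are on the same filesystem, which holds here because the temporary name is derived from the target's own path.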
The hoisting rules are things like this:

    { ?x photo:concreteRepresentationOf ?y . ?y ?p ?sub } => { ?x ?p ?sub }.

After creating the full store, there then came the question of building systems to query it, which led very quickly (or not) to my original message in this thread, wondering why my SPARQL queries were running so slowly. Faced with the prospect of debugging the query engine, I decided instead that it would be more fruitful to use some of my domain-specific knowledge to write an explicit query function in a procedural language. After casting around somewhat, I found the (undocumented) medium-level Ruby bindings for Redland, and with some amount of hacking came up with a process that generates results at a reasonable speed:

------------------------------------------------------------------------
require 'rdf/redland'
require 'rdf/redland/store'
require 'namespaces'

module GalleryMetadata
  include Namespaces

  Store = Redland::HashStore.new('bdb', 'production', '/home/gallery/index',
                                 false, false, false)
  Model = Redland::Model.new(Store)

  # stuff elided

  # Find things (normally blank nodes) of type TYPE.  If a block is given,
  # include only those things for which the block evaluates to true.
  def things_of_type(type)
    things = []
    subjects(RDF['type'], type) do |node|
      if block_given?
        if yield node
          things << node
        end
      else
        things << node
      end
    end
    things.uniq!
    return things
  end

  # Find photos depicting DEPICTED_THINGS.  For each thing given, find
  # all of the photographs which show it.  Yields the smallest non-thumbnail
  # image file for each photo.
  def find_photos_for(depicted_things)
    # For each depicted_thing, find all the photos which aren't
    # thumbnails but do depict the
    abstract_photos = {}
    depicted_things.each do |object|
      subjects(FOAF['depicts'], object) do |photo|
        if include?(photo, RDF['type'], PHOTO['ImageFile'])
          photo = ImageFile.new(photo)
          objects(photo.node, PHOTO['concreteRepresentationOf']) do |abstract|
            unless photo.thumbnail?
              # The same blank node may appear as multiple Ruby
              # objects, so we have to make sure to use the name
              # and not the object when using blank nodes as hash
              # keys.
              abstract = abstract.to_s
              unless abstract_photos.has_key?(abstract)
                abstract_photos[abstract] = []
              end
              abstract_photos[abstract] << photo
            end
          end
        end
      end
    end

    result_set = []
    abstract_photos.each do |photo, imageset|
      if imageset.length > 0
        smallest = imageset.min do |a, b|
          a.width * a.height <=> b.width * b.height
        end
        if block_given?
          yield smallest
        end
        result_set << smallest
      end
    end
    return result_set
  end
end
------------------------------------------------------------------------

Then a simple query application, like <http://gallery.bostonradio.org/cgi-bin/person.cgi>, boils down quite nicely to:

------------------------------------------------------------------------
people = things_of_type(FOAF['Person']) do |person|
  include?(person, FOAF['name'], cgi['q'])
end

items = []
find_photos_for(people) do |image|
  thumb = image.thumbnail
  items << {
    :desc => Amrita.a({:href => image.description}) do image.title end,
    :thumb => Amrita.a({:href => image.description}) do
      Amrita.e(:img, :src => thumb.node.uri.to_s,
               :width => thumb.width, :height => thumb.height,
               :alt => "A photo depicting #{cgi['q']}")
    end
  }
end

do_template(cgi, "OK", "images.html",
            { :name => cgi['q'], :items => items, :count => items.length })
------------------------------------------------------------------------

-GAWollman
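
For readers without a Redland store to hand, the grouping step in find_photos_for can be tried in isolation. The sketch below is only an illustration under stated assumptions: FakeNode and FakeImage are made-up stand-ins for the Redland node objects and the ImageFile wrapper, smallest_per_abstract is a hypothetical name, and min_by (from current Ruby) replaces the min-with-block call used in the message. It shows why the hash is keyed on the node's name rather than the node object, and how the smallest image per abstract photo is picked by pixel area.

```ruby
# Stand-in for a node object: a plain class, so two instances naming the
# same blank node are still different hash keys (an assumption mirroring
# the behaviour described in the comment in find_photos_for).
class FakeNode
  def initialize(name)
    @name = name
  end

  def to_s
    @name
  end
end

# Stand-in for the ImageFile wrapper; only the fields used here.
FakeImage = Struct.new(:uri, :width, :height)

# pairs is an array of [abstract_node, image].  Group the images by the
# node's name, then keep the image with the smallest area in each group.
def smallest_per_abstract(pairs)
  by_name = Hash.new { |hash, key| hash[key] = [] }
  pairs.each { |node, image| by_name[node.to_s] << image }
  by_name.values.map do |images|
    images.min_by { |image| image.width * image.height }
  end
end

pairs = [
  [FakeNode.new('_:b1'), FakeImage.new('a-large.jpg', 1600, 1200)],
  [FakeNode.new('_:b1'), FakeImage.new('a-small.jpg', 640, 480)],
  [FakeNode.new('_:b2'), FakeImage.new('b-only.jpg', 800, 600)]
]
smallest_per_abstract(pairs).each { |image| puts image.uri }
```

Keyed on the FakeNode objects themselves, the same input would fall into three groups instead of two, and both sizes of the first photo would be returned.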
Received on Monday, 13 February 2006 19:07:18 UTC