- From: Garrett Wollman <wollman+semantic-web@bimajority.org>
- Date: Mon, 13 Feb 2006 14:07:07 -0500
- To: semantic-web@w3.org
So here's what I ended up doing....
My photo gallery application, as I described a month or so ago, now
generates an RDF index file in addition to the HTML index file. I use
cwm to apply hoisting rules based on the OWL definitions of the
ontologies that I'm using and some domain-specific knowledge. The
resulting index pages are then squished together (also using cwm), and
then imported into a Redland hash store. This part of the process is
automated using a Makefile:
------------------------------------------------------------------------
SITE= /home/gallery/data
URL= http://gallery.bostonradio.org
CWM= /usr/local/bin/cwm.py
CWMFLAGS= --closure=e
RDFPROC_ENV= RDFPROC_STORAGE_OPTIONS="hash-type='bdb',dir='.',index-predicates='yes'"
ONTOLOGIES= http://xmlns.com/foaf/0.1/index.rdf \
	http://meta.bostonradio.org/radio \
	http://meta.bostonradio.org/fcc \
	http://www.holygoat.co.uk/owl/2005/05/photo/ \
	http://daml.umbc.edu/ontologies/ittalks/address
INDEX_FILES!= cd $(SITE); find * -name index.rdf
DB_FILES= p2so.db po2s.db so2p.db sp2o.db

all: stamp-db-loaded

.for index in $(INDEX_FILES)
all-triples.rdf: ${.OBJDIR}/${index:.rdf=.n3}
${.OBJDIR}/${index:.rdf=.n3}: ${SITE}/${index} ${.CURDIR}/owl.n3 ${.CURDIR}/photo.n3
	mkdir -p $$(dirname ${.OBJDIR}/${index})
	${CWM} ${CWMFLAGS} --rdf ${ONTOLOGIES} ${URL}/${index} \
		--n3=r ${.CURDIR}/owl.n3 ${.CURDIR}/photo.n3 \
		--think >${.OBJDIR}/${index:.rdf=.n3}.new
	mv -f ${.OBJDIR}/${index:.rdf=.n3}.new ${.OBJDIR}/${index:.rdf=.n3}
.endfor

all-triples.rdf:
	${CWM} ${CWMFLAGS} -n3 ${.ALLSRC} --rules --data --rdf >${.TARGET}.new
	mv -f ${.TARGET}.new ${.TARGET}

stamp-db-loaded: all-triples.rdf
	${RDFPROC_ENV} rdfproc -n loading parse all-triples.rdf
.for db in ${DB_FILES}
	mv -f loading-${db} production-${db}
.endfor
	touch stamp-db-loaded
------------------------------------------------------------------------
This whole process, with five photo galleries currently indexed, takes
about an hour (nearly all of which is cwm crunching on the data). The
hoisting rules are things like this:
    { ?x photo:concreteRepresentationOf ?y .
      ?y ?p ?sub } => { ?x ?p ?sub }.
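To make the effect of that rule concrete, here is a minimal Ruby sketch
that applies the same hoisting to an in-memory list of triples. It does
not use cwm or Redland: the hoist function, the plain-array triple
representation, and all of the sample identifiers are assumptions made
for illustration. It also makes only a single pass, whereas cwm's
--think keeps applying rules until nothing new can be derived.

```ruby
require 'set'

# Triples are plain [subject, predicate, object] arrays. All names below are
# hypothetical sample data, not taken from the real gallery index.
CONCRETE = 'photo:concreteRepresentationOf'

# One pass of the rule: for every (?x concreteRepresentationOf ?y) and every
# (?y ?p ?sub), add (?x ?p ?sub). Returns the original triples plus the
# hoisted ones, without duplicates.
def hoist(triples)
  by_subject = triples.group_by { |subject, _pred, _obj| subject }
  result = Set.new(triples)
  triples.each do |x, pred, y|
    next unless pred == CONCRETE
    (by_subject[y] || []).each do |_subject, hoisted_pred, sub|
      result << [x, hoisted_pred, sub]
    end
  end
  result.to_a
end

triples = [
  ['_:file1',  CONCRETE,       '_:photo1'],
  ['_:photo1', 'foaf:depicts', '_:person1'],
  ['_:photo1', 'dc:title',     'Tower at sunset']
]

hoisted = hoist(triples)

# Print only the triples that the rule added.
(hoisted - triples).each { |triple| puts triple.join(' ') }
```

In this sample the two properties of _:photo1 (what it depicts and its
title) are copied onto _:file1, which is what lets later queries ask
the image file directly what it depicts.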
With the full store created, the next question was how to build
systems to query it, which led very quickly (or not) to my
original message in this thread, wondering why my SPARQL queries were
running so slowly. Faced with the prospect of debugging the query
engine, I decided instead that it would be more fruitful to use some
of my domain-specific knowledge to write an explicit query function in
a procedural language. After casting around somewhat, I found the
(undocumented) medium-level Ruby bindings for Redland, and with some
amount of hacking came up with a process that generates results at a
reasonable speed:
------------------------------------------------------------------------
require 'rdf/redland'
require 'rdf/redland/store'
require 'namespaces'

module GalleryMetadata
  include Namespaces

  Store = Redland::HashStore.new('bdb', 'production', '/home/gallery/index', false,
                                 false, false)
  Model = Redland::Model.new(Store)

  # stuff elided

  # Find things (normally blank nodes) of type TYPE. If a block is given,
  # include only those things for which the block evaluates to true.
  def things_of_type(type)
    things = []
    subjects(RDF['type'], type) do |node|
      if block_given?
        if yield node
          things << node
        end
      else
        things << node
      end
    end
    things.uniq!
    return things
  end

  # Find photos depicting DEPICTED_THINGS. For each thing given, find
  # all of the photographs which show it. Yields the smallest non-thumbnail
  # image file for each photo.
  def find_photos_for(depicted_things)
    # For each depicted_thing, find all the photos which aren't
    # thumbnails but do depict the thing.
    abstract_photos = {}
    depicted_things.each do |object|
      subjects(FOAF['depicts'], object) do |photo|
        if include?(photo, RDF['type'], PHOTO['ImageFile'])
          photo = ImageFile.new(photo)
          objects(photo.node, PHOTO['concreteRepresentationOf']) do |abstract|
            unless photo.thumbnail?
              # The same blank node may appear as multiple Ruby
              # objects, so we have to make sure to use the name
              # and not the object when using blank nodes as hash
              # keys.
              abstract = abstract.to_s
              unless abstract_photos.has_key?(abstract)
                abstract_photos[abstract] = []
              end
              abstract_photos[abstract] << photo
            end
          end
        end
      end
    end

    result_set = []
    abstract_photos.each do |photo, imageset|
      if imageset.length > 0
        smallest = imageset.min do |a, b|
          a.width * a.height <=> b.width * b.height
        end
        if block_given?
          yield smallest
        end
        result_set << smallest
      end
    end
    return result_set
  end
end
------------------------------------------------------------------------
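The grouping-and-selection step at the heart of find_photos_for can be
tried without a Redland store. The sketch below is a stand-in built on
assumptions: FakeImage is a hypothetical Struct replacing the elided
ImageFile class, smallest_per_photo is a name invented here, and the
abstract-photo names are made-up strings standing in for the result of
abstract.to_s. It uses min_by rather than min with a comparison block,
which should select the same image.

```ruby
# Hypothetical stand-in for the ImageFile class, which is elided in the post.
FakeImage = Struct.new(:name, :abstract, :width, :height, :is_thumbnail) do
  def thumbnail?
    is_thumbnail
  end
end

# Group non-thumbnail images by the string name of the abstract photo they
# represent, then keep the image with the smallest pixel area in each group.
def smallest_per_photo(images)
  groups = Hash.new { |hash, key| hash[key] = [] }
  images.each do |image|
    next if image.thumbnail?
    groups[image.abstract] << image
  end
  groups.values.map do |imageset|
    imageset.min_by { |image| image.width * image.height }
  end
end

# Made-up sample data: three renditions of one photo, one of another.
images = [
  FakeImage.new('a-thumb.jpg',  '_:photo1', 100,  75,   true),
  FakeImage.new('a-medium.jpg', '_:photo1', 800,  600,  false),
  FakeImage.new('a-full.jpg',   '_:photo1', 3000, 2000, false),
  FakeImage.new('b-full.jpg',   '_:photo2', 2000, 3000, false)
]

chosen = smallest_per_photo(images)
chosen.each { |image| puts "#{image.abstract}: #{image.name}" }
```

Keying the hash on a string rather than on the node object mirrors the
comment in the code above: two Ruby objects that wrap the same blank
node would otherwise end up in separate groups.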
Then a simple query application, like
<http://gallery.bostonradio.org/cgi-bin/person.cgi>, boils down quite
nicely to:
------------------------------------------------------------------------
people = things_of_type(FOAF['Person']) do |person|
  include?(person, FOAF['name'], cgi['q'])
end

items = []
find_photos_for(people) do |image|
  thumb = image.thumbnail
  items << {
    :desc => Amrita.a({:href => image.description}) do
      image.title
    end,
    :thumb => Amrita.a({:href => image.description}) do
      Amrita.e(:img, :src => thumb.node.uri.to_s,
               :width => thumb.width,
               :height => thumb.height,
               :alt => "A photo depicting #{cgi['q']}")
    end
  }
end

do_template(cgi, "OK", "images.html",
            { :name => cgi['q'], :items => items, :count => items.length })
------------------------------------------------------------------------
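The optional-block filter that things_of_type relies on can also be
exercised on its own. What follows is a sketch only: TinyStore, with
its subjects and include? methods, is a hypothetical in-memory stand-in
for the Redland-backed helpers elided above, and the people in it are
sample data.

```ruby
# Hypothetical in-memory stand-in for the Redland model and its helpers.
class TinyStore
  def initialize(triples)
    @triples = triples
  end

  # Yield every subject that has the given predicate and object.
  def subjects(predicate, object)
    @triples.each do |subject, pred, obj|
      yield subject if pred == predicate && obj == object
    end
  end

  # True if the exact triple is present.
  def include?(subject, predicate, object)
    @triples.include?([subject, predicate, object])
  end

  # Same shape as things_of_type in the post: collect subjects of the given
  # type, keeping only those that the optional block accepts.
  def things_of_type(type)
    things = []
    subjects('rdf:type', type) do |node|
      things << node if !block_given? || yield(node)
    end
    things.uniq
  end
end

store = TinyStore.new([
  ['_:p1', 'rdf:type',  'foaf:Person'],
  ['_:p1', 'foaf:name', 'Alice Example'],
  ['_:p2', 'rdf:type',  'foaf:Person'],
  ['_:p2', 'foaf:name', 'Bob Example'],
  ['_:d1', 'rdf:type',  'foaf:Document']
])

query = 'Alice Example' # plays the role of cgi['q']
people = store.things_of_type('foaf:Person') do |person|
  store.include?(person, 'foaf:name', query)
end
puts people.inspect
```

Without a block the same call returns every foaf:Person in the store,
so one function covers both the "all of them" and the "only matching"
cases.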
-GAWollman
Received on Monday, 13 February 2006 19:07:18 UTC