- From: Dave Beckett <dave.beckett@bristol.ac.uk>
- Date: Thu, 03 Oct 2002 10:46:42 +0100
- To: public-esw@w3.org
Trip Report by Dave Beckett on visit to MINDSWAP group,
University of Maryland, College Park, MD, USA
2002-09-23 to 2002-09-27
[ This trip was partially funded by SWAD Europe ]
Monday 2002-09-23
The first working day, I had mostly with Bijan Parsia (hosting me)
going over the background and the mindswap groups' projects. The
group:
http://www.mindswap.org/
is based at the MINDlab at the University of Maryland (UMD),
College Park which is just North of Washington DC. The group is
relatively new and last year Professor Jim Hendler started teaching
the first semantic web classes to the UMD students, several of who
now work for the group.
Jim is best known for his work on SHOE (Simple HTML Ontology
Extensions) - http://www.cs.umd.edu/projects/plus/SHOE/ - but has a
background in agents, robots and knowledge representation. He was
seconded to DARPA for several years to run the Darpa Agent Markup
Language (DAML) project - http://www.daml.org/ - which later fed
into the DAML+OIL ontology language, that became the basis of the
W3C's work. The latter is being developed by the Web Ontology
Working Group (WOWG) which Jim co-chairs:
http://www.w3.org/2001/sw/WebOnt/
and the language is now called OWL (Web Ontology Language).
The group now works on several projects related to DAML+OIL/OWL,
RDF and Parka, more of which later. They have created several RDF
and Ontology markup tools for desktop use that allow attaching of
ontological information (properties, classes) to descriptions,
creating instance data. The two main ones are:
SMORE - Semantic Markup, Ontology and RDF Editor
http://www.mindswap.org/~aditkal/editor.shtml
RIC - RDF Instance Creator
http://www.mindswap.org/~mhgrove/RIC/RIC.shtml
which are both Java+Swing applications.
The Parka system - http://www.mindswap.org/2002/parka/ -
is a knowledge base (in the AI sense) and has been deployed and
used commercially for several years; but can be considered a large
triple store, thus suitable for RDF/OWL storage. Parka has
recently been released as open source software by Mindswap -
http://www.mindswap.org/2002/parka/ - but is a little raw at
present. They have been talking to me about this while I was
working on the SWAD Europe storage report:
http://www.w3.org/2001/sw/Europe/reports/rdf_scalable_storage_report/
and it looked interesting. Bijan arranged for one of the original technical
developers on the project to visit on Wednesday for a runthrough of
the system.
The group is also working on DAML-S - http://www.daml.org/services/
- a DAML-based Web Service Ontology. This allows description of
web services that can be tied to WSDL and used with SOAP and so on
to create web services in a semantic web style. There is a new
version 0.7 of the DAML-S ontology due out soon.
Tuesday 2002-09-24
I was pleased to find out that Jim Hendler had made himself
available to me all day; which was an unexpected delight. I gave
him an outline of SWAD Europe work and the possible areas for
collaboration we might have. It had already been noted that their
work on the storage systems and mine on the report above and with
my Redland system - http://www.redland.opensource.ac.uk/ - would be
useful to do collaborate on. Benchmarks and standard datasets for
testing stores are further possibilities.
Mindswap also plan to use Redland software as an interface layer
above Parka for using and abstracting from it, so that they can
potentially swap out the backend to another datastore for testing,
benchmarking; rather than build applications with a parka-only
binding.
The other main relationship with SWAD Europe work was in querying;
since Jim et al had worked on DAML query languages and this effort
was continuing. The SWAD Europe work on semweb QLs is extensive
and they are keen for collaboration in developing use cases,
documenting these things and moving them towards standardising them
in due course.
Jim, Bijan and I also discussed the W3C semantic web standards
efforts and sketched out some solutions to tricky syntax problems
that were being raised by WebOnt. It hopefully quickened the
resolution since I could discuss pro-s and con-s of different
solutions directly. The particular topic here was OWL's proposed
use of elements outside rdf:RDF for doing ontology description.
After seeing some of the desktop tools, I outlined the work I did
on the MEG Registry project - http://137.222.34.57:6543/ - and the
desktop schema creation tool created by Damian Steer that talks to
it - http://www.ukoln.ac.uk/metadata/education/regproj/
This is something that I plan to show at the SWAD Europe workshop
at the Dublin Core conference in Florence in mid-October.
18:00 CMSC 828y class - AI on the Web
http://www.cs.umd.edu/users/hendler/CMSC828y/
Sat in on their class where Jim was going over the OWL guide
document to introduce them to the ontology language. He found a
load of errors, but it was a brand new draft document.
See also http://www.w3.org/TR/2002/WD-owl-features-20020729/
The class wiki: http://www.mindswap.org/cgi-bin/webai/moin.cgi
Wednesday 2002-09-25
09:00 Meeting about Parka with Merwyn Taylor
http://www.cs.umd.edu/users/mtaylor/
who worked on Parka but now works at Johns Hopksins University.
Also present were Bijan Parisa, Ron Alford and Ronald Reck.
We went through a presentation (PPT slides) on how the system
worked and discussed the built-in limitations, constraints and
differences against the RDF triples model. Parka has no specific
literal indexing in it; it hashes the entire string content. It
seems actually that there is no distringuishing strings from URIs,
that has to be imposed as a practice on top of the basic core.
The knowledge base has special in-memory indexing of the subclass,
subproperty, instancing relations providing quick indexing. This
is done via static sized arrays which work fine for typical data
but would need a recompile to extend them. Parka was optimised for
quick response time - hence the fixed arrays and consideration of
fast access to the disk, done by always operating in fixed sector
sizes (4k).
The KB is built on top of an internal relational store which
provides simple and quick access. This was developed by students
(including Merywn) as part of a database class and has been stable
but there may be a new version if the class took it further.
Merwyn said that they had tested Parka on Oracle but the overhead of
using it via a query language (SQL) proved too much.
We had a discussion of the limits in the parka database code, which
are somewhat tied to it's frame-based approach on the data model.
There is a 2.4M limit in the code on the number of distinct frames
(subject URIs in RDF-speak); note frames are not assertions. This
is mostly caused by using an integer for indexing - the 32 bits of
the ingeger are split up, limiting the range substantially. The
code does support "namespaces", although I'm not sure what that
means or how it helps in this regard. The tables could also be
"chunked", pointing to next table at the bottom of full ones.
Other restrictions include how the library is using 4K pages,
marked in a fixed array, tracking usage. This is stored in a 4K
header page meaning a limit of (under) 32K pages. The page size
could be increased, although performance is best when it matches
the disk page size. There are also othe rmemory usage per frame to
consider, such as the structural assertions (isa, instanceof, ..>)
which could be rather a bloat if the data itself was mostly
ontological or schema information. Some of the rdf/s data seen has
this characteristic, but most is vastly more data than class and
property relationships.
Throughout the above discussion, I related the various aspects of
it to what I had recently read on how TAP -
http://tap.stanford.edu/ - solves them which addresses a similar
problem area, but for a specific fixed schema. It uses MySql below
as one of the storage mechanisms but also provides several
in-memory and partially in-memory (mmap-ed) stores. TAP, like
parka, provides optimised support for the structural assertions, at
the RDF-schema level.
After that discussion, this led onto how a better indexing library
could solve some of these problems and reduce the complexity of
parka. I suggested the backends could be MySQL, directly via it's
C interface, not via SQL or possibly going direct to
Berkeley/Sleepycat DB which is now one of the table tables that BDB
allows.
Merwyn then finished with an explanation of the parka query
planning and evaluation.
I outlined how Redland somewhat overlapped in some of Parka's
activity but from an RDF point of view.
12:30 MINDSWAP weekly meeting
Aditya Kalyanpur, Ron Alford, Amy Loomis, Ronald Reck, Matt Westhoff,
H. Ross Baker, Mike Grove, Jennifer Golbeck, Bijan Parsia and me
They all discussed what they'd been doing for the last week and
then we had a demo of SMORE. I went over SWAD Europe in outline,
the survey stuff I'd done/was doing and Redland.
14:00 Toru Ishida - Social Agents in Digital Cities
-- http://www.lab7.kuis.kyoto-u.ac.jp/
Talk on the use of social agents that understand the conventions of
social systems to foster human/human interaction. Uses interaction
scenarios that are described in a language. Toru was just
finishing a visit to the mindlab and had been sharing the office
with Bijan.
Thursday 2002-09-26
Kendall Clarke arrived, visiting Bijan - he writes for
XML.com (as does Bijan), O'Reilly and does technical and other
writing and editing -- http://clark.dallas.tx.us/kendall/
Aditya gave a demonstration of SMORE to Kendall and I, we commented
on the user interface since there were several things that seemed
unclear to us as we watched him. It allows you to markup some
text (HTML) written as sentences into triple form, then browse and
search existing ontologies to find appropriate terms for the
relationships, classes. They can be dragged and dropped to form a
description of the content in RDF/DAML+OIL (soon OWL) form. The
resulting description can then be saved.
Met with Ron Alford and discussed the details of implementing a
storage backend to Redland, such as they proposed to do with
Parka. Lots of looking at source code and explaining stuff that
really should be in documentation somewhere ;)
12:00 Freedman's Bureau Project
-- http://freedmensbureau.com/
Meeting with Harry Keeling and Jerry ? (Howard University, DC)
-- http://www.founders.howard.edu/CEACS/Departments/CompSci/
along with Jen, Jim, Bijan, Toru and Kendall.
This is a (proposed) digital library project based on taking the
preserved paper records from the 1865- records made by the
"The Bureau of Refugees, Freedmen and Abandoned Lands,"
which dealt with providing support for the newly freed people. The
result is a large amount of microfiche, which can then give lots of
images. The idea is to set up a system that allows multiple people
to annotate and comment on the documents in the images; since there
are lots of views or ways of approaching their description. Some
basic ontologies would start it off, but the expectation would be
to provide ways to have all interpretations representable.
I was sure I'd heard of a similar project (scanning, images,
annotations, Dublin Core) but couldn't remember or google for the
references.
18:00 CMSC 828y class - AI on the Web
http://www.cs.umd.edu/users/hendler/CMSC828y/
Next class and during it I gave an overview of RDF tools and
resources on the web that they could use. The programming
experience here seemed to be majority Java, some Python, 1 perl out
of the approx 25 students. I also went through the details of some
of the RDF bots work - logger and especially foafbot, which uses
redland underneath, but in the detail of how it manages trust
relationships. Also encouraged them to hang out on the #rdfig
channel.
Met for the first time, Jordan Katz, a high school student
attending this graduate class, who has been working on displaying
RDF via XSLT sheets to give multiple views of the document
depending on the audience.
Friday 2002-09-27
10:00 RDF Core Telecon
Agenda:
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Sep/0329.html
11:20 Meeting with Ronald Reck
Suggested various surveying things he could do on the datasets he
had that would be useful to find out. Some of the parka
restrictions might be worth expanding if common datasets had
problems with them (large literals say).
14:00 Redland overview
I gave a whiteboard outline of Redland and the state of the changes
I was making to Ron Alford, Bijan, Kendall and Jim (partially).
Some of this is partially done (iterators change) and some is
planned (web, query, iostreams) and required to support other
features. I tried to show how Parka, querying and relational
backends would fit into the picture which Redland currently has
some skeleton support for, but not complete.
Saturday 2002-09-28
Travelling back via Dulles Airport - got a lot of hacking done on
Raptor which seems to be reducing in bug count:
http://www.redland.opensource.ac.uk/raptor/
Plus throughout the week lots of debugging various bits of code and
helping them with python and php access to Redland as well as doing a
little report writing (Monday only).
Received on Thursday, 3 October 2002 05:46:58 UTC