- From: Dave Beckett <dave.beckett@bristol.ac.uk>
- Date: Thu, 03 Oct 2002 10:46:42 +0100
- To: public-esw@w3.org
Trip Report by Dave Beckett on visit to MINDSWAP group, University of Maryland, College Park, MD, USA, 2002-09-23 to 2002-09-27

[ This trip was partially funded by SWAD Europe ]

Monday 2002-09-23

The first working day I spent mostly with Bijan Parsia (my host) going over the background and the MINDSWAP group's projects. The group: http://www.mindswap.org/ is based at the MINDlab at the University of Maryland (UMD), College Park, which is just north of Washington DC. The group is relatively new; last year Professor Jim Hendler started teaching the first semantic web classes to the UMD students, several of whom now work for the group.

Jim is best known for his work on SHOE (Simple HTML Ontology Extensions) - http://www.cs.umd.edu/projects/plus/SHOE/ - but has a background in agents, robots and knowledge representation. He was seconded to DARPA for several years to run the DARPA Agent Markup Language (DAML) project - http://www.daml.org/ - which later fed into the DAML+OIL ontology language that became the basis of the W3C's work. The latter is being developed by the Web Ontology Working Group (WOWG), which Jim co-chairs: http://www.w3.org/2001/sw/WebOnt/ and the language is now called OWL (Web Ontology Language).

The group now works on several projects related to DAML+OIL/OWL, RDF and Parka, of which more later. They have created several RDF and ontology markup tools for desktop use that allow attaching ontological information (properties, classes) to descriptions, creating instance data. The two main ones are:

  SMORE - Semantic Markup, Ontology and RDF Editor
  http://www.mindswap.org/~aditkal/editor.shtml

  RIC - RDF Instance Creator
  http://www.mindswap.org/~mhgrove/RIC/RIC.shtml

which are both Java+Swing applications.

The Parka system - http://www.mindswap.org/2002/parka/ - is a knowledge base (in the AI sense) that has been deployed and used commercially for several years; it can be considered a large triple store and is thus suitable for RDF/OWL storage. Parka has recently been released as open source software by MINDSWAP but is a little raw at present. They had been talking to me about it while I was working on the SWAD Europe scalable storage report: http://www.w3.org/2001/sw/Europe/reports/rdf_scalable_storage_report/ and it looked interesting. Bijan arranged for one of the original technical developers on the project to visit on Wednesday for a runthrough of the system.

The group is also working on DAML-S - http://www.daml.org/services/ - a DAML-based web service ontology. This allows descriptions of web services that can be tied to WSDL and used with SOAP and so on, to create web services in a semantic web style. A new version 0.7 of the DAML-S ontology is due out soon.

Tuesday 2002-09-24

I was pleased to find that Jim Hendler had made himself available to me all day, which was an unexpected delight. I gave him an outline of the SWAD Europe work and the possible areas for collaboration we might have. It had already been noted that their work on storage systems and mine, on the report above and on my Redland system - http://www.redland.opensource.ac.uk/ - would be worth collaborating on. Benchmarks and standard datasets for testing stores are further possibilities. MINDSWAP also plan to use the Redland software as an interface layer above Parka, abstracting from it so that they can potentially swap out the backend for another datastore for testing and benchmarking, rather than building applications with a Parka-only binding (a minimal sketch of what that buys follows below).
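To illustrate, here's a minimal sketch using Redland's C API as it stands: the storage backend is selected by name when the model is created, so application code doesn't change when the backend does. (The "parka" module named in the comment is hypothetical; the sketch uses the existing in-memory hashes store.)

    #include <redland.h>

    int main(void)
    {
      librdf_world *world = librdf_new_world();
      librdf_world_open(world);

      /* Backend selected by name plus an options string; a Parka
         storage module (hypothetical name "parka") would slot in
         here the same way. */
      librdf_storage *storage = librdf_new_storage(world, "hashes", "test",
                                                   "hash-type='memory'");
      librdf_model *model = librdf_new_model(world, storage, NULL);

      /* ... parse into, query and serialise from the model ... */

      librdf_free_model(model);
      librdf_free_storage(storage);
      librdf_free_world(world);
      return 0;
    }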
The other main relationship with SWAD Europe work was in querying, since Jim et al. had worked on DAML query languages and this effort is continuing. The SWAD Europe work on semantic web query languages is extensive and they are keen to collaborate on developing use cases, documenting them and moving them towards standardisation in due course.

Jim, Bijan and I also discussed the W3C semantic web standards efforts and sketched out some solutions to tricky syntax problems that were being raised by WebOnt. This hopefully quickened the resolution, since I could discuss the pros and cons of different solutions directly. The particular topic here was OWL's proposed use of elements outside rdf:RDF for doing ontology description.

After seeing some of the desktop tools, I outlined the work I did on the MEG Registry project - http://137.222.34.57:6543/ - and the desktop schema creation tool created by Damian Steer that talks to it - http://www.ukoln.ac.uk/metadata/education/regproj/ This is something that I plan to show at the SWAD Europe workshop at the Dublin Core conference in Florence in mid-October.

18:00 CMSC 828y class - AI on the Web
http://www.cs.umd.edu/users/hendler/CMSC828y/

I sat in on the class, where Jim went over the OWL guide document to introduce the students to the ontology language. He found a load of errors, but it was a brand new draft document. See also http://www.w3.org/TR/2002/WD-owl-features-20020729/ The class wiki: http://www.mindswap.org/cgi-bin/webai/moin.cgi

Wednesday 2002-09-25

09:00 Meeting about Parka with Merwyn Taylor
http://www.cs.umd.edu/users/mtaylor/

Merwyn worked on Parka but is now at Johns Hopkins University. Also present were Bijan Parsia, Ron Alford and Ronald Reck. We went through a presentation (PPT slides) on how the system works and discussed its built-in limitations, constraints and differences from the RDF triples model.

Parka has no specific literal indexing; it hashes the entire string content. In fact there seems to be no distinguishing of strings (literals) from URIs; that has to be imposed as a practice on top of the basic core. The knowledge base has special in-memory indexing of the subclass, subproperty and instance relations, providing quick lookups. This is done via statically sized arrays, which work fine for typical data but would need a recompile to extend. Parka was optimised for quick response time - hence the fixed arrays and the attention to fast disk access, done by always operating in fixed sector sizes (4K).

The KB is built on top of an internal relational store which provides simple and quick access. This was developed by students (including Merwyn) as part of a database class and has been stable, though there may be a new version if the class takes it further. Merwyn said that they had tested Parka on Oracle but the overhead of going via a query language (SQL) proved too much.

We had a discussion of the limits in the Parka database code, which are somewhat tied to its frame-based data model. There is a 2.4M limit in the code on the number of distinct frames (subject URIs in RDF-speak); note that frames are not assertions. This is mostly caused by using a single integer for indexing - the 32 bits of the integer are split up into fields, limiting the range substantially (the first sketch below illustrates this kind of ID packing). The code does support "namespaces", although I'm not sure what that means or how it helps in this regard. The tables could also be "chunked", with a full table pointing to the next one at its bottom.

Other restrictions include how the library uses 4K pages, with usage tracked in a fixed array stored in a single 4K header page, meaning a limit of (under) 32K pages: 4096 bytes of bitmap gives 32,768 one-bit page flags (the second sketch below works through this). The page size could be increased, although performance is best when it matches the disk page size. There is also other memory usage per frame to consider, such as the structural assertions (isa, instanceof, ...), which could be rather a bloat if the data itself were mostly ontological or schema information. Some of the RDF/S data seen has this characteristic, but most consists of vastly more instance data than class and property relationships.
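To make the integer-splitting concrete, here's a toy sketch of that style of ID packing. The field widths are invented for illustration - I don't know Parka's actual layout - but 21 bits of frame index gives about 2.1M frames per namespace, the same order as the 2.4M limit described above.

    /* Toy illustration of packing a "namespace" and a frame index
       into one 32-bit identifier.  The widths here are invented;
       Parka's real layout is not described in this report. */
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    #define NS_BITS    11
    #define FRAME_BITS 21   /* 2^21 = 2097152 frames per namespace */

    static uint32_t make_id(uint32_t ns, uint32_t frame)
    {
      assert(ns < (1u << NS_BITS) && frame < (1u << FRAME_BITS));
      return (ns << FRAME_BITS) | frame;
    }

    static uint32_t id_namespace(uint32_t id) { return id >> FRAME_BITS; }
    static uint32_t id_frame(uint32_t id)     { return id & ((1u << FRAME_BITS) - 1u); }

    int main(void)
    {
      uint32_t id = make_id(3, 123456);
      printf("ns=%u frame=%u\n",
             (unsigned)id_namespace(id), (unsigned)id_frame(id));
      return 0;
    }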
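And a second toy sketch of the header-page arithmetic: a 4K page used as an allocation bitmap can track at most 4096 x 8 = 32,768 pages, which at 4K per page caps the store at around 128MB.

    /* Illustrative page-allocation bitmap: one 4K header page gives
       4096 * 8 = 32768 bits, one per data page - hence the (under)
       32K page limit; at 4K per page that is a ~128MB store. */
    #include <stdint.h>

    #define PAGE_SIZE 4096u
    #define MAX_PAGES (PAGE_SIZE * 8u)   /* 32768 */

    static uint8_t page_bitmap[PAGE_SIZE];

    static int page_in_use(uint32_t page)
    {
      return (page_bitmap[page >> 3] >> (page & 7u)) & 1u;
    }

    static void mark_page_used(uint32_t page)
    {
      page_bitmap[page >> 3] |= (uint8_t)(1u << (page & 7u));
    }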
Throughout the above discussion I related its various aspects to what I had recently read about how TAP - http://tap.stanford.edu/ - solves them; TAP addresses a similar problem area, but for a specific fixed schema. It uses MySQL underneath as one of its storage mechanisms but also provides several in-memory and partially in-memory (mmap-ed) stores. TAP, like Parka, provides optimised support for the structural assertions at the RDF Schema level.

This led on to how a better indexing library could solve some of these problems and reduce the complexity of Parka. I suggested the backends could be MySQL, used directly via its C interface rather than via SQL, or possibly Berkeley/Sleepycat DB directly, which is now one of the table types that MySQL allows. Merwyn then finished with an explanation of the Parka query planning and evaluation. I outlined how Redland somewhat overlaps with some of Parka's activity, but from an RDF point of view.

12:30 MINDSWAP weekly meeting

Aditya Kalyanpur, Ron Alford, Amy Loomis, Ronald Reck, Matt Westhoff, H. Ross Baker, Mike Grove, Jennifer Golbeck, Bijan Parsia and me. They all discussed what they'd been doing for the last week and then we had a demo of SMORE. I went over SWAD Europe in outline, the survey work I'd done and was doing, and Redland.

14:00 Toru Ishida - Social Agents in Digital Cities
http://www.lab7.kuis.kyoto-u.ac.jp/

A talk on the use of social agents that understand the conventions of social systems to foster human/human interaction, using interaction scenarios described in a language. Toru was just finishing a visit to the MINDlab and had been sharing the office with Bijan.

Thursday 2002-09-26

Kendall Clark arrived, visiting Bijan - he writes for XML.com (as does Bijan) and O'Reilly, and does technical and other writing and editing - http://clark.dallas.tx.us/kendall/

Aditya gave a demonstration of SMORE to Kendall and me; we commented on the user interface, since several things seemed unclear to us as we watched him. SMORE lets you mark up some text (HTML) written as sentences into triple form, then browse and search existing ontologies to find appropriate terms for the relationships and classes. These can be dragged and dropped to form a description of the content in RDF/DAML+OIL (soon OWL) form, and the resulting description can then be saved.

I met with Ron Alford and discussed the details of implementing a storage backend for Redland, such as they propose to do with Parka. Lots of looking at source code and explaining stuff that really should be in documentation somewhere ;) (A simplified sketch of the backend pattern follows below.)
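For flavour, a storage backend in the Redland style boils down to a named table of functions that the model layer calls through. The sketch below is a heavily simplified paraphrase with invented names - it is not the real librdf storage factory interface, which has more operations and different signatures.

    /* Invented, simplified stand-in for Redland's storage factory
       pattern: a backend is a named table of function pointers, and
       applications pick a backend by name.  None of these names are
       the real librdf API. */
    #include <stdio.h>
    #include <string.h>

    typedef struct { const char *s, *p, *o; } triple;

    typedef struct {
      const char *name;
      int (*add_statement)(const triple *t);
      int (*size)(void);
    } storage_ops;

    /* Toy "backend": counts statements instead of storing them.  A
       Parka-backed module would instead call into the Parka library. */
    static int count = 0;
    static int toy_add(const triple *t) { (void)t; ++count; return 0; }
    static int toy_size(void) { return count; }

    /* Registry: selecting a backend by name is what lets one be
       swapped for another without touching application code. */
    static storage_ops registry[8];
    static int n_registered = 0;

    static void register_backend(storage_ops ops)
    {
      registry[n_registered++] = ops;
    }

    static storage_ops *find_backend(const char *name)
    {
      for (int i = 0; i < n_registered; i++)
        if (!strcmp(registry[i].name, name))
          return &registry[i];
      return NULL;
    }

    int main(void)
    {
      storage_ops toy = { "toy", toy_add, toy_size };
      register_backend(toy);

      storage_ops *st = find_backend("toy");
      triple t = { "#dave", "#visited", "#mindswap" };
      st->add_statement(&t);
      printf("size=%d\n", st->size());
      return 0;
    }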
12:00 Freedmen's Bureau Project
http://freedmensbureau.com/

Meeting with Harry Keeling and Jerry ? (Howard University, DC) - http://www.founders.howard.edu/CEACS/Departments/CompSci/ - along with Jen, Jim, Bijan, Toru and Kendall.

This is a proposed digital library project based on the preserved paper records, from 1865 onwards, of "The Bureau of Refugees, Freedmen and Abandoned Lands", which dealt with providing support for the newly freed people. The result is a large amount of microfiche, which can in turn yield lots of images. The idea is to set up a system that allows multiple people to annotate and comment on the documents in the images, since there are many views of, or ways of approaching, their description. Some basic ontologies would start it off, but the expectation is to provide ways to have all interpretations representable. I was sure I'd heard of a similar project (scanning, images, annotations, Dublin Core) but couldn't remember or google for the references.

18:00 CMSC 828y class - AI on the Web
http://www.cs.umd.edu/users/hendler/CMSC828y/

During the next class I gave an overview of RDF tools and resources on the web that the students could use. The programming experience here was mostly Java, with some Python and one Perl programmer out of the approximately 25 students. I also went through the details of some of the RDF bots work - logger and especially foafbot, which uses Redland underneath - in particular how foafbot manages trust relationships. I encouraged them to hang out on the #rdfig channel. I met for the first time Jordan Katz, a high school student attending this graduate class, who has been working on displaying RDF via XSLT stylesheets to give multiple views of a document depending on the audience.

Friday 2002-09-27

10:00 RDF Core Telecon
Agenda: http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Sep/0329.html

11:20 Meeting with Ronald Reck

I suggested various surveys he could run over the datasets he has, which would be useful to find things out. Some of the Parka restrictions might be worth expanding if common datasets have problems with them (large literals, say).

14:00 Redland overview

I gave Ron Alford, Bijan, Kendall and (partially) Jim a whiteboard outline of Redland and the state of the changes I am making. Some of this is partially done (the iterators change) and some is planned (web, query, iostreams) and required to support other features. I tried to show how Parka, querying and relational backends would fit into the picture, for which Redland currently has some skeleton, but not complete, support.

Saturday 2002-09-28

Travelling back via Dulles Airport - got a lot of hacking done on Raptor, which seems to be reducing in bug count: http://www.redland.opensource.ac.uk/raptor/

Plus, throughout the week: lots of debugging of various bits of code, helping them with Python and PHP access to Redland, and a little report writing (Monday only).