
Re: AW: (SeWeb) KAON - KArlsruhe ONtology and Semantic Web Infrastructure

From: Stephen Reed <reed@cyc.com>
Date: Thu, 10 Oct 2002 14:33:11 -0500 (CDT)
To: Jeremy Carroll <jjc@hplb.hpl.hp.com>
cc: Alexander Maedche <Maedche@fzi.de>, "John F. Sowa" <sowa@bestweb.net>, <www-rdf-logic@w3.org>, <www-rdf-interest@w3.org>, <seweb-list@cs.vu.nl>, <kaw@swi.psy.uva.nl>
Message-ID: <Pine.LNX.4.33.0210101211040.2786-100000@crapgame.cyc.com>


I used the Jena RDF parser (ARP) to parse the Open Directory Project
RDF structure file in DAML form. Because the original RDF file is not
RDF-compliant, I first translated it into DAML with a Java
string-substitution program. The resulting DAML file contains over
4 million triples and 400 thousand terms. For my OpenCyc import
experiments I used a small subset of the Open Directory Project RDF
structure: the "kids and teens" portion.
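As an illustration only, a line-oriented string-substitution filter of the
kind described above might look like the following Java sketch. The class
name SubstitutionFilter and the replacement rules are hypothetical; the
actual substitutions used on the Open Directory Project file are not given
in this message. Reading and writing one line at a time keeps memory use
flat even for a dump of several million triples.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: rewrites a non-compliant RDF dump line by line using
 * literal (non-regex) string replacements. The rules are supplied by the
 * caller and are NOT the rules actually used for the ODP file.
 */
public class SubstitutionFilter {

    private final Map<String, String> rules;

    public SubstitutionFilter(Map<String, String> rules) {
        // LinkedHashMap preserves the caller's rule order, so rules apply in sequence
        this.rules = new LinkedHashMap<>(rules);
    }

    /** Applies every replacement rule, in order, to a single line. */
    public String rewriteLine(String line) {
        String out = line;
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            out = out.replace(rule.getKey(), rule.getValue());
        }
        return out;
    }

    /**
     * Streams the input through the rules one line at a time, so the whole
     * file is never held in memory. Flushes but does not close the writer;
     * the caller owns both streams.
     */
    public void rewrite(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        BufferedWriter writer = new BufferedWriter(out);
        String line;
        while ((line = reader.readLine()) != null) {
            writer.write(rewriteLine(line));
            writer.newLine();
        }
        writer.flush();
    }
}
```

For example, with a hypothetical rule mapping `r:id=` to `rdf:about=`, the
line `<Topic r:id="Top/Kids">` becomes `<Topic rdf:about="Top/Kids">`.
Literal `String.replace` keeps each rule predictable; `replaceAll` could be
substituted where a rule genuinely needs a regular expression.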


I am pleased with Jena/ARP's speed: importing the triples into OpenCyc
accounts for by far the largest share of the running time. I especially
like the streaming nature of Jena/ARP, because it allows very large
RDF/DAML documents to be processed without building an in-memory model
(which would overflow the available Java virtual machine memory). I simply
add forward-referenced objects, as returned by ARP, to Cyc's knowledge base
as named entities and await their full definition later in the input DAML
document. Jena/ARP's convenient identification of anonymous nodes made it
easy to handle DAML restriction objects.
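The streaming approach described above can be sketched as follows. This is
a minimal sketch under stated assumptions: Term, Triple and
StreamingImporter are hypothetical stand-ins, not Jena/ARP or Cyc classes,
and the in-memory sets merely stand in for the Cyc knowledge base. Each
triple is handled as it arrives; an object seen before its definition is
registered as a named entity at once and tracked as pending, and statements
about anonymous nodes (such as DAML restrictions) are grouped by their
parser-assigned node id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical streaming importer; not the OpenCyc or Jena code. */
public class StreamingImporter {

    /** A node: a named URI, or an anonymous node with a parser-assigned id. */
    public record Term(String id, boolean anonymous) {
        public static Term uri(String uri) { return new Term(uri, false); }
        public static Term blank(String nodeId) { return new Term(nodeId, true); }
    }

    public record Triple(Term subject, Term predicate, Term object) {}

    /** Named entities created so far (stand-in for the knowledge base). */
    private final Set<String> entities = new LinkedHashSet<>();
    /** Entities referenced as objects but not yet seen as subjects. */
    private final Set<String> pendingDefinitions = new LinkedHashSet<>();
    /** Statements about anonymous nodes, grouped by node id. */
    private final Map<String, List<Triple>> anonymousNodes = new HashMap<>();
    /** Number of named-subject statements passed through to the store. */
    private int assertedCount = 0;

    /** Called once per triple, in document order, like an event callback. */
    public void statement(Triple t) {
        Term obj = t.object();
        if (!obj.anonymous()) {
            reference(obj.id());
        }
        if (t.subject().anonymous()) {
            // Keep restriction-style statements together until the caller uses them
            anonymousNodes.computeIfAbsent(t.subject().id(), k -> new ArrayList<>()).add(t);
        } else {
            define(t.subject().id());
            assertedCount++;
        }
    }

    private void define(String uri) {
        entities.add(uri);
        pendingDefinitions.remove(uri); // its definition has now arrived
    }

    private void reference(String uri) {
        // Forward reference: create the named entity now, expect its definition later
        if (entities.add(uri)) {
            pendingDefinitions.add(uri);
        }
    }

    public Set<String> pending() { return Set.copyOf(pendingDefinitions); }

    public List<Triple> anonymousStatements(String nodeId) {
        return anonymousNodes.getOrDefault(nodeId, List.of());
    }

    public int assertedCount() { return assertedCount; }
}
```

Only the statements about anonymous nodes are buffered; everything else is
passed straight through, so memory use grows with the number of distinct
entities rather than with the number of triples.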

My Java source code is available from OpenCyc's CVS repository at



On Thu, 10 Oct 2002, Jeremy Carroll wrote:
> Alexander Maedche wrote:
> > With respect to existing RDF parsers, we were
> > confronted with serious performance problems.
> > Thus, we implemented a new one that is compliant
> > with the W3C specification.
> As the developer of the Jena RDF parser (ARP) I read this paragraph with
> interest.
> I am aware that my work has some performance issues; however, I have
> never had a user request work on performance. Our analysis has
> been that a typical RDF application spends a relatively small percentage
> of its time parsing. We have therefore focused our development effort
> on correct behaviour: tracking the RDFCore WG recommendations
> and passing all the new RDF Core parser test cases.
> There are at least two major optimizations missing from the Jena parser:
> - in lax mode, switching off the extensive error checking rather than
> merely switching off error messages
> - using the Xerces pull parsing interface to allow single-threaded
> operation (while retaining the architectural advantages of the coparsing
> design of the Jena parser)
> I would welcome changes to the Jena code to include these improvements
> from anyone who is interested in faster, correct RDF parsing. I look
> forward to greater cooperation between the community of open source
> semantic web developers.
> At the moment the Jena team would welcome ideas about open source (BSD
> license compatible) reasoners that can cope with large subsets of DAML
> or OWL.
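The two optimizations listed in the quoted message can be illustrated with
a short sketch. It uses the standard StAX API (javax.xml.stream) in place of
the Xerces pull interface, so it shows the general technique rather than a
patch to ARP: the application pulls parse events in a single thread, and lax
mode skips the check's work entirely instead of running it and discarding
the message. The class PullParseSketch and its single check (rdf:about,
rdf:ID and rdf:nodeID may not appear together on one node element) are
hypothetical examples.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical sketch of single-threaded pull parsing with an optional lax mode. */
public class PullParseSketch {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    public record Result(int descriptions, int errors) {}

    /** Counts rdf:Description elements; in strict mode also counts id conflicts. */
    public static Result parse(String xml, boolean lax) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance(); // namespace-aware by default
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        int descriptions = 0;
        int errors = 0;
        try {
            while (reader.hasNext()) {
                // The caller pulls each event itself; no second parser thread is needed
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && RDF_NS.equals(reader.getNamespaceURI())
                        && "Description".equals(reader.getLocalName())) {
                    descriptions++;
                    // Lax mode skips the check itself, not merely its error message
                    if (!lax && hasConflictingIds(reader)) {
                        errors++;
                    }
                }
            }
        } finally {
            reader.close();
        }
        return new Result(descriptions, errors);
    }

    /** True if more than one of rdf:about, rdf:ID, rdf:nodeID is present. */
    private static boolean hasConflictingIds(XMLStreamReader reader) {
        int present = 0;
        for (String name : new String[] {"about", "ID", "nodeID"}) {
            if (reader.getAttributeValue(RDF_NS, name) != null) {
                present++;
            }
        }
        return present > 1;
    }
}
```

In strict mode a node element carrying both rdf:about and rdf:ID is counted
as an error; in lax mode the attribute lookups are never performed, which is
where the time saving would come from.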

Stephen L. Reed                  phone:  512.342.4036
Cycorp, Suite 100                  fax:  512.342.4040
3721 Executive Center Drive      email:  reed@cyc.com
Austin, TX 78731                   web:  http://www.cyc.com
         download OpenCyc at http://www.opencyc.org
Received on Thursday, 10 October 2002 15:33:30 UTC
