librdf architecture

I'm working out more of my librdf Application Framework design as
previously mentioned:
  http://lists.w3.org/Archives/Public/www-rdf-interest/2000Jun/0082.html

The part of the architecture I'm currently working on is the model /
storage / parsing / streaming part, which goes something like this
(remember, it is written in C with objects done by hand):

Base Classes:
  class Statement - triples (resource, property, object)

  class Model - a set of Statements with a link to a Storage ...
    many methods, most passed on to Storage

  class Storage - knows how to store/retrieve Statements using identifiers
    lots of methods such as:
    method add_statement (in Statement, out identifier, ...)
      - returns a storage specific identifier (URI) that can be used
        to get the statement later
    method remove_statement (in identifier, ...)
    method get_statements (out stream of Statements)
    method find (in Node subject, in Node predicate, in Node object, out stream of Statements)
    method find (in Node subject, in Node predicate, in Node object, in/out Model)
    ...
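
To make "objects done by hand" concrete, here is a minimal sketch of
how a Storage class might look in C: a table of function pointers
shared by all instances, plus a private context per backend.  Every
name here (storage_class, new_memory_storage, ...) is hypothetical and
not the actual librdf API, the backend is a toy fixed-size array
rather than Berkeley DB, and the identifier is a plain int instead of
the URI described above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the real librdf API. */
typedef struct {
    const char *subject;
    const char *predicate;
    const char *object;
} statement;

typedef struct storage storage;

/* The "class": one table of methods shared by every instance. */
typedef struct {
    int  (*add_statement)(storage *s, const statement *st, int *id_out);
    int  (*remove_statement)(storage *s, int id);
    int  (*size)(storage *s);
    void (*destroy)(storage *s);
} storage_class;

/* The "object": a pointer to its class plus backend-private data. */
struct storage {
    const storage_class *cls;
    void *context;
};

/* Toy in-memory backend; the slot index doubles as the identifier
   (an assumption made to keep the sketch short). */
#define MEM_CAPACITY 16

typedef struct {
    statement slots[MEM_CAPACITY];
    int used[MEM_CAPACITY];
} mem_context;

static int mem_add(storage *s, const statement *st, int *id_out) {
    assert(s != NULL && st != NULL && id_out != NULL);
    mem_context *c = s->context;
    for (int i = 0; i < MEM_CAPACITY; i++) {
        if (!c->used[i]) {
            c->slots[i] = *st;  /* shallow copy: strings must outlive the storage */
            c->used[i] = 1;
            *id_out = i;
            return 0;
        }
    }
    return -1;  /* storage is full */
}

static int mem_remove(storage *s, int id) {
    mem_context *c = s->context;
    if (id < 0 || id >= MEM_CAPACITY || !c->used[id])
        return -1;  /* unknown identifier */
    c->used[id] = 0;
    return 0;
}

static int mem_size(storage *s) {
    mem_context *c = s->context;
    int n = 0;
    for (int i = 0; i < MEM_CAPACITY; i++)
        n += c->used[i];
    return n;
}

static void mem_destroy(storage *s) {
    free(s->context);
    free(s);
}

static const storage_class mem_class = {
    mem_add, mem_remove, mem_size, mem_destroy
};

/* Constructor: returns NULL if allocation fails. */
storage *new_memory_storage(void) {
    storage *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->context = calloc(1, sizeof(mem_context));
    if (s->context == NULL) {
        free(s);
        return NULL;
    }
    s->cls = &mem_class;
    return s;
}
```

A caller would go through the class table, for example
s->cls->add_statement(s, &st, &id), so a Model holding a storage
pointer never needs to know which backend is behind it.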
 
Then we add:
  class XML DOM Parser - builds an in-memory DOM representation
    constructor (...)
    method init(in XML content)
    method get_dom (out in-memory DOM representation)

  class XML SAX Parser - generates SAX-like events
    constructor (...)
    method init(in stream of XML content)
    method register_sax_event_1 (in function) 
    ...
    method register_sax_event_<n> (in function) 

  class RDF Parser
    constructor (...)
    method parse_xml_events(in/out model, in XML SAX Parser)
    method parse_xml_tree(in/out model, in XML DOM Parser)


With these classes you can do things like this:

   storage = new Storage (use Berkeley DB V2 please, ...)
   model = new Model (storage)
   rdf = new RDF Parser(...)
   www = new URI Resolver (XML content URI)

   if (using DOM model) {
     /* everything constructed in memory - better be small */
     xml_dom = new XML DOM Parser(...)
     xml_dom->init(www->get_as_string)
     rdf->parse_xml_tree(model, xml_dom)
     delete xml_dom
   } else if (using SAX-like event model) {
     xml_sax = new XML SAX Parser(...)
     xml_sax->init(www->get_as_stream)
     rdf->parse_xml_events(model, xml_sax)
     delete xml_sax
   } else if (using standalone RDF parser) {
     ... more thought needed here ...
     rdf_stream = new Stream ("command for standalone parser to emit triples")
     rdfutil.add_statements_to_model_from_stream_of_triples(model, rdf_stream)
   }
   delete www

   ... do stuff with model ...
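
One obvious thing to do with the model is to query it.  As a rough
sketch of the two find variants listed for Storage, the code below
treats a stream as a cursor that yields matching statements one at a
time, and builds the "in/out Model" variant on top of it by draining
the stream.  All names (statement_stream, find_as_stream, stream_next,
find_into_model) are hypothetical, the statements live in a plain
array, the "model" is just a caller-supplied array, and a NULL Node is
assumed to act as a wildcard.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is not the real librdf API. */
typedef struct {
    const char *subject;
    const char *predicate;
    const char *object;
} statement;

/* A stream is a cursor over the stored statements plus a pattern. */
typedef struct {
    const statement *items;
    size_t count;
    size_t pos;
    statement pattern;  /* NULL fields match anything (an assumption) */
} statement_stream;

static int field_matches(const char *pattern, const char *value) {
    return pattern == NULL || strcmp(pattern, value) == 0;
}

/* find (..., out stream of Statements): nothing is copied up front. */
statement_stream find_as_stream(const statement *items, size_t count,
                                const char *subject, const char *predicate,
                                const char *object) {
    statement_stream st = { items, count, 0, { subject, predicate, object } };
    return st;
}

/* Returns the next matching statement, or NULL at end of stream. */
const statement *stream_next(statement_stream *st) {
    assert(st != NULL);
    while (st->pos < st->count) {
        const statement *c = &st->items[st->pos++];
        if (field_matches(st->pattern.subject, c->subject) &&
            field_matches(st->pattern.predicate, c->predicate) &&
            field_matches(st->pattern.object, c->object))
            return c;
    }
    return NULL;
}

/* find (..., in/out Model): drain the stream into a caller-supplied
   array standing in for a model; returns how many were stored. */
size_t find_into_model(statement_stream *st, statement *model, size_t capacity) {
    size_t n = 0;
    const statement *c;
    while (n < capacity && (c = stream_next(st)) != NULL)
        model[n++] = *c;
    return n;
}
```

With this layering the stream form is the primitive and the model form
is a convenience wrapper; a storage that can do better (for instance
by answering from a database view) could still override the wrapper.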


Comments and questions are welcome.  Here are some of the things I'm
curious about.

* Should the Storage method find always generate a new model or is
  returning a set of statements as a stream OK?  Both seem useful
  to me.

* Should the Storage method find allow any model to be passed in to
  hold the matching statements, or should it allocate a new model
  that is appropriate for the Storage?  (The latter might be a useful
  optimisation, e.g. creating SQL views over SQL queries and returning
  a new model representing that view of the storage.)

* Are people really going to need both DOM and SAX XML interfaces?

* Is there really just one RDF parser class or should the class for
  RDF parsers that read DOMs be separate from the one that handles
  SAX events?
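
One possible answer to the last question, sketched under heavy
assumptions: keep a single RDF parser that only understands SAX-like
events, and support DOM input by walking the tree and replaying it as
the same events.  The types and functions below (event_handlers,
dom_node, replay_dom_as_events, and the logging handlers) are all
hypothetical and cover element start/end only; a real parser would
also need attributes, character data and namespaces.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names throughout; this is not the real librdf API. */

/* The only interface the RDF parser would need to implement. */
typedef struct {
    void *user_data;
    void (*start_element)(void *user_data, const char *name);
    void (*end_element)(void *user_data, const char *name);
} event_handlers;

/* A deliberately tiny in-memory tree: element names only. */
typedef struct dom_node {
    const char *name;
    const struct dom_node *first_child;
    const struct dom_node *next_sibling;
} dom_node;

/* Depth-first walk over node and its following siblings, emitting
   the events a SAX-like parser would have produced for the same
   document. */
void replay_dom_as_events(const dom_node *node, const event_handlers *h) {
    assert(h != NULL);
    for (; node != NULL; node = node->next_sibling) {
        h->start_element(h->user_data, node->name);
        replay_dom_as_events(node->first_child, h);
        h->end_element(h->user_data, node->name);
    }
}

/* Stand-in for the RDF parser: records the events it receives. */
typedef struct {
    char text[256];
} event_log;

static void log_append(event_log *log, const char *prefix, const char *name) {
    size_t used = strlen(log->text);
    /* snprintf truncates rather than overflowing the buffer */
    snprintf(log->text + used, sizeof log->text - used, "%s%s>", prefix, name);
}

void log_start(void *user_data, const char *name) {
    log_append(user_data, "<", name);
}

void log_end(void *user_data, const char *name) {
    log_append(user_data, "</", name);
}
```

The trade-off is that the DOM path then gains nothing from having the
whole tree available, so this only makes sense if the RDF parser never
needs to look ahead or back in the document.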

Dave

Received on Thursday, 6 July 2000 11:27:48 UTC