Re: RDF API from Sergey Melnik on 1999-11-15 (www-rdf-interest@w3.org from November 1999)

From: Sergey Melnik <melnik@DB.Stanford.EDU>
Date: Sun, 14 Nov 1999 23:08:40 -0800
To: Gabe Beged-Dov <begeddov@jfinity.com>
CC: RDF Interest Group <www-rdf-interest@w3.org>
Message-ID: <382FB178.3D801431@db.stanford.edu>
Thanks Gabe! That's exactly the kind of input I intended to provoke ;)

I think org.w3c.rdf.{stream|infoset|query} is a great way to structure
the package. 

Some remarks:

> org.w3c.rdf.stream
> ================
> Sergey has added stream processing to SiRPAC which is GREAT!  The downside
> I see to the stream processing API of SiRPAC is that it is a little high
> level. It already instantiates specific classes and passes them to the
> assert callback. I would prefer that there be a lower level callback
> interface ala SAX that passed back the strings rather than an encapsulation
> in org.w3c.rdf.Resource and friends.  Something like:
> 
> void assertResourceObject(String subject,
>                           String predicate,
>                           String object);
> void assertLiteralObject(String subject,
>                          String predicate,
>                          String object,
>                          String lang);

You are right, the SiRPAC API used a different level of abstraction.
Resource and Literal (together with Property, which I removed from the
API) were classes, and instances of them were passed to an RDFConsumer.
Resource and Literal are now interfaces, and instances of the default
classes implementing them can be seen as pure "wrappers" for strings,
i.e. typed strings. If the consumer has different implementations of the
ground interfaces, it can simply instantiate new classes, e.g.:

void assert(Resource subject, Resource predicate, RDFNode object) {
  // Re-wrap each incoming node in the consumer's own implementation classes.
  myModel.add( new MyTriple( createMyResource(subject.getURI()),
                             createMyResource(predicate.getURI()),
                             object instanceof Literal
                               ? createMyLiteral(((Literal)object).getString())
                               : createMyResource(((Resource)object).getURI()) ) );
}

You incur a certain loss of efficiency, but as far as memory is
concerned, the corresponding strings have to be "materialized" anyway.

On the other hand, a string-based consumer exhibits the following
problems:

1) what if the RDF parser has to pass a DOM element to the consumer?
2) what if the lang parameter in your declarations is removed in a
subsequent RDF spec? (it's flaky anyway...)

The implication would be that your very "basic" consumer interfaces
change and people's code breaks.

If you have a slightly higher level of abstraction (like the current
one), you could solve problem (1) easily with:

interface XMLLiteral extends Literal {

  public org.w3c.dom.Element getElement();
}

The parser would still pass an instance of Literal, not a string that
would have to be parsed again.
The same goes for the language: if it disappears, your consumer
interface remains intact.
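
A minimal sketch of that argument, under some assumptions: the callback
is named assertTriple here because `assert` is a reserved word in
current Java, and SimpleResource, SimpleLiteral, DomLiteral and
LoggingConsumer are hypothetical classes invented for illustration. The
point is only that XMLLiteral is added without touching RDFConsumer.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;

interface RDFNode {}
interface Resource extends RDFNode { String getURI(); }
interface Literal extends RDFNode { String getString(); }

// Added later; the consumer interface below does not change.
interface XMLLiteral extends Literal { Element getElement(); }

interface RDFConsumer {
    // Hypothetical name: `assert` cannot be used as an identifier today.
    void assertTriple(Resource subject, Resource predicate, RDFNode object);
}

final class SimpleResource implements Resource {
    private final String uri;
    SimpleResource(String uri) { this.uri = uri; }
    public String getURI() { return uri; }
}

final class SimpleLiteral implements Literal {
    private final String value;
    SimpleLiteral(String value) { this.value = value; }
    public String getString() { return value; }
}

final class DomLiteral implements XMLLiteral {
    private final Element element;
    DomLiteral(Element element) { this.element = element; }
    public Element getElement() { return element; }
    public String getString() { return element.getTextContent(); }
}

// A consumer that records what it receives, checking the most specific type first.
final class LoggingConsumer implements RDFConsumer {
    final List<String> log = new ArrayList<>();

    @Override
    public void assertTriple(Resource s, Resource p, RDFNode o) {
        String obj;
        if (o instanceof XMLLiteral) {
            obj = "xml:" + ((XMLLiteral) o).getElement().getTagName();
        } else if (o instanceof Literal) {
            obj = "lit:" + ((Literal) o).getString();
        } else {
            obj = "res:" + ((Resource) o).getURI();
        }
        log.add(s.getURI() + " " + p.getURI() + " " + obj);
    }
}

final class ConsumerDemo {
    // Builds a detached DOM element holding the given text.
    static Element element(String tag, String text) {
        try {
            Element e = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument().createElement(tag);
            e.setTextContent(text);
            return e;
        } catch (ParserConfigurationException ex) {
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        LoggingConsumer c = new LoggingConsumer();
        Resource s = new SimpleResource("urn:doc");
        Resource p = new SimpleResource("urn:title");
        c.assertTriple(s, p, new SimpleLiteral("Hello"));
        c.assertTriple(s, p, new DomLiteral(element("b", "Hello")));
        c.log.forEach(System.out::println);
    }
}
```

A consumer written before XMLLiteral existed would simply treat a
DomLiteral as a Literal and call getString() on it.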

As to implementation: in the current (unreleased) version of GINF, the
interface Model (see below) has the methods

createResource(String) and
createLiteral(String)

My abstraction of the parser is RDFMS, which offers two methods:

public void parse(InputSource source, Model empty);
public void serialize(Model model, Writer w);

The parser gets an empty model and calls model.createResource() and
model.createLiteral() exactly in the way I described.
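
A rough sketch of that division of labour, with loud caveats: this is
not the GINF code, and the input here is a made-up one-triple-per-line
text format standing in for real RDF/XML. NodeFactoryModel, ToyParser
and ListModel are hypothetical names. What it shows is that the parser
never instantiates node classes itself; it asks the model for them.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

interface RDFNode {}
interface Resource extends RDFNode { String getURI(); }
interface Literal extends RDFNode { String getString(); }

// Only the slice of Model that the parser needs (hypothetical interface).
interface NodeFactoryModel {
    Resource createResource(String uri);
    Literal createLiteral(String value);
    void add(Resource subject, Resource predicate, RDFNode object);
}

final class ToyParser {
    // Toy format, one triple per line: subject predicate object.
    // An object wrapped in double quotes is a literal, anything else a resource.
    static void parse(Reader source, NodeFactoryModel empty) throws IOException {
        BufferedReader in = new BufferedReader(source);
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue;
            String[] parts = line.split("\\s+", 3);
            if (parts.length < 3) throw new IOException("malformed line: " + line);
            Resource s = empty.createResource(parts[0]);
            Resource p = empty.createResource(parts[1]);
            String o = parts[2];
            boolean quoted = o.length() >= 2 && o.startsWith("\"") && o.endsWith("\"");
            RDFNode object = quoted
                    ? empty.createLiteral(o.substring(1, o.length() - 1))
                    : empty.createResource(o);
            empty.add(s, p, object);
        }
    }

    // Convenience overload for in-memory text.
    static void parse(String text, NodeFactoryModel empty) {
        try {
            parse(new StringReader(text), empty);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}

// A model that decides which node implementations exist (here: lambdas).
final class ListModel implements NodeFactoryModel {
    final List<String> triples = new ArrayList<>();

    public Resource createResource(String uri) { return () -> uri; }
    public Literal createLiteral(String value) { return () -> value; }

    public void add(Resource s, Resource p, RDFNode o) {
        String obj = o instanceof Literal
                ? "\"" + ((Literal) o).getString() + "\""
                : ((Resource) o).getURI();
        triples.add(s.getURI() + " " + p.getURI() + " " + obj);
    }
}

final class ParserDemo {
    public static void main(String[] args) {
        ListModel model = new ListModel();
        ToyParser.parse("urn:a urn:title \"Hello world\"\nurn:a urn:seeAlso urn:b\n", model);
        model.triples.forEach(System.out::println);
    }
}
```

Swapping ListModel for a model with its own node classes requires no
change to the parser.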

BTW: what's the deal with XML embedded in RDF anyway? Does anyone on
earth use or need this feature? I tried a Gedankenexperiment where I
imagined an RDF description containing XML literals with, say,
presentation markup in them, but I could not arrive at anything useful
from this combination.

So what about the following:

org.w3c.rdf.core: defines the core interfaces Literal, Resource, Triple
and Model
org.w3c.rdf.stream (isn't "stream" a bit misleading?): RDFConsumer etc.

> org.w3c.rdf.infoset
> ================
> The Infoset level corresponds to current wrapper objects of SiRPAC plus the
> ones that most of the RDF implementations provide. Here I would want
> something like what Mozilla supports with datasources (primitive and

I'm asking myself why the Mozilla folks defined such a verbose
datasource API. The current GINF Model interface has only *one* search
method:

Model find(Resource subject, Resource predicate, RDFNode object);

where a null parameter matches everything.
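
To illustrate how far a single find() goes, here is a small sketch. It
is an assumption-laden toy, not the GINF implementation: nodes are
reduced to plain strings only to keep it short, and Triple, TinyModel
and FindDemo are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class Triple {
    final String subject, predicate, object;
    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

final class TinyModel {
    private final List<Triple> triples = new ArrayList<>();

    void add(Triple t) { triples.add(t); }
    int size() { return triples.size(); }

    // A null pattern component acts as a wildcard.
    private static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }

    // The result is itself a model, so searches can be chained.
    TinyModel find(String subject, String predicate, String object) {
        TinyModel result = new TinyModel();
        for (Triple t : triples) {
            if (matches(subject, t.subject)
                    && matches(predicate, t.predicate)
                    && matches(object, t.object)) {
                result.add(t);
            }
        }
        return result;
    }
}

final class FindDemo {
    public static void main(String[] args) {
        TinyModel m = new TinyModel();
        m.add(new Triple("urn:a", "urn:title", "Hello"));
        m.add(new Triple("urn:a", "urn:author", "urn:sergey"));
        m.add(new Triple("urn:b", "urn:title", "World"));

        System.out.println(m.find("urn:a", null, null).size());     // 2
        System.out.println(m.find(null, "urn:title", null).size()); // 2
        System.out.println(m.find(null, null, null).size());        // 3
        // Chained: all titles, then only those of urn:b.
        System.out.println(m.find(null, "urn:title", null)
                            .find("urn:b", null, null).size());     // 1
    }
}
```

Because find() returns a Model, the eight combinations of bound and
unbound positions need no separate methods.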

Mozilla's composite RDF sources [2] as well as commands look very
application specific to me. Note that the notion of a datasource has
inherent specificity in it. I prefer speaking of RDF models at a more
fundamental level. You can always have:

interface ObservableModel extends Model {
   void addObserver(...);
   ...
}
interface ActiveModel extends Model {
   void executeCommand(...);
   ...
}

> Infoset API and the infoset implementation . Mozilla's support for
> notification would also be very nice. In general, Mozilla has a mature but

Again, the Mozilla API has a further-reaching goal than a core RDF API.

> composite). I like the idea from Ron Daniel's RADIX that a model should
> itself be a resource that can appear as a node in another model.

In principle, every (Java) object could implement the Resource
interface. Why *must* a Model be a Resource?

Here is my current Model API. In fact, it is similar in spirit to RADIX
[1] (comments are very short for clarity):

public interface Model {

  public void setURI(String uri);
  public String getURI(); // returns base URI

  public boolean contains(Triple t);
  public void add(Triple t);
  public void remove(Triple t);

  public int size();
  public Enumeration elements(); // enumeration of Triples
  public Model find(Resource subject, Resource predicate,
                    RDFNode object);

  public Model create(); // creates empty model of the same class
  public Model duplicate();

  public Resource createResource(String str);
  public Literal createLiteral(String str);
  public Triple createTriple(Resource subject, Resource predicate,
                             RDFNode object);

  public Model union(Model m);
  public Model difference(Model m);
  public Model intersection(Model m);
}
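
For the set operations, one possible reading is that each returns a new
model and leaves both operands untouched. The sketch below assumes
exactly that, plus value equality on triples; SetModel is a
hypothetical class, and nodes are again plain strings for brevity.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

final class Triple {
    final String subject, predicate, object;
    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
    // Two triples are equal when all three components are equal.
    @Override public boolean equals(Object other) {
        if (!(other instanceof Triple)) return false;
        Triple t = (Triple) other;
        return subject.equals(t.subject)
                && predicate.equals(t.predicate)
                && object.equals(t.object);
    }
    @Override public int hashCode() {
        return Objects.hash(subject, predicate, object);
    }
}

final class SetModel {
    private final Set<Triple> triples = new LinkedHashSet<>();

    void add(Triple t) { triples.add(t); }
    void remove(Triple t) { triples.remove(t); }
    boolean contains(Triple t) { return triples.contains(t); }
    int size() { return triples.size(); }

    // Copy of this model; the triples themselves are immutable and shared.
    SetModel duplicate() {
        SetModel copy = new SetModel();
        copy.triples.addAll(triples);
        return copy;
    }

    SetModel union(SetModel m) {
        SetModel result = duplicate();
        result.triples.addAll(m.triples);
        return result;
    }

    SetModel difference(SetModel m) {
        SetModel result = duplicate();
        result.triples.removeAll(m.triples);
        return result;
    }

    SetModel intersection(SetModel m) {
        SetModel result = duplicate();
        result.triples.retainAll(m.triples);
        return result;
    }
}
```

With a = {t1, t2} and b = {t2, t3}, union has three triples,
intersection holds only t2, and a.difference(b) holds only t1.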

Implementing GroundModel

interface GroundModel extends Model {
}

could ensure that we really operate with a "dumb" set of triples,
whereas

interface VirtualModel extends Model {
   Model getGroundModel();
}

could serve as a basis for SchemaModel, ActiveModel etc.

I'm not sure about querying, but this could be done as

interface QueryableModel extends Model {
   Model executeQuery(Model query);
}

where the parameter query is an RDF model representing the query.


In my view, the basic Model interface should be as spartan as
possible. Could we remove anything else from it while keeping it useful?


Sergey


[1] RADIX: http://www.mailbase.ac.uk/lists/rdf-dev/1999-06/0002.html
[2] Mozilla: http://www.mozilla.org/rdf/back-end-architecture.html
Received on Monday, 15 November 1999 02:04:06 UTC