RE: RDF API convergence? from Jason Diamond on 2000-11-11 (www-rdf-interest@w3.org from November 2000)

From: Jason Diamond <jason@injektilo.org>
Date: Sat, 11 Nov 2000 14:45:56 -0800
To: "www-rdf-interest" <www-rdf-interest@w3.org>
Message-ID: <LAEMKGDBDFAKFNKPFEKLEEHDCMAA.jason@injektilo.org>
> This was something I thought about when implementing Redland, and in
> a language without decent, portable threading such as C [or Java,
> threading in Java is awful], you can't have an active thread of
> control at more than one place, so you have to compromise.  For
> managing streams of statements generated by de/serialising models, I
> created the stream abstraction which handled the data flow
> interaction, pulled by the reader.  This seems to work OK.

That's exactly the kind of abstraction that I was talking about. And you
even did it in C!

> For RDF/XML parsers, I can see why pushing and pulling interfaces
> would be useful, e.g. today just for fun I just used Repat and
> Redland to parse 1/6 of the 600M of dmoz RDF data before it hit a
> mis-aligned tag and stopped.  It consistently used a small amount of
> memory, since it was not storing anything in memory, just what you
> need for that size of data.  You wouldn't want a DOM like way (ROM?,
> W[eb]OM?) where it was all stored in memory and then made available.

How did you convert the dumps to real RDF? Does anyone know why they haven't
converted over yet?

> However for small data (say standalone RDF/XML docs) you might want
> a DOM-like view, since it would be more convenient to work with.

I agree. Manipulating an RSS model, for example, would be much more
convenient if you could load it into something similar to a DOM. Of course,
we're not talking about _the_ DOM, but an in-memory RDF graph similar to
Redland's RDF Model Class. Populating the graph with statements from
serialized XML should ideally bypass loading the XML into a DOM and use
either a simpler push or pull based parser. Loading the data into one object
model just to convert it into another simply wastes cycles. If you already
had a DOM, however, you should be able to "read" your statements from it
using abstractions like your RDF Statement Stream Class and my RDFReader
class.

I haven't quite been able to wrap my head around how one would use an RDF
model like the DMOZ dumps without loading it into memory or importing it
into a database that you could query against. The question I have is this:
Can an API be devised that abstracts away whether or not the model is loaded
into memory or persisted in a database and still be useful to us as
developers?
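
One possible shape for such an API, sketched in Python under stated
assumptions: Model, MemoryModel, SqliteModel and titles() are hypothetical
names, and the only query offered is a triple pattern in which None acts as a
wildcard. Whether so narrow an interface stays useful for something the size
of the DMOZ dumps is exactly the open question.

```python
import sqlite3
from abc import ABC, abstractmethod


class Model(ABC):
    """Hypothetical storage-neutral interface: add a triple, match a pattern."""

    @abstractmethod
    def add(self, s, p, o):
        ...

    @abstractmethod
    def find(self, s=None, p=None, o=None):
        """Yield matching (s, p, o) tuples; None matches anything."""


class MemoryModel(Model):
    def __init__(self):
        self._triples = set()

    def add(self, s, p, o):
        self._triples.add((s, p, o))

    def find(self, s=None, p=None, o=None):
        pattern = (s, p, o)
        for triple in self._triples:
            if all(want is None or want == got
                   for want, got in zip(pattern, triple)):
                yield triple


class SqliteModel(Model):
    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS triples "
            "(s TEXT, p TEXT, o TEXT, UNIQUE (s, p, o))")

    def add(self, s, p, o):
        self._db.execute(
            "INSERT OR IGNORE INTO triples VALUES (?, ?, ?)", (s, p, o))
        self._db.commit()

    def find(self, s=None, p=None, o=None):
        clauses, args = [], []
        for column, value in (("s", s), ("p", p), ("o", o)):
            if value is not None:
                clauses.append(column + " = ?")
                args.append(value)
        sql = "SELECT s, p, o FROM triples"
        if clauses:
            sql += " WHERE " + " AND ".join(clauses)
        yield from self._db.execute(sql, args)


def titles(model):
    """Caller code that works against either backend."""
    return sorted(o for _s, _p, o in model.find(p="dc:title"))
```

titles() is the test of the abstraction: it receives a Model and cannot tell
whether the statements live in a Python set or in a SQLite table.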

> > I currently favor the resource-centric view. I think most developers
> > today who are used to OO programming would find it more familiar as
> > well. But the statement-centric model is more appropriate for logic
> > and inferencing.
>
> However the formal model is defined in terms of statements - fun
> isn't it!  I think it is easy to write the resource-centric API
> around statements which makes practical sense since all the proposed
> storage systems for RDF are also based on statements.

I remember reading (but can't recall where) that the RDF model is actually
extremely close to the relational model. The author pointed out that columns
are like properties and fields like objects, with the primary key of each row
acting as the subject. This obviously doesn't take into consideration repeated
properties and a number of other issues, but it did open my mind to
thinking about storing RDF in a more resource-centric manner. One of my
goals for RDF.NET is to explore this approach and see where it ends up.
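
To make the analogy concrete, here is a small Python sketch (not RDF.NET
code; to_rows() is a hypothetical helper) that folds a list of statements
into one "row" per subject. Each property maps to a list of values, which is
one way to sidestep the repeated-property problem that a plain one-value
column would have.

```python
from collections import defaultdict


def to_rows(statements):
    """Group (subject, predicate, object) triples into one row per subject.

    The subject plays the role of the primary key and each predicate the
    role of a column. Values are kept in lists because a property may
    repeat for the same subject.
    """
    rows = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in statements:
        rows[subject][predicate].append(obj)
    # Convert to plain dicts so a missing key raises instead of being created.
    return {subject: dict(props) for subject, props in rows.items()}


rows = to_rows([
    ("ex:doc", "dc:title", "Example"),
    ("ex:doc", "dc:creator", "A. Author"),
    ("ex:doc", "dc:creator", "B. Author"),  # repeated property
    ("ex:other", "dc:title", "Other"),
])
```

Here rows["ex:doc"]["dc:creator"] holds both authors, which a table with a
single creator column could not represent without a second table.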

> An RDF InfoSet - funny you should say that, some of us were
> discussing what that would mean recently.  The contention is that the
> processing of RDF/XML syntax generally loses information (namespace
> prefixes, aboutEachPrefix, xml:lang, ...) which is bad and the
> output should be defined in terms of the Information Items expected
> with no information loss.

Supposedly, the RDF model is so wonderfully simple that it doesn't need an
Infoset. It's all just triples! We all know, however, that that just isn't
true. I thought that David Megginson did a good job of identifying the
components of a statement (as implied by RDF M&S 1.0): SubjectType, Subject,
Predicate, ObjectType, Object, and Language. Both repat and RDFReader have
taken that approach and I think it works well though I would have preferred
the simpler model that was advertised.
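
As an illustration only, the six components could be carried in a record
like the Python one below. The field names follow the list above; the
NodeType values and the as_triple() helper are assumptions of mine and are
not taken from repat, RDFReader, or David Megginson's code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NodeType(Enum):
    # Hypothetical set of kinds; the real parsers may distinguish others.
    URI = "uri"
    ANONYMOUS = "anonymous"
    LITERAL = "literal"


@dataclass(frozen=True)
class Statement:
    subject_type: NodeType
    subject: str
    predicate: str
    object_type: NodeType
    object_: str  # trailing underscore avoids clashing with the builtin name
    language: Optional[str] = None  # e.g. the xml:lang in scope, if any

    def as_triple(self):
        """Collapse to the 'advertised' three-part model, dropping the rest."""
        return (self.subject, self.predicate, self.object_)


english = Statement(NodeType.URI, "ex:doc", "dc:title",
                    NodeType.LITERAL, "chat", "en")
french = Statement(NodeType.URI, "ex:doc", "dc:title",
                   NodeType.LITERAL, "chat", "fr")
```

The two records differ only in language, so they compare unequal as
Statements but collapse to the same value under as_triple(). That is the
information a bare triple throws away.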

Jason.
Received on Saturday, 11 November 2000 17:48:54 UTC