Re: RDF API convergence?

>>>Jason Diamond said:
> [re: push versus pull]

This was something I thought about when implementing Redland.  In
a language without decent, portable threading such as C [or Java;
threading in Java is awful], you can't have an active thread of
control in more than one place, so you have to compromise.  For
managing streams of statements generated by de/serialising models, I
created the stream abstraction, which handles the data flow
interaction and is pulled by the reader.  This seems to work OK.
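
To make the "pulled by the reader" idea concrete, here is a rough
sketch in Python.  It is not Redland's actual C API: the names
StatementStream, end/get/next, stream_from_list and drain are all
made up for illustration, and the producer is just a callable that
returns the next statement or None when it runs out.

```python
from typing import Callable, Iterable, List, NamedTuple, Optional


class Statement(NamedTuple):
    """One RDF statement (triple); a hypothetical stand-in type."""
    subject: str
    predicate: str
    object: str


class StatementStream:
    """Pull-style stream: the reader decides when the next statement is made.

    Only the current statement is held, so memory use does not grow
    with the number of statements that pass through.
    """

    def __init__(self, produce_next: Callable[[], Optional[Statement]]):
        self._produce_next = produce_next
        self._current = produce_next()  # prime the first statement

    def end(self) -> bool:
        """True once the producer has no more statements."""
        return self._current is None

    def get(self) -> Statement:
        """Return the current statement without advancing."""
        if self._current is None:
            raise EOFError("stream is exhausted")
        return self._current

    def next(self) -> None:
        """Advance by asking the producer for one more statement."""
        if self._current is not None:
            self._current = self._produce_next()


def stream_from_list(statements: Iterable[Statement]) -> StatementStream:
    """Hypothetical producer: serve statements from any iterable."""
    it = iter(statements)
    return StatementStream(lambda: next(it, None))


def drain(stream: StatementStream) -> List[Statement]:
    """Reader loop: keep pulling until the stream reports its end."""
    out = []
    while not stream.end():
        out.append(stream.get())
        stream.next()
    return out
```

The reader loop in drain() is the whole point: all the state lives in
the caller's local variables and nothing is produced until the reader
asks for it.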

> [libxml push/pull interface discussion - snipped]

> Microsoft's XmlReader, on the other hand, is more like reading from a
> stream. You loop until it returns EOF. Each iteration through the loop gets
> you a new XmlNode (Element, EndTag, PI, Text, etc) to play with. It doesn't
> need to load the entire tree into memory so it's just as efficient as SAX
> and expat but easier to program against (mostly since you can keep your
> state in local variables as opposed to members of the class that receives
> the callbacks). This is what I modelled RDFReader after. I've been itching
> to modify expat to provide the same functionality.

For RDF/XML parsers, I can see why both push and pull interfaces
would be useful.  E.g. today, just for fun, I used Repat and
Redland to parse 1/6 of the 600M of dmoz RDF data before it hit a
mis-aligned tag and stopped.  It consistently used a small amount of
memory, since it was not storing the data in memory, which is just
what you need for data of that size.  You wouldn't want a DOM-like
way (ROM?, W[eb]OM?) where it was all stored in memory and then made
available.

However, for small data (say standalone RDF/XML docs) you might
want a DOM-like view, since it would be more convenient to work with.

> I currently favor the resource-centric view. I think most developers today
> who are used to OO programming would find it more familiar as well. But the
> statement-centric model is more appropriate for logic and inferencing.

However, the formal model is defined in terms of statements - fun
isn't it!  I think it is easy to write the resource-centric API
around statements, which makes practical sense since all the proposed
storage systems for RDF are also based on statements.

It didn't take me long to write a nice resource-centric RSS 1.0
module in perl for Redland.
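
As a rough illustration of layering a resource-centric view over
statements (in Python rather than perl, and not the module mentioned
above), something like the following would do.  The Resource class
and its methods are hypothetical names, and the "store" is assumed to
be nothing more than a shared list of (subject, predicate, object)
tuples.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class Resource:
    """Resource-centric view over a shared list of statements.

    The object holds no property data of its own; every call is
    answered by reading or appending statements in the store.
    """

    def __init__(self, uri: str, statements: List[Triple]):
        self.uri = uri
        self._statements = statements  # shared with other views, not copied

    def get(self, predicate: str) -> List[str]:
        """All objects of statements (self.uri, predicate, ?)."""
        return [o for (s, p, o) in self._statements
                if s == self.uri and p == predicate]

    def add(self, predicate: str, value: str) -> None:
        """Assert a new statement about this resource."""
        self._statements.append((self.uri, predicate, value))

    def predicates(self) -> List[str]:
        """Distinct predicates used with this resource, in first-seen order."""
        seen: List[str] = []
        for (s, p, _o) in self._statements:
            if s == self.uri and p not in seen:
                seen.append(p)
        return seen
```

For example, with RSS = "http://purl.org/rss/1.0/", an item can be
built with item.add(RSS + "title", "...") and read back with
item.get(RSS + "title"), while the underlying store remains a plain
list of statements that any statement-centric code can still use.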

> [snip] Should we try to
> combine all of these tools into one large API? I hope not. I think that most
> developers are pretty sharp and are capable of adapting to whatever API
> they're presented with. I'm not saying we should endorse a free-for-all in
> the RDF API land. Maybe we can identify, though, a core set of requirements
> that APIs trying to provide a specific model should provide (a la the
> Infoset). I don't really see any need to provide actual language-level
> bindings.

An RDF InfoSet - funny you should say that, some of us were
discussing what that would mean recently.  The contention is that
processing RDF/XML syntax generally loses information (namespace
prefixes, aboutEachPrefix, xml:lang, ...), which is bad, and that the
output should be defined in terms of the expected Information Items,
with no information loss.

Dave
 

Received on Saturday, 11 November 2000 16:20:56 UTC