- From: Dave Beckett <dave.beckett@bristol.ac.uk>
- Date: Sat, 11 Nov 2000 21:20:55 +0000
- To: www-rdf-interest <www-rdf-interest@w3.org>
>>>Jason Diamond said:
> [re: push versus pull]

This was something I thought about when implementing Redland, and in a language without decent, portable threading such as C [or Java, threading in Java is awful], you can't have an active thread of control in more than one place, so you have to compromise. For managing streams of statements generated by de/serialising models, I created the stream abstraction, which handles the data flow interaction and is pulled by the reader. This seems to work OK.

> [libxml push/pull interface discussion - snipped]

> Microsoft's XmlReader, on the other hand, is more like reading from a
> stream. You loop until it returns EOF. Each iteration through the loop gets
> you a new XmlNode (Element, EndTag, PI, Text, etc) to play with. It doesn't
> need to load the entire tree into memory so it's just as efficient as SAX
> and expat but easier to program against (mostly since you can keep your
> state in local variables as opposed to members of the class that receives
> the callbacks). This is what I modelled RDFReader after. I've been itching
> to modify expat to provide the same functionality.

For RDF/XML parsers, I can see why both pushing and pulling interfaces would be useful, e.g. today, just for fun, I used Repat and Redland to parse 1/6 of the 600M of dmoz RDF data before it hit a mis-aligned tag and stopped. Memory use stayed small throughout, since nothing was being stored, which is just what you need for data of that size. You wouldn't want a DOM-like approach (ROM?, W[eb]OM?) where everything is stored in memory and only then made available. However, for small data (say standalone RDF/XML docs) you might want a DOM-like view, since it would be more convenient to work with.

> I currently favor the resource-centric view. I think most developers today
> who are used to OO programming would find it more familiar as well. But the
> statement-centric model is more appropriate for logic and inferencing.

However, the formal model is defined in terms of statements - fun, isn't it! I think it is easy to write a resource-centric API around statements, which makes practical sense since all the proposed storage systems for RDF are also based on statements. It didn't take me long to write a nice resource-centric RSS 1.0 module in Perl for Redland.

> [snip] Should we try to
> combine all of these tools into one large API? I hope not. I think that most
> developers are pretty sharp and are capable of adapting to whatever API
> they're presented with. I'm not saying we should endorse a free-for-all in
> the RDF API land. Maybe we can identify, though, a core set of requirements
> that APIs trying to provide a specific model should provide (a la the
> Infoset). I don't really see any need to provide actual language-level
> bindings.

An RDF InfoSet - funny you should say that, some of us were discussing what that would mean recently. The contention is that processing the RDF/XML syntax generally loses information (namespace prefixes, aboutEachPrefix, xml:lang, ...), which is bad, and that the output should instead be defined in terms of the Information Items expected, with no information loss.

Dave
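
A minimal sketch, in Python, of the two ideas discussed in the message: a stream of statements that the reader pulls one at a time, and a resource-centric view layered on top of those statements. Every name here (`Statement`, `statement_stream`, `Resource`, `resources_from`) is hypothetical and chosen for illustration; none of it is the Redland API or the Perl RSS 1.0 module mentioned above. Grouping statements by subject keeps everything in memory, so this corresponds to the DOM-like view suited to small documents, not to the streaming case.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, NamedTuple, Tuple


class Statement(NamedTuple):
    """One RDF statement (hypothetical type; plain strings for simplicity)."""
    subject: str
    predicate: str
    object: str


def statement_stream(triples: Iterable[Tuple[str, str, str]]) -> Iterator[Statement]:
    """Pull-style stream: nothing is produced until the consumer asks for it."""
    for s, p, o in triples:
        yield Statement(s, p, o)


class Resource:
    """Resource-centric view: all property values known for one subject."""

    def __init__(self, uri: str, properties: Dict[str, List[str]]):
        self.uri = uri
        self._properties = properties

    def get(self, predicate: str) -> List[str]:
        # Return a copy so callers cannot mutate the stored values.
        return list(self._properties.get(predicate, []))


def resources_from(statements: Iterable[Statement]) -> Dict[str, Resource]:
    """Consume a statement stream and group it by subject (held in memory)."""
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for st in statements:
        grouped[st.subject][st.predicate].append(st.object)
    return {uri: Resource(uri, dict(props)) for uri, props in grouped.items()}
```

The statement layer stays the single source of truth here; the `Resource` objects are only a convenience built from it, which is the layering the message argues for.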
Received on Saturday, 11 November 2000 16:20:56 UTC