- From: carmen <_@whats-your.name>
- Date: Tue, 14 Sep 2010 15:44:28 +0000
- To: public-rdf-ruby@w3.org
I figured I'd share my approach.

Essentially the problem is: given a stream of triples, transform it somehow.

Triple -> Triple is out as a type signature, as you might want to return multiple triples (merging to SIOC but keeping the original, or adding provenance..) or 0 triples (grep/filtering, access control, bandwidth saving for APIs/views requiring only certain fields).

Triple -> [Triple] is a possible typesig, but applied to [t,t,t,t] it results in [[t,t],[],[t],[t,t,t]], which rules out #flatten as a solution to cleaning up, since I'd like to not rule out Array to hold a triple's components.

I've got a 1GB NT file that RDF::NTriples is going to provide, triple by triple:

    NTriples('stream.nt') do |s,p,o|
      actA, actB, actC..
    end

Also, I'm using RDF files as configuration, and a triple serves as the name of a source file to provide (again, too much to comfortably load into RAM).

The array creation that Triple -> [Triple] entails is not a solution to these problems. (A simple grep()-style scenario on triple streams that never return more than they are given, plus a clever composition function (internally using #concat/*splat and intermediate output buffer(s)) over a sequence of filter functions, could at least support the first case, though not necessarily with the comfort of knowing it won't blow up in your face with OOM-kills. Maybe this is enough for most devs, I don't know.)

Ruby blocks make this relatively comfortable (without having to go whole-hog into Iteratee[1]/Conal-land[2]), so I'd like to just use Ruby 1.8 features, without building a custom "pipeline" library on Ruby's coroutines in 1.9 (aka Fibers).

This is how it works:

    def filterTriples *a
      send(*a) do |s,p,o|
        yield s',p',o'   # pseudocode: the transformed subject, predicate, object
      end
    end

send() invokes the method named by the first argument (instantiating a closure/stack-frame/whathaveyou) and hands it the remaining names, recursing until an entire pipeline is set up. Triples are then yielded through the stack.
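
A minimal, self-contained sketch of the same send/yield pattern, for illustration only. The names TripleDemo, source, drop_comments and add_provenance, and the sample triples, are hypothetical and not part of the original code. It assumes each stage is a method whose arguments name the stages upstream of it: one filter yields zero triples for some inputs, the other yields two per input, and neither builds an intermediate array.

```ruby
# Hypothetical demo class; none of these names come from the original post.
class TripleDemo
  TRIPLES = [
    ['ex:a', 'rdfs:label',   'A'],
    ['ex:a', 'rdfs:comment', 'skip me'],
    ['ex:b', 'rdfs:label',   'B']
  ]

  # tripleStream: a source takes only a block and yields each triple to it
  def source
    TRIPLES.each { |s, p, o| yield s, p, o }
  end

  # tripleStream -> tripleStream: grep-style filter, yields 0 or 1 triples per input
  def drop_comments(*rest)
    send(*rest) do |s, p, o|
      yield(s, p, o) unless p == 'rdfs:comment'
    end
  end

  # tripleStream -> tripleStream: yields 2 triples per input (original + provenance)
  def add_provenance(*rest)
    send(*rest) do |s, p, o|
      yield s, p, o
      yield s, 'ex:source', 'demo'
    end
  end
end

# Sink: supplies a block but never yields. Collecting into an Array here is
# only so the result can be inspected; a real sink could write to a file.
out = []
TripleDemo.new.add_provenance(:drop_comments, :source) { |s, p, o| out << [s, p, o] }
out.each { |t| puts t.join(' ') }
```

The call add_provenance(:drop_comments, :source) unwinds as send(:drop_comments, :source), then send(:source), so the outermost filter is named first and the source last. Each triple travels up through the nested blocks one at a time.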

The example below parses a raw feed, normalizes various date fields to DC::Date, runs them through a stdlib parser/serializer so they're ISO 8601 clean, and converts various predicates to SIOC before yielding up to whatever may want them. The two type signatures reflect sources vs. filters. Sinks (not pictured here) provide blocks but no yield.

    require 'time'  # for Time.parse and Time#iso8601

    # tripleStream
    def feed &f
      dateNorm :feedSIOCize,:feedRaw,&f
    end

    # tripleStream
    def feedRaw &f
      read.extend(FeedParse).parse &f
    end

    # tripleStream -> tripleStream
    def dateNorm *f
      send(*f){|s,p,o|
        yield *({E::RSS+'pubDate' => true,
                 E::Date => true,
                 E::Purl+'dc/elements/1.1/date' => true,
                 Atom+'published' => true,
                 Atom+'updated' => true
                }[p] ? [s, Date, Time.parse(o).utc.iso8601] : [s,p,o])}
    end

    # tripleStream -> tripleStream
    def feedSIOCize *f
      send(*f){|s,p,o|
        yield s,
        { Purl+'dc/elements/1.1/creator' => Creator,
          Purl+'dc/elements/1.1/subject' => SIOC+'subject',
          Atom+'author' => Creator,
          RSS+'description' => Content,
          Atom+'content' => Content,
          RSS+'title' => Title,
          Atom+'title' => Title,
        }[p]||p,
        o }
    end

anyways..

[1] http://okmij.org/ftp/Streams.html
[2] http://conal.net/papers/
Received on Tuesday, 14 September 2010 15:45:05 UTC