- From: cdr <_@whats-your.name>
- Date: Sun, 4 May 2008 06:28:51 -0400
- To: public-rdf-ruby@w3.org
haven't tested on all feeds yet. a qname resolver for absolute URIs on the predicates is another line, but i'm lazy.. later versions @ http://repo.or.cz/w/element.git?a=blob;f=ruby/W/simple.rb

    require 'open-uri'

    class String
      # yields (subject, predicate, object) strings for every item/entry found
      def parseFeed
        scan(%r{<(rss:|atom:)?(item|entry)([\s][^>]*)?>(.*?)</\1?\2>}mi){|m| # item
          # URI: the about= attribute if present, otherwise the contents of <id>
          u = m[2] && (u = m[2].match(/about=["']?([^'"]+)/)) && u[1] || m[3].match(/id>([^<]+)/)[1]
          # link: predicate is the rel= value, object is the href= value
          m[3].scan(%r{<([a-z:]+)?link ([^>]+)>}mi){|e|
            yield u, e[1].match(/rel=['"]?([^'"\s]+)/)[1], e[1].match(/href=['"]?([^'"\s]+)/)[1]}
          # element: predicate is the tag name minus its prefix, object is the first 65 chars of content
          m[3].scan(%r{<([a-z:]+)([\s][^>]*)?>(.*?)</\1>}mi){|e|
            yield u, e[0].split(/:/)[-1], e[2][0..64]}}
      end
    end

    irb(main):175:0> open('http://mt-shortwave.blogspot.com/feeds/posts/default').read.parseFeed{|s,p,o|puts [s,p,o[0..64]].join "\t"}
    tag:blogger.com,1999:blog-28878961.post-792620716806050191	alternate	http://mt-shortwave.blogspot.com/2008/04/bbc-radio-chief-rejects-
    tag:blogger.com,1999:blog-28878961.post-792620716806050191	replies	http://mt-shortwave.blogspot.com/feeds/792620716806050191/comment
    tag:blogger.com,1999:blog-28878961.post-792620716806050191	self	http://www.blogger.com/feeds/28878961/posts/default/7926207168060
    tag:blogger.com,1999:blog-28878961.post-792620716806050191	edit	http://www.blogger.com/feeds/28878961/posts/default/7926207168060
    tag:blogger.com,1999:blog-28878961.post-792620716806050191	id	tag:blogger.com,1999:blog-28878961.post-792620716806050191
    tag:blogger.com,1999:blog-28878961.post-792620716806050191	published	2008-04-28T08:33:00.000-07:00
    tag:blogger.com,1999:blog-28878961.post-792620716806050191	updated	2008-04-28T08:38:24.327-07:00
    tag:blogger.com,1999:blog-28878961.post-792620716806050191	title	BBC radio chief rejects calls to privatise Radio 1 and Radio 2
    tag:blogger.com,1999:blog-28878961.post-792620716806050191	content	<a href="http://bp0.blogger.com/_eFGtrBi5YL8/SBXvNzdm8pI/AAAAA
    tag:blogger.com,1999:blog-28878961.post-792620716806050191	author	<name>Gayle</name>

* had some problems with the existing feed libs:

- nothing works on Ruby 1.9 except Simple-RSS and Raptor via Redland-bindings. the Ruby port of Mark Pilgrim's feed parser is close to 200K of source excluding tests, so i'm unsurprised something is broken..
- Raptor/Redland-bindings is a pain to build on shared hosts. i'm also getting symbol-resolution errors linking the latest release versions even on a nice box, it segfaults and/or screws up on some nasty feeds, it doesn't work on JRuby or Rubinius, and it requires SWIG, a compiler and -dev libs..
- Simple-RSS does things i don't want/need: it creates an intermediary hash from the found 'triples', which i'd have to deconstruct back into the triples, and it turns the strings into ruby objects (requiring more libs and clobbering the original content) when i just wanted the strings to begin with. it also has a hardcoded set of tags to look for, and it misses the <link rel= tags from Atom feeds
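
for reference, the qname resolver mentioned at the top is not in the posted code. a rough sketch of how it might look, in the same regex-only spirit, is below. the QName module, its method names and the sample feed are hypothetical and exist only for illustration; appending '#' to a namespace that doesn't end in '/' or '#' is an assumption of this sketch, not something the message specifies.

```ruby
# Hypothetical helper, not part of the posted parseFeed code.
module QName
  # Collect xmlns declarations as a prefix => namespace URI Hash.
  # The default namespace (plain xmlns=) is stored under the empty string.
  def self.namespaces(xml)
    ns = {}
    xml.scan(/xmlns(?::([A-Za-z_][\w.-]*))?\s*=\s*["']([^"']+)["']/) do |prefix, uri|
      ns[prefix || ''] ||= uri # first declaration wins
    end
    ns
  end

  # Expand "prefix:local" (or a bare "local") to an absolute URI string.
  # Names with an undeclared prefix are returned unchanged.
  def self.resolve(name, ns)
    prefix, local = name.include?(':') ? name.split(':', 2) : ['', name]
    base = ns[prefix]
    return name unless base
    # Assumption: insert '#' when the namespace has no trailing separator.
    base =~ %r{[/#]\z} ? base + local : base + '#' + local
  end
end

# Made-up sample feed, used only to exercise the sketch.
feed = <<~XML
  <feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
    <entry><id>tag:example.org,2008:1</id><dc:creator>someone</dc:creator></entry>
  </feed>
XML

ns = QName.namespaces(feed)
puts QName.resolve('dc:creator', ns)
puts QName.resolve('title', ns)
```

to get absolute predicates out of parseFeed with something like this, the element scan would have to yield the full tag name (e[0]) instead of e[0].split(/:/)[-1], since the prefix is thrown away as posted.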
Received on Sunday, 4 May 2008 10:29:40 UTC