Re: Progress on Ruby RDF/XML parser

On Tue Jul 08, 2008 at 11:42:27PM +0100, Dan Brickley wrote:
>
> Ben Adida wrote:
>>> extracting of RDF data from HTML documents with embedded RDFa.
>> Yes, that would be great :)
>> In the RDFa task force, I built an RDFa parser in Ruby as a way to 
>> validate the processing rules. It's probably awful Ruby (I'm still 
>> learning the language), and it needs a serious cleanup, but I'm happy to 
>> eventually contribute it to the group in whatever toolkit we end up 
>> putting together.
>
> Please ship the code, in whatever state! Ideally wired up to some tests, 
> but anything is a start.

indeed. pasting the code into an email is a start too

>
> I wrote to the author of http://code.google.com/p/ruby-rdfa/ earlier today. 
> No response yet. But it would be good to know what's been done already 
> before people start duplicating. For example, taking one of the Python RDFa 
> parsers (eg. http://dev.w3.org/2004/PythonLib-IH/pyRdfa/) would also be a 
> reasonable option; generally transliterating Python into Ruby is relatively 
> straightforward. But beginning from native Ruby would be better of 
> course...

also, preferably with a choice of dependencies. e.g. REXML and the like were always a nightmare for me; i could never get Sam Ruby's or Mark Pilgrim's Atom code working

partly because REXML was broken on ruby 1.9.x, but also because it required fairly verbose, manual manipulation

plus Hpricot has a liberal parser, handling HTML as well as valid XML (planet venus/mars has a sanitize step that obliterates the original source... no thanks)

didn't i already see an RDFa parser built on Hpricot?

here's my old one for jQuery:

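// collect [subject, predicate, object] triples from RDFa-like markup
$.fn.rdfa = function() {
    // walk up the ancestors until one carries @about; give up past the document root
    function r(e) {return e.length ? (e.attr("about")||r(e.parent())) : undefined}
    // $.map flattens one level of arrays, hence the double brackets
    return $.map($('[property]',this),function(e){
        return [[r($(e)),
                 $(e).attr("property"),
                 // @content holds URL-encoded JSON; a missing @content becomes null
                 $.parseJSON(decodeURIComponent(($(e).attr("content")||"null").replace(/\+/g," ")))]]}).concat(
                     $.map($('a[rel]',this),function(a){
                         return [[r($(a)),
                                  $(a).attr("rel"),
                                  {uri:$(a).attr("href")}]]}))}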


which isn't quite RDFa, but you could add xmlns prefix lookup on the cascade up, and drop the JSON parsing of the statement's 'object' in favour of the xsd datatype.
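
a rough sketch of what those two changes might look like. everything here is an assumption for illustration, not something from the RDFa spec or any existing parser: nodes are plain objects of the hypothetical shape {attrs, parent} standing in for DOM elements (so it runs without a browser), and lookupNamespace, expandCurie and literal are made-up names.

```javascript
// hypothetical node shape: { attrs: {name: value}, parent: node-or-null }

// walk up from `node` until some ancestor declares xmlns:<prefix>
function lookupNamespace(node, prefix) {
    for (var n = node; n; n = n.parent) {
        var uri = n.attrs["xmlns:" + prefix];
        if (uri !== undefined) return uri;
    }
    return undefined;
}

// expand a prefixed name such as "dc:title" against the in-scope namespaces;
// values with no colon or an undeclared prefix are returned unchanged
function expandCurie(node, curie) {
    var i = curie.indexOf(":");
    if (i < 0) return curie;
    var ns = lookupNamespace(node, curie.slice(0, i));
    return ns === undefined ? curie : ns + curie.slice(i + 1);
}

// build the statement's object from @content plus an optional datatype URI,
// instead of running the content through a JSON parser
function literal(value, datatypeUri) {
    return datatypeUri ? {value: value, datatype: datatypeUri} : {value: value};
}

// usage: the prefix is declared on the parent and used on the child
var root = {attrs: {"xmlns:dc": "http://purl.org/dc/elements/1.1/"}, parent: null};
var span = {attrs: {property: "dc:title", content: "hello"}, parent: root};
var predicate = expandCurie(span, span.attrs.property);
var object = literal(span.attrs.content, span.attrs.datatype);
```

the lookup mirrors the r() helper above: same cascade up, just looking for an xmlns declaration instead of @about.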

if i could be bothered to read the entire spec i'd be able to say whether this approach (find elements with @property, then cascade up looking for the subject and down looking for the object) even makes sense

>
> cheers,
>
> Dan
>
> --
> http://danbri.org/

Received on Wednesday, 9 July 2008 04:06:33 UTC