Re: Enumerating RDF::Format

On May 13, 2012, at 2:57 PM, Nicholas Humfrey wrote:

> 
> On 13/05/2012 21:12, "Gregg Kellogg" <gregg@greggkellogg.net> wrote:
>> Hi Nick!
>> 
>> On May 13, 2012, at 4:33 AM, Nicholas Humfrey wrote:
>> 
>>> Hello,
>>> 
>>> I have been using RDF.rb gem as part of dbpedialite.org for a couple of
>>> years now. I opened an issue about enumerating the registered list of
>>> RDF::Formats:
>>> 
>>> https://github.com/bendiken/rdf/issues/16
>> 
>> Sorry, I wasn't aware of the outstanding issue. As you probably know, the
>> active gem is maintained on my fork (http://github.com/gkellogg/rdf), and Arto
>> hasn't been too responsive at keeping in sync.
>> 
>>> Time has passed and this still isn't possible, so I thought I would try and
>>> come up with a patch. Specifically, I want to iterate through the registered
>>> list of formats and get:
>>> 
>>> * The format name (eg 'RDF/XML' or 'N-Triples')
>>> * The default (most official) content type (eg 'text/turtle')
>>> * The default file suffix (eg .rdf or .trix)
>> 
>> I do something like this in my RDF Distiller
>> (http://rdf.greggkellogg.net/distiller), where I need to present possible
>> input and output formats using basically the following:
>> 
>>    RDF::Format.each.to_a.map(&:reader).compact.map(&:to_sym)
>> 
>> When loaded up with the linked data gem, this generates the following:
>> 
>>    [:ntriples, :nquads, :jsonld, :json, :microdata, :n3, :n3, :rdfa, :rdfa,
>> :rdfa, :rdfa, :rdfa, :rdfxml, :trig, :trix, :turtle, :turtle]
>> 
>> You can do the same thing with &:writer to get the list of available writers:
>> 
>>    [:ntriples, :nquads, :jsonld, :json, :n3, :n3, :rdfa, :rdfa, :rdfa, :rdfa,
>> :rdfa, :rdfxml, :trig, :trix, :turtle, :turtle]
>> 
>> (basically, the same, but without a microdata writer).
>> 
>> In the case of the Distiller, it uses either content-negotiation, or file
>> extension to figure out the appropriate format to use. The sinatra-linkeddata
>> gem can do this for you, or the sparql gem if you want to have a SPARQL
>> endpoint too:
>> 
>>    require 'sinatra-respond_to'
>>    require 'sinatra-linkeddata'
>> 
>>    register Sinatra::RespondTo
>>    register Sinatra::LinkedData
>> 
>> This will then respond based on either the Accept header or the file extension
>> and format the RDF::Queryable results using the appropriate writer.
>> 
>> Check out http://github.com/gkellogg/rdf-distiller and
>> http://github.com/gkellogg/github-lod for some examples of doing this.
>> 
>>> This is used to generate a <link rel="alternate"> and hyperlinks to other
>>> formats in the HTML page.
>> 
>> You could do this with a variation of the previous Format.each clause:
>> 
>>    RDF::Format.file_extensions.keys
>> 
>> This will give you the file extensions of all loaded formats, which RespondTo
>> should dispatch on.
>> 
>> Also, note that the RDF::Reader.open() will look at various things, including
>> content type, file extension, specified format, and if necessary, content
>> sniffing to try to find an appropriate reader. It's pretty good, but could be
>> improved upon further, particularly for HTML serializations.
>> 
>>> Unfortunately the internal data structures currently make this difficult and
>>> there is no name for a format stored (other than deriving it from the class
>>> name). For the time being I have decided to resort to storing it in my own
>>> data structure:
>>> http://github.com/njh/dbpedialite/blob/master/lib/formats.rb
>> 
>> Yes, the internal structures don't make this easy. I'd certainly entertain a
>> reasonable patch that made this easier. Perhaps just exposing the code I show
>> here as RDF::Format class methods would be useful.
> 
> Hi Gregg!
> 
> Thanks for the very helpful email. I will take a look at making a patch - I
> did try some similar things to what you have suggested but it felt quite
> hacky and I got some weird results - such as Turtle appearing multiple
> times.

Yes, you get multiple results because of the way the internal data structures are presented, and because there are sometimes other Formats that are equivalent (e.g. RDF::RDFa::Format, RDF::RDFa::Lite, RDF::RDFa::HTML, and RDF::RDFa::XHTML, which exist so you can do an RDF::Format.for(:html) and get something reasonable). Adding a #uniq to the results takes care of this.
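
As a minimal sketch of the #uniq step, the snippet below uses the reader list quoted earlier in this thread as a literal array, so it runs without the gems installed. The constant names READER_SYMBOLS and UNIQUE_READERS are made up for illustration.

```ruby
# Reader symbols as listed earlier in this thread (linkeddata gem loaded).
# With the gems available, the same list would come from:
#   RDF::Format.each.to_a.map(&:reader).compact.map(&:to_sym)
READER_SYMBOLS = [:ntriples, :nquads, :jsonld, :json, :microdata, :n3, :n3,
                  :rdfa, :rdfa, :rdfa, :rdfa, :rdfa, :rdfxml, :trig, :trix,
                  :turtle, :turtle].freeze

# Array#uniq keeps the first occurrence of each symbol and preserves order.
UNIQUE_READERS = READER_SYMBOLS.uniq

puts UNIQUE_READERS.inspect
```

Each reader then appears once, in the order it was first seen.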

> Do you think it would make sense to add a new 'name' property to formats? I
> would really like to be able to display 'RDF/XML' in the list of alternative
> formats. I find it a bit weird that specific formats aren't more like
> instances of the class RDF::Format.

Good idea. It could default to a value based on the class name, and an explicit setting would allow names like "N-Triples" and "RDF/XML".
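
Roughly what I have in mind, as a sketch only: SketchFormat, the Sketch module, and display_name are hypothetical stand-ins and not part of RDF.rb. I've avoided calling the method "name" here because Ruby classes already have one.

```ruby
# Hypothetical stand-in for RDF::Format; not the gem's actual class.
class SketchFormat
  class << self
    # With an argument, set an explicit display name.
    # Without one, return the explicit name if set, otherwise fall back
    # to the last segment of the class name.
    def display_name(value = nil)
      @display_name = value if value
      @display_name || name.split('::').last
    end
  end
end

module Sketch
  # No explicit name: falls back to "Turtle".
  class Turtle < SketchFormat; end

  class RDFXML < SketchFormat
    display_name 'RDF/XML'
  end

  class NTriples < SketchFormat
    display_name 'N-Triples'
  end
end

[Sketch::Turtle, Sketch::RDFXML, Sketch::NTriples].each do |format|
  puts format.display_name
end
```

The name is stored per class, so a subclass that doesn't set one gets the class-name default rather than inheriting its parent's name.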

> Perhaps your fork should become the official repo for the gem? Lots of gems
> have done that now...

Yes, I think either mine should be the primary, or we should create an organization to hold the repos. Unfortunately, Arto's the repo owner, and he hasn't been responsive about doing things like this. He is also responsible for http://rdf.rubyforge.org, which is likewise several steps out of date.

Arto, if you're listening, it's time to take some action.

I'd suggest we create an organization such as "RDF-Ruby", "Ruby-RDF", or some such and transfer the ownership of the various gems there. I'm certainly willing to transfer all of my gems there, if we can get the others pushed over too; specifically the following:

* https://github.com/bendiken/rdf
* https://github.com/bendiken/rdf-spec
* https://github.com/bendiken/sparql-client
* https://github.com/bendiken/sxp-ruby
* https://github.com/bendiken/rdf-trix
* https://github.com/bendiken/rdf-json
* https://github.com/bhuga/rdf-do
* https://github.com/bhuga/spira
* https://github.com/datagraph/rack-linkeddata
* https://github.com/datagraph/sinatra-linkeddata
* https://github.com/datagraph/linkeddata

I'm basically the primary developer on all of these now, releasing updates to rubygems when I'm able to.

We could also use GitHub Pages to reference documentation, although I currently manage a roll-up of the documentation at http://rdf.greggkellogg.net/yard/index.html. I'd also be happy to create some form of organization to host the distiller and other information about the Ruby SemWeb/LinkedData organization, perhaps along the lines of http://rdfa.info.

Gregg

> nick.
> 
> 

Received on Sunday, 13 May 2012 22:20:14 UTC