Re: RDF.rb and format discovery

From: Gregg Kellogg <gregg@kellogg-assoc.com>
Date: Wed, 30 Jun 2010 13:21:07 -0400
To: "Hellekin O. Wolf" <hellekin@cepheide.org>
CC: "public-rdf-ruby@w3.org" <public-rdf-ruby@w3.org>
Message-ID: <A43C062F-4B5A-4953-9DDA-247FCDC4F865@kellogg-assoc.com>

I agree; I think RDF::Reader.for needs to be somewhat smarter.

 *   The symbol case is limited to using an element of the class name (e.g. RDF::RDFXML => :rdfxml). It would be nice to be able to specify alternate symbols (e.g. :rdf). Of course, this can already be done through for(:extension => "rdf").
 *   RDF::Reader.open, when loading a remote resource, should use the returned MIME type to find a matching format, rather than requiring that the format be provided explicitly. Arto seems to be of the opinion that this is done via LinkedData, but it seems a fair thing to do directly in RDF.rb.
 *   I believe that Format specifications should also provide a RegExp to match against the beginning of the content (I use the first 1000 bytes in RdfContext). This would be used within RDF::Reader.open in case a format couldn't be found by other means. Consider the following:

# Heuristically detect the format of the input stream
def detect_format(stream)
  # Look at the beginning of the content to guess the format
  if stream.respond_to?(:rewind)
    string = stream.read(1000)
    stream.rewind
  else
    string = stream.to_s
  end
  case string
  when /<(\w+:)?RDF/   then :rdfxml
  when /<(\w+:)?html/i then :rdfa
  when /@prefix/i      then :n3
  else                      :ntriples
  end
end

This could instead be done by looping through the available Format subclasses and calling a #match method on each. Within RDFXML::Format, I could do the following:

class Format < RDF::Format
  # Pattern matched against the beginning of the content
  MATCH = %r(<(\w+:)?RDF)

  content_type     'application/rdf+xml', :extension => :rdf
  content_encoding 'utf-8'

  reader { RDF::RDFXML::Reader }
  writer { RDF::RDFXML::Writer }

  # True if the content looks like RDF/XML
  def match(content)
    !!(content.to_s =~ MATCH)
  end
end
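
The loop over Format subclasses could look roughly like the sketch below. It is plain Ruby and does not use RDF.rb: SketchFormat, its subclasses, the registry and sniff_format are hypothetical names made up for illustration, and match is written as a class method for simplicity. Only the three patterns come from the heuristics above.

```ruby
# Hypothetical stand-in for RDF::Format; not part of RDF.rb.
class SketchFormat
  # Remember every subclass so that they can be looped over later.
  def self.inherited(subclass)
    super
    SketchFormat.registry << subclass
  end

  def self.registry
    @registry ||= []
  end

  # A format that defines no MATCH pattern never claims any content.
  def self.match(content)
    const_defined?(:MATCH, false) && !!(content =~ self::MATCH)
  end
end

class RDFXMLSketch < SketchFormat
  MATCH = %r(<(\w+:)?RDF)
end

class RDFaSketch < SketchFormat
  MATCH = %r(<(\w+:)?html)i
end

class N3Sketch < SketchFormat
  MATCH = /@prefix/i
end

# Return the first registered format whose pattern matches the
# beginning of the content, or nil if none does.
def sniff_format(content, limit = 1000)
  head = content.to_s[0, limit]
  SketchFormat.registry.find { |format| format.match(head) }
end
```

Formats are tried in the order they were defined, so a more specific pattern should be registered before a looser one.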

In RDF::Reader.open, first look for a reader using the options. Failing that, open the file and look for a MIME type. Failing that, loop through the Format instances and see whether one matches the string content.

In most cases, this will do what the user expects.
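
The lookup order described above (options first, then MIME type, then content) could be sketched as follows. This is only an illustration under stated assumptions: discover_format, sniff and the CONTENT_TYPES table are hypothetical and not part of RDF.rb, and the 'text/html' entry in the table is a guess.

```ruby
# Hypothetical lookup table; a real implementation would ask the
# registered formats for their content types instead.
CONTENT_TYPES = {
  'application/rdf+xml' => :rdfxml,
  'text/html'           => :rdfa, # assumption, for illustration only
  'text/n3'             => :n3
}.freeze

# Content-based guess, reusing the patterns from detect_format.
# Returns nil when nothing matches.
def sniff(content)
  case content.to_s[0, 1000]
  when /<(\w+:)?RDF/   then :rdfxml
  when /<(\w+:)?html/i then :rdfa
  when /@prefix/i      then :n3
  end
end

def discover_format(format: nil, content_type: nil, content: nil)
  # 1. An explicitly requested format always wins.
  return format.to_sym if format

  # 2. Otherwise use the MIME type, ignoring parameters such as charset.
  if content_type
    mime = content_type.split(';').first.to_s.strip.downcase
    return CONTENT_TYPES[mime] if CONTENT_TYPES.key?(mime)
  end

  # 3. Finally, fall back to looking at the content itself.
  sniff(content)
end
```

An unknown MIME type such as text/plain does not stop the search; it simply falls through to the content check.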


On Jun 30, 2010, at 2:03 AM, Hellekin O. Wolf wrote:


I was looking into supporting more formats for FOAFSSL-ruby, including
the recently released rdf-rdfa and rdf-n3 gems.

But what I found looks like hell:

- there doesn't seem to be a reliable way of discovering the FOAF
file format,
- different formats will fail with different errors,
- when no format is given, RDF::Graph won't detect the right one (and
gives unpredictable results).

The original way of doing it in FOAFSSL-ruby is to try it, and
fall back to a different format on failure.  It works, but it's so ugly
my grand-mother died.  When I tried to add new formats, I had to find
another solution.

I went for the following (ugly) algorithm (now, my grand-mother is
already dead):

1. look up the file extension in the given WebID
2. look up the Content-Type after an HTTP HEAD to the WebID
3. GET the file and identify it from its contents
4. fail if the format isn't known by now.
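
As a rough sketch of steps 1 and 4, assuming a hypothetical WEBID_EXTENSIONS table and hypothetical helper names (none of this is FOAFSSL-ruby or RDF.rb code): the extension is taken from the path of the WebID so that a fragment such as #me is ignored. Steps 2 and 3 are passed in as callables to keep the sketch free of network access.

```ruby
require 'uri'

# Hypothetical extension table, for illustration only.
WEBID_EXTENSIONS = {
  '.rdf'  => :rdfxml,
  '.n3'   => :n3,
  '.ttl'  => :n3,
  '.nt'   => :ntriples,
  '.html' => :rdfa
}.freeze

class UnknownFormatError < StandardError; end

# Step 1: guess from the path, ignoring the query and the fragment.
def format_from_extension(webid)
  path = URI.parse(webid).path.to_s
  WEBID_EXTENSIONS[File.extname(path).downcase]
end

# Steps 2 and 3 are supplied as callables taking the WebID and
# returning a format symbol or nil. Step 4 raises if every step
# came back empty.
def webid_format(webid, *fallbacks)
  found = format_from_extension(webid)
  fallbacks.each do |step|
    break if found
    found = step.call(webid)
  end
  found or raise UnknownFormatError, "cannot determine the format of #{webid}"
end
```

For example, webid_format('http://example.org/foaf.rdf#me') stops at step 1, while a WebID without an extension needs one of the later steps to answer.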

That gives a pretty good image of a house of cards, if any.

Any idea how to deal properly with auto-discovery of formats?

Received on Wednesday, 30 June 2010 17:22:12 UTC
