- From: Gregg Kellogg <gregg@kellogg-assoc.com>
- Date: Wed, 30 Jun 2010 13:21:07 -0400
- To: "Hellekin O. Wolf" <hellekin@cepheide.org>
- CC: "public-rdf-ruby@w3.org" <public-rdf-ruby@w3.org>
- Message-ID: <A43C062F-4B5A-4953-9DDA-247FCDC4F865@kellogg-assoc.com>
I agree, I think that RDF::Reader.for needs to be somewhat smarter.

* The symbol case is limited to using an element of the class name (e.g., RDF::RDFXML => :rdfxml). It would be nice to specify alternate symbols (e.g., :rdf). Of course, this can be done through for(:extension => "rdf").

* RDF::Reader.open, when loading a remote resource, should look at the returned MIME type to do a format match, rather than requiring that it be provided explicitly. Arto seems to be of the opinion that this is done via LinkedData, but it seems to be a fair thing to do directly in RDF.rb.

* I believe that Format specifications should also provide a RegExp to match against the beginning of the content (I use the first 1000 bytes in RdfContext). This would be used within RDF::Reader.open in case a format couldn't be found through other means. Consider the following:

    # Heuristically detect the input stream
    def detect_format(stream)
      # Got to look into the file to see
      if stream.respond_to?(:rewind)
        stream.rewind
        string = stream.read(1000)
        stream.rewind
      else
        string = stream.to_s
      end
      case string
      when /<(\w+:)?RDF/ then :rdfxml
      when /<(\w+:)?html/i then :rdfa
      when /@prefix/i then :n3
      else :ntriples
      end
    end

This could instead be found by looping through available Format subclasses and looking for a #match method. Within RDFXML::Format, I could perform the following:

    class Format < RDF::Format
      # RDF/XML root element, optionally namespace-prefixed
      MATCH = %r(<(\w+:)?RDF)

      content_type 'application/rdf+xml', :extension => :rdf
      content_encoding 'utf-8'

      reader { RDF::RDFXML::Reader }
      writer { RDF::RDFXML::Writer }

      def match(content)
        content.to_s.match(MATCH)
      end
    end

In RDF::Reader.open, first look for a reader using the options. Failing that, open the file and look for a MIME type; failing that, loop through Format instances and see if the Format matches the string content. In most cases, this will do what the user expects.

Gregg

On Jun 30, 2010, at 2:03 AM, Hellekin O.
Wolf wrote:

Hi,

I was looking into supporting more formats for FOAFSSL-ruby, including the recently released rdf-rdfa and rdf-n3 gems. But what I found looks like hell:

- there doesn't seem to be a reliable way of discovering the FOAF file format,
- different formats will fail with different errors,
- when no format is given, RDF::Graph won't detect the right one (and gives unpredictable results).

The original way of doing it in FOAFSSL-ruby is to try it, and fall back to a different format on failure. It works, but it's so ugly my grand-mother died.

When I tried to add new formats, I had to find another solution. I went for the following (ugly) algorithm (now, my grand-mother is already dead):

1. look up the file extension in the given WebID
2. look up the Content-Type after an HTTP HEAD to the WebID
3. GET the file and identify it from its contents
4. fail if the format isn't known by now.

That gives a pretty good image of a house of cards, if any.

Any idea how to deal properly with auto-discovery of formats?

==
hk
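
For illustration, the four-step lookup described above (extension, then Content-Type, then content sniffing, then failure) could be sketched in plain Ruby roughly as follows. This is only a sketch under stated assumptions, not the FOAFSSL-ruby or RDF.rb implementation: discover_format, EXTENSIONS, CONTENT_TYPES and SNIFFERS are hypothetical names, the lookup tables are deliberately incomplete, and the caller is assumed to have already done the HTTP HEAD/GET and to pass in the header and body.

```ruby
require "uri"

# Hypothetical lookup tables; the format symbols mirror those used in this thread.
EXTENSIONS = {
  ".rdf" => :rdfxml, ".html" => :rdfa, ".n3" => :n3, ".ttl" => :n3, ".nt" => :ntriples
}.freeze

CONTENT_TYPES = {
  "application/rdf+xml"   => :rdfxml,
  "text/html"             => :rdfa,
  "application/xhtml+xml" => :rdfa,
  "text/n3"               => :n3,
  "text/turtle"           => :n3
}.freeze

# Content patterns, taken from the detect_format heuristic in this thread.
SNIFFERS = {
  rdfxml: /<(\w+:)?RDF/,
  rdfa:   /<(\w+:)?html/i,
  n3:     /@prefix/i
}.freeze

# Hypothetical helper: returns a format symbol or raises ArgumentError.
def discover_format(uri, content_type: nil, body: nil)
  # 1. File extension of the URI path (any fragment such as #me is ignored).
  ext = File.extname(URI.parse(uri).path.to_s).downcase
  return EXTENSIONS[ext] if EXTENSIONS.key?(ext)

  # 2. Content-Type header, ignoring parameters such as charset.
  if content_type
    mime = content_type.split(";").first.to_s.strip.downcase
    return CONTENT_TYPES[mime] if CONTENT_TYPES.key?(mime)
  end

  # 3. Sniff the first 1000 characters of the content.
  if body
    head = body.to_s[0, 1000]
    SNIFFERS.each { |format, pattern| return format if head =~ pattern }
  end

  # 4. Nothing matched: fail explicitly instead of guessing.
  raise ArgumentError, "unknown format for #{uri}"
end
```

Unlike the detect_format heuristic, this sketch has no :ntriples default, so step 4 can actually be reached.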
Received on Wednesday, 30 June 2010 17:22:12 UTC