- From: Stephane Corlosquet <scorlosquet@gmail.com>
- Date: Wed, 27 Jan 2010 21:25:59 -0500
- To: Hugh Glaser <hg@ecs.soton.ac.uk>
- Cc: "public-lod@w3.org" <public-lod@w3.org>
- Message-ID: <1452bf811001271825i317f13a4h720b3fcf393271ca@mail.gmail.com>
Hugh,

The ARC2 parser has a "built-in RDF format detector" [1]. You might want to
look at the code to see how it's done.
Why not use the --guess option of rapper?

Steph.

[1] http://arc.semsol.org/docs/v2/parsing

On Wed, Jan 27, 2010 at 9:08 PM, Hugh Glaser <hg@ecs.soton.ac.uk> wrote:

> On 27/01/2010 09:49, "Tom Heath" <tom.heath@talis.com> wrote:
>
> > +1 for Moriarty, whether you're working with the Platform or not. Ian
> > and the other contributors have done a great job - personally I'd
> > start here before writing any new code.
> Too true mate.
>
> Now my next bit of pissing about.
> Before writing it (if I can find the gumption).
> Don't think this is in Moriarty, as the Talis Platform is, of course,
> well-behaved.
>
> I run cURL, using an amended version of what was described before (as at
> the end of this message).
>
> So now I need to deal with what comes back.
> I actually hand it over to rapper, so would sort of like to know what the
> data is to improve the reliability by setting the rapper type parameter.
> I am trying to avoid looking inside the file, although am happy to if
> someone can provide the code :-).
> The Content-Type is unreliable – for example could (is likely to) be
> text/plain for a turtle file that someone has put on a standard web server.
> So it is the usual problem of messing about with extensions, modified by
> extra information from the Content-Type.
> Of course we need to worry about the final URL (curl_getinfo($ch)['url']),
> possibly as well as the requesting URI, as that might be where there is an
> extension.
> So perhaps something that sets the Content-Type in curl_getinfo($ch) as
> best it can?
>
> Any offers? (Pretty please!)
> And maybe we can feed back to Moriarty, PEAR, etc, unless already there and
> I missed it.
>
> On another worry, if the requesting URI does a 302 to a new URI, which then
> does 303, it looks an interesting challenge to capture the new URI as
> expected.
> I don’t intend to do this at the moment, but if anyone has done
> that, ...
>
> Enjoy.
> Hugh
>
> PHP much preferred.
>
> Fetching code:
> $ch = curl_init();
> curl_setopt($ch, CURLOPT_URL, $_REQUEST['uri']);
> curl_setopt($ch, CURLOPT_USERAGENT, "http://void.rkbexplorer.com/submission agent 1.0");
> curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
> curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
> curl_setopt($ch, CURLOPT_HTTPHEADER, array("Accept: application/rdf+xml, text/n3, text/rdf+n3, text/turtle, application/x-turtle, application/turtle, text/plain"));
> $data = curl_exec($ch);
> $info = curl_getinfo($ch);
> curl_close($ch);
>
> >
> > My 2p worth :)
> >
> > Tom.
> >
> >
> > 2010/1/26 Ian Davis <lists@iandavis.com>:
> >> You may find something useful in my Moriarty project:
> >>
> >> http://code.google.com/p/moriarty/
> >>
> >> It's geared towards the Talis Platform but there is a lot of code in
> >> there that has no dependencies on the platform, e.g.:
> >>
> >> http://code.google.com/p/moriarty/source/browse/trunk/httprequest.class.php
> >>
> >> some documentation for that class here:
> >>
> >> http://code.google.com/p/moriarty/wiki/HttpRequest
> >>
> >> Ian
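
For what it's worth, the "Content-Type first, then extensions" idea in Hugh's
message could be sketched roughly as below. This is only an illustration, not
code from Moriarty or ARC2: the function name guess_rapper_parser and both
lookup tables are hypothetical, and sending N3 to rapper's turtle parser is an
assumption (as far as I know that parser handles only part of N3). When
nothing matches, it returns "guess", which hands the decision to rapper's own
detection, as Steph suggests.

```php
<?php
// Hypothetical helper: pick a rapper input parser name from the response
// Content-Type and, failing that, from the extension of one of the URLs.
// The two tables are illustrative assumptions, not a complete registry.
function guess_rapper_parser($contentType, array $urls)
{
    // Media types specific enough to trust as they stand.
    $byType = array(
        'application/rdf+xml'  => 'rdfxml',
        'text/turtle'          => 'turtle',
        'application/x-turtle' => 'turtle',
        'application/turtle'   => 'turtle',
        'text/n3'              => 'turtle', // assumption: N3 via the turtle parser
        'text/rdf+n3'          => 'turtle', // assumption: N3 via the turtle parser
    );

    // File extensions to fall back on when the type is generic.
    $byExtension = array(
        'rdf' => 'rdfxml',
        'owl' => 'rdfxml',
        'ttl' => 'turtle',
        'n3'  => 'turtle',
        'nt'  => 'ntriples',
    );

    // Drop parameters such as "; charset=utf-8" and normalise case.
    $parts = explode(';', (string) $contentType);
    $type = strtolower(trim($parts[0]));
    if (isset($byType[$type])) {
        return $byType[$type];
    }

    // Generic types (text/plain and friends) say nothing useful, so look at
    // the URL paths in the order given, e.g. final URL, then requested URI.
    foreach ($urls as $url) {
        $path = parse_url($url, PHP_URL_PATH);
        if (!is_string($path)) {
            continue; // no path component, or an unparseable URL
        }
        $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if (isset($byExtension[$ext])) {
            return $byExtension[$ext];
        }
    }

    // Nothing conclusive: let rapper inspect the content itself.
    return 'guess';
}
```

With the fetching code above, the call would be something like
guess_rapper_parser($info['content_type'], array($info['url'], $_REQUEST['uri']))
and the result passed to rapper's -i option. Checking the final URL before the
requested URI is a design choice, not a requirement; swap the order if the
extension is more likely to be on the original URI.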
Received on Thursday, 28 January 2010 02:27:16 UTC