
Re: PHP RDF fetching code

From: Stephane Corlosquet <scorlosquet@gmail.com>
Date: Wed, 27 Jan 2010 21:25:59 -0500
Message-ID: <1452bf811001271825i317f13a4h720b3fcf393271ca@mail.gmail.com>
To: Hugh Glaser <hg@ecs.soton.ac.uk>
Cc: "public-lod@w3.org" <public-lod@w3.org>
Hugh,

The ARC2 parser has a "built-in RDF format detector" [1]. You might want to
look at the code to see how it's done.

Why not use the --guess option of rapper?
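
If you would still rather pass rapper an explicit type, here is a rough sketch of the "extensions, modified by the Content-Type" idea from your message. The function name guess_rdf_format, both lookup tables, and the choice to treat N3 as Turtle are my own assumptions, not something taken from ARC2, Moriarty or rapper. It trusts the Content-Type only when it is a specific RDF media type, then falls back to the file extension of each URL in turn, and returns null when nothing matches (which is when --guess seems the sensible fallback).

```php
<?php
// Hypothetical helper: not part of ARC2, Moriarty or rapper.
// Returns a rapper input format name ('rdfxml', 'turtle', 'ntriples')
// or null when neither the Content-Type nor any URL extension helps.
function guess_rdf_format($contentType, array $urls)
{
    // Media types specific enough to trust (illustrative, not exhaustive).
    $byType = array(
        'application/rdf+xml'  => 'rdfxml',
        'text/turtle'          => 'turtle',
        'application/x-turtle' => 'turtle',
        'application/turtle'   => 'turtle',
        'text/n3'              => 'turtle', // assumption: parse N3 as Turtle
        'text/rdf+n3'          => 'turtle', // assumption: parse N3 as Turtle
    );
    // File extensions to fall back on (also illustrative).
    $byExt = array(
        'rdf' => 'rdfxml',
        'ttl' => 'turtle',
        'n3'  => 'turtle', // assumption, as above
        'nt'  => 'ntriples',
    );

    // Drop parameters such as "; charset=utf-8" before comparing.
    $parts = explode(';', (string) $contentType);
    $type = strtolower(trim($parts[0]));
    if (isset($byType[$type])) {
        return $byType[$type];
    }

    // text/plain and friends say nothing useful, so look at the URLs,
    // in the order given (e.g. final URL first, then the requested URI).
    foreach ($urls as $url) {
        $path = parse_url($url, PHP_URL_PATH);
        if (!is_string($path)) {
            continue; // no path component, or a malformed URL
        }
        $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if (isset($byExt[$ext])) {
            return $byExt[$ext];
        }
    }
    return null;
}
```

With your fetching code this would be called as guess_rdf_format($info['content_type'], array($info['url'], $_REQUEST['uri'])), passing the result to rapper's -i option when it is not null.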

Steph.

[1] http://arc.semsol.org/docs/v2/parsing
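
P.S. On the 302-then-303 question: one possible approach, which I have not tried against your service, is to keep CURLOPT_FOLLOWLOCATION on, also set CURLOPT_HEADER to true, and then read the intermediate hops out of the header blocks that cURL prepends to the body (curl_getinfo($ch, CURLINFO_HEADER_SIZE) gives the length of all those header blocks together). The function name parse_redirect_chain and the shape of its return value are made up for illustration.

```php
<?php
// Hypothetical helper: splits the concatenated response headers that cURL
// returns when CURLOPT_HEADER and CURLOPT_FOLLOWLOCATION are both enabled,
// and reports the status code and Location header of every hop.
function parse_redirect_chain($rawHeaders)
{
    $hops = array();
    // Each response's header block is terminated by a blank line.
    $blocks = preg_split('/\r?\n\r?\n/', trim($rawHeaders));
    foreach ($blocks as $block) {
        // A block must start with a status line such as "HTTP/1.1 302 Found".
        if (!preg_match('#^HTTP/\S+\s+(\d{3})#', $block, $m)) {
            continue;
        }
        $location = null;
        if (preg_match('/^Location:\s*(.+?)\s*$/mi', $block, $l)) {
            $location = $l[1];
        }
        $hops[] = array('status' => (int) $m[1], 'location' => $location);
    }
    return $hops;
}
```

After curl_exec, the input would be substr($data, 0, curl_getinfo($ch, CURLINFO_HEADER_SIZE)), and the rest of $data is the body. The Location of the 302 hop is then the URI that went on to answer with the 303.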

On Wed, Jan 27, 2010 at 9:08 PM, Hugh Glaser <hg@ecs.soton.ac.uk> wrote:

> On 27/01/2010 09:49, "Tom Heath" <tom.heath@talis.com> wrote:
>
> > +1 for Moriarty, whether you're working with the Platform or not. Ian
> > and the other contributors have done a great job - personally I'd
> > start here before writing any new code.
> Too true mate.
>
> Now my next bit of pissing about.
> Before writing it (if I can find the gumption).
> Don't think this is in Moriarty, as the Talis Platform is, of course,
> well-behaved.
>
> I run cURL, using an amended version of what was described before (as at
> the end of this message).
>
> So now I need to deal with what comes back.
> I actually hand it over to rapper, so would sort of like to know what the
> data is to improve the reliability by setting the rapper type parameter.
> I am trying to avoid looking inside the file, although am happy to if
> someone can provide the code :-).
> The Content-Type is unreliable - for example could (is likely to) be
> text/plain for a turtle file that someone has put on a standard web server.
> So it is the usual problem of messing about with extensions, modified by
> extra information from the Content-Type.
> Of course we need to worry about the final URL (curl_getinfo($ch)['url']),
> possibly as well as the requesting URI, as that might be where there is an
> extension.
> So perhaps something that sets the Content-Type in curl_getinfo($ch) as
> best it can?
>
> Any offers? (Pretty please!)
> And maybe we can feed back to Moriarty, PEAR, etc, unless already there and
> I missed it.
>
> On another worry, if the requesting URI does a 302 to a new URI, which then
> does a 303, it looks an interesting challenge to capture the new URI as
> expected. I don't intend to do this at the moment, but if anyone has done
> that, ...
>
> Enjoy.
> Hugh
>
> PHP much preferred.
>
> Fetching code:
> $ch = curl_init();
> curl_setopt($ch, CURLOPT_URL, $_REQUEST['uri']);
> curl_setopt($ch, CURLOPT_USERAGENT, "http://void.rkbexplorer.com/submission agent 1.0");
> curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
> curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
> curl_setopt($ch, CURLOPT_HTTPHEADER, array("Accept: application/rdf+xml, "
>     . "text/n3, text/rdf+n3, text/turtle, application/x-turtle, "
>     . "application/turtle, text/plain"));
> $data = curl_exec($ch);
> $info = curl_getinfo($ch);
> curl_close($ch);
>
> >
> > My 2p worth :)
> >
> > Tom.
> >
> >
> > 2010/1/26 Ian Davis <lists@iandavis.com>:
> >> You may find something useful in my Moriarty project:
> >>
> >> http://code.google.com/p/moriarty/
> >>
> >> It's geared towards the Talis Platform but there is a lot of code in
> >> there that has no dependencies on the platform, e.g.:
> >>
> >>
> http://code.google.com/p/moriarty/source/browse/trunk/httprequest.class.php
> >>
> >> some documentation for that class here:
> >>
> >> http://code.google.com/p/moriarty/wiki/HttpRequest
> >>
> >> Ian
> >>
> >>
> >>
> >
> >
>
>
Received on Thursday, 28 January 2010 02:27:16 UTC
