
Re: PHP RDF fetching code

From: Mischa Tuffield <mmt04r@ecs.soton.ac.uk>
Date: Thu, 28 Jan 2010 12:37:11 +0000
Cc: "public-lod@w3.org community" <public-lod@w3.org>
Message-ID: <EMEW3|a681a001432da4d5ed16a8e7e0e22dd6m0RCbb06mmt04r|ecs.soton.ac.uk|C82343F9-8C85-40F7-8A4D-FCBDE8A0A0B1@ecs.soton.ac.uk>
To: Hugh Glaser <hg@ecs.soton.ac.uk>
Hi Hugh, 

There is a "trace" option in rapper. 

You can do something like:


jambi:~ mmt$ rapper --trace --guess http://mmt.me.uk/foaf.rdf > lame.nt
rapper: Parsing URI http://mmt.me.uk/foaf.rdf with parser guess
rapper: Serializing with serializer ntriples
rapper: Processing URI http://mmt.me.uk/foaf.rdf
rapper: Guessed parser name 'rdfxml'
rapper: Parsing returned 298 triples

This shows you what the --guess option guessed.
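
If you want to collect that guess for your statistics, something along these lines might work as a rough sketch. It assumes rapper writes these messages to stderr (as the session above suggests, since they still appear although stdout is redirected to lame.nt) and that the wording of the "Guessed parser name" line stays as shown. The function name guessed_parser is made up for illustration.

```shell
# Hypothetical helper: read rapper's trace messages on stdin and print
# only the parser name from the "Guessed parser name '...'" line.
# Prints nothing if no such line is present.
guessed_parser() {
    sed -n "s/^rapper: Guessed parser name '\([^']*\)'.*/\1/p"
}

# Intended use (not run here): send stderr down the pipe, triples to a file.
#   rapper --trace --guess http://mmt.me.uk/foaf.rdf 2>&1 >lame.nt | guessed_parser
```

The order of the redirections matters: 2>&1 comes first so that the messages go into the pipe, and only then is stdout pointed at the file.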

Mischa


On 28 Jan 2010, at 12:26, Hugh Glaser wrote:

> Thanks for the pointer.
> (Won’t actually look at the ARC code at the moment, as it may be hard to comply with Benji’s license.)
> 
> However, rather than being as clever as possible, somehow I thought I should respect what the publisher said, so perhaps first Content-Type, then extension, rather than ignoring them.
> 
> The reason I wasn’t relying on rapper --guess is that the handover to rapper is part of the RDF store, and I will probably use other stores that don’t use rapper.
> Also, I wanted to gather statistics on what RDF format people were using, and couldn’t see an option to rapper to tell me the input type that it guessed.
> 
> At the moment I record the Content-Type and the extension, and then let rapper or whatever do their magic – I guess that is enough.
> 
> Cheers
> Hugh
> 
> On 28/01/2010 02:25, "Stephane Corlosquet" <scorlosquet@gmail.com> wrote:
> 
> Hugh,
> 
> The ARC2 parser has a "built-in RDF format detector" [1]. You might want to look at the code to see how it's done.
> 
> Why not use the --guess option of rapper?
> 
> Steph.
> 
> [1] http://arc.semsol.org/docs/v2/parsing
> 
> On Wed, Jan 27, 2010 at 9:08 PM, Hugh Glaser <hg@ecs.soton.ac.uk> wrote:
> On 27/01/2010 09:49, "Tom Heath" <tom.heath@talis.com> wrote:
> 
>> +1 for Moriarty, whether you're working with the Platform or not. Ian
>> and the other contributors have done a great job - personally I'd
>> start here before writing any new code.
> Too true mate.
> 
> Now my next bit of pissing about.
> Before writing it (if I can find the gumption).
> Don't think this is in Moriarty, as the Talis Platform is, of course, well-behaved.
> 
> I run cURL, using an amended version of what was described before (as at the end of this message).
> 
> So now I need to deal with what comes back.
> I actually hand it over to rapper, so I would like to know what format the data is in, to improve reliability by setting the rapper type parameter.
> I am trying to avoid looking inside the file, although am happy to if someone can provide the code :-).
> The Content-Type is unreliable: for example, it could be (and is likely to be) text/plain for a Turtle file that someone has put on a standard web server.
> So it is the usual problem of messing about with extensions, modified by extra information from the Content-Type.
> Of course we need to worry about the final URL (curl_getinfo($ch)['url']), possibly as well as the requesting URI, as that might be where there is an extension.
> So perhaps something that sets the Content-Type in curl_getinfo($ch) as best it can?
> 
> Any offers? (Pretty please!)
> And maybe we can feed back to Moriarty, PEAR, etc, unless already there and I missed it.
> 
> On another worry: if the requesting URI does a 302 to a new URI, which then does a 303, it looks like an interesting challenge to capture the new URI as expected. I don’t intend to do this at the moment, but if anyone has done that, ...
> 
> Enjoy.
> Hugh
> 
> PHP much preferred.
> 
> Fetching code:
> // Fetch the submitted URI, following redirects and asking for RDF formats.
> $ch = curl_init();
> curl_setopt($ch, CURLOPT_URL, $_REQUEST['uri']);
> curl_setopt($ch, CURLOPT_USERAGENT, "http://void.rkbexplorer.com/ submission agent 1.0");
> curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
> curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow 3xx redirects
> curl_setopt($ch, CURLOPT_HTTPHEADER, array("Accept: application/rdf+xml, text/n3, text/rdf+n3, text/turtle, application/x-turtle, application/turtle, text/plain"));
> $data = curl_exec($ch);    // response body, or false on failure
> $info = curl_getinfo($ch); // includes 'url' (final URL) and 'content_type'
> curl_close($ch);
> 
>> 
>> My 2p worth :)
>> 
>> Tom.
>> 
>> 
>> 2010/1/26 Ian Davis <lists@iandavis.com>:
>>> You may find something useful in my Moriarty project:
>>> 
>>> http://code.google.com/p/moriarty/
>>> 
>>> It's geared towards the Talis Platform but there is a lot of code in
>>> there that has no dependencies on the platform, e.g.:
>>> 
>>> http://code.google.com/p/moriarty/source/browse/trunk/httprequest.class.php
>>> 
>>> some documentation for that class here:
>>> 
>>> http://code.google.com/p/moriarty/wiki/HttpRequest
>>> 
>>> Ian
>>> 
>>> 
>>> 
>> 
>> 
> 
> 
> 
> 

_________________________________
Mischa Tuffield
ECS - http://www.ecs.soton.ac.uk/
Homepage - http://users.ecs.soton.ac.uk/mmt04r/
Identity - http://id.ecs.soton.ac.uk/person/6914
WebID - http://mmt.me.uk/foaf.rdf#mischa
Received on Thursday, 28 January 2010 12:38:46 UTC
