- From: Mischa Tuffield <mmt04r@ecs.soton.ac.uk>
- Date: Thu, 28 Jan 2010 12:37:11 +0000
- To: Hugh Glaser <hg@ecs.soton.ac.uk>
- Cc: "public-lod@w3.org community" <public-lod@w3.org>
Hi Hugh,

There is a "trace" option in rapper. You can do something like:

jambi:~ mmt$ rapper --trace --guess http://mmt.me.uk/foaf.rdf > lame.nt
rapper: Parsing URI http://mmt.me.uk/foaf.rdf with parser guess
rapper: Serializing with serializer ntriples
rapper: Processing URI http://mmt.me.uk/foaf.rdf
rapper: Guessed parser name 'rdfxml'
rapper: Parsing returned 298 triples

This shows you what the --guess option guessed.

Mischa

On 28 Jan 2010, at 12:26, Hugh Glaser wrote:

> Thanks for the pointer.
> (Won’t actually look at the ARC code at the moment, as it may be hard to comply with Benji’s license.)
>
> However, rather than being as clever as possible, somehow I thought I should respect what the publisher said, so perhaps first Content-Type, then extension, rather than ignoring them.
>
> The reason I wasn’t relying on rapper --guess is that the handover to rapper is part of the RDF store, and I will probably use other stores that don’t use rapper.
> Also, I wanted to gather statistics on what RDF format people were using, and couldn’t see an option to rapper to tell me the input type that it guessed.
>
> At the moment I record the Content-Type and the extension, and then let rapper or whatever do their magic – I guess that is enough.
>
> Cheers
> Hugh
>
> On 28/01/2010 02:25, "Stephane Corlosquet" <scorlosquet@gmail.com> wrote:
>
> Hugh,
>
> The ARC2 parser has a "built-in RDF format detector" [1]. You might want to look at the code to see how it's done.
>
> Why not using the --guess option of rapper?
>
> Steph.
>
> [1] http://arc.semsol.org/docs/v2/parsing
>
> On Wed, Jan 27, 2010 at 9:08 PM, Hugh Glaser <hg@ecs.soton.ac.uk> wrote:
> On 27/01/2010 09:49, "Tom Heath" <tom.heath@talis.com> wrote:
>
>> +1 for Moriarty, whether you're working with the Platform or not. Ian
>> and the other contributors have done a great job - personally I'd
>> start here before writing any new code.
> Too true mate.
>
> Now my next bit of pissing about.
> Before writing it (if I can find the gumption).
> Don't think this is in Moriarty, as the Talis Platform is, of course, well-behaved.
>
> I run cURL, using an amended version of what was described before (as at the end of this message).
>
> So now I need to deal with what comes back.
> I actually hand it over to rapper, so would sort of like to know what the data is to improve the reliability by setting the rapper type parameter.
> I am trying to avoid looking inside the file, although am happy to if someone can provide the code :-).
> The Content-Type is unreliable – for example could (is likely to) be text/plain for a turtle file that someone has put on a standard web server.
> So it is the usual problem of messing about with extensions, modified by extra information from the Content-Type.
> Of course we need to worry about the final URL (curl_getinfo($ch)['url']), possibly as well as the requesting URI, as that might be where there is an extension.
> So perhaps something that sets the Content-Type in curl_getinfo($ch) as best it can?
>
> Any offers? (Pretty please!)
> And maybe we can feed back to Moriarty, PEAR, etc, unless already there and I missed it.
>
> On another worry, If the requesting URI does a 302 to a new URI, which then does 303, it looks an interesting challenge to capture the new URI as expected. I don’t intend to do this at the moment, but if anyone has done that, ...
>
> Enjoy.
> Hugh
>
> PHP much preferred.
>
> Fetching code:
> $ch = curl_init();
> curl_setopt($ch, CURLOPT_URL, $_REQUEST['uri']);
> curl_setopt($ch, CURLOPT_USERAGENT, "http://void.rkbexplorer.com/ submission agent 1.0");
> curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
> curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
> curl_setopt($ch, CURLOPT_HTTPHEADER, array("Accept: application/rdf+xml, text/n3, text/rdf+n3, text/turtle, application/x-turtle, application/turtle, text/plain"));
> $data = curl_exec($ch);
> $info = curl_getinfo($ch);
> curl_close($ch);
>
>>
>> My 2p worth :)
>>
>> Tom.
>>
>>
>> 2010/1/26 Ian Davis <lists@iandavis.com>:
>>> You may find something useful in my Moriarty project:
>>>
>>> http://code.google.com/p/moriarty/
>>>
>>> It's geared towards the Talis Platform but there is a lot of code in
>>> there that has no dependencies on the platform, e.g.:
>>>
>>> http://code.google.com/p/moriarty/source/browse/trunk/httprequest.class.php
>>>
>>> some documentation for that class here:
>>>
>>> http://code.google.com/p/moriarty/wiki/HttpRequest
>>>
>>> Ian
>>>
>>>
>>> ______________________________________________________________________
>>> This email has been scanned by the MessageLabs Email Security System.
>>> For more information please visit http://www.messagelabs.com/email
>>> ______________________________________________________________________
>>>
>>
>>
>
>
>
>

_________________________________
Mischa Tuffield
ECS - http://www.ecs.soton.ac.uk/
Homepage - http://users.ecs.soton.ac.uk/mmt04r/
Identity - http://id.ecs.soton.ac.uk/person/6914
WebID - http://mmt.me.uk/foaf.rdf#mischa
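
For what it is worth, the "first Content-Type, then extension" idea from the quoted thread could be sketched in PHP roughly as below. This is only an illustration under stated assumptions: guess_rdf_format() is a hypothetical helper (it is not part of rapper, ARC or Moriarty), the media-type and extension tables are my own guesses at a sensible mapping onto rapper parser names, and treating N3 as Turtle is an approximation. text/plain is deliberately not mapped, since it is ambiguous, so it falls through to the extension check.

```php
<?php
// Hypothetical helper: pick a rapper parser name from what the publisher said.
// Order: Content-Type first, then the extension of the final URL, then the
// extension of the requested URI. Falls back to 'guess' (rapper's own detector).
function guess_rdf_format(?string $contentType, string $finalUrl, string $requestedUri = ''): string
{
    // Assumed mapping from media types to rapper parser names.
    $byType = [
        'application/rdf+xml'  => 'rdfxml',
        'text/turtle'          => 'turtle',
        'application/x-turtle' => 'turtle',
        'application/turtle'   => 'turtle',
        'text/n3'              => 'turtle', // assumption: N3 treated as Turtle
        'text/rdf+n3'          => 'turtle', // assumption: N3 treated as Turtle
    ];
    // Assumed mapping from file extensions to rapper parser names.
    $byExt = [
        'rdf' => 'rdfxml',
        'owl' => 'rdfxml',
        'ttl' => 'turtle',
        'n3'  => 'turtle',
        'nt'  => 'ntriples',
    ];

    // 1. Content-Type, with parameters such as "; charset=utf-8" stripped.
    $type = strtolower(trim(explode(';', (string) $contentType)[0]));
    if (isset($byType[$type])) {
        return $byType[$type];
    }

    // 2. Extension of the final URL (after redirects), then of the requested URI.
    foreach ([$finalUrl, $requestedUri] as $url) {
        $path = (string) parse_url($url, PHP_URL_PATH);
        $ext  = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if (isset($byExt[$ext])) {
            return $byExt[$ext];
        }
    }

    // 3. Nothing conclusive: let the parser guess.
    return 'guess';
}
```

With the fetching code quoted above, it might be called as guess_rdf_format($info['content_type'] ?: null, $info['url'], $_REQUEST['uri']) and the result passed to rapper as its input type; recording the return value alongside the raw Content-Type would also give the format statistics mentioned in the thread.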
Received on Thursday, 28 January 2010 12:38:46 UTC