- From: Dan Scott <denials@gmail.com>
- Date: Fri, 16 Sep 2016 13:09:51 -0400
- To: Nicolas Torzec <torzecn@yahoo-inc.com>
- Cc: Tim Strehle <tim@strehle.de>, Elias Kaerle <elias.kaerle@sti2.at>, "public-schemaorg@w3.org" <public-schemaorg@w3.org>
- Message-ID: <CAAY5AM3zsWJ4KMxoQo1tO62Kke_t4m0B10m=bqhgsO-buNTpyg@mail.gmail.com>
I had trouble using most of the available out-of-the-box scripts that I
found late last year (Any23 was failing to parse relatively common
in-the-wild HTML, for example), so I put together the "crawl" script
contained in https://github.com/dbs/dialled-ca for my own, currently very
specific, purpose of checking library homepages for linked open data.

It relies on rdflib and rdflib-jsonld to extract RDFa, Microdata, and
JSON-LD from a list of target pages and stores the results in a single
Turtle file as linked data. On reflection, it could be modified relatively
easily to be much more generic. I suspect many of the "stupid hacks" that
I've built into the script would be needed by anything trying to deal with
the real wild web.

Note that I also found rdflib wanting in some situations (and contributed a
few branches upstream to try to fix it up, most of which have been
accepted, yay!), so I currently run this script on a local branch of
rdflib. For both Any23 and rdflib, though, a subsequent stable release
might have resolved many of those problems.

On Fri, Sep 16, 2016 at 11:46 AM, Nicolas Torzec <torzecn@yahoo-inc.com>
wrote:

> Also have a look at Apache Any23 if the task is simply to fetch select
> URLs and extract semantic markup: https://any23.apache.org/index.html
>
> -N.
>
> On Friday, September 16, 2016 8:20 AM, Tim Strehle <tim@strehle.de> wrote:
>
> Hi Elias,
>
> you’re probably not into PHP, but just for completeness – the EasyRDF PHP
> library http://www.easyrdf.org/ is also pretty capable.
>
> You can try its converter demo here, which reads structured data from a
> Web site and turns it into JSON-LD, RDF or even a graph rendered in PNG or
> SVG:
>
> http://www.easyrdf.org/converter
>
> Kind regards,
> Tim
>
> https://www.strehle.de/tim/ – And yes, there’s structured data hidden on
> my home page :)
>
> On 16.09.2016 at 15:58, Elias Kaerle <elias.kaerle@sti2.at> wrote:
> >
> > Hi,
> >
> > I was looking for software to parse structured data in a website -
> > preferably JSON-LD.
> > I found Python libraries for RDF to JSON-LD and vice versa
> > (https://github.com/RDFLib/rdflib-jsonld), or NodeJS and Ruby libraries
> > for JSON-LD operations but that's not exactly what I am looking for.
> >
> > The software I was thinking about takes the URL of any website as an
> > input and returns, if found, JSON as an output.
> >
> > Are you aware of something like that?
> >
> > Thanks, best,
> > Elias
> >
> > --
> > Elias Kärle, MSc
> > Semantic Technology Institute
> > University of Innsbruck
> >
> > ICT - Technologie Park Innsbruck
> > 2nd Floor, Room 3S02
> > Technikerstrasse, 21a
> > 6020 Innsbruck
> > Austria
> >
> > Tel.: (+43) 512 507 53738
> > Skype: elias.kaerle
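
P.S. For what it's worth, the "URL in, JSON out" tool asked about at the
bottom of the thread can be roughly sketched with nothing but the Python
standard library. This is not the crawl script from dialled-ca: it only
picks up JSON-LD embedded in script elements (no RDFa, no Microdata), it
assumes reasonably well-formed HTML, and the names JSONLDScriptCollector,
extract_jsonld and extract_jsonld_from_url are made up for illustration.

```python
import json
from html.parser import HTMLParser
from urllib.request import urlopen


class JSONLDScriptCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            # The type attribute may be missing, empty, or carry parameters.
            raw_type = dict(attrs).get("type") or ""
            script_type = raw_type.split(";")[0].strip().lower()
            self._in_jsonld = script_type == "application/ld+json"
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._chunks))
            self._in_jsonld = False


def extract_jsonld(html):
    """Return a list of parsed JSON-LD documents found in an HTML string."""
    collector = JSONLDScriptCollector()
    collector.feed(html)
    collector.close()
    documents = []
    for block in collector.blocks:
        try:
            documents.append(json.loads(block))
        except ValueError:
            # Real-world pages often contain broken JSON; skip those blocks.
            continue
    return documents


def extract_jsonld_from_url(url, timeout=10):
    """Fetch a page and return its embedded JSON-LD documents (may be empty)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_jsonld(html)
```

Calling extract_jsonld_from_url() with a page URL returns a (possibly
empty) list of parsed JSON objects. To end up with Turtle as described
above, each object could presumably be re-serialized with json.dumps() and
handed to an rdflib Graph via parse(data=..., format="json-ld"), assuming
an rdflib installation with JSON-LD support.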
Received on Friday, 16 September 2016 17:10:28 UTC