
Re: sdo Software

From: Dan Scott <denials@gmail.com>
Date: Fri, 16 Sep 2016 13:09:51 -0400
Message-ID: <CAAY5AM3zsWJ4KMxoQo1tO62Kke_t4m0B10m=bqhgsO-buNTpyg@mail.gmail.com>
To: Nicolas Torzec <torzecn@yahoo-inc.com>
Cc: Tim Strehle <tim@strehle.de>, Elias Kaerle <elias.kaerle@sti2.at>, "public-schemaorg@w3.org" <public-schemaorg@w3.org>
I had trouble with most of the out-of-the-box scripts that I found late
last year (Any23, for example, was failing to parse relatively common
in-the-wild HTML), so I put together the "crawl" script contained in
https://github.com/dbs/dialled-ca for my own, currently very specific,
purpose of checking library homepages for linked open data. It relies on
rdflib and rdflib-jsonld to extract RDFa, Microdata, and JSON-LD from a
list of target pages and stores the results in a single Turtle file as
linked data. On reflection, it could fairly easily be modified to be much
more generic. I suspect many of the "stupid hacks" that I've built into
the script would be needed by anything trying to deal with the real wild
web.
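
As a rough illustration of the kind of extraction described above (and of
the "URL in, JSON out" tool asked about in the original question quoted
below), here is a minimal Python sketch. It is not the crawl script from
dialled-ca and does not use rdflib: it relies only on the standard library,
handles only JSON-LD (not RDFa or Microdata), and the names
JSONLDExtractor, extract_jsonld, and extract_jsonld_from_url are
hypothetical, chosen for this example. The tolerance for odd type
attributes and broken JSON is an assumption about what real pages need.

```python
import json
from html.parser import HTMLParser
from urllib.request import urlopen


class JSONLDExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []  # successfully parsed JSON-LD documents
        self.errors = []  # raw text of blocks that were not valid JSON

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            type_attr = dict(attrs).get("type") or ""
            # Pages in the wild vary the case and may append parameters
            # such as "; charset=utf-8", so normalise before comparing.
            mime = type_attr.split(";")[0].strip().lower()
            if mime == "application/ld+json":
                self._in_jsonld = True
                self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so accumulate it.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            raw = "".join(self._buffer).strip()
            try:
                self.blocks.append(json.loads(raw))
            except ValueError:
                # Keep the raw text instead of aborting the whole page.
                self.errors.append(raw)


def extract_jsonld(html):
    """Return (parsed_blocks, unparseable_blocks) for an HTML string."""
    parser = JSONLDExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks, parser.errors


def extract_jsonld_from_url(url, timeout=10):
    """Fetch a page and extract its JSON-LD; undecodable bytes are replaced."""
    with urlopen(url, timeout=timeout) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_jsonld(html)
```

Calling extract_jsonld_from_url with a page URL returns two lists: the
JSON-LD documents that parsed, and the raw text of any that did not, so a
single malformed block does not hide the rest of a page's markup.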

Note that I also found rdflib wanting in some situations and contributed a
few branches upstream to try to fix it up (most of which have been
accepted, yay!), so I currently run this script on a local branch of
rdflib. For both Any23 and rdflib, though, a subsequent stable release
might have resolved many of those problems.

On Fri, Sep 16, 2016 at 11:46 AM, Nicolas Torzec <torzecn@yahoo-inc.com>
wrote:

> Also have a look at Apache Any23 if the task is simply to fetch select
> URLs and extract semantic markup: https://any23.apache.org/index.html
>
> -N.
>
>
>
> On Friday, September 16, 2016 8:20 AM, Tim Strehle <tim@strehle.de> wrote:
>
>
> Hi Elias,
>
> you’re probably not into PHP, but just for completeness – the EasyRDF PHP
> library http://www.easyrdf.org/ is also pretty capable.
>
> You can try its converter demo here, which reads structured data from a
> Web site and turns it into JSON-LD, RDF or even a graph rendered in PNG or
> SVG:
>
> http://www.easyrdf.org/converter
>
> Kind regards,
> Tim
>
> https://www.strehle.de/tim/ – And yes, there’s structured data hidden on
> my home page :)
>
> > On 16.09.2016 at 15:58, Elias Kaerle <elias.kaerle@sti2.at> wrote:
> >
> > Hi,
> >
> > I was looking for software to parse structured data in a website -
> > preferably JSON-LD.
> > I found Python libraries for RDF to JSON-LD and vice versa
> > (https://github.com/RDFLib/rdflib-jsonld), or NodeJS and Ruby libraries
> > for JSON-LD operations but that's not exactly what I am looking for.
> >
> > The software I was thinking about takes the URL of any website as an
> > input and returns, if found, JSON as an output.
> >
> > Are you aware of something like that?
> >
> > Thanks, best,
> > Elias
> >
> > --
> > Elias Kärle, MSc
> > Semantic Technology Institute
> > University of Innsbruck
> >
> > ICT - Technologie Park Innsbruck
> > 2nd Floor, Room 3S02
> > Technikerstrasse, 21a
> > 6020 Innsbruck
> > Austria
> >
> > Tel.: (+43) 512 507 53738
> > Skype: elias.kaerle
> >
> >
>
>
>
>
Received on Friday, 16 September 2016 17:10:28 UTC
