
Re: sdo Software

From: Elias Kaerle <elias.kaerle@sti2.at>
Date: Fri, 16 Sep 2016 16:54:46 +0200
To: public-schemaorg@w3.org
Message-ID: <eb151121-97f4-b546-8fe7-6c814f607215@sti2.at>
Hi Hans, Phil and Thad,

Thank you all for your answers.

Hans: actually, I was looking for a library or snippet that does exactly
that.

Phil: I was already looking into rdflib and rdflib-jsonld, but they
"only" translate JSON-LD into RDF and other formats, rather than
extracting structured data from a website.

Thad: right, actually something like a web crawler! A plugin for those
would be great, I agree. I also hoped that BeautifulSoup might be the
right way to go, but it's still a hack to get the pure JSON-LD out of
HTML (especially when it is wrapped in CDATA).

I'll keep searching and keep you updated. Thanks again for your help!
Best, Elias

On 16.09.2016 16:26, Thad Guidry wrote:
> You're talking about a web crawler or spider.
> 
> There are a few listed here:
> https://www.schemaapp.com/60-structured-data-tools-create-test-plugins-more/
> 
> But none that are open source, as far as I can see.
> Ideally I'd like to see Apache Nutch and Scrapy plugins for this. Even
> BeautifulSoup.
> 
> Thad
> +ThadGuidry <https://www.google.com/+ThadGuidry>

-- 
Elias Kärle, MSc
Semantic Technology Institute
University of Innsbruck

ICT - Technologie Park Innsbruck
2nd Floor, Room 3S02
Technikerstrasse, 21a
6020 Innsbruck
Austria

Tel.: (+43) 512 507 53738
Skype: elias.kaerle
Received on Friday, 16 September 2016 14:55:38 UTC
