- From: Justin Clark-Casey <justinccdev@gmail.com>
- Date: Mon, 26 Feb 2018 16:38:13 +0000
- To: public-bioschemas@w3.org
Received on Monday, 26 February 2018 16:38:43 UTC
This is a minor release, with small improvements to the indexer (still in lieu of using a proper JSON-LD parsing library), the crawler (avoid failure on bad sitemaps, don't choke on blank lines in URL listing files), and configuration (allow alternative locations for the crawl database and Solr). More details at [1].

Many thanks to @innovationchef <https://github.com/innovationchef>, @aswanipranjal <https://github.com/aswanipranjal> and @HaoPatrick <https://github.com/haopatrick> for contributions.

For overhauling the crawler, I am now leaning considerably towards Scrapy/Frontera, for the reasons listed at [2].

[1] https://github.com/justinccdev/bsbang-crawler/releases/tag/0.0.4
[2] https://github.com/justinccdev/bsbang-crawler/wiki/Transition-to-an-established-crawler-package

--
Justin Clark-Casey
Research Software Engineer, InterMine life sciences data integration, U of Cambridge
http://justincc.org