- From: Justin Clark-Casey <jc955@cam.ac.uk>
- Date: Wed, 14 Feb 2018 18:22:45 +0000
- To: public-bioschemas@w3.org
This is mainly a packaging up of work from some time ago, so it is slightly hazy in my memory. Highlights:

bsbang-crawler [1]
==================

* Implemented crawling of optional schema properties.
* Implemented remapping of properties, so that, for example, PhysicalEntity.biologicalType is remapped to PhysicalEntity.additionalType (I know that's not very applicable, but Biosamples were doing this at one stage :). So not so useful now, but it's the kind of thing that will be needed in the future.
* Crawling and indexing are now 3 separate stages (crawl, extract, index) to make staged data processing easier.

More details at [2]. Still extremely alpha and early, with very shallow processing of schemas, etc. However, having now done this work, the next big item is to look at replacing much of it with a proper crawler such as Apache Nutch. Arguably this is what I should have done in the first place, but I took the short-term fun of hacking something together in Python and now I might be paying for it ^_^

bsbang-frontend [3]
===================

An even more primitive frontend for the Solr index generated by bsbang-crawler. Very few changes, mainly:

* Displaying Thing.alternativeName (now an optional crawled property).
* Making some properties links if they are URLs.

More details at [4]. I hope to entice a GSoC student to actually make it not butt ugly. A frontend example with a crawl of a few small sites is still up at [5].

[1] https://github.com/justinccdev/bsbang-crawler
[2] https://github.com/justinccdev/bsbang-crawler/releases/tag/0.0.3
[3] https://github.com/justinccdev/bsbang-frontend
[4] https://github.com/justinccdev/bsbang-frontend/releases/tag/0.0.3
[5] bsbang.science

--
Justin Clark-Casey
Research Software Engineer, InterMine life sciences data integration, U of Cambridge
http://twitter.com/justincc
http://justincc.org
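The property remapping described above could be sketched roughly as follows. This is an illustrative assumption about how such a remapping table might look, not the actual bsbang-crawler code; the table and function names are hypothetical.

```python
# Hypothetical sketch: rewrite crawled schema.org property names via a
# lookup table before indexing. The (type, property) pairs and function
# names below are illustrative, not taken from bsbang-crawler itself.

# Map of (type name, property name) -> replacement property name.
PROPERTY_REMAP = {
    ("PhysicalEntity", "biologicalType"): "additionalType",
}

def remap_properties(type_name, properties):
    """Return a copy of `properties` with any configured remaps applied."""
    remapped = {}
    for prop, value in properties.items():
        new_prop = PROPERTY_REMAP.get((type_name, prop), prop)
        remapped[new_prop] = value
    return remapped

# Example: biologicalType is rewritten, other properties pass through.
print(remap_properties("PhysicalEntity",
                       {"biologicalType": "DNA", "name": "sample1"}))
```

A table-driven approach like this keeps the remaps in data rather than code, so new mappings can be added without touching the extraction logic.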
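The frontend change of rendering a property as a link when its value is a URL could be sketched as below. This is a guess at the shape of such a check, assuming Python on the frontend side; the function name is hypothetical.

```python
# Hypothetical sketch: decide whether a crawled property value should be
# rendered as a hyperlink. Only well-formed http(s) URLs qualify.
from urllib.parse import urlparse

def looks_like_url(value):
    """Return True if `value` parses as an absolute http(s) URL."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

# Example: a URL value becomes a link, a plain string does not.
print(looks_like_url("https://github.com/justinccdev/bsbang-frontend"))
print(looks_like_url("DNA"))
```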
Received on Wednesday, 14 February 2018 18:23:12 UTC