Re: Buzzbang crawler and search release 0.0.2 now available

This sounds great! It would be interesting to try to write down the
specific data patterns you're extracting, by using W3C SHACL or ShEx shape
markup. I will be attempting the same for Google...

Dan

On 19 Oct 2017 12:37, "Justin Clark-Casey" <justinccdev@gmail.com> wrote:

> Hi all,
>
> Following on from the Bioschemas adoption meeting, I'm continuing to work
> on the extremely alpha Buzzbang Bioschemas crawler and frontend when I can
> (renamed from BsBang, after Alistair pointed out the connotations of 'bs'
> :)).
>
> You can play with the current search engine by going to
> http://buzzbang.science
>
> In this release, I decided to concentrate on indexing DataCatalog (this is
> extremely primitive as yet, only recording the name, url, description
> and keywords properties).  If you go to buzzbang.science and search for
> terms such as 'data' or 'registry' you'll get some results.
>
> Currently, I'm manually adding URLs - you can see the small list at [1]. I
> added those that have DataCatalog JSON-LD embedded that I had in my notes,
> such as identifiers.org and fairsharing.org. Down the road, users will be
> able to submit URLs for crawling directly on the website, but for now,
> please contact me, raise a GitHub issue [2] or submit a pull request if
> there's a URL I can add.
>
> Next, I plan to crawl the rest of DataCatalog, esp. embedded Datasets and
> think about how that information can help improve simple search.
>
> All feature suggestions or pull requests welcome on the GitHub crawler [2]
> and search frontend [3] projects.
>
> [1] https://github.com/justinccdev/bsbang-crawler/blob/master/conf/default-targets.txt
> [2] https://github.com/justinccdev/bsbang-crawler
> [3] https://github.com/justinccdev/bsbang-frontend
>
> Cheers,
>
> --
> Justin Clark-Casey (@justincc)
> Research Software Architect
> Micklem Lab, University of Cambridge
>
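
For readers wondering what "recording the name, url, description and
keywords properties" of a DataCatalog might look like in practice, here is
a minimal Python sketch. It is not taken from the Buzzbang code; the class
and function names, and the sample page, are hypothetical. It assumes the
JSON-LD sits in a script element of type application/ld+json and that
@type is the plain string "DataCatalog" (lists of types and @graph
wrappers are ignored here).

```python
import json
from html.parser import HTMLParser

# The four properties the announcement says are recorded per DataCatalog.
INDEXED_PROPERTIES = ("name", "url", "description", "keywords")


class JsonLdScriptParser(HTMLParser):
    """Hypothetical helper: collects the text of every
    <script type="application/ld+json"> element in a page."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_datacatalogs(html):
    """Return one dict per embedded DataCatalog, holding only the
    indexed properties (missing properties come back as None)."""
    parser = JsonLdScriptParser()
    parser.feed(html)
    records = []
    for block in parser.blocks:
        try:
            doc = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed markup instead of aborting
        items = doc if isinstance(doc, list) else [doc]
        for item in items:
            # Assumption: @type is a single string, not a list of types.
            if isinstance(item, dict) and item.get("@type") == "DataCatalog":
                records.append({p: item.get(p) for p in INDEXED_PROPERTIES})
    return records


# Made-up page used only to exercise the sketch.
SAMPLE_HTML = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "http://schema.org", "@type": "DataCatalog",'
    ' "name": "Example Registry", "url": "http://example.org",'
    ' "description": "A made-up catalog", "keywords": "registry, data"}'
    "</script></head><body></body></html>"
)

if __name__ == "__main__":
    for record in extract_datacatalogs(SAMPLE_HTML):
        print(record)
```

A real crawler would also need to fetch the page, cope with @type given
as a list, and walk nested items, none of which is attempted here.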

Received on Thursday, 19 October 2017 14:20:38 UTC