Re: Buzzbang crawler and search release 0.0.2 now available

Hi Sarala,

Yes, I'm modularizing the code as I go along, with a view to making it
reusable by other projects. I'll keep the group posted as I make it a more
complete crawler of DataCatalog and then DataSet.

Regards,

Justin

On Fri, Oct 20, 2017 at 10:33 AM, Sarala Wimalaratne <sarala@ebi.ac.uk>
wrote:

> Hi Justin,
>
> This is great. I would like to see whether I can integrate your crawler
> within identifiers.org for DataCatalog and DataSets. Keep me posted...
>
> Regards,
>
> Sarala M Wimalaratne, B.Eng. PhD
> Project Lead
> European Bioinformatics Institute (EMBL-EBI)
> European Molecular Biology Laboratory
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge CB10 1SD UK
>
> On 19 Oct 2017, at 12:31, Justin Clark-Casey <justinccdev@gmail.com>
> wrote:
>
> Hi all,
>
> Following on from the Bioschemas adoption meeting, I'm continuing to work
> on the extremely alpha Buzzbang Bioschemas crawler and frontend when I can
> (renamed from BsBang, after Alistair pointed out the connotations of 'bs'
> :)).
>
> You can play with the current search engine by going to
> http://buzzbang.science
>
> In this release, I decided to concentrate on indexing DataCatalog (this is
> still extremely primitive, recording only the name, url, description
> and keywords properties). If you go to buzzbang.science and search for
> terms such as 'data' or 'registry' you'll get some results.
>
> Currently, I'm manually adding URLs - you can see the small list at [1]. I
> added those that have DataCatalog JSON-LD embedded that I had in my notes,
> such as identifiers.org and fairsharing.org. Down the road, users will be
> able to submit URLs for crawling directly on the website, but for now,
> please contact me, raise a GitHub issue [2] or submit a pull request if
> there's a URL I can add.
>
> Next, I plan to crawl the rest of DataCatalog, esp. embedded DataSets and
> think about how that information can help improve simple search.
>
> All feature suggestions or pull requests welcome on the GitHub crawler [2]
> and search frontend [3] projects.
>
> [1] https://github.com/justinccdev/bsbang-crawler/blob/master/conf/default-targets.txt
> [2] https://github.com/justinccdev/bsbang-crawler
> [3] https://github.com/justinccdev/bsbang-frontend
>
> Cheers,
>
> --
> Justin Clark-Casey (@justincc)
> Research Software Architect
> Micklem Lab, University of Cambridge
>

Received on Friday, 20 October 2017 18:55:57 UTC