- From: Dan Brickley <danbri@danbri.org>
- Date: Thu, 19 Oct 2017 15:20:12 +0100
- To: Justin Clark-Casey <justinccdev@gmail.com>
- Cc: public-bioschemas@w3.org
- Message-ID: <CAFfrAFr1T=LCQHNQyYyz28PES522rj1hj-95Vh_0ttnXjjp7VA@mail.gmail.com>
This sounds great! It would be interesting to try to write down the
specific data patterns you're extracting, using W3C SHACL or ShEx shape
markup. I will be attempting the same for Google...

Dan

On 19 Oct 2017 12:37, "Justin Clark-Casey" <justinccdev@gmail.com> wrote:

> Hi all,
>
> Following on from the Bioschemas adoption meeting, I'm continuing to work
> on the extremely alpha Buzzbang Bioschemas crawler and frontend when I can
> (renamed from BsBang, after Alistair pointed out the connotations of 'bs'
> :)).
>
> You can play with the current search engine by going to
> http://buzzbang.science
>
> In this release, I decided to concentrate on indexing DataCatalog (this is
> extremely primitive as of yet, only recording the name, url, description
> and keywords properties). If you go to buzzbang.science and search for
> terms such as 'data' or 'registry' you'll get some results.
>
> Currently, I'm manually adding URLs - you can see the small list at [1]. I
> added those that have DataCatalog JSON-LD embedded that I had in my notes,
> such as identifiers.org and fairsharing.org. Down the road, users will be
> able to submit URLs for crawling directly on the website, but for now,
> please contact me, raise a GitHub issue [2] or submit a pull request if
> there's a URL I can add.
>
> Next, I plan to crawl the rest of DataCatalog, esp. embedded DataSets and
> think about how that information can help improve simple search.
>
> All feature suggestions or pull requests welcome on the GitHub crawler [2]
> and search frontend [3] projects.
>
> [1] https://github.com/justinccdev/bsbang-crawler/blob/master/conf/default-targets.txt
> [2] https://github.com/justinccdev/bsbang-crawler
> [3] https://github.com/justinccdev/bsbang-frontend
>
> Cheers,
>
> --
> Justin Clark-Casey (@justincc)
> Research Software Architect
> Micklem Lab, University of Cambridge
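
For illustration, the data pattern described in the quoted message (a
DataCatalog from which only name, url, description and keywords are
recorded) might be extracted roughly as follows. This is a minimal sketch
and is not taken from the Buzzbang crawler: the class and function names
are hypothetical, it assumes the markup is embedded in
<script type="application/ld+json"> elements, and it only looks at
top-level JSON-LD objects (no @graph or nested items).

```python
import json
from html.parser import HTMLParser

# The four properties the message says are currently recorded.
INDEXED_PROPERTIES = ("name", "url", "description", "keywords")


class JsonLdScriptParser(HTMLParser):
    """Hypothetical helper: collects the text of each JSON-LD script block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_data_catalogs(html):
    """Return one dict per top-level DataCatalog, keeping only indexed properties."""
    parser = JsonLdScriptParser()
    parser.feed(html)
    parser.close()

    records = []
    for block in parser.blocks:
        try:
            doc = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed markup rather than abort the crawl
        items = doc if isinstance(doc, list) else [doc]
        for item in items:
            if not isinstance(item, dict):
                continue
            types = item.get("@type")
            types = types if isinstance(types, list) else [types]
            if "DataCatalog" in types:
                records.append(
                    {p: item[p] for p in INDEXED_PROPERTIES if p in item}
                )
    return records


if __name__ == "__main__":
    # Made-up page content, for demonstration only.
    page = (
        '<script type="application/ld+json">'
        '{"@context": "http://schema.org", "@type": "DataCatalog",'
        ' "name": "Example Registry", "url": "http://example.org"}'
        "</script>"
    )
    print(extract_data_catalogs(page))
```

Writing the same pattern down as a SHACL or ShEx shape, as suggested in the
reply, would state these expectations declaratively instead of in crawler
code.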
Received on Thursday, 19 October 2017 14:20:38 UTC