Re: Buzzbang crawler and search release 0.0.2 now available

From: Dan Brickley <danbri@danbri.org>
Date: Sat, 21 Oct 2017 21:36:58 +0100
Message-ID: <CAFfrAFrUxz_+iB4vrYwwFMUunnem+vpwQM4_qCpq+s+hKb59Dg@mail.gmail.com>
To: "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>
Cc: Andra Waagmeester <andra@micelio.be>, Justin Clark-Casey <justinccdev@gmail.com>, "public-bioschemas@w3.org" <public-bioschemas@w3.org>
I believe there are open-source JavaScript implementations of each, in case
anyone gets the urge to wrap both behind a common API / bundle in a unified


On 21 Oct 2017 21:42, "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk> wrote:

> I used ShEx in the past but that was before SHACL. I do like the intuitive
> concise notation.
> However my postdoc has been looking at both and finding the SHACL examples
> more accessible. Also the SHACL framework seemed more up to date.
> For most purposes there probably isn't much in it. It would be good to
> know what are the things that can be done in one and not the other.
> Alasdair
> Alasdair J G Gray
> Fellow of the Higher Education Academy
> Assistant Professor in Computer Science
> Heriot-Watt University, Edinburgh
> www.macs.hw.ac.uk/~ajg33
> ------------------------------
> *From:* Andra Waagmeester <andra@micelio.be>
> *Sent:* Saturday, October 21, 2017 7:18:14 AM
> *To:* Dan Brickley
> *Cc:* Justin Clark-Casey; public-bioschemas@w3.org
> *Subject:* Re: Buzzbang crawler and search release 0.0.2 now available
> I also have a preference for ShEx. The syntax feels more intuitive.
> However, just recently a book describing and comparing both ShEx and
> SHACL was released:
> http://www.morganclaypoolpublishers.com/catalog_Orig/product_info.php?products_id=1091
> Personally, I like the regex style of expressing cardinalities and the
> possibility to combine different shapes for similar concepts in ShEx.
> On Fri, Oct 20, 2017 at 11:09 PM, Dan Brickley <danbri@danbri.org> wrote:
>> I have a slight preference for ShEx personally, but the most official in
>> W3C terms is SHACL.  Anyone else have a view?
>> On 20 Oct 2017 20:08, "Justin Clark-Casey" <justinccdev@gmail.com> wrote:
>>> Thanks Dan.  Yes, I need to look into SHACL/ShEx - I only have a passing
>>> acquaintance with them at the moment.  Would you recommend either one over
>>> the other?
>>> Regards,
>>> -- Justin
>>> On Thu, Oct 19, 2017 at 3:20 PM, Dan Brickley <danbri@danbri.org> wrote:
>>>> This sounds great! It would be interesting to try to write down the
>>>> specific data patterns you're extracting, by using W3C SHACL or ShEx shape
>>>> markup. I will be attempting the same for Google...
>>>> Dan
>>>> On 19 Oct 2017 12:37, "Justin Clark-Casey" <justinccdev@gmail.com>
>>>> wrote:
>>>>> Hi all,
>>>>> Following on from the Bioschemas adoption meeting, I'm continuing to
>>>>> work on the extremely alpha Buzzbang Bioschemas crawler and frontend when I
>>>>> can (renamed from BsBang, after Alistair pointed out the connotations of
>>>>> 'bs' :)).
>>>>> You can play with the current search engine by going to
>>>>> http://buzzbang.science
>>>>> In this release, I decided to concentrate on indexing DataCatalog
>>>>> (this is extremely primitive as of yet, only recording the name, url,
>>>>> description and keywords properties).  If you go to buzzbang.science and
>>>>> search for terms such as 'data' or 'registry' you'll get some results.
>>>>> Currently, I'm manually adding URLs - you can see the small list at
>>>>> [1]. I added those that have DataCatalog JSON-LD embedded that I had in my
>>>>> notes, such as identifiers.org and fairsharing.org. Down the road,
>>>>> users will be able to submit URLs for crawling directly on the website, but
>>>>> for now, please contact me, raise a GitHub issue [2] or submit a pull
>>>>> request if there's a URL I can add.
>>>>> Next, I plan to crawl the rest of DataCatalog, esp. embedded DataSets
>>>>> and think about how that information can help improve simple search.
>>>>> All feature suggestions or pull requests welcome on the GitHub crawler
>>>>> [2] and search frontend [3] projects.
>>>>> [1] https://github.com/justinccdev/bsbang-crawler/blob/master/conf/default-targets.txt
>>>>> [2] https://github.com/justinccdev/bsbang-crawler
>>>>> [3] https://github.com/justinccdev/bsbang-frontend
>>>>> Cheers,
>>>>> --
>>>>> Justin Clark-Casey (@justincc)
>>>>> Research Software Architect
>>>>> Micklem Lab, University of Cambridge
Received on Saturday, 21 October 2017 20:37:33 UTC
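
The indexing step described in the original announcement quoted above (recording only the name, url, description and keywords properties of an embedded DataCatalog) can be illustrated with a minimal Python sketch. This is not the Buzzbang crawler's actual code: the class name JsonLdScriptParser, the function extract_data_catalogs and the sample page are all hypothetical, and the sketch assumes the JSON-LD sits in a script element of type application/ld+json with the DataCatalog at the top level of the block.

```python
import json
from html.parser import HTMLParser

# The four properties the announcement says are currently recorded.
FIELDS = ("name", "url", "description", "keywords")


class JsonLdScriptParser(HTMLParser):
    """Hypothetical helper: collects the text of every JSON-LD script block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_data_catalogs(html):
    """Return one dict per top-level DataCatalog found in the page's JSON-LD."""
    parser = JsonLdScriptParser()
    parser.feed(html)
    parser.close()

    records = []
    for block in parser.blocks:
        try:
            doc = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed markup rather than abort the crawl
        items = doc if isinstance(doc, list) else [doc]
        for item in items:
            if not isinstance(item, dict):
                continue
            types = item.get("@type")
            types = types if isinstance(types, list) else [types]
            if "DataCatalog" in types:
                # Missing properties are recorded as None.
                records.append({field: item.get(field) for field in FIELDS})
    return records


# Made-up page for illustration only; not taken from any real site.
SAMPLE_PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "DataCatalog",
 "name": "Example Registry", "url": "https://example.org/",
 "description": "A made-up registry of data resources.",
 "keywords": "registry, data"}
</script></head><body></body></html>"""

records = extract_data_catalogs(SAMPLE_PAGE)
print(records)
```

A ShEx or SHACL shape, as suggested in the thread, would state the same expectations declaratively (which properties a DataCatalog must or may carry) instead of encoding them in extraction code.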
