
Re: Buzzbang crawler and search release 0.0.2 now available

From: Andra Waagmeester <andra@micelio.be>
Date: Mon, 23 Oct 2017 11:16:20 +0200
Message-ID: <CAMNM0fVCqLF7q-oJWFW9XCanW41nc6oB=PtMBFvMQJrVpbXjgw@mail.gmail.com>
To: "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>
Cc: Dan Brickley <danbri@danbri.org>, Justin Clark-Casey <justinccdev@gmail.com>, "public-bioschemas@w3.org" <public-bioschemas@w3.org>
Here's my understanding of the state of ShEx implementations:
There is an old ShEx 1 JavaScript implementation, but shex.js is up to date
with ShEx 2.0, as are the Scala and Ruby implementations.
The Java implementation of ShEx 2.0 is incomplete, but INRIA is hiring an
engineer to complete it.
The Python implementation is ShEx 1, but is expected to support ShEx 2.0 soon.

The Validating RDF Data book has a chapter that compares ShEx and SHACL
and highlights some of the expressivity differences.
The clearest message is that ShEx shapes are evaluated as a whole while
SHACL shapes are evaluated as a list of property declarations.
This means that ShEx supports constructs like

( p:P644 @<P644_genomic_start> ;
  p:P645 @<P645_genomic_end> )?

which expresses the constraints on the genomic locations of a gene record:
either both genomic locations appear in the data or neither does. If only
one of the two appears, the data is deemed invalid.
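
As a rough illustration of that both-or-neither rule, here is a minimal
Python sketch. It is not a ShEx validator and says nothing about how a ShEx
engine works internally. The function name, the plain dict standing in for a
gene record, and the sample values are all assumptions made up for this
example; only the property ids P644 and P645 come from the shape above.

```python
# Hypothetical helper: checks the "both or neither" rule on a plain dict
# that stands in for the statements of a gene record.

GENOMIC_START = "P644"  # property id taken from the shape above
GENOMIC_END = "P645"    # property id taken from the shape above


def genomic_location_ok(statements):
    """Return True when genomic start and end are both present or both absent."""
    has_start = GENOMIC_START in statements
    has_end = GENOMIC_END in statements
    # Equal flags mean "both" or "neither"; unequal flags mean only one is set.
    return has_start == has_end


# Made-up sample records, for illustration only.
examples = {
    "both": {"P644": "1000", "P645": "2000"},
    "neither": {"label": "example gene"},
    "start_only": {"P644": "1000"},
}

for name, record in examples.items():
    print(name, genomic_location_ok(record))
```

The point of the sketch is that the two properties are judged together as a
unit, rather than each one being checked on its own.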

my 2cts


On Sat, Oct 21, 2017 at 9:42 PM, Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk> wrote:

> I used ShEx in the past but that was before SHACL. I do like the intuitive
> concise notation.
> However my postdoc has been looking at both and finding the SHACL examples
> more accessible. Also the SHACL framework seemed more up to date.
> For most purposes there probably isn't much in it. It would be good to
> know what are the things that can be done in one and not the other.
> Alasdair
> Alasdair J G Gray
> Fellow of the Higher Education Academy
> Assistant Professor in Computer Science
> Heriot-Watt University, Edinburgh
> www.macs.hw.ac.uk/~ajg33
> ------------------------------
> *From:* Andra Waagmeester <andra@micelio.be>
> *Sent:* Saturday, October 21, 2017 7:18:14 AM
> *To:* Dan Brickley
> *Cc:* Justin Clark-Casey; public-bioschemas@w3.org
> *Subject:* Re: Buzzbang crawler and search release 0.0.2 now available
> I also have a preference for ShEx. The syntax feels more intuitive.
> However, just recently a book describing and comparing both ShEx and
> SHACL was released:
> http://www.morganclaypoolpublishers.com/catalog_Orig/product_info.php?products_id=1091
> Personally, I like the regex style of expressing cardinalities and the
> possibility to combine different shapes for similar concepts in ShEx.
> On Fri, Oct 20, 2017 at 11:09 PM, Dan Brickley <danbri@danbri.org> wrote:
>> I have a slight preference for ShEx personally, but the most official in
>> W3C terms is SHACL.  Anyone else have a view?
>> On 20 Oct 2017 20:08, "Justin Clark-Casey" <justinccdev@gmail.com> wrote:
>>> Thanks Dan.  Yes, I need to look into SHACL/ShEx - I only have a passing
>>> acquaintance with them at the moment.  Would you recommend either one over
>>> the other?
>>> Regards,
>>> -- Justin
>>> On Thu, Oct 19, 2017 at 3:20 PM, Dan Brickley <danbri@danbri.org> wrote:
>>>> This sounds great! It would be interesting to try to write down the
>>>> specific data patterns you're extracting, by using W3C SHACL or ShEx shape
>>>> markup. I will be attempting the same for Google...
>>>> Dan
>>>> On 19 Oct 2017 12:37, "Justin Clark-Casey" <justinccdev@gmail.com> wrote:
>>>>> Hi all,
>>>>> Following on from the Bioschemas adoption meeting, I'm continuing to
>>>>> work on the extremely alpha Buzzbang Bioschemas crawler and frontend when I
>>>>> can (renamed from BsBang, after Alasdair pointed out the connotations of
>>>>> 'bs' :)).
>>>>> You can play with the current search engine by going to
>>>>> http://buzzbang.science
>>>>> In this release, I decided to concentrate on indexing DataCatalog
>>>>> (this is still extremely primitive, recording only the name, url,
>>>>> description and keywords properties).  If you go to buzzbang.science and
>>>>> search for terms such as 'data' or 'registry' you'll get some results.
>>>>> Currently, I'm manually adding URLs - you can see the small list at
>>>>> [1]. I added those that have DataCatalog JSON-LD embedded that I had in my
>>>>> notes, such as identifiers.org and fairsharing.org. Down the road,
>>>>> users will be able to submit URLs for crawling directly on the website, but
>>>>> for now, please contact me, raise a GitHub issue [2] or submit a pull
>>>>> request if there's a URL I can add.
>>>>> Next, I plan to crawl the rest of DataCatalog, esp. embedded DataSets
>>>>> and think about how that information can help improve simple search.
>>>>> All feature suggestions or pull requests welcome on the GitHub crawler
>>>>> [2] and search frontend [3] projects.
>>>>> [1] https://github.com/justinccdev/bsbang-crawler/blob/master/conf/default-targets.txt
>>>>> [2] https://github.com/justinccdev/bsbang-crawler
>>>>> [3] https://github.com/justinccdev/bsbang-frontend
>>>>> Cheers,
>>>>> --
>>>>> Justin Clark-Casey (@justincc)
>>>>> Research Software Architect
>>>>> Micklem Lab, University of Cambridge
Received on Monday, 23 October 2017 09:17:29 UTC
