
Re: Buzzbang crawler and search release 0.0.2 now available

From: Dan Brickley <danbri@danbri.org>
Date: Mon, 23 Oct 2017 15:48:38 +0000
Message-ID: <CAFfrAFqT8JpK8tDOmt6=BcrZue3K6uvu7hhxAa7xGmgfaaeQuA@mail.gmail.com>
To: Andra Waagmeester <andra@micelio.be>
Cc: "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>, Justin Clark-Casey <justinccdev@gmail.com>, "public-bioschemas@w3.org" <public-bioschemas@w3.org>
For SHACL: http://shacl.org/playground/ ->
https://github.com/TopQuadrant/shacl-js (MIT licensed)

What would an API around both look like?
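One possible reading of the question: a thin, language-neutral interface that a crawler such as Buzzbang could call without caring whether a shape is written in ShEx or SHACL. This is a hypothetical sketch only. `ValidationReport`, `register_backend` and `validate` are illustrative names, not part of shex.js, shacl-js or any other library, and the toy backend merely stands in for a real engine.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical result type shared by every backend (not from any real library).
@dataclass
class ValidationReport:
    conforms: bool
    messages: List[str] = field(default_factory=list)


# A backend takes (data, shapes) as serialized strings and returns a report.
Backend = Callable[[str, str], ValidationReport]

_BACKENDS: Dict[str, Backend] = {}


def register_backend(language: str, backend: Backend) -> None:
    """Register a validator for a schema language such as 'shex' or 'shacl'."""
    _BACKENDS[language.lower()] = backend


def validate(data: str, shapes: str, language: str) -> ValidationReport:
    """Dispatch to whichever engine is registered for the schema language."""
    try:
        backend = _BACKENDS[language.lower()]
    except KeyError:
        raise ValueError(f"no backend registered for {language!r}") from None
    return backend(data, shapes)


def toy_backend(data: str, shapes: str) -> ValidationReport:
    # Placeholder only: a real adapter would hand data and shapes to a
    # ShEx or SHACL engine; here we just look for a schema:name triple.
    ok = "schema:name" in data
    return ValidationReport(ok, [] if ok else ["missing schema:name"])


register_backend("shex", toy_backend)
report = validate('<x> schema:name "Example" .', shapes="", language="ShEx")
print(report.conforms)  # True
```

The design choice is that callers depend only on the report type and the language tag, so a SHACL adapter could be registered alongside the ShEx one without changing crawler code.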

Dan

On 23 October 2017 at 09:16, Andra Waagmeester <andra@micelio.be> wrote:

> Here's my understanding of the state of ShEx implementations:
> There is an old ShEx 1 JS implementation, but shex.js is up to date with
> ShEx 2.0, as are the Scala and Ruby implementations.
> The Java implementation is an incomplete ShEx 2.0 implementation, but INRIA
> is hiring an engineer to complete it.
> The Python implementation is ShEx 1, but is expected to support ShEx 2.0 soon.
>
> The Validating RDF Data book has a chapter that compares both ShEx and
> SHACL which highlights some of the expressivity differences.
> The clearest message is that ShEx shapes are evaluated as a whole while
> SHACL shapes are evaluated as a list of property declarations.
> This means that ShEx supports constructs like
>
> (
> p:P644 @<P644_genomic_start> ;
> p:P645 @<P645_genomic_end> ;
> )*
> This expresses the constraints on the genomic locations of a gene record:
> the genomic start and end must either both appear in the data or both be
> absent; if only one of the two appears, the data is deemed invalid.
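To make the difference concrete, here is a toy model in Python rather than real ShEx or SHACL. A gene record is assumed to be a dict from property (`p:P644`, `p:P645`) to a list of values, and the values used are placeholders. The per-property check mirrors validating each property declaration on its own with its own min/max count; the group check mirrors the `( ... ; ... )*` expression, which requires the start and end triples to be consumed as complete pairs. It ignores the nested shape references (`@<P644_genomic_start>` etc.) and is only an illustration of the semantics described above, not how shex.js or any SHACL engine is implemented.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, List[str]]
Cardinality = Tuple[int, Optional[int]]  # (min, max); max None = unbounded

START, END = "p:P644", "p:P645"


def per_property_valid(record: Record, cardinalities: Dict[str, Cardinality]) -> bool:
    # Property-by-property reading: each declaration is checked independently
    # against its own min/max count, with no link between properties.
    for prop, (lo, hi) in cardinalities.items():
        n = len(record.get(prop, []))
        if n < lo or (hi is not None and n > hi):
            return False
    return True


def group_valid(record: Record) -> bool:
    # Whole-shape reading of ( p:P644 ... ; p:P645 ... )*: every repetition
    # consumes one start and one end, so the two counts must be equal.
    return len(record.get(START, [])) == len(record.get(END, []))


only_start: Record = {START: ["100"]}
both: Record = {START: ["100"], END: ["200"]}
card = {START: (0, None), END: (0, None)}

print(per_property_valid(only_start, card))  # True: each property is fine alone
print(group_valid(only_start))               # False: a start without an end
print(group_valid(both))                     # True
print(group_valid({}))                       # True: neither is present
```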
>
> my 2cts
>
> Andra
>
>
> On Sat, Oct 21, 2017 at 9:42 PM, Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk>
> wrote:
>
>> I used ShEx in the past but that was before SHACL. I do like the
>> intuitive concise notation.
>>
>> However my postdoc has been looking at both and finding the SHACL
>> examples more accessible. Also the SHACL framework seemed more up to date.
>>
>> For most purposes there probably isn't much in it. It would be good to
>> know what can be done in one and not the other.
>>
>> Alasdair
>>
>> Alasdair J G Gray
>> Fellow of the Higher Education Academy
>> Assistant Professor in Computer Science
>> Heriot-Watt University, Edinburgh
>>
>> www.macs.hw.ac.uk/~ajg33 <http://www.macs.hw.ac.uk/~ajg33>
>>
>> ------------------------------
>> *From:* Andra Waagmeester <andra@micelio.be>
>> *Sent:* Saturday, October 21, 2017 7:18:14 AM
>> *To:* Dan Brickley
>> *Cc:* Justin Clark-Casey; public-bioschemas@w3.org
>> *Subject:* Re: Buzzbang crawler and search release 0.0.2 now available
>>
>> I also have a preference for ShEx. The syntax feels more intuitive.
>> However, a book describing and comparing both ShEx and SHACL was released
>> just recently:
>> http://www.morganclaypoolpublishers.com/catalog_Orig/product_info.php?products_id=1091
>>
>> Personally, I like the regex style of expressing cardinalities and the
>> possibility of combining different shapes for similar concepts in ShEx.
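ShEx borrows regular-expression suffixes for cardinality: no suffix means exactly one, `?` is zero or one, `*` is zero or more, `+` is one or more, and `{m}`, `{m,}`, `{m,n}` (or `{m,*}`) give explicit bounds. The helper below is a hypothetical sketch, not code from any ShEx implementation: it turns such a suffix into a (min, max) pair and checks an observed triple count against it.

```python
import re
from typing import Optional, Tuple

# Matches "{m}", "{m,}", "{m,n}" and "{m,*}".
_RANGE = re.compile(r"^\{(\d+)(?:,(\d+|\*)?)?\}$")


def parse_cardinality(suffix: str) -> Tuple[int, Optional[int]]:
    """Map a ShEx cardinality suffix to (min, max); max None means unbounded."""
    fixed = {"": (1, 1), "?": (0, 1), "*": (0, None), "+": (1, None)}
    if suffix in fixed:
        return fixed[suffix]
    m = _RANGE.match(suffix)
    if not m:
        raise ValueError(f"not a cardinality: {suffix!r}")
    lo = int(m.group(1))
    if m.group(2) is None:
        # "{m}" means exactly m; "{m,}" means m or more.
        hi = None if "," in suffix else lo
    elif m.group(2) == "*":
        hi = None
    else:
        hi = int(m.group(2))
    if hi is not None and hi < lo:
        raise ValueError(f"empty range: {suffix!r}")
    return lo, hi


def count_ok(count: int, suffix: str) -> bool:
    """True if `count` matching triples satisfy the cardinality suffix."""
    lo, hi = parse_cardinality(suffix)
    return count >= lo and (hi is None or count <= hi)


print(parse_cardinality("+"))     # (1, None)
print(parse_cardinality("{2,5}")) # (2, 5)
print(count_ok(2, "?"))           # False
```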
>>
>> On Fri, Oct 20, 2017 at 11:09 PM, Dan Brickley <danbri@danbri.org> wrote:
>>
>>> I have a slight preference for ShEx personally, but SHACL is the more
>>> official of the two in W3C terms.  Anyone else have a view?
>>>
>>> On 20 Oct 2017 20:08, "Justin Clark-Casey" <justinccdev@gmail.com>
>>> wrote:
>>>
>>>> Thanks Dan.  Yes, I need to look into SHACL/ShEx - I only have a
>>>> passing acquaintance with them at the moment.  Would you recommend either
>>>> one over the other?
>>>>
>>>> Regards,
>>>>
>>>> -- Justin
>>>>
>>>> On Thu, Oct 19, 2017 at 3:20 PM, Dan Brickley <danbri@danbri.org>
>>>> wrote:
>>>>
>>>>> This sounds great! It would be interesting to try to write down the
>>>>> specific data patterns you're extracting, by using W3C SHACL or ShEx shape
>>>>> markup. I will be attempting the same for Google...
>>>>>
>>>>> Dan
>>>>>
>>>>> On 19 Oct 2017 12:37, "Justin Clark-Casey" <justinccdev@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Following on from the Bioschemas adoption meeting, I'm continuing to
>>>>>> work on the extremely alpha Buzzbang Bioschemas crawler and frontend when I
>>>>>> can (renamed from BsBang, after Alistair pointed out the connotations of
>>>>>> 'bs' :)).
>>>>>>
>>>>>> You can play with the current search engine by going to
>>>>>> http://buzzbang.science
>>>>>>
>>>>>> In this release, I decided to concentrate on indexing DataCatalog
>>>>>> (this is still extremely primitive, recording only the name, url,
>>>>>> description and keywords properties).  If you go to buzzbang.science and
>>>>>> search for terms such as 'data' or 'registry', you'll get some results.
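As a rough illustration of that first indexing step (not the Buzzbang crawler's actual code), the sketch below pulls `<script type="application/ld+json">` blocks out of a page with Python's standard `html.parser` and keeps the name, url, description and keywords of any node typed `DataCatalog`. It assumes the JSON-LD is a single object, a list, or an object with an `@graph` array, and it does not expand contexts or handle prefixed types such as `schema:DataCatalog`.

```python
import json
from html.parser import HTMLParser
from typing import Any, Dict, List

FIELDS = ("name", "url", "description", "keywords")


class JsonLdExtractor(HTMLParser):
    """Collects the text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self.blocks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def _nodes(doc: Any) -> List[Dict[str, Any]]:
    # Flatten the common top-level layouts of embedded JSON-LD.
    if isinstance(doc, list):
        return [n for item in doc for n in _nodes(item)]
    if isinstance(doc, dict) and "@graph" in doc:
        return _nodes(doc["@graph"])
    return [doc] if isinstance(doc, dict) else []


def extract_data_catalogs(html: str) -> List[Dict[str, Any]]:
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    catalogs = []
    for block in parser.blocks:
        try:
            doc = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed markup rather than failing the crawl
        for node in _nodes(doc):
            types = node.get("@type", [])
            if isinstance(types, str):
                types = [types]
            if "DataCatalog" in types:
                catalogs.append({k: node[k] for k in FIELDS if k in node})
    return catalogs


page = """<html><head><script type="application/ld+json">
{"@context": "http://schema.org", "@type": "DataCatalog",
 "name": "Example catalog", "url": "https://example.org/",
 "keywords": "registry"}
</script></head></html>"""
print(extract_data_catalogs(page))
```

Only the four fields present in a node are kept, so a catalog without a description simply yields a record without that key.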
>>>>>>
>>>>>> Currently, I'm manually adding URLs - you can see the small list at
>>>>>> [1]. I added those that have DataCatalog JSON-LD embedded that I had in my
>>>>>> notes, such as identifiers.org and fairsharing.org. Down the road,
>>>>>> users will be able to submit URLs for crawling directly on the website, but
>>>>>> for now, please contact me, raise a GitHub issue [2] or submit a pull
>>>>>> request if there's a URL I can add.
>>>>>>
>>>>>> Next, I plan to crawl the rest of DataCatalog, esp. embedded Datasets,
>>>>>> and think about how that information can help improve simple search.
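Embedded datasets are usually nested inside the catalog node (schema.org's DataCatalog has a `dataset` property for this), so reaching them means walking the whole JSON-LD tree rather than only its top level. A minimal recursive sketch, assuming compact JSON-LD with plain `@type` strings such as `"Dataset"` and no context expansion; `find_typed_nodes` is a hypothetical helper name.

```python
from typing import Any, Dict, Iterator


def find_typed_nodes(doc: Any, wanted: str) -> Iterator[Dict[str, Any]]:
    """Yield every JSON-LD object, at any depth, whose @type includes `wanted`."""
    if isinstance(doc, dict):
        types = doc.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if wanted in types:
            yield doc
        for key, value in doc.items():
            if key != "@context":  # the context defines terms, not data nodes
                yield from find_typed_nodes(value, wanted)
    elif isinstance(doc, list):
        for item in doc:
            yield from find_typed_nodes(item, wanted)


catalog = {
    "@type": "DataCatalog",
    "name": "Example catalog",
    "dataset": [
        {"@type": "Dataset", "name": "First"},
        {"@type": ["Dataset", "Thing"], "name": "Second"},
    ],
}
print([d["name"] for d in find_typed_nodes(catalog, "Dataset")])  # ['First', 'Second']
```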
>>>>>>
>>>>>> All feature suggestions or pull requests welcome on the GitHub
>>>>>> crawler [2] and search frontend [3] projects.
>>>>>>
>>>>>> [1] https://github.com/justinccdev/bsbang-crawler/blob/master/conf/default-targets.txt
>>>>>> [2] https://github.com/justinccdev/bsbang-crawler
>>>>>> [3] https://github.com/justinccdev/bsbang-frontend
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> --
>>>>>> Justin Clark-Casey (@justincc)
>>>>>> Research Software Architect
>>>>>> Micklem Lab, University of Cambridge
>>>>>>
>>>>>
>>>>
>>
>
>
Received on Monday, 23 October 2017 15:49:07 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 19:07:59 UTC