
Re: DefinedTerm vs CategoryCode

From: Justin Clark-Casey <jc955@cam.ac.uk>
Date: Thu, 26 Jul 2018 16:26:55 +0100
To: public-bioschemas@w3.org
Message-ID: <804e5a0c-0f14-a550-d16c-e47972e6e9f0@cam.ac.uk>
Sorry for the very late reply, Melanie.

The old hand-hacked crawler will be too slow; I finally realized this myself when trying to crawl BioSamples.  Ankit 
(our GSoC student) is working on a much faster Scrapy-based crawler, and Ricardo has GoCrawlIt.  We're now all working 
under the Buzzbang umbrella to have these scrapers deposit their output in a common format in MongoDB.  More details 
are available through the links at [1].

If you have any questions, please feel free to pop over to the Bioschemas Buzzbang channel at [2] or, of course, 
create an issue on the appropriate GitHub repository.  This is all still skunkworks: Ankit's GSoC programme will end 
in August, and Ricardo and I are only working on this unofficially, but it will be great to see some real-world 
issues.  We hope to deploy the newer scrapers to the Cambridge HPC cloud soon, after which a common crawl will 
hopefully become available down the line.

[1] https://github.com/buzzbangorg/buzzbang-doc/wiki
[2] https://bioschemas.slack.com/messages/CB5TWCHS4/



On 05/07/18 08:59, Melanie Courtot wrote:
> Hi Leyla,
> Thanks for the note. I understand defined terms to be a mechanism to provide a finite list of terms to be used in our profiles, which is why I thought they’d be a good way to define “protein”, “samples”, “gene”, etc.: the 20 or so entities we are using throughout. I don’t think they’d be suitable for a list of unknown terms, as we obviously wouldn’t want to have a dictionary of millions of terms (in addition to them already being hosted by OLS, for example).
> I would love to hear your thoughts if you can think of a way it could work better for the samples representation. We have deployed the bioschemas markup on the BioSamples samples, and are now adding a context to link the sample entity to the corresponding OBI term - similar to what was done for protein and PRO. It seems to be working fine at the moment.
> We are about to try and crawl the resources and extract all ontology terms (which was one of our use cases). @Justin, we’ll try your crawler, but this is an early notice that we may come back with requests for help/modifications.
> Cheers,
> Melanie
>> On 4 Jul 2018, at 21:29, ljgarcia <ljgarcia@ebi.ac.uk> wrote:
>> Hi Melanie, Simon, all,
>> I think you are using CategoryCode in Samples to refer to ontology terms. Would it not be more appropriate to use DefinedTerm in that case? I am wondering about it; any thoughts? If the Samples markup is already in use and tools are working fine with CategoryCode, I guess it would be better to stick to it.
>> Regards,
Received on Thursday, 26 July 2018 15:27:21 UTC
