- From: Justin Clark-Casey <justinccdev@gmail.com>
- Date: Wed, 30 Sep 2020 10:26:54 +0100
- To: "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>
- Cc: Franck Michel <franck.michel@cnrs.fr>, Dan Brickley <danbri@danbri.org>, "public-bioschemas@w3.org" <public-bioschemas@w3.org>
- Message-ID: <CAME9NR8-Chd68Q4hO_N1A7Qyf-ErsngBvcdV1o4FREh8p4BOnQ@mail.gmail.com>
"Abandoned" is a bit harsh, Alasdair :). I'm going to say "wound down", as Ankit, Ricardo and I all had to go on to other things. And there was a nasty second-system effect: I was way too ambitious with its second iteration, which unfortunately didn't leave it in an operational state (unless you know differently, Ankit).

But yeah, BMUSE is definitely the thing to look at for checking if the markup could be crawled.

Best,

--
Justin Clark-Casey
EOSC Programme Manager
EMBL-EBI

On Wed, 30 Sep 2020 at 09:38, Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk> wrote:

> Hi
>
> BMUSE is a scraper that we are actively developing to scrape Bioschemas markup from sites. It is capable of crawling both static and single page application sites. As input it takes either a list of URLs or a sitemap (release just being made).
>
> We are about to start a few directed crawls using BMUSE. The first of these will be targeted at gathering data pertinent to COVID-19 and making that available at a single point for further processing.
>
> With regard to buzzbang, that was an earlier effort at crawling which has since been abandoned. BMUSE is taking the ideas of buzzbang and expanding on them. Hopefully we can offer some sort of graphical explorer over the crawled data at some point in the future.
>
> Best regards
>
> Alasdair
>
> --
> Alasdair J G Gray
> Associate Professor in Computer Science,
> School of Mathematical and Computer Sciences
> Heriot-Watt University, Edinburgh, UK.
>
> Email: A.J.G.Gray@hw.ac.uk
> Web: http://www.macs.hw.ac.uk/~ajg33
> ORCID: http://orcid.org/0000-0002-5711-4872
> Office: Earl Mountbatten Building 1.39
> Twitter: @gray_alasdair
>
> Heriot-Watt is a global University, as a result my working hours may not be your working hours. Do not feel pressure to reply to this email outside your working hours.
>
> To arrange a meeting: https://doodle.com/mm/alasdairgray/book-a-time
>
> *From: *"franck.michel@cnrs.fr" <franck.michel@cnrs.fr>
> *Date: *Tuesday, 29 September 2020 at 17:11
> *To: *"danbri@danbri.org" <danbri@danbri.org>
> *Cc: *"public-bioschemas@w3.org" <public-bioschemas@w3.org>
> *Subject: *Re: Abstract for oral presentation accepted at TDWG 2020
> *Resent from: *"public-bioschemas@w3.org" <public-bioschemas@w3.org>
> *Resent date: *Tuesday, 29 September 2020 at 17:10
>
> Hi Dan,
>
> About the XML sitemaps, I don't know; I'll ask the MNHN webmasters.
>
> About a tool to crawl the markup: last summer I tried BMUSE <https://github.com/HW-SWeL/BMUSE>, which scrapes pages given by URL, but it may have an option to cope with sitemaps. To be checked.
> Anyway, for one page of the MNHN it works fine and returns an N-Triples file.
>
> Rgds,
> Franck.
>
> On 29/09/2020 at 16:59, Dan Brickley wrote:
>
> This is great - congratulations! Does anyone from the bioschemas community have a crawler that could be applied to https://inpn.mnhn.fr/accueil/index?lg=en ? Do you publish XML sitemaps that could make it easier for people to find and crawl this data?
>
> cheers,
>
> Dan
>
> On Tue, 29 Sep 2020 at 15:43, Franck Michel <franck.michel@cnrs.fr> wrote:
>
> Dear all,
>
> As you may know, TDWG 2020 <https://www.tdwg.org/conferences/2020/> was rescheduled as an online conference taking place over 2 weeks: working sessions (Sep 21-25) and dissemination and sharing sessions (Oct 19-23).
>
> We have an abstract accepted for oral presentation in the second week. It is called: "Unleash the Potential of your Website! 180,000 webpages from the French Natural History Museum marked up with Bioschemas/Schema.org biodiversity types" => https://doi.org/10.3897/biss.4.59046
>
> Thanks to all for your contributions to this work.
>
> Franck.
>
> -------- Forwarded Message --------
> *Subject: *[Biodiversity Information Science and Standards] Submission #59046: Manuscript Published
> *Resent date: *Tue, 29 Sep 2020 13:33:09 +0200 (CEST)
> *Resent from: *franck.michel@cnrs.fr
> *Date: *Tue, 29 Sep 2020 14:33:04 +0300
> *From: *Biodiversity Information Science and Standards <biss@pensoft.net>
> *To: *franck.michel@cnrs.fr
>
> Dear Franck Michel:
>
> We are pleased to inform you that your paper #59046 "Unleash the Potential of your Website! 180,000 webpages from the French Natural History Museum marked up with Bioschemas/Schema.org biodiversity types" <https://biss.pensoft.net/article/59046/> was published in Biodiversity Information Science and Standards, doi: 10.3897/biss.4.59046. Thank you for choosing Biodiversity Information Science and Standards as a publication venue for your work!
>
> We suggest that you help us increase the visibility of your study and thereby boost its citations and impact by sharing it on social media (e.g. Twitter, Facebook, Mendeley etc.), ideally using both your own and your institution's channels. Information and suggestions on how to promote your work to the international scientific audience and wider public can be found on our website <https://biss.pensoft.net/about#ScienceCommunication>.
>
> You may also order high-quality full-color reprints of your article through our order form <https://goto.arphahub.com/JoZoQ63meQLy>.
Received on Wednesday, 30 September 2020 09:27:45 UTC