Re: Google Summer of Code

I think it could be complementary, since Buzzbang (free-text search
oriented) and Federico's proposal (which, from the above, I understand to
be SPARQL oriented) could potentially use the same crawled data.  My
crawler in Buzzbang is hand-built at the moment, but it would probably be
more sensible to use something like Apache Nutch (which I see from your
GitHub profile, Federico, that you may already have worked on).  I also
agree that commoncrawl.org is worth a look (it's on my todo list too).  My
questions there are how often sites are recrawled and whether particular
sites can be recrawled on demand.

Perhaps there could be 3 proposed projects.

1) Depending on how far Federico has got already, a project to flesh out a
crawler using Apache Nutch and/or use commoncrawl data.  The results should
be suitable for both SPARQL and free-text search, and available for
download. I'd be happy to help mentor, potentially. This will require
reasonably capable hardware to run the crawl and store the results.

2) Federico's SPARQL search frontend using a common crawl.

3) Buzzbang search adapted to use a common crawl as it becomes available.
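
To illustrate how one crawl could serve both search styles, here is a
minimal sketch. It assumes pages embed their Bioschemas markup as JSON-LD
in script tags, and it uses only the Python standard library rather than
Apache Nutch or a real SPARQL engine. All function and class names, and the
sample page, are hypothetical and not taken from Buzzbang or from
Federico's code.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                parsed = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return  # skip malformed markup rather than abort the crawl
            candidates = parsed if isinstance(parsed, list) else [parsed]
            # Keep only JSON objects; other top-level values are ignored.
            self.items.extend(c for c in candidates if isinstance(c, dict))


def extract_markup(html):
    """Return the JSON-LD objects found in one crawled page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


def free_text_search(items, term):
    """Free-text style: match the term anywhere in the serialized item."""
    term = term.lower()
    return [i for i in items if term in json.dumps(i, ensure_ascii=False).lower()]


def match_pattern(items, pattern):
    """Structured style: keep items whose top-level fields equal the pattern.

    This is only a stand-in for a SPARQL basic graph pattern, not SPARQL.
    """
    return [i for i in items if all(i.get(k) == v for k, v in pattern.items())]


# Hypothetical page content, standing in for one crawled document.
SAMPLE_PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "DataCatalog",
 "name": "Example Protein Catalog", "url": "http://example.org/catalog"}
</script>
</head><body></body></html>
"""

if __name__ == "__main__":
    crawled = extract_markup(SAMPLE_PAGE)
    print(free_text_search(crawled, "protein"))
    print(match_pattern(crawled, {"@type": "DataCatalog"}))
```

The point of the sketch is that extraction happens once, and both the
free-text and the structured lookups read the same stored items. A real
implementation would presumably load the extracted markup into a triple
store and a text index instead of filtering Python lists.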

-- Justin

On Fri, Jan 19, 2018 at 9:56 AM, Rafael C. Jimenez <
rafael.jimenez@elixir-europe.org> wrote:

> Hi Federico,
>
> I am including the Bioschemas mailing list since I think this is an
> interesting idea worth discussing. Maybe this is something that could
> complement the proposal
> <https://docs.google.com/document/d/1_i2vqUfCy1laVbslVR6kjaRLIUN8Y2nqQIEPT8G68gg/edit>
> made by Justin? @Justin, what do you think? Would you see this as a
> different project?
>
> About this project I was wondering if it would make sense to collaborate
> and build on top of http://commoncrawl.org/
>
> Regards,
> Rafa
>
> On 19 January 2018 at 09:32, Federico López <fico89@gmail.com> wrote:
>
>> Hi,
>>
>> I was thinking of something like Buzzbang but a little less high level.
>> The idea would be to provide an interface for running SPARQL-like queries
>> against one or multiple Bioschemas websites, something like BioschemasQL
>> or BioQL (I suppose that name has been used before). The query client
>> should be able to crawl a website and transform its Bioschemas markup
>> into something we can query. It would be pretty useful to be able to
>> query several websites at a time, much as federated queries work across
>> the multiple SPARQL endpoints of the Linked Open Data Cloud.
>>
>>
>> And yes I would be happy to participate as a mentor.
>>
>>
>> Cheers!
>>
>> On Fri, Jan 19, 2018 at 7:24 AM, Rafael C. Jimenez <
>> rafael.jimenez@elixir-europe.org> wrote:
>>
>>> Hi Niall and Federico, you were engaged in the tools group/discussions
>>> and came up with good ideas. Is there anything you would like to add or
>>> would you like to participate as a mentor?
>>> ---------- Forwarded message ----------
>>> From: "Gray, Alasdair J G" <A.J.G.Gray@hw.ac.uk>
>>> Date: 18 Jan 2018 10:07
>>> Subject: Re: Google Summer of Code
>>> To: "public-bioschemas@w3.org" <public-bioschemas@w3.org>
>>> Cc: "Stian Soiland-Reyes" <soiland-reyes@manchester.ac.uk>, "Alan R
>>> Williams" <alan.r.williams@manchester.ac.uk>
>>>
>>>> Thanks for all the positive responses about this initiative and to Alan
>>>> and Stian for their advice.
>>>>
>>>> I think that we are well on course with the initial project ideas. Rafa
>>>> and I will need to come up with some blurb about the Bioschemas community
>>>> which can hopefully entice students and convince the Googlers.
>>>>
>>>> The deadline is this Sunday, so not much time left.
>>>>
>>>> Alasdair
>>>>
>>>> On 15 Jan 2018, at 10:51, Leyla Garcia <ljgarcia@ebi.ac.uk> wrote:
>>>>
>>>> Hi Alasdair,
>>>>
>>>> I like the Validata project, let me know if I can collaborate somehow
>>>> with it.
>>>> The EBI search could also be done using Bioschemas markup, so that
>>>> would be another possibility. I could help draft the idea behind it, but
>>>> I am afraid not this week. @Rafael, what do you think of this idea? It is
>>>> something you have mentioned in the past.
>>>>
>>>> Cheers,
>>>>
>>>> On 11/01/2018 15:54, Gray, Alasdair J G wrote:
>>>>
>>>> Hi All,
>>>>
>>>> We would like to propose Bioschemas as a Google Summer of Code
>>>> organisation (deadline 23 January 2018).
>>>> https://summerofcode.withgoogle.com/
>>>>
>>>> I have started drafting ideas for projects in a google document, please
>>>> feel free to add more project ideas or details to my initial brain dump.
>>>> https://docs.google.com/document/d/1_i2vqUfCy1laVbslVR6kjaRLIUN8Y2nqQIEPT8G68gg/edit
>>>>
>>>> Best regards
>>>>
>>>> Alasdair, Carole, and Rafa
>>>>
>>>> Alasdair J G Gray
>>>>
>>>> Fellow of the Higher Education Academy
>>>> Assistant Professor in Computer Science,
>>>> School of Mathematical and Computer Sciences
>>>> (Athena SWAN Bronze Award)
>>>> Heriot-Watt University, Edinburgh UK.
>>>>
>>>> Email: A.J.G.Gray@hw.ac.uk
>>>> Web: http://www.macs.hw.ac.uk/~ajg33
>>>> ORCID: http://orcid.org/0000-0002-5711-4872
>>>> Office: Earl Mountbatten Building 1.39
>>>> Twitter: @gray_alasdair
>>>>
>>>>
>>>>
>>
>>
>> --
>>
>> *FEDERICO LÓPEZ GÓMEZ*
>> Ingeniero de Sistemas
>> Universidad del Valle
>>
>
>
>
> --
>
> *Rafael C Jimenez*
> ELIXIR Chief Technical Officer
> www.elixir-europe.org
>
> ELIXIR Hub, South Building
> Wellcome Genome Campus
> Hinxton, Cambridge, CB10 1SD, UK
> Tel: +44 (0) 1223 49 2574
> E-Mail: rafael.jimenez@elixir-europe.org
>
>

Received on Friday, 19 January 2018 12:24:42 UTC