- From: Paul Houle <ontology2@gmail.com>
- Date: Tue, 29 Jul 2014 13:21:40 -0400
- To: Sarven Capadisli <info@csarven.ca>
- Cc: Linked Data community <public-lod@w3.org>, SW-forum <semantic-web@w3.org>
- Message-ID: <CAE__kdSviPm0i0cSr48QJrn0LLxduzO_QGVVYjM3FJyevwvL-A@mail.gmail.com>
I wouldn't blame journals or conference organizers for the problems; instead I'd blame granting agencies, tenure committees, etc. For the most part in animal behavior you find that reinforcing a behavior causes that behavior to happen more often. (Unlike animals, humans occasionally behave in ways that defy any rational explanation.) If granting agencies said that publishing in *Science* means no more grants for you, people would stop submitting to *Science*. If doing business with Elsevier meant that junior faculty were ineligible for tenure and that senior faculty got their office relocated to the most distant broom closet, you'd see Elsevier pack up the academic publishing division. (That wouldn't be the end for them, since they could still charge people for access to the laws they are supposed to obey... just don't tell the judge you couldn't afford to read the law that you broke.) So any real change here needs some reform in how research is funded.

As other people have said in this thread, these issues are most acute in the life sciences for a number of practical reasons. For one thing, most computing practitioners don't read the CS literature at all, so bad results have a limited effect on practitioners. Certainly many vendors engage in benchmarketing, but the harm to society is nowhere near the harm done by the corruption of the medical literature in connection with pharma, or the silent stupidity that comes from the fact that it is almost never economically possible to do studies of non-pharmaceutical treatments that are valid and have enough statistical power to be worth doing.

But as for reproducibility in CS, I've put in my hours on the depressing task of searching through TREC proceedings to see what I can apply to real-life search relevance, and also hours on the depressing task of finding bugs in search engine and ML code, probably more in the commercial space but sometimes looking at academic code. When people say they tried ten different variations of their search algorithm and they all got almost the same relevance, I can't take it for granted anymore that they implemented the algorithm they thought they did. When people get poor results with a neural network, I think the odds are something like beta(7,3) that they got the gradient function wrong.

Some days I think I'll take my chances with the animals, because at least they show gratitude when you feed them.
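For what it's worth, the cheapest defence against that last failure mode is a finite-difference gradient check. A rough sketch in Python follows; the tiny logistic-regression loss is made up purely for illustration and isn't taken from any particular system:

```python
# Illustrative only: check a hand-written gradient against central
# finite differences. The toy logistic-regression loss is a stand-in
# for whatever model is actually being debugged.
import numpy as np

def loss(w, X, y):
    # Mean logistic loss for weights w on data (X, y), with y in {0, 1}.
    z = X @ w
    return np.mean(np.log1p(np.exp(-z)) + (1 - y) * z)

def analytic_grad(w, X, y):
    # Hand-derived gradient of the loss above -- the thing being checked.
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p - y) / len(y)

def numeric_grad(f, w, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = (rng.random(50) < 0.5).astype(float)
w = rng.normal(size=4)

ga = analytic_grad(w, X, y)
gn = numeric_grad(lambda v: loss(v, X, y), w)
# A relative error much above ~1e-5 means the gradient code is wrong.
print(np.max(np.abs(ga - gn) / (np.abs(gn) + 1e-12)))
```

It costs O(dim) extra loss evaluations, so you only run it on a toy-sized model, but it catches the sign errors and dropped terms that otherwise masquerade as "the method doesn't work."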
On Tue, Jul 29, 2014 at 10:10 AM, Sarven Capadisli <info@csarven.ca> wrote:

> On 2014-07-29 11:56, Hugh Glaser wrote:
>
>> This is of course an excellent initiative.
>> But I worry that it feels like people are talking about building stuff
>> from scratch, or even lashing things together.
>>
>> Is it really the case that a typical research approach to what you are
>> calling Linked Research doesn’t turn up theories and systems that can
>> inform what we do?
>>
>> What I think you are talking about is what I think is commonly called
>> e-Science. And there is a vast body of research on this topic.
>> This initiative also impinges on the Open Archives/Access/Repositories
>> movements, who are deeply concerned about how to capture all research
>> outputs. See for example http://www.openarchives.org/ore/
>>
>> In e-Science I know of http://www.myexperiment.org, for example, which
>> has been doing what I think is very related stuff for 6 or 7 years now,
>> with significant funding, so it is a mature system.
>> And, of course, it is compatible with all our Linked Data goodness (I hope).
>> Eg http://www.myexperiment.org/workflows/59
>> We could do worse than look to see what they can do for us?
>> And it appears that things can be skinned within the system:
>> http://www.myexperiment.org/packs/106
>>
>> You are of course right that it is a social problem rather than a
>> technical problem; this is why others’ experience in solving the social
>> problem is of great interest.
>>
>> Maybe myExperiment or a related system would do what you want pretty much
>> out of the box?
>>
>> Note that it goes even further than you are suggesting, as it has
>> facilities to allow other researchers to actually run the code/workflows.
>>
>> It would take us years to get anywhere close to this sort of thing,
>> unless we (LD people) could find serious resources.
>> And I suspect we would end up with something that looks very similar!
>>
>> Very best
>> Hugh
>
> Thanks Hugh. Those are great examples, and all the power to the people who
> are working hard at them. And you are right about the eScience bit. Just to
> clarify for anyone who is following this thread:
>
> It is not my intention to overlook or devalue existing efforts similar to
> what I'm proposing. Nor is it my intention to "re-brand" anything. This is
> simply a call to "DIY".
>
> If conferences and publishers set the limitations on how we can combine our
> knowledge and efforts, that's a clear sign to take the control back. They
> are not delivering on anything. We can do better.
>
> You publish your work in whatever LD-friendly way you can. However much
> effort goes into it is what you and others get back. If you are content not
> to be able to discover interesting or relevant parts of other people's
> knowledge using the technologies and tools that are in front of you, there
> is nothing to debate here.
>
> Like I said, it is mind-boggling to think that the SW/LD research community
> is stuck on 1-star Linked Data. Is that sinking in yet?
>
> -Sarven
> http://csarven.ca/#i

--
Paul Houle
Expert on Freebase, DBpedia, Hadoop and RDF
(607) 539 6254    paul.houle on Skype    ontology2@gmail.com
Received on Tuesday, 29 July 2014 17:22:07 UTC