Re: Call for Linked Research

From: Paul Houle <ontology2@gmail.com>
Date: Wed, 30 Jul 2014 11:28:41 -0400
Message-ID: <CAE__kdQOxmzU6ZymRRuVJ-bLQcNCs3HvkBeQx7GFpcB4=CoDrA@mail.gmail.com>
To: Gannon Dick <gannon_dick@yahoo.com>
Cc: Sarven Capadisli <info@csarven.ca>, Giovanni Tummarello <g.tummarello@gmail.com>, Linking Open Data <public-lod@w3.org>, SW-forum <semantic-web@w3.org>
I think it's a little more than "tax avoidance".  It's more that it seems much
easier for Elsevier to extort huge amounts of money than to get people to
pay a little bit of money for services that are inexpensive to provision.

If you take the amount that a commercial journal gets in subscription fees
and divide that by the number of papers it publishes you typically get a
number that is upwards of $10,000.

If you look at a well-run non-profit publisher,  such as the American
Physical Society,  it comes closer to $2000.

Neither of these figures counts the unpaid work of reviewers, the editorial
board,  etc.

When I worked at arXiv.org and divided the size of the budget by the number
of papers we handled,  we'd get a number more like $5 a paper.

arXiv could have been quite the sustainable business if it had managed to
get just 1/1000 the value per paper that commercial journal publishers get.
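
As a back-of-the-envelope check on that claim, here is a minimal sketch that uses only the rough per-paper figures quoted above.  The constant and function names are made up for illustration, and the dollar amounts are order-of-magnitude estimates, not audited numbers.

```python
# Rough per-paper figures from the discussion above (USD, order of magnitude only).
COMMERCIAL_REVENUE_PER_PAPER = 10_000  # commercial journal subscription income per paper
NONPROFIT_REVENUE_PER_PAPER = 2_000    # well-run non-profit publisher
ARXIV_COST_PER_PAPER = 5               # arXiv budget divided by papers handled


def arxiv_margin_per_paper(share_divisor: int) -> float:
    """Surplus per paper if arXiv captured 1/share_divisor of commercial revenue."""
    return COMMERCIAL_REVENUE_PER_PAPER / share_divisor - ARXIV_COST_PER_PAPER


if __name__ == "__main__":
    # How many times more a publisher takes in per paper than arXiv spends.
    print(COMMERCIAL_REVENUE_PER_PAPER / ARXIV_COST_PER_PAPER)
    print(NONPROFIT_REVENUE_PER_PAPER / ARXIV_COST_PER_PAPER)
    # Surplus per paper at 1/1000 of the commercial figure.
    print(arxiv_margin_per_paper(1000))
```

On these figures,  1/1000 of the commercial number is $10 a paper,  about twice what it cost to handle one.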

For a long time,  arXiv was able to run at Los Alamos labs but,  with the
Republicans in power (who tend to want to close LANL and move the weapons
work to LLNL) Paul Ginsparg decided it was time to get out and he brought
it to the Cornell Library.

When I was involved in the mid-00's arXiv represented perhaps 4% of the
budget of Cornell Library but probably delivered more value to end users
than the rest of the library put together -- back then,  50,000 scientists
got up every morning and looked at arXiv to see what was new in their
fields and now the numbers are certainly more than that.  The cost of
running arXiv was much smaller than the check that the library cut yearly
to Elsevier.

The short story is that CUL,  like most of the real jewels of Cornell,  was
seen as a cost center and not an opportunity center and faced intense
budget screw tightening and a lot of crazy stuff happened and one side
effect was that I left.  After about a decade of penury and confusion,
 arXiv finally got a "sustainability plan" that ensures it will continue in


This was all the more painful to endure because I saw much larger amounts
of funding going down various black holes.  For instance,  there was the
postdoc in the office next door who was supposed to use a supercomputer to
analyze the usage log of a project that cost $2 M a year to develop,
 except after extracting the robots he could have printed out the logs on a
line printer and done the analysis by hand (I'm not kidding about this!)
 Then there was the foundation that got a $20M endowment to make a handful
of journals available to a handful of journals in a handful of 4th world
countries.  If arXiv had gotten that,  it would be free papers for everyone
everywhere forever.


I've recently developed a system for scalable RDF publishing pretty much at
cost.  The first round of products includes


that offer unlimited access with no throttling since users pay for their
own hardware.  (Practically this means:  try to use the DBpedia SPARQL
endpoint or Freebase MQL = PROJECT FAILURE,  use RDFeasy = IT JUST WORKS)

I'm not going to accept any objections that this product is "too expensive"
because at 45 cents an hour it is 5% of the cost of a minimum wage worker
in the U.S. and if you are using it for R&D you probably only need to run
it when that worker is working.  I also think it would be hard to save
money rolling your own if you think people's labor is worth anything
at all,  because it doesn't take very much screwing around to waste $200 of
labor,  even at grad student rates.

And while I'm ranting,  I'll also call your attention to this


This is a campaign where I collect money to pay my server bills.  If I get
more money,  I can offer more services and spend more time improving things
(HELPING *YOU* SUCCEED AT YOUR PROJECTS)  I'm grateful to the people who
are contributing,  but the way things are now I am spending a lot of time
hustling up work (i.e. helping companies like Elsevier keep Lucene 3
installations running) rather than doing the work I can do best.  Money you
donate here does not go to university administration overhead,  owners of
San Francisco real estate,  or other leeches and rent seekers.

If you have an issue with :BaseKB you can talk with me and I can do
something about it.  If you have an issue with Freebase,  go talk to the
hand at the evil empire.


On Wed, Jul 30, 2014 at 9:50 AM, Gannon Dick <gannon_dick@yahoo.com> wrote:

> --------------------------------------------
> On Wed, 7/30/14, Giovanni Tummarello <g.tummarello@gmail.com> wrote:
>  So Sarvem let us be rational and pick Occam's razor style simplest
> explanation ...
> By lex parsimonae (Occam's Razor) Tax Avoidance is magic.
> By Bell's Theorem, Tax Avoidance is theft (of services).
> Theft of Software As A Service is ... making me dizzy and diz-interested.

Paul Houle
Expert on Freebase, DBpedia, Hadoop and RDF
(607) 539 6254    paul.houle on Skype   ontology2@gmail.com
Received on Wednesday, 30 July 2014 15:29:16 UTC