
Re: Cost and access (Was Re: [ESWC 2015] First Call for Paper)

From: Simon Spero <sesuncedu@gmail.com>
Date: Fri, 3 Oct 2014 14:28:11 -0400
Message-ID: <CADE8KM4KiP2XmotP7UA=O21FQby=DzS+P_HO=wKX1S_eR7bHpQ@mail.gmail.com>
To: Phillip Lord <phillip.lord@newcastle.ac.uk>
Cc: "Eric Prud'hommeaux" <eric@w3.org>, Linked Data community <public-lod@w3.org>, semantic-web@w3.org
On Oct 3, 2014 11:07 AM, "Phillip Lord" <phillip.lord@newcastle.ac.uk>
wrote:
>
> Eric Prud'hommeaux <eric@w3.org> writes:
>
> > Let's work through the requirements and a plausible migration plan. We
> > need:
> >
> > 1 persistent storage: it's hard to beat books for a feeling of
> > persistence.
> > Contracts with trusted archival institutions can help but we might also
> > want some assurances that the protocols and formats will persist as well.

1. An item printed on NISO Z39.48 conformant paper, using appropriate ink,
is intended to have a life expectancy of "several hundred" years. Issues of
testing using accelerated aging complicate matters - see e.g.
http://www.loc.gov/preservation/resources/rt/AcceleratedAging.pdf

Lossless compression of paper is difficult, which leads to a much higher
attrition rate as items are "weeded". Retrieval costs become higher as the
number of replicas decreases.

On the other hand, because a copy of the material is owned, a decision not
to continue subscription to a journal does not cause loss of access to
previous issues.

2. Document format obsolescence does not seem to be as big a problem as was
once feared due to pre-emptive awareness of the issue, and the use of
approaches like emulation.  See e.g.
http://www.dpworkshop.org/dpm-eng/oldmedia/index.html

3. Physical format obsolescence is a bigger issue; however, moving forward
it is less of a concern, since storage media need to be periodically
replaced.

4. Archival data can (and should) be replicated, in multiple locations.

Systems like David Rosenthal's LOCKSS (Lots Of Copies Keep Stuff Safe) use
a "k-strategy", using a relatively small number of high reliability and
high cost replicas, at highly trusted institutions.

http://www.lockss.org

I proposed an "r-strategy" approach, using a much larger ad-hoc mesh
containing much less reliable storage services with far more copies
(requiring much more reasoning and automated planning to conform to
preservation and performance policies). The working title was SCHMEER
(Several Copies Help Make Everything Eventually Reachable) - alas my
advisor, a noted expert in Digital Preservation, was not comfortable with
computer thingies...
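
A rough way to see the trade-off between the two strategies is to ask how
many unreliable copies it takes to match a few reliable ones. The sketch
below assumes replicas fail independently, which real archives cannot count
on, and the failure probabilities are made up for illustration. The function
names are hypothetical; this is not how LOCKSS or the SCHMEER proposal
actually computes anything.

```python
def loss_probability(per_replica_failure: float, replicas: int) -> float:
    """Probability that every replica is lost, assuming independent failures."""
    if not 0.0 <= per_replica_failure <= 1.0:
        raise ValueError("per_replica_failure must be in [0, 1]")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return per_replica_failure ** replicas


def replicas_needed(per_replica_failure: float, target_loss: float) -> int:
    """Smallest number of replicas whose joint loss probability is <= target_loss."""
    if not 0.0 <= per_replica_failure < 1.0:
        raise ValueError("per_replica_failure must be in [0, 1)")
    if not 0.0 < target_loss < 1.0:
        raise ValueError("target_loss must be in (0, 1)")
    count = 1
    while loss_probability(per_replica_failure, count) > target_loss:
        count += 1
    return count


if __name__ == "__main__":
    # Illustrative numbers only (assumptions, not measurements).
    # "k-strategy": six replicas, each lost with probability 0.01.
    print("k-strategy loss probability:", loss_probability(0.01, 6))
    # "r-strategy": each replica lost with probability 0.5; how many
    # copies are needed to reach a loss probability of 1e-12?
    print("r-strategy replicas needed:", replicas_needed(0.5, 1e-12))
```

With these invented numbers, coin-flip-reliable storage needs 40 copies to
reach a loss probability of one in a trillion. The count grows only
logarithmically as the target gets stricter, which is what makes a large
ad-hoc mesh plausible on paper; correlated failures are the hard part.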

>  I've thrown away
**Weeded**
> conference proceedings the last decade anyway.

> > 2 impact factor: i have the impression that conventional publishers
> > have a bit of a monopoly and sudden disruption would be hard to
> > engineer. How do we get leading researchers to devote their work in some
> > new crackpot e-journal to the exclusion of other articles which will earn
> > them more points towards tenure and grants? Perhaps the answer is slowly
> > build the impact factor; perhaps it's some sort of revolution in the minds
> > of administrators and funders.

The value of publication in formal journals derives solely from scarcity.
Because there are only a limited number of slots, they allow for simple
metrics.
The same value could be achieved by skipping the whole publication part,
and just issuing digitally signed badges to go in the disciplinary
archives.

Sophisticated scientometrics can provide more useful measures of the value
of scientific research, but any metric that is known ahead of time can be
gamed.
Cassidy Sugimoto and I joked about starting a company called "pimp my h"
that would provide bespoke strategic advice on publishing strategies to get
the most h for a given amount of new work (intentional obliteration, Google
Scholar SEO, etc.). We never thought of making up imaginary people to cite
stuff though.

There is a lot of effort going into making data citable in ways meaningful
to funding agencies.

Simon
Received on Friday, 3 October 2014 18:28:40 UTC
