Re: scientific publishing process (was Re: Cost and access) from Paul Tyson on 2014-10-04 (semantic-web@w3.org from October 2014)

From: Paul Tyson <phtyson@sbcglobal.net>
Date: Sat, 04 Oct 2014 18:47:19 -0500
To: Michael Brunnbauer <brunni@netestate.de>
Cc: "semantic-web@w3.org" <semantic-web@w3.org>, Linking Open Data <public-lod@w3.org>
Message-ID: <1412466439.2343.84.camel@aquinas.attlocal.net>

Hi Michael,

On Sat, 2014-10-04 at 11:19 +0200, Michael Brunnbauer wrote:
> Hello Paul,
> 
> On Fri, Oct 03, 2014 at 04:05:07PM -0500, Paul Tyson wrote:
> > Yes. We are setting the bar too low. The field of knowledge computing
> > will only reach maturity when authors can publish their theses in such a
> > manner that one can programmatically extract the concepts, propositions,
> > and arguments;
> 
> I thought Kingsley is the only one seriously suggesting that we communicate in
> triples. Let's take one step back to the proposal of making research datasets
> machine readable with RDF.

I certainly was not suggesting this. It would indeed be silly to publish
large collections of empirical quantitative propositions in RDF.

Nor do I think Kingsley would endorse such efforts (but he can speak for
himself on that). I mostly admire and agree with Kingsley's
indefatigable efforts to show how easy it is to harvest the low-hanging
fruit of semantic web/linked data technologies. I just don't want that
to be mistaken for the desired end state.

> 
> Please go to http://crcns.org/NWB
> 
> Have a look at an example dataset:
> 
>  http://crcns.org/data-sets/hc/hc-3/about-hc-3
> 
> "The total size of the data is about 433 GB compressed"
> 
> Even if you do not use triples for all of that (which would be insane),
> specifying a "structured data container" is a very difficult task.
> 
> So instead of talking about setting the bar higher, why not just help the 
> people over there with their problem?

Creating, tracking, and publishing empirical quantitative propositions
is not their biggest impediment to contributing to human knowledge.

Connecting those propositions to significant conclusions through sound
arguments is the more important problem. They will attempt to do so,
presumably, by creating monographs in an electronic source format that
has more or less structure to it. The structure will support many useful
operations, including formatting the content for different media,
hyperlinking to other resources, indexing, and metadata gleaning. The
structure will most likely *not* support any programmatic operations to
expose the logical form of the arguments in such a way that another
person could extract them and put them into his own logic machine to
confirm, deny, strengthen, or weaken the arguments.

Take for example a research paper whose argument proceeded along the
lines of "All men are mortal; Socrates is a man; therefore Socrates is
mortal." Along comes a skeptic who purports to have evidence that
Socrates is not a man. He publishes the evidence in such a way that
other users can, if they wish, insert the conclusion from such evidence
in place of the minor premise in the original researcher's argument. Then
the conclusion cannot be affirmed. The original researcher must either
find a different form of argument to prove his conclusion, overturn the
skeptic's evidence (by further argument, also machine-processable), or
withdraw his conclusion.
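
To make the mechanics concrete, here is a minimal sketch in Python. It
assumes a toy representation that I am inventing purely for
illustration: facts are (predicate, subject) pairs, a rule says
"whatever is X is also Y", and the names derive, RULES, and not_man are
hypothetical. It is not a proposal for a publication format, only a
picture of what "substituting the minor premise" could mean
operationally.

```python
from typing import Iterable, List, Set, Tuple

Fact = Tuple[str, str]   # (predicate, subject), e.g. ("man", "Socrates")
Rule = Tuple[str, str]   # (premise predicate, conclusion predicate)


def derive(facts: Iterable[Fact], rules: List[Rule]) -> Set[Fact]:
    """Forward-chain over the rules until no new facts appear."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            # Iterate over a snapshot so we can add to `known` safely.
            for predicate, subject in list(known):
                new_fact = (conclusion, subject)
                if predicate == premise and new_fact not in known:
                    known.add(new_fact)
                    changed = True
    return known


# Major premise: all men are mortal.
RULES: List[Rule] = [("man", "mortal")]

# The original researcher's minor premise: Socrates is a man.
original_premises = {("man", "Socrates")}

# The skeptic's replacement for the minor premise (toy encoding of
# "Socrates is not a man" as a distinct predicate).
skeptic_premises = {("not_man", "Socrates")}

conclusion = ("mortal", "Socrates")
print(conclusion in derive(original_premises, RULES))
print(conclusion in derive(skeptic_premises, RULES))
```

With the original premises the conclusion is derivable; with the
skeptic's premise swapped in, it is no longer derivable. Note that it
is not thereby denied, only left unaffirmed, which is the point of the
example.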

This simple model illustrates how human knowledge has progressed for
millennia, mediated solely by oral, written, visual, and diagrammatic
communication. I am suggesting we enlist computers to do something more
for us in this realm than just speeding up the millennia-old mechanisms.

Of course we don't need a program to help us determine whether or not
Socrates is mortal. But what about the task of affirming or denying the
proposition, "Unchecked anthropogenic climate change will destroy human
civilization"? Gigabytes of data do not constitute logical argument. A
sound chain of reasoning from empirical evidence and agreed universals
is wanted. Yes, this can be done in academic prose supplemented by
charts and diagrams, and backed by digital files containing lots of
numbers. But, as Kingsley would say, that is not the best way ca. 2014.

Regards,
--Paul

Received on Saturday, 4 October 2014 23:50:12 UTC