Re: scientific publishing process (was Re: Cost and access) from Simon Spero on 2014-10-07 (semantic-web@w3.org from October 2014)

From: Simon Spero <sesuncedu@gmail.com>
Date: Tue, 7 Oct 2014 14:16:53 -0400
To: Phillip Lord <phillip.lord@newcastle.ac.uk>
Cc: semantic-web@w3.org, Linked Data community <public-lod@w3.org>, Luca Matteis <lmatteis@gmail.com>, Alexander Garcia Castro <alexgarciac@gmail.com>, Norman Gray <norman@astro.gla.ac.uk>
Message-ID: <CADE8KM78xzogmdsqX4WZ8AWcjEjhYzBOBnaT-0=FHS=tejAp7Q@mail.gmail.com>
BLUF: This is where information science comes in. Technology must meet the
needs of real users.

It may be better to generate better Tagged PDFs, and to experiment, using
some existing methodology annotation ontologies, with generating auxiliary
files of triples. This might require new/changed latex packages, new
div/span classes, etc.
\huge

But what is really needed is actually working with SMEs to discover the
cultural practices within the field and subfield, and developing systems
that support their work styles. This is why Information Science is
important.

If there are changes in practices that would be beneficial, and these
benefits can be demonstrated to the appropriate audiences, then these can
be suggested.

If existing programs, libraries, and operating systems can be modified to
provide these wins transparently, then it is easier to get the changes
adopted.

If the benefits require additional work, then the additional work must give
proportionate benefits to those doing the work, or be both of great benefit
to funding agencies or other gatekeepers, *and* be easily verifiable.

An example might be a proof (or justified belief) that a paper and its
supplemental materials do, or do not, contain everything required to attempt
to replicate the results.
This might be feasible in many fields through a combination of annotation
with a sufficiently powerful KR language and reasoning system.

Similarly, relatively simple meta-statistical analysis can note common
errors (like multiple comparisons that do not correct for False Discovery
Rate). This can be easy if the analysis code is embedded in the paper (e.g.
Sweave), or if the adjustment method is part of the annotation, and the
decision process need not be total.
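
As a rough illustration of such a check, here is a minimal Python sketch. It
assumes the annotation supplies a list of raw p-values and an optional name
for the adjustment method; the function names and the "declared_method" field
are hypothetical, and Benjamini-Hochberg is just one common FDR procedure,
not one prescribed above. The check is deliberately partial: it returns None
when it cannot decide.

```python
from typing import List, Optional


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def flag_uncorrected(
    p_values: List[float],
    declared_method: Optional[str],
    alpha: float = 0.05,
) -> Optional[List[int]]:
    """Partial check for uncorrected multiple comparisons.

    Returns None (no verdict) when an adjustment method is declared or when
    there are fewer than two comparisons. Otherwise returns the indices of
    results that look significant on the raw p-value but not after
    Benjamini-Hochberg adjustment.
    """
    if declared_method is not None or len(p_values) < 2:
        return None
    adjusted = benjamini_hochberg(p_values)
    return [
        i
        for i, (raw, adj) in enumerate(zip(p_values, adjusted))
        if raw <= alpha < adj
    ]


if __name__ == "__main__":
    # Hypothetical p-values taken from a paper's annotation.
    print(flag_uncorrected([0.04, 0.03, 0.5, 0.9, 0.7], declared_method=None))
```

A tool like this would only flag results for a human to look at; it does not
verify that a declared adjustment was actually applied correctly.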

This kind of validation can be useful to researchers (less embarrassment),
and useful to gatekeepers (less to manually review).

Convincing communities working with large datasets to use RDF as a native
data format is unlikely to work.

The primary problem is that it isn't a very good one. It's great for
combining data from multiple sources, as long as every datum is true.
If you want to be less credulous, KMAC YOYO.

Convincing people to add metadata describing values in structures as
owl/rdfs datatypes or classes is much easier, for example as HDF5
attributes.

If the benefits require major changes to the cultural practices within a
given knowledge community, then they must be extremely important *to that
community*, and will still be resisted, especially by those most
acculturated into that knowledge community.

An example of this kind of change might be inclusion in supplemental
materials of analyses and data that did not give positive results. This
reduces the file drawer effect, and may improve the justified level of
belief in the significance of published results (p < 1.0).

This level of change may require a "blood upgrade"
(<https://www.goodreads.com/quotes/4079-a-new-scientific-truth-does-not-triumph-by-convincing-its>).


It might also be imposable from above by extreme measures (if more than 10%
of your claimed significant results can't be replicated, and you can't
provide a reasonable explanation in a court of law, you may be held liable
for consequential damages incurred by others reasonably relying on your
work, and reasonable costs & possible punitive damages for costs incurred
attempting to replicate).

Repeat offenders will be fed to a ravenous mob of psychology
undergraduates, or forced to teach introductory creative writing.

Simon
P. S.

[dvips was much easier if you had access to Distiller]

It is possible to add mathematical content to html pages, but it is not
easy.

MathML is not something that browser developers want, which means that the
only viable approach is MathJax (<http://mathjax.org>).

MathJax is impressive, and supports a nice subset of LaTeX (including some
AMS).
However, it adds a noticeable delay to page rendering, as it is heavy-duty
ECMAScript, and is computing layout on the fly.

It does not require server-side support, so is usable from static sites
like GitHub Pages (see e.g. the tests at the bottom of
<http://who-wg.github.io>).

However, the common deployment pattern, using their CDN, adds archival
dependencies.

From a processing perspective, this does not make semantic processing of
the text much easier, as it may require ECMAScript code to be executed.
On Oct 7, 2014 8:14 AM, "Phillip Lord" <phillip.lord@newcastle.ac.uk>
wrote:

>
>
> On 10/07/2014 05:20 AM, Phillip Lord wrote:
>
>> "Peter F. Patel-Schneider" <pfpschneider@gmail.com> writes:
>>
>>  tex4ht takes the slight strange approach of having an strange and
>>>> incomprehensible command line, and then lots of scripts which do default
>>>> options, of which xhmlatex is one. In my installation, they've only put
>>>> the basic ones into the path, so I ran this with
>>>> /usr/share/tex4ht/xhmlatex.
>>>>
>>>>
>>>> Phil
>>>>
>>>>
>>> So someone has to package this up so that it can be easily used.  Before
>>> then,
>>> how can it be required for conferences?
>>>
>>
>> http://svn.gnu.org.ua/sources/tex4ht/trunk/bin/ht/unix/xhmlatex
>>
>
> Somehow this is not in my tex4ht package.
>
> In any case, the HTML output it produces is dreadful.   Text characters,
> even outside math, are replaced by numeric XML character entity references.
>
> peter
>
>
Received on Tuesday, 7 October 2014 18:17:23 UTC