Re: scientific publishing process (was Re: Cost and access)

"PDFs are surprisingly flexible and open containers for transporting around
Stuff"

hi, i'm feeling tempted to add something provocative ;-)

"PDFs are surprisingly mature at disguising all the 'bla bla' and making it
look nice"...

=> http://tractatus-online.appspot.com/Tractatus/jonathan/index.html

wkr turnguard




| Jürgen Jakobitsch,
| Software Developer
| Semantic Web Company GmbH
| Mariahilfer Straße 70 / Neubaugasse 1, Top 8
| A - 1070 Wien, Austria
| Mob +43 676 62 12 710 | Fax +43.1.402 12 35 - 22

COMPANY INFORMATION
| web       : http://www.semantic-web.at/
| foaf      : http://company.semantic-web.at/person/juergen_jakobitsch
PERSONAL INFORMATION
| web       : http://www.turnguard.com
| foaf      : http://www.turnguard.com/turnguard
| g+        : https://plus.google.com/111233759991616358206/posts
| skype     : jakobitsch-punkt
| xmlns:tg  = "http://www.turnguard.com/turnguard#"

2014-10-04 14:47 GMT+02:00 Norman Gray <norman@astro.gla.ac.uk>:

>
> Bernadette, hello.
>
> On 2014 Oct 4, at 00:36, Bernadette Hyland <bhyland@3roundstones.com>
> wrote:
>
> ... a really useful message which pulls several of these threads
> together.  The following is a rather fragmentary response.
>
> As a reference point, I tend to think "publication" = "LaTeX -> PDF".  To
> pre-dispel a misconception, here, I'm not being a cheerleader for PDF
> below, but a fair fraction of the antagonism directed towards PDF in this
> thread is, I think, misplaced -- PDF is not the problem.
>
> > We'd do ourselves a huge favor if we showed (STM) publishing executives
> > why this Linked Data stuff matters anyway.
>
> They know.  A surprisingly large fraction of the Article Processing Charge
> we pay to them goes on extracting, managing and sharing metadata.  That
> includes DOIs, Crossref feeds, ScienceDirect, and so on and so on, and so
> (it seems) on.  It also includes conversion to XML: if you submit a LaTeX
> file to a big publisher, the first thing they'll do is convert it to
> XML+MathML (using workflows based on, for example, LaTeXML or TeX4ht) and
> preserve that; several of them then re-generate LaTeX for final production.
>
> To a large extent, I suspect publishers now regard metadata management as
> their Job -- in the sense of their contribution to the scholarly endeavour
> -- and they could do without the dead trees.  If you can offer them a way
> of making metadata _insertion_ easier, which is cost effective, can be
> scaled up, and which a _broad_ range of authors will accept (the hard bit),
> they'll rip your arm off.
>
> > 1) PDF works well for (STM) publishers who require fixed page display;
>
> Yes, and for authors.  Given an alternative between an HTML version of a
> paper and a PDF version, I will _always_ choose the PDF, because it's
> zero-hassle, more reliably faithful to the author's original, more
> readable, and I can read it in the bath.
>
> > 2) PDF doesn't take advantage of the advances we've made in machine
> > readability;
>
> If by this you mean RDF, then yes, the naive ways of generating PDFs are
> not RDF-aware.  So we shouldn't be naive...
>
> XMP is an ISO standard (as PDF is, and like it originating from Adobe) and
> is a type of RDF (well, an irritatingly 90% profile of RDF, but let that
> pass).  Though it's not trivial, it's not hard to generate an XMP packet
> and get it into a PDF, and once there, the metadata job is mostly done.
>
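To make the "not trivial, but not hard" point concrete, here is a minimal
sketch of building an XMP packet using only the Python standard library.
The function name build_xmp_packet and the choice of fields (dc:title and
dc:creator only) are assumptions made for illustration and are not taken
from anyone's actual pipeline. Embedding the result in a PDF is not shown:
that needs a PDF library or a LaTeX package, which attaches the packet as
the document's metadata stream.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by XMP and Dublin Core.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_xmp_packet(title, authors):
    """Return an XMP packet (as a str) carrying a title and author list.

    Hypothetical helper: a real pipeline would add more properties.
    """
    for prefix, uri in NS.items():
        ET.register_namespace(prefix, uri)

    xmpmeta = ET.Element(f"{{{NS['x']}}}xmpmeta")
    rdf = ET.SubElement(xmpmeta, f"{{{NS['rdf']}}}RDF")
    desc = ET.SubElement(
        rdf, f"{{{NS['rdf']}}}Description", {f"{{{NS['rdf']}}}about": ""}
    )

    # dc:title is a language alternative: an rdf:Alt of xml:lang-tagged items.
    title_el = ET.SubElement(desc, f"{{{NS['dc']}}}title")
    alt = ET.SubElement(title_el, f"{{{NS['rdf']}}}Alt")
    li = ET.SubElement(alt, f"{{{NS['rdf']}}}li", {XML_LANG: "x-default"})
    li.text = title

    # dc:creator is an ordered list: an rdf:Seq with one rdf:li per author.
    creator_el = ET.SubElement(desc, f"{{{NS['dc']}}}creator")
    seq = ET.SubElement(creator_el, f"{{{NS['rdf']}}}Seq")
    for name in authors:
        ET.SubElement(seq, f"{{{NS['rdf']}}}li").text = name

    body = ET.tostring(xmpmeta, encoding="unicode")
    # The xpacket processing instructions wrap the RDF so that tools can
    # locate the packet by scanning the file.
    header = '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>'
    trailer = '<?xpacket end="w"?>'
    return f"{header}\n{body}\n{trailer}"


if __name__ == "__main__":
    print(build_xmp_packet("Lecture notes", ["A. N. Author"]))
```

The RDF inside the wrapper is ordinary RDF/XML, so the same title and
author data that goes in can be read back out with any XML or RDF parser.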
> > 3) In fact, PDFs suck on eBook readers which are all about flexible page
> > layout; and
>
> Sure, but they're not intended for e-book readers, so of course they're
> poor at that.
>
> > 4) We already have the necessary Web Standards to address the problem,
> > so no need to recreate the wheel.
>
> If, again, you mean RDF, then I agree completely.
>
> > --> Produce a Web-based tool that allows researchers to share their
> > [privately | publicly ] funded knowledge and produces a variety of outputs:
> > LaTeX, PDF and carries with it a machine readable representation.
>
> Well, not web-based: I'd want something I can run on my own machine.
>
> > Do people agree with the following SOLUTION approach?
> >
> > The international standards to solve this exist. Standards from W3C and
> > the International Digital Publishing Forum (IDPF).[2]  Use (X)HTML for
> > generalized document creation/rendering. Use CSS for styling. Use MathML
> > for formulas. Use JS for action. Use RDF to model the metadata within HTML.
>
> PDF and XMP are both ISO standards, too.  LaTeX isn't a Standard standard,
> but it's pretty damn stable.
>
> MathML one would _not_ want to type.  The only ways of generating MathML
> that I'm even slightly familiar with start with TeX syntax.  There are
> presumably GUI-based ones, too *shudder*.
>
> > I propose a 'walk before we run' approach but do better than basic
> > metadata (i.e., title, author name, institution, abstract).  Link to other
> > scholarly communities/projects such as Vivo.[3]
>
> I generate Atom feeds for my PDF lecture notes.  The feed content is
> extracted from the XMP and from the /Author, /Title, etc, metadata within
> the PDF.  That metadata gets there automatically from the \author{...},
> \title{...} metadata which is necessarily within the LaTeX source.  The
> pipeline isn't production quality, but it's done.  That much isn't
> challenging.
>
> > We've got to show the 1,200 lb gorillas (STM publishers) why they want
> > to come over to our part of the forest ... it isn't enough to stay with PDF
> > to facilitate typesetting in 2015!  The Web has moved on & so must the
> > publishers.
>
> While we're up our tree arguing, that din you can hear in the next
> clearing is the publishers spending their APCs on large-scale metadata
> extraction, and tearing out their hair at authors' apparent inability to
> follow simple instructions on how to make that easier.
>
> (And just by the way: yes, publishers are in it for the money, ...
> monopoly rents..., yadda yadda, ... but I've never actually _caught_ one
> eating babies).
>
> > Anything we do must be better than LaTeX in terms of ease-of-use.
>
> Really?  What, exactly?
>
> Word (and analogues)?  Sure, you can get metadata from WP files, but it
> takes a lot of heuristic effort, and requires authors to be pretty
> disciplined about using styles.
>
> GUI XML editors?  I was talking to someone a couple of weeks ago who'd
> just completed a whole PhD detailing exactly how rubbish XML editors are in
> practical usability terms.
>
> nxml-mode in Emacs?  Probably the best option for writing pointy-brackets,
> but still a bit painful for authoring extensive text.  And you can't write
> MathML.
>
> > Publishers will make more money because their customers, which include
> > researchers & universities, will be able to discover, access and re-use
> > data liberated from the 20th Century PDF.
>
> That's why the publishers currently care about metadata.
>
> ----
>
> PDFs are surprisingly flexible and open containers for transporting around
> Stuff (I haven't tried it, but I have little doubt you could bundle HTML,
> CSS and all the RDF you wanted into a PDF, should you somehow manage to
> devise a use-case for that).  The hard-ish bit is using that metadata in a
> visibly useful way -- tools tend not to rely on it, because it tends not to
> be there; and it tends not to be there because users don't demand it; and
> users don't demand it because tools don't display it.  The seriously hard
> bit is getting the metadata from the authors (who, to a first
> approximation, _really_, *really* don't care) into the PDF.
>
> All the best,
>
> Norman
>
>
> --
> Norman Gray  :  http://nxg.me.uk
> SUPA School of Physics and Astronomy, University of Glasgow, UK
>
>
>

Received on Saturday, 4 October 2014 13:46:54 UTC