Re: scientific publishing process (was Re: Cost and access) from Eric Prud'hommeaux on 2014-10-07 (semantic-web@w3.org from October 2014)

From: Eric Prud'hommeaux <eric@w3.org>
Date: Tue, 7 Oct 2014 02:13:54 -0400
To: Luca Matteis <lmatteis@gmail.com>
Cc: Norman Gray <norman@astro.gla.ac.uk>, Alexander Garcia Castro <alexgarciac@gmail.com>, Linking Open Data <public-lod@w3.org>, "semantic-web@w3.org Web" <semantic-web@w3.org>
Message-ID: <20141007061352.GC31137@w3.org>
* Luca Matteis <lmatteis@gmail.com> [2014-10-07 00:41+0200]
> Sorry to jump into this once again, but when it comes to typesetting,
> nothing really comes close to LaTeX/PDF:
> http://tex.stackexchange.com/questions/120271/alternatives-to-latex -
> not even HTML/CSS/JavaScript

Making a floating model look like LaTeX/PDF at all resolutions seems
impossible. Targeting a single fixed page size (A4 or 8½×11 in, at
300 dpi), on the other hand, seems quite doable. Doing so allows one to
use fixed positioning for all CSS directives.
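
To put numbers on that fixed target, here is a minimal sketch of the
arithmetic. It assumes the standard A4 size of 210 mm x 297 mm; the
function and constant names are made up for illustration.

```python
# Sketch: pixel dimensions of a fixed page at a given print resolution.
MM_PER_INCH = 25.4

def page_pixels(width_in, height_in, dpi=300):
    """Return (width, height) in whole pixels for a page size given in inches."""
    return round(width_in * dpi), round(height_in * dpi)

# A4 is 210 mm x 297 mm; US Letter is 8.5 in x 11 in.
A4_INCHES = (210 / MM_PER_INCH, 297 / MM_PER_INCH)
LETTER_INCHES = (8.5, 11.0)

a4_px = page_pixels(*A4_INCHES)
letter_px = page_pixels(*LETTER_INCHES)
print("A4 @300dpi:    ", a4_px)       # (2480, 3508)
print("Letter @300dpi:", letter_px)   # (2550, 3300)
```

Every absolutely positioned element would then take its coordinates from
that one pixel grid.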

But Eric, that sucks!!

Well, sort of, because we can't conveniently read it on a phone and it
doesn't fill large displays, but that may be a small price to pay to
be able to use all of the rich markup that we wax poetic about on this
list. If it does work, then we can figure out ways to script it so it
has a simply-controlled, predictable behavior at a certain resolution
but is reasonable at arbitrary resolutions.
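
One rough sketch of what such a script might compute: a single uniform
scale factor that fits the fixed page to whatever width is available.
The design size (US Letter at 300 dpi) and the viewport widths below are
assumptions chosen for illustration, not measurements.

```python
def fit_to_width(page_w, page_h, viewport_w):
    """Uniform scale that makes a fixed-size page exactly fill the viewport width.

    Returns (scale, scaled_height). The same factor applies to both axes
    (for instance through a CSS scale transform), so the fixed-position
    layout itself is left untouched.
    """
    if page_w <= 0 or viewport_w <= 0:
        raise ValueError("widths must be positive")
    scale = viewport_w / page_w
    return scale, page_h * scale

# Assumed design size: US Letter at 300 dpi.
PAGE_W, PAGE_H = 2550, 3300

for name, vw in [("phone", 510), ("laptop", 1275), ("large display", 2550)]:
    scale, height = fit_to_width(PAGE_W, PAGE_H, vw)
    print(f"{name}: scale={scale:.2f}, scaled height={height:.0f}px")
```

Scaling uniformly keeps the layout predictable, but it does nothing for
legibility on a small screen, which is the price mentioned above.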


> On Tue, Oct 7, 2014 at 12:18 AM, Norman Gray <norman@astro.gla.ac.uk> wrote:
> >
> > Greetings.
> >
> > On 2014 Oct 6, at 19:19, Alexander Garcia Castro <alexgarciac@gmail.com> wrote:
> >
> >> Querying PDFs is NOT simple and requires a lot of work, and usually
> >> produces lots of errors. Just querying metadata is not enough. As I said
> >> before, I understand the PDF as something that gives me a uniform layout.
> >> That is OK and necessary, but not sufficient within the context
> >> of the web of data and scientific publications. I would like to have the
> >> content readily available for mining purposes. If I pay for the publication,
> >> I should get access to it in every format in which it is available. The
> >> content should be presented in a way that makes sense within the web
> >> of data. If that is the full content of the paper represented in RDF or XML,
> >> fine. Also, I would like to have well-annotated content; this is simple and
> >> something that could quite easily be part of existing publication
> >> workflows. It may also be part of the guidelines for authors: for instance,
> >> identify and annotate rhetorical structures.
> >
> >
> > The following might add something to this conversation.
> >
> > It illustrates getting the metadata from a LaTeX file, putting it into an XMP packet in a PDF, and getting it out of the PDF as RDF.  Pace Peter's mention of /Author, /Title, etc, this just focuses on the XMP packet.
> >
> > This has the document metadata, the abstract, and an illustrative bit of argumentation.  Adding details about the document structure, and (RDF) pointers to any figures would be feasible, as would, I suspect, incorporating CSV files directly into the PDF.  Incorporating \begin{tabular} tables would be rather tricky, but not impossible.  I can't help feeling that the XHTML+RDFa equivalent would be longer and need more documentation to instruct the author where to put the RDFa magic.
> >
> > It's not very fancy, and still has rough edges, but it only took me 100 minutes, from a standing start.
> >
> > Generating and querying this PDF seems pretty simple to me.
> >
> > ----
> >
> > $ cat test-xmp.tex
> > \documentclass{article}
> >
> > \usepackage{xmp-management}
> >
> > \title{This is a test file}
> > \author{Norman Gray}
> > \date{2014 October 6}
> >
> > \begin{document}
> >
> > \maketitle
> >
> > \abstract{It's easy to include metadata in \LaTeX\ files.
> >
> > That's because there's plenty of metadata in there already.}
> >
> > There is text and metatext within files.
> >
> > \section{Further details}
> >
> > In this section we could potentially discuss moving information
> > around.  I think we can assert that \claim{it is easy to move
> >   information around}, and, further, that \claim{making metadata
> >   readily available is a Good Thing}.  I hope that clears that up.
> > \end{document}
> > $ cat xmp-management.sty
> > \ProvidesPackage{xmp-management}[2014/10/06]
> >
> > \newwrite\xmp@ttlfile
> > \def\xmp@open{\immediate\openout\xmp@ttlfile \jobname.ttl
> >   \let\xmp@open\relax}
> > \long\def\xmp@stmt#1#2{%
> >   \xmp@open
> >   \write\xmp@ttlfile{<> #1 """#2""".}}
> > \let\xmp@origtitle\title
> > \def\title#1{\xmp@stmt{dc:title}{#1}\xmp@origtitle{#1}}
> > \let\xmp@origauthor\author
> > \def\author#1{\xmp@stmt{dc:creator}{#1}\xmp@origauthor{#1}}
> > \let\xmp@origdate\date
> > \def\date#1{\xmp@stmt{dc:created}{#1}\xmp@origdate{#1}}
> >
> > \long\def\abstract#1{
> >   \xmp@stmt{dc:abstract}{#1}
> >   \begin{quotation}\textbf{Abstract:} #1\end{quotation}}
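> > % The trailing % signs keep the line ends from adding stray spaces
> > % around an inline claim.
> > \def\claim#1{%
> >   \xmp@stmt{xmpinfo:claim}{#1}%
> >   \emph{#1}}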
> >
> > \let\xmp@origsection\section
> > \def\section#1{\xmp@stmt{xmpinfo:has_section}{#1}
> >   \xmp@origsection{#1}}
> >
> > \usepackage{xmpincl}
> > \AtBeginDocument{\includexmp{info}}
> > $ pdflatex test-xmp
> > This is pdfTeX, Version 3.1415926-2.4-1.40.13 (TeX Live 2012)
> >  restricted \write18 enabled.
> > entering extended mode
> > (./test-xmp.tex
> > LaTeX2e <2011/06/27>
> > [...BLAH...]
> > Output written on test-xmp.pdf (1 page, 75667 bytes).
> > Transcript written on test-xmp.log.
> > $ cat test-xmp.ttl
> > <> dc:title """This is a test file""".
> > <> dc:creator """Norman Gray""".
> > <> dc:created """2014 October 6""".
> > <> dc:abstract """It's easy to include metadata in \LaTeX  \ files. \par That's because there's plenty of metadata in there already.""".
> > <> xmpinfo:has_section """Further details""".
> > <> xmpinfo:claim """it is easy to move information around""".
> > <> xmpinfo:claim """making metadata readily available is a Good Thing""".
> > $ make info.xmp
> > sed 's/\\//g' test-xmp.ttl | \
> >           cat prefix.ttl - | \
> >           rapper -iturtle -ordfxml-xmp -q - file:test-xmp.pdf | \
> >           sed '/<\?xpacket/d' >info.xmp.tmp && mv info.xmp.tmp info.xmp
> > $ pdflatex test-xmp
> > This is pdfTeX, Version 3.1415926-2.4-1.40.13 (TeX Live 2012)
> >  restricted \write18 enabled.
> > entering extended mode
> > (./test-xmp.tex
> > LaTeX2e <2011/06/27>
> > [...BLAH...]
> > Output written on test-xmp.pdf (1 page, 77069 bytes).
> > Transcript written on test-xmp.log.
> > $ make extract-xmp
> > cc -Wall -o extract-xmp extract-xmp.c
> > $ ./extract-xmp test-xmp.pdf
> > <rdf:RDF xmlns:cc="http://creativecommons.org/ns#"
> > xmlns:dc="http://purl.org/dc/elements/1.1/"
> > xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> > xmlns:xapRights="http://ns.adobe.com/xap/1.0/rights/"
> > xmlns:xmpinfo="http://example.org/xmpinfo"
> > xml:base="file:test-xmp.pdf">
> > <rdf:Description rdf:about="">
> > <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc-nd/4.0/"/>
> > <xmpinfo:claim>it is easy to move information around</xmpinfo:claim>
> > <xmpinfo:has_section>Further details</xmpinfo:has_section>
> > <xapRights:Marked>True</xapRights:Marked>
> > <xapRights:UsageTerms>
> > <rdf:Alt>
> > <rdf:li xml:lang="x-default">This work is licensed under a &lt;a rel="license" href="http://creativecommons.org/licenses/by-nc-nd/4.0/"&gt;Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License&lt;/a&gt;.</rdf:li>
> > </rdf:Alt>
> > </xapRights:UsageTerms>
> > <dc:abstract>It's easy to include metadata in LaTeX files. par That's because there's plenty of metadata in there already.</dc:abstract>
> > <dc:created>2014 October 6</dc:created>
> > <dc:creator>Norman Gray</dc:creator>
> > <dc:title>This is a test file</dc:title>
> > </rdf:Description>
> > </rdf:RDF>
> > $
> >
> >
> > ----
> >
> > All the best,
> >
> > Norman
> >
> >
> > --
> > Norman Gray  :  http://nxg.me.uk
> > SUPA School of Physics and Astronomy, University of Glasgow, UK
> >
> >
> 
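
For anyone who wants to try the extraction and query steps without
Norman's extract-xmp.c (which is not shown in the message), here is a
minimal Python sketch. It assumes the XMP packet is stored uncompressed
in the PDF, so that the RDF/XML can be found by scanning the raw bytes;
that is an assumption about the file, not a statement about how
extract-xmp works. The function names and the SAMPLE_PDF stand-in are
hypothetical; the namespace URIs and literals are copied from the output
quoted above.

```python
import re
import xml.etree.ElementTree as ET

# Namespace URIs as they appear in the quoted extract-xmp output.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "xmpinfo": "http://example.org/xmpinfo",
}

def extract_rdf(pdf_bytes):
    """Return the first <rdf:RDF>...</rdf:RDF> block in the raw bytes, or None.

    This only finds metadata that is stored uncompressed.
    """
    m = re.search(rb"<rdf:RDF\b.*?</rdf:RDF>", pdf_bytes, re.DOTALL)
    return m.group(0).decode("utf-8") if m else None

def literals(rdf_xml, prefix, local):
    """List the text of every element named prefix:local in the RDF/XML."""
    root = ET.fromstring(rdf_xml)
    return [el.text for el in root.iter("{%s}%s" % (NS[prefix], local))]

# Stand-in for open("test-xmp.pdf", "rb").read(): a fake PDF wrapper around
# a trimmed copy of the RDF shown in the transcript.
SAMPLE_PDF = (
    b"%PDF-1.5\n1 0 obj\n<< /Type /Metadata /Subtype /XML >>\nstream\n"
    b'<rdf:RDF xmlns:dc="http://purl.org/dc/elements/1.1/"\n'
    b' xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n'
    b' xmlns:xmpinfo="http://example.org/xmpinfo">\n'
    b'<rdf:Description rdf:about="">\n'
    b"<xmpinfo:claim>it is easy to move information around</xmpinfo:claim>\n"
    b"<dc:title>This is a test file</dc:title>\n"
    b"</rdf:Description>\n</rdf:RDF>\n"
    b"endstream\nendobj\n"
)

rdf_xml = extract_rdf(SAMPLE_PDF)
print(literals(rdf_xml, "dc", "title"))       # ['This is a test file']
print(literals(rdf_xml, "xmpinfo", "claim"))  # ['it is easy to move information around']
```

A real RDF parser would be the better tool for anything beyond pulling
out literals; the element-tree lookup here just shows that the metadata
is reachable with very little code.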

-- 
-ericP

office: +1.617.599.3509
mobile: +33.6.80.80.35.59

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

There are subtle nuances encoded in font variation and clever layout
which can only be seen by printing this message on high-clay paper.
Received on Tuesday, 7 October 2014 06:13:59 UTC