Re: scientific publishing process (was Re: Cost and access) from Luca Matteis on 2014-10-06 (semantic-web@w3.org from October 2014)

From: Luca Matteis <lmatteis@gmail.com>
Date: Tue, 7 Oct 2014 00:41:56 +0200
To: Norman Gray <norman@astro.gla.ac.uk>
Cc: Alexander Garcia Castro <alexgarciac@gmail.com>, Linking Open Data <public-lod@w3.org>, "semantic-web@w3.org Web" <semantic-web@w3.org>
Message-ID: <CALp38EMBUyVZNPzM0_91T3cnx0=tkieHMJFHMTFEjUzoZKYBhw@mail.gmail.com>
Sorry to jump into this once again, but when it comes to typesetting,
nothing really comes close to LaTeX/PDF, not even HTML/CSS/JavaScript:
http://tex.stackexchange.com/questions/120271/alternatives-to-latex

On Tue, Oct 7, 2014 at 12:18 AM, Norman Gray <norman@astro.gla.ac.uk> wrote:
>
> Greetings.
>
> On 2014 Oct 6, at 19:19, Alexander Garcia Castro <alexgarciac@gmail.com> wrote:
>
>> Querying PDFs is NOT simple and requires a lot of work, and usually
>> produces lots of errors. Just querying metadata is not enough. As I said
>> before, I understand the PDF as something that gives me a uniform layout.
>> That is OK and necessary, but not sufficient within the context
>> of the web of data and scientific publications. I would like to have the
>> content readily available for mining purposes. If I pay for the publication,
>> I should get access to the publication in every format it is available in. The
>> content should be presented in a way that makes sense within the web
>> of data. If it is the full content of the paper represented in RDF or XML,
>> fine. Also, I would like to have well-annotated content; this is simple and
>> something that could quite easily be part of existing publication
>> workflows. It may also be part of the guidelines for authors, for instance:
>> identify and annotate rhetorical structures.
>
>
> The following might add something to this conversation.
>
> It illustrates getting the metadata from a LaTeX file, putting it into an XMP packet in a PDF, and getting it out of the PDF as RDF.  Pace Peter's mention of /Author, /Title, etc, this just focuses on the XMP packet.
>
> This has the document metadata, the abstract, and an illustrative bit of argumentation.  Adding details about the document structure, and (RDF) pointers to any figures would be feasible, as would, I suspect, incorporating CSV files directly into the PDF.  Incorporating \begin{tabular} tables would be rather tricky, but not impossible.  I can't help feeling that the XHTML+RDFa equivalent would be longer and need more documentation to instruct the author where to put the RDFa magic.
>
> It's not very fancy, and still has rough edges, but it only took me 100 minutes, from a standing start.
>
> Generating and querying this PDF seems pretty simple to me.
>
> ----
>
> $ cat test-xmp.tex
> \documentclass{article}
>
> \usepackage{xmp-management}
>
> \title{This is a test file}
> \author{Norman Gray}
> \date{2014 October 6}
>
> \begin{document}
>
> \maketitle
>
> \abstract{It's easy to include metadata in \LaTeX\ files.
>
> That's because there's plenty of metadata in there already.}
>
> There is text and metatext within files.
>
> \section{Further details}
>
> In this section we could potentially discuss moving information
> around.  I think we can assert that \claim{it is easy to move
>   information around}, and, further, that \claim{making metadata
>   readily available is a Good Thing}.  I hope that clears that up.
> \end{document}
> $ cat xmp-management.sty
> \ProvidesPackage{xmp-management}[2014/10/06]
>
> \newwrite\xmp@ttlfile
> \def\xmp@open{\immediate\openout\xmp@ttlfile \jobname.ttl
>   \let\xmp@open\relax}
> \long\def\xmp@stmt#1#2{%
>   \xmp@open
>   \write\xmp@ttlfile{<> #1 """#2""".}}
> \let\xmp@origtitle\title
> \def\title#1{\xmp@stmt{dc:title}{#1}\xmp@origtitle{#1}}
> \let\xmp@origauthor\author
> \def\author#1{\xmp@stmt{dc:creator}{#1}\xmp@origauthor{#1}}
> \let\xmp@origdate\date
> \def\date#1{\xmp@stmt{dc:created}{#1}\xmp@origdate{#1}}
>
> % The trailing % signs stop line ends from adding spurious spaces.
> \long\def\abstract#1{%
>   \xmp@stmt{dc:abstract}{#1}%
>   \begin{quotation}\textbf{Abstract:} #1\end{quotation}}
> \def\claim#1{%
>   \xmp@stmt{xmpinfo:claim}{#1}%
>   \emph{#1}}
>
> \let\xmp@origsection\section
> \def\section#1{\xmp@stmt{xmpinfo:has_section}{#1}%
>   \xmp@origsection{#1}}
>
> \usepackage{xmpincl}
> \AtBeginDocument{\includexmp{info}}
> $ pdflatex test-xmp
> This is pdfTeX, Version 3.1415926-2.4-1.40.13 (TeX Live 2012)
>  restricted \write18 enabled.
> entering extended mode
> (./test-xmp.tex
> LaTeX2e <2011/06/27>
> [...BLAH...]
> Output written on test-xmp.pdf (1 page, 75667 bytes).
> Transcript written on test-xmp.log.
> $ cat test-xmp.ttl
> <> dc:title """This is a test file""".
> <> dc:creator """Norman Gray""".
> <> dc:created """2014 October 6""".
> <> dc:abstract """It's easy to include metadata in \LaTeX  \ files. \par That's because there's plenty of metadata in there already.""".
> <> xmpinfo:has_section """Further details""".
> <> xmpinfo:claim """it is easy to move information around""".
> <> xmpinfo:claim """making metadata readily available is a Good Thing""".
> $ make info.xmp
> sed 's/\\//g' test-xmp.ttl | \
>           cat prefix.ttl - | \
>           rapper -iturtle -ordfxml-xmp -q - file:test-xmp.pdf | \
>           sed '/<\?xpacket/d' >info.xmp.tmp && mv info.xmp.tmp info.xmp
> $ pdflatex test-xmp
> This is pdfTeX, Version 3.1415926-2.4-1.40.13 (TeX Live 2012)
>  restricted \write18 enabled.
> entering extended mode
> (./test-xmp.tex
> LaTeX2e <2011/06/27>
> [...BLAH...]
> Output written on test-xmp.pdf (1 page, 77069 bytes).
> Transcript written on test-xmp.log.
> $ make extract-xmp
> cc -Wall -o extract-xmp extract-xmp.c
> $ ./extract-xmp test-xmp.pdf
> <rdf:RDF xmlns:cc="http://creativecommons.org/ns#"
> xmlns:dc="http://purl.org/dc/elements/1.1/"
> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> xmlns:xapRights="http://ns.adobe.com/xap/1.0/rights/"
> xmlns:xmpinfo="http://example.org/xmpinfo"
> xml:base="file:test-xmp.pdf">
> <rdf:Description rdf:about="">
> <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc-nd/4.0/"/>
> <xmpinfo:claim>it is easy to move information around</xmpinfo:claim>
> <xmpinfo:has_section>Further details</xmpinfo:has_section>
> <xapRights:Marked>True</xapRights:Marked>
> <xapRights:UsageTerms>
> <rdf:Alt>
> <rdf:li xml:lang="x-default">This work is licensed under a &lt;a rel="license" href="http://creativecommons.org/licenses/by-nc-nd/4.0/"&gt;Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License&lt;/a&gt;.</rdf:li>
> </rdf:Alt>
> </xapRights:UsageTerms>
> <dc:abstract>It's easy to include metadata in LaTeX files. par That's because there's plenty of metadata in there already.</dc:abstract>
> <dc:created>2014 October 6</dc:created>
> <dc:creator>Norman Gray</dc:creator>
> <dc:title>This is a test file</dc:title>
> </rdf:Description>
> </rdf:RDF>
> $
>
>
> ----
>
> All the best,
>
> Norman
>
>
> --
> Norman Gray  :  http://nxg.me.uk
> SUPA School of Physics and Astronomy, University of Glasgow, UK
>
>
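
The source of extract-xmp.c is not shown in the quoted transcript. As a
rough illustration of the "getting it out of the PDF as RDF" step, here is
a minimal Python sketch. It assumes the XMP packet is stored uncompressed
in the PDF and that the RDF uses the conventional rdf: prefix, as in the
output above. The function names extract_rdf and literal_properties are
made up for this example and are not part of Norman's code.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Matches from the opening <rdf:RDF ...> tag to the first closing </rdf:RDF>.
# Assumption: the metadata stream is not compressed, so the XML appears
# literally in the file bytes.
_RDF_BLOCK = re.compile(rb"<rdf:RDF\b.*?</rdf:RDF>", re.DOTALL)


def extract_rdf(pdf_bytes):
    """Return the first rdf:RDF element found in the raw bytes, or None."""
    match = _RDF_BLOCK.search(pdf_bytes)
    return match.group(0).decode("utf-8") if match else None


def literal_properties(rdf_xml):
    """Yield (namespace, local name, text) for each property of an
    rdf:Description that carries a plain text value."""
    root = ET.fromstring(rdf_xml)
    for desc in root.iter("{%s}Description" % RDF_NS):
        for prop in desc:
            text = (prop.text or "").strip()
            # Skip resource-valued and nested properties (no direct text).
            if text and prop.tag.startswith("{"):
                namespace, local = prop.tag[1:].split("}", 1)
                yield namespace, local, text
```

Called on the bytes of test-xmp.pdf, extract_rdf should return a block like
the one printed by ./extract-xmp, provided the assumptions hold. A PDF with
a compressed metadata stream would need a real PDF parser instead.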
Received on Monday, 6 October 2014 22:42:26 UTC