- From: Luca Matteis <lmatteis@gmail.com>
- Date: Tue, 7 Oct 2014 00:41:56 +0200
- To: Norman Gray <norman@astro.gla.ac.uk>
- Cc: Alexander Garcia Castro <alexgarciac@gmail.com>, Linking Open Data <public-lod@w3.org>, "semantic-web@w3.org Web" <semantic-web@w3.org>
Sorry to jump into this once again but when it comes to typesetting nothing really comes close to Latex/PDF: http://tex.stackexchange.com/questions/120271/alternatives-to-latex - not even HTML/CSS/JavaScript

On Tue, Oct 7, 2014 at 12:18 AM, Norman Gray <norman@astro.gla.ac.uk> wrote:
>
> Greetings.
>
> On 2014 Oct 6, at 19:19, Alexander Garcia Castro <alexgarciac@gmail.com> wrote:
>
>> querying PDFs is NOT simple and requires a lot of work -and usually
>> produces lots of errors. just querying metadata is not enough. As I said
>> before, I understand the PDF as something that gives me a uniform layout.
>> that is ok and necessary, but not enough or sufficient within the context
>> of the web of data and scientific publications. I would like to have the
>> content readily available for mining purposes. if I pay for the publication
>> I should get access to the publication in every format it is available. the
>> content should be presented in a way so that it makes sense within the web
>> of data. if it is the full content of the paper represented in RDF or XML
>> fine. also, I would like to have well annotated content, this is simple and
>> something that could quite easily be part of existing publication
>> workflows. it may also be part of the guidelines for authors -for instance,
>> identify and annotate rhetorical structures.
>
>
> The following might add something to this conversation.
>
> It illustrates getting the metadata from a LaTeX file, putting it into an XMP packet in a PDF, and getting it out of the PDF as RDF. Pace Peter's mention of /Author, /Title, etc, this just focuses on the XMP packet.
>
> This has the document metadata, the abstract, and an illustrative bit of argumentation. Adding details about the document structure, and (RDF) pointers to any figures would be feasible, as would, I suspect, incorporating CSV files directly into the PDF. Incorporating \begin{tabular} tables would be rather tricky, but not impossible. I can't help feeling that the XHTML+RDFa equivalent would be longer and need more documentation to instruct the author where to put the RDFa magic.
>
> It's not very fancy, and still has rough edges, but it only took me 100 minutes, from a standing start.
>
> Generating and querying this PDF seems pretty simple to me.
>
> ----
>
> $ cat test-xmp.tex
> \documentclass{article}
>
> \usepackage{xmp-management}
>
> \title{This is a test file}
> \author{Norman Gray}
> \date{2014 October 6}
>
> \begin{document}
>
> \maketitle
>
> \abstract{It's easy to include metadata in \LaTeX\ files.
>
> That's because there's plenty of metadata in there already.}
>
> There is text and metatext within files.
>
> \section{Further details}
>
> In this section we could potentially discuss moving information
> around. I think we can assert that \claim{it is easy to move
> information around}, and, further, that \claim{making metadata
> readily available is a Good Thing}. I hope that clears that up.
> \end{document}
> $ cat xmp-management.sty
> \ProvidesPackage{xmp-management}[2014/10/06]
>
> \newwrite\xmp@ttlfile
> \def\xmp@open{\immediate\openout\xmp@ttlfile \jobname.ttl
> \let\xmp@open\relax}
> \long\def\xmp@stmt#1#2{%
> \xmp@open
> \write\xmp@ttlfile{<> #1 """#2""".}}
> \let\xmp@origtitle\title
> \def\title#1{\xmp@stmt{dc:title}{#1}\xmp@origtitle{#1}}
> \let\xmp@origauthor\author
> \def\author#1{\xmp@stmt{dc:creator}{#1}\xmp@origauthor{#1}}
> \let\xmp@origdate\date
> \def\date#1{\xmp@stmt{dc:created}{#1}\xmp@origdate{#1}}
>
> \long\def\abstract#1{
> \xmp@stmt{dc:abstract}{#1}
> \begin{quotation}\textbf{Abstract:} #1\end{quotation}}
> \def\claim#1{
> \xmp@stmt{xmpinfo:claim}{#1}
> \emph{#1}}
>
> \let\xmp@origsection\section
> \def\section#1{\xmp@stmt{xmpinfo:has_section}{#1}
> \xmp@origsection{#1}}
>
> \usepackage{xmpincl}
> \AtBeginDocument{\includexmp{info}}
> $ pdflatex test-xmp
> This is pdfTeX, Version 3.1415926-2.4-1.40.13 (TeX Live 2012)
> restricted \write18 enabled.
> entering extended mode
> (./test-xmp.tex
> LaTeX2e <2011/06/27>
> [...BLAH...]
> Output written on test-xmp.pdf (1 page, 75667 bytes).
> Transcript written on test-xmp.log.
> $ cat test-xmp.ttl
> <> dc:title """This is a test file""".
> <> dc:creator """Norman Gray""".
> <> dc:created """2014 October 6""".
> <> dc:abstract """It's easy to include metadata in \LaTeX \ files. \par That's because there's plenty of metadata in there already.""".
> <> xmpinfo:has_section """Further details""".
> <> xmpinfo:claim """it is easy to move information around""".
> <> xmpinfo:claim """making metadata readily available is a Good Thing""".
> $ make info.xmp
> sed 's/\\//g' test-xmp.ttl | \
> cat prefix.ttl - | \
> rapper -iturtle -ordfxml-xmp -q - file:test-xmp.pdf | \
> sed '/<\?xpacket/d' >info.xmp.tmp && mv info.xmp.tmp info.xmp
> $ pdflatex test-xmp
> This is pdfTeX, Version 3.1415926-2.4-1.40.13 (TeX Live 2012)
> restricted \write18 enabled.
> entering extended mode
> (./test-xmp.tex
> LaTeX2e <2011/06/27>
> [...BLAH...]
> Output written on test-xmp.pdf (1 page, 77069 bytes).
> Transcript written on test-xmp.log.
> $ make extract-xmp
> cc -Wall -o extract-xmp extract-xmp.c
> $ ./extract-xmp test-xmp.pdf
> <rdf:RDF xmlns:cc="http://creativecommons.org/ns#"
> xmlns:dc="http://purl.org/dc/elements/1.1/"
> xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
> xmlns:xapRights="http://ns.adobe.com/xap/1.0/rights/"
> xmlns:xmpinfo="http://example.org/xmpinfo"
> xml:base="file:test-xmp.pdf">
> <rdf:Description rdf:about="">
> <cc:license rdf:resource="http://creativecommons.org/licenses/by-nc-nd/4.0/"/>
> <xmpinfo:claim>it is easy to move information around</xmpinfo:claim>
> <xmpinfo:has_section>Further details</xmpinfo:has_section>
> <xapRights:Marked>True</xapRights:Marked>
> <xapRights:UsageTerms>
> <rdf:Alt>
> <rdf:li xml:lang="x-default">This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-nd/4.0/">Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License</a>.</rdf:li>
> </rdf:Alt>
> </xapRights:UsageTerms>
> <dc:abstract>It's easy to include metadata in LaTeX files. par That's because there's plenty of metadata in there already.</dc:abstract>
> <dc:created>2014 October 6</dc:created>
> <dc:creator>Norman Gray</dc:creator>
> <dc:title>This is a test file</dc:title>
> </rdf:Description>
> </rdf:RDF>
> $
>
>
> ----
>
> All the best,
>
> Norman
>
>
> --
> Norman Gray : http://nxg.me.uk
> SUPA School of Physics and Astronomy, University of Glasgow, UK
>
>
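The source of extract-xmp.c is not included in the quoted message, so the extraction step cannot be reproduced exactly. As a rough illustration only, the Python sketch below assumes the XMP packet is stored uncompressed in the PDF and simply scans the raw bytes for the rdf:RDF element, then lists the literal-valued properties. The function names extract_xmp_rdf and list_statements are hypothetical, and this is not necessarily how Norman's program works.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Non-greedy match of the first rdf:RDF element in the raw file bytes.
# Assumption: the metadata stream is not compressed, so the XML is
# visible as plain text inside the PDF.
RDF_BLOCK = re.compile(rb"<rdf:RDF\b.*?</rdf:RDF>", re.DOTALL)


def extract_xmp_rdf(pdf_bytes: bytes) -> Optional[str]:
    """Return the first rdf:RDF element found in the PDF bytes, or None."""
    match = RDF_BLOCK.search(pdf_bytes)
    if match is None:
        return None
    return match.group(0).decode("utf-8", errors="replace")


def list_statements(rdf_xml: str) -> List[Tuple[str, str]]:
    """Return (property, literal) pairs for plain-literal properties.

    Properties with child elements (such as rdf:Alt containers) or with
    no text (such as rdf:resource links) are skipped in this sketch.
    """
    root = ET.fromstring(rdf_xml)
    pairs = []
    for desc in root.iter("{%s}Description" % RDF_NS):
        for prop in desc:
            text = (prop.text or "").strip()
            if text and len(prop) == 0:
                pairs.append((prop.tag, text))
    return pairs
```

To try it on a real file, read the PDF with open("test-xmp.pdf", "rb").read(), pass the bytes to extract_xmp_rdf, and feed a non-None result to list_statements. Property names come back in ElementTree's {namespace}local form, for example {http://purl.org/dc/elements/1.1/}title.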
Received on Monday, 6 October 2014 22:42:24 UTC