- From: Jürgen Jakobitsch <j.jakobitsch@semantic-web.at>
- Date: Sat, 4 Oct 2014 15:46:25 +0200
- To: Linked Data community <public-lod@w3.org>
- Message-ID: <CAETaefw0NW75ZsQe+9GbJ7cLQoJ7ZoBjkV9gvkcKQjv8yU69ww@mail.gmail.com>

"PDFs are surprisingly flexible and open containers for transporting
around Stuff"

hi,

i'm feeling tempted to add something provocative ;-)

"PDFs are surprisingly mature in disguising all the 'bla bla' and make it
look nice"...

=> http://tractatus-online.appspot.com/Tractatus/jonathan/index.html

wkr turnguard

| Jürgen Jakobitsch,
| Software Developer
| Semantic Web Company GmbH
| Mariahilfer Straße 70 / Neubaugasse 1, Top 8
| A - 1070 Wien, Austria
| Mob +43 676 62 12 710 | Fax +43.1.402 12 35 - 22

COMPANY INFORMATION
| web : http://www.semantic-web.at/
| foaf : http://company.semantic-web.at/person/juergen_jakobitsch
PERSONAL INFORMATION
| web : http://www.turnguard.com
| foaf : http://www.turnguard.com/turnguard
| g+ : https://plus.google.com/111233759991616358206/posts
| skype : jakobitsch-punkt
| xmlns:tg = "http://www.turnguard.com/turnguard#"

2014-10-04 14:47 GMT+02:00 Norman Gray <norman@astro.gla.ac.uk>:

>
> Bernadette, hello.
>
> On 2014 Oct 4, at 00:36, Bernadette Hyland <bhyland@3roundstones.com>
> wrote:
>
> ... a really useful message which pulls several of these threads
> together. The following is a rather fragmentary response.
>
> As a reference point, I tend to think "publication" = "LaTeX -> PDF".
> To pre-dispel a misconception, here, I'm not being a cheerleader for
> PDF below, but a fair fraction of the antagonism directed towards PDF
> in this thread is, I think, misplaced -- PDF is not the problem.
>
> > We'd do ourselves a huge favor if we showed (STM) publishing
> > executives why this Linked Data stuff matters anyway.
>
> They know. A surprisingly large fraction of the Article Processing
> Charge we pay to them goes on extracting, managing and sharing
> metadata. That includes DOIs, Crossref feeds, science direct, and so
> on and so on, and so (it seems) on.
> It also includes conversion to XML: if you submit a LaTeX file to a
> big publisher, the first thing they'll do is convert it to XML+MathML
> (using workflows based on for example LaTeXML or TeX4ht) and preserve
> that; several of them then re-generate LaTeX for final production.
>
> To a large extent, I suspect publishers now regard metadata management
> as their Job -- in the sense of their contribution to the scholarly
> endeavour -- and they could do without the dead trees. If you can
> offer them a way of making metadata _insertion_ easier, which is cost
> effective, can be scaled up, and which a _broad_ range of authors will
> accept (the hard bit), they'll rip your arm off.
>
> > 1) PDF works well for (STM) publishers who require fixed page
> > display;
>
> Yes, and for authors. Given an alternative between an HTML version of
> a paper and a PDF version, I will _always_ choose the PDF, because
> it's zero-hassle, more reliably faithful to the author's original,
> more readable, and I can read it in the bath.
>
> > 2) PDF doesn't take advantage of the advances we've made in machine
> > readability;
>
> If by this you mean RDF, then yes, the naive ways of generating PDFs
> are not RDF-aware. So we shouldn't be naive...
>
> XMP is an ISO standard (as PDF is, and like it originating from Adobe)
> and is a type of RDF (well, an irritatingly 90% profile of RDF, but
> let that pass). Though it's not trivial, it's not hard to generate an
> XMP packet and get it into a PDF, and once there, the metadata job is
> mostly done.
>
> > 3) In fact, PDFs suck on eBook readers which are all about flexible
> > page layout; and
>
> Sure, but they're not intended for e-book readers, so of course
> they're poor at that.
>
> > 4) We already have the necessary Web Standards to address the
> > problem, so no need to recreate the wheel.
>
> If, again, you mean RDF, then I agree completely.
>
> > --> Produce a Web-based tool that allows researchers to share their
> > [privately | publicly ] funded knowledge and produces a variety of
> > outputs: LaTeX, PDF and carries with it a machine readable
> > representation.
>
> Well, not web-based: I'd want something I can run on my own machine.
>
> > Do people agree with the following SOLUTION approach?
> >
> > The international standards to solve this exist. Standards from W3C
> > and the International Digital Publishing Forum (IDPF).[2] Use
> > (X)HTML for generalized document creation/rendering. Use CSS for
> > styling. Use MathML for formulas. Use JS for action. Use RDF to
> > model the metadata within HTML.
>
> PDF and XMP are both ISO standards, too. LaTeX isn't a Standard
> standard, but it's pretty damn stable.
>
> MathML one would _not_ want to type. The only ways of generating
> MathML, that I'm slightly familiar with, start with TeX syntax. There
> are presumably GUI-based ones, too *shudder*.
>
> > I propose a 'walk before we run' approach but do better than basic
> > metadata (i.e., title, author name, institution, abstract). Link to
> > other scholarly communities/projects such as Vivo.[3]
>
> I generate Atom feeds for my PDF lecture notes. The feed content is
> extracted from the XMP and from the /Author, /Title, etc, metadata
> within the PDF. That metadata gets there automatically from the
> \author{...}, \title{...} metadata which is necessarily within the
> LaTeX source. The pipeline isn't production quality, but it's done.
> That much isn't challenging.
>
> > We've got to show the 1,200 lb gorillas (STM publishers) why they
> > want to come over to our part of the forest ... it isn't enough to
> > stay with PDF to facilitate typesetting in 2015! The Web has moved
> > on & so must the publishers.
>
> While we're up our tree arguing, that din you can hear in the next
> clearing is the publishers spending their APCs on large-scale metadata
> extraction, and tearing out their hair at authors' apparent inability
> to follow simple instructions on how to make that easier.
>
> (And just by the way: yes, publishers are in it for the money, ...
> monopoly rents..., yadda yadda, ... but I've never actually _caught_
> one eating babies).
>
> > Anything we do must be better than LaTeX in terms of ease-of-use.
>
> Really? What, exactly?
>
> Word (and analogues)? Sure, you can get metadata from WP files, but it
> takes a lot of heuristic effort, and requires authors to be pretty
> disciplined about using styles.
>
> GUI XML editors? I was talking to someone a couple of weeks ago who'd
> just completed a whole PhD detailing exactly how rubbish XML editors
> are in practical usability terms.
>
> nxml-mode in Emacs? Probably the best option for writing
> pointy-brackets, but still a bit painful for authoring extensive text.
> And you can't write MathML.
>
> > Publishers will make more money because their customers which
> > include researchers & universities, will be able to discover, access
> > and re-use data liberated from the 20th Century PDF.
>
> That's why the publishers currently care about metadata.
>
> ----
>
> PDFs are surprisingly flexible and open containers for transporting
> around Stuff (I haven't tried it, but I have little doubt you could
> bundle HTML, CSS and all the RDF you wanted into a PDF, should you
> somehow manage to devise a use-case for that). The hard-ish bit is
> using that metadata in a visibly useful way -- tools tend not to rely
> on it, because it tends not to be there; and it tends not to be there
> because users don't demand it; and users don't demand it because tools
> don't display it. The seriously hard bit is getting the metadata from
> the authors (who, to a first approximation, _really_, *really* don't
> care) into the PDF.
>
> All the best,
>
> Norman
>
>
> --
> Norman Gray : http://nxg.me.uk
> SUPA School of Physics and Astronomy, University of Glasgow, UK
Received on Saturday, 4 October 2014 13:46:54 UTC