Re: scientific publishing process (was Re: Cost and access) from Breslin, John on 2014-10-05 (semantic-web@w3.org from October 2014)

From: Breslin, John <john.breslin@nuigalway.ie>
Date: Sun, 5 Oct 2014 15:27:49 +0000
To: Ivan Herman <ivan@w3.org>
CC: Daniel Schwabe <dschwabe@inf.puc-rio.br>, W3C Semantic Web IG <semantic-web@w3.org>, W3C LOD Mailing List <public-lod@w3.org>, Phillip Lord <phillip.lord@newcastle.ac.uk>, Eric Prud'hommeaux <eric@w3.org>, "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, Bernadette Hyland <bhyland@3roundstones.com>
Message-ID: <400523D1-C262-4747-867B-E03C37C638E5@nuigalway.ie>
+1

John
http://Bresl.in

> On 5 Oct 2014, at 15:39, "Ivan Herman" <ivan@w3.org> wrote:
> 
> This is not a direct answer to Daniel, but rather an expansion on what he said. Actually, he and I were (and still are) on the same IW3C2 committee, i.e., we share the experience; and I was one of those who tried to come up with a proper XHTML template (although the credit really goes to Bob Hopgood, who was pushing that the most).
> 
> The real problem is still the missing tooling. Authors, even if technically savvy like this community, want to do what they set out to do: write their papers as quickly as possible. They do not want to spend their time going through some esoteric CSS massaging, for example. Let us face it: we are not there yet. The tools for authoring are still very poor. This is in spite of the fact that many realize that PDF is really not the format for our age; we need much more than a digital reproduction of a printed page (as someone mentioned in the thread, I really suffer when I have to read, let alone review, an article in PDF on my iPad...).
> 
> But I do see an evolution that might change this in the coming years. Laura dropped the magic word in the early phases of this thread: ePub. ePub is a packaged (zip archived) HTML site, with some additional information. It is the format that most ebook readers understand (hey, it can even be converted into a Kindle format:-). Both Firefox and Chrome have ePub reader extensions available, and Mac OS comes with a free ebook reader (iBooks) that is based on it. I expect (hope) that the convergence between ePub and browsers will bring these even closer in the coming years. Because ePub is a packaged web site, with the core content in HTML5 (or SVG), metadata can be added to the content in RDFa, microdata, or embedded JSON-LD; in fact, metadata can also be added to the archive as a separate file, so if you are crazy enough you can even add RDF data in RDF/XML (no, please, don't do it:-). And, of course, it can be as much of a hypertext as you can master:-)
> 
> Tooling? No, not yet:-( Well, not yet for average users. But there, too, there is an evolution. The fact is that publishers are working on "XML first" (or "HTML first") workflows. O'Reilly's Atlas tool[1] means that authors prepare their documents in, essentially, HTML (well, a restricted profile thereof), and the output is then produced in ePub, PDF, or pure HTML at the end. Companies are being created that do similar things and where small(er) publishers can develop full projects (Metrodigi[2], Inkling[3], Hachette, ...; but I do not think it is possible to use these for a big conference, although, who knows?). Importantly for this community, these tools also include annotation facilities, akin to MS Word's commenting tools.
> 
> Where does it take us _now_? Much against my instinct and with a bleeding heart I have to accept that conferences of the size of WWW, or even ISWC or ESWC, cannot reasonably ask their submitters to submit in ePub (or HTML). Yet. Not today. It is a chicken and egg problem, and change may come only when events, as well as more progressive scholarly publishers, experiment with this. Just like Daniel (and Bernadette) I would love to see that happening for smaller workshops (if budget allows, I could imagine a workshop teaming up with, say, Metrodigi to produce the workshop's proceedings). But I am optimistic that the change will happen in the foreseeable future and our community (as any scholarly community, I believe) will have to prepare itself for a change in this area.
> 
> Adding my 2¢ to Daniel's:-)
> 
> Ivan
> 
> P.S. For LaTeX users: I guess the main advantage of LaTeX is the math part. And this is the saddest story of all: MathML has been around for a long time, and it is, actually, part of ePub as well, but authoring proper mathematics is the toughest part with the tools out there. Sigh...
> 
> P.S.2 B.t.w., W3C has just started work on Web Annotations. Watch that space...
> 
> 
> [1] https://atlas.oreilly.com
> [2] http://metrodigi.com
> [3] https://www.inkling.com
> 
> 
> 
>> On 04 Oct 2014, at 04:14 , Daniel Schwabe <dschwabe@inf.puc-rio.br> wrote:
>> 
>> As is often the case on the Internet, this discussion gives me a terrible sense of déjà vu. We've had this discussion many times before.
>> Some years back the IW3C2 (the steering committee for the WWW conference series, of which I am part) first tried to require HTML for the WWW conference paper submissions, then was forced to make it optional because authors simply refused to write in HTML, and eventually dropped it because NO ONE (ok, very very few hardy souls) actually sent in HTML submissions.
>> Our conclusion at the time was that the tools simply were not there, and it was too much of a PITA for people to produce HTML instead of using the text editors they are used to. Things don't seem to have changed much since.
>> And this is simply looking at formatting the pages, never mind the whole issue of actually producing hypertext (i.e., turning the article's text into linked hypertext), beyond the easily automated parts (e.g., links to authors, references to papers, etc.). Producing good hypertext, and consuming it, is much harder than writing plain text. And most authors are not trained in producing this kind of content. Making this actually "semantic" in some sense is still, in my view, a research topic, not a routine reality.
>> Until we have robust tools that make it easy for authors to write papers with the advantages afforded by PDF, without its shortcomings, I do not see this changing.
>> I would love to see experiments (e.g., certain workshops) to try it out before making this a requirement for whole conferences.
>> Bernadette's suggestions are a good step in this direction, although I suspect it is going to be harder than it looks (again, I'd love to be proven wrong ;-)).
>> Just my personal 2c
>> Daniel
>> 
>> 
>>> On Oct 3, 2014, at 12:50, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:
>>> 
>>> In my opinion PDF is currently the clear winner over HTML in both the ability to produce readable documents and the ability to display readable documents in the way that the author wants them to display.  In the past I have tried various means to produce good-looking HTML and I've always gone back to a setup that produces PDF.  If a document is available in both HTML and PDF I almost always choose to view it in PDF.  This is the case even though I have particular preferences in how I view documents.
>>> 
>>> If someone wants to change the format of conference submissions, then they are going to have to cater to the preferences of authors, like me, and reviewers, like me.  If someone wants to change the format of conference papers, then they are going to have to cater to the preferences of authors, like me, attendees, like me, and readers, like me.
>>> 
>>> I'm all for *better* methods for preparing, submitting, reviewing, and publishing conference (and journal) papers.  So go ahead, create one.  But just saying that HTML is better than PDF in some dimension, even if it were true, doesn't mean that HTML is better than PDF for this purpose.
>>> 
>>> So I would say that the semantic web community is saying that there are better formats and tools for creating, reviewing, and publishing scientific papers than HTML and tools that create and view HTML.  If there weren't these better ways then an HTML-based solution might be tenable, but why use a worse solution when a better one is available?
>>> 
>>> peter
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> On 10/03/2014 08:02 AM, Phillip Lord wrote:
>>>> [...]
>>>> 
>>>> As it stands, the only statement that the semantic web community are
>>>> making is that web formats are too poor for scientific usage.
>>> [...]
>>>> 
>>>> Phil
>> 
>> Daniel Schwabe                      Dept. de Informatica, PUC-Rio
>> Tel:+55-21-3527 1500 r. 4356        R. M. de S. Vicente, 225
>> Fax: +55-21-3527 1530               Rio de Janeiro, RJ 22453-900, Brasil
>> http://www.inf.puc-rio.br/~dschwabe
> 
> 
> ----
> Ivan Herman, W3C 
> Digital Publishing Activity Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> GPG: 0x343F1A3D
> WebID: http://www.ivan-herman.net/foaf#me
> 
> 
> 
> 
>
Received on Sunday, 5 October 2014 15:28:21 UTC