Re: scientific publishing process (was Re: Cost and access) from Hugh Glaser on 2014-10-05 (public-lod@w3.org from October 2014)

From: Hugh Glaser <hugh@glasers.org>
Date: Sun, 5 Oct 2014 18:17:35 +0100
To: Ivan Herman <ivan@w3.org>
Cc: Laura Dawson <Laura.Dawson@bowker.com>, Daniel Schwabe <dschwabe@inf.puc-rio.br>, W3C Semantic Web IG <semantic-web@w3.org>, W3C LOD Mailing List <public-lod@w3.org>, Phillip Lord <phillip.lord@newcastle.ac.uk>, Eric Prud'hommeaux <eric@w3.org>, "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, Bernadette Hyland <bhyland@3roundstones.com>
Message-Id: <D3642989-0D9F-4AB0-A374-E0543723E83D@glasers.org>
Hi Ivan,
> On 5 Oct 2014, at 16:42, Ivan Herman <ivan@w3.org> wrote:
> 
> 
> On 05 Oct 2014, at 16:47 , Laura Dawson <Laura.Dawson@bowker.com> wrote:
> 
>> I think I mentioned this previously, Ivan, but perhaps not on this thread:
>> Hugh McGuire has developed a WordPress tool called PressBooks which allows
>> you to write a book in HTML and export it as an EPUB file. He even
>> supports schema.org markup in a separate plugin.
>> (http://www.pressbooks.com)
> 
> Indeed, I forgot!
> 
> The problem with this service (and, I guess, with the others too) is that, at least through the standard offers on their sites, it may not be appropriate for a workshop, which would require giving access to a large(r) number of submitters in the submission phase, followed by a selection process that ends with only a small number of the submissions in the final book. This does not really fit their business models. It should be up to the scholarly publishers to pick this up…
Yes, we must keep remembering that the documents are simply one bit of a social machine, long before they get anywhere near (the unlikely event of them) being published.
> 
> (But I guess we digress greatly from the main topic of this mailing list, ie, semantic web…)
We did that quite a while ago, I think :-)
But in the end you just gotta go with the flow, man.

Best
Hugh
> 
> Ivan
> 
>> 
>> On 10/5/14, 10:34 AM, "Ivan Herman" <ivan@w3.org> wrote:
>> 
>>> This is not a direct answer to Daniel, but rather expanding on what he
>>> said. Actually, he and I were (and still are) in the same IW3C2
>>> committee, i.e., we share the experience; and I was one of those (although
>>> the credit really goes to Bob Hopgood, who was pushing for that the
>>> most) who tried to come up with a proper XHTML template.
>>> 
>>> The real problem is still the missing tooling. Authors, even if
>>> technically savvy like this community, want to do what they set out to do:
>>> write their papers as quickly as possible. They do not want to spend
>>> their time going through some esoteric CSS massaging, for example. Let us
>>> face it: we are not yet there. The tools for authoring are still very
>>> poor. This in spite of the fact that many realize that PDF is really not
>>> the format for our age; we need much more than a reproduction of a
>>> printed page digitally (as someone mentioned in the thread; I really
>>> suffer when I have to read, let alone review, an article in PDF on my
>>> iPad...).
>>> 
>>> But I do see an evolution that might change this in the coming years. Laura
>>> dropped the magic word in the early phases of this thread: ePub. ePub is
>>> a packaged (zip archived) HTML site, with some additional information. It
>>> is the format that most of the ebook readers understand (hey, it can even
>>> be converted into a Kindle format:-). Both Firefox and Chrome have ePub
>>> reader extensions available and Mac OS comes with a free ebook reader
>>> (iBooks) that is based on it. I expect (hope) that the convergence between
>>> ePub and browsers will bring these even closer in the coming years.
>>> Because ePub is a packaged web site, with the core content in HTML5 (or
>>> SVG), metadata can be added to the content in RDFa, microdata, embedded
>>> JSON-LD; in fact, metadata can also be added to the archive as a separate
>>> file so if you are crazy enough you can even add RDF data in RDF/XML (no,
>>> please, don't do it:-). And, of course, it can be as much of a hypertext
>>> as you can muster:-)
>>> 
>>> Tooling? No, not yet:-( Well, not yet for average users. But there, too,
>>> there is an evolution. The fact is that publishers are working on "XML
>>> first" (or "HTML first") workflows. O'Reilly's Atlas tool[1] means that
>>> authors prepare their documents in, essentially, HTML (well, a restricted
>>> profile thereof), and the output is then produced in EPUB, PDF, or pure
>>> HTML at the end. Companies are being created that do similar things, where
>>> small(er) publishers can develop full projects (Metrodigi[2], Inkling[3],
>>> Hachette, ...; but I do not think it is possible to use these for a big
>>> conference, although, who knows?). Importantly for this community, these
>>> tools also include annotation facilities, akin to MS Word's commenting
>>> tools.
>>> 
>>> Where does it take us _now_? Much against my instinct and with a bleeding
>>> heart I have to accept that conferences of the size of WWW, but even ISWC
>>> or ESWC, cannot reasonably ask their submitters to submit in ePub (or
>>> HTML). Yet. Not today. It is a chicken and egg problem, and change may
>>> come only with events, as well as more progressive scholarly publishers,
>>> experimenting with this. Just like Daniel (and Bernadette) I would love
>>> to see that happening for smaller workshops (if budget allows, I could
>>> imagine a workshop teaming up with, say, Metrodigi to produce the
>>> workshop's proceedings). But I am optimistic that the change will happen
>>> in the foreseeable future, and our community (as any scholarly community,
>>> I believe) will have to prepare itself for a change in this area.
>>> 
>>> Adding my 2¢ to Daniel's:-)
>>> 
>>> Ivan
>>> 
>>> P.S. For LaTeX users: I guess the main advantage of LaTeX is the math
>>> part. And this is the saddest story of all: MathML has been around for a
>>> long time, and it is, actually, part of ePUB as well, but authoring
>>> proper mathematics is the toughest with the tools out there. Sigh...
>>> 
>>> P.S.2 B.t.w., W3C has just started work on Web Annotations. Watch that
>>> space...
>>> 
>>> 
>>> [1] https://atlas.oreilly.com
>>> [2] http://metrodigi.com
>>> [3] https://www.inkling.com
>>> 
>>> 
>>> 
>>> On 04 Oct 2014, at 04:14 , Daniel Schwabe <dschwabe@inf.puc-rio.br> wrote:
>>> 
>>>> As is often the case on the Internet, this discussion gives me a
>>>> terrible sense of déjà vu. We've had this discussion many times before.
>>>> Some years back the IW3C2 (the steering committee for the WWW
>>>> conference series, of which I am part) first tried to require HTML for
>>>> the WWW conference paper submissions, then was forced to make it
>>>> optional because authors simply refused to write in HTML, and eventually
>>>> dropped it because NO ONE (ok, very very few hardy souls) actually sent
>>>> in HTML submissions.
>>>> Our conclusion at the time was that the tools simply were not there,
>>>> and it was too much of a PITA for people to produce HTML instead of
>>>> using the text editors they are used to. Things don't seem to have
>>>> changed much since.
>>>> And this is simply looking at formatting the pages, never mind the
>>>> whole issue of actually producing hypertext (i.e., turning the article's
>>>> text into linked hypertext), beyond the easily automated ones (e.g.,
>>>> links to authors, references to papers, etc.). Producing good
>>>> hypertext, and consuming it, is much harder than writing plain text. And
>>>> most authors are not trained in producing this kind of content. Making
>>>> this actually "semantic" in some sense is still, in my view, a research
>>>> topic, not a routine reality.
>>>> Until we have robust tools that make it easy for authors to write
>>>> papers with the advantages afforded by PDF, without its shortcomings, I
>>>> do not see this changing.
>>>> I would love to see experiments (e.g., certain workshops) to try it out
>>>> before making this a requirement for whole conferences.
>>>> Bernadette's suggestions are a good step in this direction, although I
>>>> suspect it is going to be harder than it looks (again, I'd love to be
>>>> proven wrong ;-)).
>>>> Just my personal 2c
>>>> Daniel
>>>> 
>>>> 
>>>> On Oct 3, 2014, at 12:50  - 03/10/14, Peter F. Patel-Schneider
>>>> <pfpschneider@gmail.com> wrote:
>>>> 
>>>>> In my opinion PDF is currently the clear winner over HTML in both the
>>>>> ability to produce readable documents and the ability to display
>>>>> readable documents in the way that the author wants them to display.
>>>>> In the past I have tried various means to produce good-looking HTML and
>>>>> I've always gone back to a setup that produces PDF.  If a document is
>>>>> available in both HTML and PDF I almost always choose to view it in
>>>>> PDF.  This is the case even though I have particular preferences in how
>>>>> I view documents.
>>>>> 
>>>>> If someone wants to change the format of conference submissions, then
>>>>> they are going to have to cater to the preferences of authors, like me,
>>>>> and reviewers, like me.  If someone wants to change the format of
>>>>> conference papers, then they are going to have to cater to the
>>>>> preferences of authors, like me, attendees, like me, and readers, like
>>>>> me.
>>>>> 
>>>>> I'm all for *better* methods for preparing, submitting, reviewing, and
>>>>> publishing conference (and journal) papers.  So go ahead, create one.
>>>>> But just saying that HTML is better than PDF in some dimension, even if
>>>>> it were true, doesn't mean that HTML is better than PDF for this
>>>>> purpose.
>>>>> 
>>>>> So I would say that the semantic web community is saying that there
>>>>> are better formats and tools for creating, reviewing, and publishing
>>>>> scientific papers than HTML and tools that create and view HTML.  If
>>>>> there weren't these better ways then an HTML-based solution might be
>>>>> tenable, but why use a worse solution when a better one is available?
>>>>> 
>>>>> peter
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On 10/03/2014 08:02 AM, Phillip Lord wrote:
>>>>> [...]
>>>>>> 
>>>>>> As it stands, the only statement that the semantic web community are
>>>>>> making is that web formats are too poor for scientific usage.
>>>>> [...]
>>>>>> 
>>>>>> Phil
>>>>>> 
>>>> 
>>>> Daniel Schwabe                      Dept. de Informatica, PUC-Rio
>>>> Tel:+55-21-3527 1500 r. 4356        R. M. de S. Vicente, 225
>>>> Fax: +55-21-3527 1530               Rio de Janeiro, RJ 22453-900, Brasil
>>>> http://www.inf.puc-rio.br/~dschwabe
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> ----
>>> Ivan Herman, W3C 
>>> Digital Publishing Activity Lead
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +31-641044153
>>> GPG: 0x343F1A3D
>>> WebID: http://www.ivan-herman.net/foaf#me
>>> 
>>> 
>>> 
>>> 
>>> 
>> 
>> 
> 
> 

-- 
Hugh Glaser
   20 Portchester Rise
   Eastleigh
   SO50 4QS
Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Received on Sunday, 5 October 2014 17:18:08 UTC