- From: Alexander Garcia Castro <alexgarciac@gmail.com>
- Date: Mon, 6 Oct 2014 10:39:50 -0700
- To: Martynas Jusevičius <martynas@graphity.org>
- Cc: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>, Phillip Lord <phillip.lord@newcastle.ac.uk>, Luca Matteis <lmatteis@gmail.com>, Ivan Herman <ivan@w3.org>, Daniel Schwabe <dschwabe@inf.puc-rio.br>, W3C Semantic Web IG <semantic-web@w3.org>, W3C LOD Mailing List <public-lod@w3.org>, "Eric Prud'hommeaux" <eric@w3.org>, Bernadette Hyland <bhyland@3roundstones.com>
- Message-ID: <CALAe=OKg810wSNH9Xwvw_Lh1XF01H-OD=zRmORN79GsEs0bKmA@mail.gmail.com>
I would be much more generic here: show me how to query a bunch of PDFs with anything... Of course, the answer will go like "you can extract the text and do A and then B and then get a relatively decent text depending on A, B and C". Then someone else will chime in and say "and this is just because people don't know how to generate PDFs; if one generates a PDF using Adobe tools like A, B and C then the PDF will be perfect for text mining and bla bla bla...".

PDF is OK for a consistent layout, and HTML is great for what it was created for. But neither of those formats, AFAIK, was conceived or engineered for scientific papers that are executable, self-describing, embedded within the web of data, etc.

On Mon, Oct 6, 2014 at 9:19 AM, Martynas Jusevičius <martynas@graphity.org> wrote:

> Dear Peter,
>
> please show me how to query PDFs with SPARQL. Then I'll believe there
> are no benefits of XHTML+RDFa over PDF.
>
> Addressing the issue from the reviewer perspective only is too narrow,
> don't you think?
>
> Martynas
>
> On Mon, Oct 6, 2014 at 6:08 PM, Peter F. Patel-Schneider
> <pfpschneider@gmail.com> wrote:
> >
> > On 10/06/2014 08:38 AM, Phillip Lord wrote:
> >>
> >> "Peter F. Patel-Schneider" <pfpschneider@gmail.com> writes:
> >>>
> >>> I would be totally astonished if using htlatex as the main way to produce
> >>> conference papers were as simple as this.
> >>>
> >>> I just tried htlatex on my ISWC paper, and the result was, to put it mildly,
> >>> horrible. (One of my AAAI papers was about the same, the other one caused an
> >>> undefined control sequence and only produced one page of output.) Several
> >>> parts of the paper were rendered in fixed-width fonts. There was no attempt
> >>> to limit line length. Footnotes were in separate files.
> >>
> >> The footnote thing is pretty strange, I have to agree. Although
> >> "footnotes" are a fairly alien concept wrt the web. Probably hover
> >> overs would be a reasonable presentation for this.
> >>
> >>> Many non-scalable images were included, even for simple math.
> >>
> >> It does MathML I think, which is then rendered client side. Or you could
> >> drop math-mode straight through and render client side with mathjax.
> >
> > Well, somehow png files are being produced for some math, which is a
> > failure. I don't know what the way to do this right would be, I just know
> > that the version of htlatex for Fedora 20 fails to reasonably handle the
> > math in this paper.
> >
> >>> My carefully designed layout for examples was modified in ways that
> >>> made the examples harder to understand.
> >>
> >> Perhaps this is a key difference between us. I don't care about the
> >> layout, and want someone to do it for me; it's one of the reasons I use
> >> latex as well.
> >
> > There are many cases where line breaks and indentation are important for
> > understanding. Getting this sort of presentation right in latex is a pain
> > for starters, but when it has been done, having the htlatex toolchain mess
> > it up is a failure.
> >
> >>> That said, the result was better than I expected. If someone upgrades htlatex
> >>> to work well I'm quite willing to use it, but I expect that a lot of work is
> >>> going to be needed.
> >>
> >> Which gets us back to the chicken and egg situation. I would probably do
> >> this; but, at the moment, ESWC and ISWC won't let me submit it. So, I'll
> >> end up with the PDF output anyway.
> >
> > Well, I'm with ESWC and ISWC here. The review process should be designed to
> > make reviewing easy for reviewers. Until viewing HTML output is as
> > trouble-free as viewing PDF output, then PDF should be the required format.
> >
> >> This is why it is important that web conferences allow HTML, which is
> >> where the argument started. If you want something that prints just
> >> right, PDF is the thing for you. If you want to read your papers in
> >> the bath, likewise, PDF is the thing for you. And that's fine by me (so
> >> long as you don't mind me reading your papers in the bath!). But it
> >> needs to not be the only option.
> >
> > Why? What are the benefits of HTML reviewing, right now? What are the
> > benefits of HTML publishing, right now? If there were HTML-based tools that
> > worked well for preparing, reviewing, and reading scientific papers, then
> > maybe conferences would use them. However, conference organizers and
> > reviewers have limited time, and are thus going for the simplest solution
> > that works well.
> >
> > If some group thinks that a good HTML-based solution is possible, then let
> > them produce this solution. If the group can get pre-approval of some
> > conference, then more power to them. However, I'm not going to vote for any
> > pre-approval of some future solution when the current situation is
> > satisficing.
> >
> >> Phil
> >
> > peter

-- 
Alexander Garcia
http://www.alexandergarcia.name/
http://www.usefilm.com/photographer/75943.html
http://www.linkedin.com/in/alexgarciac
Received on Monday, 6 October 2014 17:40:38 UTC