- From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
- Date: Thu, 02 May 2013 22:30:04 +0200
- To: Sarven Capadisli <info@csarven.ca>
- CC: public-lod@w3.org
Hi Sarven,

PDF has several big advantages:
- easy to produce with LaTeX, because of good editors
- I can be sure of how it looks in 99% of the PDF viewers
- there aren't any incentives for me to switch (personal benefits seem
  marginal)

Let's be honest: HTML is not really perfect and it doesn't have all the
advantages you would like it to have. As you might know, HTML 5 now
tries to fix a lot of practical problems, e.g. browser compatibility, a
problem PDF does not have.

Also: *both* PDF and HTML cannot be scraped well, and neither can be
addressed well. Please look at Sören's, Jens's and my citation pages:
http://www.informatik.uni-leipzig.de/~auer/index.php?n=Main.Publications
http://jens-lehmann.org/publications
http://bis.informatik.uni-leipzig.de/SebastianHellmann#h520-8
Mine is not up to date and I would rather invest more time in updating
the content than in layout or machine-readable information. So they are
pretty much the same as references in PDF.

Links pointing into HTML are terribly under-developed as well. There are
only anchors and xpointer/xpath[1]. The second one is not implemented by
browsers like Firefox. Please note that the XPointer xpointer() scheme
is not a finished standard[2].

I think the advantages of HTML are over-rated at the moment. It is
getting better, but there is still a long way to go.

Actually, I already tried using HTML when sending out calls for papers.
First as an attachment [4], but these were removed by some mailing
lists. Then I tried writing the call in HTML directly, but the layout
was terrible. So now I am back to Markdown [5], because I seem to suck
at producing well laid-out HTML. I really would like to focus on content
and have the rest handled by machines. My job title is "researcher", not
"layouter". Markdown, LaTeX, PDF seem to get the job done.

Also, being a chair means that you write several hundred emails,
micro-manage peer-reviewing, publish calls for papers, make a schedule,
etc.
I am quite happy when everybody hands in decent LaTeX (and not .doc) + a
signed license agreement. There is just no time for more.

So the real problem, in my opinion, is that we are really not there yet,
technologically as well as research-wise. HTML copy and paste only seems
to work two thirds of the time due to boundary problems; recently I
copied Google Docs content (also HTML) into WordPress TinyMCE and it
looked terrible. This discussion is going in circles because HTML fans
are over-eager and fail to judge HTML realistically.

I think we should try to provide content in a structured format and then
research ways to transform it effectively. This seemed to be the idea
behind XML + XSLT as well as HTML + CSS; maybe we can take it one step
further.

@Sarven: If you are so interested in this, why don't you dig down
systematically and try to find the current problems and barriers? This
is actually a great research project, in my opinion.

All the best,
Sebastian

PS: By the way, content is findable just fine in any format with a
little help from our friend [3].

[1] http://www.w3.org/TR/xpath20/
[2] http://www.w3.org/TR/xptr-xpointer/
[3] http://lmgtfy.com/?q=Linked-Data+Aware+URI+Schemes+for+Referencing+Text
[4] http://lists.w3.org/Archives/Public/public-lod/2012Nov/0001.html
[5] http://lists.w3.org/Archives/Public/public-lod/2013Apr/0456.html

On 02.05.2013 19:38, Sarven Capadisli wrote:
> On 05/02/2013 06:55 PM, Norman Gray wrote:
>> I'm now thoroughly confused by this conversation.
>
> Allow me to summarize: "Linked Science is brought to you by PDF" [1]
>
>> Talking about LaTeX...
>>
>> On 2013 May 2, at 17:02, phillip.lord@newcastle.ac.uk (Phillip Lord)
>> wrote:
>>
>>> Sebastian Hellmann <hellmann@informatik.uni-leipzig.de> writes:
>>>
>>>> Plus it is widely used and quite good for PDF typesetting.
>>>
>>> And sucks on the web, which is a shame. If I could get good HTML
>>> out of it, I would be a happy man.
>>
>> _What_ sucks on the web? Certainly not PDF.
>
> HTML/Web, PDF/Desktop?
>
>> There are hassles with PDFs, yes. In particular, (i) embedding
>> metadata is underdeveloped (XMP is undertooled), and (ii)
>> deep-linking into PDFs could be better, as has been discussed. HTML
>> is naturally better at both of these, but neither is a real problem.
>> (i) between DOIs and metadata from journal webpages, most of the
>> important stuff is available without major difficulty, and various
>> organisations (eg ORCID) are labouring away at making a very messy
>> problem better. (ii) would be nice to solve (and perhaps Utopiadocs
>> is the way to do it), but doesn't, as far as I can see, offer major
>> advantages beyond 'See sect. xxx'. Most text is, after all, consumed
>> by humans, and articles tend not to be tens of pages long.
>>
>> Thus HTML can do some unimportant things better than PDF,
>
> Web pages. It will never take off.
>
>> but what it
>> can't do, which _is_ important, is make things readable. The visual
>> appearance -- that is, the typesetting -- of rendered HTML is almost
>> universally bad, from the point of view of reading extended pieces.
>> I haven't (I admit) yet experimented with reading extended text on a
>> tablet, but I'd be surprised if that made a major difference.
>
> I think you are conflating the job of HTML with CSS. Also, I think you
> are conflating readability with legibility as far as the typesetting
> goes. Again, that's something CSS handles provided that suitable fonts
> are in use. What you are probably viewing on an average webpage is the
> common "works on most machines" fonts e.g., Arial. I don't know
> whether the PDF reader for instance does magic behind the scenes to
> smooth things out or crisp things up - whatever additional
> instructions it may have. Needless to say, this is the job of the
> reader AFICT. If you put the effort into CSS, it might just give
> something pretty.
>
> I'll also admit that I have not experimented with the exact
> differences in quality.
>
>> Also, HTML is not the same as linked data; there's no 'dog food' here
>> for us to eat.
>
> That's quite a generalization there? So, I would argue that "HTML" is
> more about eating dogfood in the Linked Data mailing list than
> parading on PDF. We are trying to build things one step at a time;
> HTML today, a URI that it can sit on tomorrow. Additional
> machine-friendly stuff the day after.
>
> So, if conferences want to promote PDF, perhaps they should jump over
> to public-lod-pdf-print-industry-and-friends mailing list? :)
>
>> Is it possible that folk here are conflating 'LaTeX' with the quite
>> startlingly ugly ACM style? That's almost as unreadable as HTML.
>
> Nothing to do with HTML unless you are thinking of loading the default
> browser styles and using that as the measure for readability.
>
> [1] http://lists.w3.org/Archives/Public/public-lod/2013Apr/0291.html
>
> -Sarven
>

--
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events: NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org, Deadline: *July 8th*)
Come to Germany as a PhD student: http://bis.informatik.uni-leipzig.de/csf
Projects: http://nlp2rdf.org , http://linguistics.okfn.org , http://dbpedia.org/Wiktionary , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
Received on Thursday, 2 May 2013 20:30:38 UTC