Re: Final CFP: In-Use Track ISWC 2013 from Sebastian Hellmann on 2013-05-02 (public-lod@w3.org from May 2013)

From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
Date: Thu, 02 May 2013 22:30:04 +0200
To: Sarven Capadisli <info@csarven.ca>
CC: public-lod@w3.org
Message-ID: <5182CCCC.5000204@informatik.uni-leipzig.de>
Hi Sarven,
PDF has several big advantages:
- easy to produce with LaTeX, thanks to good editors
- I can be sure how it looks in 99% of PDF viewers
- there aren't any incentives for me to switch (personal benefits seem 
marginal)

Let's be honest: HTML is not really perfect and it doesn't have all the
advantages you would like it to have. As you might know, HTML5 is now
trying to fix a lot of practical problems, e.g. browser compatibility,
a problem PDF does not have.

Also: *both* PDF and HTML are hard to scrape, and both are hard to
address.

Please look at Sören's, Jens's and my citation pages:
http://www.informatik.uni-leipzig.de/~auer/index.php?n=Main.Publications
http://jens-lehmann.org/publications
http://bis.informatik.uni-leipzig.de/SebastianHellmann#h520-8

Mine is not up to date, and I would rather invest more time in updating
the content than in layout or machine-readable information. So these
pages are pretty much the same as the reference lists in a PDF.

Links pointing into HTML are terribly under-developed as well. There are
only anchors and XPointer/XPath [1]. The latter is not implemented by
browsers like Firefox.
Please note that the XPointer xpointer() scheme is not a finished
standard [2].
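
To make the anchor limitation concrete, here is a minimal Python sketch
of what an anchor link can and cannot address. The page content, the
example.org URL and the names AnchorFinder and resolve_anchor are
assumptions made up for illustration; only the fragment "h520-8" is
taken from the link above. It is not how any browser actually
implements this.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class AnchorFinder(HTMLParser):
    """Record the first element whose id (or legacy <a name>) equals a fragment."""

    def __init__(self, fragment):
        super().__init__()
        self.fragment = fragment
        self.match = None  # tag name of the first matching element, if any

    def handle_starttag(self, tag, attrs):
        # Stop looking once a match is found, and ignore empty fragments.
        if self.match is not None or not self.fragment:
            return
        attributes = dict(attrs)
        if attributes.get("id") == self.fragment:
            self.match = tag
        elif tag == "a" and attributes.get("name") == self.fragment:
            self.match = tag


def resolve_anchor(url, html):
    """Return (base_url, fragment, matching_tag_or_None) for a link into a page."""
    base, fragment = urldefrag(url)
    finder = AnchorFinder(fragment)
    finder.feed(html)
    return base, fragment, finder.match


# Hypothetical page content, used only for illustration.
SAMPLE_HTML = """
<html><body>
  <h2 id="h520-8">Publications</h2>
  <p>A paragraph without an id cannot be targeted by an anchor link.</p>
</body></html>
"""

print(resolve_anchor("http://example.org/page#h520-8", SAMPLE_HTML))
print(resolve_anchor("http://example.org/page#para-1", SAMPLE_HTML))
```

The first link resolves because the page author put an id on the
heading. The second does not, since anchors only reach what the
publisher chose to mark up in advance.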

I think the advantages of HTML are over-rated at the moment. It is
getting better, but there is still a long way to go.
Actually, I already tried using HTML when sending out calls for papers.
First as an attachment [4], but attachments were removed by some mailing
lists. Then I tried writing the call directly in HTML, but the layout
was terrible. So now I am back to Markdown [5], because I seem to suck
at producing well-laid-out HTML.

I would really like to focus on content and have the rest handled by
machines. My job title is "researcher", not "layouter". Markdown, LaTeX
and PDF seem to get the job done.

Also, being a chair means that you write several hundred emails,
micro-manage peer reviewing, publish calls for papers, make a schedule,
etc. I am quite happy when everybody hands in decent LaTeX (and not
.doc) plus a signed license agreement. There is just no time for more.

So the real problem, in my opinion, is that we are really not there yet,
technologically as well as research-wise.
HTML copy and paste only seems to work 2/3 of the time due to boundary
problems. Recently I copied Google Docs content (also HTML) into
WordPress TinyMCE and it looked terrible.
This discussion is going in circles because HTML fans are over-eager
and fail to judge HTML realistically. I think we should try to provide
content in a structured format and then research ways to transform it
effectively. This seemed to be the idea behind XML + XSLT as well as
HTML + CSS; maybe we can take it one step further...

@Sarven: If you are so interested in this, why don't you dig down
systematically and try to find the current problems and barriers? This
is actually a great research project in my opinion.

all the best,
Sebastian
PS: By the way, content can be found just fine in any format with a
little help from our friend [3]


[1] http://www.w3.org/TR/xpath20/
[2] http://www.w3.org/TR/xptr-xpointer/
[3] http://lmgtfy.com/?q=Linked-Data+Aware+URI+Schemes+for+Referencing+Text
[4] http://lists.w3.org/Archives/Public/public-lod/2012Nov/0001.html
[5] http://lists.w3.org/Archives/Public/public-lod/2013Apr/0456.html

On 02.05.2013 19:38, Sarven Capadisli wrote:
> On 05/02/2013 06:55 PM, Norman Gray wrote:
>> I'm now thoroughly confused by this conversation.
>
> Allow me to summarize: "Linked Science is brought to you by PDF" [1]
>
>> Talking about LaTeX...
>>
>> On 2013 May 2, at 17:02, phillip.lord@newcastle.ac.uk (Phillip Lord)
>> wrote:
>>
>>> Sebastian Hellmann <hellmann@informatik.uni-leipzig.de> writes:
>>>
>>>> Plus it is widely used and quite good for PDF typesetting.
>>>
>>> And sucks on the web, which is a shame. If I could get good HTML
>>> out of it, I would be a happy man.
>>
>> _What_ sucks on the web?  Certainly not PDF.
>
> HTML/Web, PDF/Desktop?
>
>> There are hassles with PDFs, yes.  In particular, (i) embedding
>> metadata is underdeveloped (XMP is undertooled), and (ii)
>> deep-linking into PDFs could be better, as has been discussed. HTML
>> is naturally better at both of these, but neither is a real problem.
>> (i) between DOIs and metadata from journal webpages, most of the
>> important stuff is available without major difficulty, and various
>> organisations (eg ORCID) are labouring away at making a very messy
>> problem better.  (ii) would be nice to solve (and perhaps Utopiadocs
>> is the way to do it), but doesn't, as far as I can see, offer major
>> advantages beyond 'See sect. xxx'.  Most text is, after all, consumed
>> by humans, and articles tend not to be tens of pages long.
>>
>> Thus HTML can do some unimportant things better than PDF,
>
> Web pages. It will never take off.
>
>> but what it
>> can't do, which _is_ important, is make things readable.  The visual
>> appearance -- that is, the typesetting -- of rendered HTML is almost
>> universally bad, from the point of view of reading extended pieces.
>> I haven't (I admit) yet experimented with reading extended text on a
>> tablet, but I'd be surprised if that made a major difference.
>
> I think you are conflating the job of HTML with CSS. Also, I think you 
> are conflating readability with legibility as far as the typesetting 
> goes. Again, that's something CSS handles provided that suitable fonts 
> are in use. What you are probably viewing on an average webpage is the 
> common "works on most machines" fonts e.g., Arial. I don't know 
> whether the PDF reader for instance does magic behind the scenes to 
> smooth things out or crisp things up - whatever additional 
> instructions it may have. Needless to say, this is the job of the 
> reader AFAICT. If you put the effort into CSS, it might just give 
> something pretty.
>
> I'll also admit that I have not experimented with the exact 
> differences in quality.
>
>> Also, HTML is not the same as linked data; there's no 'dog food' here
>> for us to eat.
>
> That's quite a generalization there? So, I would argue that "HTML" is 
> more about eating dogfood in the Linked Data mailing list than 
> parading on PDF. We are trying to build things one step at a time; 
> HTML today, a URI that it can sit on tomorrow. Additional 
> machine-friendly stuff the day after.
>
> So, if conferences want to promote PDF, perhaps they should jump over 
> to public-lod-pdf-print-industry-and-friends mailing list? :)
>
>> Is it possible that folk here are conflating 'LaTeX' with the quite
>> startlingly ugly ACM style?  That's almost as unreadable as HTML.
>
> Nothing to do with HTML unless you are thinking of loading the default 
> browser styles and using that as the measure for readability.
>
> [1] http://lists.w3.org/Archives/Public/public-lod/2013Apr/0291.html
>
> -Sarven
>


-- 
Dipl. Inf. Sebastian Hellmann
Department of Computer Science, University of Leipzig
Events: NLP & DBpedia 2013 (http://nlp-dbpedia2013.blogs.aksw.org, 
Deadline: *July 8th*)
Come to Germany as a PhD student: http://bis.informatik.uni-leipzig.de/csf
Projects: http://nlp2rdf.org , http://linguistics.okfn.org , 
http://dbpedia.org/Wiktionary , http://dbpedia.org
Homepage: http://bis.informatik.uni-leipzig.de/SebastianHellmann
Research Group: http://aksw.org
Received on Thursday, 2 May 2013 20:30:38 UTC