Re: Final CFP: In-Use Track ISWC 2013

On 05/02/2013 10:30 PM, Sebastian Hellmann wrote:
> Hi Sarven,

Hi Sebastian. So, let me get this right: you are arguing that PDF is
somehow better for accessing and sharing knowledge? Last I checked, the
Web runs on HTML and friends, not PDF. There is a lot of FUD in your
reply; I'll try to respond to your points.

> PDF has several big advantages:
> - easy to produce by latex, because of good editor
> - I can be sure of how it looks like in 99% of the PDF viewers
> - there aren't any incentives for me to switch (personal benefits seem
> marginal)
>
> Let's be honest: HTML is not really perfect and it doesn't have all the
> advantages you would like it to have.  As you might know, HTML 5 now
> tries to fix a lot of practical problems, i.e. browser compatibility, a
> thing PDF does not have.

Which HTML pages are you having difficulty viewing in your browser?

> Also: *both* PDF as well as HTML can not be scraped well and they also
> can not be addressed well.

That's true to some extent. However, the point of getting things rolling
with HTML is that we can add RDFa/Microdata/microformats or your own
home-baked semantic goodness. It is a foundation that we can work with.
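
To illustrate the kind of layering meant here, a minimal sketch follows. It
assumes a well-formed XHTML fragment annotated with RDFa `property`
attributes; the fragment, its title and author, and the helper name
`rdfa_properties` are all made up for illustration. It only collects
attribute/text pairs and is not a conforming RDFa processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML+RDFa fragment; title and author are invented for this sketch.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div prefix="dcterms: http://purl.org/dc/terms/" about="#paper">
      <h1 property="dcterms:title">An Example Paper</h1>
      <p>By <span property="dcterms:creator">A. Author</span></p>
    </div>
  </body>
</html>"""


def rdfa_properties(xhtml):
    """Return (property, text) pairs for each element with a `property` attribute.

    This walks the tree in document order and ignores prefix resolution,
    `about` subjects, and every other RDFa processing rule.
    """
    root = ET.fromstring(xhtml)
    pairs = []
    for element in root.iter():
        prop = element.get("property")
        if prop is not None:
            text = "".join(element.itertext()).strip()
            pairs.append((prop, text))
    return pairs


for prop, text in rdfa_properties(DOC):
    print(prop, "=", text)
```

The same markup still renders as an ordinary page in a browser, which is
the sense in which HTML serves as a foundation to build on.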

> Please look at Sören, Jens and my citation page:
> http://www.informatik.uni-leipzig.de/~auer/index.php?n=Main.Publications
> http://jens-lehmann.org/publications
> http://bis.informatik.uni-leipzig.de/SebastianHellmann#h520-8
>
> Mine is not up to date and I would rather invest more time in updating
> the content, than layout or machine readable information. So they are
> pretty much the same as references in PDF.

It is your decision what you wish to invest time on. For me, I'm super 
content to write an (X)HTML+RDFa document (for blog post/article type of 
things) pretty much on any text-editor on the face of this earth. Can 
you say the same for the LaTeX text/WYSIWYG editor on the platform that 
you use? Either way, whatever works for you. There is not much to debate
about here :)

> Links pointing into HTML are terribly under-developed as well. There are
> only anchors and xpointer/xpath[1]. The second one is not implemented by
> browsers like Firefox.
> Please note that xpointer/xpointer is not a finished standard[2].

If I have to xpath an HTML document, I usually try to make sure it is 
valid (or run it through Tidy if I really have to) and simply do it from 
command-line. Why bother with the browser for that? I'm not sure what 
your goals are.
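
As a rough sketch of that workflow, assuming the document is already
well-formed XHTML (for example after a pass through Tidy), the Python
standard library can evaluate a simple path expression outside the
browser. The sample markup and the helper name `select_links` are
illustrative only, and `xml.etree.ElementTree` supports just a subset of
XPath 1.0, not XPath 2.0 or XPointer.

```python
import xml.etree.ElementTree as ET

# Namespace map so the path expression can address XHTML elements.
NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical, already well-formed XHTML sample used only for this sketch.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<p><a href="http://www.w3.org/TR/xpath20/">XPath 2.0</a></p>
<p><a id="top">no href here</a></p>
<p><a href="http://www.w3.org/TR/xptr-xpointer/">XPointer</a></p>
</body></html>"""


def select_links(xhtml):
    """Return the href of every <a> element that has one, in document order."""
    root = ET.fromstring(xhtml)
    return [a.get("href") for a in root.findall(".//x:a[@href]", NS)]


for href in select_links(SAMPLE):
    print(href)
```

Input that is not well-formed makes `ET.fromstring` raise a parse error,
which is why validating or tidying the document first matters.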

> I think, the advantages of HTML are over-rated at the moment. It is
> getting better, but still a long way to go.
> Actually, I tried using HTML already, when sending out call for papers.
> First as attachment [4], but these were removed at some mailing lists.
> Then I tested to write the call in HTML directly, but the layout was
> terrible. So now, I am back to Markdown [5], because I seem to suck at
> producing well layouted HTML .

I don't think HTML is over-rated. It worked incredibly well on the Web, 
don't you think?

I can only handle Markdown or say Mediawiki markup up to a point. At 
some point, I get frustrated and do it directly from HTML. "Wrappers" 
are inherently limited.

You mean you suck at well-structured HTML, as the layout or presentation 
would be handled by CSS.

> I really would like to focus on content and have the rest handled by
> machines. My job title is "researcher" not "layouter" . Markdown, Latex,
> PDF seem to get the job done.

Understood.

We are also computer scientists working on the Web. You may not be 
comfortable with HTML, but I personally love it in comparison to 
alternatives to represent information for the Web. It really is a 
no-brainer to type <p> as opposed to \paragraph.

> Also being a chair means, that you write several hundred emails,
> micro-manage peer-reviewing, publish call for papers, make a schedule,
> etc....  I am quite happy, when everybody hands in decent latex (an not
> .doc ) + a signed license agreement. There is just no time for more.

Good to know that you are putting the chair's or reviewer's needs ahead 
of the authors or the society in general which might actually benefit 
from all of the funding that gets poured into research.

If there is a consistency shortcoming, we should be open to improving 
that instead of quitting on the problem.

> So the real problem in my opinion is, that we are really not there yet,
> technologically as well as research-wise.
> HTML copy and paste only seems to work 2/3 of times due to boundary
> problems, recently I copied google doc content (also HTML) into
> Wordpress TinyMCE and it looked terrible.

You've copied a "rendered" version of that HTML, not the source HTML, am
I correct? How does copy/pasting from PDFs look? I thought so.

> This discussion is going in circles because HTML fans  are over-eager
> and fail to judge HTML realisticly.  I think, we should try to provide
> content in structured format and then research ways to transform them
> effectively. This seemed to be the idea behind XML + XSLT  as well as
> HTML + CSS, maybe we can take it one step further....

I'm sorry to say, but the failure that you speak of for judging HTML 
realistically is actually on your end.

As I've mentioned earlier, if we build on top of HTML, we can get higher 
levels of structure and semantics easier. Not to mention that it will 
"plug and play" with the whole Web stack.

> @Sarven: If you are so interested in this, why don't you dig down
> systematically and try to find the current problems and barriers. This
> is actually a great research project in my opinion.

I am keen about it. But, as I've said a million times, I don't see 
technical challenges for us. We also have the ability to make things 
happen if we are honest about making it happen wherever the real 
shortcomings are. The core problem is social. For instance, the majority
of your arguments above have to do with your comfort zone.

My proposal is quite simple and I still fail to see what is so
mind-boggling for some about simply "welcoming" HTML as an alternative. Don't
worry, you can still use your favourite LaTeX editor and so on. No one 
is trying to take that away.

Does it hurt the conferences to say "hey, we also welcome
(X)HTML(+RDFa), just make sure to use the provided lncs.css file"?

ISWC COLD 2013 understood this and followed through. And, I know that
there will be at least one other workshop making an announcement along
those lines ;)

> all the best,

:)

> Sebastian
> PS: By the way, content is findable fine in any format with a little
> help from our friend [3]
>
>
> [1] http://www.w3.org/TR/xpath20/
> [2] http://www.w3.org/TR/xptr-xpointer/
> [3] http://lmgtfy.com/?q=Linked-Data+Aware+URI+Schemes+for+Referencing+Text
> [4] http://lists.w3.org/Archives/Public/public-lod/2012Nov/0001.html
> [5] http://lists.w3.org/Archives/Public/public-lod/2013Apr/0456.html


-Sarven

Received on Thursday, 2 May 2013 21:10:18 UTC