
Re: Final CFP: In-Use Track ISWC 2013

From: Alexander Garcia Castro <alexgarciac@gmail.com>
Date: Thu, 2 May 2013 19:56:40 +0200
Message-ID: <CALAe=O+6kGHC3C8X87TCHHYKev0tFxmD_MFjB9SEJGLEWi55BQ@mail.gmail.com>
To: Sarven Capadisli <info@csarven.ca>
Cc: Linking Open Data <public-lod@w3.org>
From some emails in this discussion, the one thing that strikes me the
most is that the PDF now seems to have all of a sudden become the
format of choice for realizing linked science. Right now I am
struggling with a task as simple as getting citation data out of
PDFs. I don't want to say that the PDF is all bad but... come on, it
had its place when the desktop was king. Now we need to make effective
use of content, and the reality is simply that content is locked up in
PDFs. Of course, many people will jump in with examples of tools that
let us get usable content out of a PDF. Well... we have tried a lot of
them, with very few usable outcomes; a lot of work is still required.
UTOPIA? Great, I quite like it: nice user experience. Is it useful for
every case? Not sure. In any case, I have to say I quite like it,
although, IMHO, it just extends our dependency on the PDF. I hope the
paradigm behind UTOPIA gains traction, but I definitely hope not to
depend on PDF readers, because I hope that in the future publishers
will make my life easier simply by releasing the XML for scientific
publications. I am NOT asking for PDF to be eliminated; I am just
asking for usable content with NO extra effort.

On Thu, May 2, 2013 at 7:38 PM, Sarven Capadisli <info@csarven.ca> wrote:
> On 05/02/2013 06:55 PM, Norman Gray wrote:
>> I'm now thoroughly confused by this conversation.
> Allow me to summarize: "Linked Science is brought to you by PDF" [1]
>> Talking about LaTeX...
>> On 2013 May 2, at 17:02, phillip.lord@newcastle.ac.uk (Phillip Lord)
>> wrote:
>>> Sebastian Hellmann <hellmann@informatik.uni-leipzig.de> writes:
>>>> Plus it is widely used and quite good for PDF typesetting.
>>> And sucks on the web, which is a shame. If I could get good HTML
>>> out of it, I would be a happy man.
>> _What_ sucks on the web?  Certainly not PDF.
> HTML/Web, PDF/Desktop?
>> There are hassles with PDFs, yes.  In particular, (i) embedding
>> metadata is underdeveloped (XMP is undertooled), and (ii)
>> deep-linking into PDFs could be better, as has been discussed.  HTML
>> is naturally better at both of these, but neither is a real problem.
>> (i) between DOIs and metadata from journal webpages, most of the
>> important stuff is available without major difficulty, and various
>> organisations (eg ORCID) are labouring away at making a very messy
>> problem better.  (ii) would be nice to solve (and perhaps Utopiadocs
>> is the way to do it), but doesn't, as far as I can see, offer major
>> advantages beyond 'See sect. xxx'.  Most text is, after all, consumed
>> by humans, and articles tend not to be tens of pages long.
>> Thus HTML can do some unimportant things better than PDF,
> Web pages. It will never take off.
>> but what it
>> can't do, which _is_ important, is make things readable.  The visual
>> appearance -- that is, the typesetting -- of rendered HTML is almost
>> universally bad, from the point of view of reading extended pieces.
>> I haven't (I admit) yet experimented with reading extended text on a
>> tablet, but I'd be surprised if that made a major difference.
> I think you are conflating the job of HTML with CSS. Also, I think you are
> conflating readability with legibility as far as the typesetting goes.
> Again, that's something CSS handles provided that suitable fonts are in use.
> What you are probably seeing on an average webpage are the common "works on
> most machines" fonts, e.g., Arial. I don't know whether the PDF reader, for
> instance, does magic behind the scenes to smooth things out or crisp things
> up, whatever additional instructions it may have. Needless to say, this is
> the job of the reader AFAICT. If you put the effort into CSS, it might just
> give you something pretty.
> I'll also admit that I have not experimented with the exact differences in
> quality.
>> Also, HTML is not the same as linked data; there's no 'dog food' here
>> for us to eat.
> That's quite a generalization, isn't it? I would argue that "HTML" is more
> about eating our own dogfood on the Linked Data mailing list than parading PDF.
> We are trying to build things one step at a time; HTML today, a URI that it
> can sit on tomorrow. Additional machine-friendly stuff the day after.
> So, if conferences want to promote PDF, perhaps they should jump over to
> public-lod-pdf-print-industry-and-friends mailing list? :)
>> Is it possible that folk here are conflating 'LaTeX' with the quite
>> startlingly ugly ACM style?  That's almost as unreadable as HTML.
> Nothing to do with HTML unless you are thinking of loading the default
> browser styles and using that as the measure for readability.
> [1] http://lists.w3.org/Archives/Public/public-lod/2013Apr/0291.html
> -Sarven

Alexander Garcia
Received on Thursday, 2 May 2013 17:57:28 UTC
