Re: Html to pdf conversion -- for beach reading from Pat Hayes on 2006-01-28 (public-semweb-lifesci@w3.org from January 2006)

From: Pat Hayes <phayes@ihmc.us>
Date: Fri, 27 Jan 2006 23:37:58 -0600
To: Adrian Walker <adrianw@snet.net>
Cc: Bob Futrelle <bob.futrelle@gmail.com>, public-semweb-lifesci@w3.org
Message-Id: <p06230902c000a8f8b8ea@[192.168.2.2]>
>Hi Pat and All --
>
>There are actually quite a few html to pdf conversion programs out there.

Thanks, but on a Mac I can do that part natively 
just by printing from my browser. The problem is 
printing out a whole website without going 
through and printing each piece separately, or 
printing each slide of a whole Slidy presentation 
one at a time. For example, what would that 
program make of the OWL semantics spec, 
http://www.w3.org/TR/owl-semantics/, which is an 
HTML 'multidocument'? The editors of that 
helpfully supplied a single-HTML-file version at 
http://www.w3.org/TR/owl-semantics/semantics-all.html, 
but few websites are this accommodating.
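
Without that kind of help, a tool has to guess which 
links lead to parts of the same document. A rough 
sketch of one crude heuristic, assuming (and it is only 
an assumption) that the parts all live under the same 
URL directory as the starting page. The function name 
and the file names in the usage note are made up for 
illustration.

```python
from urllib.parse import urljoin, urldefrag


def guess_parts(base_url, hrefs):
    """Guess which hrefs are 'parts' of the document at base_url.

    Heuristic (an assumption, not a standard): a link is a part if it
    resolves to a different page under the same directory as base_url.
    """
    base_clean = urldefrag(base_url)[0]
    prefix = urljoin(base_clean, ".")  # directory containing the base page
    seen = set()
    parts = []
    for href in hrefs:
        # Resolve relative links and drop any #fragment.
        absolute = urldefrag(urljoin(base_clean, href))[0]
        if not absolute.startswith(prefix):
            continue  # points outside the directory: treat as external
        if absolute == base_clean or absolute in seen:
            continue  # same page, or already collected
        seen.add(absolute)
        parts.append(absolute)
    return parts
```

Calling guess_parts("http://www.w3.org/TR/owl-semantics/", 
["part1.html", "part2.html#sec", "http://example.org/"]) 
would keep the first two links (with the fragment dropped) 
and discard the third. It will guess wrong on any site 
whose parts are spread over several directories.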

In their original life as hyperlinks between (or 
within) documents, href links still allow a 
document to be printed, even though you can't 
click on the links. But when they are used to in 
effect "bind together" several distinct pieces of 
HTML into a single conceptual document, the 
straightforward relationship between an HTML file 
and a "document" in the traditional sense has 
been violated, and browsers don't seem to be able 
to adequately cope with this situation. They no 
longer print what they are designed to display. 
Perhaps this is inevitable, but there is no doubt 
that the result is very inconvenient, and it 
seems odd to me that browsers haven't yet been 
made smart enough to cope with it.

It occurs to me that RDF/A could be of use here, 
by allowing invisible markup which could indicate 
to a savvy browser which links were to 'parts' of 
the conceptual document and which were 
'external'. But first we would need an RDF 
ontology for document structure, which might be a 
challenge.
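
To make the idea concrete, here is a minimal sketch of 
what such a savvy tool might do. It assumes a purely 
hypothetical marker, rel="doc:part", on links that point 
to parts of the conceptual document; no such vocabulary 
exists as far as I know, and the class and function 
names are likewise invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical rel value marking a link to a 'part' of the document.
PART_REL = "doc:part"


class LinkSorter(HTMLParser):
    """Collect <a href> links, split into 'parts' and 'external'."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.external = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return  # a named anchor, not a link
        rels = (attributes.get("rel") or "").split()
        if PART_REL in rels:
            self.parts.append(href)
        else:
            self.external.append(href)


def sort_links(html_text):
    """Return (parts, external) lists of hrefs found in html_text."""
    sorter = LinkSorter()
    sorter.feed(html_text)
    sorter.close()
    return sorter.parts, sorter.external
```

A printing tool could then fetch and concatenate only the 
hrefs in the first list, in document order, and leave the 
second list as ordinary references.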

Thanks again for being helpful :-)

Pat

>
>Attached is a pdf of the RIF charter  ( 
>http://www.w3.org/2005/rules/wg/charter ) made 
>with a program called "HTML2PDF Pilot".
>
>Download sites are easy to find via Google.
>
>The "demo" version of the program is free.  I 
>have only tried it on the one example above, but 
>it seems straightforward.
>
>Of course, the page breaks may not be the best 
>when the pdf is printed, but the hard copy 
>should be readable.
>
>Now we can all take our work to the beach (:-)
>
>                                     Cheers,  -- Adrian
>
>
>
>Internet Business Logic (R)
>Executable open vocabulary English
>Online at www.reengineeringllc.com
>Shared use is free
>
>Adrian Walker
>Reengineering
>PO Box 1412
>Bristol
>CT 06011-1412 USA
>
>Phone: USA 860 583 9677
>Cell:    USA  860 830 2085
>Fax:    USA  860 314 1029
>
>
>At 02:56 PM 1/27/2006 -0600, you wrote:
>
>>>I didn't just say 'only', I also said, "for many purposes".
>>>
>>>If you know of an automated way to take 30 *separate* web pages and
>>>turn them  into a single PDF, I'd like to know about it.
>>
>>Yes, that is the key point. I had the same 
>>immediate reaction as Xiaoshu, but then I went 
>>and actually tried to print out TimBL's SW2004 
>>slidyshow as a single document. I couldn't find 
>>any way to do it. It is most frustrating: all 
>>the slide contents and formattings are actually 
>>specified there in the HTML source, but if you 
>>attempt to print it (even when using an HTML 
>>editor like GoLive), you only get one of the 
>>slides, at best.
>>
>>So, here is a suggested Principle of Good 
>>Practice for browsers: anything that can be 
>>rendered on the screen by processing a single 
>>source document, should be printable by a 
>>single command. Maybe it will be a mess, 
>>needing some work to organize; but at least it 
>>will be on paper, so you can stuff it all into 
>>your pocket and sort it out later on the beach.
>>
>>Pat Hayes
>>
>>>-  Bob
>>>
>>>On 1/27/06, wangxiao <wangxiao@musc.edu> wrote:
>>>>
>>>>  > The only good document, for many purposes, is one that can be
>>>>  > printed out in a reasonably compact form and then read, with
>>>>  > no computer or web (!) connection, in a coffee shop or on the
>>>>  > beach (some months from now here in the North).  But as I
>>>>  > look for documents explaining the Semantic Web, I keep
>>>>  > finding collections of 20 or 30 web pages each, each page of
>>>>  > which has to be printed separately.  Slidy seems to have the
>>>>  > same problem, and I've inquired separately about that. Most
>>>>  > mags and newspapers offer "printer-friendly" versions of
>>>>  > multi-page docs.
>>>>
>>>>  I would be careful about the "only". :-)
>>>>
>>>>  I think it is made as a web document intentionally.  After all, what W3C
>>>>  wants is to get people used to the web. If people want to get it in some
>>>>  other form, it is not difficult to do so with some software.  For instance,
>>>>  you can easily create a PDF out of the web document and do whatever you are
>>>>  comfortable with later.
>>>>
>>>>  Xiaoshu
>>>>
>>>
>>>
>>>--
>>>Robert P. Futrelle
>>>     Associate Professor
>>>Biological Knowledge Laboratory
>>>College of Computer and Information Science
>>>Northeastern University MS WVH202
>>>360 Huntington Ave.
>>>Boston, MA 02115
>>>
>>>Office: (617)-373-4239
>>>Fax:    (617)-373-5121
>>>http://www.ccs.neu.edu/home/futrelle
>>>http://www.bionlp.org
>>>http://www.diagrams.org
>>>http://biologicalknowledge.com
>>
>>
>>--
>>---------------------------------------------------------------------
>>IHMC            (850)434 8903 or (650)494 3973   home
>>40 South Alcaniz St.    (850)202 4416   office
>>Pensacola                       (850)202 4440   fax
>>FL 32502                        (850)291 0667    cell
>>phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>>
>
>


-- 
---------------------------------------------------------------------
IHMC		(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32502			(850)291 0667    cell
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Saturday, 28 January 2006 05:37:40 UTC