Re: Library 5.0a : Feedback & Questions from Henrik Frystyk Nielsen on 1996-10-20 (www-lib@w3.org from October to December 1996)

From: Henrik Frystyk Nielsen <frystyk@w3.org>
Date: Sun, 20 Oct 1996 13:28:20 -0400
To: Adam Jack <ajack@netcom.com>, www-lib@w3.org
Message-Id: <3.0b28.32.19961020132816.0091bc40@pop.w3.org>
At 08:24 AM 10/20/96 -0600, Adam Jack wrote:

>I don't know if it helps to have feedback from a newbie, but here are some 
>from my experiences with 5.0a on WindowsNt/Workstation(4.0) with VC++4.2.

>Ok -- so I am 'rolling my own' project ('cos I don't know how to import
>your .mak file) and I opted (initially) to avoid the DLL configuration,
>hoping for simplicity. I added all the .c files to the project & set
>_CONSOLE.

There are in fact make files for MSVC++4.0 which I assume will also work
for 4.2. These files are referenced from

	http://www.w3.org/pub/WWW/INSTALL.html

>1) HTWAIS.c failed to compile -- stating that it could not find 
>diagnosticRecord. (I simply removed it from my project since I won't be
>using it.) 

The WAIS gateway relies on the freeWAIS library being available. I don't
know whether that exists on Windows. In any case, simply taking it out of
your project is the right thing to do. This is described in

	http://www.w3.org/pub/WWW/Library/User/WAIS.html

>2) When I tried to link against the library -- lots of modules failed to
>find _HText. A bit of digging & I found I needed to create this myself,
>and I had to go hack a piece of the Browser code to base changes.

This is only necessary if you want to use the default HTML parser which
comes with libwww. This parser is not very well maintained and has the funny
interface where you have to define a set of HText functions. If you want to
use the libwww SGML/HTML parser then you can set it up by calling HTMLInit.
This is described in more detail in

	http://www.w3.org/pub/WWW/Library/User/Start.html

>Is this my mistake? Did I add too many files to the library? I saw nothing
>in the release notes about this, and I found no stub/seed file. Further,
>nothing has, to my understanding, called either of the two constructors
>(in what lib usage I have performed.)

By putting it into one big static library you do not use the DLL
interfaces at all. Libwww is based on the model that you have the libwww
core in which you register the set of protocols, filters and streams that
your application needs.

>I like the concept of the HText 'class' -- I would like to have these
>'methods' called when some HTML is received and decomposed. How do I
>instigate that?

It is described in

	http://www.w3.org/pub/WWW/Library/User/Start.html

You can also have a look at how the robot and the line mode browser use
the HText interface.

>3) When I tried to run a simple test program, using the profile :
>
>	HTProfile_newPreemptiveClient
>
>it failed. I traced this to the WSA library not having been initialized. I
>traced that to EventInit() not being called. When I call that prior to
>creating the client profile it works again. I see that other profiles do
>call EventInit(). Is there a reason for this difference from other profiles
>and also from earlier library behavior?

The preemptive version (blocking sockets) does not need an event loop, as
libwww in this mode blocks while waiting on I/O.

>Questions :
>
>In short, my objective with working with these modules is to have the ability
>to access raw content data and additionally get some metadata about the 
>content.

Then you don't need the HTML parser at all. You can do this by using
either the preemptive or the non-preemptive client profile.

>1) I have been having some difficulty determining whether I should get 
>setting my own output stream for all data types, or working with those
>streams & structured streams that currently exist. Obviously the reuse in 
>latter would be preferable but I do not know how I can achieve what I want.

There are two ways to set up output streams in libwww:

1) Register them as converters with an input format and an output format.
In this case the stream pipe builder picks them up dynamically when
building the stream pipe.

2) Set the output stream of a request explicitly. This is the much more
"hardcoded" method, as you are saying a priori: I know what I want to do
with the data coming down this pipe. As this is not always the case, I
recommend number 1) unless the output stream is "format independent".
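
As a rough illustration of method 1), here is a minimal sketch in plain C of
how a registry of converters might be searched by input and output format.
It does not use the libwww API: Converter, ANY_FORMAT, format_matches and
pick_converter are hypothetical names, and the wildcard and quality handling
are simplifying assumptions about how a pipe builder could choose among the
registered candidates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Wildcard pattern that matches any format (assumption for this sketch). */
#define ANY_FORMAT "*/*"

/* Hypothetical converter entry; this is not a libwww type. */
typedef struct {
    const char *in_format;   /* input format, or ANY_FORMAT */
    const char *out_format;  /* output format, for example "www/present" */
    double quality;          /* higher values are preferred */
    const char *name;        /* label used only for illustration */
} Converter;

/* A pattern matches if it is the wildcard or equals the format exactly. */
int format_matches(const char *pattern, const char *format)
{
    return strcmp(pattern, ANY_FORMAT) == 0 || strcmp(pattern, format) == 0;
}

/* Return the matching converter with the highest quality, or NULL. */
const Converter *pick_converter(const Converter *list, size_t count,
                                const char *in, const char *out)
{
    const Converter *best = NULL;
    size_t i;

    assert(list != NULL || count == 0);
    for (i = 0; i < count; i++) {
        const Converter *c = &list[i];
        if (!format_matches(c->in_format, in))
            continue;
        if (!format_matches(c->out_format, out))
            continue;
        if (best == NULL || c->quality > best->quality)
            best = c;
    }
    return best;
}
```

In this sketch the selection depends only on the formats and the quality
value, so nothing about the size of the data can influence which converter
is picked.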

>For example -- I'd like to have HTML text decomposed & passed to me via
>the HText API. What converters would I need to configure to allow this?

You register a set of converters in a List object. This list object can be
used in two ways:

1) By using the HTFormat API, you assign the list of converters globally to
all requests.

2) By using the Request API, you assign the list of converters locally to
this specific request object.

In both cases you can initialize the list by calling HTMLInit.

>2) I have written my own 'memory stream', that merely spools data into a
>memory block. Obviously I would like to set a maximum data size for this.

You can also use HTLoadToChunk(), which does most of what you want. This
is part of the WWWApp interface.

Alternatively, you can use HTPipeBuffer_new, which creates a FIFO buffer
where you can flush the data at a later point in time. This is part of the
WWWStream interface.

For a complete list of interfaces, have a look at

	http://www.w3.org/pub/WWW/Library/User/Guide/

>Say, I set up two converters from */* to www/present one a 'Spool2File'
>(based upon, if not, your one) and one the memory stream. Also, say I set the
>memory stream to have a higher 'quality factor' and a fixed maximum 
>number of bytes. 
>
>What would happen if an HTTP response were received without a Content-Length
>header (is this still allowed in HTTP/1.1?) and the returned data went on
>to break the maximum setting?
>
>(I can write my own stream that spools to a memory stream until the limit 
>is reached, and then retracts it & redirects it and all subsequent
>content to a file stream. Obviously I would prefer to use what is there.)

The stream pipe builder does not take the number of bytes into account, the
reason being that you may not know the size before you have read the whole
document. This is for example the case with chunked transfer encoding.
Instead, what I would suggest is to have your own little stream which keeps
buffering, for example using the pipe buffer. If the buffer then becomes too
big, the stream opens a temp file, flushes the content of the pipe buffer
to it, and continues loading the rest of the document.
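
A minimal sketch of such a stream in plain C follows. It assumes nothing
about the libwww stream interface: SpillSink and the spill_* functions are
hypothetical names, a simple malloc'ed block stands in for the pipe buffer,
and the standard tmpfile() call stands in for whatever temp file handling
your application uses.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical spill-over sink; this is not part of libwww. */
typedef struct {
    char  *buf;    /* in-memory buffer, released once data is spilled */
    size_t len;    /* bytes currently held in buf */
    size_t limit;  /* maximum number of bytes to keep in memory */
    FILE  *file;   /* temp file, NULL until the limit is exceeded */
    size_t total;  /* total number of bytes accepted so far */
} SpillSink;

/* Prepare a sink that keeps at most 'limit' bytes in memory. */
int spill_init(SpillSink *s, size_t limit)
{
    assert(s != NULL);
    s->buf = malloc(limit ? limit : 1);
    s->len = 0;
    s->limit = limit;
    s->file = NULL;
    s->total = 0;
    return s->buf ? 0 : -1;
}

/* Accept one block of data; returns 0 on success, -1 on error. */
int spill_write(SpillSink *s, const char *data, size_t n)
{
    if (s->file == NULL && n <= s->limit - s->len) {
        /* Still within the limit: keep buffering in memory. */
        if (n > 0)
            memcpy(s->buf + s->len, data, n);
        s->len += n;
        s->total += n;
        return 0;
    }
    if (s->file == NULL) {
        /* Limit exceeded: open a temp file and flush what was buffered. */
        s->file = tmpfile();
        if (s->file == NULL)
            return -1;
        if (fwrite(s->buf, 1, s->len, s->file) != s->len)
            return -1;
        free(s->buf);
        s->buf = NULL;
        s->len = 0;
    }
    /* From here on, all data goes straight to the file. */
    if (fwrite(data, 1, n, s->file) != n)
        return -1;
    s->total += n;
    return 0;
}

/* Non-zero while all accepted data is still held in memory. */
int spill_in_memory(const SpillSink *s)
{
    return s->file == NULL;
}

/* Release the buffer and close the temp file, if any. */
void spill_free(SpillSink *s)
{
    if (s->file != NULL)
        fclose(s->file);
    free(s->buf);
    s->buf = NULL;
    s->file = NULL;
    s->len = 0;
}
```

The caller never has to know the document size up front: it simply keeps
calling spill_write() and checks spill_in_memory() at the end to see where
the data ended up.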

Hope this helps,

Henrik

--
Henrik Frystyk Nielsen, <frystyk@w3.org>
World Wide Web Consortium, MIT/LCS NE43-356
545 Technology Square, Cambridge MA 02139, USA
Received on Sunday, 20 October 1996 13:31:27 UTC