Z39.50 and the World Wide Web

Eliot Christian (echristi@usgs.gov)
Tue, 06 Feb 1996 16:11:19 -0500


Message-Id: <2.2.32.19960206211119.00706348@isdmnl.wr.usgs.gov>
Date: Tue, 06 Feb 1996 16:11:19 -0500
To: gils@cni.org
From: Eliot Christian <echristi@usgs.gov>
Subject: Z39.50 and the World Wide Web

In the context of encouraging information providers to support Z39.50 in
addition to HTTP, I am often asked to address the question:
 
   How does Z39.50 fit into and improve what's on the World Wide Web today?

I am especially interested in making clear the advantages of Z39.50 support
from the perspective of commercial information services. 

I'd appreciate any thoughts you might have on the following rough draft.
Also, please do feel free to pass this on to other people who you think may
have some thoughts on this matter. 

Since this may have been forwarded to you, please send your response
directly to me <echristi@usgs.gov>. 

Thanks!

-----------------------------------------------------------

What is the proposal for fitting Z39.50 into the Web?

Most activity in the World Wide Web today is centered on Web browsers
gaining access to information resources on servers through the Hypertext
Transfer Protocol (HTTP). Just as the same resource is often made available
at the server through multiple protocols such as HTTP, gopher and FTP, this
proposal is to make the resource searchable at the server end by adding
support for the Z39.50 protocol. (More ubiquitous Z39.50 client software for
agents and end users, as through Java or other mechanisms, is addressed
separately.)

In essence, Z39.50 provides a common computer-to-computer search protocol
between diverse information resources and diverse information access
mechanisms. A range of software to implement Z39.50 in this way is
available, from freeware to various commercial offerings worldwide.

Because Z39.50 does not dictate the way information is managed at the server
end, providers can support various data and information management
approaches yet make all the information commonly searchable. Because Z39.50
does not dictate how information is presented at the client end, intelligent
software agents are enabled and user interfaces can be customized (in
hardware, software, language, sophistication, graphical design, etc.) for
each particular market.
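
A minimal sketch of this separation, in Python. The names below
(SearchSource, Record, ListSource, SqliteSource, search_all) are hypothetical
illustrations, not part of Z39.50 or of any Z39.50 toolkit, and no protocol
traffic is exchanged. The sketch only shows that collections managed in
different ways can answer the same query through one agreed interface, while
presentation is left entirely to the client.

```python
import sqlite3
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass(frozen=True)
class Record:
    source: str
    title: str


class SearchSource(Protocol):
    """Hypothetical common search interface (a stand-in for a shared protocol)."""

    name: str

    def search(self, term: str) -> List[Record]: ...


class ListSource:
    """Provider that keeps its titles in a plain in-memory list."""

    def __init__(self, name: str, titles: Iterable[str]) -> None:
        self.name = name
        self._titles = list(titles)

    def search(self, term: str) -> List[Record]:
        needle = term.lower()
        return [Record(self.name, t) for t in self._titles if needle in t.lower()]


class SqliteSource:
    """Provider that keeps its titles in a SQL database."""

    def __init__(self, name: str, titles: Iterable[str]) -> None:
        self.name = name
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE docs (title TEXT)")
        self._db.executemany("INSERT INTO docs VALUES (?)", [(t,) for t in titles])

    def search(self, term: str) -> List[Record]:
        rows = self._db.execute(
            "SELECT title FROM docs WHERE lower(title) LIKE ? ORDER BY title",
            (f"%{term.lower()}%",),
        )
        return [Record(self.name, title) for (title,) in rows]


def search_all(sources: Iterable[SearchSource], term: str) -> List[Record]:
    """Client side: one query, many differently managed collections."""
    hits: List[Record] = []
    for source in sources:
        hits.extend(source.search(term))
    return hits
```

A client could then call search_all([ListSource(...), SqliteSource(...)],
"virginia") and format the returned records however its own market requires.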

In developing a new collection of information for a particular market, a
provider can search the contents of other resources via Z39.50 and create
pointers to just those portions most relevant to their specific market. If
the provider also adds Z39.50 support to the new collection, the resource
gains exposure to seekers of information outside of the targeted audience.

How does Z39.50 improve the Web?

1. Different players have a common problem

Content Seekers sometimes want to include many disparate sources of
information in their searching--not just Web pages, not just the resources
of one provider, not just things in the English language, and not just
snippets of ASCII text. Better search mechanisms are desperately needed due
to the sheer size and diversity of information that people would like to
take into account. The Internet has huge amounts of content itself and
increasingly acts as a pointer mechanism to the vast information stores of
off-line media. However, just as in libraries centuries ago, the Internet
has incredible diversity of content but lacks basic agreements on how to tag
information objects so they can be found.

Content Owners want their products to be found by all potentially interested
seekers. Today, the only recourse is to somehow acquire advertising space
from all of the intermediaries (e.g., "I'll pay you to point to my page from
your page").

Intermediaries must support non-exclusive distribution arrangements and are
finding new roles as brokers connecting particular groups of seekers to the
best sources for their needs.

Research and development efforts in advanced information discovery need a
common protocol for interoperability to deploy next generation solutions.

2. The client-server model is crucial for progress

Server-based searching is inherently limited. If searching is done at the
server, the server designer must package the search for the particular
target audience (e.g., what information is included, what language(s) does
the user know, is the search simple or robust). Particular servers can
only be comprehensive for their narrowly defined target audience, because
they only provide a "packaged view" of the content. So, to reach seekers
outside of the narrow-cast, the content must be exposed to unanticipated
searching. 

Intelligent software agents will become increasingly important acting as
gatherers of information tailored to very specific interests. Designers of
software agents, such as Web crawlers, are frustrated by presentation
protocols because the agent has no human driver to interpret the wide
variations among packaged information. Consequently, Web crawlers can only
deal with bits and pieces of Internet content that happen to be in text
form. Web crawlers also cannot handle content behind interface programs
(e.g., CGI scripts, Java applets, and database access or search forms).
Lacking distributed search mechanisms, the crawler is further constrained to
find only those pages that happen to have an unbroken trail of links back to
the starting points.
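
The link-trail constraint can be illustrated with a small Python sketch. The
crawl function and the toy link graph are assumptions made for illustration
only; a real crawler would fetch and parse pages rather than read a
dictionary.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def crawl(links: Dict[str, List[str]], seeds: Iterable[str]) -> Set[str]:
    """Breadth-first walk: return every page reachable from the seed pages."""
    seen: Set[str] = set()
    queue = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            queue.append(seed)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Toy link graph (hypothetical page names). "orphan" links out to "home",
# but no page links to "orphan", so a crawl starting at "home" never finds it.
LINKS = {
    "home": ["about", "products"],
    "products": ["widget"],
    "orphan": ["home"],
}
```

Here crawl(LINKS, ["home"]) returns home, about, products, and widget, and
omits orphan, however valuable its content may be.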

Support of a search protocol with client software allows for next generation
software agents. These intelligent agents will characterize the content of
information sources and perform distributed searches for those who need
periodic updating of volatile information.
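
As a rough sketch of such an agent, assuming each source exposes some
callable that takes a query and returns record identifiers (the UpdateAgent
class and the SearchFn type are hypothetical names, and no Z39.50 traffic is
involved), periodic updating reduces to remembering what has already been
reported:

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical source interface: a query string in, record identifiers out.
SearchFn = Callable[[str], List[str]]


class UpdateAgent:
    """Re-runs one standing query across several sources and reports new hits."""

    def __init__(self, sources: Dict[str, SearchFn], query: str) -> None:
        self._sources = sources
        self._query = query
        self._seen: Set[Tuple[str, str]] = set()

    def poll(self) -> List[Tuple[str, str]]:
        """Search every source once; return (source, record id) pairs not seen before."""
        fresh: List[Tuple[str, str]] = []
        for name in sorted(self._sources):
            for record_id in self._sources[name](self._query):
                key = (name, record_id)
                if key not in self._seen:
                    self._seen.add(key)
                    fresh.append(key)
        return fresh
```

Calling poll() on a schedule yields only the records that have appeared since
the previous call, which is the behavior a subscriber to volatile
information would want.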

3. Z39.50 is the strategic choice for client-server search

Z39.50 is already adopted widely to provide access to important classes of
information, including: existing bibliographic catalogs for libraries,
museums, and archives worldwide; government information at the national
level in several countries and increasingly at the state and other
government levels; environmental information at all levels in the U.S. and
internationally; all kinds of geo-referenced (map) data and information.

Hundreds of resources representing information valued in the tens of
billions of dollars are already freely accessible through Z39.50--more is
available on a fee basis. There are also hundreds of Z39.50 WAIS databases
available, and thousands more WAIS databases are maintained behind HTTP
servers. (Unfortunately, most Web browsers are not enabled to handle the
WAIS Z39.50 protocol directly as search clients.)

Z39.50 incorporates the agreed international standards for multi-language
support, which is increasingly important in addressing global markets. Z39.50
can also be expected to provide a path toward the handling of information
search at the semantic level, to finally fulfill the goal of finding data and
information based on what its content actually means rather than just the
text in which it is represented.

The Z39.50 protocol has also demonstrated extensibility to support search
based on generalized pattern-matching techniques. These techniques will be
increasingly important for finding abstract information such as chemical
configurations, gene sequences, fingerprints, faces, video imagery, and
numeric trend data.

The Z39.50 protocol is implemented on OSI networks as well as TCP/IP, and
its messages are defined in Abstract Syntax Notation One (ASN.1) to
enhance interoperability. As a binary protocol exchanging data structures
rather than merely passing commands, Z39.50 is arguably more secure than
Internet protocols that pass text commands.
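
To give a feel for what "exchanging data structures" means, here is a small
Python sketch of the tag-length-value layout used by the ASN.1 Basic
Encoding Rules (BER). It encodes a made-up two-field structure, not an
actual Z39.50 message: real Z39.50 messages use the tags defined in the
standard's own ASN.1 module, and the function names below are hypothetical.

```python
def _length(n: int) -> bytes:
    """BER definite length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def _tlv(tag: int, content: bytes) -> bytes:
    """Assemble one tag-length-value element."""
    return bytes([tag]) + _length(len(content)) + content


def ber_integer(value: int) -> bytes:
    """Encode a non-negative INTEGER (universal tag 0x02)."""
    if value < 0:
        raise ValueError("this sketch only encodes non-negative integers")
    # The extra leading byte, when needed, keeps the sign bit clear.
    content = value.to_bytes(value.bit_length() // 8 + 1, "big")
    return _tlv(0x02, content)


def ber_octet_string(data: bytes) -> bytes:
    """Encode an OCTET STRING (universal tag 0x04)."""
    return _tlv(0x04, data)


def ber_sequence(*members: bytes) -> bytes:
    """Wrap already-encoded members in a SEQUENCE (tag 0x30)."""
    return _tlv(0x30, b"".join(members))
```

For example, ber_sequence(ber_integer(5), ber_octet_string(b"hi")).hex()
gives "300702010504026869". The receiver parses typed, length-delimited
fields rather than interpreting a command line.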

In addition to free software for Z39.50 servers, there are freeware and
commercial implementations of gateways to resources such as X.500 and SQL
databases, as well as to HTTP.

The Z39.50 standard is extensive in specifying how optional features can be
implemented, though it also allows for quite simple implementations. By
requiring a subset of features in specific implementation contexts, the
Z39.50 Profiles greatly improve interoperability and simplify server
implementation. Clients can be optimized for access to Z39.50 servers
supporting a specific profile yet still enjoy basic search capability on all
other Z39.50 servers.
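
The fallback behavior can be sketched in Python as follows. Everything here
is an assumption made for illustration: the profile name "example-profile",
the field names, and the ServerInfo and build_query names do not come from
the Z39.50 standard or from any published profile.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

# Hypothetical: the search fields guaranteed by each profile.
PROFILE_FIELDS: Dict[str, FrozenSet[str]] = {
    "example-profile": frozenset({"any", "title", "originator"}),
}


@dataclass(frozen=True)
class ServerInfo:
    name: str
    profiles: FrozenSet[str]  # profiles the server says it supports


def build_query(server: ServerInfo, wanted: Dict[str, str], profile: str) -> Dict[str, str]:
    """Use fielded search when the server supports the profile, else fall back."""
    if profile in server.profiles:
        allowed = PROFILE_FIELDS[profile]
        return {field: term for field, term in wanted.items() if field in allowed}
    # Basic capability only: collapse all terms into one unfielded search.
    return {"any": " ".join(wanted.values())}
```

Against a server that supports the profile, a request for title "water" and
originator "USGS" stays a precise fielded query. Against any other server it
degrades to a single search for "water USGS", which is less precise but
still works.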

Though already quite sophisticated, the base Z39.50 standard and focused
profiles are evolving ever greater power through an effective international
standards process with full involvement of dozens of major corporate
implementors, tied to ISO and IETF, and connected with very active research
at dozens of major universities and programs of national governments worldwide.

-----------------------------------------------------------



Eliot Christian, US Geological Survey, 802 National Center, Reston VA 22092
echristi@usgs.gov  Office 703-648-7245  FAX 703-648-7069  Home 703-476-6134