Web Service Discovery - Version 2

3.5 Web Service Discovery - Version 2

Editorial note
Based on Web Services Architectural Roles

The overall process of engaging a Web service was outlined in the Introduction, and included the following steps: (1) the requester and provider entities "become known to each other"; (2) the requester and provider entities agree on the service description and semantics that will govern the interaction between the requester and provider agents; (3) the service description and semantics are embodied in the requester and provider agents; and (4) the requester and provider agents exchange messages. This section expands on Step 1.

If the requester entity does not already know which provider agent it wishes to engage, then the requester entity may need to "discover" a suitable candidate. Discovery is "the act of locating a machine-processable description of a Web service that may have been previously unknown and that meets certain functional criteria." [WSAGLOSS] (If the requester and provider entities are already known to each other, then there is no need for discovery per se -- they merely need to agree on the service description and semantics. In that case, discovery can be thought of either as a null step or as a step that took place before the start of the described process.)

A discovery service is a service that facilitates the process of performing discovery. It is a logical role, and could be performed by the requester agent, the provider agent, or a third-party agent.
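To make the definition above concrete, the following is a minimal sketch of discovery as a filtering step: given some known service descriptions and a set of functional criteria, return the descriptions that meet all of them. The names `ServiceDescription`, `Criterion` and `discover`, the example providers and the example URLs are all assumptions made for illustration; none of them comes from this document or from any standard API.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical stand-in for a machine-processable service description."""
    provider: str
    endpoint: str
    capabilities: frozenset


# Assumption of this sketch: a functional criterion is any predicate
# over a service description.
Criterion = Callable[[ServiceDescription], bool]


def discover(descriptions: Iterable[ServiceDescription],
             criteria: Iterable[Criterion]) -> List[ServiceDescription]:
    """Return the descriptions that satisfy every functional criterion."""
    criteria = list(criteria)  # allow the criteria to be reused per description
    return [d for d in descriptions if all(c(d) for c in criteria)]


# Illustrative data only; the providers and endpoints are invented.
known = [
    ServiceDescription("ExampleAir", "http://air.example/booking",
                       frozenset({"book-flight"})),
    ServiceDescription("ExampleRail", "http://rail.example/booking",
                       frozenset({"book-train"})),
]
matches = discover(known, [lambda d: "book-flight" in d.capabilities])
```

Because the discovery service is only a logical role, a function like `discover` could in principle run inside the requester agent, the provider agent, or a third-party agent.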

3.5.1 The Discovery Process

Figure 9 ("Discovery Process") expands on Figure 1 to describe the process of engaging a Web service when a discovery service is used.

[Figure 9: Discovery Process]

Service engagement using a discovery service proceeds in roughly the following steps.

1. The requester and provider entities "become known to each other":

2. The requester and provider agree on the semantics ("Sem" in Figure 9) of the desired interaction. Although this may commonly be achieved by the provider entity defining the semantics and offering them on a take-it-or-leave-it basis to the requester entity, it could be achieved in other ways. For example, both parties may adopt certain standard service semantics that are defined by some industry standards body. Or in some circumstances the requester could define the semantics. The important point is that the parties must agree on the semantics, regardless of how that is achieved.

Editorial note: dbooth
We need to fix an inconsistency in the document about our use of the word "semantics". Sometimes we are referring to the semantics themselves, whereas other times we are referring to a document that describes the semantics.

Step 2 also requires that the parties agree on the service description that is to be used. However, since the requester entity obtained the Web service description in Step 1.3, in effect the requester and provider entities have already done so.

Editorial note: dbooth
We need to fix a mismatch in granularity between the term "service description" used here, and the term as used in the concepts and relationships section (2.3.2.6). That section assumes that the "service description" includes the semantics; this section (and WSDL) doesn't.

3. The service description and semantics are input to, or embodied in, both the requester agent and the provider agent, as appropriate.
4. The requester agent and provider agent exchange SOAP messages on behalf of their owners.

3.5.2 Manual Versus Autonomous Discovery

The discovery process described above is not specific about who or what within the requester entity actually performs the discovery. Under manual discovery, a requester human uses a discovery service to locate and select a service description that meets the desired functional and other criteria. Under autonomous discovery, the requester agent performs this task. Although the steps are similar in either case, the constraints and needs differ significantly, for example in the following respects:

Interface requirements. The requirements for something that is intended for human interaction are very different from the requirements for something that is intended for machine interaction.
Need for standardization. There is far less need to standardize an interface or protocol that humans use than one that machines are intended to use.
Trust. People do not necessarily trust machines to make decisions that may have significant consequences. This is explained more fully below.

3.5.3 Trust and Discovery

Suppose a requester entity discovers a Web service that was previously unknown. Should the requester trust that service? If the use of that service requires the requester to divulge sensitive information (such as credit card numbers) to the service, then there may be significant risk involved.

This decision -- whether or not to trust a particular service -- inherently arises when a requester entity chooses a previously unknown service. This leads to an important difference between manual discovery and autonomous discovery.

When manual discovery is used, a human makes the judgement (perhaps using other, independently obtained information) of whether to trust and engage a previously unknown service that is discovered. With autonomous discovery, by contrast, a machine makes this decision. Since people are often skeptical of allowing machines to make significant judgement decisions, agents performing autonomous discovery are often limited to using private discovery services that list only those services that have been pre-screened and deemed trustworthy by the requester entity. This limited form of autonomous discovery would be more precisely called autonomous selection, since the available candidates are already known.
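The distinction between autonomous discovery and autonomous selection can be illustrated with a small sketch. Here the trust decision is taken ahead of time by a person, who approves candidates into a private discovery service; the agent then only selects among those pre-screened entries. The class `PrivateDiscoveryService`, the function `autonomous_select`, the "first match" selection rule and the example service are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for a service known to the requester entity."""
    name: str
    endpoint: str
    capability: str


class PrivateDiscoveryService:
    """Lists only services that the requester entity has pre-screened."""

    def __init__(self) -> None:
        self._approved: List[Candidate] = []

    def approve(self, candidate: Candidate) -> None:
        # The trust judgement happens here, made by a human in advance.
        self._approved.append(candidate)

    def find(self, capability: str) -> List[Candidate]:
        return [c for c in self._approved if c.capability == capability]


def autonomous_select(service: PrivateDiscoveryService,
                      capability: str) -> Optional[Candidate]:
    """Pick among already-trusted candidates; no new trust decision is made.

    Taking the first match is an arbitrary policy assumed for this sketch.
    """
    candidates = service.find(capability)
    return candidates[0] if candidates else None


private = PrivateDiscoveryService()
private.approve(Candidate("ExamplePay", "http://pay.example/charge", "payment"))

chosen = autonomous_select(private, "payment")
missing = autonomous_select(private, "shipping")
```

In this sketch the agent never encounters a service that has not been approved, so a request for an unlisted capability simply yields no candidate rather than an untrusted one.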

3.5.4 Discovery Service: Registry or Index?

At present, there are two leading viewpoints on how a discovery service should be conceived: as a registry or as an index. What are the differences? For what purpose is one better than the other?

A registry is an authoritative, centrally controlled store of information.

Publishing a service description requires an active step by the provider entity: it must explicitly place the information into the registry before that information is available to others.
The registry owner decides who can place information into the registry. Although the registry owner may delegate permission to approved provider entities that wish to publish their own service descriptions, an arbitrary third party could not publish a description of someone else's service. This means, for example, that company X would not be able to register a functional description of company Y's service, even if that description would be valuable to others and may be superior in some ways to X's own description.
The registry owner decides what information is placed in the registry. Others cannot independently augment that information.
UDDI is an example of the registry approach.
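The registry properties listed above can be sketched in a few lines. This is an illustrative in-memory model, not the UDDI API: the `Registry` class, its `delegate`, `publish` and `lookup` methods, and the company and service names are assumptions introduced for this example. It shows that publishing is an explicit step, that the owner decides who may publish, and that company X cannot register a description of company Y's service.

```python
from typing import Dict, Optional, Set


class Registry:
    """Hypothetical centrally controlled registry (not the UDDI API)."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        # Maps each approved provider to the services it may describe.
        self._delegations: Dict[str, Set[str]] = {}
        self._entries: Dict[str, str] = {}

    def delegate(self, granted_by: str, provider: str, service: str) -> None:
        """Only the registry owner can grant publishing rights."""
        if granted_by != self.owner:
            raise PermissionError("only the registry owner may delegate")
        self._delegations.setdefault(provider, set()).add(service)

    def publish(self, provider: str, service: str, description: str) -> None:
        """Publishing is an explicit, active step by an approved provider."""
        if service not in self._delegations.get(provider, set()):
            raise PermissionError(f"{provider} may not describe {service}")
        self._entries[service] = description

    def lookup(self, service: str) -> Optional[str]:
        return self._entries.get(service)


registry = Registry(owner="registry-owner")
registry.delegate("registry-owner", "company-y", "y-quotes")
registry.publish("company-y", "y-quotes", "Stock quotes, described by Y")

try:
    # Company X tries to register its own description of Y's service.
    registry.publish("company-x", "y-quotes", "Stock quotes, described by X")
    rejected = False
except PermissionError:
    rejected = True
```

The rejected call corresponds to the company X and company Y example in the list above: the third-party description never enters the registry, however useful it might have been.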

In comparison, an index is a compilation or guide to information that exists elsewhere. It is neither authoritative nor centrally controlled.

Publishing is passive: The provider entity exposes the service and functional descriptions on the Web, and those who are interested come and find them.
Anyone can create their own index. When descriptions are exposed, they can be harvested using spiders and arranged into an index. Multiple organizations may have such indexes.
The information contained in an index could be out of date. However, since the index contains pointers to the authoritative information, the information can be verified before use.
An index could include third-party information.
Different indexes could provide different kinds of information -- some richer, some sparser.
Free-market forces determine which index people will use to discover the information that they seek.
Google is an example of the index approach.
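The index properties listed above can be sketched in the same style. In this illustrative model, anyone can build an `Index` by harvesting descriptions that providers have exposed, the index keeps pointers (URLs) to the authoritative copies, and a possibly stale entry can be checked against its source before use. The `Index` class, the `fetch` callable, and the example URLs are assumptions made for this sketch; a plain dictionary stands in for the Web so that no network access is needed.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Assumption: fetch(url) returns the description currently exposed at url,
# or None if nothing is exposed there.
Fetch = Callable[[str], Optional[str]]


class Index:
    """Hypothetical index: a snapshot of descriptions that live elsewhere."""

    def __init__(self, fetch: Fetch) -> None:
        self._fetch = fetch
        self._snapshot: Dict[str, str] = {}

    def harvest(self, urls: Iterable[str]) -> None:
        """Spider the given URLs; the provider takes no active step."""
        for url in urls:
            text = self._fetch(url)
            if text is not None:
                self._snapshot[url] = text

    def search(self, keyword: str) -> List[str]:
        """Return pointers to descriptions whose snapshot mentions keyword."""
        return sorted(u for u, t in self._snapshot.items() if keyword in t)

    def is_current(self, url: str) -> bool:
        """Verify a snapshot against the authoritative copy before use."""
        return url in self._snapshot and self._fetch(url) == self._snapshot[url]


# A dictionary stands in for descriptions exposed on the Web.
web = {
    "http://y.example/quotes.wsdl": "stock quotes service",
    "http://z.example/weather.wsdl": "weather forecast service",
}
index = Index(web.get)
index.harvest(list(web))
hits = index.search("quotes")

# The provider later changes its description; the index is now out of date.
web["http://y.example/quotes.wsdl"] = "stock quotes service, revised"
stale = not index.is_current("http://y.example/quotes.wsdl")
```

Nothing prevents several parties from creating their own `Index` instances over the same exposed descriptions, each harvesting different URLs or recording different information.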

It is important to note that the key difference between the registry approach and the index approach is not merely the difference between a registry itself and an index in isolation. Indeed, UDDI could be used as a means to implement an individual index: just spider the Web, and put the results into a UDDI registry. Rather, the key difference is one of control: In the marketplace of discovery ideas, who controls which service descriptions get discovered, and how? In the registry model, it is the owner of the registry who controls this. In the index model, it is the market.