WSA Discovery

This message provides some thoughts that may be relevant to the discussion 
of "discovery" in the Web Service Architecture (WSA) document.

In August, MITRE held a Technical Exchange Meeting (TEM) called the Web 
Technology Convergence Symposium (WTCS).  A TEM, in general, is one way 
that MITRE coordinates activities internally and with our customers.  The 
WTCS looked at the convergence of web services (WS), semantic web (SW), grid 
computing (GC), and agent technology (AG).

Timbl gave a keynote talk [1], and I asked some questions about Discovery 
in the context of his statement [2] that "Discovery should all be SW-based".

Some follow-up discussion on this point is worth repeating.

Timbl did not use the term "registry", but he did use the term "index".

Paraphrasing Timbl, he feels that exposing descriptions on the web is a 
better model than publishing to a registry (like UDDI).  When descriptions 
are exposed, they can be harvested using spiders and "indexed".  Multiple 
organizations may have such indexers, and free-market forces will determine 
which "index" people use to discover what they are looking for.

Note that in this model, discovery is done by querying an index.  Although 
it is possible, individual web-service consumers are unlikely to spider the 
web themselves.  I asked Timbl if he felt a "standard" API would be 
required to query the index.  Timbl responded that efforts to define a 
standard query language are in progress [8], suggesting that the "API" to 
query the index would somehow involve this query language.

Personally, I don't see much difference between an "index" and a "registry" 
from the inquirer's point of view.  For example, a spider could harvest 
descriptions from the web and store the results in a private UDDI registry; 
the implementation of the "index" could be UDDI.  The spider could create a 
tModel for every WSDL file it finds; see [3].  The spider could do some 
sort of automatic classification [13] and insert appropriate categories in 
a UDDI categoryBag.
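
A minimal sketch of what such a spider might emit for each WSDL file it 
finds, assuming the UDDI v2 tModel layout (name, overviewDoc/overviewURL, 
categoryBag).  The function name, the example URL, and the helper q() are 
hypothetical; the uddi-org:types tModelKey is quoted from memory of the 
UDDI v2 specification and should be verified before use.

```python
import xml.etree.ElementTree as ET

UDDI_V2_NS = "urn:uddi-org:api_v2"
# Assumption: key of the uddi-org:types taxonomy tModel, recalled from the
# UDDI v2 specification; verify against the registry you actually use.
UDDI_TYPES_KEY = "uuid:C1ACF26D-9672-4404-9D70-39B756E62AB4"


def q(tag):
    """Qualify a tag name with the UDDI v2 namespace (hypothetical helper)."""
    return f"{{{UDDI_V2_NS}}}{tag}"


def tmodel_for_description(name, url, type_value="wsdlSpec"):
    """Build a tModel that points at a description hosted outside the registry."""
    # An empty tModelKey asks the registry to assign a key on save.
    tmodel = ET.Element(q("tModel"), {"tModelKey": ""})
    ET.SubElement(tmodel, q("name")).text = name
    overview = ET.SubElement(tmodel, q("overviewDoc"))
    ET.SubElement(overview, q("overviewURL")).text = url
    # The categoryBag says what kind of document the overviewURL points to.
    bag = ET.SubElement(tmodel, q("categoryBag"))
    ET.SubElement(bag, q("keyedReference"), {
        "tModelKey": UDDI_TYPES_KEY,
        "keyName": "uddi-org:types",
        "keyValue": type_value,
    })
    return tmodel


if __name__ == "__main__":
    ET.register_namespace("", UDDI_V2_NS)
    harvested = tmodel_for_description(
        "StockQuote", "http://example.com/stockquote.wsdl")
    print(ET.tostring(harvested, encoding="unicode"))
```

The spider would hand each such element to the registry's publish API 
(save_tModel in UDDI); that step is omitted here.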

Then there is the issue of metadata.  UDDI tModels (and other structures) 
have categoryBags where information is stored to categorize the entry.  For 
example, the categoryBag for a tModel that refers to a WSDL file could be 
adorned with a keyValue="wsdlSpec" from the uddi-org:types taxonomy.  The 
UDDI inquiry API (SOAP) lets you search for such tModels; web service 
development tools often use this to help developers locate WSDL files.
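
As an illustration of that inquiry, the sketch below builds the body of a 
UDDI v2 find_tModel request that filters on the wsdlSpec category.  It is 
only the message body: wrapping it in a SOAP envelope and posting it to an 
inquiry endpoint is left out, the function name is hypothetical, and the 
uddi-org:types tModelKey is quoted from memory and should be verified.

```python
import xml.etree.ElementTree as ET

UDDI_V2_NS = "urn:uddi-org:api_v2"
# Assumption: key of the uddi-org:types taxonomy tModel (verify before use).
UDDI_TYPES_KEY = "uuid:C1ACF26D-9672-4404-9D70-39B756E62AB4"


def find_tmodel_request(key_value="wsdlSpec"):
    """Return a find_tModel body that matches tModels by category."""
    ET.register_namespace("", UDDI_V2_NS)
    find = ET.Element(f"{{{UDDI_V2_NS}}}find_tModel", {"generic": "2.0"})
    bag = ET.SubElement(find, f"{{{UDDI_V2_NS}}}categoryBag")
    ET.SubElement(bag, f"{{{UDDI_V2_NS}}}keyedReference", {
        "tModelKey": UDDI_TYPES_KEY,
        "keyName": "uddi-org:types",
        "keyValue": key_value,
    })
    return ET.tostring(find, encoding="unicode")


if __name__ == "__main__":
    print(find_tmodel_request())
```

Changing key_value (for example to the ebXML value mentioned below, 
together with the matching taxonomy key) gives the same query shape for 
other kinds of description.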

Likewise, a spider could locate OWL-S files, and create a tModel for each 
one it finds.  A pattern is emerging where the overviewURL in a UDDI tModel 
points outside of the registry to some document, and the categoryBag of the 
tModel indicates what the link points to.  It indicates this using a 
taxonomy.  Another example of this pattern is the UDDI Technical Note that 
has been drafted for ebXML [6].  There, the tModel overviewURL can point 
to an ebXML CPPA document, and the tModel can be classified as 
keyValue="ebXML:CPPA" using their taxonomy [6].

UDDI provides a common framework for a wide variety of metadata.  If each 
"index" were to store the results of its spider/harvest in a different 
format, then the query strings used for discovery would be different for 
each index.

So what is the difference between "publishing to a registry" and "exposing 
a description"?  UDDI is not a "repository" for the storage of and (direct) 
access to description files like WSDL.  UDDI simply 
points to the URI of a WSDL file (or other description).  UDDI stores a 
limited amount of metadata in the form of a categoryBag.  So with UDDI, you 
need to first expose WSDL, then publish the tModel.  But if the WSDL is 
already exposed, it presumably is available for some other 
index/spider/harvester to find.  (It may not be easy to find unless you 
somehow know where to start looking, for example via WSIL, RDDL within a 
homepage, or some variant (like [5]) of RSS Autodiscovery [4].)


I am wondering what the web service architecture should say about this stuff.

I suggest it should have components like "index" as the entity (agent) that 
requesters query to locate or discover web services and/or their 
metadata.  "Registry" may be overloaded and may suggest to some an overly 
centralized architecture.

Given multiple indexes, a "discovery proxy agent (DPA)" would help in 
federation of discovery.  It would need to know how to locate indexes, 
query multiple indexes simultaneously, and consolidate the results.  It may 
need to translate the results into a common format; perhaps this is a 
separate architectural element, a "discovery translation agent (DTA)" or 
discovery translation service (DTS).  The DPA may cache results.
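
The DPA behaviour described above (query several indexes at once, 
translate, consolidate, cache) can be sketched roughly as follows.  All 
names here are hypothetical: each index is modelled as a plain function 
taking a query string, each DTA as a function mapping one raw result to a 
common record, and de-duplication by URL is just one possible 
consolidation rule.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]                    # common format: {"url", "type"}
IndexQuery = Callable[[str], List[dict]]   # hypothetical index client
Translator = Callable[[dict], Record]      # hypothetical DTA for that index


@dataclass
class DiscoveryProxyAgent:
    indexes: Dict[str, Tuple[IndexQuery, Translator]]
    cache: Dict[str, List[Record]] = field(default_factory=dict)

    def query(self, text: str) -> List[Record]:
        if text in self.cache:             # the DPA may cache results
            return self.cache[text]
        merged: Dict[str, Record] = {}
        workers = max(1, len(self.indexes))
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Send the query to every known index concurrently.
            futures = {name: pool.submit(ask, text)
                       for name, (ask, _) in self.indexes.items()}
            for name, future in futures.items():
                translate = self.indexes[name][1]
                for raw in future.result():
                    record = translate(raw)
                    # Consolidate: keep the first record seen per URL.
                    merged.setdefault(record["url"], record)
        results = list(merged.values())
        self.cache[text] = results
        return results


# Two stand-in indexes that return results in different formats.
def uddi_like(text):
    return [{"overviewURL": "http://example.com/a.wsdl", "keyValue": "wsdlSpec"}]


def rdf_like(text):
    return [{"about": "http://example.com/a.wsdl", "kind": "wsdlSpec"},
            {"about": "http://example.com/b.owl", "kind": "owl-s"}]


dpa = DiscoveryProxyAgent({
    "uddi": (uddi_like, lambda r: {"url": r["overviewURL"], "type": r["keyValue"]}),
    "rdf": (rdf_like, lambda r: {"url": r["about"], "type": r["kind"]}),
})

if __name__ == "__main__":
    for record in dpa.query("stock quote"):
        print(record)
```

How the DPA locates indexes in the first place is not covered by this 
sketch; here the set of indexes is simply passed in.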

The DPA "has-a" query interface, which could be UDDI, or some other API 
(also described using WSDL).
The query interface "uses" a query language.  This query is likely to be 
something that is not easily expressed in a URI, which implies that the 
query is put into a message (pushed in a request) or file (pulled by the 
query processor in response to being given the URI of the query file).

A lot of work has been done on query technology, and I am not an expert on 
it.  Some links: [9][10]

I like the idea of a DPA because it is similar to (is-a?) a "proxy" in the 
web architecture (or REST architectural style), which is an example of a 
REST "component" [7].

So to summarize the proposed additions to the WSA document:

Revise the Resource Oriented Model [12]

1a.  Add "Index" as a noun (Concept or Feature per 2.2.3)
1b.  A Discovery Service has-an Index
1c.  An Index has-a Query interface
1d.  A Query interface has-a Query Language
1e.  A Query Language is-identified-by a namespace URI
1f.  Add Discovery Proxy Agent (DPA)
1g.  DPA is-an agent
1h.  Add Discovery Translation Agent (DTA) (and/or Discovery Translation 
Service - DTS)
1i.  A DPA queries an Index
1j.  A DPA may use a DTA/DTS to normalize results from queries to multiple 
indexes
1k.  A DPA discovers Indexes
1l.  An Index stores service metadata (which may include links to other 
metadata stored outside the Index)
1m.  A DPA has-a Query interface
1n.  An Index may harvest service metadata
1o.  An Index may provide a Publish interface

2a.  An agent subscribes-to a discovery service
2b.  A discovery service notifies an agent
2c.  An agent has-a notification interface
2d.  A discovery service has-a subscription interface (asynch query interface)
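
To make the relationships in the list above concrete, here is a rough 
sketch of a few of them as Python classes.  It is only one possible 
reading of items 1b-1e, 1l, 1o and 2a-2d; the class and method names, the 
namespace URI, and the use of plain callables as the notification 
interface are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Metadata = Dict[str, str]
Predicate = Callable[[Metadata], bool]
Notify = Callable[[Metadata], None]        # 2c: agent's notification interface


@dataclass
class QueryLanguage:
    namespace_uri: str                     # 1e: identified by a namespace URI


@dataclass
class QueryInterface:
    language: QueryLanguage                # 1d: has-a Query Language


@dataclass
class Index:
    query_interface: QueryInterface        # 1c: has-a Query interface
    entries: List[Metadata] = field(default_factory=list)   # 1l

    def publish(self, entry: Metadata) -> None:              # 1o
        self.entries.append(entry)

    def query(self, matches: Predicate) -> List[Metadata]:
        return [e for e in self.entries if matches(e)]


@dataclass
class DiscoveryService:
    index: Index                           # 1b: has-an Index
    subscriptions: List[Tuple[Predicate, Notify]] = field(default_factory=list)

    def subscribe(self, matches: Predicate, notify: Notify) -> None:  # 2a, 2d
        self.subscriptions.append((matches, notify))

    def publish(self, entry: Metadata) -> None:
        self.index.publish(entry)
        for matches, notify in self.subscriptions:
            if matches(entry):
                notify(entry)              # 2b: service notifies the agent


if __name__ == "__main__":
    # Hypothetical namespace URI standing in for a real query language.
    language = QueryLanguage("http://example.org/query-language")
    service = DiscoveryService(Index(QueryInterface(language)))
    service.subscribe(lambda e: e["type"] == "wsdlSpec", print)
    service.publish({"url": "http://example.com/a.wsdl", "type": "wsdlSpec"})
```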


[1] http://www.w3.org/2003/Talks/08-mitre-tbl/Overview.html
[2] http://www.w3.org/2003/Talks/08-mitre-tbl/slide35-0.html
[3] http://www.oasis-open.org/committees/uddi-spec/doc/bps.htm
[4] http://diveintomark.org/archives/2002/05/31/more_on_rss_autodiscovery
[5] http://lists.oasis-open.org/archives/uddi-spec/200305/msg00056.html
[6] http://www.oasis-open.org/apps/org/workgroup/uddi-spec/document.php?document_id=3589
[7] http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm#sec_5_2_3
[8] http://www.w3.org/TR/xquery/
[9] http://swordfish.rdfweb.org/rdfquery/
[10] http://139.91.183.30:9090/RDF/publications/tr308.pdf
[11] http://lists.oasis-open.org/archives/uddi-spec/200304/msg00021.html
[12] http://dev.w3.org/cvsweb/~checkout~/2002/ws/arch/wsa/wd-wsa-arch-review2.html#resource_oriented_model
[13] http://moguntia.ucd.ie/publications/

Paul 

Received on Wednesday, 15 October 2003 17:33:26 UTC