Re: A Note on URC Querying

Thus spoke Mark Madsen (at least on Fri, 16 Jun 95 at 16:14:32 BST):


> The basic scenario in searching through metadata is to establish the
> context in which the search is to be conducted, and then search for
> appropriate metadata by using the template appropriate to that
> context, filling in the components that one requires, specifying the
> criteria according to which a match is to be considered satisfactory,
> and pulling the big red handle labelled "GO".
> 
> The search context is represented in Ron's model by the attribute set
> of the metadata (and this defines the template). It is assumed that
> if the attribute set is not available, then the attribute set id
> (which is part of the URC) can be resolved to produce the attribute
> set. 

Let me expand on this a bit, and then offer a simplification. Mark
says that the search context is represented by the Attribute set ID (AID).
I believe he refers to the fact that AIDs can be used to restrict the
types of resources sought. For example, if I am looking for satellite
images, I don't need to look at resources described using an attribute set
developed for poetry.

Specifying the attribute set defines all the things that I can ask for.
However, the submitted template will typically only contain a fraction
of the allowed elements. For example, I might issue a query using the
default attribute set, and only fill in the Title or URN elements.

If the client has prepared the template URC according to some attribute
set, the server may not need to resolve the AID and fetch the AS
definition before processing the query. The element names in the query
template are probably good enough for the server to go on, and any
unknown elements get a matching score of 0.
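
As a rough sketch of that matching step (Python, with hypothetical names;
nothing here comes from the URC spec), assume the template and each
candidate URC have already been reduced to a mapping from element name to
a list of values. Treating "contains, ignoring case" as the match rule and
counting one point per matched value are both assumptions made only for
illustration:

```python
def score_candidate(template, candidate):
    """Score a candidate URC against a query template.

    Both arguments map element names to lists of string values.
    An element the candidate does not carry contributes 0.
    """
    total = 0
    for name, wanted_values in template.items():
        have = candidate.get(name)
        if have is None:
            continue  # unknown element: matching score of 0
        for wanted in wanted_values:
            # Assumed rule: case-insensitive substring match.
            if any(wanted.lower() in value.lower() for value in have):
                total += 1
    return total


template = {"title": ["Cats"], "mood": ["cheerful"]}
candidate = {"title": ["Cats in Egypt"], "author": ["Smith, Fred"]}
print(score_candidate(template, candidate))  # prints 1
```

The server never consults the attribute set definition here; the element
names alone drive the comparison, and the unrecognised "mood" element
simply adds nothing to the score.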


> (NOTE: this suggests that there is a possibly useful
> modification that can be made to the SGML tags of the URC model, which
> is to include the attribute name in each tag. This should not be seen
> as the trusted way to verify attribute sets, but can be used as a
> shortcut that avoids the need for the DNS - or whatever - lookup of
> attribute sets.)

Perhaps you meant "include the attribute set NAME in each tag", a la
<foo aid="bar">
<baz aid="zog">
etc. ?

This was something that was discussed shortly after the San Jose IETF.
Initially I thought this was the thing to do, but Terry Allen convinced
me that this doesn't add any capability and introduces complexity.
The complexity seems clear enough. The fact that it offers no extra
capability comes from the fact that it is easy to declare a new attribute
set that does just the mixing and matching that you might want.


> Consider the following example object (based on one constructed by Ron
> Daniel in the course of a discussion on the searching problem):
>
> <!doctype urc SYSTEM "urn:x-dns:uri.ansa.co.uk:method-dtd-7">
> <urc>
> <author method="m1">Smith, F.</author>
> <subject method="m2">Cats</subject>
> <URL></URL>
> <results>
> (Initially empty, this container holds the results of the
> searches.)
> </results>
> <methods>
> <m1 lang="niceScript">
> (A script written in the niceScript(TM) language that
> takes an argument which is the contents of the author
> field, splits on the comma, looks for an exact match
> of the last name and a first name beginning with what
> is left after the split. Returns NULL if there is no
> match, and a list of attribute:value pairs if there is.
> The attributes in this list lie in the intersection of
> those listed in the template and the Attribute-set of
> the candidate URC and the values are taken from the
> candidate URC.)
> </m1>
> <m2 lang="nastyScript">
> (A script that takes its argument from this template URC,
> compares it with the subject fields in the candidate URC,
> and returns TRUE if any of the subject lines contain the
> string "Cats", FALSE otherwise.)
> </m2>
> </methods>
> <interface>
> <main lang="pseudocode">
> bind input to name GLOB
> if (invoke(SELF.m2(GLOB)))
> insert(SELF.results, SELF.m1(GLOB))
> </main>
> </interface>
> </urc>


A question for the real SGML weenies out on the list - can we use
Processing Instructions to convey this information on how matching
should be done? Should we use them for that purpose? To be more specific,
the URC spec could define certain general entities such as &exact; and
&contains; that all resolvers would have to support. The resolvers
could define the implementation instructions for those operations
using processing instructions. This does not, however, address the need
to supply novel processing commands to a resolver.
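
To make the idea of a fixed set of required operations concrete, here is
a small Python sketch. The table MATCH_OPS and the function apply_match
are hypothetical names, and the two rules shown are only an assumed
reading of what "exact" and "contains" would mean:

```python
# Operations every resolver would be required to support (assumed set).
MATCH_OPS = {
    "exact": lambda wanted, value: wanted == value,
    "contains": lambda wanted, value: wanted in value,
}


def apply_match(op_name, wanted, value):
    """Apply a named match operation, rejecting names we do not know."""
    try:
        op = MATCH_OPS[op_name]
    except KeyError:
        raise ValueError(f"unsupported match operation: {op_name}") from None
    return op(wanted, value)


print(apply_match("contains", "Cats", "Cats in Egypt"))  # prints True
print(apply_match("exact", "Cats", "Cats in Egypt"))     # prints False
```

The ValueError branch is exactly the gap noted above: a fixed table gives
a client no way to hand the resolver an operation it has never seen.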

The example above had a URC with a <results> element.
I am considering changing the default attribute set DTD to allow a URC
to contain other URCs. Doing a C2C search would return a URC that
contained all the URCs that met the search criteria, thus eliminating
the need for the <results> element in the example above. For example:

<!doctype urc SYSTEM "urn:x-dns:uri.iana.org:default-attribs">
<urc>
 <urc>
  <urn>foo</urn>
  <author>Smith, Fred</author>
  <title>Cats in Egypt</title>
  <instance>
   <url>bar</url>
   <form scheme="IMT">text/html</form>
  </instance>
 </urc>
 <urc>
  <urn>baz</urn>
  <author>Nugent, Ted</author>
  <author>Smith, Fred</author>
  <title>Cat Scratch Fever: The Novel</title>
  <subject scheme="KidCode">21.language.sex</subject>
  <instance>
   <url>zog</url>
   <form scheme="IMT">application/postscript</form>
   <access>
    <read scheme="AgeCertifiers">21 or older</read>
   </access>
  </instance>
 </urc>
</urc>
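
A client could walk such a container with very little code. The sketch
below is only illustrative: it assumes the result arrives as well-formed
markup that a stock XML parser accepts (so the doctype line is left off
and the records are trimmed), and list_results is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

RESULT = """<urc>
 <urc>
  <urn>foo</urn>
  <author>Smith, Fred</author>
  <title>Cats in Egypt</title>
 </urc>
 <urc>
  <urn>baz</urn>
  <author>Nugent, Ted</author>
  <author>Smith, Fred</author>
  <title>Cat Scratch Fever: The Novel</title>
 </urc>
</urc>"""


def list_results(markup):
    """Return one summary dict per URC nested directly in the container."""
    container = ET.fromstring(markup)
    results = []
    for inner in container.findall("urc"):  # direct children only
        results.append({
            "urn": inner.findtext("urn"),
            "title": inner.findtext("title"),
            "authors": [a.text for a in inner.findall("author")],
        })
    return results


for hit in list_results(RESULT):
    print(hit["urn"], "-", hit["title"])
```

Because the outer and inner elements share one name, the client needs no
special knowledge of a results wrapper; it just iterates over the
children of whatever URC comes back.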
 
Does anyone have particular thoughts on that?


> Now conducting a search for books on Cats by F. Smith is easy. The
> above URC is passed to the search server, which binds it to the local
> temporary name SD984, say, lines up all the candidate URCs that URC
> SD984 is authorised to see, and invokes the main interface, SD984.main
> (this is a matter of convention, and could be built into the syntax
> more smoothly), by passing it a legal URC.

Mark's example provides this main routine, which I have been assuming would
be implicit. Anyone care to argue for or against either approach? I think
leaving it implicit makes it easy to do simple stuff, at the cost of not
being able to do more complex stuff later.



> Now, the detailed syntax is up for a lot of improvement, but this
> scenario captures the basic searching powers that I think should be
> associated with URCs, and it does it in a way that is arbitrarily
> extensible to future searching methods.

Right.


> It is important to notice that there will certainly be a library of
> basic search objects that can be used to build searches. It is
> expected that very few users will construct searching code themselves.

I think users will typically see a form with Author, Title, Subject fields,
and it is up to the browser or script to build the query.

The notion of a library of basic search objects is what I was alluding to
earlier with my mumblings about processing instructions and entities.

Using the attribute set definition opens up the possibility of constructing
the form on the fly, with the elements from the ASdef. HotJava, here we
come!
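
A minimal sketch of form generation, assuming the attribute set
definition has already been resolved and boiled down to a plain list of
element names. The function build_form and the "/search" action URL are
hypothetical:

```python
from html import escape


def build_form(element_names, action="/search"):
    """Build an HTML query form with one text field per element name."""
    lines = [f'<form action="{escape(action)}" method="get">']
    for name in element_names:
        safe = escape(name)  # guard against markup in element names
        lines.append(
            f'  <label>{safe.capitalize()} <input name="{safe}"></label>'
        )
    lines.append('  <input type="submit" value="GO">')
    lines.append("</form>")
    return "\n".join(lines)


print(build_form(["author", "title", "subject"]))
```

Fields the user leaves blank would simply be left out of the template
URC that the browser or script submits.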


> None of this addresses directly the need for a URC query language as
> outlined in the spec. However, I think it shows what role that
> language will play. It will provide base primitives on which the
> scripts can draw. Clearly it will have to provide a basic chunk of
> SQLish functionality like >, <, AND, OR, NOT, and a few more.

I think it is a good first stab at how the query language might work,
although there are a lot of details to work out.
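
One way to picture such base primitives is as small composable
predicates. The Python sketch below is an assumption about shape only,
not a proposal for the actual language; field, AND, OR and NOT are
hypothetical names, and a URC is again modelled as a mapping from
element name to a list of values:

```python
def field(name, test):
    """True if any value of element `name` passes `test`."""
    return lambda urc: any(test(v) for v in urc.get(name, []))


def AND(*preds):
    return lambda urc: all(p(urc) for p in preds)


def OR(*preds):
    return lambda urc: any(p(urc) for p in preds)


def NOT(pred):
    return lambda urc: not pred(urc)


# Books on Cats by F. Smith, built from the primitives above.
query = AND(
    field("author", lambda v: v.startswith("Smith, F")),
    field("subject", lambda v: "Cats" in v),
)

hit = {"author": ["Smith, Fred"], "subject": ["Cats"]}
miss = {"author": ["Nugent, Ted", "Smith, Fred"], "subject": ["Dogs"]}
print(query(hit), query(miss))  # prints True False
```

Scripts like the m1 and m2 methods in the quoted example could then be
written as combinations of these pieces instead of free-standing code.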


-- 
Ron Daniel Jr.                email: rdaniel@acl.lanl.gov
Advanced Computing Lab        voice: (505) 665-0597
MS B-287  TA-3  Bldg. 2011      fax: (505) 665-4939
Los Alamos National Lab        http://www.acl.lanl.gov/~rdaniel/
Los Alamos, NM,  87545    tautology: "Conformity is very popular"

Received on Tuesday, 20 June 1995 12:41:58 UTC