A Note on URC Querying

Greetings,

This is my first posting to the URI list, which I have been reading
since March.  In it I describe a proposal for how the querying process
sketched in "An SGML-based URC service" by Ron Daniel could be fleshed
out.  It could be titled something like

Metadata-Based Searching Using URCs
                 or,
URCs Are Objects Represented in SGML

I have been following the discussion on the URC proposal with
interest, and it seems that there is a lot of strong feeling on two
sides about the restrictions that should be placed on URN resolutions.

However, little of the discussion has taken into account the effects
that different resolutions will have on searching, so I will describe
some (now relatively old) ideas on how metadata can be used to search
for resources, and how the metadata itself can be searched in ways
that are distributed and potentially scalable.  Ron's SGML-based URC
service provides the framework.

The basic scenario in searching through metadata is to establish the
context in which the search is to be conducted, and then search for
appropriate metadata by using the template appropriate to that
context, filling in the components that one requires, specifying the
criteria according to which a match is to be considered satisfactory,
and pulling the big red handle labelled "GO".

The search context is represented in Ron's model by the attribute set
of the metadata (and this defines the template).  It is assumed that
if the attribute set is not available, then the attribute set id
(which is part of the URC) can be resolved to produce the attribute
set.  (NOTE: this suggests that there is a possibly useful
modification that can be made to the SGML tags of the URC model, which
is to include the attribute name in each tag.  This should not be seen
as the trusted way to verify attribute sets, but can be used as a
shortcut that avoids the need for the DNS - or whatever - lookup of
attribute sets.)

Consider the following example object (based on one constructed by Ron
Daniel in the course of a discussion on the searching problem):

<!doctype urc SYSTEM "urn:x-dns:uri.ansa.co.uk:method-dtd-7">
<urc>
  <author method="m1">Smith, F.</author>
  <subject method="m2">Cats</subject>
  <URL></URL>
  <results>
    (Initially empty, this container holds the results of the
    searches.)
  </results>
  <methods>
    <m1 lang="niceScript">
       (A script written in the niceScript(TM) language that
       takes an argument which is the contents of the author
       field, splits on the comma, looks for an exact match
       of the last name and a first name beginning with what
       is left after the split. Returns NULL if there is no
       match, and a list of attribute:value pairs if there is.
       The attributes in this list lie in the intersection of
       those listed in the template and the Attribute-set of
       the candidate URC and the values are taken from the
       candidate URC.)
    </m1>
    <m2 lang="nastyScript">
      (A script that takes its argument from this template URC,
      compares it with the subject fields in the candidate URC,
      and returns TRUE if any of the subject lines contain the
      string "Cats", FALSE otherwise.)
    </m2>
  </methods>
  <interface>
    <main lang="pseudocode">
      bind input to name GLOB
      if (invoke(SELF.m2(GLOB)))
        insert(SELF.results, SELF.m1(GLOB))
    </main>
  </interface>
</urc>

(Note that exceptions have been ignored, there is no checking or
authorisation, and the signature of this example is inadequately
specified.)
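
To make the behaviour of m1, m2 and main a little more concrete, here
is a minimal Python sketch. It is only an illustration under stated
assumptions, not niceScript or nastyScript: URCs are modelled as plain
dicts mapping an attribute name to a list of values, author names are
assumed to be written either "Last, First" or "First ... Last", and
the helper split_name is hypothetical. Unlike the pseudocode above,
this main skips a NULL result instead of inserting it.

```python
# Hypothetical model: a URC is a dict of attribute name -> list of values.

def split_name(author):
    """Return (last, first) for 'Last, First' or 'First ... Last' (assumed formats)."""
    if "," in author:
        last, first = author.split(",", 1)
        return last.strip(), first.strip()
    parts = author.split()
    if not parts:
        return "", ""
    return parts[-1], " ".join(parts[:-1])

def m1(template, candidate):
    """Exact last-name match plus first-name prefix match on the author field.

    Returns None on no match, otherwise (attribute, value) pairs for the
    attributes present in both URCs, with values taken from the candidate.
    """
    want_last, want_first = split_name(template["author"][0])
    prefix = want_first.rstrip(".")          # "F." -> "F"
    for author in candidate.get("author", []):
        last, first = split_name(author)
        if last == want_last and first.startswith(prefix):
            shared = [a for a in template if a in candidate]
            return [(a, v) for a in shared for v in candidate[a]]
    return None

def m2(template, candidate):
    """True if any subject line of the candidate contains the template's subject."""
    needle = template["subject"][0]
    return any(needle in s for s in candidate.get("subject", []))

def main(template, candidate, results):
    """Counterpart of <main>: run m2 as a filter, then collect m1's output."""
    if m2(template, candidate):
        matched = m1(template, candidate)
        if matched is not None:
            results.extend(matched)

template = {"author": ["Smith, F."], "subject": ["Cats"], "URL": []}
candidate = {
    "author": ["Fred J. Smith"],
    "subject": ["Cats and their cute habit of vomiting furballs"],
    "URL": ["http://www.feline.org/Smith_FJ/furballs.html"],
}
results = []
main(template, candidate, results)
```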

Now conducting a search for books on Cats by F. Smith is easy.  The
above URC is passed to the search server, which binds it to a local
temporary name (SD984, say), lines up all the candidate URCs that
SD984 is authorised to see, and invokes the main interface,
SD984.main, once for each candidate, passing it that candidate as a
legal URC.  (Calling the entry point "main" is a matter of convention,
and could be built into the syntax more smoothly.)  The results
returned in the appropriate container may look like

author:Fred J. Smith
subject:Cats and their cute habit of vomiting furballs
URL:http://www.feline.org/Smith_FJ/furballs.html

and the entire object could in principle be returned.  (Yes, I know
it's heavier on bandwidth, but I'm thinking mobility here - see
"Agents for Knowledge Resource Mapping in the World-Wide Web"
<URL:http://www.ansa.co.uk/phase3-doc-root/approved/APM.1473.01.html>
for more details.)
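
The server side of this scenario might be sketched as follows. This is
a rough Python illustration and not a proposed design: the class
SearchServer, the idea of tagging each candidate with the set of
principals allowed to see it, and the example URLs and principal names
are all my assumptions, and a Python callable stands in for the main
interface script carried by the query URC.

```python
import itertools

class SearchServer:
    """Hypothetical search server holding candidate URCs."""

    def __init__(self, candidates):
        # candidates: list of (urc_dict, set of principals allowed to see it)
        self.candidates = candidates
        self._counter = itertools.count(984)   # source of temporary names
        self.bound = {}                        # temporary name -> query URC

    def search(self, template, main, principal):
        """Bind the query URC, run its main over each visible candidate."""
        name = f"SD{next(self._counter)}"
        self.bound[name] = template
        results = []
        try:
            for urc, allowed in self.candidates:
                if principal in allowed:       # authorisation check (assumed model)
                    main(template, urc, results)
        finally:
            del self.bound[name]               # the binding is only temporary
        return name, results

def cats_main(template, candidate, results):
    """Toy main interface: collect URLs of candidates whose subject matches."""
    needle = template["subject"][0]
    if any(needle in s for s in candidate.get("subject", [])):
        results.extend(candidate.get("URL", []))

template = {"subject": ["Cats"]}
server = SearchServer([
    ({"subject": ["Cats and furballs"], "URL": ["http://example.org/a"]}, {"alice"}),
    ({"subject": ["Cats at sea"], "URL": ["http://example.org/b"]}, {"bob"}),
    ({"subject": ["Dogs"], "URL": ["http://example.org/c"]}, {"alice"}),
])
name, hits = server.search(template, cats_main, "alice")
```

Only the first candidate is both visible to "alice" and a match, so it
is the only one whose URL ends up in hits.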

Now, the detailed syntax is up for a lot of improvement, but this
scenario captures the basic searching powers that I think should be
associated with URCs, and it does it in a way that is arbitrarily
extensible to future searching methods.

It is important to notice that there will certainly be a library of
basic search objects that can be used to build searches.  It is
expected that very few users will construct searching code themselves.

Also, I implied above that the languages used were interpreted
scripting languages (Java or Safe-Tcl, for example), since these have
useful authentication properties.  But there is nothing to stop a
search server from running binary code as well, provided that it
accepts such code from an authenticated source only.

None of this addresses directly the need for a URC query language as
outlined in the spec.  However, I think it shows what role that
language will play.  It will provide base primitives on which the
scripts can draw.  Clearly it will have to provide a basic chunk of
SQLish functionality like >, <, AND, OR, NOT, and a few more.
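
As a hedged illustration of what such primitives might look like when
a script draws on them, here is a small Python sketch using predicate
combinators. None of these names come from the spec: attr, AND, OR and
NOT are hypothetical helpers, "year" is an invented attribute, and
URCs are again modelled as dicts of attribute name to list of values.

```python
import operator

def attr(name, op, value):
    """Predicate: true if any value of attribute `name` satisfies op(v, value)."""
    def pred(urc):
        return any(op(v, value) for v in urc.get(name, []))
    return pred

def AND(*preds):
    return lambda urc: all(p(urc) for p in preds)

def OR(*preds):
    return lambda urc: any(p(urc) for p in preds)

def NOT(pred):
    return lambda urc: not pred(urc)

# "Subject mentions Cats, and not published before 1990."
# operator.contains(v, "Cats") is the same test as: "Cats" in v
query = AND(
    attr("subject", operator.contains, "Cats"),
    NOT(attr("year", operator.lt, 1990)),
)

recent_cats = {"subject": ["Cats and furballs"], "year": [1995]}
old_cats = {"subject": ["Cats of old"], "year": [1985]}
recent_dogs = {"subject": ["Dogs"], "year": [1995]}
```

A script such as m2 could then be little more than a call to a
predicate built this way.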

Apologies for the length of this message.  Thanks in advance for any
comments.

Mark.

--
________________________________________________________________________
Mark Madsen: <msm@ansa.co.uk> <URL:http://www.ansa.co.uk/Staff/msm.html>
Information Services Framework, The ANSA Project, APM Ltd., Castle Park,
Cambridge CB3 0RD, U.K.  <URL:http://www.ansa.co.uk/>;  <apm@ansa.co.uk>
Voice: +44-1223-568934; Reception: +44-1223-515010; Fax: +44-1223-359779

Received on Friday, 16 June 1995 11:14:03 UTC