- From: Mail Delivery Subsystem <MAILER-DAEMON@ansa.co.uk>
- Date: Thu, 22 Jun 95 12:20:08 +0100
- To: <msm@ansa.co.uk>
   ----- Transcript of session follows -----
<<< RCPT To:<rdaniel%acl.lanl.govuri@bunyip>
<<< DATA
550 bunyip.tcp... 550 Host unknown
550 <rdaniel%acl.lanl.govuri@bunyip>... Host unknown

   ----- Unsent message follows -----
Received: by yscydion.ansa.co.uk; Thu, 22 Jun 95 12:20:08 +0100
Return-Path: <msm>
Received: by euclid.ansa.co.uk; Thu, 22 Jun 95 12:20:07 +0100
From: Mark Madsen <msm>
Message-Id: <9506221120.AA01944@euclid.ansa.co.uk>
Subject: Re: A Note on URC Querying
To: rdaniel%acl.lanl.govuri@bunyip
Date: Thu, 22 Jun 95 12:20:07 BST
Cc: msm
In-Reply-To: <9506201041.ZM2373@idaknow.acl.lanl.gov>; from "Ronald E. Daniel" at Jun 20, 95 10:41 am
Mailer: Elm [revision: 70.85]

Ron Daniel commented:

> Thus spoke Mark Madsen (at least on Fri, 16 Jun 95 at 16:14:32 BST):
>
> > The basic scenario in searching through metadata is to establish the
> > context in which the search is to be conducted, and then search for
> > appropriate metadata by using the template appropriate to that
> > context, filling in the components that one requires, specifying the
> > criteria according to which a match is to be considered satisfactory,
> > and pulling the big red handle labelled "GO".
> >
> > The search context is represented in Ron's model by the attribute set
> > of the metadata (and this defines the template). It is assumed that
> > if the attribute set is not available, then the attribute set id
> > (which is part of the URC) can be resolved to produce the attribute
> > set.
>
> Let me expand on this a bit, and then offer a simplification. Mark
> says that the search context is represented by the Attribute set ID (AID).
> I believe he refers to the fact that AIDs can be used to restrict the
> types of resources sought. For example, if I am looking for satellite
> images, I don't need to look at resources described using an attribute set
> developed for poetry.

This is exactly right: search_context <-> attribute_set_id, and
search_template <-> attribute_set.
Satellite images and poetry can reasonably be expected to have
distinguishable attribute sets (and ids), while poetry about satellite
images and satellite images of poetry are expected to have attribute
sets that correspond to derived classes of those for poetry and
satellite images respectively.

> Specifying the attribute set defines all the things that I can ask for.
> However, the submitted template will typically only contain a fraction
> of the allowed elements. For example, I might issue a query using the
> default attribute set, and only fill in the Title or URN elements.

Indeed, and this behaviour captures all our normal prejudices about
searching: the more specific your description of what you want, the
less you will generally receive in return.

> If the client has prepared the template URC
> according to some attribute set, the server may not need to resolve
> the AID and fetch the AS definition before processing the query. The
> element names in the query template are probably good enough for the
> server to go on, and any unknown elements get a matching score of 0.

This is one option among many, but probably represents the simplest
heuristic in the absence of resolution to the complete attribute set.
This behaviour may make a sensible default for most cases, but should
not be defined as a requirement.

> > (NOTE: this suggests that there is a possibly useful
> > modification that can be made to the SGML tags of the URC model, which
> > is to include the attribute name in each tag. This should not be seen
> > as the trusted way to verify attribute sets, but can be used as a
> > shortcut that avoids the need for the DNS - or whatever - lookup of
> > attribute sets.)
>
> Perhaps you meant "include the attribute set NAME in each tag", ala
>    <foo aid="bar">
>    <baz aid="zog">
> etc. ?

Yes, that is what I meant, the s-e-t bit got lost in the typing
("there's many a slip 'twixt brainstem and fingertip" :-).
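
The matching heuristic Ron describes above (score only the elements the
template fills in, and give unknown elements a matching score of 0) can
be sketched roughly as follows. This is only an illustration under
stated assumptions: the function name score_candidate, the dict-based
representation of a URC, and the case-insensitive substring test are
all hypothetical and are not part of any URC spec or of the messages
quoted here.

```python
def score_candidate(template, candidate, known_elements):
    """Score one candidate URC against a query template (hypothetical sketch).

    template:       dict mapping element name -> wanted string ("" = unfilled)
    candidate:      dict mapping element name -> list of values in the URC
    known_elements: set of element names the server understands
    """
    score = 0
    for name, wanted in template.items():
        if not wanted:
            continue  # unfilled template slot: places no constraint
        if name not in known_elements:
            continue  # unknown element: contributes a matching score of 0
        values = candidate.get(name, [])
        # Assumption: a case-insensitive substring test stands in for
        # whatever matching rule a real server would apply.
        if any(wanted.lower() in value.lower() for value in values):
            score += 1
    return score


if __name__ == "__main__":
    known = {"urn", "author", "title", "subject"}
    template = {"author": "Smith", "title": "Egypt", "x-mood": "grumpy"}
    candidate = {"urn": ["foo"], "author": ["Smith, Fred"],
                 "title": ["Cats in Egypt"]}
    print(score_candidate(template, candidate, known))
```

Filling in more template elements can only keep or shrink the set of
candidates that reach the top score, which is the "more specific query,
fewer results" behaviour noted above.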
> This was something that was discussed shortly after the San Jose IETF.
> Initially I thought this was the thing to do, but Terry Allen convinced
> me that this doesn't add any capability and introduces complexity.
> The complexity seems clear enough. The fact that it offers no extra
> capability comes from the fact that it is easy to declare a new attribute
> set that does just the mixing and matching that you might want.

The point is that you can then avoid the need to resolve all segments
of the attribute set in general: you only need to resolve derived
attribute sets back to the level where you have a cache of higher-level
attribute sets. It's also in line with the principle that an object
should be auto-aware.

> > Consider the following example object (based on one constructed by Ron
> > Daniel in the course of a discussion on the searching problem):
> >
> > <!doctype urc SYSTEM "urn:x-dns:uri.ansa.co.uk:method-dtd-7">
> > <urc>
> >   <author method="m1">Smith, F.</author>
> >   <subject method="m2">Cats</subject>
> >   <URL></URL>
> >   <results>
> >     (Initially empty, this container holds the results of the
> >     searches.)
> >   </results>
> >   <methods>
> >     <m1 lang="niceScript">
> >       (A script written in the niceScript(TM) language that
> >       takes an argument which is the contents of the author
> >       field, splits on the comma, looks for an exact match
> >       of the last name and a first name beginning with what
> >       is left after the split. Returns NULL if there is no
> >       match, and a list of attribute:value pairs if there is.
> >       The attributes in this list lie in the intersection of
> >       those listed in the template and the Attribute-set of
> >       the candidate URC and the values are taken from the
> >       candidate URC.)
> >     </m1>
> >     <m2 lang="nastyScript">
> >       (A script that takes its argument from this template URC,
> >       compares it with the subject fields in the candidate URC,
> >       and returns TRUE if any of the subject lines contain the
> >       string "Cats", FALSE otherwise.)
> >     </m2>
> >   </methods>
> >   <interface>
> >     <main lang="pseudocode">
> >       bind input to name GLOB
> >       if (invoke(SELF.m2(GLOB)))
> >         insert(SELF.results, SELF.m1(GLOB))
> >     </main>
> >   </interface>
> > </urc>
>
> A question for the real SGML weenies out on the list - can we use
> Processing Instructions to convey this information on how matching
> should be done? Should we use them for that purpose? To be more specific,
> the URC spec could define certain parameter entities such as &exact; and
> &contains; that all resolvers would have to support. The resolvers
> could define the implementation instructions for those operations
> using processing instructions. This does not, however, address the need
> for being able to supply novel processing commands to a resolver.

Being able to supply novel processing capabilities is exactly what
extensibility requires, especially in the world of objects. The issue
of how the search server provides the necessary support should also be
left open, as I implied in the example.

I would also appreciate hearing how much of my example could be
reworked into SGML Processing Instructions. However, I would be averse
to tying the querying/searching model tightly to the capabilities of
SGML only.

> The example above had a URC with a <results> element.
> I am considering changing the default attribute set DTD to allow a URC
> to contain other URCs. Doing a C2C search would return a URC that
> contained all the URCs that met the search criteria, thus eliminating
> the need for the <results> element in the example above.
> For example:
>
> <!doctype urc SYSTEM "urn:x-dns:uri.iana.org:default-attribs">
> <urc>
>   <urc>
>     <urn>foo</urn>
>     <author>Smith, Fred</author>
>     <title>Cats in Egypt</title>
>     <instance>
>       <url>bar</url>
>       <form scheme="IMT">text/html</form>
>     </instance>
>   </urc>
>   <urc>
>     <urn>baz</urn>
>     <author>Nugent, Ted</author>
>     <author>Smith, Fred</author>
>     <title>Cat Scratch Fever: The Novel</title>
>     <subject scheme="KidCode">21.language.sex</subject>
>     <instance>
>       <url>zog</url>
>       <form scheme="IMT">application/postscript</form>
>       <access>
>         <read scheme="AgeCertifiers">21 or older</read>
>       </access>
>     </instance>
>   </urc>
> </urc>
>
> Does anyone have particular thoughts on that?

There's the obvious problem that you can fall down an infinite well
because you resolve the toplevel URC, and get another URC inside it.
You resolve this one, and get another URC containing the URN of the
first. But just because you know that someone is going to shoot
themselves in the foot with it doesn't mean that it's not a good
idea :-)

In fact, I think that the spec as it stands already allows this. To
avoid it, you would have to explicitly disallow it.

BTW, is that 21 in human years or cat years? :-)

> > Now conducting a search for books on Cats by F. Smith is easy. The
> > above URC is passed to the search server, which binds it to the local
> > temporary name SD984, say, lines up all the candidate URCs that URC
> > SD984 is authorised to see, and invokes the main interface, SD984.main
> > (this is a matter of convention, and could be built into the syntax
> > more smoothly), by passing it a legal URC.
>
> Mark's example provides this main routine, which I have been assuming would
> be implicit. Anyone care to argue for or against either approach? I think
> leaving it implicit makes it easy to do simple stuff, at the cost of not
> being able to do more complex stuff later.

The problem of entry to an object is an old one.
Either you provide a starting instruction inside the object which is
executed when the object receives the "you are being invoked" message,
or you allow it to have multiple interfaces to present to the outside
and these interfaces each have their own invocation hook inside. I
mainly made it explicit in the example so as to provoke exactly that
question.

> > Now, the detailed syntax is up for a lot of improvement, but this
> > scenario captures the basic searching powers that I think should be
> > associated with URCs, and it does it in a way that is arbitrarily
> > extensible to future searching methods.
>
> Right.

Glad you agree, since that is the most important point in the original
message.

> > It is important to notice that there will certainly be a library of
> > basic search objects that can be used to build searches. It is
> > expected that very few users will construct searching code themselves.
>
> I think users will typically see a form with Author, Title, Subject fields,
> and it is up to the browser or script to build the query.

Typically one has a menu of standard searches with the fill-in form
dependent on the search type chosen, and the ability to customise any
search already constructed. The database/HCI people have this end of
things well tied down already. The tricky bit is providing adequate
support for what the users will ultimately expect.

> The notion of a library of basic search objects is what I was alluding to
> earlier with my mumblings about processing instructions and parameter
> entities.
>
> Using the attribute set definition opens up the possibility of constructing
> the form on the fly, with the elements from the ASdef. HotJava, here we
> come!

Exactly so, except for Java, which also appears to lack much of the
support that will be needed for truly mobile and asynchronous agents.

> > None of this addresses directly the need for a URC query language as
> > outlined in the spec.
> > However, I think it shows what role that
> > language will play. It will provide base primitives on which the
> > scripts can draw. Clearly it will have to provide a basic chunk of
> > SQLish functionality like >, <, AND, OR, NOT, and a few more.
>
> I think it is a good first stab at how the query language might work,
> although there are a lot of details to work out.

Thank you for your reply, your comments, and your assessment. I agree
that there are a lot of details to work out, and I posted the earlier
sketch to the URI list in the hope of feedback that would enable these
details to be worked out fully.

Regards, Mark.

--
________________________________________________________________________
Mark Madsen: <msm@ansa.co.uk> <URL:http://www.ansa.co.uk/Staff/msm.html>
Information Services Framework, The ANSA Project, APM Ltd., Castle Park,
Cambridge CB3 0RD, U.K. <URL:http://www.ansa.co.uk/>; <apm@ansa.co.uk>
Voice: +44-1223-568934; Reception: +44-1223-515010; Fax: +44-1223-359779
Received on Thursday, 22 June 1995 09:44:01 UTC