Submission: draft-henderson-dasl-scenarios-00.txt

INTERNET-DRAFT                                        Rick Henderson
draft-henderson-dasl-scenarios-00.html       Netscape Communications
September 18, 1998
Expires Mar 23, 1999


Scenarios for DASL
  
  

Status of this Memo
  
  This document is an Internet-Draft. Internet-Drafts are working
  documents of the Internet Engineering Task Force (IETF), its areas,
  and its working groups. Note that other groups may also distribute
  working documents as Internet-Drafts.
  
  Internet-Drafts are draft documents valid for a maximum of six months
  and may be updated, replaced, or obsoleted by other documents at any
  time. It is inappropriate to use Internet-Drafts as reference material
  or to cite them other than as "work in progress".
  
  To learn the current status of any Internet-Draft, please check the
  "1id-abstracts.txt" listing contained in the Internet-Drafts shadow
  directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
  munnari.oz.au (Pacific Rim), ds.internic.net (US East coast), or
  ftp.isi.edu (US West coast). Further information about the IETF can be
  found at URL: http://www.ietf.org/.
  
  Distribution of this document is unlimited. Please send comments to
  the mailing list at www-webdav-dasl@w3.org, which may be joined by
  sending a message with subject "subscribe" to
  www-webdav-dasl-request@w3.org.
  
  Discussions on the list are archived at
  http://www.w3.org/pub/WWW/Archives/Public/www-webdav-dasl.

Abstract
  
  The Distributed Authoring and Versioning protocol [WEBDAV] defines
  simple mechanisms to assign and retrieve values for properties. This
  document presents scenarios for DAV Searching and Locating (DASL), a
  WebDAV extension to support efficient searching for resources based
  on WebDAV properties and content. These scenarios are intended to
  suggest some of the uses to which DASL could be put. This may in turn
  motivate decisions about what is essential to DASL and what may be
  considered optional.

1. Introduction
  
  The scenarios below are intended to provoke discussion of what DASL
  should and shouldn't do. DASL need not support all of them: for some,
  the open question is how fully DASL should support them, and for
  others DASL would be only a small piece of a complete solution. At
  least one is probably impossible. These scenarios should

Henderson                                                      [Page 1]

Internet Draft              DASL Scenarios               September 1998


  encompass most of the sorts of things that we expect DASL to play a
  part in.

2. Scenarios
  
  The scenarios below are roughly grouped by the following topics:
  Resource Management, Seeking Information, Navigation, and "Search
  Isn't Always Enough".

2.1 Resource Management
  
  Search could be used to help keep track of what is going on with a set
  of DAV resources. Some DASL queries that might help with this:
     * Find all locked resources.
     * Find the owners of all locked resources.
     * Search for resources that have been locked for more than 1 week.
       [Though desirable, this is impossible, since DAV does not record
       the time when a resource was locked.]
     * Search for resources that have not changed in the last year.
  These queries could help determine which resources are likely to be
  undergoing changes, who is changing them, which resources have been
  locked for too long, and which resources are no longer changing.
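
  As an illustration only, the feasible queries above can be sketched
  in Python over an in-memory list of records. The record layout and
  the field names (href, lock_owner, last_modified) are assumptions
  made for this example; they are not defined by DAV or DASL, and no
  DASL query syntax is implied.

```python
from datetime import datetime, timedelta

# Hypothetical stand-in for a set of DAV resources (illustrative fields only).
resources = [
    {"href": "/specs/a.html", "lock_owner": "alice",
     "last_modified": datetime(1998, 9, 1)},
    {"href": "/specs/b.html", "lock_owner": None,
     "last_modified": datetime(1997, 3, 15)},
    {"href": "/plans/c.html", "lock_owner": "bob",
     "last_modified": datetime(1998, 8, 20)},
]

def locked(items):
    """Resources that currently carry a lock."""
    return [r for r in items if r["lock_owner"] is not None]

def lock_owners(items):
    """Distinct owners of locked resources, sorted for stable output."""
    return sorted({r["lock_owner"] for r in locked(items)})

def unchanged_since(items, cutoff):
    """Resources whose last modification is earlier than the cutoff."""
    return [r for r in items if r["last_modified"] < cutoff]

now = datetime(1998, 9, 18)
stale = unchanged_since(resources, now - timedelta(days=365))
```

  The "locked for more than 1 week" query has no counterpart here for
  the reason given in the list: no lock time is recorded.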

2.2 Seeking Information
  
  

2.2.1 Finding a specific resource using content search
  
  A user's information need may be like this: "I need that article I
  saw a while back that made a connection between epilepsy, migraines,
  and zinc." They can do a content-based search for resources that
  contain all of the words epilepsy, migraine, and zinc.
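
  A minimal Python sketch of such an all-words (conjunctive) match
  follows. The tokenizer and the sample texts are assumptions for
  illustration; a real server would use its own indexing.

```python
import re

def words(text):
    """Lower-cased set of alphabetic words in the text (naive tokenizer)."""
    return set(re.findall(r"[a-z]+", text.lower()))

def contains_all(text, terms):
    """True when every search term occurs somewhere in the text."""
    return {t.lower() for t in terms} <= words(text)

# Hypothetical resource contents keyed by location.
docs = {
    "/a": "Zinc levels, epilepsy and migraine: a possible connection.",
    "/b": "Zinc supplements and the common cold.",
}
query = ["epilepsy", "migraine", "zinc"]
hits = [href for href, text in docs.items() if contains_all(text, query)]
```

  Because this sketch matches exact word forms, a text that contains
  only "migraines" would not match the term "migraine".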

2.2.2 Finding a specific resource by phrase
  
  A user remembers a resource that they liked and wants to see it again
  but doesn't have it bookmarked and doesn't remember its location.
  They do remember a key phrase from the content, though. They can
  search for the phrase, such as "invisible car", and find the resource
  without picking through a large number of irrelevant resources. Phrase
  search matters here: "invisible" and "car" are each common enough that
  many more resources contain both words than contain the phrase
  "invisible car".
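
  The difference from an all-words search can be sketched in Python as
  a check for the query words in adjacent positions. The tokenizer is
  an assumption made for this example.

```python
import re

def tokens(text):
    """Lower-cased alphabetic words in document order (naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def contains_phrase(text, phrase):
    """True when the phrase's words occur consecutively in the text."""
    doc, wanted = tokens(text), tokens(phrase)
    n = len(wanted)
    if n == 0:
        return False
    return any(doc[i:i + n] == wanted for i in range(len(doc) - n + 1))
```

  Both "The invisible car was parked outside" and "The car had an
  invisible scratch" contain both words, but only the first contains
  the phrase.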

2.2.3 Finding a specific resource by author and date range
  
  A user's information need may be expressed something like this: "I
  need that trip report that John Doe wrote last spring." They don't
  know its location or its title. They can search for resources with
  author equal to "John Doe" and creation date later than 1998/01/01
  and earlier than 1998/06/01. This may yield few enough resources to easily
  find the one of interest.

2.2.4 Finding a specific resource using both content and property search
  
  The user who wanted to find the trip report that John Doe wrote last
  spring may find that John Doe was very prolific and wrote several
  hundred things last spring. The user may do better using both content
  and property search. They can search for resources with author equal
  to "John Doe" and creation date later than 1998/01/01 and earlier than
  1998/06/01 that contain some of the words IETF, Redmond, and DASL.
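
  A Python sketch of combining the property constraints with a content
  constraint follows. The records, the field names (author, created,
  text), and the sample values are hypothetical.

```python
import re
from datetime import date

# Hypothetical resources with two properties and some content.
resources = [
    {"href": "/reports/1", "author": "John Doe", "created": date(1998, 3, 12),
     "text": "Trip report: IETF meeting notes on DASL."},
    {"href": "/reports/2", "author": "John Doe", "created": date(1998, 4, 2),
     "text": "Quarterly budget summary."},
    {"href": "/reports/3", "author": "Jane Roe", "created": date(1998, 3, 20),
     "text": "Visit to Redmond."},
]
KEYWORDS = {"ietf", "redmond", "dasl"}

def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def matches_properties(r):
    """Author equals John Doe and creation date is inside the range."""
    return (r["author"] == "John Doe"
            and date(1998, 1, 1) < r["created"] < date(1998, 6, 1))

def matches_content(r):
    """At least one of the keywords occurs in the content."""
    return bool(words(r["text"]) & KEYWORDS)

property_hits = [r["href"] for r in resources if matches_properties(r)]
combined_hits = [r["href"] for r in resources
                 if matches_properties(r) and matches_content(r)]
```

  In this sample the property search alone returns two resources and
  the combined search narrows that to one.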

2.2.5 Finding resources of a particular kind
  
  DASL could be used to find resources of a particular kind, such as
  images. This could be used directly by an end user looking for
  interesting images, or by a program that does some kind of processing
  on the images, such as selecting GIF images that are portraits. A
  query that asked for MIME type = image/* could gather that data.
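
  One way to read the image/* pattern is as a wildcard over the media
  type. The Python sketch below makes that assumption and uses the
  standard fnmatch module; the function name is hypothetical.

```python
from fnmatch import fnmatchcase

def type_matches(content_type, pattern):
    """Match a media type against a wildcard pattern such as image/*.

    Parameters after ';' (for example charset) are ignored, and the
    comparison is case-insensitive.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    return fnmatchcase(media_type, pattern.lower())

content_types = ["image/gif", "text/html; charset=utf-8", "Image/PNG"]
images = [t for t in content_types if type_matches(t, "image/*")]
```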

2.2.6 Finding resources in a particular language
  
  Assuming that a language property is set, a search could be
  restricted to resources that are in a particular language, say German.
  It would be possible for a site to set this property automatically
  using language recognition technology.

2.2.7 Searching for information on multiple servers
  
  A user seeking information of some sort may not know which server(s)
  contain the information they are seeking. The DASL client program can
  send the content-based query to several servers without having to
  translate the query into a different query syntax for each server. For
  property queries, the DASL client can query the property schema on
  the DASL servers and send a property query, or a mixed property and
  content query, to the set of DASL servers that share a common property
  schema. The results from such a cross-server search can be sorted
  according to property values or according to relevance score.
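
  The final merge step can be sketched in Python as follows. The
  server names, the result layout, and the score values are
  hypothetical, and the sketch assumes that relevance scores from
  different servers are directly comparable.

```python
def merge_by_relevance(per_server):
    """Flatten per-server hit lists and sort by descending score.

    Assumes scores from different servers are on a comparable scale.
    """
    merged = [
        {"server": server, "href": hit["href"], "score": hit["score"]}
        for server, hits in per_server.items()
        for hit in hits
    ]
    return sorted(merged, key=lambda hit: hit["score"], reverse=True)

# Hypothetical results returned by two servers for the same query.
results = {
    "server-a.example": [{"href": "/a/1", "score": 0.91},
                         {"href": "/a/2", "score": 0.40}],
    "server-b.example": [{"href": "/b/1", "score": 0.75}],
}
ranked = merge_by_relevance(results)
```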

2.2.8 Stemming
  
  If a user is searching for information about the hobby of building
  model cars, relevant resources are likely to contain various forms of
  those words: model, models, modeling, as well as car and cars.
  Stemming saves the user from entering all the various forms of the
  words they may want to match. Stemming is sometimes confused with
  right truncation, but it is quite different. In languages such as
  English one can approximate stemming by right truncation of words,
  e.g. "model*" matches "model", "models", "modeling", "modeler", etc.
  This doesn't work well for shorter words: "car*" matches not only car
  and cars, but also carbon, carcinoma, card, etc. For many languages
  right truncation doesn't work well at all, since the forms of a word
  are produced by changing something in the middle or at the beginning
  of the word.
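
  The contrast can be sketched in Python. The suffix stripper below is
  a deliberately tiny toy, written only for this word list; it is not
  a real stemming algorithm and the names are hypothetical.

```python
VOCABULARY = ["car", "cars", "carbon", "carcinoma", "card",
              "model", "models", "modeling", "modeler"]

def truncation_match(prefix, word):
    """Right truncation: 'car*' matches any word starting with 'car'."""
    return word.startswith(prefix)

SUFFIXES = ("ing", "er", "s")

def toy_stem(word):
    """Strip one known suffix if at least three letters remain (toy only)."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

by_truncation = [w for w in VOCABULARY if truncation_match("car", w)]
by_stem = [w for w in VOCABULARY if toy_stem(w) == "car"]
```

  Here by_truncation picks up carbon, carcinoma, and card along with
  car and cars, while by_stem contains only car and cars.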

2.2.9 Word proximity
  
  In the stemming example our user was searching for fairly common
  words, car and model, in an effort to find information on building
  model cars. Many resources that have nothing to do with model cars or
  building models of cars might contain both words. What the user wants
  is resources where model and car are close together. A search that
  takes into account the proximity of the search terms would help filter
  out the irrelevant resources. This is distinct from phrase search as
  described in 2.2.2 and from the conjunctive content search in 2.2.1.
  It differs from phrase search in that the user here is probably also
  interested in "car models", "model cars", and "model of a car". It
  differs from conjunctive search in that the user reasonably expects
  the words to occur close together in a relevant resource, not merely
  somewhere in the same resource.
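
  A proximity test can be sketched in Python as a bound on the
  distance, in words, between any occurrence of the two terms. The
  function name and the distance measure are assumptions for this
  example, and the sketch matches exact word forms only (no stemming).

```python
import re

def tokens(text):
    """Lower-cased alphabetic words in document order (naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def within(text, first, second, max_distance):
    """True when the two terms occur at most max_distance words apart,
    in either order."""
    toks = tokens(text)
    pos_first = [i for i, t in enumerate(toks) if t == first]
    pos_second = [i for i, t in enumerate(toks) if t == second]
    return any(abs(i - j) <= max_distance
               for i in pos_first for j in pos_second)

near = within("How to paint a model of a car", "model", "car", 3)
far = within("The fashion model arrived late because her car broke down",
             "model", "car", 3)
```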

2.2.10 Query By Example
  
  A user has done a search and found some relevant or nearly relevant
  resources and some clearly irrelevant resources. Desiring a result
  set that is both broader and better focused, they pass one or more of
  the relevant resources and one or more of the irrelevant resources
  to a query-by-example operator. The result is a new set of resources
  whose keywords overlap more with the relevant examples than with the
  irrelevant ones. This type of operator saves the user the considerable
  trouble of constructing a new query that filters out the irrelevant
  resources while expanding the set of keywords drawn from the relevant
  resources.
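
  One naive way to realize such an operator, sketched in Python under
  the assumption that plain keyword overlap is an adequate signal, is
  to score each candidate by the words it shares with the relevant
  examples minus the words it shares only with the irrelevant ones.
  The scoring rule and all names here are hypothetical.

```python
import re

def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def query_by_example(candidates, relevant, irrelevant):
    """Rank candidate hrefs by keyword overlap with the examples.

    Score = words shared with relevant examples
            - words shared only with irrelevant examples.
    Candidates with a score of zero or less are dropped.
    """
    liked = set().union(*(words(t) for t in relevant))
    disliked = set().union(*(words(t) for t in irrelevant)) - liked
    scores = {
        href: len(words(text) & liked) - len(words(text) & disliked)
        for href, text in candidates.items()
    }
    keep = [href for href, score in scores.items() if score > 0]
    return sorted(keep, key=lambda href: scores[href], reverse=True)

candidates = {
    "/x": "plastic kits for scale model aircraft",
    "/y": "new car review: fashion meets speed",
}
ranked = query_by_example(
    candidates,
    relevant=["building scale model cars from plastic kits"],
    irrelevant=["fashion model drives new car"],
)
```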

2.3 Navigation
  
  

2.3.1 Site Navigation
  
  While DAV itself is sufficient for basic site navigation, DASL can
  support richer site navigation, in which resources are sorted or
  filtered on the server.

2.3.2 Browse Tree for exploring a resource space
  
  A DASL application could present a browse tree for a set of resources.
  In a browse tree some property is selected at each level of the tree
  to branch on. Thus if the top level property selected were resource
  type, then the unique values of the resource type property for all the
  resources would be the branches of the tree and would be presented to
  the user. So the user might see a list of resource types, say
  "Administrative memo", "Design spec", "Requirements spec", "Test
  plan", "Project schedule". Beneath that another property could be
  selected, say Project, which might display project names with values
  such as "Tuolumne", "Calaveras", "Russian", "Sacramento", "American",
  "Merced". At that point the user might want to view the list of
  resources within these categories, and there might be only a few or
  just one project schedule for project Russian. The same resource space
  might also be explored using properties like Date and Author. (Note:
  DASL will most likely not explicitly support browse trees, but
  searches like 'docType = "Design spec" AND project = "Tuolumne" sorted
  by date' could be used to gather the raw data to generate the
  information for a node in the browse tree.)
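
  As a sketch of how a client might build such a tree from raw search
  results, the Python below groups result rows by each chosen property
  in turn. The rows and the property names (docType, project) are
  hypothetical sample data.

```python
from collections import defaultdict

# Hypothetical rows returned by one or more searches.
rows = [
    {"href": "/r/1", "docType": "Design spec", "project": "Calaveras"},
    {"href": "/r/2", "docType": "Test plan", "project": "Russian"},
    {"href": "/r/3", "docType": "Design spec", "project": "Russian"},
    {"href": "/r/4", "docType": "Design spec", "project": "Russian"},
]

def browse_tree(items, properties):
    """Nest items by each property in order; leaves are lists of hrefs."""
    if not properties:
        return [item["href"] for item in items]
    first, rest = properties[0], properties[1:]
    groups = defaultdict(list)
    for item in items:
        groups[item[first]].append(item)
    return {value: browse_tree(members, rest)
            for value, members in sorted(groups.items())}

tree = browse_tree(rows, ["docType", "project"])
```

  The keys at each level are the unique values of the property chosen
  for that level, which is what the user would see as branches.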

2.3.3 Finding information on a particular topic in an organized
collection
  
  A collection may have been organized according to some taxonomy and
  the keywords chosen accordingly. The user, knowing or having scanned
  the taxonomy, presents a query for general subject equal to gardening
  and subordinate subject equal to bonsai.

2.3.4 Finding information on a particular topic in an unorganized
collection
  
  A collection may not have been organized according to some taxonomy or
  the taxonomy may not be detailed enough for the user's purposes, or
  may be irrelevant to the user's interest. In this case content-based
  search becomes crucial. A user could search for resources containing
  all three of the words "small", "Japanese", and "trees", and likely
  obtain articles on bonsai. If the collection were organized with a
  taxonomy that the user didn't know about, they could then discover the
  keywords from the resources found and use them to find other resources
  with the same categorization.

2.3.5 External taxonomy to view a DASL collection
  
  A user could view various DASL-supporting collections according to the
  user's own taxonomy. Here we assume that the user has a taxonomy in
  which each category has a complex query, and the relevance score that
  the query returns establishes a resource's degree of membership in
  the category. A DASL application could issue a series of these queries
  on a collection resource and thus categorize the resources within the
  collection.

2.4 Search Isn't Always Enough
  
  The following scenarios deal with uses of search where the initial
  search or the basic result list isn't enough by itself to solve the
  user's information need.

2.4.1 Finding the right information by looking at the hit highlights
  
  Because natural language is so context dependent, a content-based
  search that retrieves most of the true positives inevitably retrieves
  false positives as well. The user is left to pick through the
  resources returned to find the ones that are actually relevant.
  Highlight information can make this easier. A DASL application could
  present a list of the sentences that had the hit words in them. This
  is likely to allow the user to discard most of the false positives
  without having to view the whole resource.
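
  Extracting the sentences that contain hit words can be sketched in
  Python as below. The sentence splitter is deliberately naive (it
  splits after '.', '!' or '?') and the function name is hypothetical.

```python
import re

def hit_sentences(text, terms):
    """Return the sentences that contain at least one search term."""
    wanted = {t.lower() for t in terms}
    # Naive split: break after sentence-ending punctuation and whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences
            if wanted & set(re.findall(r"[a-z]+", s.lower()))]

sample = ("Zinc is a metal. The weather was fine. "
          "Some studies link migraine to diet.")
highlights = hit_sentences(sample, ["zinc", "migraine"])
```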

2.4.2 Finding the information in a large resource
  
  The user may do a content-based search that returns a large resource
  of many pages in which only one or a few parts are relevant. Hit
  highlighting will help the user find those parts. A smart DASL
  application could present links to jump to the next hit or
  concentration of hits.

2.4.3 Saved query result
  
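  A user does a search and gets a very large set of results. The user
  then progressively narrows the search down by adding constraints to
  the previous search. If the previous result is saved, each narrower
  search could be run against that result rather than against the
  whole collection.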

2.4.4 Saved query result II
  
  A user does a search and spends some time improving the query so that
  it catches a large set of information on a particular topic without
  bringing in much noise. The query is made available to other users
  with similar information needs. The others are likely to combine that
  query with their own more temporary constraints to achieve their own
  information needs. If saved searches are explicitly part of the DASL
  protocol, it may be easier for servers to recognize repeated queries
  and avoid full re-execution of a search.

3. Internationalization
  
  The queries described above should work equally well for resources or
  properties in any language that can be expressed with Unicode. In
  particular, this means that when two strings are compared for equality
  or ordering, the customary language-specific rules should be used.
  These rules will typically include rules for how case sensitivity is
  determined, the significance of diacritics, ordering of base
  characters, and sorting rules for strings. For example, in the Dutch
  language, a name such as "van Bree" sorts under "B", not "v". HTTP
  provides a means of indicating the language of an entity, and XML
  provides a means of indicating the language of an XML resource (the
  xml:lang attribute), and these should be used in DASL. Note that
  comparison of strings from different languages is out of scope for
  DASL.

4. References
  
  [WEBDAV] Y. Y. Goland, E. J. Whitehead, Jr., A. Faizi, S. R. Carter,
  D. Jensen, "Extensions for Distributed Authoring and
  Versioning on the World Wide Web", April 1998. Internet-Draft,
  work in progress, draft-ietf-webdav-protocol-08.txt.


5. Author's Address
  
  Rick Henderson
  Netscape Communications
  501 E. Middlefield Road
  Mountain View CA 94043
  