INTERNET-DRAFT Rick Henderson
draft-dasl-scenarios-00.html Netscape Communications
September 4, 1998
Expires Mar 4, 1999
Scenarios for DASL
Status of this Memo
This document is an Internet draft. Internet drafts are working documents
of the Internet Engineering Task Force (IETF), its areas and its working groups.
Note that other groups may also distribute working
information as Internet drafts.
Internet Drafts are draft documents valid for a maximum of six months
and can be updated, replaced or obsoleted by other documents at
any time. It is inappropriate to use Internet drafts as reference material
or to cite them other than as "work in progress".
To learn the current status of any Internet draft please check the "1id-abstracts.txt"
listing contained in the Internet drafts shadow
directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), munnari.oz.au
(Pacific Rim), ds.internic.net (US East coast) or ftp.isi.edu (US
West coast). Further information about the IETF can be found at URL:
http://www.ietf.org/.
Distribution of this document is unlimited. Please send comments to
the mailing list at www-webdav-dasl@w3.org, which may be
joined by sending a message with subject "subscribe" to www-webdav-dasl-request@w3.org.
Discussions on the list are archived at http://www.w3.org/pub/WWW/Archives/Public/www-webdav-dasl.
Abstract
The Distributed Authoring and Versioning protocol [WEBDAV] defines simple
mechanisms to assign and retrieve values for properties. This document
presents scenarios for a WebDAV extension to support efficient searching
for resources based on WebDAV properties and content. These scenarios are
intended to suggest some of the uses to which DASL could be put. This
may in turn motivate decisions on what is essential to DASL and what may
be considered extra.
1. Introduction
The scenarios below are intended to provoke discussion of what DASL should
and shouldn't do. DASL need not support all of them: for each scenario it
remains open to what extent DASL should support it directly and to what
extent DASL is only a small piece of what it would take to support it. At least
one is probably impossible. These scenarios should encompass most
of the sorts of things that we expect DASL to play a part in.
2. Scenarios
The scenarios below are roughly grouped into scenarios dealing with the
following topics: Document Management, Seeking Information, Navigation,
Controversial Scenarios, Document Management Alliance, and Various types
of documents DASL could search over.
2.1 Document Management
Search could be used to help keep track of what is going on with a set
of DAV resources. Some DASL queries that might help with this:
- Search for all the documents that are locked.
- Search for all the owners of locked documents.
- Search for documents that have been locked for more than 1 week. [Though
  desirable this is impossible since DAV does not record the time when a
  document was locked]
- Search for documents that have not changed in the last year.
These queries could help determine which documents are likely to be undergoing
changes, who is changing them, which documents have been locked for too
long, and which documents are no longer changing.
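As a rough illustration of the feasible queries above, the following Python
sketch filters an in-memory list of resources. It is not DASL syntax; the
record layout and the field names lock_owner and last_modified are
hypothetical stand-ins for whatever properties a DAV server exposes.

```python
from datetime import datetime, timedelta

# Hypothetical in-memory stand-in for a set of DAV resources.
RESOURCES = [
    {"href": "/specs/a.html", "lock_owner": "alice",
     "last_modified": datetime(1998, 8, 1)},
    {"href": "/specs/b.html", "lock_owner": None,
     "last_modified": datetime(1997, 3, 15)},
    {"href": "/memos/c.html", "lock_owner": "bob",
     "last_modified": datetime(1998, 6, 20)},
]


def locked(resources):
    """Resources that currently have a lock owner."""
    return [r for r in resources if r["lock_owner"] is not None]


def lock_owners(resources):
    """Distinct owners of locked resources, sorted by name."""
    return sorted({r["lock_owner"] for r in locked(resources)})


def unchanged_since(resources, now, days=365):
    """Resources whose last modification is older than `days` days."""
    cutoff = now - timedelta(days=days)
    return [r for r in resources if r["last_modified"] < cutoff]
```

The "locked for more than 1 week" query is deliberately absent, since the
sketch has no lock timestamp to test against.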
2.2 Seeking Information
2.2.1 Finding a specific document by phrase
A user remembers a document that they liked and wants to see again but doesn't
have it bookmarked and doesn't remember its location. They do remember a
key phrase from the content, though. They can search for the phrase,
such as "invisible car", and find the document without picking through
a large number of irrelevant documents.
2.2.2 Finding a specific document by author and date range
A user's information need may be expressed something like this: "I need
that trip report that John Doe wrote last spring." They don't know
its location or its title. They can search for documents with author
equal to "John Doe" and create date greater than 1998/01/01 and less than
1998/06/01. This may yield few enough documents to easily find the one
of interest.
2.2.3 Finding a specific document using content search
Another user's information need may be like this: "I need that article
I saw a while back that made a connection between epilepsy, migraines,
and zinc." They can do a content-based search using the words epilepsy,
migraine, and zinc.
2.2.4 Finding a specific document using both content and property
search
The user who wanted to find the trip report that John Doe wrote last spring
may find that John Doe was very prolific and wrote several hundred things
last spring. The user may do better using both content and property
search. They can search for documents with author equal to
"John Doe" and create date greater than 1998/01/01 and less than 1998/06/01
that contain some of the words IETF, Redmond, and DASL.
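The combined query can be pictured as a property test and a content test
applied together. The Python sketch below is only an illustration under
assumed data: the document records, the field names author, created, and
text, and the helper matches() are all hypothetical, and it says nothing
about how DASL would express or evaluate such a query.

```python
from datetime import date

# Hypothetical documents with two properties and some content.
DOCS = [
    {"href": "/reports/trip1.html", "author": "John Doe",
     "created": date(1998, 3, 12),
     "text": "Trip report: IETF meeting, DASL working group notes."},
    {"href": "/reports/trip2.html", "author": "John Doe",
     "created": date(1998, 4, 2),
     "text": "Trip report: customer visit in Boston."},
    {"href": "/reports/old.html", "author": "John Doe",
     "created": date(1997, 11, 5),
     "text": "DASL kickoff in Redmond."},
]


def matches(doc, author, after, before, any_words):
    """True if the properties match and the text contains any given word."""
    property_ok = doc["author"] == author and after < doc["created"] < before
    words = {w.strip(".,:;").lower() for w in doc["text"].split()}
    content_ok = any(w.lower() in words for w in any_words)
    return property_ok and content_ok


hits = [d["href"] for d in DOCS
        if matches(d, "John Doe", date(1998, 1, 1), date(1998, 6, 1),
                   ["IETF", "Redmond", "DASL"])]
```

Here only the first document passes both tests: the second has none of the
words, and the third falls outside the date range.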
2.2.5 Finding documents of a particular kind
DASL could be used to find documents of a particular kind such as images.
This could be used directly by an end user looking for interesting images,
or by a program that does some kind of processing on the images, such as
selecting GIF images that are portraits. A query that asked for mime-type =
image/* could gather that data.
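One possible reading of "mime-type = image/*" is a wildcard match on the
content type. The Python sketch below assumes that reading and uses the
standard fnmatch module; the function of_kind() and the (href, content type)
pairs are hypothetical.

```python
from fnmatch import fnmatchcase


def of_kind(resources, pattern):
    """Return hrefs whose content type matches a wildcard pattern."""
    return [href for href, ctype in resources
            if fnmatchcase(ctype.lower(), pattern.lower())]


# Hypothetical (href, content type) pairs.
RESOURCES = [
    ("/pics/a.gif", "image/gif"),
    ("/docs/b.html", "text/html"),
    ("/pics/c.jpg", "image/jpeg"),
]

images = of_kind(RESOURCES, "image/*")
```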
2.2.6 Finding documents in a particular language
Assuming that a language attribute is set, then a search could be restricted
to documents that are in a particular language, say German. It would
be possible for a site to automatically set this tag using language recognition
technology.
2.2.7 Searching for information on multiple servers
A user seeking information of some sort may not know what server(s) contain
the information they are seeking. The DASL client program can send
the content-based query to several servers without having to translate
the query into a different query syntax for each server. For property
queries, the DASL client can query the attribute schema on the DASL servers
and send a property query or a mixed property and content query to a set of
DASL servers that have common property
schema. The results from such a cross server search can be sorted
according to property values or according to relevance score.
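A client-side merge of cross-server results might look like the Python
sketch below. The servers are stubbed as plain functions returning
(href, score) pairs; these stubs, their names, and search_all() are
hypothetical. The sketch also assumes that relevance scores from different
servers are comparable, which may not hold in practice.

```python
def search_all(servers, query):
    """Send one query to every server and merge hits by descending score."""
    merged = []
    for name, search in servers.items():
        for href, score in search(query):
            merged.append((score, name, href))
    merged.sort(key=lambda hit: hit[0], reverse=True)
    return merged


# Stub servers standing in for real DASL servers (assumption).
def server_a(query):
    return [("/a/zinc.html", 0.9), ("/a/misc.html", 0.2)]


def server_b(query):
    return [("/b/epilepsy.html", 0.7)]


results = search_all({"a": server_a, "b": server_b}, "zinc epilepsy")
```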
2.2.8 Stemming
If a user is searching for information about the hobby of building model
cars, relevant documents are likely to contain various forms of those words:
model, models, modeling, as well as car and cars. Stemming saves
the user from entering all the various forms of the words they may want to
match. Entering all these forms can be much more problematic in languages
that are more inflected than English.
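To make the idea concrete, the Python sketch below uses a deliberately crude
suffix stripper. It is not a real stemming algorithm, handles only the two
English suffixes needed for this example, and the names stem() and
matches_all() are hypothetical.

```python
# Crude illustration only: strips "ing" or "s" when at least three
# letters remain. Real stemmers are considerably more careful.
SUFFIXES = ("ing", "s")


def stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches_all(text, query_words):
    """True if every query word matches some word in the text by stem."""
    stems = {stem(w.strip(".,;:!?")) for w in text.split()}
    return all(stem(q) in stems for q in query_words)
```

With this, the query words "model" and "car" match a text containing
"modeling" and "cars" without the user listing each form.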
2.2.9 Word proximity
In the stemming example our user was searching for fairly common words,
car and model, in an effort to find information on building model cars.
Many documents that have nothing to do with model cars or building models
of cars might contain both words. What the user wants is documents
where model and car are close together. A search that takes into
account the proximity of the search terms would help filter out the irrelevant
documents.
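A proximity test can be sketched as a check on word positions. The Python
below is a minimal illustration, assuming whitespace tokenization and a
distance measured in words; the function within() and its max_distance
parameter are hypothetical.

```python
def within(text, a, b, max_distance):
    """True if words a and b occur within max_distance words of each other."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


near = within("I build a model car every winter.", "model", "car", 2)
far = within("The model walked to her car after the shoot.", "model", "car", 2)
```

Both texts contain both words, but only the first has them close together.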
2.2.10 Query By Example
A user has done a search and found some relevant or nearly relevant documents
and some clearly irrelevant documents. Desiring a set of documents that is
both broader and more specific, they specify one or more of the relevant result
documents and one or more of the irrelevant documents to a query-by-example
type operator. The result is a new set of documents that have more keyword
overlap with the relevant documents than with the irrelevant ones. This type
of operator
saves the user the considerable trouble of constructing a new query that
will filter out the irrelevant documents while expanding the set of keywords
from the relevant documents.
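One simple way to picture such an operator is keyword overlap: keep a
candidate when it shares more keywords with the relevant examples than with
the irrelevant ones. The Python sketch below assumes exactly that rule, with
"keyword" crudely taken to mean any word longer than three characters; the
names keywords() and query_by_example() are hypothetical, and a real
operator would likely use weighted terms.

```python
def keywords(text):
    """Crude keyword set: lowercased words longer than three characters."""
    return {w.strip(".,;:!?").lower() for w in text.split() if len(w) > 3}


def query_by_example(candidates, relevant, irrelevant):
    """Keep candidates closer in keywords to the relevant examples."""
    liked = set().union(*(keywords(t) for t in relevant))
    disliked = set().union(*(keywords(t) for t in irrelevant))
    results = []
    for href, text in candidates.items():
        kw = keywords(text)
        if len(kw & liked) > len(kw & disliked):
            results.append(href)
    return results


CANDIDATES = {
    "/a": "Building scale model cars from kits",
    "/b": "Fashion model agency listings",
    "/c": "Painting model cars and trucks",
}

picked = query_by_example(
    CANDIDATES,
    relevant=["Model cars for hobby builders"],
    irrelevant=["Fashion model interviews"],
)
```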
2.3 Navigation
2.3.1 Site Navigation
While DAV itself is sufficient for basic site navigation, DASL can support
fancier site navigation, where documents are sorted on the server, or filtered
out on the server.
2.3.2 Browse Tree for exploring a document space
A DASL application could present a browse tree for a set of documents.
In a browse tree some property is selected at each level of the tree to
branch on. Thus if the top level property selected were document
type, then the unique values of the document type property for all the
documents would be the branches of the tree and would be presented to the
user. So the user might see a list of document types, say "Administrative
memo", "Design spec", "Requirements spec", "Test plan", "Project schedule".
Beneath that another property could be selected, say Project, which might
display project names with values such as "Tuolumne", "Calaveras", "Russian",
"Sacramento", "American", "Merced". At that point the user might
want to view the list of documents within these categories, and there might
be only a few or just one project schedule for project Russian. The
same document space might also be explored using properties like Date and
Author. (Note: DASL will most likely not explicitly support browse
trees, but searches like 'docType = "Design spec" AND project = "Tuolumne"
sorted by date' could be used to gather the raw data to generate the information
for a node in the browse tree.)
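The branches at each level of a browse tree are simply the distinct values
of one property among the documents that match the selections made so far.
The Python sketch below illustrates this on hypothetical in-memory records;
the function branches() and the property names docType and project follow
the example above but are otherwise assumptions.

```python
def branches(docs, prop, **selected):
    """Distinct values of `prop` among docs matching the selections so far."""
    matching = [d for d in docs
                if all(d.get(k) == v for k, v in selected.items())]
    return sorted({d[prop] for d in matching if prop in d})


# Hypothetical document properties.
DOCS = [
    {"docType": "Design spec", "project": "Russian"},
    {"docType": "Design spec", "project": "Merced"},
    {"docType": "Test plan", "project": "Russian"},
    {"docType": "Project schedule", "project": "Russian"},
]

top_level = branches(DOCS, "docType")
second_level = branches(DOCS, "project", docType="Design spec")
```

A client would issue one such query per expanded node and render the
returned values as the node's children.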
2.3.3 Finding information on a particular topic in an organized collection
A collection may have been organized according to some taxonomy and the
keywords chosen accordingly. The user, knowing or having scanned
the taxonomy, presents a query for general subject equal to gardening and
subordinate subject equal to bonsai.
2.3.4 Finding information on a particular topic in an unorganized
collection
A collection may not have been organized according to some taxonomy or
the taxonomy may not be detailed enough for the user's purposes, or may
be irrelevant to the user's interest. In this case content based
search becomes crucial. A user could search for documents containing
all three of the words "small", "Japanese", and "trees", and likely obtain
articles on bonsai. If the collection were organized with a taxonomy
that the user didn't know about, they could then discover the keywords from
the documents found and use them to find other documents with the same categorization.
2.3.5 External taxonomy to view a DASL collection
A user could view various DASL supporting collections according to the
user's own taxonomy. Here we assume that the user has a taxonomy
where for each category there is a complex query for which the relevance
score returned establishes a document's degree of membership in the category.
A DASL application could issue a series of these queries on a collection
resource and thus categorize the documents within the resource.
2.4 Controversial Scenarios
These are scenarios for which there is great doubt as to whether they will be
supported in the protocol.
2.4.1 Finding the right information by looking at the hit highlights
Because natural language is so context dependent, a content-based search
that retrieves most of the true positives inevitably retrieves false
positives as well. The user is left to pick through the documents returned
to find the ones that are actually relevant. Highlight information
can be used to make this easier. A DASL application could present
a list of the sentences that had the hit words in them. This is likely
to allow the user to discard most of the false positives without having
to view the whole document.
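A client could build such a list of hit sentences along the lines of the
Python sketch below. It assumes the client already has the document text and
the hit words, and it splits sentences naively on end punctuation; the
function hit_sentences() is hypothetical.

```python
import re


def hit_sentences(text, hit_words):
    """Return the sentences that contain at least one hit word."""
    hits = {w.lower() for w in hit_words}
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences
            if hits & set(re.findall(r"[a-z0-9']+", s.lower()))]


TEXT = ("Zinc is a metal. Migraines are painful. "
        "Some studies link zinc to epilepsy.")

highlights = hit_sentences(TEXT, ["zinc"])
```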
2.4.2 Finding the information in a large document
The user may do a content based search that returns a large document of
many pages but the relevant part of the document is in only one or a few
parts of the document. Hit highlighting will help the user find those
parts. A smart DASL application could present links to jump to the
next hit or concentration of hits.
2.4.3 Saved query result
A user does a search and gets a very large set of results. The user
then progressively narrows the search down by adding constraints to the
previous search.
2.4.4 Saved query result II
A user does a search and spends some time improving the query so that it
catches a large set of information on a particular topic without bringing
in much noise. The query is made available to other users with similar
information needs. The others are likely to combine that query with
their own more temporary constraints to achieve their own information needs.
If saved searches are explicitly part of the DASL protocol, it may be easier
for servers to recognize repeated queries and avoid full re-execution of
a search.
2.5 Document Management Alliance
The DAV/DASL capabilities could be implemented via an implementation of
the Document Management Alliance (a document management API standard).
This would allow the documents from a feature rich document application
to be exposed on the web via DAV and DASL.
2.6 Various types of "documents" that DASL could search over
Many different sorts of documents and types of information can be searched
for using the DASL protocol. Besides the usual notion of documents
written by a person with the intent of conveying some kind of information,
other possibilities are:
2.6.1 Source Code
Computer program source code contains a large amount of information of
a somewhat structured nature as well as unstructured natural language comments.
Much structured information can be, and is, extracted and could be made available
to CASE tools or actual programmers.
2.6.2 Phone conversations
Phone conversations are often recorded. They could have voice recognition
applied allowing content based search on the contents of the conversation
along with property search on information about the call, e.g. caller,
callee, time of call, possibly voice recognition or voice separation
info.
2.6.3 Mug shots
Any standardized type of image could have a lot of structured information
extracted and made available for search. There might be applications
in law enforcement, talent search, or genealogy.
3. References
[WEBDAV] Y. Y. Goland, E. J. Whitehead, Jr., A. Faizi, S. R. Carter,
D. Jensen, "Extensions for Distributed Authoring and
Versioning on the World Wide Web", April, 1998. internet-draft, work-in-progress,
draft-ietf-webdav-protocol-08.txt.
4. Author's Address
Rick Henderson
Netscape Communications
501 E. Middlefield Road
Mountain View CA 94043