DASL BOF February 10 1998

1 Minutes of DASL BOF Meeting

Date: 10th Feb, 1998, 9:00 AM to 4:30 PM
Location: Xerox PARC, Palo Alto, CA
Prepared By: Surendra Reddy, Oracle Corp. (skreddy@us.oracle.com)

2 Agenda

9:00 am     Welcome & Agenda
9:30 am     Discuss proposed working group charter
10:00 am    Presentation of current requirements Internet Draft
10:30 am    Break
10:45 am    Presentation on STARTS (Kevin Chang, Stanford)
11:00 am    Presentation on DMA (Chuck Fay, Filenet)
11:15 am    Presentation on other Internet search work
11:30 am    Discussion of requirements draft
12:00 noon  Lunch
1:00 pm     Discussion of requirements draft
2:45 pm     Break
3:00 pm     Presentation of protocol "sketch"
3:15 pm     Discussion of protocol
4:30 pm     End of meeting

2.1 Attendees

Alan Babich, Filenet (ababich@filenet.com)
Alex Hopmann, Microsoft (alexhop@microsoft.com)
Chen Chuan Kevin Chang, Stanford (kevin@db.stanford.edu)
Chuck Fay, Filenet (cfay@filenet.com)
Dennis Hamilton, Xerox (dhamilton@parc.xerox.com)
Jim Cunningham, Netscape (jfc@netscape.com)
Jim Davis, Xerox (jdavis@parc.xerox.com)
Jim Whitehead (ejw@ics.uci.edu)
Larry Masinter, Xerox (masinter@parc.xerox.com)
Michelle Baldonado, Xerox (baldonado@parc.xerox.com)
Ralph Swick, W3C (swick@w3.org)
Rick Henderson, Netscape (rickh@netscape.com)
Saveen Reddy, Microsoft (saveenr@microsoft.com)
Steve Cousins, Xerox (cousins@parc.xerox.com)
Surendra K Reddy, Oracle Corp. (skreddy@us.oracle.com)

3 Discussion on Working Group Charter:

After the welcome and introduction session, Alex started the meeting with a brief introduction to the IETF process. The following is a summary of the IETF process as explained by Alex:

Members: those who show up at IETF meeting(s) and who are actively involved in the mailing list discussions.

Process of creating an IETF standard: hold a BOF; define the working group charter. The charter states who does what, what the scope of the WG is, and what the WG will and won't do.

A DASL mini-BOF was held at the IETF meeting in Washington DC (December 1997).
This meeting is the follow-up.

Being active on the DASL mailing list is very important: all legitimate technical concerns raised on the mailing list need to be discussed. Rough consensus on these issues is very important for Working Group success. The group determines when rough consensus exists and then publishes an Internet-Draft.

An Internet-Draft is temporary: it is valid for only 6 months and is deleted after 6 months if no further action is taken on the document.

When rough consensus is reached, the document is submitted to the IESG; the IETF publishes it as an RFC.

Working Group last call is not formally part of the IETF process, but it gives working group members a good chance to review the document and raise any legitimate concerns. The minimum is 2 weeks for last call, and the document should have been an Internet-Draft for at least 3 weeks.

There is no formal voting in the IETF. Even informal voting is discouraged.

The IESG scrutinizes the document more thoroughly and either rejects it, returns it with comments, or advances it to Proposed Standard. To move to Draft Standard, there must be two independent, interoperable implementations. If any changes are made at this stage, the document has to cycle through the whole process again. To progress to a Standard, the protocol must be widely deployed and understood.

We can only refer to stable documents; XML references can be made in DASL documents.

4 Working Group Charter Definition

4.1 Within Scope:

4.1.1 Define a protocol document which will define:

- How you express a search: semantics and syntaxes
- What exactly you are searching: scope and resources
- What results look like: the server response syntax
- Arbitrary data model

Content searches are in scope for DAV resources; define what sort of document types a server should be able to search. We need to focus on searching resources' properties and contents (as specified in the DAV protocol): how to express a search; the data model for searching; what the result set looks like.

4.1.2 Query syntax

4.1.3 Authorization

Need to define authorization and the impact of search on ACLs.

4.2 Out of Scope

- Default property sets
- Server-to-server communication
- Non-text content searches
- Client control of server indexing

Must be HTTP 1.1 compliant.

Jeff Cunningham - should be able to specify the search syntax; discovery of syntaxes supported by the server; need to address security considerations.

4.3 Milestones:

Feb 98: ID for Requirements Document draft
Mar 98: ID for DASL protocol
Mar 98: Develop Requirements and Protocol Documents at LA IETF
Aug 98: Develop Protocol document at Chicago
Oct 98: Submit Requirements as Informational RFC
Mar 99: Submit Protocol documents as Proposed Standards

4.4 Who Does What?

Chair: Alex Hopmann
Protocol Document Authors: Saveen Reddy, Del Jensen, Surendra Reddy, Rick Henderson; Editor: Rick Henderson
Requirement Document Authors: Judith Slein, Saveen Reddy

5 Notes from Working Group Charter Presentation

Goal is to develop a search and locating mechanism for:
- Data exposed by DAV servers
- A variety of underlying storage systems

Problems:
- WebDAV mechanisms are insufficient for common searching operations
- PROPFIND & GET: client-side search

Solution:
- DASL is server-side search; define a protocol
- Why server-side search?
  - Less network usage
  - Use server intelligence
  - Ultimately: better performance
- No Internet standard way to search a website

6 Stanford Proposal for Internet Meta Searching Presentation

NOTE: Presentation given by Chen Chuan Kevin Chang, Stanford University

Problems:
- No information about sources
- Different query languages
- Different ranking algorithms

6.1 STARTS Solution

- Meta searcher (goals)
  - Resource discovery
  - Query translation
  - Rank merging
- Source metadata to export; information in query results
- Source metadata; queries: filter + ranking
- All qualified documents are ranked per the ranking expression; the query also specifies which fields are returned in the result set and the number of documents to be returned
- Simplifying decisions: no non-textual or nested information; sessionless communication; no security; no error reporting; minimum requirements
- But it supports options for sophisticated sources; ways to specify supported options; support for multiple languages and character sets (emphasis on character sets; UTF-8 encodings)
- STARTS status: reference implementation built at Cornell
- Z39.50 profile on STARTS being designed

6.2 More Information on STARTS:

URL: http://www-db.stanford.edu/~gravano/starts_home.html

NOTES: Need to include more information on "being able to return highlighting information is desirable for many applications"; this needs to be discussed more in the requirements for DASL.

7 DMA Query Overview (Presentation)

NOTE: Presented by Chuck Fay, Filenet

Notes from Chuck's presentation:

DMA 1.0 specification approved in Dec 97.
DMA coverage: document classes and properties; versioning; renditions; containment; dynamic discovery of DM repository capabilities.

7.1 Query

- Coordinated query across multiple repositories via DMA middleware
- Query capabilities can be determined by inspection at run-time
- Powerful, but sparse set of required capabilities supports repositories with few services
- Implementation of DMA -> May, at the AIIM show
- Scope is always searchable; a property may be searchable or not
- Renditions are identified by MIME types
- Classes are like database tables; properties are like columns
- Every DMA object is self-describing
- DMA 1.0 is expressed as a set of MS COM objects
- Possible to implement a lightweight document management system; the whole query mechanism in DMA is optional
- Finding by navigation, by ID, or by query
- Enables legacy systems to participate in coordinated queries through use of alias GUIDs
- Class- and property-specific operators and operands
- Three-valued logic for undefined or null values, e.g., True and Unknown = Unknown; True or Unknown = True

DMA queries are expressed as a set of DMA classes and interfaces that describe the search capabilities of a Doc Space:
- SQL-like constructs
- Self-describing and extremely scalable
- Object-oriented, parse-tree representation
- Asynchronous query execution and retrieval supported
- Controversy over modeling on an SQL-like model (challenge in DMA/SQL mapping; DASL will define a basic, simple mechanism; XML vs. SQL)
- Merged scopes are created from a list of component scopes and a combination rule, initially one of intersection or union
- Union/intersection unification of searchable classes and properties
- Merged scopes distribute queries to their component scopes, then collect and merge the results
- Merged scopes may alter the query prior to distributing it to a component scope

8 Application Search Areas:

NOTE: Presented by Alex Hopmann

Alex presented a brief overview of the search capabilities present in the LDAP, IMAP and ACAP protocols.
* IMAP Search - searches message containers by a sequence of keywords
* LDAP - search expression tree; equality; proximity search; naming authority to assign name space; extensible query syntax
* ACAP - allows depth operations

Notes: Jim Whitehead suggested presenting a brief note comparing the LDAP, ACAP and IMAP4 search features and contrasting them with what we are proposing in DASL.

9 DASL Requirements

Presented By: Saveen Reddy, Microsoft

Saveen presented an overview of the current DASL requirements.

Discussion Notes:
- Resource: anything that is addressable by an HTTP URL is a resource; a property could be a resource (refer to WebDAV/HTTP)
- Information about scope; schema; statistics of the collection (what kinds of searches does the server support?)
- Need <, <=, >=, >, ==, != for all ordered types
- Should be able to search over nested XML
- Need to discuss supporting GREP functionality and its impact on performance
- Should be able to perform scoping within the document; structurally constrained
- Should be able to specify the search criteria; expressing natural language queries
- Should be able to specify the cost of the search criteria; relevance criteria; ranking criteria
- Should be able to support: stemming, phonetics, truncation, keyword expansion, case sensitivity (internationalization)
- Paged results are more of a burden on the server side. Are there any efficient means of implementing this? It needs flow control and a caching process; or do we throw everything to the client?
- Life-cycle commitment of the paged results: a search interface should be able to provide a scrollable mechanism to view the result set
- Network performance vs. server-side performance; repercussions on the server side? Leave it to the capabilities of the server
- It should be possible to cache the results at the server side; the client should be able to make use of the cached data
- Properties can be self-describing
- Protocol overhead of using XML: Alex explained that using content transfer encoding/compression would improve the performance
- HTTP proxy support, caching issues, ETags: the entity is the on-the-wire representation
- Protocol has to be designed to work right with proxies
- Do we need to advertise a DASL server, and does it need to be supported by DAV?
- Even for external members: they are indexed in the local store and maintained in the catalog (pull the content, index it, and store it locally)
- Discovery: scopes; reference property; the root of the web server can be a DAV collection or references to other servers that maintain the index
- Simple query search syntax: a very simple grammar for interoperability

9.1 Comments:

Surendra Reddy - Support for proximity operators like IN, CONTAINS, LIKE
Surendra Reddy - LDAP-style query syntaxes, matching rules and filters
Jim Davis - Discovery of operators is useless. Discovery of properties is more useful
Larry Masinter - Web Forms based queries define the query specification … is tied to the form … allows more complicated searches between the client and server.
Chuck Fay - Felt the need for discoverability of the search mechanism: discovery of operators, operands, etc.
Jim Whitehead - We must support extensibility of the searches
Authentication - Fields are searchable, not selectable? Depends on DAV ACLs
Jeff Cunningham - Need to address security attacks based on the database search
Internationalization - Charsets; languages; string matching and sorting for internationalized content. Is there any relationship with what DAV says? It is hard not to talk about internationalization. There may be documents in different languages in the same repository.
Internationalization requirements? Recent IETF policy is that strings need to be identified by language.
Requirements for hit-list highlighting - an extension mechanism to convey the highlighting
Rank is query-relevant
Let us not try to limit solutions in the requirements document [LARRY]
DASL server capability advertisement?

10 Protocol Sketch

- XML is preferred over SQL for the query string
- Query syntax discovery should be able to return the query strings supported
- Requirements document should mention how this will fit into HTTP extensions?

10.1 Comments:

Jim Davis - No SQL; go strongly for XML, as SQL does not fit internationalization requirements very well. Most of the members present supported using XML in the query syntax rather than an SQL-like syntax.
Saveen Reddy - Presented an SQL-like language ("Saveen" QL) as a query language.
Surendra Reddy - OK with XML, but the server should be able to support multiple query syntaxes, such as SQL.

11 Meeting Close

Alex then asked if those present all supported the formation of the DASL working group, subject to revision of the charter. All meeting attendees expressed their support. Alex thanked the hosts from Xerox and closed the meeting.