Comments on draft-reddy-dasl-protocol-04 from Jim Whitehead on 1999-04-26 (www-webdav-dasl@w3.org from April to June 1999)

From: Jim Whitehead <ejw@ics.uci.edu>
Date: Mon, 26 Apr 1999 14:14:04 -0700
To: www-webdav-dasl@w3.org
Message-ID: <001101be9029$b6b92b40$d115c380@ics.uci.edu>

Well, I have completed a review of the DASL protocol specification,
something I've been meaning to do for a long time, and I've attached my
comments to this message.

Before getting into the meat of the comments, I wanted to thank the authors
of this specification, as well as Alex, for doing a super job taking the
spec. this far.  As-is, the specification defines a very useful search
facility that will greatly enhance the capabilities of clients interacting
with WebDAV servers.  Searching is a very difficult topic, and the deep
experience of the authoring team in searching definitely shows in this
draft.  I'm really glad you've taken it this far, and I look forward to
working with you by giving more feedback in the future until the spec.
ships.

However, like any complex specification, especially one which is fitting
into the HTTP/DAV world, this specification has some areas which, if
addressed, will tremendously improve the protocol.  So, I submit these
comments in the spirit of improving the DASL protocol specification.

Since some of these issues may require further discussion, I ask that when
you reply, please don't quote the entire message in your reply, and please
divide the responses into "issue-sized" chunks.  Changing the subject line
to something more meaningful to each issue would also help.

OK, here goes:

Comments on draft-reddy-dasl-protocol-04.txt (Nov. 18, 1998):

---------
Comments:
---------

* Section 2.2.2 states that the server MUST recognize a text/xml
  request, and may understand requests transported in other content
  types.  This section should reference RFC 2376 (XML Media Types) as
  giving correct guidance on packaging XML. It should also make it a
  MUST for servers to understand application/xml as well, since it is
  possible that text/xml may be deprecated in the future, and since
  both text/xml and application/xml are supported by RFC 2376.
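
  To illustrate the suggestion, here is a minimal sketch in Python of
  a server-side check that accepts both media types.  The helper name
  and the accepted set are assumptions made for this example; they do
  not come from the DASL draft.

```python
# Hypothetical helper: the name and the accepted set are assumptions
# for illustration, not text from the DASL draft.
XML_MEDIA_TYPES = {"text/xml", "application/xml"}  # both registered by RFC 2376


def is_xml_request(content_type_header: str) -> bool:
    """Return True if the Content-Type names one of the RFC 2376 XML types."""
    # Drop parameters such as '; charset="utf-8"' and compare
    # case-insensitively, since media type names are not case sensitive.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in XML_MEDIA_TYPES
```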

* This specification essentially defines a new type of Web resource,
  of type "search arbiter".  This raises a number of questions
  regarding how this kind of resource interacts with existing HTTP
  methods.  I would expect to see a section which goes through and
  details the interactions between HTTP and WebDAV methods and search
  arbiters. For example, it seems reasonable to me to allow a search
  arbiter to potentially reply to GET (perhaps with a human-meaningful
  description of the capabilities of the arbiter), and for this GET
  response to potentially be authorable using PUT, and locked using
  LOCK.  However, I wouldn't expect COPY, MOVE, or DELETE to work,
  although I would expect PROPPATCH and PROPFIND to work OK.  Another
  issue is what kind of resource type a search arbiter returns in the
  resourcetype property (I'd expect a <searcharbiter/> element).

* How does a search arbiter respond to searches, if the search arbiter
  URI is within a search scope?  The answer to this is related to the
  answer to whether a search arbiter has its own properties, body,
  etc.

* Section 2.5 states that the 507 (Insufficient Storage) status code
  should be returned when SEARCH produces more responses than the
  server is willing to immediately return.  A 5xx status code isn't
  appropriate for this case, since the response does have valid search
  results, indicating that the client correctly submitted a search,
  and this search was successfully performed by the server, even if it
  isn't returning all search results.  I recommend defining a new
  status code for this case, 208 (Partial Results).

* On the topic of partial search results, DASL currently has no way
  for a client to request the next chunk of a set of search results.
  Since *every* search service I've interacted with on the Internet
  has a feature for returning the next set of search results, I really
  would expect this feature to be in DASL. An explanation for why this
  feature isn't present should be in the protocol specification if it
  is not going to be supported.

* I would expect the SEARCH method to return a 102 (Processing)
  response code if the server is taking a long time (over N seconds,
  for smallish N) to perform the search.

* Can a SEARCH be redirected by a 301/302 response?  I see no reason
  not to, unless it would expose privacy concerns.  I know there are
  facilities in place for arbiter redirection, but it still could
  occur that a SEARCH would get issued to a URL that responds with a
  301 or a 302.  If SEARCH can be redirected, does it make sense for
  arbiter redirection to be handled by the 301/302 mechanism too?

* Is the response from SEARCH cacheable?  The spec. is silent on this
  point.

* How does a DAV client discover which search arbiter can be used to
  search a portion of the DAV namespace?  At present, the
  specification seems to imply two things (a) that "/" might be a
  typical arbiter, and (b) that other arbiters can exist and you can
  get redirected to them.  If this issue isn't addressed in the
  specification, it might lead to clients having hard-coded search
  arbiter locations, thus forcing servers to put an arbiter at those
  locations or be non-interoperable.  Or, it will require clients to
  be configured with the search arbiter location, which also seems
  bad.  It seems far better to have a predefined mechanism which
  clients can use to discover the location of the search arbiter. One
  simple mechanism would be to define a property on each collection
  (but not each resource) which gives the location(s) of appropriate
  arbiters.
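
  As a rough sketch of how a client might ask for such a property,
  the Python below builds a PROPFIND request body.  The property name
  "searcharbiters" is purely hypothetical; neither WebDAV nor the DASL
  draft defines it.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"


def build_arbiter_propfind(prop_name: str = "searcharbiters") -> bytes:
    """Build a PROPFIND body asking for one (hypothetical) DAV property."""
    # Use the conventional "D" prefix for the DAV: namespace in the output.
    ET.register_namespace("D", DAV_NS)
    propfind = ET.Element(f"{{{DAV_NS}}}propfind")
    prop = ET.SubElement(propfind, f"{{{DAV_NS}}}prop")
    # prop_name is an assumption; a real property would need to be
    # defined by the specification.
    ET.SubElement(prop, f"{{{DAV_NS}}}{prop_name}")
    return ET.tostring(propfind, encoding="utf-8")
```

  A client would send this body in a PROPFIND on the collection it
  wants to search and read the arbiter location(s) from the response.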

* How would a client use the results from a query schema discovery?
  Is the expectation that a client will first perform a QSD before
  they issue their first query against a given scope?  A section
  discussing this topic would be helpful.

* In Section 5.2, is it an error that a search can only have a single
  scope, or is it intentional that a search only have a single scope?

* In section 5.4.1, allowing relative URIs doesn't seem to be
  particularly compelling, since a search arbiter would not, I expect,
  be in the same part of the namespace as the content being searched.

* Section 5.6 should have a separate section, with a separate heading,
  for the description of ascending and descending.  I had a hard time
  finding these descriptions without this section heading.

* In Section 5.10, what is a literal value?  Also, exactly how does
  the xml:space attribute affect DAV:literal?  I think this should be
  spelled out.  Also, can a client always put a wildcard pattern (from
  Section 5.12.1) inside a literal element, or can a client only use
  the wildcard for a literal inside a DAV:like?  If the latter, then
  perhaps some element other than DAV:literal should be used, since it
  seems to be bad practice to have the semantics of an element vary
  depending on whether it is enclosed by another element.

* The BNF for a wildcard permits the entry of "</d:literal>" which
  would confuse parsers.  Also, the BNF sequence for text should use
  characters instead of octets, to better handle multi-octet character
  set representations (like UTF-16).
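
  To make the parsing problem concrete, here is a small Python sketch
  showing how a client could escape markup characters before placing
  text inside a literal element.  The helper name and the "d" prefix
  are assumptions for this example only.

```python
from xml.sax.saxutils import escape


def literal_element(text: str, prefix: str = "d") -> str:
    """Wrap text in a literal element, escaping XML markup characters."""
    # escape() replaces &, < and > with entity references, so the text
    # can no longer be mistaken for a closing tag.
    return f"<{prefix}:literal>{escape(text)}</{prefix}:literal>"
```

  With this, the problematic input "</d:literal>" is carried as
  "&lt;/d:literal&gt;" inside the element rather than ending it early.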

* Section 5.9: A non-native English speaker might not map "lt" to less
  than, "lte" to less than or equal, etc.  These need to be spelled
  out -- for example, do they only apply to numbers?  If so, what is a
  number?  Since gt and lt are used by the sort orders ascending and
  descending, it also appears they apply to strings as well.  I suspect
  their definitions will not be trivial once i18n is considered.

* Section 5.13 should discuss the case where a server receives a query
  in UTF-16, but the resources being searched are stored in UCS-2,
  UTF-8, etc.  Seems that canonicalizing to UCS-4 internally (at
  least logically) might be a way to go.  At the very least, using XML
  means that this could happen, and server implementors should be made
  aware that this can happen, and perhaps given some guidance on how
  to address the issue.
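
  A minimal sketch of the idea in Python, assuming the server knows
  the encoding of both the query and the stored value: decode both to
  sequences of code points before comparing.  The function name is
  hypothetical.

```python
def same_text(query_bytes: bytes, query_encoding: str,
              stored_bytes: bytes, stored_encoding: str) -> bool:
    """Compare two byte strings after decoding each to code points."""
    # A Python str is a sequence of Unicode code points, which plays
    # the role of the logical UCS-4 form suggested above.
    return (query_bytes.decode(query_encoding)
            == stored_bytes.decode(stored_encoding))
```

  Comparing the raw octets instead would report a mismatch for the
  same text whenever the two encodings differ.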

* Section 5.16: this section needs to give guidance on how case
  sensitivity is handled in non-Latin character sets.
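
  As one possible reading of "case-insensitive", the Python sketch
  below compares strings using Unicode case folding rather than simple
  lowercasing.  This is an illustration of the kind of guidance that
  could be given, not a rule taken from the draft; the function name
  is hypothetical.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings using Unicode case folding."""
    # casefold() is more aggressive than lower(): it also maps
    # characters such as the German sharp s to a caseless form.
    return a.casefold() == b.casefold()
```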

* Section 5.18: I'm assuming that the reason DAV:iscollection
  exists is because doing a search for DAV:resourcetype equal
  to DAV:collection would be too expensive.  Perhaps this
  should be mentioned in Section 5.18.

* I had a hard time following the discussion in section 5.19.2.
  It would help if the paragraph started by stating at a high level
  what a DAV:propdesc does, perhaps in terms of how a client would use
  it.  I didn't find the first sentence to be that helpful.

* Perhaps instead of the huge URN in Section 5.19.3, a shorter URI
  could be used, such as "DAV:/DASL/datatypes/".  While WebDAV has not
  been too concerned with message size, using a GUID URN doesn't
  appear to be justified here, and it sure is looong.

* The types in 5.19.3 are underspecified.  Some areas which need
  improvement:
  - It needs to be made explicit that these types will appear as XML
    elements, and every XML element should have a DTD entry for it in
    the spec.
  - There should be a BNF description for each data type.  This is
    especially necessary for the float and datetime types.
  - A string should probably be a triple of contents, character set
    encoding, and natural language (hmm, well perhaps the character
    set encoding doesn't have to be listed here, but natural language
    should be present.)

* I think this specification would be greatly strengthened by adding a
  few examples which perform queries from the scenarios document.
  Some I would be interested in seeing:
  - Scenario 2.2.3, "Finding a specific resource by author and date
    range"
  - Scenario 2.2.4, "Finding a specific resource using both
    content and property search"
  - I'd also be interested in an example where a search was submitted
    in a non-Latin character set, and the results come back sorted
    according to the rules of a non-Latin character set.

* The Internationalization Considerations section can use some
  improvement. Here are a few issues which need to be addressed:

  - The DASL spec. needs to make some policy statement about sort
    order in non-Latin character sets, if only to give server
    implementors some kind of hint as to how they should handle this
    case.  There must be some books/standards available which address
    this issue, so they should be mentioned and referenced.

  - Some text on handling of character sets would be helpful.  For
    example, I suspect DASL wants to limit the valid character set
    encodings to just ISO 10646 variants.  This allows all character
    sets to be mapped back to their canonical ISO 10646 values.  This
    section should explicitly note that a query might be submitted in
    a different character set than the properties or content of the
    resource.

  - How do string equality and the language tag interact?  It isn't OK
    to just fail a search if the language tags are different, since a
    search submitted in en-us might match an en-uk string.

  - In a submitted query, where is it valid to have an xml:lang attribute?
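
  On the language tag question above, one possible policy is to
  compare only the primary language subtag.  The Python sketch below
  shows that policy; it is an assumption for illustration, not
  something the draft specifies, and the function names are
  hypothetical.

```python
def primary_subtag(lang_tag: str) -> str:
    """Return the primary subtag of a language tag, lowercased."""
    # Language tags are case-insensitive and use "-" between subtags.
    return lang_tag.split("-", 1)[0].lower()


def languages_compatible(query_lang: str, stored_lang: str) -> bool:
    """Treat two tags as compatible if their primary subtags match."""
    return primary_subtag(query_lang) == primary_subtag(stored_lang)
```

  Under this policy a query tagged en-us is still compared against a
  string tagged en-uk, while a string tagged fr is not.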

* In the Security Considerations section, the XML security
  considerations should be copied in from the WebDAV specification.

* The Security Considerations section should explicitly mention that
  there might be privacy risks associated with queries, especially
  queries which require a user to first authenticate themselves.  For
  example, you might not want someone else to know you're searching
  for patents on X.

---------------
Minor comments:
---------------

The dashed lists in sections 1.1 and 1.5 are not indented enough.

The references for WebDAV, XML Namespaces, and DASL Requirements all
need to be updated.

Section 2.2.1, first sentence: remove "per se" -- the sentence is
clear without these Latin words.

In Section 2.4.2, the response resource should have a trailing slash
after "siamsiam.com".

Section 2.6.6 defines the redirectarbiter element, but doesn't specify
that its contents must be a URL.

Section 3, first sentence "by a resource" --> "by a search arbiter
resource"

Section 3.2, the Coded-URL production is defined in Section 9.4 of RFC
2518

Section 3.3: Since the DAV:basicsearch must be supported by all
implementations of SEARCH, the example in 3.3 should list the
DAV:basicsearch URI in one of the DASL headers.

Section 4.1.1: The example should be complete, instead of having a
natural language forward pointer to section 5.19.9 within the
basicsearchschema element.

Section 5.1, second paragraph: Perhaps a dashed list giving the
element name and its contents would be easier to read.

Section 5.3 should explicitly state that the result record is a set of
properties.

Section 5.4, for completeness describe the semantics of DAV:depth of 0.

Section 5.4: there is a spurious "8.5.1" at the end of the paragraph.

Section 5.6: ANSI SQL should be added to the list of references,
putting it in a "Non-normative References" section.

Section 5.11: "on a resource on a resource" --> "on a resource"

Section 5.19.2: "provide a hints" --> "provide hints"

Security Considerations: "Server should prepare" --> "A server should
prepare"

Section 14: you may wish to update Jim Davis' contact information.

- Jim
Received on Monday, 26 April 1999 17:32:19 UTC