Re: Comments on the HTTP/1.0 draft. from Marc VanHeyningen on 1994-12-02 (ietf-http-wg@w3.org from October to December 1994)

From: Marc VanHeyningen <mvanheyn@cs.indiana.edu>
Date: Thu, 01 Dec 1994 22:50:50 -0500
To: "Roy T. Fielding" <fielding@avron.ICS.UCI.EDU>
Cc: http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
Message-Id: <25721.786340250@moose.cs.indiana.edu>

Highlighting a few issues, which I hope will not create the image that
I am just trying to disagree with Roy on everything :-)...

- If-Modified-Since.  Part of the whole point of how this mechanism
  was defined is that servers that don't support it will just ignore
  it and return the whole object, which may sometimes be inefficient
  but won't break anything.  I think servers "should" implement this
  feature.  "Must" is too strong for a feature that increases
  efficiency but won't break anything by its absence.
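
  A minimal sketch of that fallback behaviour, in Python.  The names
  `respond` and `supports_ims` are made up for illustration and come
  from no draft or server.  A server that honours the header may
  answer 304; one that ignores it sends the whole object, and the
  client still gets a correct result:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def respond(headers, last_modified, body, supports_ims=True):
    """Return (status, body) for a GET of a single object.

    headers is a dict of request headers; last_modified is a
    timezone-aware datetime.  supports_ims=False models a server
    that simply ignores If-Modified-Since.
    """
    ims = headers.get("If-Modified-Since")
    if supports_ims and ims is not None:
        try:
            since = parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            since = None  # unparseable date: fall back to a full response
        if since is not None:
            if since.tzinfo is None:
                # Treat a zone-less result as UTC (an assumption of this sketch).
                since = since.replace(tzinfo=timezone.utc)
            if last_modified <= since:
                return 304, b""
    # Header absent, ignored, or the object is newer: send everything.
    return 200, body
```

  Either branch leaves the client with a usable answer, which is why
  "should" seems sufficient.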

- Non-ASCII characters in headers.  I don't think this is a big deal
  at all, though I'll be surprised if there isn't already somebody
  somewhere using non-ASCII in the comment section of the From: line
  or something, and I hope it's being done according to RFC 1522 instead
  of somebody assuming the character set used in his particular nation
  is the universal character set for the whole world.
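
  For illustration, here is roughly what an encoded-word round trip
  looks like using Python's standard email.header module.  That
  module follows the later RFC 2047, which as far as I know keeps
  the same "=?charset?encoding?text?=" syntax as RFC 1522.  The
  helper names and the sample name are hypothetical:

```python
from email.header import Header, decode_header


def encode_comment(text, charset):
    """Encode text as an ASCII-only encoded-word that names its charset."""
    return Header(text, charset).encode()


def decode_comment(value):
    """Decode a header value that may contain encoded-words."""
    parts = []
    for raw, charset in decode_header(value):
        if isinstance(raw, bytes):
            parts.append(raw.decode(charset or "ascii"))
        else:
            parts.append(raw)  # plain text with no encoded-words
    return "".join(parts)


encoded = encode_comment("José", "iso-8859-1")
decoded = decode_comment(encoded)
```

  The point is that the charset travels with the text, so the
  receiver never has to guess it.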

- HTTP-Dates.  The problem isn't that including the day of the week
  is unfathomably difficult; it's changing things in general.  It's
  confusing to say "An rfc1123-date in HTTP actually only allows a
  restrictive subset of what RFC 1123 specifies," and for little if
  any gain.

  I am uncomfortable with deviating from existing specifications
  without more compelling reasons for doing so.  I mean, heck, if we
  just want a date that's easy to parse, how about an integer of the
  number of seconds since the beginning of 1970?  Easy to implement,
  at least under UNIX. :-)
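
  To make the comparison concrete, a small sketch in Python using
  the standard email.utils module.  The helper names and the sample
  timestamp are arbitrary choices for the example.  It converts
  between an integer count of seconds and the RFC 1123 text form,
  and shows that a general-purpose parser also accepts a date with
  the day of the week omitted (which RFC 822 makes optional):

```python
from email.utils import formatdate, parsedate_to_datetime


def to_http_date(epoch_seconds):
    """Format seconds since 1970-01-01 00:00:00 UTC as an RFC 1123 date."""
    return formatdate(epoch_seconds, usegmt=True)


def from_http_date(text):
    """Parse an RFC 822/1123 date back into integer seconds since 1970."""
    return int(parsedate_to_datetime(text).timestamp())


stamp = 786340250  # arbitrary sample value
text = to_http_date(stamp)  # "Fri, 02 Dec 1994 03:50:50 GMT"
same = from_http_date(text)
no_weekday = from_http_date("02 Dec 1994 03:50:50 GMT")
```

  Both parses give back the original integer, so the weekday carries
  no information a parser needs.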

- Canonicalization of content.

  I'll drop this if everyone else thinks I'm just being a pedantic
  dork, but I really believe the purpose of a specification is to
  establish precise, correct behavior in which neither clients nor
  servers need to do heuristic guessing about what means what.

  Chuck Sutton suggests:
  > IMHO, it should state CR, LF, and CRLF should all be interpreted
  > equally as EOL when used as line ends. This avoids any problems with
  > machine dependent EOL symbols, and fairly represents the current practice.
  > (It also avoids forcing clients and especially servers to do line-by-line
  > translations of EOL for all outgoing response information, which is a BIG
  > performance hit.)

  (Aside: Does somebody have benchmarks to establish the magnitude of
  this "big performance hit"?)

  This is probably sensible behavior, and something along these lines
  (possibly modulo the suggested changes from Ari) should go in an
  appendix on tolerant, robust implementations.  This is in keeping
  with the oft-cited philosophy of "be liberal in what you accept."

  However, the other half of that is "be conservative in what you
  send."  Being conservative means sending objects in canonical form
  only, and not assuming the program on the other end will be clever
  enough to guess what you really meant.  The spec should say this.
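
  The two halves can be sketched in a few lines of Python.  The
  function names are hypothetical, and the sketch assumes CRLF is
  the canonical line break for text.  It says nothing about the
  performance question above:

```python
import re

# Match CRLF first so it counts as one line break rather than two.
_EOL = re.compile(rb"\r\n|\r|\n")


def split_lines_tolerant(data):
    """Liberal receiver: treat CR, LF, and CRLF alike as end of line."""
    return _EOL.split(data)


def to_canonical(data):
    """Conservative sender: rewrite every line break as CRLF."""
    return _EOL.sub(b"\r\n", data)


mixed = b"a\rb\nc\r\nd"
lines = split_lines_tolerant(mixed)
canonical = to_canonical(mixed)
```

  Canonicalizing is idempotent here: running to_canonical on
  already-canonical text leaves it unchanged.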

  What about new developments?  If UNICODE support is desired, how
  should line breaks be represented and detected in a robust fashion?
  Do we really want to have to include low-level stuff like this in
  the spec, instead of just saying "do it in canonical form"?

  Aside:  The issue of canonicalization is, in principle, not wedded
  to any particular content-type family, but in practice seems almost
  exclusively associated with line endings.  That association isn't
  really accurate; for instance, discarding the resource fork from a Mac
  file and sending on the data could be considered converting it to
  canonical form, and obviously that's needed.  Or should we expect
  all clients to be clever enough to recognize that and discard it? :-)

  OK, end of tirade (maybe.)  If people simply must ship around
  objects with different ways of representing the same thing, there
  should be an out-of-band way to indicate that.  A
  Content-Encoding of "unix-text", for instance, could indicate that
  line breaks are represented with LF.  Obviously a provision for
  multiple C-Es would be needed to describe things like "gzipped UNIX
  text".  This should be a C-E, though, not a C-T-E.  A proposed C-T-E
  for UNIX text would probably trigger an uproar of laughter on the
  MIME mailing list (and rightly so.)
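
  A rough sketch of how a receiver might undo stacked encodings,
  in Python.  Everything here is an assumption for the sake of the
  example: "unix-text" is only the hypothetical token suggested
  above, the comma-separated list written in the order the
  encodings were applied is one possible convention, and
  `decode_body` is a made-up name:

```python
import gzip


def _unix_text_to_canonical(data):
    # Assumes the body really uses bare LF line breaks, as the label claims.
    return data.replace(b"\n", b"\r\n")


# Hypothetical registry mapping encoding tokens to decoders.
DECODERS = {
    "gzip": gzip.decompress,
    "unix-text": _unix_text_to_canonical,
}


def decode_body(body, content_encoding):
    """Undo each listed encoding, last applied first."""
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    for coding in reversed(codings):
        body = DECODERS[coding](body)
    return body


raw = b"line one\nline two\n"
wire = gzip.compress(raw)  # "gzipped UNIX text"
decoded = decode_body(wire, "unix-text, gzip")
```

  With the label present, the receiver recovers canonical text
  without guessing at the line-break convention.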

- Passing thought:  If a request contains a Message-ID header, should
  the server include that message-ID in the response, maybe in an
  In-Reply-To: header?


   - Marc
--
Marc VanHeyningen  <URL:http://www.cs.indiana.edu/hyplan/mvanheyn.html>
Received on Thursday, 1 December 1994 19:51:49 UTC