Re: Comments (Part 1) on HTTP I-D Rev 05 (ADAMS1) from Jim Gettys on 1998-11-13 (ietf-http-wg@w3.org from October to December 1998)

From: Jim Gettys <jg@pa.dec.com>
Date: Fri, 13 Nov 1998 11:38:26 -0800
To: "Adams, Glenn" <gadams@spyglass.com>
Cc: http-wg@cuckoo.hpl.hp.com
Message-Id: <9811131938.AA28109@pachyderm.pa.dec.com>
Here's what I perceive to be the less significant of Glenn's comments, with
my intended resolutions.  If anyone wants to take issue with any of these
resolutions, please indicate the comment number in the subject
line so we can keep track of them.
				- Jim

> From:	Adams, Glenn
> Sent:	Monday, October 26, 1998 11:13 AM
> To:	'http-wg@cuckoo.hpl.hp.com'
> Subject:	Comments (Part 1) on HTTP I-D Rev 05
> 
> I'm not certain which form is preferred, sending comments en masse or
> individually. If the
> latter is desired, let me know and I'll break these out. Of the
> following, comments 6, 10, 22,
> 25, 30, 37, 38, and 41 are potentially substantive issues. These
> comments cover sections
> 1-11; I intend to complete my comments later this week on the remaining
> sections.
> 
> 1. Section 1.2 fails to state that implementations that fail
> to satisfy statements marked as "REQUIRED" would not qualify
> as compliant. Otherwise, suggest replacing REQUIRED with MUST or
> MUST NOT for the sake of consistency.

Ok, I'll say "An implementation is not compliant if it fails to satisfy
MUST or REQUIRED level requirements for the protocols it implements."

> 
> 2. Section 1.2 should indicate the status of these keywords in
> "Notes". Is the use of these keywords in notes normative?

I don't believe this is a problem in draft 05; we already took
normative text out of notes, and a quick scan of the document
suggests I succeeded.

> 
> 3. Section 2.1, pg. 15, "implied *LWS", contains what appears
> to be an editorial note "[jg13]".
> 

Microsoft Word droppings.  I'll fix.


> 4. Section 2.2, pg. 16, definition of "CTL", fails to consider that
> ASCII (and ISO646-1993) consider SPACE (040) to be a control character
> of the same status as DEL (177).

Sorry, no, we handle space differently than CTL, and the BNF reflects this.
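
To illustrate the distinction, here is a minimal Python sketch. It assumes
the octet ranges given in the spec's basic rules (CTL is octets 0 - 31 plus
DEL (127); SP is octet 32). The helper name is_ctl is hypothetical and not
part of any spec or library.

```python
# Octet classes, assuming the spec's basic-rules BNF:
#   CTL = any US-ASCII control character (octets 0 - 31) and DEL (127)
#   SP  = US-ASCII space (32)
CTL = frozenset(range(0, 32)) | {127}
SP = 32


def is_ctl(octet: int) -> bool:
    """Return True if the octet is in the CTL class (hypothetical helper)."""
    return octet in CTL


if __name__ == "__main__":
    # SP has its own rule and is deliberately outside CTL.
    print("SP is CTL:", is_ctl(SP))
    print("DEL is CTL:", is_ctl(0x7F))
    print("HT is CTL:", is_ctl(ord("\t")))
```

Keeping SP out of CTL is what lets rules such as LWS and token treat a
space differently from the other control octets.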

> 
> 5. Section 2.2, pg. 17, 1st para., has a forward reference to
> "parameter value". Should add a cross reference to the section that
> defines this non-terminal.

I'll add a cross reference to section 3.6, where parameter values are
defined first.

> 
> 6. Section 3.4, pg. 21, specifies that "the definition associated with
> a MIME character set name MUST fully specify the mapping ...". Should
> this not be a requirement placed on the registrant of a MIME character
> set and not an HTTP implementation? Or, is this requirement really
> stating that any HTTP implementation must maintain a table of registered
> character sets known to satisfy this requirement and MUST NOT use any
> character set not present in this table? Overall, this seems an onerous
> requirement for an HTTP implementation.
> 

I'm not the MIME expert of the working group, but I take this to mean 
that this is just a restriction on which character sets may be used, and 
implies there are character sets that do not meet this requirement by 
having external profiling information.  Maybe a MIME expert can confirm 
this one.


> 7. Section 3.6, pg. 24, 3rd para., states "... (IANA) acts as a registry
> for transfer-coding value tokens" and goes on to list the initial set
> of registered tokens in which Content-Encoding tokens are included.
> Should
> this not state "acts as a registry for transfer and content coding value
> tokens"?

There are two independent registries, and the document defines
both independently.  I don't think this is a help.

> 
> 8. Section 3.6, pg. 25, 5th para., uses the term "optional metadata"
> without
> providing further definition of what such "metadata" might be. Suggest
> an
> example here or clarification.

No, I don't think an example will be helpful here; it might just be
misleading.

> 
> 9. Section 3.6, pg. 25, 6th para., discusses a "situation" regarding
> interoperability failure. This "situation" should be described more
> fully
> or an example given to make clear what the problem is.

This was discussed at length in the list, and is quite subtle, requiring
quite a few conditions before the situation arises.

It would take more space to describe than is appropriate for a protocol
spec, though we allude to the problem enough that someone might
be able to figure it out from first principles.  So since we don't
have annotation capabilities, I think it is best to keep things
as they are now.

> 
> 10. Section 3.7.1, pg. 26, 1st para., states "An entity-body transferred
> via HTTP messages MUST be represented in the appropriate canonical form
> prior to its transmission except for "text" types ...". Is it actually
> the
> case that servers are validating canonical status of entity bodies? This
> contradicts the "entity-body as payload" philosophy.

No, entities are always payload.  The requirement is that you have to
play by MIME rules for that data type, but we acknowledge the UNIX usage
of newline line terminators means that text document line terminators
don't play by MIME rules.  The Web has worked this way (just ship the
bits) from day one, and any arguments that it should play by MIME rules
for text payload at this date are doomed to failure.
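
In practice this means a recipient of text media has to cope with CRLF,
bare CR, and bare LF as line breaks. A rough Python sketch follows; it
assumes a character set in which CR and LF are the octets 13 and 10, and
the function name split_text_lines is made up for illustration.

```python
import re

# Match CRLF first so it is treated as one line break, not two.
_LINE_BREAK = re.compile(rb"\r\n|\r|\n")


def split_text_lines(body: bytes) -> list[bytes]:
    """Split a text entity-body on CRLF, bare CR, or bare LF (hypothetical helper)."""
    return _LINE_BREAK.split(body)


if __name__ == "__main__":
    # A body mixing all three conventions, as might arrive from different servers.
    for line in split_text_lines(b"first\r\nsecond\nthird\rfourth"):
        print(line)
```

This applies only to the entity-body of text types; it says nothing about
header parsing, where CRLF is the line terminator.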

> 
> 11. Section 3.7.1, pg. 26, 2nd para., uses the phrases "allows" and
> "allows
> the use of". Should these be rephrased using the "MAY" keyword? The same
> comment applies elsewhere when the word "allows" or "permitted" is used.
> 

Maybe they should be rephrased.  But not at this date. There are a lot 
of instances of "allows" or "permitted", and the rephrasing is more than 
just a single word change, and often might result in more awkward text; 
this should have been brought up earlier this year when we were doing 
the general MUST/MAY/SHOULD audit.  At this stage, I want to lean toward 
"first, do no harm".

> 12. Section 3.7.2, pg. 27, 2nd para., states "In all other cases, an
> HTTP
> user agent SHOULD follow the same or similar behavior as a MIME user
> agent
> would ...". This "implied" behavior needs to be made explicit. What is
> the behavior of a MIME user agent in this context?

I think you should go read the MIME specs to find out; HTTP incorporating
recommendations for what MIME should do here is a great way for specs
to end up contradictory.

> 
> 13. Section 3.7.2, pg. 27, 4th para., contains a note regarding
> "multipart/
> form-data". Why is this specific type given a special note? How about
> "multipart/byte-ranges"?

Just to get a reference in to another multipart type used in the Web.
Multipart/byteranges are defined in this document.

> 
> 14. Section 3.8, pg. 28, 1st para., states "Product tokens SHOULD be
> short
> and to the point." and "They MUST NOT be used for advertising or other
> non-essential information." As an implementer, how can one interpret
> these
> requirements? Either quantify them or remove them.

I think common sense is in order here. Keep'em short.  Your customers will
thank you (lower latency, fewer bytes).

We've seen people put the kitchen sink in them.

Unless others complain, I plan to keep these as is.

> 
> 15. Section 3.9 refers to "short 'floating point' numbers". I would
> suggest
> replacing this with "real numbers" since both "short" and "floating
> point"
> seem to be implementation specific.

The BNF follows and is very specific.  I think it is fine as is.
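
As a sketch of how specific it is, the qvalue rule can be turned directly
into a regular expression. This assumes the rule reads as it does in
section 3.9 of RFC 2068; is_qvalue is a hypothetical helper, not anything
the spec defines.

```python
import re

# Assumed BNF (as in RFC 2068, section 3.9):
#   qvalue = ( "0" [ "." 0*3DIGIT ] )
#          | ( "1" [ "." 0*3("0") ] )
_QVALUE = re.compile(r"0(?:\.[0-9]{0,3})?|1(?:\.0{0,3})?")


def is_qvalue(text: str) -> bool:
    """Return True if text matches the qvalue rule exactly (hypothetical helper)."""
    return _QVALUE.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ("0", "0.5", "0.125", "1.000", "0.1234", "1.5", ".5"):
        print(candidate, is_qvalue(candidate))
```

The rule leaves no room for exponents, signs, or more than three decimal
digits, which is the sense in which the numbers are "short".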

> 
> 16. Section 3.10 never actually says that RFC1766 language tags "MUST"
> be
> used. I'd suggest adding stronger language here.

I think it is pretty clear as is.

> 
> 17. Section 4.2, pg. 31, 4th para., states "It MUST be possible ...". I
> would suggest replacing this with a statement that uses the converse
> using the
> form "MUST NOT ... unless ..."; e.g., "Multiple header fields MUST NOT
> be
> combined into one header unless ...".
> 

No, this is a requirement on the protocol, and for future HTTP protocols, 
due to existing practice.  See my reply to issue ROSS05 for further 
discussion.

> 18. Section 4.3, pg. 31, 5th para., states "The presence of a
> message-body
> in a request is signaled by the inclusion of Content-Length or Transfer-
> Encoding header field ...".  However, "multipart/byte-ranges" may
> include
> a message-body without either of these headers.

The operative words are in the first sentence:
"The presence of a message-body *in a request*".

> 
> 19. Section 4.4, pg. 32, 2nd para., has the relative clause "... which
> MUST
> NOT ...". This is not a requirement, so should not use these keywords.
> Suggest
> using "does not".

Adding quotes around "MUST NOT" is better than saying "does not",
as a response might be empty on other types of responses, therefore be
incorrect.

> 
> 20. Section 4.4, pg. 32, last para., the "Note" uses "may" and "must".
> If
> keyword usage in notes is not normative, then this should be stated in
> Section 1.2.

Probably shouldn't be a note; I believe the
requirements are actually explicit elsewhere. The may should be "might"
in any case (we are specifying HTTP/1.1 here, not what HTTP/1.0
proxies might do), though the MUST arguably should be capitalized.

> 
> 21. Section 4.4, pg. 32, 1st para., uses the phrase "cannot be". Suggest
> rephrasing to use "MUST NOT".

No, it is an explanation of why a client can't take the same action
a server might take.

> 
> 22. Section 4.4, pg. 32, 5th para., states "HTTP/1.1 user agents MUST
> notify the user when an invalid length is received and detected." This
> does
> not seem to be reflected by current industry practice (cf. IE4 and
> Netscape
> Communicator 4 behavior). If this standard is intended to capture
> current
> practice, then this is a broadening of current practice. I'd suggest
> using
> the keyword "MAY" instead.

Look, this is a server being fundamentally broken.  We don't say
that a client can't attempt to continue in the face of this fundamental
brokenness, but we do want people to complain to the server operator
about it.  Having worked hard at getting persistent connections going,
we really need to get this to work properly, and having brokenness go
undetected is a good way for systems to never get fixed or implemented
correctly.  We don't constrain how the user gets notified, and industry
practice is not an issue here: there are NO conforming HTTP/1.1 user
agents at this time.  We are defining new practice, not old (a fundamental
difference between 2068/this document and RFC 1945).
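
One way a user agent might detect the problem is sketched below in Python.
The check_length helper and the notify callback are hypothetical; the spec
does not say how the user is to be notified, and a real client would
detect the mismatch while reading from the connection rather than after
the fact.

```python
from typing import Callable


def check_length(declared: str, body: bytes, notify: Callable[[str], None]) -> bool:
    """Compare a Content-Length value with the octets actually received.

    Returns True when they agree; otherwise calls notify() with a message
    and returns False. Both names are assumptions made for this sketch.
    """
    # Content-Length is 1*DIGIT; reject anything else before calling int().
    if not (declared.isascii() and declared.isdigit()):
        notify(f"Content-Length is not a number: {declared!r}")
        return False
    if int(declared) != len(body):
        notify(f"Content-Length says {declared} octets, received {len(body)}")
        return False
    return True


if __name__ == "__main__":
    # The client is free to carry on after notifying; here we just print.
    check_length("10", b"short", print)
```

How the notification reaches the user (dialog, status bar, log) is left to
the notify callback, matching the point that the mechanism is not
constrained.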

> 
> 23. Section 5.1.2, pg. 35, 3rd para., has "three options" when four
> are described.
> 

Yup.  Dave Kristol reported this one too.

> 24. Section 5.1.2, pg. 35, 5th para., uses the keyword "REQUIRED"
> instead
> of "MUST". It seems that "MUST" is given preference throughout this
> document. The same comment applies to the use of "OPTIONAL" vs. "MAY".
> 

See section 1.2 and RFC 2119, as referenced in the document.

> 25. Section 7.2.1, pg. 41, 4th para., gives considerable flexibility to
> a recipient regarding the heuristic guessing of an entity's content
> type.
> In particular, no default interpretation is dictated. In contrast, no
> flexibility is given in the heuristic determination of a "text" content
> type's
> character set (cf. Section 3.4, where a default of ISO8859-1 is
> dictated).
> I wonder why the two quite different approaches are maintained. In
> particular,
> I do know that the requirements of Section 3.4 will "break" many
> existing
> implementations which assume that the "default" is applied as a no more
> than
> a default heuristic in the absence of an explicit CHARSET and not as an
> immediate override to any heuristics. I fully expect our East Asian
> customers
> to require this feature of Section 3.4 to be permanently disabled to
> accommodate
> existing practice.

We've already been through all this with our East Asian friends.  It
reflects the reality of servers' content (where Content-Type is often
not provided).  This is the mess of existing servers.
The best we can do with the current situation is
require 1.1 to behave properly when told explicitly, and deal with existing
brokenness.

> 
> 26. Section 8.1.3, p. 43, 1st para., has the typo "in14.10." Should
> instead
> read "in section 14.10.".

Yup. I'll fix this.

> 
> 27. Section 8.1.4, pg. 44, 6th para., has the phrase "... SHOULD
> maintain
> AT MOST 2 connections ..."; since "AT MOST" is not a keyword, suggest
> rephrasing this requirement using "SHOULD NOT maintain more than 2
> connections".
> 

Good rewording.  Thanks.


> 
> 29. Section 8.2.4, pg. 45, 1st para., uses the term "end-client". This
> term seems to be nonstandard with other terminology regarding
> communicating
> parties in the HTTP context.

Yes, I think "client" should be sufficient here.

> 
> 30. Section 9, pg. 48, 2nd para., appears to be partially redundant with
> Section 5.1.2, pg. 35, line 2078 (in file). Furthermore, does this
> requirement
> actually hold for forms of Request-URI other than abs_path? For example,
> does an OPTIONS * HTTP/1.1 request require a Host header?

Yes, that sentence is redundant, and exact wording elsewhere is already under
discussion in ROSS15.

> 
> 31. Section 9.2, pg. 49, 2nd para., states "Responses to this method are
> not cachable." Should this be made stronger with either MUST NOT or
> SHOULD NOT?
> The same comment applies in a variety of other context regarding the
> suitability or non-suitability of caching a response.
> 

See separate message on this topic.

> 32. Section 9.3, pg. 50, 4th para., uses the expression "if and only if
> ...".
> Suggest using "MUST NOT unless" instead.

No, the actual requirements are in section 13.  Saying MUST NOT here
is being repetitively redundant. :-)  If and only if is pointing
out to readers that they really better follow the cross reference.

> 
> 33. Section 9.6, pg. 51, 1st para., uses the phrase "the origin server
> can
> create ...". Suggest using MAY instead. Should review other uses of
> "can"
> in this document for similar substitution. Same comment applies to uses
> of
> "cannot" which in most cases should be replaced with "MUST NOT".

I don't think this helps; and in fact, cannot is generally used in
stating fact, not requirements on the protocol.

> 
> 34. Section 9.6, pg. 52, 3rd para., uses the phrase "server" where
> "origin
> server" appears to be implied. Suggest reviewing uses of "server" for
> possible
> narrower semantics.

Servers can be origin servers or proxies.

Origin servers are places where documents originate.

I think the terminology is correct as applied; we've tried to be very 
careful here.  Please see the terminology section in 1.3 for definitions.  
Certainly your specific complaint in 9.6 applies to servers in general, 
not just to origin server.

If you have specific complaints, please make them.

> 
> 35. Section 9.8, pg. 53, 3rd para., note "Responses to this method MUST
> NOT
> be cached." while most other methods have "Responses to this method are
> not
> cachable." (cf. Section 9.6, 9.7). Suggest making this language more
> consistent.

See separate mail on this topic.

> 
> 36. Section 9.9 may wish to substitute its reference [44] with the new
> I-D
> <draft-luotonen-web-proxy-tunneling-01.txt>. However, note that the
> argument to the CONNECT method prescribed by this I-D is not conformant
> with the specification of "Request URI" in Section 5.1.2. Perhaps the
> reference to the tunneling draft should be removed altogether with this
> keyword just stated as "reserved"?

If you look at the reference [44], you will see it is to Ari's draft.

All the HTTP spec does is reserve the name CONNECT; I can't control
what Ari's draft says.

Not having any reference seems braindead; people should be able to find
out what the method name is used for.

> 
> 37. Section 10.2.5, pg. 56, 2nd para., states "any new or updated
> metainformation SHOULD be applied to the document currently in the user
> agent's active view." This conditional requirement seems to place a
> constraint on UA semantics outside the scope of HTTP proper. Suggest
> changing SHOULD to MAY.

We don't constrain how or what a UA does with the metainformation, just 
that if it is in view and they get a 204 No Content, whatever is viewed 
gets updated.  This seems reasonable to me.  A content provider should 
be able to predict what happens at the end user viewpoint or what good 
would this response be?  For example, even though the content hasn't changed, 
the expiration date might have changed.  If a client is displaying that 
meta-data, it had really better display the updated expiration date.  
Remember, this is an optimization so that you don't have to send a whole 
entity just because something else has changed.  If it can't be relied 
on to show the user what has changed, the only alternative a content provider 
will have is to send the whole entity again, defeating the attempted 
optimization entirely.
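
A minimal Python sketch of that behavior, under the assumption that a user
agent keeps the displayed entity and its metadata together. DocumentView
and apply_response are hypothetical names invented for illustration; how
the metadata is presented is deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentView:
    """What the user agent currently shows (hypothetical structure)."""
    body: bytes
    metadata: dict[str, str] = field(default_factory=dict)


def apply_response(view: DocumentView, status: int,
                   headers: dict[str, str], body: bytes = b"") -> DocumentView:
    """Apply a response to the active view (hypothetical helper)."""
    if status == 204:
        # No entity was sent: keep the body, refresh only the metadata.
        view.metadata.update(headers)
        return view
    # Any other status in this sketch simply replaces the view.
    return DocumentView(body=body, metadata=dict(headers))


if __name__ == "__main__":
    view = DocumentView(b"<html>report</html>",
                        {"Expires": "Fri, 13 Nov 1998 20:00:00 GMT"})
    apply_response(view, 204, {"Expires": "Sat, 14 Nov 1998 20:00:00 GMT"})
    print(view.metadata["Expires"])
```

The server sends only the headers that changed, and the content provider
can still count on the displayed expiration date being current.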

> 
> 38. Section 10.2.6 states "the user agent SHOULD reset the document
> view".
> This conditional requirement seems to place a constraint on UA semantics
> outside the scope of HTTP proper. Suggest changing SHOULD to MAY.

Relaxing the requirement would defeat the purpose entirely.  It is
entirely proper for HTTP to constrain the UA semantics; it is inappropriate
to demand how the semantic meaning be met (i.e. the UA syntactic
presentation of the semantic intent).  If a content provider can't
rely on the semantics of a message, what can they rely on?

> 
> 39. Section 10.2.7, pg. 56, 1st para., uses "MUST" in the past tense.
> Suggest rephrasing this to not use past tense.

It seems appropriate to me as is to use the past tense; it is referring
to the request on a response.

> 
> 40. Section 10.2.7, pg. 57, 2nd para., states "the response MUST include
> all of the entity-headers that would have been returned ...".  Which
> entity-headers are these precisely?

Whatever headers you would have returned on a 200 response, as the
text indicates.  Since this is an arbitrary set you might generate,
there is no way to enumerate them; what you can do is to tell
people to "do what they'd do in the normal case".

The point of this code was extensively discussed on the mailing list;
the problem is that the headers can be very large, and having to
always send them on a range request would be a bummer.


> 
> 42. Section 10.3.2, pg. 58, 2nd para., states "the entity of the
> response
> SHOULD contain a short hypertext note ...". Suggest formalizing this to
> state a specific content type 

No, you can't specify the content type; for example, with XML's deployment,
it will be perfectly appropriate if the note is in XML (if sent to
an XML capable user agent).

> or, alternatively, not use the term
> hypertext.
> The same comment applies in a number of other Sections: search for
> "short
> hypertext note".

Come now, this is HTTP, the "HYPERTEXT TRANSFER PROTOCOL", after all.
The notes are for the benefit of the deployed base of user
agents that don't know about the new features, so that end users
can continue to use them (though more clumsily).

The UA's I'm familiar with all understand HTML...

> 
> 43. Section 10.3.3, pg. 58, 1st para., states "This response is only
> cachable if indicated by a Cache-Control or Expires header field." In
> contrast,
> other Sections (cf. 10.3.1, 10.3.2, etc.) have "This response is
> cachable
> unless indicated otherwise." Suggest making these more consistent if
> possible
> or referring to Section 13.4.

I think things are already consistent; there are two cases here:
things that are cachable unless marked uncachable, and things that are
not cachable by default unless marked cachable.

These seem the same, but are not quite the same, as for historical 
(hysterical) reasons, HTTP has not had its act together on a consistent 
caching model.  So we acknowledge existing caching practice, but allow 
implementations to relax the current constraints if the server says it 
is OK.  This should enable a wider range of responses to be cachable.

For example, a lot of POST responses are potentially cachable, but aren't
cached today; in HTTP/1.1 a server can be clever and mark the response
cachable if it is safe to do so.  A classic example is a search engine
which updates its underlying database once per day; it should be
able to mark all responses valid to the same query for at least the
time until the next database update.
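
A sketch of how such a server might compute the freshness lifetime, in
Python. The function name cache_control_until and the idea of passing in
the next update time are assumptions made for illustration; only the
Cache-Control max-age directive itself comes from the spec.

```python
from datetime import datetime, timezone


def cache_control_until(next_update: datetime, now: datetime) -> str:
    """Build a Cache-Control value valid until the next database update.

    Hypothetical helper: max-age is the number of whole seconds remaining,
    never negative.
    """
    seconds = max(0, int((next_update - now).total_seconds()))
    return f"max-age={seconds}"


if __name__ == "__main__":
    now = datetime(1998, 11, 13, 12, 0, 0, tzinfo=timezone.utc)
    next_update = datetime(1998, 11, 14, 0, 0, 0, tzinfo=timezone.utc)
    # Twelve hours remain until the nightly update.
    print("Cache-Control:", cache_control_until(next_update, now))
```

A server attaching this header to a POST response is explicitly marking
something cachable that would not be cached by default.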


> 
> 44. Section 10.3.6 has a note describing "significant security
> consequences".
> Could these consequences be detailed somewhere in this specification?
> 

It could be, but as few people had implemented 305 before we understood 
the problem and issued a previous draft with corrections, I don't think 
it is worth the space.  It had extensive discussion in the mailing list.

The reason it is limited to origin servers is that only the origin
server is authoritative for that resource.  A proxy might otherwise
be redirecting things over which it is not authoritative.


> 45. Section 10.3.7 has a typo. Change "... specification, and is no
> longer ..."
> to "... specification, is no longer ...".

Sure.  I'll fix.

> 
> 46. Section 10.4, pg. 61, 1st para., has a superfluous comma after "the
> response".

It is actually the second paragraph.  I'll fix.

> 
> 47. Section 10.4.8 has "This code is similar to 401 (Unauthorized), but
> indicates that the client MUST first authenticate ..." This doesn't seem
> to be a requirement but a statement of fact. Suggest changing to "but
> indicates that the client did not first authenticate itself or its
> credentials were not accepted ...".

See issue ROSS12 for resolution.

> 
> 48. Section 10.4.10, pg. 63, 2nd para., has the phrase "the server
> might".
> Suggest changing to "the server MAY". Should review other uses of
> "might"
> in this specification.

This code is an extensibility hook for WEBDAV or similar extensions, 
so the text is written hypothetically to show its intended use, rather 
than in a normative fashion.

> 
> 49. Section 10.4.10, pg. 63, 2nd para., has the phrase "would likely".
> Suggest
> rephrasing to use MAY or SHOULD instead.

Same as 48.

> 
> 50. Section 10.4.11 has "This response is cachable ...". Suggest
> rephrasing
> as "MAY be cached". It may be useful here to point out that this is the
> only
> cachable 4XX response (according to Section 13.4).

Yes, it is cachable as you want dead links to be able to be detected without 
a round trip. Previous file system experience is that this can be a very 
significant performance optimization.

I guess I can't get concerned enough about saying "is cachable" vs.
"MAY be cached" to want to fix all the occurrences of the "is cachable"
phrase at this date, particularly since the replacement is more awkward.

> 
> 51. Section 11 uses the term "OPTIONAL" as a keyword in a non-keyword
> context.
> 

We're just trying to make it clear that if all you want to do is build
a piece of software to get documents from a public web server, you
don't necessarily have to go to the trouble of doing authentication.
			- Jim
Received on Friday, 13 November 1998 11:43:08 UTC