- From: Chuck Shotton <cshotton@biap.com>
- Date: Thu, 18 May 1995 17:58:20 -0500
- To: Brian Behlendorf <brian@organic.com>
- Cc: David - Morris <dwm@shell.portal.com>, John Franks <john@math.nwu.edu>, luotonen@netscape.com, http-wg%cuckoo.hpl.hp.com@hplb.hpl.hp.com
At 2:55 PM 5/18/95, Brian Behlendorf wrote:
>On Thu, 18 May 1995, Chuck Shotton wrote:
>However, if, as I proposed, the client takes everything after the ; and
>makes them HTTP request headers, the server never sees ";byterange=blah" in
>the GET line of the HTTP request.

This requires changes to WWW clients AND servers. The current proposal
leaves clients unaffected.

>> I agree. The first byte in a file is byte number 1. If I want the second
>> thru fourth bytes of a file, I want to specify the range 2-4, not 1-3. The
>> latter is hardly intuitive.
>
>But if people aren't constructing these ranges by hand, what difference
>does it make whether the number maps to the mnemonic that "the 1st object
>should be called object number 1"?

Well, what is gained by calling the first byte in a file "0", or the 50th
byte in a file 49? This is a tiny nit, but the world isn't a C array
variable. It's a heck of a lot easier to look at 20-30 (when debugging,
developing, or anything else) and realize it means bytes twenty through
thirty and not twenty one through thirty one.

>> I still maintain that "?" is the appropriate separator for byte range
>> syntax.
>
>Uh, how then do I use CGI QUERY_STRING variables along with byterange?

Details, details. :) But how does a VMS server know that ";22" means byte
22 and not version 22 of the file? How about if we re-visit the "#"
proposal again and investigate whether or not this really hoses clients as
I first suspected?

>> You are asking the server to search for a particular range of bytes
>> in a file, which is consistent with searching for keywords in a file, or
>> coordinates on a map. As I mentioned earlier, semicolon has a specific
>> meaning in many file systems that will conflict with its use as a separator
>> for byte range info. The use of ";" is a bad choice.
>
>But isn't "?" a perfectly valid Mac filename character too?

Sure, but it SHOULD be encoded as part of the file name with %xx encodings,
where the ? used as a search arg separator isn't encoded. If you parse URLs
PRIOR TO %xx decodings, special chars retain their meaning. I guess this is
an argument for VMS servers (and HTML authors) to encode ";" in file names.

>Ugh, sometimes this whole system just makes you want to scream.

Unfortunately, what we are really doing is taking away the opacity of URLs.
This was a big discussion point between myself and Adobe when we originally
hashed out some of the problems with their first CGI-based byte range
scheme. Ideally, there should be no need for a "standard" byte range
syntax, because URLs are either entered by document authors or generated by
server side applications. Since all URLs currently originate from the
server side (unless hand-entered by a user), there hasn't been a need to
standardize the path portion of URLs.

The reason this is an issue (though Ari hasn't explicitly said it) is that
suddenly, there is a need for client side helpers to be able to generate
and request "non-standard" URLs in a cross platform way. Specifically,
Acrobat needs a cross-platform way to request byte ranges. Suddenly, the
path portions of URLs aren't coming from the server (either being generated
or contained in HTML docs). Instead, smart things on the client side are
trying to invade the turf of the server's interpretation of URLs and make
them up themselves. This eliminates the latitude that servers have enjoyed
in keeping URL information private as far as semantic interpretation is
concerned.

I proposed an alternate scheme, where servers TELL the clients how to
request byte ranges, so that the client may do it in a server specific way
without having to have knowledge of the specific server's syntax for byte
ranges. See the "*" section below for more details. It works fine for
specific viewer apps, but needs more work to be a general solution.

This is a paradigm shift that shouldn't be allowed to pass without careful
scrutiny of the implications.
I agree that in this specific case, a common byte range syntax would be
nice. It would be extra nice if servers all supported it. The problem is
that we are paying for this niceness by forcing servers to give up total
control over the interpretation of the path portion of a URL. This could
drastically complicate the URL standard and the coordination required
between client and server apps.

I think we should view this particular proposal as a single-case solution
and avoid trying to generalize things for a bit. It would actually be
better if this was a vendor proposed extension that could optionally be
supported by server authors, rather than trying to shoehorn it into the
existing standards. As I said, it appears to perturb the opaque URL
assumptions that the Web is based on. In specific cases such as the Acrobat
example, the impact is minimal and the benefits are large. However,
allowing clients to continue to drive server behavior in regard to URL
interpretation is a slippery path.

>> Making a new HTTP header means that it will never gain support. Allowing it
>> to be part of the URL (where it belongs in my opinion) means that it can be
>> retrofitted into existing servers with the addition of a CGI. And as for #,
>> ? is a better choice than that or ";".
>
>Ask the URI working group where they would rather see this functionality
>implemented, and they'll probably say HTTP. Why won't new HTTP headers
>get support?

Because existing clients will have to be modified, distributed, and be in
use before byte range URLs will work universally. If it is tagged onto the
URL, existing clients will work without modification. The place to fix this
is in the server, where you can fix it once, rather than having to battle
with upgrading the entire installed base of WWW clients. If a server
doesn't support byte ranges, it's a safe bet that it won't be serving URLs
pointing back to itself that specify byte ranges.
If, on the other hand, byte ranges are implemented as HTTP request header
fields, and a client doesn't support it, a server that generates URLs with
byte ranges won't be able to operate with a client that doesn't understand
them.

* Alternate proposal for byte range URL generation:

Originally, Adobe proposed a CGI-based syntax for retrieving byte ranges,
which passed a numeric range to a CGI, which would read the bytes and
return them. The syntax of the URL that Acrobat generated assumed all CGIs
live in /cgi-bin and have path arguments separated from the URL by a "/".
This obviously breaks on many non-Unix servers.

As an alternative, I suggested that Adobe develop a CGI that when called
from Acrobat with no arguments, returned the server's preferred syntax
(e.g., a C sprintf format statement) for specifying the URL to the CGI and
byte range arguments. The client/viewer (Acrobat) could then use this
syntax in subsequent URL requests (sent thru the WWW client) to request
byte ranges.

Of course, this scheme implies an intelligent viewer like Acrobat, which is
simply using the HTTP server as a convenient way to get random access to a
distributed file system. This doesn't handle the general purpose case for
all documents that have ranges of bytes in them that *could* be viewed by a
dumb WWW client. However, it does show that there are alternate methods to
solving this problem besides hacking URL syntax or HTTP header contents.
And, it is possible to retain a server's control over the interpretation of
the URL paths sent to it.

I would really like to encourage everyone to spend some time considering
these proposals in detail before we rush off to add some more duct tape and
baling wire to the existing standards. If we can figure a way to do this
with some standard CGI behavior, the entire HTTP/HTML/URI standards process
is left unmolested and we will have probably done the right thing.
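
A minimal sketch of the template idea, written in Python rather than C. The
two template strings, the function name build_range_url, and the use of
named placeholders (instead of positional sprintf arguments) are all
assumptions made for illustration; nothing here is syntax that Adobe or any
server actually defined. Bytes are numbered from 1 and ranges are inclusive,
following the preference argued earlier in this message.

```python
# Hypothetical templates that a byte-range CGI might return when it is
# called with no arguments. Neither layout comes from a real server; they
# only show that each server is free to choose its own URL syntax.
UNIX_STYLE = "/cgi-bin/byterange/%(path)s/%(first)d-%(last)d"
OTHER_STYLE = "/byterange.cgi?file=%(path)s&range=%(first)d,%(last)d"


def build_range_url(template: str, path: str, first: int, last: int) -> str:
    """Fill a server-supplied template with a document path and byte range.

    The viewer never needs to know the server's syntax; it only substitutes
    values into whatever template the server handed back.
    """
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    return template % {"path": path, "first": first, "last": last}


if __name__ == "__main__":
    # Same request (bytes 2 through 4), two different server syntaxes.
    print(build_range_url(UNIX_STYLE, "docs/manual.pdf", 2, 4))
    print(build_range_url(OTHER_STYLE, "docs/manual.pdf", 2, 4))
```

The viewer asks once for the template and reuses it for every later
request, so the interpretation of the URL path stays under the server's
control.
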
There doesn't appear to be any compelling reason why this byte range thing
has to be implemented as a change to these existing standards instead of
some private, CGI-based implementation.

-----------------------------------------------------------------------
Chuck Shotton
cshotton@biap.com                http://www.biap.com/
cshotton@oac.hsc.uth.tmc.edu     "I am NOT here."
Received on Thursday, 18 May 1995 15:59:10 UTC