- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Sun, 16 Jan 2011 20:37:41 +0100
- To: ietf-http-wg@w3.org
- CC: draft-bryan-metalinkhttp@tools.ietf.org
On 25.09.2010 16:29, Peter Pöml wrote: > Hi, > > a few thoughts regarding version 18 of http://tools.ietf.org/html/draft-bryan-metalinkhttp > ... Below is my feedback (nothing serious, I'd say): Abstract This document specifies Metalink/HTTP: Mirrors and Cryptographic Hashes in HTTP Headers, a different way to get information that is ...header fields... (yes, bean counting...) usually contained in the Metalink XML-based download description format. Metalink/HTTP describes multiple download locations (mirrors), Peer-to-Peer, cryptographic hashes, digital signatures, and other information using existing standards for HTTP headers. Clients can transparently use this information to make file transfers more robust and reliable. maybe strike "transparently"? ... 1. Introduction Metalink/HTTP is an alternative representation of Metalink information, which is usually presented as an XML-based document format [RFC5854]. Metalink/HTTP attempts to provide as much functionality as the Metalink/XML format by using existing standards such as Web Linking [RFC5988], Instance Digests in HTTP [RFC3230], and ETags [RFC2616]. Metalink/HTTP is used to list information about You may want to expand ETags once to "Entity Tags". a file to be downloaded. This can include lists of multiple URIs (mirrors), Peer-to-Peer information, cryptographic hashes, and digital signatures. ... This document describes a mechanism by which the benefit of mirrors can be automatically and more effectively realized. All the information about a download, including mirrors, cryptographic hashes, digital signatures, and more can be transferred in coordinated HTTP Headers. This Metalink transfers the knowledge of Incomplete sentence? "This Metalink transfers..." Or maybe state once that you call the thing described before "Metalink". the download server (and mirror database) to the client. Clients can fallback to other mirrors if the current one has an issue. With this knowledge, the client is enabled to work its way to a successful download even under adverse circumstances. All this is done transparently to the user and the download is much more reliable and Doing it totally transparently might be a problem with respect to user privacy. Maybe the user doesn't *want* to hit a particular server? ... [[ Discussion of this draft should take place on IETF HTTP WG mailing list at ietf-http-wg@w3.org or the Metalink discussion mailing list located at metalink-discussion@googlegroups.com. To join the list, visit http://groups.google.com/group/metalink-discussion . ]] This should go on the front page as "Editorial Note". ... 1.2. Examples A brief Metalink server response with ETag, mirrors, .metalink, OpenPGP signature, and a cryptographic hash of the whole file: Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5=" Link: <http://www2.example.com/example.ext>; rel="duplicate" Link: <ftp://ftp.example.com/example.ext>; rel="duplicate" Link: <http://example.com/example.ext.torrent>; rel="describedby"; type="application/x-bittorrent" Link: <http://example.com/example.ext.metalink>; rel="describedby"; type="application/metalink4+xml" Link: <http://example.com/example.ext.asc>; rel="describedby"; type="application/pgp-signature" Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ== Note: there's no need quote the relation type here. Note: it's unfortunate that there doesn't seem to be a registered media type for torrent files. ... Metalink resources include a Link header [RFC5988] to present a list of mirrors in the response to a client request for the resource. Metalink servers MUST include the cryptographic hash of a resource via Instance Digests in HTTP [RFC3230]. Valid algorithms are found in the IANA registry named "Hypertext Transfer Protocol (HTTP) Digest Algorithm Values" at http://www.iana.org/assignments/http-dig-alg/http-dig-alg.xhtml . Surplus whitespace. Maybe put the URI into angle brackets. Metalink servers are HTTP servers with one or more Metalink resources. Metalink servers MUST support the Link header for listing mirrors and MUST support Instance Digests in HTTP [RFC3230]. Mirror and cryptographic hash information provided by the originating Metalink server MUST be considered authoritative. Metalink servers I have a problem with "MUST be considered authorative" when there's no clear explanation what "authorative". Is this really a conformance requirement? and their associated mirror servers are RECOMMENDED to all share the I think it's easier to read when you use SHOULD instead of RECOMMENDED (this applies to many more places). same ETag policy (ETag Synchronization), i.e. based on the file contents (cryptographic hash) and not server-unique filesystem metadata. The emitted ETag MAY be implemented the same as the As the term "Etag policy" is important, it might make sense to introduce it more formally. Instance Digest for simplicity. Metalink servers MAY offer Metalink/ XML documents that contain cryptographic hashes of parts of the file and other information. I'd change "MAY" to "can" here and in many more places. Use MAY (and friends) only to explain conformance requirements. ... Metalink clients use the mirrors provided by a Metalink server with Link header [RFC5988]. Metalink clients MUST support HTTP and are "with *a* link header"? ... A brief Metalink server response with two mirrors only: Link: <http://www2.example.com/example.ext>; rel="duplicate"; pri=1; pref=1 Link: <ftp://ftp.example.com/example.ext>; rel="duplicate"; pri=2; geo="gb"; depth=1 [[Some organizations have many mirrors. Only send a few mirrors, or only use the Link header if Want-Digest is used?]] RFC 5988 really doesn't say who can define extension parameters. It probably should. Mark? ... This is purely an expression of the server's preferences; it is up to the client what it does with this information, particularly with reference to how many servers to use at any one time. A client MUST respect the server's priority ordering, however. What does it mean to "respect" it? Why is this a MUST? ... Mirror servers MAY have a "geo" value, which is a [ISO3166-1] alpha-2 "Entries for a mirror server can have..." ... There are two types of mirror servers: preferred and normal. Preferred mirror servers are HTTP mirror servers that MUST share the same ETag policy as the originating Metalink server. Preferred mirrors make it possible to detect early on, before data is transferred, if the file requested matches the desired file. Note: that also could be achieved by introducing a new conditional header for the digest, or by using the extension points in the WebDAV "If" header. Preferred HTTP mirror servers have a "pref" value of 1. By default, if unspecified then mirrors are considered "normal" and do not necessarily share the same ETag policy. FTP mirrors, as they do not emit ETags, are considered "normal". ([draft-ietf-ftpext2-hash] allows for FTP mirrors to be coordinated and provide file hashes). If you need only "1", then this parameter may not need a value at all (the grammar allows that). ... [[Suggestion: In order for clients to identify servers that have coordinated ETag policies, the ETag MUST begin with "Metalink:", e.g. ETag: "Metalink:SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5=" ]] I think this may be hard to deploy (changing etags as opposed to adding metadata). ... Mirror servers MAY have a "depth" value, where "depth=0" is the default. A value of 0 means ONLY that file is mirrored. A value of 1 means that file and all other files and subdirectories in the directory are mirrored. A value of 2 means the directory above, and all files and subdirectories, are mirrored. For each higher value, another directory closer to the root is mirrored. This probably should be rephrased in terms of URI path segments. ... 6. Cryptographic Hashes of Whole Files Metalink servers MUST provide Instance Digests in HTTP [RFC3230] for files they describe with mirrors. Mirror servers SHOULD as well. Thus, when the Digest is missing, the Link headers should be ignored? must? ... 7. Client / Server Multi-source Download Interaction Metalink clients begin a download with a standard HTTP [RFC2616] GET request to the Metalink server. A Range limit is optional, not required. Alternatively, Metalink clients can begin with a HEAD request to the Metalink server to discover mirrors via Link headers. So returning them on HEAD is REQUIRED, right? ... Downloads from mirrors that do not have the same file size as the Metalink server MUST be aborted. It this is a MUST it needs more details (closing the connection?). Otherwise simply state that such as response needs to be considered unusable and leave it to the client how to deal with it. ... Once the download has completed, the Metalink client MUST verify the cryptographic hash of the file. And then do what if it fails? ... The size of chunks chosen by the client should be sufficiently large that the chunk request headers and reponse headers represent neglible overhead, and sufficiently large that they can be pipelined effectively without needing a very high rate of chunk requests. At the same time, the amount of time wasted waiting for the last chunk to download from the last server after all the other servers have finished should be minimized. Thus we currently recommend that a chunk size of at least 10KBytes should be used. If the file being transfered is very large, or the download speed very high, this can be increased to perhaps 1MByte. As network bandwidths increase, we expect these numbers to increase appropriately, so that the time to transfer a chunk remains significantly larger than the latency of requesting a chunk from a server. Wow. This appears to ignore the overhead of Range requests on the *server*. Note that sometimes, content is not served directly from the filesystem, and implementing Range may not be possible using seeks. Now one could argue that servers suffering from the problem should not support this in the first place, but still... Given the file sizes for which parallel downloads make any sense today, is it *really* a good idea to recommend 10K segments? 9. IANA Considerations Accordingly, IANA has made the following registration to the Link Relation Type registry. ... No, they haven't yet :-) Just state what they are supposed to do, the RFC Editor will rephrase this on publication anyway. ... Best regards, Julian
Received on Sunday, 16 January 2011 19:38:37 UTC