- From: Julian Reschke <julian.reschke@gmx.de>
- Date: Sun, 16 Jan 2011 20:37:41 +0100
- To: ietf-http-wg@w3.org
- CC: draft-bryan-metalinkhttp@tools.ietf.org
On 25.09.2010 16:29, Peter Pöml wrote:
> Hi,
>
> a few thoughts regarding version 18 of http://tools.ietf.org/html/draft-bryan-metalinkhttp
> ...
Below is my feedback (nothing serious, I'd say):
Abstract
This document specifies Metalink/HTTP: Mirrors and Cryptographic
Hashes in HTTP Headers, a different way to get information that is
Nit: "HTTP Headers" should be "HTTP header fields" (yes, bean counting...).
usually contained in the Metalink XML-based download description
format. Metalink/HTTP describes multiple download locations
(mirrors), Peer-to-Peer, cryptographic hashes, digital signatures,
and other information using existing standards for HTTP headers.
Clients can transparently use this information to make file transfers
more robust and reliable.
maybe strike "transparently"?
...
1. Introduction
Metalink/HTTP is an alternative representation of Metalink
information, which is usually presented as an XML-based document
format [RFC5854]. Metalink/HTTP attempts to provide as much
functionality as the Metalink/XML format by using existing standards
such as Web Linking [RFC5988], Instance Digests in HTTP [RFC3230],
and ETags [RFC2616]. Metalink/HTTP is used to list information about
You may want to expand ETags once to "Entity Tags".
a file to be downloaded. This can include lists of multiple URIs
(mirrors), Peer-to-Peer information, cryptographic hashes, and
digital signatures.
...
This document describes a mechanism by which the benefit of mirrors
can be automatically and more effectively realized. All the
information about a download, including mirrors, cryptographic
hashes, digital signatures, and more can be transferred in
coordinated HTTP Headers. This Metalink transfers the knowledge of
Incomplete sentence? "This Metalink transfers..." Or maybe state once
that you call the thing described before "Metalink".
the download server (and mirror database) to the client. Clients can
fallback to other mirrors if the current one has an issue. With this
knowledge, the client is enabled to work its way to a successful
download even under adverse circumstances. All this is done
transparently to the user and the download is much more reliable and
Doing it totally transparently might be a problem with respect to user
privacy. Maybe the user doesn't *want* to hit a particular server?
...
[[ Discussion of this draft should take place on IETF HTTP WG mailing
list at ietf-http-wg@w3.org or the Metalink discussion mailing list
located at metalink-discussion@googlegroups.com. To join the list,
visit http://groups.google.com/group/metalink-discussion . ]]
This should go on the front page as "Editorial Note".
...
1.2. Examples
A brief Metalink server response with ETag, mirrors, .metalink,
OpenPGP signature, and a cryptographic hash of the whole file:
Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
Link: <http://www2.example.com/example.ext>; rel="duplicate"
Link: <ftp://ftp.example.com/example.ext>; rel="duplicate"
Link: <http://example.com/example.ext.torrent>; rel="describedby";
type="application/x-bittorrent"
Link: <http://example.com/example.ext.metalink>; rel="describedby";
type="application/metalink4+xml"
Link: <http://example.com/example.ext.asc>; rel="describedby";
type="application/pgp-signature"
Digest: SHA-256=MWVkMWQxYTRiMzk5MDQ0MzI3NGU5NDEyZTk5OWY1ZGFmNzgyZTJlO
DYzYjRjYzFhOTlmNTQwYzI2M2QwM2U2MQ==
Note: there's no need to quote the relation type here.
Note: it's unfortunate that there doesn't seem to be a registered media
type for torrent files.
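Both notes depend on how a client reads these Link header values. The following minimal sketch is not taken from the draft. It assumes each Link header carries a single link-value (as in the example, with the folded lines joined) and that quoted parameter values contain no semicolons. It accepts `rel` either quoted or as a bare token, and it splits a quoted `rel` into its space-separated relation types.

```python
def parse_link(value):
    """Split one link-value into (target URI, {param: value or None})."""
    value = value.strip()
    end = value.index(">")
    target = value[1:end]
    params = {}
    for part in value[end + 1:].split(";"):
        name, has_value, raw = part.strip().partition("=")
        if not name:
            continue
        # Bare parameters (no "=") are recorded with the value None.
        params[name.strip().lower()] = raw.strip().strip('"') if has_value else None
    return target, params


def links_by_rel(values, rel):
    """Return (target, type) for every link whose rel list contains `rel`."""
    found = []
    for value in values:
        target, params = parse_link(value)
        if rel in (params.get("rel") or "").split():
            found.append((target, params.get("type")))
    return found


EXAMPLE = [
    '<http://www2.example.com/example.ext>; rel="duplicate"',
    '<ftp://ftp.example.com/example.ext>; rel=duplicate',  # unquoted form
    '<http://example.com/example.ext.metalink>; rel="describedby"; '
    'type="application/metalink4+xml"',
    '<http://example.com/example.ext.asc>; rel="describedby"; '
    'type="application/pgp-signature"',
]

mirrors = links_by_rel(EXAMPLE, "duplicate")
metalinks = [t for t, ty in links_by_rel(EXAMPLE, "describedby")
             if ty == "application/metalink4+xml"]
```

The quoted and unquoted `rel` forms produce the same result, so a client written this way does not care which form the server emits.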
...
Metalink resources include a Link header [RFC5988] to present a list
of mirrors in the response to a client request for the resource.
Metalink servers MUST include the cryptographic hash of a resource
via Instance Digests in HTTP [RFC3230]. Valid algorithms are found
in the IANA registry named "Hypertext Transfer Protocol (HTTP) Digest
Algorithm Values" at
http://www.iana.org/assignments/http-dig-alg/http-dig-alg.xhtml .
Surplus whitespace. Maybe put the URI into angle brackets.
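As a reference point for the Digest requirement, here is a minimal sketch of how a server might compute the header value. It assumes the whole file is held in memory and uses only the SHA-256 entry from the IANA registry. RFC 3230 base64-encodes the binary digest, so a SHA-256 value is 44 characters long. Base64-encoding the 64-character hexadecimal form instead would give 88 characters.

```python
import base64
import hashlib


def instance_digest(data: bytes) -> str:
    """Build an RFC 3230 Digest header value with the SHA-256 algorithm."""
    raw = hashlib.sha256(data).digest()  # the binary digest, not its hex text
    return "SHA-256=" + base64.b64encode(raw).decode("ascii")


header = "Digest: " + instance_digest(b"")
```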
Metalink servers are HTTP servers with one or more Metalink
resources. Metalink servers MUST support the Link header for listing
mirrors and MUST support Instance Digests in HTTP [RFC3230]. Mirror
and cryptographic hash information provided by the originating
Metalink server MUST be considered authoritative. Metalink servers
I have a problem with "MUST be considered authoritative" when there's no
clear explanation of what "authoritative" means. Is this really a
conformance requirement?
and their associated mirror servers are RECOMMENDED to all share the
I think it's easier to read when you use SHOULD instead of RECOMMENDED
(this applies to many more places).
same ETag policy (ETag Synchronization), i.e. based on the file
contents (cryptographic hash) and not server-unique filesystem
metadata. The emitted ETag MAY be implemented the same as the
As the term "ETag policy" is important, it might make sense to introduce
it more formally.
Instance Digest for simplicity. Metalink servers MAY offer Metalink/
XML documents that contain cryptographic hashes of parts of the file
and other information.
I'd change "MAY" to "can" here and in many more places. Use MAY (and
friends) only to explain conformance requirements.
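To make the "ETag policy" idea concrete: one policy that satisfies ETag Synchronization derives the entity tag from the content alone. The sketch below is an illustrative assumption, since the draft does not fix an algorithm. It uses SHA-1 (RFC 3230's "SHA" token) purely as an example.

```python
import base64
import hashlib


def content_etag(data: bytes) -> str:
    """Hypothetical content-based strong ETag: the quoted base64 SHA-1 of the body.

    Every server holding byte-identical content emits the same tag, unlike tags
    built from filesystem metadata such as inode numbers or modification times.
    """
    raw = hashlib.sha1(data).digest()
    return '"' + base64.b64encode(raw).decode("ascii") + '"'
```

With this policy, an origin and its mirrors agree on the ETag exactly when they hold the same bytes. That agreement is what later allows If-Match checks across servers.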
...
Metalink clients use the mirrors provided by a Metalink server with
Link header [RFC5988]. Metalink clients MUST support HTTP and are
"with *a* link header"?
...
A brief Metalink server response with two mirrors only:
Link: <http://www2.example.com/example.ext>; rel="duplicate";
pri=1; pref=1
Link: <ftp://ftp.example.com/example.ext>; rel="duplicate";
pri=2; geo="gb"; depth=1
[[Some organizations have many mirrors. Only send a few mirrors, or
only use the Link header if Want-Digest is used?]]
RFC 5988 really doesn't say who can define extension parameters. It
probably should. Mark?
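A client has to turn these extension parameters into something it can act on. The following sketch parses one such link-value. It makes three assumptions: there is a single link per header, quoted values contain no semicolons, and a bare `pref` means the same as `pref=1`. The `depth` default of 0 follows the draft. The defaults for a missing `pri` or `geo` are hypothetical.

```python
def parse_mirror(value):
    """Parse one rel="duplicate" link-value into a mirror description."""
    value = value.strip()
    end = value.index(">")
    mirror = {"url": value[1:end], "rel": [], "pri": None,
              "pref": False, "geo": None, "depth": 0}
    for part in value[end + 1:].split(";"):
        name, has_value, raw = part.strip().partition("=")
        name = name.strip().lower()
        raw = raw.strip().strip('"')
        if name == "rel":
            mirror["rel"] = raw.split()
        elif name == "pri":
            mirror["pri"] = int(raw)
        elif name == "pref":
            # Bare "pref" or "pref=1" marks a preferred mirror (assumption).
            mirror["pref"] = not has_value or raw == "1"
        elif name == "geo":
            mirror["geo"] = raw.lower()  # ISO 3166-1 alpha-2 code
        elif name == "depth":
            mirror["depth"] = int(raw)
    return mirror


a = parse_mirror('<http://www2.example.com/example.ext>; rel="duplicate"; pri=1; pref=1')
b = parse_mirror('<ftp://ftp.example.com/example.ext>; rel="duplicate"; pri=2; geo="gb"; depth=1')
```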
...
This is purely an expression of the server's preferences; it is up to
the client what it does with this information, particularly with
reference to how many servers to use at any one time. A client MUST
respect the server's priority ordering, however.
What does it mean to "respect" it? Why is this a MUST?
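One possible reading of "respect the server's priority ordering" is that a client tries mirrors in ascending `pri` order, however many it uses in parallel. The sketch assumes that a lower value means a higher priority, as with the priority attribute in Metalink/XML. It also places mirrors without a `pri` value last. Both are assumptions.

```python
def order_mirrors(mirrors):
    """Return mirrors sorted by ascending "pri"; entries without "pri" go last.

    sorted() is stable, so mirrors with equal priority keep header order.
    """
    return sorted(mirrors, key=lambda m: (m.get("pri") is None, m.get("pri") or 0))


def pick_mirrors(mirrors, parallel):
    """Take the `parallel` highest-priority mirrors for concurrent use."""
    return order_mirrors(mirrors)[:parallel]
```

Under this reading the MUST constrains only the order in which mirrors are tried. The number of connections remains the client's choice, as the quoted paragraph says.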
...
Mirror servers MAY have a "geo" value, which is a [ISO3166-1] alpha-2
"Entries for a mirror server can have..."
...
There are two types of mirror servers: preferred and normal.
Preferred mirror servers are HTTP mirror servers that MUST share the
same ETag policy as the originating Metalink server. Preferred
mirrors make it possible to detect early on, before data is
transferred, if the file requested matches the desired file.
Note: that also could be achieved by introducing a new conditional
header for the digest, or by using the extension points in the WebDAV
"If" header.
Preferred HTTP mirror servers have a "pref" value of 1. By default,
if unspecified then mirrors are considered "normal" and do not
necessarily share the same ETag policy. FTP mirrors, as they do not
emit ETags, are considered "normal". ([draft-ietf-ftpext2-hash]
allows for FTP mirrors to be coordinated and provide file hashes).
If you need only "1", then this parameter may not need a value at all
(the grammar allows that).
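For comparison with the alternatives in the note above: with a preferred mirror, the early check uses existing HTTP machinery, namely a conditional range request. The client sends the ETag it got from the Metalink server in If-Match. A mirror holding a different entity answers 412 (Precondition Failed) and sends no body. The following minimal sketch shows the request headers and how a client might classify the answer. The three outcome labels are hypothetical.

```python
def preferred_mirror_headers(etag, first, last):
    """Headers for fetching bytes first..last (inclusive) from a preferred mirror."""
    return {"If-Match": etag, "Range": f"bytes={first}-{last}"}


def classify_mirror_response(status, response_etag, expected_etag):
    """Classify a preferred mirror's answer to the request built above."""
    if status == 412:
        return "mismatch"   # the mirror's entity differs; no body was sent
    if status == 206 and response_etag == expected_etag:
        return "usable"     # the requested segment of the expected entity
    return "unusable"       # e.g. 200 with the full body, or an unexpected ETag
```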
...
[[Suggestion: In order for clients to identify servers that have
coordinated ETag policies, the ETag MUST begin with "Metalink:", e.g.
ETag: "Metalink:SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5="
]]
I think this may be hard to deploy (changing etags as opposed to adding
metadata).
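To illustrate the deployment concern, here is what a client would have to look for under the bracketed suggestion. This is a sketch of that suggestion only. Any existing ETag without the prefix is not recognized, so servers would have to change their entity tags to take part.

```python
import re

# Matches the suggested form: a strong ETag of the shape "Metalink:ALG=VALUE".
_METALINK_ETAG = re.compile(r'^"Metalink:([A-Za-z0-9-]+)=([A-Za-z0-9+/=]+)"$')


def parse_metalink_etag(etag):
    """Return (algorithm, value) for a "Metalink:"-prefixed ETag, else None."""
    m = _METALINK_ETAG.match(etag.strip())
    return (m.group(1), m.group(2)) if m else None
```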
...
Mirror servers MAY have a "depth" value, where "depth=0" is the
default. A value of 0 means ONLY that file is mirrored. A value of
1 means that file and all other files and subdirectories in the
directory are mirrored. A value of 2 means the directory above, and
all files and subdirectories, are mirrored. For each higher value,
another directory closer to the root is mirrored.
This probably should be rephrased in terms of URI path segments.
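Read in terms of URI path segments, "depth=n" (for n of 1 or more) says that everything below the path with its last n segments removed is mirrored. The sketch below maps another resource on the originating server to a mirror under that reading. It assumes the mirror repeats the origin's directory structure below that point. The function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def mirrored_prefix(path, depth):
    """Directory prefix covered by `depth` (>= 1): drop the last `depth` segments."""
    segments = path.split("/")            # "/pub/proj/f.ext" -> ["", "pub", "proj", "f.ext"]
    keep = max(len(segments) - depth, 1)  # never climb above the root
    return "/".join(segments[:keep]) + "/"


def map_to_mirror(origin_url, mirror_url, depth, candidate_url):
    """Mirror URL for candidate_url if the mirror's depth covers it, else None."""
    if depth == 0:
        return mirror_url if candidate_url == origin_url else None
    origin, mirror, cand = urlsplit(origin_url), urlsplit(mirror_url), urlsplit(candidate_url)
    origin_prefix = mirrored_prefix(origin.path, depth)
    if (cand.scheme, cand.netloc) != (origin.scheme, origin.netloc):
        return None
    if not cand.path.startswith(origin_prefix):
        return None
    mirror_path = mirrored_prefix(mirror.path, depth) + cand.path[len(origin_prefix):]
    return urlunsplit((mirror.scheme, mirror.netloc, mirror_path, cand.query, ""))
```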
...
6. Cryptographic Hashes of Whole Files
Metalink servers MUST provide Instance Digests in HTTP [RFC3230] for
files they describe with mirrors. Mirror servers SHOULD as well.
Does that mean the Link headers should be ignored when the Digest is
missing? Or must they be?
...
7. Client / Server Multi-source Download Interaction
Metalink clients begin a download with a standard HTTP [RFC2616] GET
request to the Metalink server. A Range limit is optional, not
required. Alternatively, Metalink clients can begin with a HEAD
request to the Metalink server to discover mirrors via Link headers.
So returning them on HEAD is REQUIRED, right?
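If HEAD is a supported way to discover mirrors, the Link header fields have to appear on HEAD responses just as they do on GET. The following self-contained sketch shows the client side against a throwaway local server. The handler, its header values, and the helper name are illustrative assumptions.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoMetalinkHandler(BaseHTTPRequestHandler):
    """Stand-in Metalink server that answers HEAD with mirror Link headers."""

    def do_HEAD(self):
        self.send_response(200)
        self.send_header("Content-Length", "1000")
        self.send_header("Link", '<http://www2.example.com/example.ext>; rel="duplicate"')
        self.send_header("Link", '<ftp://ftp.example.com/example.ext>; rel="duplicate"')
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def head_link_values(host, port, path):
    """Send HEAD and return every Link header value (repeated fields kept)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        response.read()  # empty for HEAD; drains the response
        return response.msg.get_all("Link") or []
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), DemoMetalinkHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    links = head_link_values("127.0.0.1", server.server_port, "/example.ext")
finally:
    server.shutdown()
    server.server_close()
```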
...
Downloads from mirrors that do not have the same file size as the
Metalink server MUST be aborted.
If this is a MUST, it needs more details (closing the connection?).
Otherwise simply state that such a response needs to be considered
unusable and leave it to the client how to deal with it.
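Following that suggestion, a client can detect a size mismatch from the first response headers and mark the mirror unusable, without mandating a particular way of aborting. The sketch assumes segmented downloads, so each mirror response carries a Content-Range header with the complete length. Treating an unknown length ("*") as unusable is a further assumption.

```python
import re

# byte-content-range-spec: "bytes first-last/complete-length" (or "*" for unknown)
_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+|\*)$")


def mirror_length_matches(content_range, expected_length):
    """True only if the mirror reports the same complete length as the Metalink server."""
    m = _CONTENT_RANGE.match(content_range.strip())
    if m is None or m.group(3) == "*":
        return False  # unparseable or unknown length: treat the response as unusable
    return int(m.group(3)) == expected_length
```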
...
Once the download has completed, the Metalink client MUST verify the
cryptographic hash of the file.
And then do what if it fails?
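One answer is that a failed check makes the whole download unusable. The client might then discard it or re-fetch from other sources. The sketch below only performs the comparison against an RFC 3230 Digest header value, and it supports only the "SHA-256", "SHA" and "MD5" tokens. Raising an error when none of the listed algorithms is supported is a hypothetical policy.

```python
import base64
import hashlib
import hmac

_ALGORITHMS = {"sha-256": hashlib.sha256, "sha": hashlib.sha1, "md5": hashlib.md5}


def verify_download(data: bytes, digest_header: str) -> bool:
    """Check downloaded bytes against every supported digest in a Digest header value."""
    checked = False
    for item in digest_header.split(","):
        algorithm, _, encoded = item.strip().partition("=")
        hash_factory = _ALGORITHMS.get(algorithm.strip().lower())
        if hash_factory is None:
            continue  # unknown algorithm: skip it
        expected = base64.b64decode(encoded.strip())
        actual = hash_factory(data).digest()
        if not hmac.compare_digest(actual, expected):
            return False  # any mismatch makes the download unusable
        checked = True
    if not checked:
        raise ValueError("no supported algorithm in Digest header")
    return True
```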
...
The size of chunks chosen by the client should be sufficiently large
that the chunk request headers and response headers represent negligible
overhead, and sufficiently large that they can be pipelined
effectively without needing a very high rate of chunk requests. At
the same time, the amount of time wasted waiting for the last chunk
to download from the last server after all the other servers have
finished should be minimized. Thus we currently recommend that a
chunk size of at least 10KBytes should be used. If the file being
transferred is very large, or the download speed very high, this can
be increased to perhaps 1MByte. As network bandwidths increase, we
expect these numbers to increase appropriately, so that the time to
transfer a chunk remains significantly larger than the latency of
requesting a chunk from a server.
Wow. This appears to ignore the overhead of Range requests on the
*server*. Note that sometimes, content is not served directly from the
filesystem, and implementing Range may not be possible using seeks. Now
one could argue that servers suffering from the problem should not
support this in the first place, but still...
Given the file sizes for which parallel downloads make any sense today,
is it *really* a good idea to recommend 10K segments?
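To put the recommendation in numbers: the count of Range requests grows as the segment size shrinks. The following small sketch plans inclusive byte ranges and counts requests. It assumes "10KBytes" means 10 KiB (10,240 bytes) and uses a 700 MiB file purely as an example size.

```python
def plan_ranges(size, chunk):
    """Split a file of `size` bytes into inclusive (first, last) Range pairs."""
    return [(first, min(first + chunk, size) - 1) for first in range(0, size, chunk)]


def request_count(size, chunk):
    """Number of Range requests needed (ceiling division)."""
    return -(-size // chunk)


FILE_SIZE = 700 * 1024 * 1024
small = request_count(FILE_SIZE, 10 * 1024)    # 71,680 requests
large = request_count(FILE_SIZE, 1024 * 1024)  # 700 requests
```

Each of those requests also has to be served. For content that is not read from a seekable file, that cost is the server-side overhead raised above.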
9. IANA Considerations
Accordingly, IANA has made the following registration to the Link
Relation Type registry.
...
No, they haven't yet :-) Just state what they are supposed to do, the
RFC Editor will rephrase this on publication anyway.
...
Best regards, Julian
Received on Sunday, 16 January 2011 19:38:37 UTC