
Re: draft-bryan-metalinkhttp-18.txt

From: Julian Reschke <julian.reschke@gmx.de>
Date: Sun, 16 Jan 2011 20:37:41 +0100
Message-ID: <4D334905.5020601@gmx.de>
To: ietf-http-wg@w3.org
CC: draft-bryan-metalinkhttp@tools.ietf.org
On 25.09.2010 16:29, Peter Pml wrote:
> Hi,
> a few thoughts regarding version 18 of http://tools.ietf.org/html/draft-bryan-metalinkhttp
> ...

Below is my feedback (nothing serious, I'd say):


    This document specifies Metalink/HTTP: Mirrors and Cryptographic
    Hashes in HTTP Headers, a different way to get information that is

...header fields... (yes, bean counting...)

    usually contained in the Metalink XML-based download description
    format.  Metalink/HTTP describes multiple download locations
    (mirrors), Peer-to-Peer, cryptographic hashes, digital signatures,
    and other information using existing standards for HTTP headers.
    Clients can transparently use this information to make file transfers
    more robust and reliable.

maybe strike "transparently"?


1.  Introduction

    Metalink/HTTP is an alternative representation of Metalink
    information, which is usually presented as an XML-based document
    format [RFC5854].  Metalink/HTTP attempts to provide as much
    functionality as the Metalink/XML format by using existing standards
    such as Web Linking [RFC5988], Instance Digests in HTTP [RFC3230],
    and ETags [RFC2616].  Metalink/HTTP is used to list information about

You may want to expand ETags once to "Entity Tags".

    a file to be downloaded.  This can include lists of multiple URIs
    (mirrors), Peer-to-Peer information, cryptographic hashes, and
    digital signatures.


    This document describes a mechanism by which the benefit of mirrors
    can be automatically and more effectively realized.  All the
    information about a download, including mirrors, cryptographic
    hashes, digital signatures, and more can be transferred in
    coordinated HTTP Headers.  This Metalink transfers the knowledge of

Incomplete sentence? "This Metalink transfers..." Or maybe state once 
that you call the thing described before "Metalink".

    the download server (and mirror database) to the client.  Clients can
    fallback to other mirrors if the current one has an issue.  With this
    knowledge, the client is enabled to work its way to a successful
    download even under adverse circumstances.  All this is done
    transparently to the user and the download is much more reliable and

Doing it totally transparently might be a problem with respect to user 
privacy. Maybe the user doesn't *want* to hit a particular server?


    [[ Discussion of this draft should take place on IETF HTTP WG mailing
    list at ietf-http-wg@w3.org or the Metalink discussion mailing list
    located at metalink-discussion@googlegroups.com.  To join the list,
    visit http://groups.google.com/group/metalink-discussion . ]]

This should go on the front page as "Editorial Note".


1.2.  Examples

    A brief Metalink server response with ETag, mirrors, .metalink,
    OpenPGP signature, and a cryptographic hash of the whole file:

    Etag: "thvDyvhfIqlvFe+A9MYgxAfm1q5="
    Link: <http://www2.example.com/example.ext>; rel="duplicate"
    Link: <ftp://ftp.example.com/example.ext>; rel="duplicate"
    Link: <http://example.com/example.ext.torrent>; rel="describedby";
    Link: <http://example.com/example.ext.metalink>; rel="describedby";
    Link: <http://example.com/example.ext.asc>; rel="describedby";

Note: there's no need to quote the relation type here.

Note: it's unfortunate that there doesn't seem to be a registered media 
type for torrent files.
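
To illustrate the quoting point: a minimal sketch of how a client might split such a Link header value into target URI and parameters, accepting both rel="duplicate" and rel=duplicate. The function name and the regular expressions are my own simplification for these examples, not a full RFC 5988 parser.

```python
import re

# Simplified patterns: enough for the examples in the draft, not the full
# RFC 5988 grammar (no backslash escapes, no multiple links per field value).
_LINK_RE = re.compile(r'^\s*<([^>]*)>(.*)$')
_PARAM_RE = re.compile(r';\s*([^\s=;"]+)\s*(?:=\s*(?:"([^"]*)"|([^;"\s]*)))?')


def parse_link_value(value):
    """Return (target_uri, params) for one Link header field value."""
    match = _LINK_RE.match(value)
    if match is None:
        raise ValueError("not a link-value: %r" % value)
    uri, rest = match.group(1), match.group(2)
    params = {}
    for pm in _PARAM_RE.finditer(rest):
        name = pm.group(1).lower()
        if pm.group(2) is not None:      # quoted form: rel="duplicate"
            val = pm.group(2)
        else:                            # token form, or no value at all
            val = pm.group(3)
        params.setdefault(name, val)     # first occurrence wins
    return uri, params


print(parse_link_value('<ftp://ftp.example.com/example.ext>; rel=duplicate'))
```

Both forms yield the same relation type, and the trailing ";" in the draft's "describedby" examples is simply ignored by this sketch.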


    Metalink resources include a Link header [RFC5988] to present a list
    of mirrors in the response to a client request for the resource.
    Metalink servers MUST include the cryptographic hash of a resource
    via Instance Digests in HTTP [RFC3230].  Valid algorithms are found
    in the IANA registry named "Hypertext Transfer Protocol (HTTP) Digest
    Algorithm Values" at
    http://www.iana.org/assignments/http-dig-alg/http-dig-alg.xhtml .

Surplus whitespace. Maybe put the URI into angle brackets.

    Metalink servers are HTTP servers with one or more Metalink
    resources.  Metalink servers MUST support the Link header for listing
    mirrors and MUST support Instance Digests in HTTP [RFC3230].  Mirror
    and cryptographic hash information provided by the originating
    Metalink server MUST be considered authoritative.  Metalink servers

I have a problem with "MUST be considered authoritative" when there's no 
clear explanation of what "authoritative" means. Is this really a conformance 

    and their associated mirror servers are RECOMMENDED to all share the

I think it's easier to read when you use SHOULD instead of RECOMMENDED 
(this applies to many more places).

    same ETag policy (ETag Synchronization), i.e. based on the file
    contents (cryptographic hash) and not server-unique filesystem
    metadata.  The emitted ETag MAY be implemented the same as the

As the term "Etag policy" is important, it might make sense to introduce 
it more formally.

    Instance Digest for simplicity.  Metalink servers MAY offer Metalink/
    XML documents that contain cryptographic hashes of parts of the file
    and other information.

I'd change "MAY" to "can" here and in many more places. Use MAY (and 
friends) only to explain conformance requirements.


    Metalink clients use the mirrors provided by a Metalink server with
    Link header [RFC5988].  Metalink clients MUST support HTTP and are

"with *a* link header"?


    A brief Metalink server response with two mirrors only:

    Link: <http://www2.example.com/example.ext>; rel="duplicate";
    pri=1; pref=1
    Link: <ftp://ftp.example.com/example.ext>; rel="duplicate";
    pri=2; geo="gb"; depth=1

    [[Some organizations have many mirrors.  Only send a few mirrors, or
    only use the Link header if Want-Digest is used?]]

RFC 5988 really doesn't say who can define extension parameters. It 
probably should. Mark?


    This is purely an expression of the server's preferences; it is up to
    the client what it does with this information, particularly with
    reference to how many servers to use at any one time.  A client MUST
    respect the server's priority ordering, however.

What does it mean to "respect" it? Why is this a MUST?
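
For the sake of discussion, here is one possible reading of "respect", sketched under assumptions that the quoted text does not confirm: that a lower "pri" value means "try this mirror earlier", and that entries without a usable "pri" go last. The function name and the input shape (URI plus a dict of link parameters) are hypothetical.

```python
def order_mirrors(mirrors):
    """Sort (uri, params) pairs by ascending 'pri'.

    Assumption: a lower 'pri' is tried first. Entries with a missing or
    non-numeric 'pri' keep their original relative order at the end.
    """
    def sort_key(entry):
        _uri, params = entry
        try:
            return (0, int(params["pri"]))
        except (KeyError, TypeError, ValueError):
            return (1, 0)
    return sorted(mirrors, key=sort_key)  # sorted() is stable


mirrors = [
    ("ftp://ftp.example.com/example.ext", {"rel": "duplicate", "pri": "2"}),
    ("http://www2.example.com/example.ext", {"rel": "duplicate", "pri": "1"}),
]
for uri, _params in order_mirrors(mirrors):
    print(uri)
```

If the intended requirement is something else (for instance, never using a lower-priority mirror while a higher-priority one is still usable), the draft would need to say so explicitly.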


    Mirror servers MAY have a "geo" value, which is a [ISO3166-1] alpha-2

"Entries for a mirror server can have..."


    There are two types of mirror servers: preferred and normal.
    Preferred mirror servers are HTTP mirror servers that MUST share the
    same ETag policy as the originating Metalink server.  Preferred
    mirrors make it possible to detect early on, before data is
    transferred, if the file requested matches the desired file.

Note: that also could be achieved by introducing a new conditional 
header for the digest, or by using the extension points in the WebDAV 
"If" header.

    Preferred HTTP mirror servers have a "pref" value of 1.  By default,
    if unspecified then mirrors are considered "normal" and do not
    necessarily share the same ETag policy.  FTP mirrors, as they do not
    emit ETags, are considered "normal". ([draft-ietf-ftpext2-hash]
    allows for FTP mirrors to be coordinated and provide file hashes).

If you need only "1", then this parameter may not need a value at all 
(the grammar allows that).


    [[Suggestion: In order for clients to identify servers that have
    coordinated ETag policies, the ETag MUST begin with "Metalink:", e.g.

    ETag: "Metalink:SHA=thvDyvhfIqlvFe+A9MYgxAfm1q5="


I think this may be hard to deploy (changing etags as opposed to adding 


    Mirror servers MAY have a "depth" value, where "depth=0" is the
    default.  A value of 0 means ONLY that file is mirrored.  A value of
    1 means that file and all other files and subdirectories in the
    directory are mirrored.  A value of 2 means the directory above, and
    all files and subdirectories, are mirrored.  For each higher value,
    another directory closer to the root is mirrored.

This probably should be rephrased in terms of URI path segments.
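
A minimal sketch of what I mean, assuming "depth" counts path segments removed from the end of the mirror URI (0 keeps the URI itself, 1 yields the containing "directory", and so on). The function name and the example path are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def mirrored_prefix(uri, depth):
    """Return the URI prefix that a given 'depth' value would cover."""
    if depth < 0:
        raise ValueError("depth must not be negative")
    if depth == 0:
        return uri                      # only this one resource is mirrored
    parts = urlsplit(uri)
    # '/pub/dist/example.ext' -> ['', 'pub', 'dist', 'example.ext']
    segments = parts.path.split("/")
    if depth > len(segments) - 1:
        raise ValueError("depth exceeds the number of path segments")
    prefix = "/".join(segments[:-depth]) + "/"
    return urlunsplit((parts.scheme, parts.netloc, prefix, "", ""))


uri = "http://www2.example.com/pub/dist/example.ext"
for depth in range(4):
    print(depth, mirrored_prefix(uri, depth))
```

Phrased this way, the definition no longer depends on the server having a filesystem with directories at all.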


6.  Cryptographic Hashes of Whole Files

    Metalink servers MUST provide Instance Digests in HTTP [RFC3230] for
    files they describe with mirrors.  Mirror servers SHOULD as well.

Thus, when the Digest is missing, the Link headers should be ignored? 


7.  Client / Server Multi-source Download Interaction

    Metalink clients begin a download with a standard HTTP [RFC2616] GET
    request to the Metalink server.  A Range limit is optional, not
    required.  Alternatively, Metalink clients can begin with a HEAD
    request to the Metalink server to discover mirrors via Link headers.

So returning them on HEAD is REQUIRED, right?

    Downloads from mirrors that do not have the same file size as the
    Metalink server MUST be aborted.

If this is a MUST it needs more details (closing the connection?). 
Otherwise simply state that such a response needs to be considered 
unusable and leave it to the client how to deal with it.

    Once the download has completed, the Metalink client MUST verify the
    cryptographic hash of the file.

And then do what if it fails?
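
For reference, the check itself is simple. A rough sketch, assuming a Digest header value in the RFC 3230 "algorithm=base64" form and a client that only knows a few algorithm names (the mapping table and the function name are my own). What the client does with a False result is exactly what the draft needs to spell out.

```python
import base64
import hashlib

# Digest algorithm names mapped to hashlib names; illustrative subset only.
_ALGORITHMS = {"md5": "md5", "sha": "sha1", "sha-256": "sha256", "sha-512": "sha512"}


def verify_digest(data, digest_header):
    """Check data against a Digest header value such as 'SHA-256=...'.

    Returns True if every digest with a known algorithm matches, False on
    the first mismatch, and None if no known algorithm was present.
    """
    result = None
    for item in digest_header.split(","):
        # Split on the first '=' only: base64 padding also uses '='.
        name, sep, encoded = item.strip().partition("=")
        algorithm = _ALGORITHMS.get(name.strip().lower())
        if not sep or algorithm is None:
            continue
        raw = hashlib.new(algorithm, data).digest()
        if base64.b64encode(raw).decode("ascii") != encoded.strip():
            return False
        result = True
    return result


body = b"example file contents"
header = "SHA-256=" + base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")
print(verify_digest(body, header))
```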

    The size of chunks chosen by the client should be sufficiently large
    that the chunk request headers and response headers represent negligible
    overhead, and sufficiently large that they can be pipelined
    effectively without needing a very high rate of chunk requests.  At
    the same time, the amount of time wasted waiting for the last chunk
    to download from the last server after all the other servers have
    finished should be minimized.  Thus we currently recommend that a
    chunk size of at least 10KBytes should be used.  If the file being
    transferred is very large, or the download speed very high, this can
    be increased to perhaps 1MByte.  As network bandwidths increase, we
    expect these numbers to increase appropriately, so that the time to
    transfer a chunk remains significantly larger than the latency of
    requesting a chunk from a server.

Wow. This appears to ignore the overhead of Range requests on the 
*server*. Note that sometimes, content is not served directly from the 
filesystem, and implementing Range may not be possible using seeks. Now 
one could argue that servers suffering from the problem should not 
support this in the first place, but still...

Given the file sizes for which parallel downloads make any sense today, 
is it *really* a good idea to recommend 10K segments?
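
To put a number on it: a back-of-the-envelope sketch that counts the Range requests a client would send for a given chunk size. The 700 MByte file size is a made-up example and the function names are hypothetical; only the 10 KByte and 1 MByte figures come from the draft.

```python
def range_request_count(file_size, chunk_size):
    """Number of Range requests needed to cover file_size bytes."""
    return -(-file_size // chunk_size)   # ceiling division


def byte_ranges(file_size, chunk_size):
    """Yield the Range header values a client would send, in order."""
    for start in range(0, file_size, chunk_size):
        end = min(start + chunk_size, file_size) - 1   # Range ends are inclusive
        yield "bytes=%d-%d" % (start, end)


file_size = 700 * 1024 * 1024            # hypothetical 700 MByte download
for chunk_size in (10 * 1024, 1024 * 1024):
    print(chunk_size, range_request_count(file_size, chunk_size))
```

With these numbers, 10 KByte chunks mean 71680 requests for a single file, against 700 requests with 1 MByte chunks. Each of those requests has to be parsed, logged, and possibly served without a cheap seek.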

9.  IANA Considerations

    Accordingly, IANA has made the following registration to the Link
    Relation Type registry.


No, they haven't yet :-) Just state what they are supposed to do, the 
RFC Editor will rephrase this on publication anyway.


Best regards, Julian
Received on Sunday, 16 January 2011 19:38:37 UTC
